# Init DataFrame for football match prediction from game stats

Initializes our DataFrame with the collected data and exports it to a CSV file.

***Authors***: Arradi Naoufal, Bienfait Méo, Bouchakour Younes, Sergent Pierre-Louis, Tadjer Badr

### Columns

**Shots**:
- total
- ongoal
- offgoal
- insidebox
- outsidebox

**Passes**:
- total
- percentage (of completed passes)

**Attacks**:
- total
- dangerous

**Others**:
- Fouls
- Corners
- Possession_time
- Yellow_cards
- Red_cards
- Saves
- Substitutions
- Tackles
- Penalties
- Injuries

**Output**: win = 1, draw = 0, loss = -1
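
The label is simply the sign of the goal difference. A minimal sketch, assuming the final score of each team is known (goals are not among the collected columns, and `match_result` is a hypothetical helper, not part of the authors' code):

```python
def match_result(goals_for: int, goals_against: int) -> int:
    """Return 1 for a win, 0 for a draw and -1 for a loss (hypothetical helper)."""
    # The sign of the goal difference is exactly the win/draw/loss label
    diff = goals_for - goals_against
    return (diff > 0) - (diff < 0)


print(match_result(2, 1))  # 1
print(match_result(0, 0))  # 0
print(match_result(1, 3))  # -1
```
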

### About data

We will use all the games of the 2020/2021 season of the Scottish football league. To give our model relevant data, we collect the stats of each game and compare the two teams by taking the difference of each stat, which keeps track of how much each team dominated in specific areas. We can then train a model to predict the winner from the stats of each game and identify the metrics that matter most for winning a football match.

**Example**:
||round|passes|shots|attacks|penalties|red_cards|output|
|-|-|-|-|-|-|-|-|
|team a|1|200|3|10|0|0|1|
|team b|1|-200|-3|-10|0|0|-1|

Here *team a* made 200 **more** passes than its opponent, had 3 **more** shots, and won (output 1).
*Team b* made 200 **fewer** passes than its opponent, had 3 **fewer** shots, and lost (output -1).
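
The differencing behind this table can be sketched in a few lines. This is a minimal sketch, not the authors' collection code: `mirrored_rows` is a hypothetical helper, the raw totals are made-up numbers chosen to reproduce the example above, and missing (`None`) stats are not handled.

```python
# Hypothetical subset of stats, matching the columns of the example table
STAT_KEYS = ["passes", "shots", "attacks", "penalties", "red_cards"]


def mirrored_rows(stats_a: dict, stats_b: dict, result_a: int):
    """Turn the raw stats of both teams into two difference rows (hypothetical helper)."""
    # Team a keeps (a - b); team b gets the exact opposite, (b - a)
    row_a = {key: stats_a[key] - stats_b[key] for key in STAT_KEYS}
    row_b = {key: -value for key, value in row_a.items()}
    # The output is mirrored too: a win for one side is a loss for the other
    row_a["output"] = result_a
    row_b["output"] = -result_a
    return row_a, row_b


# Made-up raw totals for one match
team_a = {"passes": 450, "shots": 12, "attacks": 95, "penalties": 1, "red_cards": 0}
team_b = {"passes": 250, "shots": 9, "attacks": 85, "penalties": 1, "red_cards": 0}
row_a, row_b = mirrored_rows(team_a, team_b, result_a=1)
print(row_a)  # {'passes': 200, 'shots': 3, 'attacks': 10, 'penalties': 0, 'red_cards': 0, 'output': 1}
print(row_b)  # {'passes': -200, 'shots': -3, 'attacks': -10, 'penalties': 0, 'red_cards': 0, 'output': -1}
```

Storing both mirrored rows means every match contributes one winning and one losing example (or two draws), so the model sees each fixture from both sides.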



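```python
import pandas as pd

# Helper module used by this project to query the Sportmonks API;
# rows_data holds one list of 22 values per team and per match
from sport_monks_api import SportmonksAPI

data = SportmonksAPI().rows_data
```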

```
Api call: page1 for rounds/season/17141
Api call: page1 for fixtures/between/2020-08-01/2021-05-16
Api call: page2 for fixtures/between/2020-08-01/2021-05-16
Api call: page3 for fixtures/between/2020-08-01/2021-05-16

[[273, 194968, -11, -2, -9, -4, -6, -202, -10.469999999999999, -18, -15, 6, -4, -22, 1, 1, 2, 3, -8, None, None, -1],
 [62, 194968, 11, 2, 9, 4, 6, 202, 10.469999999999999, 18, 15, -6, 4, 22, -1, -1, -2, -3, 8, None, None, 1],
 [496, 194968, -5, 0, -5, -1, -5, -44, -5.460000000000001, -42, -17, None, -3, -8, 1, None, 2, -1, -6, None, None, 1],
 [258, 194968, 5, 0, 5, 1, 5, 44, 5.460000000000001, 42, 17, None, 3, 8, -1, None, -2, 1, 6, None, None, -1],
 [282, 194968, 4, -2, 6, 3, 2, 201, 16.080000000000005, 27, 2, 2, 0, 22, -3, None, 0, 0, -2, 1, None, 0],
 [734, 194968, -4, 2, -6, -3, -2, -201, -16.080000000000005, -27, -2, -2, 0, -22, 3, None, 0, 0, 2, -1, None, 0],
 [66, 194968, -2, 2, -4, 2, -3, -78, -8.39, -62, -9, ...
```

```python
# Column names, in the same order as the values of each row in `data`
columns = [
    "team_ids", "round_ids",
    "shots_total", "shots_ongoal", "shots_offgoal", "shots_insidebox", "shots_outsidebox",
    "passes_total", "passes_percentage",
    "attacks_total", "attacks_dangerous",
    "fouls", "corners", "possession_time", "yellow_cards", "red_cards", "saves",
    "substitutions", "tackles", "penalties", "injuries",
    "results",
]

df = pd.DataFrame(data=data, columns=columns)
```
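
Since each fixture produces two rows that should be exact opposites, a quick consistency check can catch collection errors. A minimal sketch, assuming (as in the rows printed above) that the two teams of a fixture sit on consecutive rows; `check_mirrored_pairs` is a hypothetical helper, and the sample uses a few values copied from the output above:

```python
import numpy as np
import pandas as pd


def check_mirrored_pairs(frame: pd.DataFrame, id_columns=("team_ids", "round_ids")) -> bool:
    """Check that rows 2k and 2k+1 are exact opposites on every stat column.

    Assumption: the two teams of a fixture are on consecutive rows.
    A missing stat must be missing on both rows of the pair.
    """
    # None/NaN become float NaN so the whole block can be compared numerically
    stats = frame.drop(columns=list(id_columns)).apply(pd.to_numeric, errors="coerce")
    values = stats.to_numpy(dtype=float)
    first, second = values[0::2], values[1::2]
    if first.shape != second.shape:  # odd number of rows: one team is unpaired
        return False
    both_missing = np.isnan(first) & np.isnan(second)
    opposite = np.isclose(first, -second)
    return bool(np.all(opposite | both_missing))


# Values copied from the first four rows printed above (subset of columns)
sample = pd.DataFrame(
    [[273, 194968, -11, -2, -202, -10.47, None, -1],
     [62, 194968, 11, 2, 202, 10.47, None, 1],
     [496, 194968, -5, 0, -44, -5.46, None, 1],
     [258, 194968, 5, 0, 44, 5.46, None, -1]],
    columns=["team_ids", "round_ids", "shots_total", "shots_ongoal",
             "passes_total", "passes_percentage", "injuries", "results"],
)
print(check_mirrored_pairs(sample))
```

On the notebook's data the call would be `check_mirrored_pairs(df)`; a `False` result would point to an unpaired team or a fixture whose two rows are not adjacent.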

```python
df
```

||team_ids|round_ids|shots_total|shots_ongoal|shots_offgoal|shots_insidebox|shots_outsidebox|passes_total|passes_percentage|attacks_total|...|corners|possession_time|yellow_cards|red_cards|saves|substitutions|tackles|penalties|injuries|results|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|273|194968|-11|-2.0|-9.0|-4.0|-6.0|-202|-10.47|-18|...|-4.0|-22|1.0|1.0|2.0|3.0|-8|NaN|NaN|-1|
|1|62|194968|11|2.0|9.0|4.0|6.0|202|10.47|18|...|4.0|22|-1.0|-1.0|-2.0|-3.0|8|NaN|NaN|1|
|2|496|194968|-5|0.0|-5.0|-1.0|-5.0|-44|-5.46|-42|...|-3.0|-8|1.0|NaN|2.0|-1.0|-6|NaN|NaN|1|
|3|258|194968|5|0.0|5.0|1.0|5.0|44|5.46|42|...|3.0|8|-1.0|NaN|-2.0|1.0|6|NaN|NaN|-1|
|4|282|194968|4|-2.0|6.0|3.0|2.0|201|16.08|27|...|0.0|22|-3.0|NaN|0.0|0.0|-2|1.0|NaN|0|
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|
|451|180|240678|-3|NaN|-5.0|1.0|-2.0|-123|-14.10|12|...|-2.0|-14|3.0|NaN|-1.0|-1.0|8|NaN|NaN|1|
|452|309|240678|-3|-2.0|-1.0|-3.0|0.0|28|-0.36|-8|...|-1.0|2|0.0|NaN|NaN|0.0|-2|NaN|NaN|-1|
|453|246|240678|3|2.0|1.0|3.0|0.0|-28|0.36|8|...|1.0|-2|0.0|NaN|NaN|0.0|2|NaN|NaN|1|
|454|496|240678|1|-2.0|3.0|-4.0|6.0|121|12.75|26|...|2.0|18|-1.0|NaN|1.0|0.0|2|NaN|NaN|0|
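
The table shows many `NaN` cells, for instance in `penalties` and `injuries`, where no value was collected. A minimal sketch for inspecting and filling them, under a loudly stated assumption: a stat missing for a fixture is treated as recorded by neither team, so its difference is set to 0, and mostly empty columns are dropped. The helpers `missing_share` and `fill_missing_as_equal` and the 50% threshold are hypothetical; the sample values are copied from rows 0 to 3 of the table.

```python
import numpy as np
import pandas as pd


def missing_share(frame: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, most incomplete first."""
    return frame.isna().mean().sort_values(ascending=False)


def fill_missing_as_equal(frame: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Drop mostly empty columns, then set the remaining missing differences to 0.

    ASSUMPTION: a missing stat means neither team recorded it, so the
    difference between the two teams is taken as 0.
    """
    share = frame.isna().mean()
    kept = frame.loc[:, share <= max_missing]
    return kept.fillna(0)


# Rows 0 to 3 of the table above, a few columns only
sample = pd.DataFrame(
    {
        "red_cards": [1.0, -1.0, np.nan, np.nan],
        "penalties": [np.nan, np.nan, np.nan, np.nan],
        "injuries": [np.nan, np.nan, np.nan, np.nan],
        "results": [-1, 1, 1, -1],
    }
)
print(missing_share(sample))
print(fill_missing_as_equal(sample))
```

Whether 0 is the right fill value depends on what a missing stat means in the Sportmonks data, which should be checked before relying on it.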


```python
# Writes the row index as an unnamed first column; pass index=False to leave it out
df.to_csv("./data.csv")
```
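
To use the exported file later, it can be read back with `index_col=0`, so that the index written by `to_csv` is restored instead of showing up as an `Unnamed: 0` column. As a first, rough look at which metrics matter most, the sketch also ranks the stat differences by the absolute value of their Pearson correlation with `results`. This is an illustration under assumptions, not the authors' model: `load_stats` and `rank_by_correlation` are hypothetical helpers, and the in-memory CSV holds four rows copied from the table above.

```python
import io

import pandas as pd


def load_stats(path_or_buffer) -> pd.DataFrame:
    """Read the exported CSV back; its first column is the index written by to_csv."""
    return pd.read_csv(path_or_buffer, index_col=0)


def rank_by_correlation(frame: pd.DataFrame, target: str = "results") -> pd.Series:
    """Rank stat differences by |Pearson correlation| with the result.

    Identifiers are excluded; pandas skips NaN values pairwise.
    """
    stats = frame.drop(columns=["team_ids", "round_ids"])
    corr = stats.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)


# Four rows copied from the table above, written the way to_csv writes them
csv_text = """,team_ids,round_ids,shots_total,passes_total,possession_time,results
0,273,194968,-11,-202,-22,-1
1,62,194968,11,202,22,1
2,496,194968,-5,-44,-8,1
3,258,194968,5,44,8,-1
"""
frame = load_stats(io.StringIO(csv_text))
ranking = rank_by_correlation(frame)
print(ranking)
```

On the real export this would be `rank_by_correlation(load_stats("./data.csv"))`. A correlation only measures a linear association with the label; it is a quick sanity check, not a substitute for training and evaluating a model.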