## Replicating FIFA Football Intelligence - Passing Networks (Players within a Team)

---
> ### 1. SET UP DEVELOPMENT ENVIRONMENT

**1.0** Import the required Python library, `pandas`, into the current development environment (i.e. this notebook):
```
import pandas as pd
```

In [19]:
import pandas as pd

**1.1** Configure the notebook for code autocompletion, inline plots, and the maximum and minimum number of rows shown when displaying `pandas` data objects:
```
%config Completer.use_jedi = False
%matplotlib inline
pd.options.display.max_rows, pd.options.display.min_rows = 20, 20
```

In [20]:
%config Completer.use_jedi = False
%matplotlib inline
pd.options.display.max_rows, pd.options.display.min_rows = 20, 20
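
The display settings above apply to the whole notebook. As a minimal sketch, assuming you only want to change them for a single cell, `pd.option_context()` applies options temporarily and restores the previous values afterwards. The value `5` is an arbitrary choice for illustration:

```python
import pandas as pd

# Remember the current setting so we can confirm it is restored afterwards.
before = pd.get_option("display.max_rows")

# Inside the "with" block the options are changed; outside it they revert.
with pd.option_context("display.max_rows", 5, "display.min_rows", 5):
    inside = pd.get_option("display.max_rows")

after = pd.get_option("display.max_rows")
print(before, inside, after)
```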

---
> ### 2. LOAD & CHECK THE FOOTBALL DATA

**2.0** Read in the `match_data.csv` file located in the `data` directory (folder):
```
raw_data = pd.read_csv("data/match_data.csv")
```

In [21]:
raw_data = pd.read_csv("data/match_data.csv")

**2.1** Make a copy of raw data to work on called `df`:

```
df = raw_data.copy()
```

In [22]:
df = raw_data.copy()

**2.2** View the `df` object, which is a `pandas` DataFrame (df): a tabular, two-dimensional data structure with rows and columns:
```
df
```

In [23]:
df

| | start_min | start_sec | end_min | end_sec | match_half | player1 | player1_team | player2 | player2_team | event | event_detail | press | start_x | start_y | end_x | end_y | press_x | press_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0.0 | 0.0 | 1 | fernandes | man_u | | | set_piece | kickoff | | 52.5 | 34.0 | | | | |
| 1 | 0 | 0 | 0.0 | 1.0 | 1 | fernandes | man_u | ronaldo | man_u | completed_pass | | | 52.5 | 34.0 | 55.8 | 31.3 | | |
| 2 | 0 | 1 | 0.0 | 2.0 | 1 | ronaldo | man_u | matic | man_u | completed_pass | | odegaard | 55.8 | 31.3 | 65.6 | 33.4 | 55.7 | 30.4 |
| 3 | 0 | 2 | 0.0 | 3.0 | 1 | matic | man_u | dalot | man_u | completed_pass | | nketiah | 65.6 | 33.4 | 72.2 | 55.2 | 63.4 | 36.4 |
| 4 | 0 | 3 | 0.0 | 5.0 | 1 | dalot | man_u | | | incomplete_pass | attempted_longball | | 72.2 | 55.2 | | | | |
| 5 | 0 | 5 | | | 1 | gabriel | arsenal | | | won_intercepted_longball | intercepted_longball | | 29.7 | 43.0 | | | | |
| 6 | 0 | 6 | | | 1 | gabriel | arsenal | | | lost_miscontrolled | | | 29.7 | 44.0 | | | | |
| 7 | 0 | 7 | | | 1 | elneny | arsenal | | | recovered_looseball | | fernandes | 37.6 | 40.0 | | | 37.0 | 40.0 |
| 8 | 0 | 8 | 0.0 | 10.0 | 1 | elneny | arsenal | cedric | arsenal | completed_pass | | fernandes | 37.6 | 40.0 | 28.8 | 18.3 | 37.4 | 42.6 |
| 9 | 0 | 10 | 0.0 | 14.0 | 1 | cedric | arsenal | | | incomplete_pass | | sancho | 28.8 | 18.3 | | | 31.5 | 13.0 |


**2.3** Check the dimensions of `df`, returned as `(number of rows, number of columns)`. The result should be `(1912, 18)`:
```
df.shape
```

In [24]:
df.shape

(1912, 18)
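
Beyond the shape, it can help to check the column names and how many values are missing in each column before filtering. The sketch below uses a hypothetical three-row stand-in for the match data (the values are made up, not taken from `match_data.csv`), and `isna().sum()` is an addition that the steps in this notebook do not use:

```python
import pandas as pd

# Hypothetical three-row stand-in for the match data (values are made up).
toy = pd.DataFrame({
    "player1": ["a", "b", "c"],
    "player2": ["b", None, "a"],
    "event": ["completed_pass", "dribble", "completed_pass"],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = toy.shape

# isna() marks missing values as True; sum() then counts them per column.
missing_per_column = toy.isna().sum()

print(n_rows, n_cols)
print(list(toy.columns))
print(missing_per_column)
```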

---
> ### 3. PREP DATA FOR GENERATING THE PASSING NETWORKS

**3.0** Have a look at what's in the `event` column:
```
df["event"]
```

In [25]:
df["event"]

0                      set_piece
1                 completed_pass
2                 completed_pass
3                 completed_pass
4                incomplete_pass
5       won_intercepted_longball
6             lost_miscontrolled
7            recovered_looseball
8                 completed_pass
9                incomplete_pass
                  ...           
1902              completed_pass
1903              completed_pass
1904              completed_pass
1905                     dribble
1906              completed_pass
1907              completed_pass
1908              completed_pass
1909                     dribble
1910              completed_pass
1911                     dribble
Name: event, Length: 1912, dtype: object

**3.1** Use the `value_counts()` method to count how many times each type of event appears in the `event` column:
```
df["event"].value_counts()
```

In [26]:
df["event"].value_counts()

completed_pass                  795
dribble                         338
incomplete_pass                 126
recovered_looseball             105
set_piece                        76
won_pressured_opposition         47
clearance                        42
lost_miscontrolled               34
goal_attempt                     27
duel_aerial_lost                 25
                               ... 
penalty                           3
out_after_last_touch              3
ball_out_of_bounds                3
won_5050                          3
touch                             2
duel_aerial_draw                  2
challenge_aerial_ineffective      1
tackle                            1
offside                           1
lost_attempted_cross              1
Name: event, Length: 45, dtype: int64
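
`value_counts()` can also return shares instead of raw counts via its `normalize=True` argument. This is a minimal sketch on a hypothetical four-event series (made-up values), not on the match data itself:

```python
import pandas as pd

# Hypothetical event column with four made-up events.
events = pd.Series(
    ["completed_pass", "dribble", "completed_pass", "completed_pass"],
    name="event",
)

counts = events.value_counts()                # raw counts per event type
shares = events.value_counts(normalize=True)  # same, as a fraction of all events

print(counts)
print(shares)
```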

**3.2** To analyse Passing Networks we are only interested in successful passes. As a first step towards filtering the data down to these, check which rows in the `event` column contain the text string `"completed_pass"`:
```
df["event"] == "completed_pass"
```

In [27]:
df["event"] == "completed_pass"

0       False
1        True
2        True
3        True
4       False
5       False
6       False
7       False
8        True
9       False
        ...  
1902     True
1903     True
1904     True
1905    False
1906     True
1907     True
1908     True
1909    False
1910     True
1911    False
Name: event, Length: 1912, dtype: bool

**3.3** Use this True/False filter to create a subset of the full match data containing only the rows (events) that represent a `"completed_pass"`. Save this subset as a new variable called `completed_passes`:
```
completed_passes = df[  df["event"] == "completed_pass" ].copy()
```

In [28]:
completed_passes = df[ df["event"] == "completed_pass" ].copy()
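
To see how the True/False filter behaves, here is a small sketch on a hypothetical four-row DataFrame (made-up values). Because `True` counts as 1, summing the filter gives the number of rows that will be kept, which is a quick sanity check before subsetting:

```python
import pandas as pd

# Hypothetical four-row stand-in for the match data (values are made up).
toy = pd.DataFrame({
    "player1": ["a", "b", "c", "a"],
    "event": ["completed_pass", "dribble", "completed_pass", "incomplete_pass"],
})

# One True/False value per row: True where the event is a completed pass.
mask = toy["event"] == "completed_pass"

# True counts as 1 and False as 0, so the sum is the number of matching rows.
n_completed = int(mask.sum())

# Indexing with the mask keeps only the True rows; copy() makes it independent.
subset = toy[mask].copy()

print(n_completed)
print(subset)
```

Note that the subset keeps the original row labels (here `0` and `2`), which is why the index of `completed_passes` has gaps.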

**3.4** Check what's in the new `completed_passes` data:
```
completed_passes
```

In [29]:
completed_passes

| | start_min | start_sec | end_min | end_sec | match_half | player1 | player1_team | player2 | player2_team | event | event_detail | press | start_x | start_y | end_x | end_y | press_x | press_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0.0 | 1.0 | 1 | fernandes | man_u | ronaldo | man_u | completed_pass | | | 52.5 | 34.0 | 55.8 | 31.3 | | |
| 2 | 0 | 1 | 0.0 | 2.0 | 1 | ronaldo | man_u | matic | man_u | completed_pass | | odegaard | 55.8 | 31.3 | 65.6 | 33.4 | 55.7 | 30.4 |
| 3 | 0 | 2 | 0.0 | 3.0 | 1 | matic | man_u | dalot | man_u | completed_pass | | nketiah | 65.6 | 33.4 | 72.2 | 55.2 | 63.4 | 36.4 |
| 8 | 0 | 8 | 0.0 | 10.0 | 1 | elneny | arsenal | cedric | arsenal | completed_pass | | fernandes | 37.6 | 40.0 | 28.8 | 18.3 | 37.4 | 42.6 |
| 18 | 0 | 28 | 0.0 | 30.0 | 1 | cedric | arsenal | white | arsenal | completed_pass | | | 57.6 | 7.6 | 56.1 | 10.2 | | |
| 19 | 0 | 30 | 0.0 | 32.0 | 1 | white | arsenal | gabriel | arsenal | completed_pass | | | 56.1 | 10.2 | 47.6 | 35.7 | | |
| 21 | 0 | 33 | 0.0 | 35.0 | 1 | gabriel | arsenal | tavares | arsenal | completed_pass | | | 48.7 | 37.7 | 67.9 | 66.2 | | |
| 22 | 0 | 37 | 0.0 | 38.0 | 1 | tavares | arsenal | xhaka | arsenal | completed_pass | | elanga | 67.9 | 66.2 | 62.5 | 62.1 | 67.9 | 66.0 |
| 24 | 0 | 40 | 0.0 | 41.0 | 1 | xhaka | arsenal | gabriel | arsenal | completed_pass | | | 57.4 | 62.5 | 37.8 | 62.3 | | |
| 25 | 0 | 42 | 0.0 | 44.0 | 1 | gabriel | arsenal | white | arsenal | completed_pass | | | 37.8 | 62.3 | 33.6 | 25.8 | | |


**3.5** Choose the team, `"arsenal"` or `"man_u"`, whose Passing Network you want to create, and store it in a new variable called `team`:
```
team = "arsenal"
```

In [30]:
team = "arsenal"

**3.6** Have a look at which of the rows in the `"player1_team"` column of `completed_passes` are equal to the value of your `team` variable:
```
completed_passes["player1_team"] == team
```

In [31]:
completed_passes["player1_team"] == team

1       False
2       False
3       False
8        True
18       True
19       True
21       True
22       True
24       True
25       True
        ...  
1896     True
1900    False
1901    False
1902    False
1903    False
1904    False
1906    False
1907    False
1908    False
1910    False
Name: player1_team, Length: 795, dtype: bool

**3.7** Create a new variable called `team_passes` containing just the rows from the `completed_passes` data where the value in the `"player1_team"` column is the same as the value of your `team` variable, i.e. either `"arsenal"` or `"man_u"`:
```
team_passes = completed_passes[completed_passes["player1_team"] == team].copy()
```

In [32]:
team_passes = completed_passes[ completed_passes["player1_team"] == team ].copy()
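
The two filters (event type, then team) can also be combined into one step with the `&` operator, which keeps a row only when both conditions are True. This is an alternative sketch on a hypothetical four-row DataFrame (made-up values), not a replacement for the step-by-step approach above:

```python
import pandas as pd

# Hypothetical four-row stand-in for the match data (values are made up).
toy = pd.DataFrame({
    "player1": ["p1", "p2", "p3", "p4"],
    "player1_team": ["arsenal", "man_u", "arsenal", "arsenal"],
    "event": ["completed_pass", "completed_pass", "dribble", "completed_pass"],
})

team = "arsenal"

# Build each True/False filter separately so they are easy to inspect.
is_completed = toy["event"] == "completed_pass"
is_team = toy["player1_team"] == team

# "&" combines the filters row by row: both must be True to keep the row.
team_passes_toy = toy[is_completed & is_team].copy()

print(team_passes_toy)
```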

**3.8** Check `team_passes` to see whether the additional filter has worked as expected:
```
team_passes
```

In [33]:
team_passes

| | start_min | start_sec | end_min | end_sec | match_half | player1 | player1_team | player2 | player2_team | event | event_detail | press | start_x | start_y | end_x | end_y | press_x | press_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 0 | 8 | 0.0 | 10.0 | 1 | elneny | arsenal | cedric | arsenal | completed_pass | | fernandes | 37.6 | 40.0 | 28.8 | 18.3 | 37.4 | 42.6 |
| 18 | 0 | 28 | 0.0 | 30.0 | 1 | cedric | arsenal | white | arsenal | completed_pass | | | 57.6 | 7.6 | 56.1 | 10.2 | | |
| 19 | 0 | 30 | 0.0 | 32.0 | 1 | white | arsenal | gabriel | arsenal | completed_pass | | | 56.1 | 10.2 | 47.6 | 35.7 | | |
| 21 | 0 | 33 | 0.0 | 35.0 | 1 | gabriel | arsenal | tavares | arsenal | completed_pass | | | 48.7 | 37.7 | 67.9 | 66.2 | | |
| 22 | 0 | 37 | 0.0 | 38.0 | 1 | tavares | arsenal | xhaka | arsenal | completed_pass | | elanga | 67.9 | 66.2 | 62.5 | 62.1 | 67.9 | 66.0 |
| 24 | 0 | 40 | 0.0 | 41.0 | 1 | xhaka | arsenal | gabriel | arsenal | completed_pass | | | 57.4 | 62.5 | 37.8 | 62.3 | | |
| 25 | 0 | 42 | 0.0 | 44.0 | 1 | gabriel | arsenal | white | arsenal | completed_pass | | | 37.8 | 62.3 | 33.6 | 25.8 | | |
| 31 | 0 | 53 | 0.0 | 54.0 | 1 | cedric | arsenal | saka | arsenal | completed_pass | possession_regain | sancho | 64.8 | 1.0 | 79.7 | 4.7 | 69.0 | 2.0 |
| 45 | 1 | 11 | 1.0 | 14.0 | 1 | xhaka | arsenal | cedric | arsenal | completed_pass | | mctominay | 47.6 | 33.8 | 36.9 | 16.4 | 48.6 | 33.0 |
| 47 | 1 | 20 | 1.0 | 21.0 | 1 | cedric | arsenal | white | arsenal | completed_pass | | | 46.5 | 15.8 | 41.4 | 18.5 | | |


**Question** - how many completed passes did each team make in this match?
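
One possible way to approach this question, sketched on a hypothetical five-row DataFrame (made-up values, so the counts below are not the real match totals), is to filter for completed passes and then count the values in the `player1_team` column:

```python
import pandas as pd

# Hypothetical five-row stand-in for the match data (values are made up).
toy = pd.DataFrame({
    "player1_team": ["arsenal", "man_u", "arsenal", "arsenal", "man_u"],
    "event": [
        "completed_pass", "completed_pass", "dribble",
        "completed_pass", "incomplete_pass",
    ],
})

# Keep only the completed passes, then count how many each team made.
completed = toy[toy["event"] == "completed_pass"]
passes_per_team = completed["player1_team"].value_counts()

print(passes_per_team)
```

Running the same two lines on `df` instead of `toy` should give the answer for the real match.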


---
> ### 4. GENERATE THE PASSING NETWORKS

**4.0** Create a Passing Network for your chosen team by calling the `pd.crosstab()` function with two inputs: first the `team_passes["player1"]` column (the passer), and second the `team_passes["player2"]` column (the receiver). Each cell of the result counts the completed passes from the row player to the column player:
```
pd.crosstab( team_passes["player1"], team_passes["player2"]  )
```

In [34]:
pd.crosstab(team_passes["player1"], team_passes["player2"])   

| player1 \ player2 | cedric | elneny | gabriel | holding | martinelli | nketiah | odegaard | ramsdale | saka | smith_rowe | tavares | tomiyasu | white | xhaka |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cedric | 0 | 9 | 0 | 1 | 0 | 1 | 14 | 1 | 6 | 0 | 0 | 0 | 16 | 7 |
| elneny | 13 | 0 | 7 | 0 | 0 | 3 | 12 | 1 | 3 | 3 | 2 | 3 | 2 | 15 |
| gabriel | 2 | 4 | 0 | 1 | 0 | 2 | 1 | 7 | 1 | 1 | 8 | 0 | 5 | 8 |
| holding | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| martinelli | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| nketiah | 2 | 2 | 0 | 0 | 1 | 0 | 4 | 0 | 4 | 1 | 1 | 0 | 0 | 4 |
| odegaard | 8 | 10 | 3 | 0 | 3 | 5 | 0 | 1 | 9 | 1 | 2 | 1 | 2 | 4 |
| ramsdale | 0 | 4 | 3 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 15 | 1 |
| saka | 11 | 5 | 0 | 0 | 0 | 3 | 4 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| smith_rowe | 0 | 2 | 2 | 0 | 0 | 2 | 2 | 1 | 1 | 0 | 3 | 0 | 1 | 5 |
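
To make the structure of the matrix concrete, here is a sketch on a hypothetical five-pass DataFrame with made-up players `a`, `b` and `c`. It shows how to read a single passer-to-receiver count with `.loc`, and how row and column sums give passes made and passes received:

```python
import pandas as pd

# Hypothetical completed passes between three made-up players.
toy_passes = pd.DataFrame({
    "player1": ["a", "a", "b", "c", "a"],  # passer
    "player2": ["b", "b", "c", "a", "c"],  # receiver
})

# Rows are passers, columns are receivers, cells are pass counts.
network = pd.crosstab(toy_passes["player1"], toy_passes["player2"])

# Look up one pairing: passes from "a" to "b".
a_to_b = network.loc["a", "b"]

# Summing across a row gives passes made; down a column gives passes received.
passes_made = network.sum(axis=1)
passes_received = network.sum(axis=0)

print(network)
print(a_to_b)
```

In the Arsenal matrix above, the same lookup with `"cedric"` and `"white"` corresponds to the cell showing 16 passes.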


**4.1** Further customise this function call with the `normalize=` parameter, which returns each value as a proportion of a total instead of a raw count. Pass `"all"` for proportions of all the values in the matrix, `"index"` for proportions within each row, or `"columns"` for proportions within each column. Chain on the `round()` method with the argument `3` to round the proportions to 3 decimal places, then multiply by 100 to display them as percentages:

```
pd.crosstab(team_passes["player1"], team_passes["player2"], normalize="all").round(3)*100  

# Other accepted values for normalize=: "index", "columns"
```

In [39]:
pd.crosstab(team_passes["player1"], team_passes["player2"], normalize="all").round(3)*100  

| player1 \ player2 | cedric | elneny | gabriel | holding | martinelli | nketiah | odegaard | ramsdale | saka | smith_rowe | tavares | tomiyasu | white | xhaka |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cedric | 0.0 | 2.0 | 0.0 | 0.2 | 0.0 | 0.2 | 3.1 | 0.2 | 1.3 | 0.0 | 0.0 | 0.0 | 3.5 | 1.5 |
| elneny | 2.9 | 0.0 | 1.5 | 0.0 | 0.0 | 0.7 | 2.7 | 0.2 | 0.7 | 0.7 | 0.4 | 0.7 | 0.4 | 3.3 |
| gabriel | 0.4 | 0.9 | 0.0 | 0.2 | 0.0 | 0.4 | 0.2 | 1.5 | 0.2 | 0.2 | 1.8 | 0.0 | 1.1 | 1.8 |
| holding | 0.2 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| martinelli | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.2 | 0.2 | 0.0 | 0.2 | 0.0 | 0.2 | 0.0 | 0.0 | 0.2 |
| nketiah | 0.4 | 0.4 | 0.0 | 0.0 | 0.2 | 0.0 | 0.9 | 0.0 | 0.9 | 0.2 | 0.2 | 0.0 | 0.0 | 0.9 |
| odegaard | 1.8 | 2.2 | 0.7 | 0.0 | 0.7 | 1.1 | 0.0 | 0.2 | 2.0 | 0.2 | 0.4 | 0.2 | 0.4 | 0.9 |
| ramsdale | 0.0 | 0.9 | 0.7 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.2 | 0.2 | 0.0 | 0.0 | 3.3 | 0.2 |
| saka | 2.4 | 1.1 | 0.0 | 0.0 | 0.0 | 0.7 | 0.9 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.2 | 0.2 |
| smith_rowe | 0.0 | 0.4 | 0.4 | 0.0 | 0.0 | 0.4 | 0.4 | 0.2 | 0.2 | 0.0 | 0.7 | 0.0 | 0.2 | 1.1 |
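
The difference between the three `normalize=` options is easiest to see by checking what sums to 1 in each case. This sketch uses a hypothetical five-pass DataFrame with made-up players `a`, `b` and `c`:

```python
import pandas as pd

# Hypothetical completed passes between three made-up players.
toy_passes = pd.DataFrame({
    "player1": ["a", "a", "b", "c", "a"],  # passer
    "player2": ["b", "b", "c", "a", "c"],  # receiver
})
p1, p2 = toy_passes["player1"], toy_passes["player2"]

# "all": each cell is a share of every pass, so the whole matrix sums to 1.
by_all = pd.crosstab(p1, p2, normalize="all")

# "index": each cell is a share of the passer's passes, so each row sums to 1.
by_row = pd.crosstab(p1, p2, normalize="index")

# "columns": each cell is a share of the receiver's receptions,
# so each column sums to 1.
by_col = pd.crosstab(p1, p2, normalize="columns")

print(by_all.values.sum())
print(by_row.sum(axis=1))
print(by_col.sum(axis=0))
```

With `"index"`, for example, the cell for `a` to `b` answers "what share of `a`'s passes went to `b`?", which here is 2 out of 3.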


**4.2 OPTIONAL EXTENSION** Save this Passing Network as a CSV file by first storing it in a new variable, e.g. `matrix`, and then calling that variable's `to_csv()` method to create a new CSV file:

```
matrix = pd.crosstab(team_passes["player1"], team_passes["player2"])
matrix.to_csv("FIFAIntel_matrix.csv")
```

In [36]:
matrix = pd.crosstab(team_passes["player1"], team_passes["player2"])
#matrix.to_csv("FIFAIntel_matrix.csv")
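
When reading a saved matrix back in, the player names in the first column need to be restored as the row labels, which `pd.read_csv()` does with `index_col=0`. The sketch below uses a hypothetical five-pass DataFrame (made-up players) and writes to an in-memory `io.StringIO` buffer rather than a real file, purely so that it runs without touching the disk:

```python
import io

import pandas as pd

# Hypothetical completed passes between three made-up players.
toy_passes = pd.DataFrame({
    "player1": ["a", "a", "b", "c", "a"],
    "player2": ["b", "b", "c", "a", "c"],
})
matrix = pd.crosstab(toy_passes["player1"], toy_passes["player2"])

# Write the matrix as CSV text into an in-memory buffer (stand-in for a file).
buffer = io.StringIO()
matrix.to_csv(buffer)

# Rewind the buffer and read it back, using the first column as row labels.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(restored)
```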

---

_Sports Python Educational Project content, licensed under Attribution-NonCommercial-ShareAlike 4.0 International_