## Replicating FIFA Football Intelligence - Goalkeeping Distributions (Player-level)
**Compare how De Gea and Ramsdale distribute the ball**

---
> ### 1. SET UP DEVELOPMENT ENVIRONMENT

**1.0 Import the required Python libraries into the current development environment (i.e. this notebook)**
```
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image
```

**1.1 Configure the notebook for code autocompletion, inline plots, and the number of rows shown when displaying pandas data objects**
```
%config Completer.use_jedi = False
%matplotlib inline
pd.options.display.max_rows, pd.options.display.min_rows = 20, 20
```
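The `pd.options.display` attributes above also have a function-style equivalent, `pd.set_option()` / `pd.get_option()`, plus `pd.option_context()` for a temporary change that reverts once its `with` block ends. A minimal sketch (the `max_columns` tweak is only an illustration, not something the notebook needs):

```python
import pandas as pd

# Same settings as the cell above, via the function interface
pd.set_option("display.max_rows", 20)
pd.set_option("display.min_rows", 20)

# Temporarily show every column (handy for wide dataframes); reverts after the block
with pd.option_context("display.max_columns", None):
    print(pd.get_option("display.max_columns"))  # None inside the block

print(pd.get_option("display.max_rows"))  # 20
```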

---
> ### 2. LOAD & PREP DATA

**2.0** Read in the `match_data.csv` file located in the `data` directory (folder)
```
raw_data = pd.read_csv("data/match_data.csv")
```

**2.1** Make a copy of raw data to work on called `df`

```
df = raw_data.copy()
```
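Working on a copy means later edits to `df` leave `raw_data` untouched as a clean reference. A small sketch with a hypothetical two-row stand-in for the CSV shows the two objects are independent:

```python
import pandas as pd

# Hypothetical two-row stand-in for raw_data (the real CSV has 1912 rows)
raw_data = pd.DataFrame({"player1": ["de_gea", "ramsdale"], "start_x": [5.0, 4.0]})
df = raw_data.copy()  # deep copy by default

df.loc[0, "start_x"] = 99.0        # edit the working copy only
print(raw_data.loc[0, "start_x"])  # 5.0, the original is untouched
```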

**2.2** View the `df` object. This is a `pandas` dataframe object, i.e. data in a table with rows and columns, much like an Excel spreadsheet:
```
df
```

**2.3** Check the dimensions of the `df` object (<no. of rows>, <no. of columns>) - it should be (1912, 18):
```
df.shape
```
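If you would rather the notebook stop early when the file isn't what you expect, a hypothetical `check_shape()` helper can turn the eyeball check into an explicit one. A sketch on toy data:

```python
import pandas as pd

def check_shape(frame: pd.DataFrame, expected: tuple) -> None:
    """Hypothetical helper: raise if the dataframe isn't the expected size."""
    n_rows, n_cols = frame.shape  # shape is a (rows, columns) tuple
    if (n_rows, n_cols) != expected:
        raise ValueError(f"expected {expected}, got {(n_rows, n_cols)}")

toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
check_shape(toy, (3, 2))        # passes silently
# check_shape(df, (1912, 18))   # the equivalent check on the real data
```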

**2.4** Load the `pitch.png` graphic located in the `data` directory (folder) and store in a variable called `pitch`
```
pitch = Image.open("data/pitch.png")
```

**2.5** Check the `pitch` object using `imshow()` function available from the `matplotlib` plotting library:
```
plt.imshow(pitch)
```
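On its own, `imshow()` labels both axes in pixels. Checking the image's pixel size with PIL's `Image.size` is a quick way to confirm its proportions roughly match the 105 by 68 pitch coordinates used when plotting. A sketch with a hypothetical blank image (the real `pitch.png` may have a different size):

```python
import numpy as np
from PIL import Image

# Hypothetical stand-in for data/pitch.png
pitch = Image.new("RGB", (525, 340), "green")

width, height = pitch.size       # PIL reports (width, height) in pixels
print(width, height)             # 525 340
print(round(width / height, 3))  # 1.544, close to 105 / 68

# As an array the order flips to (rows, columns, channels)
print(np.asarray(pitch).shape)   # (340, 525, 3)
```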

---
> ### 3. PREP DATA FOR GENERATING THE GOALIE DISTRIBUTION VISUALISATIONS

**3.0** Generate a list of all the unique players, i.e. the text strings, in the `player1` column of the `df` data using the `unique()` function:
```
df["player1"].unique()
```

**3.1** Create a new variable called `goalie` that contains the text string representing either goalie, i.e. `"de_gea"` or `"ramsdale"`:
```
goalie = "de_gea"
```

**3.2** Create a new variable called `goalie_df` that contains just the rows from the `df` data where the text in the `"player1"` column is the same as the text in the `goalie` variable, i.e. contains `"de_gea"` or `"ramsdale"`:
```
goalie_df = df[df["player1"] == goalie].copy()
```
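Under the hood, `df["player1"] == goalie` builds a True/False Series (a boolean mask) with one value per row, and `df[mask]` keeps only the True rows. The trailing `.copy()` gives `goalie_df` its own data, so modifying it later doesn't trigger pandas' `SettingWithCopyWarning`. A sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical toy events; the real data has more columns
df = pd.DataFrame({
    "player1": ["de_gea", "ramsdale", "de_gea", "fernandes"],
    "event": ["completed_pass", "clearance", "incomplete_pass", "shot"],
})
goalie = "de_gea"

mask = df["player1"] == goalie   # boolean Series, one value per row
print(mask.tolist())             # [True, False, True, False]

goalie_df = df[mask].copy()      # keep rows where the mask is True
print(goalie_df.index.tolist())  # [0, 2]
```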

**3.3** Check the `goalie_df` cut of data to see if the filter for a specific goalie has worked as expected:
```
goalie_df
```

**3.4** Have a look at what's in the `event` column of the `goalie_df` data by using the `value_counts()` function to show the different types of events in the data, and how many rows there are of each event type: 
```
goalie_df["event"].value_counts()
```
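`value_counts()` returns a Series indexed by each distinct value, sorted most frequent first; passing `normalize=True` gives proportions instead of raw counts. A sketch on a hypothetical list of events:

```python
import pandas as pd

# Hypothetical events for one goalie
goalie_df = pd.DataFrame({"event": ["completed_pass", "completed_pass", "clearance",
                                    "incomplete_pass", "completed_pass"]})

counts = goalie_df["event"].value_counts()
print(counts["completed_pass"])  # 3

shares = goalie_df["event"].value_counts(normalize=True)
print(shares["completed_pass"])  # 0.6
```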

**3.5** Make a cut of the `goalie_df` that only contains the data on the goalie's ball distribution by filtering `goalie_df` for the rows which have `"completed_pass"`, `"incomplete_pass"`, or `"clearance"` in the `event` column, and saving this down as a new variable called `dist`:
```
dist = goalie_df[ goalie_df["event"].isin(["completed_pass", "incomplete_pass", "clearance"])].copy()
```
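`isin()` marks a row True when its value matches any item in the list, which is shorter than chaining several `==` comparisons with `|`. A sketch where `"save"` and `"claim"` are hypothetical non-distribution events:

```python
import pandas as pd

goalie_df = pd.DataFrame({"event": ["completed_pass", "save", "clearance",
                                    "incomplete_pass", "claim"]})
DIST_EVENTS = ["completed_pass", "incomplete_pass", "clearance"]

# True where the event is any of the distribution types
dist = goalie_df[goalie_df["event"].isin(DIST_EVENTS)].copy()
print(dist["event"].tolist())  # ['completed_pass', 'clearance', 'incomplete_pass']
```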

**3.6** Check the `dist` cut of data to see if the additional filtering for just the goalie's ball distributions has worked as expected:
```
dist
```

**3.7** Use the `drop()` function to remove a selection of unnecessary columns based on their index position in the dataframe. Specifying `inplace=True` is critical: without it, `drop()` returns a new dataframe and leaves `dist` unchanged unless you assign the result back to `dist`:
```
dist.drop( dist.columns[[ 0,1,2,3,4,6,8,11,16,17 ]], axis=1, inplace=True)
```
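Dropping by index position relies on the CSV's column order never changing. Two alternatives that are more robust are sketched below on a hypothetical frame whose column names are placeholders (the real file's unneeded column names aren't listed here): drop by name and reassign, or select just the columns you want to keep.

```python
import pandas as pd

# Hypothetical columns; "match_id" and "notes" stand in for unneeded ones
dist = pd.DataFrame({
    "match_id": [1, 2], "event": ["completed_pass", "clearance"],
    "start_x": [5.0, 6.0], "end_x": [40.0, 60.0], "notes": ["", ""],
})

# Option 1: drop by name and keep the returned frame (no inplace needed)
trimmed = dist.drop(columns=["match_id", "notes"])

# Option 2: select only what the plot needs
kept = dist[["event", "start_x", "end_x"]]

print(list(trimmed.columns))  # ['event', 'start_x', 'end_x']
print(len(dist.columns))      # 5, the original is unchanged
```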

**3.8** Check `dist` again to see if the unnecessary columns got dropped:
```
dist
```

**3.9** Use the `matplotlib` plotting library to test how to make a basic plot of an arrow on our pitch graphic, `pitch`, using the `arrow()` function and some test data:
```
fig, ax = plt.subplots()
ax.imshow(pitch, extent=[0, 105, 0, 68])
plt.arrow(x=52.5, y=34, dx=20, dy=10, width=0.5)
```
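The `extent=[left, right, bottom, top]` argument is what lines the image up with the pitch coordinates: the picture is stretched over 0 to 105 on x and 0 to 68 on y, so an arrow starting at `x=52.5, y=34` begins at the centre of the pitch. The sketch below repeats the test on a hypothetical blank image, uses the `Agg` backend so it also runs outside a notebook, and calls `ax.arrow()`, the `Axes`-method equivalent of `plt.arrow()`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for pitch.png: a plain green 340 x 525 RGB array
fake_pitch = np.zeros((340, 525, 3))
fake_pitch[..., 1] = 0.5

fig, ax = plt.subplots()
ax.imshow(fake_pitch, extent=[0, 105, 0, 68])  # map pixels onto pitch coordinates
ax.arrow(x=52.5, y=34, dx=20, dy=10, width=0.5)  # same test arrow as above

print(ax.images[0].get_extent())  # (0, 105, 0, 68)
```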

**3.10** Create 2 new columns in the `dist` data called `dx` and `dy`, calculated by subtracting the `start_x` column from the `end_x` column and the `start_y` column from the `end_y` column respectively. These are the horizontal and vertical distances each distribution travels, which `arrow()` takes as its `dx` and `dy` arguments:
```
dist["dx"] = dist["end_x"] - dist["start_x"]
dist["dy"] = dist["end_y"] - dist["start_y"]
```
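Column arithmetic in pandas is vectorised: the subtraction runs row by row across the whole column in one expression, with no loop. A sketch on two hypothetical distributions:

```python
import pandas as pd

# Hypothetical toy distributions in pitch coordinates
dist = pd.DataFrame({
    "start_x": [5.0, 6.0], "start_y": [34.0, 30.0],
    "end_x": [45.0, 30.0], "end_y": [50.0, 10.0],
})

dist["dx"] = dist["end_x"] - dist["start_x"]  # element-wise per row
dist["dy"] = dist["end_y"] - dist["start_y"]
print(dist[["dx", "dy"]].values.tolist())    # [[40.0, 16.0], [24.0, -20.0]]
```

A negative `dy` simply means the ball travelled towards the bottom edge of the plot.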

**3.11** Check `dist` again to make sure these new columns were made correctly:
```
dist
```

---
> ### 4. GENERATE THE GOALIE DISTRIBUTION VISUALISATIONS

**4.0** Use the `matplotlib` plotting library again, this time to plot each of the chosen goalie's distributions on the `pitch` graphic: a marker at the start location and an arrow to the end location. A `for` loop over `iterrows()` colours each distribution maroon if it is a `"completed_pass"` and dark turquoise otherwise, i.e. `"incomplete_pass"` or `"clearance"`.

TIP: Check out the range of official named colors you can use with matplotlib https://matplotlib.org/stable/gallery/color/named_colors.html#css-colors

```
fig, ax = plt.subplots(figsize=(12,8))
plt.axis( [0,105,0,68])
ax.imshow(pitch, extent=[0,105,0,68])

for index, row in dist.iterrows():
    if row["event"] == "completed_pass":
        plt.arrow(x=row["start_x"], y = row["start_y"], dx= row["dx"], dy=row["dy"], color="maroon", head_width=1, lw=2)
        plt.scatter(x= row["start_x"] , y= row["start_y"], s=200, c="maroon", edgecolors="white", lw=1.5)
    else:
        plt.arrow(x=row["start_x"], y = row["start_y"], dx= row["dx"], dy=row["dy"], color="darkturquoise", head_width=1, lw=2)
        plt.scatter(x= row["start_x"] , y= row["start_y"], s=200, c="darkturquoise", edgecolors="white", lw=1.5)
```
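Extra options (add to the end of the cell above):
- `plt.tight_layout()` trims the whitespace around the plot
- `plt.savefig("FIFAIntel_GoalieDist.png")` saves the figure as a PNG file in the current working directory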
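The loop above draws one arrow per row. A loop-free alternative, offered as an option rather than the notebook's method, splits the rows with a boolean mask and draws each colour group in one call with `ax.quiver()`; `angles="xy", scale_units="xy", scale=1` make each arrow span exactly `(dx, dy)` in pitch coordinates. It uses hypothetical toy rows on a blank pitch and finishes with both extra options. Note that `savefig()` needs to run in the same cell as the plot (and before any `plt.show()`), otherwise the saved image can come out blank.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy distributions; in the notebook use the real `dist` with dx/dy
dist = pd.DataFrame({
    "event": ["completed_pass", "incomplete_pass", "clearance"],
    "start_x": [5.0, 6.0, 4.0], "start_y": [34.0, 30.0, 40.0],
    "dx": [40.0, 24.0, 50.0], "dy": [16.0, -20.0, 10.0],
})

fig, ax = plt.subplots(figsize=(12, 8))
ax.set_xlim(0, 105)  # in the notebook, draw ax.imshow(pitch, extent=[0, 105, 0, 68]) first
ax.set_ylim(0, 68)

completed = dist["event"] == "completed_pass"
for subset, colour in [(dist[completed], "maroon"), (dist[~completed], "darkturquoise")]:
    # One quiver call per colour; scale=1 in "xy" units keeps arrows at true length
    ax.quiver(subset["start_x"], subset["start_y"], subset["dx"], subset["dy"],
              color=colour, angles="xy", scale_units="xy", scale=1, width=0.003)
    ax.scatter(subset["start_x"], subset["start_y"], s=200, c=colour,
               edgecolors="white", linewidths=1.5, zorder=3)

fig.tight_layout()  # trim surrounding whitespace
out_path = os.path.join(tempfile.mkdtemp(), "FIFAIntel_GoalieDist.png")
fig.savefig(out_path)  # in the notebook a bare filename is enough
```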

**4.1 OPTIONAL/EXTENSION** Create & run a function to make a visualisation of Ramsdale's distribution
```
def ramsdale_dist():
    import pandas as pd
    import matplotlib.pyplot as plt
    from PIL import Image
    
    df = pd.read_csv("data/match_data.csv")
    pitch = Image.open("data/pitch.png")
    goalie_df = df[df["player1"] == "ramsdale"].copy()
    dist = goalie_df[ goalie_df["event"].isin(["completed_pass", "incomplete_pass", "clearance"])].copy()
    dist["dx"] = dist["end_x"] - dist["start_x"]
    dist["dy"] = dist["end_y"] - dist["start_y"]
    
    fig, ax = plt.subplots(figsize=(12,8))
    plt.axis( [0,105,0,68])
    ax.imshow(pitch, extent=[0,105,0,68])
    
    for index, row in dist.iterrows():
        if row["event"] == "completed_pass":
            plt.arrow(x=row["start_x"], y = row["start_y"], dx= row["dx"], dy=row["dy"], color="maroon", head_width=1, lw=2)
            plt.scatter(x= row["start_x"] , y= row["start_y"], s=200, c="maroon", edgecolors="white", lw=1.5)
        else:
            plt.arrow(x=row["start_x"], y = row["start_y"], dx= row["dx"], dy=row["dy"], color="darkturquoise", head_width=1, lw=2)
            plt.scatter(x= row["start_x"] , y= row["start_y"], s=200, c="darkturquoise", edgecolors="white", lw=1.5)
    plt.tight_layout()
    plt.show()
```

Run the function to display the plot:
```
ramsdale_dist()
```
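`ramsdale_dist()` hard-codes the goalie's name. A hypothetical generalisation, `plot_goalie_dist()`, takes the dataframe and goalie as arguments and returns the figure so it can be saved afterwards; it combines the two filters with `&` and otherwise follows the same steps as the function above. A sketch on toy data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

DIST_EVENTS = ["completed_pass", "incomplete_pass", "clearance"]

def plot_goalie_dist(df, goalie, pitch=None):
    """Hypothetical helper: plot any goalie's distributions.

    df needs player1, event, start_x, start_y, end_x, end_y columns.
    pitch is an optional image (e.g. Image.open("data/pitch.png")) drawn underneath.
    """
    dist = df[(df["player1"] == goalie) & df["event"].isin(DIST_EVENTS)].copy()
    dist["dx"] = dist["end_x"] - dist["start_x"]
    dist["dy"] = dist["end_y"] - dist["start_y"]

    fig, ax = plt.subplots(figsize=(12, 8))
    if pitch is not None:
        ax.imshow(pitch, extent=[0, 105, 0, 68])
    ax.set_xlim(0, 105)
    ax.set_ylim(0, 68)
    for _, row in dist.iterrows():
        colour = "maroon" if row["event"] == "completed_pass" else "darkturquoise"
        ax.arrow(row["start_x"], row["start_y"], row["dx"], row["dy"],
                 color=colour, head_width=1, lw=2)
        ax.scatter(row["start_x"], row["start_y"], s=200, c=colour,
                   edgecolors="white", linewidths=1.5)
    ax.set_title(goalie)
    return fig, ax

# Hypothetical toy data so the sketch runs without the CSV
toy = pd.DataFrame({
    "player1": ["de_gea", "ramsdale", "ramsdale"],
    "event": ["completed_pass", "clearance", "completed_pass"],
    "start_x": [5.0, 4.0, 6.0], "start_y": [34.0, 30.0, 38.0],
    "end_x": [40.0, 60.0, 30.0], "end_y": [50.0, 20.0, 60.0],
})
fig, ax = plot_goalie_dist(toy, "ramsdale")
```

With the real data this would be called as, for example, `fig, ax = plot_goalie_dist(df, "de_gea", pitch)`, followed by `fig.savefig(...)` if you want a file.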

---

_Sports Python Educational Project content, licensed under Attribution-NonCommercial-ShareAlike 4.0 International_