## Workflow - Pass Completion Rate (Player)

---
> ### 1. SET UP DEVELOPMENT ENVIRONMENT

**1.0 Import required Python software into current development environment (i.e. this notebook)**
```
import pandas as pd
```

**1.1 Configure notebook for code autocompletion, inline plots, and displaying all columns and rows of pandas data objects**
```
%config Completer.use_jedi = False
%matplotlib inline
pd.options.display.max_columns = None
pd.options.display.max_rows = None
```

---
> ### 2. LOAD & PREP DATA

**2.0 Data Load** - read in the `match_events.csv` file located in the `data` directory (folder)
```
raw_data = pd.read_csv("data/match_events.csv")
```

**2.1 Data Prep** - make a copy of raw data to work on called `df`

```
df = raw_data.copy()
```

**2.2 Data Prep** - use the `head()` function to check the first 5 rows of the `df` object, which is a `pandas` dataframe (df): a 2-dimensional data structure with rows and columns
```
df.head()
```

**2.3 Data Prep** - check the dimensions of the `df` (<no. of rows>, <no. of columns>). Should be (1854, 18).
```
df.shape
```

---
> ### 3. EXPLORATORY DATA ANALYSIS (EDA)

**3.0 EDA** - select just the `player1` column from `df` and chain the `value_counts()` function onto this to generate a frequency table 
```
df["player1"].value_counts()
```

**3.1 EDA** - copy the code from `3.0` and edit it to also select the `event` column
```
df[["player1", "event"]].value_counts()
```
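To see what the two frequency tables in `3.0` and `3.1` look like, here is a minimal sketch on a small made-up dataset. The `toy` dataframe and its values are hypothetical stand-ins for `match_events.csv`, and calling `value_counts()` on a dataframe (rather than a single column) assumes pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical toy events, standing in for the real match data
toy = pd.DataFrame({
    "player1": ["A", "A", "A", "B", "B", "C"],
    "event": ["completed_pass", "completed_pass", "incomplete_pass",
              "completed_pass", "shot", "tackle"],
})

# One column: counts how many rows each player appears in
player_counts = toy["player1"].value_counts()

# Two columns: counts each unique (player1, event) combination
pair_counts = toy[["player1", "event"]].value_counts()

print(player_counts)
print(pair_counts)
```

Both results are sorted from the most frequent to the least frequent value by default.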

**3.2 EDA** - create a new variable (object) called `pass_filter` that can be used to filter the `df` for either `"completed_pass"` or `"incomplete_pass"` events
```
pass_filter = (df["event"] == "completed_pass") | (df["event"] == "incomplete_pass")
```
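The filter in `3.2` is a column of `True`/`False` values, one per row, built by joining two comparisons with `|` (element-wise "or"). As a sketch of an alternative, the `isin()` function should produce the same filter from a list of wanted values. The `toy` dataframe below is hypothetical and only stands in for the real data.

```python
import pandas as pd

# Hypothetical toy events, standing in for the real match data
toy = pd.DataFrame({
    "player1": ["A", "A", "B", "B", "C"],
    "event": ["completed_pass", "shot", "incomplete_pass",
              "completed_pass", "tackle"],
})

# Two comparisons joined with | (element-wise OR), as in 3.2
mask_or = (toy["event"] == "completed_pass") | (toy["event"] == "incomplete_pass")

# Same filter written with isin() and a list of the wanted values
mask_isin = toy["event"].isin(["completed_pass", "incomplete_pass"])

print(mask_or.equals(mask_isin))  # checks that both filters match row by row
print(toy[mask_isin])             # only the pass events remain
```

The `isin()` form tends to be easier to extend if more event types need to be included later.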

**3.3 EDA** - use the `pass_filter` to view the `df` filtered for `"completed_pass"` and `"incomplete_pass"` events, also chaining on the `head()` function 
```
df[pass_filter].head()
```

**3.4 EDA** - copy the code in `3.3`, remove the `head()` function, and chain a `groupby()` function in its place which specifies `player1` and `event` as the columns for grouping the data, also chaining a `size()` function after
```
df[pass_filter].groupby(["player1", "event"]).size()
```

**3.5 EDA** - copy the code from `3.4` and further chain an `unstack()` function to this
```
df[pass_filter].groupby(["player1", "event"]).size().unstack()
```
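The difference between `3.4` and `3.5` is easier to see on a tiny example. The sketch below uses a hypothetical `toy` dataframe: `size()` returns one long column of counts indexed by player and event, and `unstack()` pivots the `event` level into columns so each player gets a single row.

```python
import pandas as pd

# Hypothetical toy pass events, standing in for the filtered match data
toy = pd.DataFrame({
    "player1": ["A", "A", "A", "B", "B", "C"],
    "event": ["completed_pass", "completed_pass", "incomplete_pass",
              "completed_pass", "incomplete_pass", "completed_pass"],
})

# Long format: one count per (player1, event) pair
long_counts = toy.groupby(["player1", "event"]).size()

# Wide format: one row per player, one column per event type
wide_counts = long_counts.unstack()

print(long_counts)
print(wide_counts)
```

Note that player `C` has no incomplete passes in this toy data, so that cell is filled with `NaN` (missing) after `unstack()`.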

**3.6 EDA** - copy the code from `3.5` and further chain a `sort_values()` function to this, specifying the `completed_pass` as the column to use for sorting, and to sort in descending order, i.e. `ascending = False`
```
df[pass_filter].groupby(["player1", "event"]).size().unstack().sort_values("completed_pass", ascending=False)
```

---
> ### 4. DATA ANALYSIS & VISUALISATION

**4.0 ANALYSIS/VIZ** - copy the code from `3.6` and chain a `plot()` function onto this, specifying `kind` as `"bar"` and `stacked=True` so that the bars are stacked
```
df[pass_filter].groupby(["player1", "event"]).size().unstack().sort_values("completed_pass", ascending=False).plot(kind="bar", stacked=True)
```

**4.1 ANALYSIS** - copy the code from `4.0` but remove the `sort_values()` and `plot()` functions. After the `unstack()` function, chain an `assign()` function that creates a new column called `rate`, which calculates each player's pass completion rate as a percentage. Then chain a `sort_values()` function to sort by `rate` in descending order
```
df[pass_filter].groupby(["player1", "event"]).size().unstack().assign(rate=lambda x: x["completed_pass"] / (x["completed_pass"] + x["incomplete_pass"]) * 100).sort_values("rate", ascending=False)
```
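One thing to watch for: a player with no incomplete passes (or no completed passes) gets a missing value after `unstack()`, which makes their `rate` missing too. A possible way around this, sketched below on a hypothetical `toy` dataframe, is to pass `fill_value=0` to `unstack()` so that missing counts are treated as zero. This is an optional variation, not part of the workflow above.

```python
import pandas as pd

# Hypothetical toy pass events; player C has no incomplete passes
toy = pd.DataFrame({
    "player1": ["A", "A", "A", "B", "B", "C"],
    "event": ["completed_pass", "completed_pass", "incomplete_pass",
              "completed_pass", "incomplete_pass", "completed_pass"],
})

# fill_value=0 replaces missing player/event combinations with a count of 0
counts = toy.groupby(["player1", "event"]).size().unstack(fill_value=0)

# rate = completed passes / all pass attempts, expressed as a percentage
rates = counts.assign(
    rate=lambda x: x["completed_pass"] / (x["completed_pass"] + x["incomplete_pass"]) * 100
).sort_values("rate", ascending=False)

print(rates)
```

With the zero fill, player `C` ends up with a rate based on their single completed pass instead of a missing value.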

---

_Sports Python Educational Project content, licensed under Attribution-NonCommercial-ShareAlike 4.0 International_