## Workflow - Possession % (Team)

---
> ### 1. SET UP DEVELOPMENT ENVIRONMENT

**1.0 Import the required Python library (`pandas`) into the current development environment (i.e. this notebook)**
```
import pandas as pd
```

**1.1 Configure the notebook for code autocompletion + displaying plots inline + displaying all columns and rows of pandas data objects**
```
%config Completer.use_jedi = False
%matplotlib inline
pd.options.display.max_columns = None
pd.options.display.max_rows = None
```

---
> ### 2. LOAD & PREP DATA

**2.0 Data Load** - read in the `match_events.csv` file located in the `data` directory (folder)
```
raw_data = pd.read_csv("data/match_events.csv")
```

**2.1 Data Prep** - make a copy of raw data to work on called `df`

```
df = raw_data.copy()
```

**2.2 Data Prep** - use the `head()` method to check the first 5 rows of the `df` object, which is a `pandas` dataframe (df): a two-dimensional data structure with rows & columns
```
df.head()
```

**2.3 Data Prep** - check the dimensions of the `df`, returned as (<no. of rows>, <no. of columns>). The result should be (1854, 18).
```
df.shape
```

---
> ### 3. EXPLORATORY DATA ANALYSIS (EDA)

**3.0 EDA** - create a new variable (object) called `pass_filter` that can be used to filter the `df` for just the `"completed_pass"` events
```
pass_filter = (df["event"] == "completed_pass")
```
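
A minimal sketch of how the filter in `3.0` works, using a small made-up dataframe. The names `toy` and `toy_filter`, the team names and the extra event types are illustrative assumptions, not taken from `match_events.csv`.

```python
import pandas as pd

# Hypothetical toy events; the real match_events.csv has more rows and columns.
toy = pd.DataFrame({
    "player1_team": ["Home", "Away", "Home", "Away", "Home"],
    "event": ["completed_pass", "shot", "completed_pass", "completed_pass", "tackle"],
})

# Comparing a column with a value returns a boolean Series, one True/False per row.
toy_filter = toy["event"] == "completed_pass"
mask_values = toy_filter.tolist()

# Indexing the dataframe with that boolean Series keeps only the rows marked True.
toy_passes = toy[toy_filter]
n_passes = len(toy_passes)
```

The filter itself holds no data from the dataframe, only one True/False flag per row, which is why it can be reused in every later step.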

**3.1 EDA** - use the `pass_filter` to view the `df` filtered for just the `"completed_pass"` events, also chaining on the `head()` method
```
df[pass_filter].head()
```

**3.2 EDA** - copy the code from `3.1` and also specify selecting only the `player1_team` and `event` columns of this pass-filtered `df`
```
df[pass_filter][["player1_team", "event"]].head()
```

**3.3 EDA** - copy the code from `3.2` and chain on another method called `groupby()`, specifying `player1_team` as the column to group the data by, then chain `size()` after it to count the rows (completed passes) in each group
```
df[pass_filter][["player1_team", "event"]].groupby("player1_team").size()
```
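
A minimal sketch of what `groupby("player1_team").size()` returns, again on made-up data. The dataframe `toy`, the team names and the `possession_pct` calculation are assumptions added for illustration. The notebook itself leaves the percentage calculation to the pie chart.

```python
import pandas as pd

# Hypothetical toy events, not taken from match_events.csv.
toy = pd.DataFrame({
    "player1_team": ["Home", "Away", "Home", "Away", "Home"],
    "event": ["completed_pass", "shot", "completed_pass", "completed_pass", "tackle"],
})

toy_passes = toy[toy["event"] == "completed_pass"]

# size() counts the rows in each group; the result is a Series indexed by team name.
pass_counts = toy_passes.groupby("player1_team").size()

# groupby() sorts the group keys by default, so the teams come out alphabetically.
team_order = list(pass_counts.index)

# Each team's share of all completed passes, as a percentage.
possession_pct = pass_counts / pass_counts.sum() * 100
```

Here `pass_counts` holds 1 for `Away` and 2 for `Home`, so the shares are roughly 33% and 67%.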

---
> ### 4. DATA ANALYSIS & VISUALISATION

**4.0 VIZ** - copy the code from `3.3` and further chain a method called `plot()`, specifying the `kind` of plot as a `pie` and setting `autopct` to the format `%.0f%%` so that each wedge is labelled with its automatically calculated %, rounded to a whole number
```
df[pass_filter][["player1_team", "event"]].groupby("player1_team").size().plot(kind="pie", autopct="%.0f%%")
```
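
A quick sketch of what the `%.0f%%` format does. When `autopct` is a format string, matplotlib applies it to each wedge's percentage with Python's `%` operator. The value `pct` below is a made-up percentage chosen for illustration.

```python
# Hypothetical wedge percentage (two thirds of the pie).
pct = 200 / 3

# "%.0f" rounds to zero decimal places; "%%" prints a literal percent sign.
label = "%.0f%%" % pct

# Changing the precision to "%.1f" keeps one decimal place instead.
label_1dp = "%.1f%%" % pct
```

With this input, `label` is `"67%"` and `label_1dp` is `"66.7%"`.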

**4.1 VIZ** - copy the code from `4.0` and pass one more input to the `plot()` method: `colors`, a list object (`[]`) of 2 named colors, which are applied to the 2 teams in alphabetical order of team name | TIP: Check out the range of official named colors you can use with matplotlib: https://matplotlib.org/stable/gallery/color/named_colors.html#css-colors

```
df[pass_filter][["player1_team", "event"]].groupby("player1_team").size().plot(kind="pie", autopct="%.0f%%", colors=["red", "blue"])
```
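
Because the colors are matched to teams purely by position, swapping them by accident is easy. One possible safeguard, sketched below, is to map colors by team name and build the list in the same order as the grouped result. The `pass_counts` values, the team names and the `team_colors` dictionary are all hypothetical.

```python
import pandas as pd

# Hypothetical grouped result, shaped like the output of step 3.3.
pass_counts = pd.Series({"Away": 1, "Home": 2})

# Hypothetical mapping from team name to a matplotlib named color.
team_colors = {"Home": "red", "Away": "blue"}

# Build the list in index order so each wedge gets its own team's color.
colors = [team_colors[team] for team in pass_counts.index]
```

The resulting list (`["blue", "red"]` here) can then be passed as `colors=colors` in the `plot()` call.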

---

_Sports Python Educational Project content, licensed under Attribution-NonCommercial-ShareAlike 4.0 International_