## Data Preparation - EFF dataset

[Player Efficiency (EFF)](https://en.wikipedia.org/wiki/Efficiency_(basketball)) is a basketball statistic created by Martin Manley, similar to [PER](./2-per.ipynb).

It combines a player's main positive and negative contributions into a single number and normalizes the total by the number of games played.

The formula for EFF is defined in [<a href="#ref1">1</a>] as:

\begin{aligned}
& \text{EFF} = \frac{\text{points} + \text{rebounds} + \text{assists} + \text{steals} + \text{blocks} - \text{fgMissed} - \text{ftMissed} - \text{turnovers}}{\text{GP}}
\end{aligned}

The terms above follow the nomenclature defined by the original datasets.
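
As a quick sanity check, the formula can be sketched in plain Python. The function name `eff_per_game` and the season totals below are made up for illustration; they are not taken from the dataset or from `utils.metrics`.

```python
def eff_per_game(points, rebounds, assists, steals, blocks,
                 fg_missed, ft_missed, turnovers, games_played):
    """Per-game EFF: positive contributions minus negative ones, divided by GP."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    positive = points + rebounds + assists + steals + blocks
    negative = fg_missed + ft_missed + turnovers
    return (positive - negative) / games_played


# Hypothetical season totals, for illustration only
example = eff_per_game(points=400, rebounds=150, assists=80, steals=30, blocks=10,
                       fg_missed=200, ft_missed=20, turnovers=50, games_played=32)
print(example)  # (670 - 270) / 32 = 12.5
```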

In this notebook we use a slight variation of the formula, replacing the number of games played with the number of minutes played.
Dividing by games favors players who log more minutes per game than others with the same number of games, and dividing by minutes removes that bias.
This change also proved more effective in modeling: we registered an improvement of approximately 9% in the model's accuracy for year 9.
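
A minimal sketch of the per-minute variant with pandas is shown here. It is not the `getEFF` implementation from `utils.metrics`, which is not reproduced in this notebook. The column names (`fgAttempted`, `fgMade`, `ftAttempted`, `ftMade`, `minutes`, and so on) and the choice to map players with zero minutes to an EFF of 0 are assumptions.

```python
import pandas as pd


def eff_per_minute(stats: pd.DataFrame) -> pd.Series:
    """Per-minute EFF for each row of `stats` (column names are assumed)."""
    fg_missed = stats["fgAttempted"] - stats["fgMade"]
    ft_missed = stats["ftAttempted"] - stats["ftMade"]
    raw = (stats["points"] + stats["rebounds"] + stats["assists"]
           + stats["steals"] + stats["blocks"]
           - fg_missed - ft_missed - stats["turnovers"])
    # Replace zero minutes with NaN to avoid dividing by zero
    minutes = stats["minutes"].where(stats["minutes"] > 0)
    # Assumption: players who never played get an EFF of 0
    return (raw / minutes).fillna(0.0)


# Hypothetical stat lines, for illustration only
sample = pd.DataFrame({
    "points": [100, 0], "rebounds": [40, 0], "assists": [20, 0],
    "steals": [10, 0], "blocks": [5, 0], "turnovers": [20, 0],
    "fgAttempted": [90, 0], "fgMade": [40, 0],
    "ftAttempted": [30, 0], "ftMade": [25, 0],
    "minutes": [500, 0],
})
result = eff_per_minute(sample)
print(result)
```

The first row evaluates to (175 - 50 - 5 - 20) / 500 = 0.2, and the second row falls back to 0 because no minutes were played.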

The reason for using this metric is that it draws on different statistics than PER, so it could yield different results.
It can also help mitigate some of the issues attributed to PER.
Both metrics will also be tested together.

In [13]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import os

# Add the parent directory to the path so the utils package can be imported
import sys
sys.path.append('..')

from utils.files import *
DATA_PATH = os.path.join('..', 'data')

from utils.metrics import *


#### Calculating EFF

The following cell calculates the EFF for each player in the dataset.
We then aggregate each team's EFF by either sum or average, as explained in the [PER notebook](./2-per.ipynb).

In [14]:
# Players
players_teams_df = pd.read_csv(os.path.join(DATA_PATH, DATA_PLAYERS_TEAMS))
pt_df = preparePlayersTeamsDf(players_teams_df)

new_pt_df = pd.DataFrame()
for col in ['playerID', 'year', 'tmID']:
    new_pt_df[col] = pt_df[col]

teams_df = pd.read_csv(os.path.join(DATA_PATH, DATA_TEAMS))
teams_df = prepareTeamsDf(teams_df)

getEFF(new_pt_df, pt_df)

display(new_pt_df)

# Sum the EFF of each team's players from the previous year
merged_df = teams_df[['year', 'tmID', 'playoff', 'confID']].copy()
for index, row in merged_df.iterrows():
    merged_df.loc[index, 'EFF'] = new_pt_df[(new_pt_df['year'] == row['year'] - 1) & (new_pt_df['tmID'] == row['tmID'])]['EFF'].sum()

display(merged_df)


Unnamed: 0,playerID,year,tmID,EFF
0,abrossv01w,2,MIN,0.379433
1,abrossv01w,3,MIN,0.277019
2,abrossv01w,4,MIN,0.368177
3,abrossv01w,5,MIN,0.304348
4,abrossv01w,6,MIN,0.332046
...,...,...,...,...
1871,zakalok01w,3,PHO,-0.027027
1872,zarafr01w,6,SEA,0.243736
1873,zellosh01w,10,DET,0.429778
1874,zirkozu01w,4,WAS,0.333333


Unnamed: 0,year,tmID,playoff,confID,EFF
0,9,ATL,N,EA,0.000000
1,10,ATL,Y,EA,4.642656
2,1,CHA,N,EA,0.000000
3,2,CHA,Y,EA,3.983925
4,3,CHA,Y,EA,3.436282
...,...,...,...,...,...
137,6,WAS,N,EA,3.818451
138,7,WAS,Y,EA,3.629630
139,8,WAS,N,EA,4.787053
140,9,WAS,N,EA,4.997584
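
The row-by-row loop in the cell above could also be written as a group-by followed by a merge. The sketch below is one possible vectorized equivalent, not the code used in this notebook; the helper name `previous_year_team_eff` and the toy data are hypothetical. Teams with no players in the previous year get an EFF of 0, matching the sum over an empty selection in the loop.

```python
import pandas as pd


def previous_year_team_eff(teams: pd.DataFrame, players: pd.DataFrame) -> pd.DataFrame:
    """Attach to each (year, tmID) row the summed player EFF of year - 1."""
    team_eff = players.groupby(["year", "tmID"], as_index=False)["EFF"].sum()
    # Shift the totals forward so year y - 1 lines up with the team's row for year y
    team_eff["year"] += 1
    out = teams.merge(team_eff, on=["year", "tmID"], how="left")
    # No previous-year data: use 0, as the sum over an empty selection would
    out["EFF"] = out["EFF"].fillna(0.0)
    return out


# Toy data, for illustration only
players = pd.DataFrame({
    "playerID": ["p1", "p2", "p3"],
    "year": [1, 1, 2],
    "tmID": ["ATL", "ATL", "ATL"],
    "EFF": [0.3, 0.2, 0.4],
})
teams = pd.DataFrame({"year": [1, 2, 3], "tmID": ["ATL", "ATL", "ATL"]})
team_level = previous_year_team_eff(teams, players)
print(team_level)
```

Unlike the loop, the merge returns a frame with a fresh 0..n-1 index, which only matters if the original index carries meaning.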


In [15]:
# Encode the playoff flag as 1 (made the playoffs) or 0 (did not)
merged_df['playoff'] = merged_df['playoff'].eq('Y').mul(1)
# Keep confID (a string column) while dropping every other non-numeric column
temp = merged_df['confID'].copy()
merged_df = merged_df.select_dtypes(['number']) # Remove later
merged_df['confID'] = temp
merged_df.dropna(axis=0, inplace=True)

print(merged_df.shape)
merged_df.head()

(142, 4)


Unnamed: 0,year,playoff,EFF,confID
0,9,0,0.0,EA
1,10,1,4.642656,EA
2,1,0,0.0,EA
3,2,1,3.983925,EA
4,3,1,3.436282,EA


In [16]:
# Save the result to a new CSV file
merged_df.to_csv(os.path.join(DATA_PATH, DATA_MERGED), index=False)

### References

<a id="ref1"></a> [1] NBA Efficiency Calculator (EFF) – Captain Calculator. (n.d.). Captain Calculator. https://captaincalculator.com/sports/basketball/efficiency/