# Player Evaluation
This notebook evaluates overall player productivity, both offensive and defensive, following the approach in Dr. Stephen Shea's *Basketball Analytics*.

Player data was downloaded from https://www.basketball-reference.com


**Problem Statement**: Many player metrics emphasize either quantity or quality, or value offense over defense. We want a more complete measure of a player's offensive and defensive ability at the same time, one that captures both quality and quantity. This is not to say that the final number is a single statistic that tells us everything about a player.
### Assumptions
1. Taking more shots does not necessarily help your team. Taking and making more shots does
2. Quality is not independent of quantity
3. There is value in passing off a difficult shot to set up a teammate
4. A turnover is at least as bad as a missed field goal
5. Offensive rebounds extend offensive possessions, and their value in doing so should be weighted appropriately against activities that end possessions (such as a turnover)

## Formula
#### See Reference for acronyms
$AV = DPS + EOP$

where  
$DPS = 2 * DSG$  
$EOP = G * (0.76*Assists + Points) * OE$  
  
$OE = \frac{FG+A}{FGA-ORB+A+TO}$  
  
$DSG = -(0.82 * 0.735 * EFG\%[Net]) - (0.42 * ORB\%[Net]) + (1.06 * TO\%[Net])$
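The formulas above translate directly into small Python functions. This is a minimal sketch, not the book's own implementation: the function and parameter names are hypothetical, `Assists` and `Points` in EOP are assumed to be per-game averages (so that multiplying by G gives a season total), and the three net factors in DSG are taken as already-computed inputs because their calculation is not shown here.

```python
# Hypothetical helpers; the coefficients are copied from the formulas above.

def offensive_efficiency(fg, ast, fga, orb, tov):
    """OE = (FG + A) / (FGA - ORB + A + TO)."""
    return (fg + ast) / (fga - orb + ast + tov)


def efficient_offensive_production(games, ast_per_game, pts_per_game, oe):
    """EOP = G * (0.76 * Assists + Points) * OE, assuming per-game Assists and Points."""
    return games * (0.76 * ast_per_game + pts_per_game) * oe


def defensive_stops_gained(efg_net, orb_net, tov_net):
    """DSG from the three net defensive factors, passed in as precomputed values."""
    return -(0.82 * 0.735 * efg_net) - (0.42 * orb_net) + (1.06 * tov_net)


def approximate_value(dsg, eop):
    """AV = DPS + EOP, where DPS = 2 * DSG."""
    dps = 2 * dsg
    return dps + eop


# Made-up inputs, only to show how the pieces fit together
oe = offensive_efficiency(fg=8, ast=6, fga=17, orb=1, tov=3)
eop = efficient_offensive_production(games=1, ast_per_game=6, pts_per_game=22, oe=oe)
av = approximate_value(defensive_stops_gained(0.0, 0.0, 0.0), eop)
```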

## Reference
AV = Approximate Value  
DPS = Defensive Points Saved  
EOP = Efficient Offensive Production  
DSG = Defensive Stops Gained  
OE = Offensive Efficiency


In [3]:
# load libraries
import pandas as pd
import numpy as np

In [4]:
# load data
# 2017-18
playerPerGame17 = pd.read_csv('2017_18_PlayerPerGame.csv')
playerAdvanced17 = pd.read_csv('2017_18_PlayerAdvanced.csv')
playerTotal17 = pd.read_csv('2017_18_PlayerTotals.csv')
player36Mins17 = pd.read_csv('2017_18_PlayerPer36Minutes.csv')
player100Poss17 = pd.read_csv('2017_18_PlayerPer100Poss.csv')

# 2016-17
playerPerGame16 = pd.read_csv('2016_17_PlayerPerGame.csv')
playerAdvanced16 = pd.read_csv('2016_17_PlayerAdvanced.csv')
playerTotal16 = pd.read_csv('2016_17_PlayerTotals.csv')
player36Mins16 = pd.read_csv('2016_17_PlayerPer36Minutes.csv')
player100Poss16 = pd.read_csv('2016_17_PlayerPer100Poss.csv')

# 2015-16
playerPerGame15 = pd.read_csv('2015_16_PlayerPerGame.csv')
playerAdvanced15 = pd.read_csv('2015_16_PlayerAdvanced.csv')
playerTotal15 = pd.read_csv('2015_16_PlayerTotals.csv')
player36Mins15 = pd.read_csv('2015_16_PlayerPer36Minutes.csv')
player100Poss15 = pd.read_csv('2015_16_PlayerPer100Poss.csv')

# 2012-13 (totals only)
playerTotal12 = pd.read_csv('2012_13_PlayerTotals.csv')
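The loading cell repeats the same pattern for every season. A hypothetical refactor, assuming every file follows the `<start>_<yy>_<Table>.csv` naming used above, reads one season's tables into a dictionary keyed by table name (`season_label`, `load_season`, and `TABLES` are illustrative names, not part of the original notebook):

```python
import os

import pandas as pd

# Table names taken from the file names used in the cell above
TABLES = ["PlayerPerGame", "PlayerAdvanced", "PlayerTotals",
          "PlayerPer36Minutes", "PlayerPer100Poss"]


def season_label(start_year):
    """2015 -> '2015_16', matching the '<start>_<yy>' prefix of the CSV files."""
    return f"{start_year}_{(start_year + 1) % 100:02d}"


def load_season(start_year, data_dir="."):
    """Read every table for one season into a dict keyed by table name."""
    label = season_label(start_year)
    return {
        table: pd.read_csv(os.path.join(data_dir, f"{label}_{table}.csv"))
        for table in TABLES
    }


# Usage (assumes the CSV files sit in the working directory):
# seasons = {year: load_season(year) for year in (2015, 2016, 2017)}
# seasons[2015]["PlayerTotals"] then plays the role of playerTotal15
```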

### Data Preparation
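Before the actual analysis, we need to clean and prepare the data.  
There are three steps here:  
1. Remove duplicates
2. Remove players who played fewer than 20 games or fewer than 300 total minutes (15 minutes per game over the 20-game minimum) in a season
3. Replace missing values (NaN) with 0
  
The offensive metrics below are computed from the totals table; the per-game and advanced tables are kept for the additional columns needed later.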
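The three steps above can be collected into one function and applied to each season's totals table. A minimal sketch; `prepare_totals` is a hypothetical name, and it assumes the table has the `Player`, `G`, and `MP` columns used in the cells below.

```python
import pandas as pd


def prepare_totals(totals, min_games=20, min_minutes=15 * 20):
    """Deduplicate, filter, and clean one season's totals table.

    Mirrors the notebook steps: keep the first row per player, keep players with
    at least `min_games` games and `min_minutes` total minutes, then fill NaN with 0.
    """
    deduped = totals.drop_duplicates(["Player"], keep="first")
    mask = (deduped["G"] >= min_games) & (deduped["MP"] >= min_minutes)
    # Boolean indexing returns a new frame, and fillna (not in place) returns another
    return deduped[mask].fillna(0)


# Usage, e.g.: playerValue15 = prepare_totals(playerTotal15)
```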

In [5]:
# a nice thing about these data from basketball-reference.com is that
# they already include overall (aggregated) statistics for players
# who played for more than one team in that season; that combined row
# is listed first, so keeping the first occurrence keeps the season totals
playerPerGame15 = playerPerGame15.drop_duplicates(['Player'], keep="first")
playerPerGame16 = playerPerGame16.drop_duplicates(['Player'], keep="first")
playerPerGame17 = playerPerGame17.drop_duplicates(['Player'], keep="first")

playerAdvanced15 = playerAdvanced15.drop_duplicates(['Player'], keep="first")
playerAdvanced16 = playerAdvanced16.drop_duplicates(['Player'], keep="first")
playerAdvanced17 = playerAdvanced17.drop_duplicates(['Player'], keep="first")

playerTotal15 = playerTotal15.drop_duplicates(['Player'], keep="first")
playerTotal16 = playerTotal16.drop_duplicates(['Player'], keep="first")
playerTotal17 = playerTotal17.drop_duplicates(['Player'], keep="first")
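Keeping the first row is only correct if the aggregated row really is listed first. A quick check on the raw tables, sketched with a hypothetical helper `tot_rows_come_first` and assuming the team column is `Tm` and aggregated rows are labeled `TOT`:

```python
import pandas as pd


def tot_rows_come_first(raw):
    """True if, for every player listed more than once, the first row is the TOT row."""
    # keep=False marks every row of a repeated player, not just the later ones
    repeated = raw[raw.duplicated(["Player"], keep=False)]
    first_rows = repeated.drop_duplicates(["Player"], keep="first")
    return bool((first_rows["Tm"] == "TOT").all())


# Usage on a raw (not yet deduplicated) table, e.g.:
# tot_rows_come_first(pd.read_csv('2015_16_PlayerTotals.csv'))
```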

In [6]:
# keep only players who played at least 20 games and at least 300 total minutes (15 minutes x 20 games)
min_games = 20
min_minutes = 15 * min_games

In [7]:
playerValue15 = playerTotal15[(playerTotal15.G >= min_games) & (playerTotal15.MP >= min_minutes)].copy()
playerValue16 = playerTotal16[(playerTotal16.G >= min_games) & (playerTotal16.MP >= min_minutes)].copy()
playerValue17 = playerTotal17[(playerTotal17.G >= min_games) & (playerTotal17.MP >= min_minutes)].copy()
#playerTotal12_filtered = playerTotal12[(playerTotal12.G >= min_games) & (playerTotal12.MP >= min_minutes)].copy()
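The filter above requires 300 total minutes (`min_minutes = 15 * min_games`), which is not the same as averaging 15 minutes per game: a player with many short appearances can pass. If a per-game minimum is what's intended, dividing `MP` by `G` enforces it. A toy comparison with two hypothetical players:

```python
import pandas as pd

min_games = 20
min_minutes = 15 * min_games  # 300 total minutes

# Hypothetical players: B clears the total-minutes bar but averages only 6 minutes a game
players = pd.DataFrame({"Player": ["A", "B"], "G": [40, 60], "MP": [800, 360]})

# Total-minutes threshold, as in the cell above
total_filter = players[(players.G >= min_games) & (players.MP >= min_minutes)]

# Per-game threshold: at least 15 minutes per game on average
per_game_filter = players[(players.G >= min_games) & (players.MP / players.G >= 15)]
```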

In [8]:
# replace all NaN elements with 0
playerValue15.fillna(0, inplace = True)
playerValue16.fillna(0, inplace = True)
playerValue17.fillna(0, inplace = True)
#playerTotal12_filtered.fillna(0, inplace = True)

In [9]:
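# calculate a player's Offensive Efficiency (OE)
def getOE(player):
    return (player["FG"]+player["AST"]) / (player["FGA"]-player["ORB"]+player["AST"]+player["TOV"])

# calculate a player's Efficient Offensive Production (EOP)
# the inputs are season totals, which already span all G games,
# so the G factor in the EOP formula is not applied again here
def getRawEOP(player):
    return (0.76*player["AST"] + player["PTS"]) * player["OE"]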

In [10]:
playerValue15.columns

In [32]:
# add new column OE
playerValue15 = playerValue15.assign(OE = lambda x: getOE(x))
playerValue16 = playerValue16.assign(OE = lambda x: getOE(x))
playerValue17 = playerValue17.assign(OE = lambda x: getOE(x))

In [33]:
# add new column EOP
playerValue15 = playerValue15.assign(EOP = lambda x: getRawEOP(x))
playerValue16 = playerValue16.assign(EOP = lambda x: getRawEOP(x))
playerValue17 = playerValue17.assign(EOP = lambda x: getRawEOP(x))

In [34]:
playerValue15.sort_values(by="PTS", ascending = False).head(10)

| | Rk | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | ... | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | OE | EOP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 205 | 177 | James Harden\hardeja01 | SG | 26 | HOU | 82 | 82 | 3125 | 710 | 1617 | ... | 438 | 501 | 612 | 139 | 51 | 374 | 229 | 2376 | 0.520472 | 1478.724661 |
| 116 | 105 | Stephen Curry\curryst01 | PG | 27 | GSW | 79 | 79 | 2700 | 805 | 1598 | ... | 362 | 430 | 527 | 169 | 15 | 262 | 161 | 2375 | 0.574386 | 1594.218473 |
| 137 | 126 | Kevin Durant\duranke01 | SF | 27 | OKC | 72 | 72 | 2578 | 698 | 1381 | ... | 544 | 589 | 361 | 69 | 85 | 250 | 137 | 2029 | 0.543914 | 1252.829091 |
| 264 | 221 | LeBron James\jamesle01 | SF | 31 | CLE | 76 | 76 | 2709 | 737 | 1416 | ... | 454 | 565 | 514 | 104 | 49 | 249 | 143 | 1920 | 0.604932 | 1397.780774 |
| 325 | 266 | Damian Lillard\lillada01 | PG | 25 | POR | 75 | 75 | 2676 | 618 | 1474 | ... | 257 | 302 | 512 | 65 | 28 | 242 | 165 | 1879 | 0.517636 | 1174.0612 |
| 553 | 452 | Russell Westbrook\westbru01 | PG | 27 | OKC | 80 | 80 | 2750 | 656 | 1444 | ... | 481 | 626 | 834 | 163 | 20 | 342 | 200 | 1878 | 0.60202 | 1512.178424 |
| 175 | 153 | Paul George\georgpa01 | SF | 25 | IND | 81 | 81 | 2819 | 605 | 1449 | ... | 484 | 563 | 329 | 152 | 29 | 265 | 230 | 1874 | 0.47556 | 1010.108635 |
| 126 | 115 | DeMar DeRozan\derozde01 | SG | 26 | TOR | 78 | 78 | 2804 | 614 | 1377 | ... | 285 | 349 | 315 | 81 | 21 | 175 | 167 | 1830 | 0.515252 | 1066.263228 |
| 510 | 421 | Isaiah Thomas\thomais02 | PG | 26 | BOS | 82 | 79 | 2644 | 591 | 1382 | ... | 197 | 243 | 509 | 91 | 9 | 220 | 167 | 1823 | 0.532688 | 1177.154479 |
| 516 | 425 | Klay Thompson\thompkl01 | SG | 25 | GSW | 80 | 80 | 2666 | 651 | 1386 | ... | 271 | 306 | 166 | 60 | 49 | 138 | 152 | 1771 | 0.493656 | 936.543637 |
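As a sanity check, the OE and EOP values in this output can be reproduced by hand from the totals shown. Offensive rebounds are not among the displayed columns, so this sketch derives them as `TRB - DRB`; all other numbers are copied from the James Harden and Stephen Curry rows above.

```python
import pandas as pd

# 2015-16 totals copied from the output above; ORB is derived as TRB - DRB
rows = pd.DataFrame({
    "Player": ["James Harden", "Stephen Curry"],
    "FG": [710, 805], "FGA": [1617, 1598],
    "DRB": [438, 362], "TRB": [501, 430],
    "AST": [612, 527], "TOV": [374, 262], "PTS": [2376, 2375],
})
rows["ORB"] = rows["TRB"] - rows["DRB"]

# Same formulas as getOE and getRawEOP
rows["OE"] = (rows["FG"] + rows["AST"]) / (rows["FGA"] - rows["ORB"] + rows["AST"] + rows["TOV"])
rows["EOP"] = (0.76 * rows["AST"] + rows["PTS"]) * rows["OE"]

print(rows[["Player", "OE", "EOP"]].round(4))
```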


Because EOP measures only offensive production, point guards are generally rated more highly than big men, since they generate more offense than the typical big man.  
Big men usually contribute to team success through interior defense. Next, we will look at DSG (Defensive Stops Gained) to assess players on the defensive end as well.
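The position claim can be checked directly on the filtered data by comparing EOP across the `Pos` column. A sketch with a hypothetical helper `eop_by_position`; it uses the median so that a few stars don't dominate a position's summary, and `min_count` drops rare hybrid labels such as `PF-C`.

```python
import pandas as pd


def eop_by_position(players, min_count=1):
    """Median EOP and player count per listed position, highest median first."""
    summary = players.groupby("Pos")["EOP"].agg(["median", "count"])
    # Ignore positions with too few players to summarize meaningfully
    summary = summary[summary["count"] >= min_count]
    return summary.sort_values("median", ascending=False)


# Usage on the filtered data, e.g.: eop_by_position(playerValue15, min_count=5)
```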

In [35]:
playerAdvanced15.columns

Index(['Rk', 'Player', 'Pos', 'Age', 'Tm', 'G', 'MP', 'PER', 'TS%', '3PAr',
       'FTr', 'ORB%', 'DRB%', 'TRB%', 'AST%', 'STL%', 'BLK%', 'TOV%', 'USG%',
       'Unnamed: 19', 'OWS', 'DWS', 'WS', 'WS/48', 'Unnamed: 24', 'OBPM',
       'DBPM', 'BPM', 'VORP'],
      dtype='object')

In [36]:
playerPerGame15.columns

Index(['Rk', 'Player', 'Pos', 'Age', 'Tm', 'G', 'GS', 'MP', 'FG', 'FGA', 'FG%',
       '3P', '3PA', '3P%', '2P', '2PA', '2P%', 'eFG%', 'FT', 'FTA', 'FT%',
       'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PS/G'],
      dtype='object')

In [None]:
# merge the columns we need from the other tables (per-game, advanced) into playerValue*
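One way to fill in this cell is a left join on the `Player` column, which every table shares after deduplication. A sketch with a hypothetical helper `add_columns`; which columns DSG actually needs is not settled in this notebook, so the column list in the usage line is only an example.

```python
import pandas as pd


def add_columns(value_df, other_df, columns):
    """Left-join selected columns of `other_df` onto `value_df`, matching on Player.

    Both tables are assumed to be deduplicated on Player already, so the join is
    validated as one-to-one; only `columns` (plus the key) are brought over, which
    avoids suffixed duplicates such as G_x / G_y.
    """
    return value_df.merge(
        other_df[["Player"] + list(columns)],
        on="Player",
        how="left",
        validate="one_to_one",
    )


# Usage, e.g. pulling rebounding and turnover percentages from the advanced table:
# playerValue15 = add_columns(playerValue15, playerAdvanced15, ["ORB%", "DRB%", "TOV%"])
```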