<b>Using analysis based on a non-traditional statistic, make a case for a player who is generally underrated. </b>

This notebook outlines how I gathered and analyzed the data that led me to conclude that Clint Capela is underrated. All data was gathered from www.basketball-reference.com. You do not need to understand the code to follow along.

In [166]:
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# This data was acquired from www.basketball-reference.com
# It contains the top 500 players based on PER rating
df = pd.read_csv('nba_advanced_data.csv')

# only keep players who played more than 50 games
# (.copy() avoids a SettingWithCopyWarning when adding columns below)
df = df.loc[df['G'] > 50].copy()

# copy 'PER ?' into a cleanly named 'PER' column, then drop unnecessary columns
df['PER'] = df['PER ?']
df.reset_index(drop=True, inplace=True)
df.drop(['Rk', 'PER ?', 'Pos', 'Age', 'Tm', 'Unnamed: 19', 'Unnamed: 24'], axis=1, inplace=True)
df.head()

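| | Player | G | MP | TS% | 3PAr | FTr | ORB% | DRB% | TRB% | AST% | STL% | BLK% | TOV% | USG% | OWS | DWS | WS | WS/48 | OBPM | DBPM | BPM | VORP | PER |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | James Harden | 72 | 2551 | 0.619 | 0.498 | 0.502 | 1.8 | 15.2 | 8.6 | 45.1 | 2.4 | 1.7 | 15.1 | 36.1 | 11.6 | 3.8 | 15.4 | 0.289 | 9.6 | 1.3 | 10.9 | 8.3 | 29.8 |
| 1 | Anthony Davis | 75 | 2727 | 0.612 | 0.111 | 0.409 | 7.7 | 24.8 | 16.5 | 10.8 | 2.0 | 5.6 | 8.6 | 30.0 | 8.8 | 4.9 | 13.7 | 0.241 | 2.8 | 2.3 | 5.2 | 4.9 | 28.9 |
| 2 | LeBron James | 82 | 3026 | 0.621 | 0.257 | 0.336 | 3.7 | 22.3 | 13.1 | 44.4 | 1.9 | 2.0 | 16.1 | 31.6 | 11.0 | 3.0 | 14.0 | 0.221 | 7.6 | 2.0 | 9.6 | 8.9 | 28.6 |
| 3 | Stephen Curry | 51 | 1631 | 0.675 | 0.58 | 0.35 | 2.7 | 14.4 | 9.0 | 30.3 | 2.4 | 0.4 | 13.3 | 31.0 | 7.2 | 1.8 | 9.1 | 0.267 | 9.9 | -1.3 | 8.6 | 4.4 | 28.2 |
| 4 | Giannis Antetokounmpo | 75 | 2756 | 0.598 | 0.1 | 0.457 | 6.7 | 25.3 | 16.0 | 23.7 | 2.0 | 3.3 | 11.7 | 31.2 | 8.3 | 3.6 | 11.9 | 0.207 | 3.9 | 1.9 | 5.8 | 5.4 | 27.3 |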


The data includes advanced statistics for the top 500 players of the 2017-18 NBA season, sorted by PER. As you can see, there are some memorable names at the top!

In [167]:
df.shape

(309, 23)

309 players played more than 50 games and will be included in this analysis.

In [168]:
df['PER Rank'] = df['PER'].rank(ascending = False)
df['WS Rank'] = df['WS'].rank(ascending = False)
df['VORP Rank'] = df['VORP'].rank(ascending = False)
df['TS% Rank'] = df['TS%'].rank(ascending = False)

I've created four new columns, each containing a player's rank (1 = best) among the 309 players in this dataset on one of the four statistics I believe are most indicative of a player's value to a team:

<b>PER - Player efficiency rating</b> - an attempt to boil down a player's overall impact on a game to one number.  <br>
<b>WS - Win shares</b> - an attempt to divvy up credit for team success among the individual players on a team.<br>
<b>VORP - Value over replacement player</b> - "an estimate of each player's overall contribution to the team, measured vs. what a theoretical 'replacement player' would provide, where the 'replacement player' is defined as a player on minimum salary or not a normal member of a team's rotation." - Basketball-Reference<br>
<b>TS% - True shooting percentage</b> - "True shooting percentage is a measure of shooting efficiency that takes into account field goals, 3-point field goals, and free throws." - Basketball-Reference
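To make two of these definitions concrete, here is a minimal sketch of the commonly cited formulas: TS% divides points by twice the "true shooting attempts" (FGA + 0.44 * FTA), and VORP scales a player's BPM above a -2.0 replacement level by his share of team minutes and by team games / 82. The function names are hypothetical, the shooting totals are made up, and the team-minutes figure assumes 82 games of 48 minutes with no overtime, so the VORP result is only an approximation of the published value.

```python
def true_shooting_pct(pts: float, fga: float, fta: float) -> float:
    """TS% = PTS / (2 * (FGA + 0.44 * FTA))."""
    true_shooting_attempts = fga + 0.44 * fta
    return pts / (2 * true_shooting_attempts)


def approx_vorp(bpm: float, minutes_played: float, team_games: int = 82) -> float:
    """Approximate VORP = (BPM - (-2.0)) * (share of team minutes) * (team games / 82).

    Assumption: team minutes are estimated as team_games * 48, ignoring
    overtime, which slightly overstates the player's share of minutes.
    """
    replacement_bpm = -2.0
    team_minutes = team_games * 48
    minutes_share = minutes_played / team_minutes
    return (bpm - replacement_bpm) * minutes_share * (team_games / 82)


# Hypothetical shooting totals, for illustration only
print(round(true_shooting_pct(pts=1000, fga=700, fta=250), 3))

# James Harden's BPM (10.9) and minutes played (2551) from the table above
print(round(approx_vorp(bpm=10.9, minutes_played=2551), 1))
```

With Harden's inputs the sketch gives about 8.4 against the published 8.3; the small gap is most likely the overtime minutes that the 48-minute assumption ignores.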


In [169]:
salary = pd.read_csv('nba_salary.csv', header = 1)

I've chosen to use 2017-18 salary as a measure of value, although I acknowledge it is an imperfect one because of how NBA contracts are structured. In a truly open market, value could be measured in dollars, but first-round picks sign rookie-scale contracts whose pay is largely set by draft position, and maximum salaries are capped according to years of service. As a consequence, young superstars are often paid far below their value.

On the other hand, pundits and fans can assign value in the form of praise to young players such as Karl-Anthony Towns, who plays like a superstar but doesn't get paid like one. So, while we wouldn't call KAT "underrated" given the amount of praise he receives, his salary is far below what he deserves.

In [170]:
# merge statistics and salary data on the Player column
# (no index needed: recent pandas versions reject a merge key that is
#  both an index level and a column label as ambiguous)
df_new = df.merge(salary, on='Player', how='left')

# reformat salary data as integers; regex=False treats '$' and ',' as literal characters
df_new['2017-18'] = (df_new['2017-18']
                     .str.replace(',', '', regex=False)
                     .str.replace('$', '', regex=False)
                     .astype(int))

# rank salaries (1 = highest paid)
df_new['2017-18_salary_rank'] = df_new['2017-18'].rank(ascending=False)

# drop unnecessary columns
df_new.drop(['2018-19', '2019-20', '2020-21', '2021-22', '2022-23', 'Signed Using', 'Guaranteed'], axis=1, inplace=True)
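A caveat about the salary-cleaning step: whether `'$'` passed to `Series.str.replace` is treated as a literal character or as a regular-expression anchor has depended on the pandas version (the default for `regex` became `False` in pandas 2.0), so passing `regex=False` explicitly keeps the step unambiguous. The sketch below uses two tiny made-up frames (the player names and salaries are placeholders) and also passes `indicator=True` to `merge`, which adds a `_merge` column flagging players with no salary match. Any such rows would hold NaN and make `astype(int)` fail, so it is worth checking for them.

```python
import pandas as pd

# Made-up stand-ins for the statistics and salary tables
stats = pd.DataFrame({'Player': ['Player A', 'Player B', 'Player C'],
                      'PER': [24.5, 18.8, 15.0]})
salary = pd.DataFrame({'Player': ['Player A', 'Player B'],
                       '2017-18': ['$2,000,000', '$10,000,000']})

# Left merge on the Player column; indicator=True adds a '_merge' column
merged = stats.merge(salary, on='Player', how='left', indicator=True)

# Players with no salary row are marked 'left_only'
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Player']
print(unmatched.tolist())

# Keep matched rows only, then strip '$' and ',' as literal characters
matched = merged.loc[merged['_merge'] == 'both'].copy()
matched['2017-18'] = (matched['2017-18']
                      .str.replace('$', '', regex=False)
                      .str.replace(',', '', regex=False)
                      .astype(int))
print(matched[['Player', '2017-18']])
```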

In [171]:
# filter and retain only the necessary rows and columns
best = df_new.loc[(df_new['PER Rank'] < 50) & (df_new['WS Rank'] < 50) & (df_new['VORP Rank'] < 50) & (df_new['TS% Rank'] < 50)]
best = best[['Player','PER Rank','WS Rank','VORP Rank','TS% Rank','PER','VORP','TS%','WS','2017-18','2017-18_salary_rank']]

The cell above filters the dataframe to keep only players ranked better than 50th (a rank strictly below 50) on all four key metrics; the next cell sorts the result by salary from lowest to highest.
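The four chained comparisons in that filter can also be expressed by testing every rank column at once with `DataFrame.all(axis=1)`. This is a minimal sketch on a made-up frame; `top_n` is a hypothetical parameter name, and the strict `<` mirrors the notebook's `< 50`, so a player tied at exactly 50th is excluded.

```python
import pandas as pd

rank_cols = ['PER Rank', 'WS Rank', 'VORP Rank', 'TS% Rank']

# Made-up ranks for three players
ranks = pd.DataFrame({
    'Player': ['Player A', 'Player B', 'Player C'],
    'PER Rank': [13.0, 48.0, 60.0],
    'WS Rank': [12.0, 30.5, 5.0],
    'VORP Rank': [31.5, 46.5, 10.0],
    'TS% Rank': [7.0, 50.0, 2.0],
})

top_n = 50  # hypothetical cutoff parameter
# True only where every rank column is strictly below the cutoff
mask = (ranks[rank_cols] < top_n).all(axis=1)
selected = ranks.loc[mask]
print(selected['Player'].tolist())
```

Player B misses the cut only because of a TS% rank of exactly 50.0, which is where the strict `<` matters.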

In [172]:
best.sort_values('2017-18_salary_rank', ascending = False)

| | Player | PER Rank | WS Rank | VORP Rank | TS% Rank | PER | VORP | TS% | WS | 2017-18 | 2017-18_salary_rank |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | Clint Capela | 13.0 | 12.0 | 31.5 | 7.0 | 24.5 | 2.6 | 0.65 | 10.2 | 2334520 | 221.0 |
| 9 | Karl-Anthony Towns | 10.0 | 2.5 | 6.0 | 10.0 | 24.9 | 5.5 | 0.646 | 14.0 | 6216840 | 133.0 |
| 47 | Darren Collison | 48.0 | 30.5 | 46.5 | 42.0 | 18.8 | 2.1 | 0.61 | 7.6 | 10000000 | 97.5 |
| 8 | Kyrie Irving | 8.5 | 21.0 | 16.0 | 42.0 | 25.0 | 4.0 | 0.61 | 8.9 | 18868625 | 41.0 |
| 16 | Enes Kanter | 17.0 | 30.5 | 46.5 | 19.5 | 24.0 | 2.1 | 0.63 | 7.6 | 20566802 | 33.0 |
| 29 | Rudy Gobert | 29.0 | 25.5 | 28.5 | 4.0 | 20.7 | 2.9 | 0.657 | 8.1 | 21974719 | 30.0 |
| 30 | Steven Adams | 30.5 | 15.0 | 22.5 | 19.5 | 20.6 | 3.3 | 0.63 | 9.7 | 22471910 | 27.5 |
| 33 | DeAndre Jordan | 34.0 | 17.0 | 33.5 | 8.0 | 20.2 | 2.5 | 0.648 | 9.4 | 22642350 | 24.5 |
| 1 | Anthony Davis | 2.0 | 4.0 | 10.0 | 38.5 | 28.9 | 4.9 | 0.612 | 13.7 | 23775506 | 17.5 |
| 14 | Chris Paul | 14.5 | 12.0 | 15.0 | 49.0 | 24.4 | 4.3 | 0.604 | 10.2 | 24599495 | 14.0 |
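The fractional ranks in the table above (Capela's 31.5 VORP rank, Towns' 2.5 WS rank) come from how `Series.rank` breaks ties: its default `method='average'` gives tied players the mean of the positions they occupy, so Towns' 2.5 means he was tied with one other player for 2nd and 3rd. The values below are made up to show the effect; `method='min'` is one alternative that gives every tied player the best position they share.

```python
import pandas as pd

# Made-up win-share values with a two-way tie for second place
ws = pd.Series([14.0, 13.7, 13.7, 10.2], index=['A', 'B', 'C', 'D'])

# Default method='average': tied values share the mean of positions 2 and 3
print(ws.rank(ascending=False).tolist())

# method='min': tied values both take the better of the positions they occupy
print(ws.rank(ascending=False, method='min').tolist())
```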


Wow! <b>Clint Capela</b> tops the list as the most underrated player by this salary-based measure. He has only the 221st highest salary of the 309 players included. <b>His salary for the 2017-18 season was just $2,334,520, yet he ranked in the top 15 of the 309 qualifying players for PER, WS, and TS%, and in the top 40 for VORP.</b> In addition, the Houston Rockets finished the season with the best record in the league, and Capela's 10.2 win shares, equal to those of teammate Chris Paul, suggest he was a substantial part of that success.
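A rough way to put a number on this, using only the salary and win-share columns from the table, is to divide salary by win shares. The three players below are picked from the table for illustration, and the measure inherits the rookie-contract caveat discussed earlier.

```python
# 2017-18 salary and win shares, copied from the table above
players = {
    'Clint Capela': (2_334_520, 10.2),
    'Karl-Anthony Towns': (6_216_840, 14.0),
    'Chris Paul': (24_599_495, 10.2),
}

# Dollars paid per win share: lower means more production per dollar
for name, (salary, win_shares) in players.items():
    print(f"{name}: ${salary / win_shares:,.0f} per win share")
```

Because Capela and Chris Paul produced the same 10.2 win shares, the ratio of their cost per win share is simply the ratio of their salaries, roughly 10.5 to 1.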

I would argue that he is also underrated in terms of fan praise and support. He's definitely not a household name, although he did start to earn some recognition for his performance in the 2018 playoffs. Regardless, you only need to look at the other names on this list to see that he's doing some special work. Pay the man!