# What Makes an NBA Champion?

#### Daniel Abboudi and Sean Campi
##### Data Bootcamp, NYU Stern 4/25/2021

In [392]:
import pandas as pd
import numpy as np
from sklearn.cluster import AgglomerativeClustering as agglom
from scipy.spatial.distance import cdist as dist
from sklearn.manifold import TSNE
import plotly.express as px

Data is from [NBA.com](https://www.nba.com/stats/teams/)
<br>
Includes Advanced Stats, Traditional Stats (standardized over 100 possessions), and Opponent Stats (standardized over 100 possessions)
<br>
<br>
*Note: Unfortunately, NBA.com does not allow for scraping or downloading of their data. We copy and pasted the relevant data into three excel files and uploaded them to the project's github repository*

In [182]:
# Import the data sets
adv = pd.read_excel('https://github.com/danielabboudi/DB_Project/raw/main/NBA_Advanced.xlsx')
trad = pd.read_excel('https://github.com/danielabboudi/DB_Project/raw/main/NBA_Traditional_per100.xlsx')
opp = pd.read_excel('https://github.com/danielabboudi/DB_Project/raw/main/NBA_Opponent_per100.xlsx')

In [183]:
# Merge the data sets
merge1 = pd.merge(trad,adv,how='left',left_on=['Season','TEAM'],right_on=['Season','TEAM'])
df = pd.merge(merge1,opp,how='left',left_on=['Season','TEAM'],right_on=['Season','TEAM'])

In [321]:
# Convert percentages into decimals
df['FG%'] = df['FG%']/100
df['3P%'] = df['3P%']/100
df['FT%'] = df['FT%']/100
df['OREB%'] = round(df['OREB%']/100)
df['DREB%'] = round(df['DREB%']/100)
df['TOV%'] = round(df['TOV%']/100,4)
df['TS%'] = round(df['TS%']/100,4)

df['OPPFG%'] = df['OPPFG%']/100
df['OPP3P%'] = df['OPP3P%']/100
df['OPPFT%'] = df['OPPFT%']/100

KeyError: 'FG%'

Statistician Dean Oliver famously determined that there are ["four factors"](https://www.basketball-reference.com/about/factors.html) that contribute to winning NBA games.
   1. Shooting - eFG% or TS% (shooting efficiency, weighting 3-pointers higher than 2-pointers)
   2. Turnovers - TOV% (turnovers per total plays in a game)
   3. Rebounding - OREB% and DREB% (rebounds per available total rebounds in a game)
   4. Free Throws - FTR (free throw rate per field goals attempted)
<br>

We may want to expand some of these metrics to get a better comparison between teams. For example, two teams can have identical TS%, but with one team making more of their 3-point attempts and the other highly efficient on 2-pointers at the rim and making a lot of free throws. Statistically, those two teams might look similar based on TS%, but they are very different stylistically.

In [185]:
# Calculate new columns for additional metrics
df['2P%'] = round((df['FGM']-df['3PM'])/(df['FGA']-df['3PA']),4)                 # Expanding shooting
df['FTR'] = round(df['FTA']/df['FGA'],4)
df['3PFREQ'] = round(df['3PA']/df['FGA'],4)                                      # Expanding shooting
df['FGAFREQ'] = round(df['FGA']/(df['FGA']+df['TOV']+0.44*df['FTA']),4)          # Expanding shooting
df['STL%'] = round(df['STL']/df['OPPTOV'],4)                                     # Expanding turnovers

df['OPPTS%'] = round(df['OPPPTS']/(2*(df['OPPFGA']+0.44*df['OPPFTA'])),4)
df['OPP2P%'] = round((df['OPPFGM']-df['OPP3PM'])/(df['OPPFGA']-df['OPP3PA']),4)
df['OPPFTR'] = round(df['OPPFTA']/df['OPPFGA'],4)
df['OPP3PFREQ'] = round(df['OPP3PA']/df['OPPFGA'],4)
df['OPPFGAFREQ'] = round(df['OPPFGA']/(df['OPPFGA']+df['OPPTOV']+0.44*df['OPPFTA']),4)
df['OPPFTAFREQ'] = round((0.44*df['OPPFTA'])/(df['OPPFGA']+df['OPPTOV']+0.44*df['OPPFTA']),4)
df['OPPTOV%'] = round(df['OPPTOV']/(df['OPPFGA']+df['OPPTOV']+0.44*df['OPPFTA']),4)
df['OPPSTL%'] = round(df['OPPSTL']/df['TOV'],4)
df['OPPOREB%'] = 1-df['DREB%']
df['OPPDREB%'] = 1-df['OREB%']
df['OPPAST/TO'] = df['OPPAST']/df['OPPTOV']

In [186]:
# Select the columns we want based on the four factors
general = ['Season','TEAM','WIN%','OFFRTG','DEFRTG','NETRTG','PACE']
shooting = ['TS%','2P%','3P%','FGAFREQ','3PFREQ',]
turnovers = ['TOV%','AST/TO','STL%']
rebounding = ['OREB%','DREB%']
free_throws = ['FT%','FTR']

opp_shooting = ['OPPTS%','OPP2P%','OPP3P%','OPPFGAFREQ','OPP3PFREQ',]
opp_turnovers = ['OPPTOV%','OPPAST/TO','OPPSTL%']
opp_rebounding = ['OPPOREB%','OPPDREB%']
opp_free_throws = ['OPPFT%','OPPFTR']

df = df[general+shooting+turnovers+rebounding+free_throws+opp_shooting+opp_turnovers+opp_rebounding+opp_free_throws]

In [187]:
# Create a dictionary of NBA Champions
champions = {2001: 'Los Angeles Lakers',
             2002: 'Los Angeles Lakers',
             2003: 'San Antonio Spurs',
             2004: 'Detroit Pistons',
             2005: 'San Antonio Spurs',
             2006: 'Miami Heat',
             2007: 'San Antonio Spurs',
             2008: 'Boston Celtics',
             2009: 'Los Angeles Lakers',
             2010: 'Los Angeles Lakers',
             2011: 'Dallas Mavericks',
             2012: 'Miami Heat',
             2013: 'Miami Heat',
             2014: 'San Antonio Spurs',
             2015: 'Golden State Warriors',
             2016: 'Cleveland Cavaliers',
             2017: 'Golden State Warriors',
             2018: 'Golden State Warriors',
             2019: 'Toronto Raptors',
             2020: 'Los Angeles Lakers'}

In [188]:
# Pull Champions from our Dictionary
df['Champion'] = 0
for i in range(0,len(df),1):
    if df['Season'][i] == 2021:
        df['Champion'][i] = 0 
    elif df['TEAM'][i] == champions[df['Season'][i]]:
        df['Champion'][i] = 1
    else:
        df['Champion'][i] = 0



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



In [189]:
df[df['Champion']==1][['Season','TEAM']].head(5)

Unnamed: 0,Season,TEAM
1,2001,Los Angeles Lakers
30,2002,Los Angeles Lakers
59,2003,San Antonio Spurs
92,2004,Detroit Pistons
118,2005,San Antonio Spurs


In [341]:
# We don't want to over fit the model by inputting stats like OFFRTG that grade overall efficiency
# We would like the model to be able to group teams based on efficiency without the main metrics
# We will group the teams by various methods and plot them together
df2 = df.drop(['WIN%','OFFRTG','DEFRTG','NETRTG','TS%','OPPTS%','Champion','PACE'],axis=1).set_index(['Season','TEAM'])

In [342]:
%%time
tsne = pd.DataFrame(TSNE().fit_transform(
    df2),index=df2.index).reset_index()

Wall time: 5.96 s


In [343]:
tsne = tsne.merge(df,on=['Season','TEAM'],how='inner')

In [344]:
clusters = pd.DataFrame(agglom(n_clusters=5).fit_predict(df2),index=df2.index).reset_index()
clusters = clusters.rename(columns={0:'Clusters'})
tsne = tsne.merge(clusters,on=['Season','TEAM'],how='inner')

In [389]:
px.scatter(tsne.dropna(),x=0,y=1,
          hover_data=['TEAM','Season','WIN%','OFFRTG','DEFRTG','TS%','OREB%','TOV%','FTR','Champion','3PFREQ','PACE'],
           color='Clusters',size='WIN%')

In [393]:
dist = pd.DataFrame(dist(df2,df2),index=df2.index,columns=df2.index)

In [411]:
%%time
tsne1 = pd.DataFrame(TSNE().fit_transform(
    dist),index=dist.index).reset_index()

Wall time: 7.55 s


In [412]:
tsne1 = tsne1.merge(df,on=['Season','TEAM'],how='inner').merge(clusters,on=['Season','TEAM'],how='inner')

In [413]:
px.scatter(tsne1.dropna(),x=0,y=1,
          hover_data=['TEAM','Season','WIN%','OFFRTG','DEFRTG','TS%','OREB%','TOV%','FTR','Champion','3PFREQ','PACE'],
           color='Clusters',size='WIN%')

In [375]:
dist = dist.replace(0,np.inf)
dist

Unnamed: 0_level_0,Season,2001,2001,2001,2001,2001,2001,2001,2001,2001,2001,...,2021,2021,2021,2021,2021,2021,2021,2021,2021,2021
Unnamed: 0_level_1,TEAM,San Antonio Spurs,Los Angeles Lakers,Philadelphia 76ers,Sacramento Kings,Dallas Mavericks,Utah Jazz,Milwaukee Bucks,Phoenix Suns,Miami Heat,Portland Trail Blazers,...,New Orleans Pelicans,Chicago Bulls,Toronto Raptors,Sacramento Kings,Cleveland Cavaliers,Oklahoma City Thunder,Orlando Magic,Detroit Pistons,Minnesota Timberwolves,Houston Rockets
Season,TEAM,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2
2001,San Antonio Spurs,inf,0.156640,0.280484,0.181613,0.186506,0.350847,0.200032,0.358899,0.388176,0.154546,...,0.681884,0.602433,0.540679,0.691364,0.396302,0.647948,0.685815,0.439473,0.530330,0.491438
2001,Los Angeles Lakers,0.156640,inf,0.340971,0.253449,0.242531,0.397632,0.177515,0.424953,0.477196,0.163710,...,0.591573,0.528966,0.495231,0.608159,0.342856,0.594331,0.611387,0.375030,0.455191,0.439701
2001,Philadelphia 76ers,0.280484,0.340971,inf,0.250222,0.289530,0.424336,0.385702,0.350927,0.353664,0.271919,...,0.840000,0.756545,0.710963,0.891044,0.498208,0.696948,0.830827,0.584013,0.695025,0.605502
2001,Sacramento Kings,0.181613,0.253449,0.250222,inf,0.165452,0.305533,0.213247,0.247325,0.286321,0.132232,...,0.762804,0.664638,0.559209,0.762957,0.432968,0.707533,0.755531,0.494575,0.574485,0.528873
2001,Dallas Mavericks,0.186506,0.242531,0.289530,0.165452,inf,0.296735,0.165069,0.280025,0.304304,0.154387,...,0.739849,0.626453,0.521688,0.733259,0.411105,0.675737,0.724876,0.460050,0.551946,0.491147
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021,Oklahoma City Thunder,0.647948,0.594331,0.696948,0.707533,0.675737,0.916579,0.633989,0.906172,0.902881,0.670560,...,0.417924,0.358618,0.575475,0.540505,0.340148,inf,0.344753,0.344749,0.411437,0.316508
2021,Orlando Magic,0.685815,0.611387,0.830827,0.755531,0.724876,0.923093,0.619225,0.951314,0.977594,0.699537,...,0.168428,0.166620,0.450786,0.245846,0.379177,0.344753,inf,0.307819,0.256789,0.350304
2021,Detroit Pistons,0.439473,0.375030,0.584013,0.494575,0.460050,0.655493,0.368408,0.680322,0.695778,0.443369,...,0.323126,0.213059,0.263798,0.357362,0.125926,0.344749,0.307819,inf,0.143300,0.132213
2021,Minnesota Timberwolves,0.530330,0.455191,0.695025,0.574485,0.551946,0.722769,0.432153,0.756518,0.784802,0.523394,...,0.259834,0.171583,0.231542,0.250632,0.237124,0.411437,0.256789,0.143300,inf,0.213940


In [362]:
dist[(2021,'Phoenix Suns')].sort_values()

Season  TEAM                 
2021    Phoenix Suns             0.000000
2019    Boston Celtics           0.104729
2021    Memphis Grizzlies        0.164402
        Indiana Pacers           0.165348
2019    Golden State Warriors    0.180177
                                   ...   
2004    Utah Jazz                1.076120
        Washington Wizards       1.097759
2003    Denver Nuggets           1.110541
2007    Orlando Magic            1.125038
2006    New York Knicks          1.172636
Name: (2021, Phoenix Suns), Length: 626, dtype: float64