# Does defense actually win championships?

In this analysis, we attempt to answer this question using the playoff defensive statistics of the NBA finalists from the 1996-97 season up to the 2020-21 season. These seasons span different eras, from the end of the Chicago Bulls' dynasty through the Lakers' dynasty to the more turbulent modern era of basketball.

Spanning these eras helps identify consistent patterns in NBA championships that hold even as the style of play has changed.

The relevance of this analysis is that it provides evidence for coaches deciding whether their next draft pick or trade should target a more defensive player to bolster their defense. It is also relevant to players, as it illustrates how important defense is to their championship aspirations.

<!-- Following analyses will also address attacking and the combination of attacking and defense. -->

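```python
import sqlalchemy
import pandas as pd
from os import environ
from time import localtime

# Connect to the "nba" database on a local MariaDB server; the credentials
# are read from the USER and PSWD environment variables
engine = sqlalchemy.create_engine(
    "mariadb+mariadbconnector://" + environ.get("USER") + ":"
    + environ.get("PSWD") + "@127.0.0.1:3306/nba"
)
```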

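Building the connection string by concatenation breaks if the password contains characters such as `@` or `:`. Below is a sketch of an alternative, assuming SQLAlchemy 1.4 or later, where `URL.create` stores the credentials as separate fields and escapes them when the URL is rendered. The helper `build_nba_url` and the sample credentials are hypothetical.

```python
from sqlalchemy.engine.url import URL, make_url


def build_nba_url(user: str, password: str) -> URL:
    """Hypothetical helper: connection URL for the local "nba" database."""
    # Credentials are kept as separate fields, so special characters in the
    # password cannot be confused with the URL's own separators
    return URL.create(
        "mariadb+mariadbconnector",
        username=user,
        password=password,
        host="127.0.0.1",
        port=3306,
        database="nba",
    )


# Made-up credentials containing "@" and ":"
url = build_nba_url("analyst", "p@ss:word")

# Rendering escapes the password, so parsing the string back recovers it intact
round_trip = make_url(url.render_as_string(hide_password=False))
print(round_trip.password)
```

In the notebook this would be used as `sqlalchemy.create_engine(build_nba_url(environ["USER"], environ["PSWD"]))`, which also fails loudly with a `KeyError` if either variable is unset.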


### The first step is to collect the average team stats recorded in the playoffs from the database.

Since we are interested in how a team's defensive statistics relate to winning the championship, we only collect the defensive statistics, the wins and the team names. Within each season, the teams are then ordered by their number of wins.

This includes:
- The playoff season ID, which follows the format "004YY" where YY is the year the season starts (SEASON_ID)
- The names of the participating teams (TEAM)
- Their wins (W)
- Their average rebounds (REB)
- Their average steals (STL)
- Their average blocks (BLK)

```python
# Playoff averages for every team, joined to the team names
fields = "SEASON_ID, Teams.Name AS TEAM, W, REB, STL, BLK "

join = "Team_standings INNER JOIN Teams ON Team_standings.TEAM_ID = Teams.ID "

# Playoff season IDs start with "004"
condition = "WHERE SEASON_ID LIKE '004%' "

select = "SELECT " + fields + "FROM " + join + condition + "ORDER BY SEASON_ID ASC, W DESC"

df = pd.read_sql(select, engine)
```
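To see what the query returns without the MariaDB server, the sketch below runs the same SQL against an in-memory SQLite database. The two-table schema is an assumption inferred from the query (the real `nba` schema is not shown). The two playoff rows are the 1996-97 finalists' averages from the output further down; the third row is a dummy with a non-playoff ID and placeholder values, included only to show the `LIKE '004%'` filter at work.

```python
import sqlite3

# In-memory stand-in for the nba database; the schema is assumed
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Teams (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Team_standings (SEASON_ID TEXT, TEAM_ID INTEGER,
                             W INTEGER, REB REAL, STL REAL, BLK REAL);
""")
conn.executemany("INSERT INTO Teams VALUES (?, ?)", [(1, "CHI"), (2, "UTA")])
conn.executemany(
    "INSERT INTO Team_standings VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("00496", 2, 13, 41.8, 7.4, 4.9),  # UTA, 1996-97 playoffs
        ("00496", 1, 15, 43.5, 8.5, 4.8),  # CHI, 1996-97 playoffs
        ("00196", 1, 0, 0.0, 0.0, 0.0),    # dummy non-playoff row
    ],
)

query = (
    "SELECT SEASON_ID, Teams.Name AS TEAM, W, REB, STL, BLK "
    "FROM Team_standings INNER JOIN Teams ON Team_standings.TEAM_ID = Teams.ID "
    "WHERE SEASON_ID LIKE '004%' "
    "ORDER BY SEASON_ID ASC, W DESC"
)
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The dummy row is filtered out, and within the season the team with more wins comes first.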

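```python
def build_year(year):
    """Convert a two-digit year into a four-digit year.

    Years that would otherwise land in the future are placed in the 1900s.
    A value of 100 (from the 1999-00 season, "99" + 1) becomes 2000.
    """
    if 2000 + year > localtime().tm_year:
        return 1900 + year
    return 2000 + year
```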
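`build_year` depends on the clock through `localtime()`. A hypothetical variant, `finals_year`, applies the same century rule to a full season ID but takes the reference year as a parameter, which makes the conversion reproducible and easy to check:

```python
def finals_year(season_id: str, reference_year: int) -> int:
    """Hypothetical helper: year of the Finals for a "004YY" playoff season ID.

    Same rule as build_year, with the current year passed in explicitly.
    """
    yy = int(season_id[-2:]) + 1  # the season starts in YY, the Finals are a year later
    if 2000 + yy > reference_year:
        return 1900 + yy
    return 2000 + yy


print(finals_year("00496", 2021))  # 1997
print(finals_year("00499", 2021))  # 2000
print(finals_year("00420", 2021))  # 2021
```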

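## Since this analysis is about championships, we only consider the two teams that reached the NBA Finals in each season from 1996-97 to 2020-21.

Because each series win adds a fixed number of wins, sorting by playoff wins puts the champion first and the runner-up second in every season, so the finalists are the first two rows of each season.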
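A quick arithmetic check of why the top two teams by playoff wins are the finalists. Winning a best-of-n series takes n // 2 + 1 wins, and a series loser wins at most one game fewer. The sketch assumes the NBA's four-round format, with a best-of-five first round through 2002 and best-of-seven rounds otherwise; the function names are hypothetical.

```python
def series_wins(best_of: int) -> int:
    """Games needed to win a best-of-n series."""
    return best_of // 2 + 1


def finalists_are_top_two(round_formats: list) -> bool:
    """Check that sorting by playoff wins always puts the champion first and
    the runner-up second, given the best-of length of each of the four rounds."""
    clinch = [series_wins(n) for n in round_formats]  # wins from each series won
    most_in_loss = [w - 1 for w in clinch]            # most wins in a lost series

    champion_min = sum(clinch)                         # won all four rounds
    runner_up_min = sum(clinch[:3])                    # swept in the Finals
    runner_up_max = sum(clinch[:3]) + most_in_loss[3]  # lost the Finals in the last game
    # Best case for any other team: win k rounds (k = 0, 1, 2), then take the
    # next series as far as possible before losing it
    others_max = max(sum(clinch[:k]) + most_in_loss[k] for k in range(3))

    return champion_min > runner_up_max and runner_up_min > others_max


print(finalists_are_top_two([5, 7, 7, 7]))  # True (1997 to 2002)
print(finalists_are_top_two([7, 7, 7, 7]))  # True (2003 onwards)
```

For example, with a best-of-five first round the runner-up has at least 3 + 4 + 4 = 11 wins, while a conference finals loser has at most 3 + 4 + 3 = 10.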

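```python
def segment_data(df):
    """Keep the two NBA finalists of each playoff season.

    Rows are sorted by wins within each season, so the first row is the
    champion (POSITION 1) and the second is the runner-up (POSITION 2).
    """
    finals = []
    for s in df["SEASON_ID"].unique():
        # .copy() avoids pandas' SettingWithCopyWarning when adding columns
        d = df.loc[df["SEASON_ID"] == s].head(2).copy()
        d["POSITION"] = list(range(1, len(d) + 1))
        # The Finals are played in the year after the season starts
        d["YEAR"] = build_year(int(s[-2:]) + 1)
        finals.append(d)

    # Runner-up first, then champion, so that a grouped diff() later gives
    # champion minus runner-up on the champion's row
    return pd.concat(finals).sort_values(["YEAR", "POSITION"], ascending=[True, False])
```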

```python
df = segment_data(df)
df
```

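| | SEASON_ID | TEAM | W | REB | STL | BLK | POSITION | YEAR |
|---|---|---|---|---|---|---|---|---|
| 337 | 496 | UTA | 13 | 41.8 | 7.4 | 4.9 | 2 | 1997 |
| 336 | 496 | CHI | 15 | 43.5 | 8.5 | 4.8 | 1 | 1997 |
| 353 | 497 | UTA | 13 | 39.8 | 6.5 | 4.9 | 2 | 1998 |
| 352 | 497 | CHI | 15 | 41.0 | 9.0 | 4.5 | 1 | 1998 |
| 369 | 498 | NYK | 12 | 39.3 | 7.6 | 3.9 | 2 | 1999 |
| 368 | 498 | SAS | 15 | 40.2 | 7.3 | 5.8 | 1 | 1999 |
| 385 | 499 | IND | 13 | 40.5 | 5.4 | 4.7 | 2 | 2000 |
| 384 | 499 | LAL | 15 | 43.3 | 6.6 | 6.1 | 1 | 2000 |
| 1 | 400 | PHI | 12 | 43.9 | 8.0 | 4.9 | 2 | 2001 |
| 0 | 400 | LAL | 15 | 48.2 | 9.2 | 5.8 | 1 | 2001 |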


```python
# Champions only
df.loc[df["POSITION"] == 1]
```

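| | SEASON_ID | TEAM | W | REB | STL | BLK | POSITION | YEAR |
|---|---|---|---|---|---|---|---|---|
| 336 | 496 | CHI | 15 | 43.5 | 8.5 | 4.8 | 1 | 1997 |
| 352 | 497 | CHI | 15 | 41.0 | 9.0 | 4.5 | 1 | 1998 |
| 368 | 498 | SAS | 15 | 40.2 | 7.3 | 5.8 | 1 | 1999 |
| 384 | 499 | LAL | 15 | 43.3 | 6.6 | 6.1 | 1 | 2000 |
| 0 | 400 | LAL | 15 | 48.2 | 9.2 | 5.8 | 1 | 2001 |
| 16 | 401 | LAL | 15 | 45.4 | 6.8 | 5.8 | 1 | 2002 |
| 32 | 402 | SAS | 16 | 45.5 | 7.8 | 6.9 | 1 | 2003 |
| 48 | 403 | DET | 16 | 44.8 | 8.0 | 7.1 | 1 | 2004 |
| 64 | 404 | SAS | 16 | 42.1 | 5.6 | 5.8 | 1 | 2005 |
| 80 | 405 | MIA | 16 | 42.0 | 7.0 | 4.6 | 1 | 2006 |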


## For each Finals, we compare the champion with the runner-up and determine the percentage of champions that had:

- None of the defensive stats higher than the runner-up
- One defensive stat higher than the runner-up
- Two defensive stats higher than the runner-up
- All three defensive stats higher than the runner-up

#### Champions with none or only one of the defensive stats higher had the weaker defensive performance of the two finalists, while those with two or all three higher had the stronger, more complete defense.

## This will show whether a more complete defense makes a Finals win more likely.

## We calculate the differences in the defensive stats between each year's NBA finalists. A positive difference means the champion had a higher value than the runner-up; a negative difference means the champion had a lower value.
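A minimal pandas sketch of the sign convention, using the 1996-97 finalists' averages from the first table. With the runner-up row first, `diff()` within the season leaves NaN on the runner-up's row and champion minus runner-up on the champion's row.

```python
import pandas as pd

# 1996-97 finalists, runner-up first, as segment_data orders them
finals_1997 = pd.DataFrame({
    "SEASON_ID": ["00496", "00496"],
    "TEAM": ["UTA", "CHI"],
    "REB": [41.8, 43.5],
    "STL": [7.4, 8.5],
    "BLK": [4.9, 4.8],
})

# Difference each stat with the previous row of the same season
diff = finals_1997.groupby("SEASON_ID")[["REB", "STL", "BLK"]].diff()
print(diff.round(1))
```

The champion's row reads 1.7, 1.1 and -0.1, matching the first row of the differences table further down. Sorting the champion first instead would flip every sign.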

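```python
# Within each season, subtract the runner-up's row from the champion's row.
# Selecting the numeric columns explicitly avoids relying on pandas silently
# dropping the string TEAM column, which recent versions no longer do.
numeric = ["W", "REB", "STL", "BLK", "POSITION", "YEAR"]
diffs = df.groupby("SEASON_ID")[numeric].diff()
```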

```python
diffs.head()
```

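| | W | REB | STL | BLK | POSITION | YEAR |
|---|---|---|---|---|---|---|
| 337 | NaN | NaN | NaN | NaN | NaN | NaN |
| 336 | 2.0 | 1.7 | 1.1 | -0.1 | -1.0 | 0.0 |
| 353 | NaN | NaN | NaN | NaN | NaN | NaN |
| 352 | 2.0 | 1.2 | 2.5 | -0.4 | -1.0 | 0.0 |
| 369 | NaN | NaN | NaN | NaN | NaN | NaN |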


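```python
# Label each row with its season and Finals year, and name the champion
# (rows align on the shared index)
diffs["SEASON_ID"] = df["SEASON_ID"]
diffs["YEAR"] = df["YEAR"]
diffs["TEAM"] = df.loc[df["POSITION"] == 1, "TEAM"]
```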

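```python
# Runner-up rows have no differences (NaN) and are dropped,
# leaving one row per Finals, labelled with the champion
diffs = diffs.dropna()
diffs = diffs[["REB", "STL", "BLK", "SEASON_ID", "TEAM", "YEAR"]]
diffs
```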

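| | REB | STL | BLK | SEASON_ID | TEAM | YEAR |
|---|---|---|---|---|---|---|
| 336 | 1.7 | 1.1 | -0.1 | 496 | CHI | 1997 |
| 352 | 1.2 | 2.5 | -0.4 | 497 | CHI | 1998 |
| 368 | 0.9 | -0.3 | 1.9 | 498 | SAS | 1999 |
| 384 | 2.8 | 1.2 | 1.4 | 499 | LAL | 2000 |
| 0 | 4.3 | 1.2 | 0.9 | 400 | LAL | 2001 |
| 16 | 2.7 | -1.1 | 0.3 | 401 | LAL | 2002 |
| 32 | 0.4 | 0.3 | 2.6 | 402 | SAS | 2003 |
| 48 | 4.6 | 0.7 | 2.9 | 403 | DET | 2004 |
| 64 | -0.2 | -1.7 | -0.3 | 404 | SAS | 2005 |
| 80 | -0.7 | 0.4 | 0.3 | 405 | MIA | 2006 |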


## With the differences calculated, we can count the number of defensive stats in which each champion was higher than, or level with, the runner-up.

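```python
def count_higher(df, stats, number_higher):
    """Count the Finals in which the champion was higher in exactly
    `number_higher` of the given stats.

    A difference of 0 (a tie) counts as higher.
    """
    # Number of non-negative differences on each row
    higher = (df[stats] >= 0).sum(axis=1)
    return int((higher == number_higher).sum())
```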
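To make the counting rule concrete, here is a plain-Python sketch applied only to the ten Finals shown in the differences table above (1997 to 2006). It is an illustration of the rule, not the full 25-Finals result.

```python
# REB, STL and BLK differences (champion minus runner-up) from the table above
shown = {
    1997: (1.7, 1.1, -0.1),
    1998: (1.2, 2.5, -0.4),
    1999: (0.9, -0.3, 1.9),
    2000: (2.8, 1.2, 1.4),
    2001: (4.3, 1.2, 0.9),
    2002: (2.7, -1.1, 0.3),
    2003: (0.4, 0.3, 2.6),
    2004: (4.6, 0.7, 2.9),
    2005: (-0.2, -1.7, -0.3),
    2006: (-0.7, 0.4, 0.3),
}


def stats_higher(diff):
    # Same rule as count_higher: a difference of 0 counts as higher
    return sum(d >= 0 for d in diff)


# tally[k] = number of champions with exactly k stats higher
tally = [0, 0, 0, 0]
for diff in shown.values():
    tally[stats_higher(diff)] += 1
print(tally)  # [1, 0, 5, 4]
```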

```python
total_finals = len(diffs)
stats = ["REB", "STL", "BLK"]
```

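```python
def get_highs_and_percentages(diffs, stats, total_finals):
    """Count the champions with 0, 1, ..., len(stats) stats higher than the
    runner-up, and express each count as a percentage of all Finals."""
    highs = []
    percentages = []
    for h in range(len(stats) + 1):
        highs.append(count_higher(diffs, stats, h))
        percentages.append(100 * highs[h] / total_finals)

    return highs, percentages
```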


```python
highs, percentages = get_highs_and_percentages(diffs, stats, total_finals)
highs
```

[2, 5, 9, 9]

```python
percentages
```

[8.0, 20.0, 36.0, 36.0]
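The headline figures in the conclusion follow directly from these counts. A small sketch of the arithmetic, grouping the champions into those with the weaker defense (zero or one stat higher) and those with the more complete defense (two or three higher). The names `counts` and `shares` are chosen so as not to overwrite the notebook's `highs` and `percentages`.

```python
# Champions with 0, 1, 2 and all 3 defensive stats higher (the highs output)
counts = [2, 5, 9, 9]
n_finals = sum(counts)  # 25 Finals, 1997 to 2021

shares = [100 * c / n_finals for c in counts]
weaker = counts[0] + counts[1]    # champion had the weaker defense
stronger = counts[2] + counts[3]  # champion had the more complete defense

print(shares)                       # [8.0, 20.0, 36.0, 36.0]
print(100 * stronger / n_finals)    # 72.0
print(stronger, weaker)             # 18 7
print(round(stronger / weaker, 2))  # 2.57
```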

```python
labels = ["Zero", "One", "Two", "All"]
```

## We now build a dashboard that shows these percentages as a bar chart, with a range slider for choosing which Finals to include.

```python
from jupyter_dash import JupyterDash
from dash import dcc, html
from dash.dependencies import Input, Output
import plotly.express as px


def year_slider():
    """Range slider for choosing the first and last Finals year to include."""
    first, last = int(diffs["YEAR"].min()), int(diffs["YEAR"].max())
    return dcc.RangeSlider(
        id="years",
        min=first,
        max=last,
        marks={int(year): str(year) for year in diffs["YEAR"].unique()},
        value=[first, last],  # start with every Finals selected
    )
```

```python
min(diffs["YEAR"])
```

1997

```python
def make_figure(percentages, first, last):
    """Horizontal bar chart of the percentage of champions in each category."""
    fig = px.bar(y=labels, x=percentages, color=labels,
                 barmode="overlay", opacity=1, orientation="h")
    fig.update_layout(
        title=f"NBA champions defensive performances from the {first} Finals to the {last} Finals",
        title_x=0.5,
        xaxis_title="Percentage of champions",
        yaxis_title="Defensive stats higher than the runner-up",
    )
    return fig


fig = make_figure(percentages, int(diffs["YEAR"].min()), int(diffs["YEAR"].max()))

app = JupyterDash(__name__)
colours = {"text": "#7FDBFF", "background": "#333333", "radio_button": "#BBBBBB"}
text_size = {"H1": 48, "H2": 40, "text": 28, "radio_button": 20}

app.layout = html.Div(
    style={"backgroundColor": colours["background"], "fontFamily": "Arial"},
    children=[
        html.H1(
            children="NBA champions defensive performances",
            style={"textAlign": "center", "color": colours["text"],
                   "fontSize": text_size["H1"]},
        ),
        year_slider(),
        html.Div(children=[dcc.Graph(figure=fig, id="graph")]),
    ],
)


@app.callback(Output("graph", "figure"), Input("years", "value"))
def update_figure(y):
    # Recompute the percentages for the Finals inside the selected year range
    d = diffs.loc[diffs["YEAR"].between(y[0], y[1])]
    _, percentages = get_highs_and_percentages(d, stats, len(d))
    return make_figure(percentages, y[0], y[1])
```
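The core of `update_figure` can be checked without Dash. The hypothetical `percentages_in_range` below keeps the Finals whose year lies in an inclusive range, as `Series.between` does by default, and recomputes the percentage of champions in each category. The sample categories are the ten Finals from the differences table above, so these are not the full-period results.

```python
def percentages_in_range(categories, first, last):
    """categories maps Finals year -> number of defensive stats higher (0 to 3)."""
    selected = [n for year, n in categories.items() if first <= year <= last]
    if not selected:
        raise ValueError("no Finals in the selected range")
    counts = [selected.count(k) for k in range(4)]
    return [100 * c / len(selected) for c in counts]


# Stats higher for each of the ten Finals in the differences table (1997 to 2006)
categories = {1997: 2, 1998: 2, 1999: 2, 2000: 3, 2001: 3,
              2002: 2, 2003: 3, 2004: 3, 2005: 0, 2006: 2}

print(percentages_in_range(categories, 1997, 2006))  # [10.0, 0.0, 50.0, 40.0]
print(percentages_in_range(categories, 2000, 2004))  # [0.0, 0.0, 20.0, 80.0]
```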

```python
app.run_server(mode="external")
```

Dash app running on http://127.0.0.1:8050/






## In 18 of the 25 Finals from 1997 to 2021 (72%), the champion had the more complete defense, with two or more defensive stats higher than the runner-up.

![graph.png](attachment:graph.png)

### NBA finalists with the better all-round defensive performance in the playoffs appear more than twice as likely to win the Finals: the champion had the more complete defense in 18 Finals and the weaker defense in 7.