# Does defense actually win championships?

In this analysis, we attempt to answer this question using defensive team statistics from the NBA playoffs, from the 1996-97 season through the 2020-21 season. This period spans several different eras, which helps identify whether there is a consistent underlying pattern that holds in the NBA even as the style of play changes.

This analysis is relevant to coaches deciding whether their next draft pick or trade should be geared towards a more defensive player. It is also relevant to players, as it can help illustrate how important defense is to their championship aspirations.

Follow-up analyses will also address attack and the combination of attack and defense.

In [13]:
import sqlalchemy
import pandas as pd
from os import environ
from time import localtime

engine = sqlalchemy.create_engine("mariadb+mariadbconnector://"+environ.get("USER")+\
                                  ":"+environ.get("PSWD")+"@127.0.0.1:3306/nba")



### The first step is to collect the average team stats recorded in the playoffs from the database.

Since we are interested in teams winning the championship in relation to their defensive statistics, we will only collect the defensive statistics, wins and team names. The teams can then be ordered by number of wins within each season they participated in.

We will therefore take:
- The playoff season that follows the format "004YY" where YY is the year the season starts (SEASON_ID)
- The names of the participating teams (TEAM)
- Their wins (W)
- Their average rebounds (REB)
- Their average steals (STL)
- Their average blocks (BLK)
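
As an illustration of the SEASON_ID format, the sketch below splits an ID into its prefix and start year. The function name `parse_season_id` and the `current_year` parameter are hypothetical and not part of the notebook. The century rule (a two-digit year that would land in the future is read as 19YY) is an assumption that only holds for seasons within the last hundred years.

```python
def parse_season_id(season_id, current_year):
    """Split a playoff SEASON_ID such as "00496" into (prefix, start_year).

    Hypothetical helper for illustration; assumes the "004YY" format
    described above.
    """
    prefix, yy = season_id[:-2], int(season_id[-2:])
    # Two-digit years that would land in the future belong to the 1900s.
    start_year = 2000 + yy if 2000 + yy <= current_year else 1900 + yy
    return prefix, start_year


print(parse_season_id("00496", current_year=2021))  # 1996-97 playoffs
print(parse_season_id("00420", current_year=2021))  # 2020-21 playoffs
```

Passing the reference year explicitly keeps the result reproducible, whereas reading the system clock makes the mapping depend on when the code is run.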

In [32]:
fields = "SEASON_ID, Teams.Name as TEAM, W, REB, STL, BLK "

join =  "Team_standings INNER JOIN Teams on Team_standings.TEAM_ID = Teams.ID "

condition = "where SEASON_ID LIKE '004%' "

select = "SELECT "+ fields + "FROM " + join + condition + "order by SEASON_ID asc, W desc"

df = pd.read_sql(select, engine)

In [33]:
def build_year(year):
    # Two-digit years that would fall in the future belong to the 1900s.
    if 2000 + year > localtime().tm_year:
        return 1900 + year
    return 2000 + year

## Since this analysis is on championships, we consider only the teams that appear in the NBA Finals from the 1996-97 season to the 2020-21 season. These are the two teams with the most playoff wins in each season.

In [16]:
def segment_data(df):
    new_df  = pd.DataFrame()
    seasons = df['SEASON_ID'].unique()
    
    for s in seasons:
        # Copy the slice so the new columns are not written to a view of df.
        d = df.loc[df['SEASON_ID'] == s].head(2).copy()
        d["POSITION"] = list(range(1,len(d)+1))
        d["YEAR"] = build_year(int(s[-2:]))
        new_df = pd.concat([new_df,d])
        
    return new_df.sort_values(["YEAR","POSITION"],ascending= [True,False])

In [42]:
df = segment_data(df)

In [43]:
df.head()

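| index | SEASON_ID | TEAM | W | REB | STL | BLK | POSITION | YEAR |
|---|---|---|---|---|---|---|---|---|
| 337 | 496 | UTA | 13 | 41.8 | 7.4 | 4.9 | 2 | 1996 |
| 336 | 496 | CHI | 15 | 43.5 | 8.5 | 4.8 | 1 | 1996 |
| 353 | 497 | UTA | 13 | 39.8 | 6.5 | 4.9 | 2 | 1997 |
| 352 | 497 | CHI | 15 | 41.0 | 9.0 | 4.5 | 1 | 1997 |
| 369 | 498 | NYK | 12 | 39.3 | 7.6 | 3.9 | 2 | 1998 |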


## For each pair of finalists, we will determine the percentage of NBA Finals in which the winning team:

- Does not have any defensive stats higher than the losing team
- Has one of the defensive stats higher than the losing team
- Has two defensive stats higher than the losing team
- Has all three defensive stats higher than the losing team

#### Winning teams with zero or one of the defensive stats higher therefore had the weaker defensive performance, while those with two or three stats higher had the stronger, more complete defensive performance.

## This will show whether a more complete defense is likely to lead to a win in the Finals.

## We calculate the differences in the defensive stats for each year's NBA finalists. Positive differences mean that the winning team had higher defensive stats than the runner-up. Negative differences mean the winning team had lower defensive stats than the runner-up.
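
To make the sign convention concrete, here is a minimal sketch using the 1996-97 finalists from the table above. The names `finalists`, `stat_columns` and `example_diffs` are illustrative only, and the SEASON_ID string `"00496"` is assumed from the "004YY" format described earlier. It relies on the rows being ordered runner-up first, champion second.

```python
import pandas as pd

# The 1996-97 finalists as shown in the table above: runner-up first,
# champion second, matching the order produced by segment_data.
finalists = pd.DataFrame({
    "SEASON_ID": ["00496", "00496"],
    "TEAM": ["UTA", "CHI"],
    "REB": [41.8, 43.5],
    "STL": [7.4, 8.5],
    "BLK": [4.9, 4.8],
})

stat_columns = ["REB", "STL", "BLK"]

# Within each season, diff() subtracts the previous row from the current one,
# so the champion's row holds champion minus runner-up. The runner-up row has
# no previous row and is dropped as NaN.
example_diffs = (
    finalists.groupby("SEASON_ID")[stat_columns].diff().dropna().round(1)
)
print(example_diffs)
```

Chicago comes out ahead in rebounds and steals (positive differences) and slightly behind in blocks (negative difference).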

In [44]:
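# Restrict diff() to the numeric stat columns (TEAM holds strings).
# Rows are ordered runner-up then champion within each season, so each
# champion row holds champion minus runner-up.
diffs = df.groupby("SEASON_ID")[["REB", "STL", "BLK"]].diff()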



In [50]:
diffs["SEASON_ID"] = df["SEASON_ID"]
diffs["TEAM"] = df.loc[df["POSITION"]==1]["TEAM"]

In [51]:
diffs = diffs.dropna()

In [52]:
diffs = diffs[["REB","STL","BLK","SEASON_ID","TEAM"]]

In [53]:
diffs.head()

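| index | REB | STL | BLK | SEASON_ID | TEAM |
|---|---|---|---|---|---|
| 336 | 1.7 | 1.1 | -0.1 | 496 | CHI |
| 352 | 1.2 | 2.5 | -0.4 | 497 | CHI |
| 368 | 0.9 | -0.3 | 1.9 | 498 | SAS |
| 384 | 2.8 | 1.2 | 1.4 | 499 | LAL |
| 0 | 4.3 | 1.2 | 0.9 | 400 | LAL |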


## With the differences calculated, we can count the number of stats in which the champions were higher than the runner-up.

In [54]:
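def count_higher(df, stats, number_higher):
    # Count the finals in which the champion was higher than the runner-up
    # in exactly `number_higher` of the given stats.
    count = 0
    for _, row in df[stats].iterrows():
        # A difference of zero (a tie) is counted as higher here.
        higher = sum(1 for value in row if value >= 0)
        if higher == number_higher:
            count += 1

    return count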
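
As a cross-check on the loop-based count, the same tally can be sketched in vectorized form. The example below uses only the first five rows of the differences table shown above, so its counts cover five Finals rather than all 25. The names `sample`, `higher_per_finals` and `counts` are illustrative, and the index labels are taken from the TEAM and season values in that table.

```python
import pandas as pd

# First five rows of the differences table shown above.
sample = pd.DataFrame(
    {
        "REB": [1.7, 1.2, 0.9, 2.8, 4.3],
        "STL": [1.1, 2.5, -0.3, 1.2, 1.2],
        "BLK": [-0.1, -0.4, 1.9, 1.4, 0.9],
    },
    index=["CHI 1996", "CHI 1997", "SAS 1998", "LAL 1999", "LAL 2000"],
)

# Number of stats per Finals in which the champion was not behind
# (>= 0, so ties count as higher, matching count_higher).
higher_per_finals = (sample >= 0).sum(axis=1)

# How many Finals fall into each of the categories 0, 1, 2 and 3.
counts = higher_per_finals.value_counts().reindex(range(4), fill_value=0)
print(counts.tolist())
```

In this sample, three champions led in two stats and two led in all three, so none fall into the weaker categories.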

In [55]:
total_finals = len(diffs)
stats = ["REB","STL","BLK"]

In [82]:
highs = []
percentages = []
for h in range(len(stats)+1):
    highs.append(count_higher(diffs,stats, h))
    percentages.append(100*highs[h]/total_finals)


In [83]:
highs

[2, 5, 9, 9]

In [84]:
percentages

[8.0, 20.0, 36.0, 36.0]

In [85]:
labels = ["Zero", "One","Two","All"]

## We will now build a dashboard that shows the percentage of champions against the number of defensive stats in which they were higher than the runner-up, with colour separating the categories.

In [61]:
from jupyter_dash import JupyterDash
from dash import html
from dash.dependencies import Input, Output
import plotly.express as px
from dash import dcc



In [92]:
fig = px.bar(y=labels, x=percentages,color=labels,barmode='overlay',opacity=1,orientation='h')
fig.update_layout(title ="NBA champions defensive performances from the 1996-97 to the 2020-21 season",
                  title_x = 0.5,
                  yaxis={'categoryorder':'max ascending'},xaxis_title="Percentage of champions",
                  yaxis_title = "Defensive stats higher than the runner up")


app = JupyterDash(__name__)
colours = {'text': '#7FDBFF', 'background':'#333333','radio_button':'#BBBBBB'} 
text_size = {'H1':48,'H2':40,'text':28,'radio_button':20}

app.layout = html.Div(style={'backgroundColor':colours['background'],'fontFamily':'Arial'}, children=[
    html.H1(children="NBA champions defensive performances from the 1996-97 to the 2020-21 season",
        style = {'textAlign': 'center',
                 'color':colours['text'],
                 'fontSize':text_size['H1']}),

    html.Div(children=[dcc.Graph(figure = fig, id = 'graph')])])

In [93]:
app.run_server(mode = "external")

Dash app running on http://127.0.0.1:8050/






## The resulting graph shows that the teams with a more complete defense won the Finals 72% of the time over the 25 seasons analysed.

![graph](../graph.png)

### These results indicate that teams in the NBA Finals with the better overall defensive performance in that season's playoffs were more than twice as likely to win the Finals (72% versus 28%).
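
The headline figures can be re-derived from the counts reported in the `highs` output earlier. The short sketch below does that arithmetic; the variable names `stronger`, `weaker` and `ratio` are illustrative.

```python
# Counts of champions with zero, one, two and all three stats higher,
# as printed in the `highs` output above.
highs = [2, 5, 9, 9]
total_finals = sum(highs)  # 25 Finals

stronger = highs[2] + highs[3]  # two or three stats higher
weaker = highs[0] + highs[1]    # zero or one stat higher

stronger_pct = 100 * stronger / total_finals
weaker_pct = 100 * weaker / total_finals
ratio = stronger / weaker

print(f"stronger: {stronger_pct:.0f}%, weaker: {weaker_pct:.0f}%, ratio: {ratio:.2f}")
```

The ratio of about 2.6 is where the "twice as likely" reading comes from. With only 25 Finals in the sample, it is best read as a description of this period rather than a precise estimate.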