# A Deep-Dive on the Effects of Head Coaching Changes in the NBA

## Introduction

The purpose of this project is to investigate the impact of head coach changes on underperforming NBA teams. In the NBA, head coaches are at the forefront of critique when a team underperforms. This usually results in a change of head coaching and, sometimes, the entire coaching staff; but, how effective is this strategy? This project aims to analyze the tangible effects of such coaching transitions on team performance in the subsequent seasons. My focus is on providing a data-driven exploration of the role head coaches play in the NBA and how these changes may influence the team's future performance.

At the end of the 2022-2023 regular season, the Toronto Raptors placed 9th in the Eastern Conference Standings, marking their 2nd missed playoffs in the 3 seasons subsequent to their 2019 Championship run. This led the Raptors parting ways with head coach Nick Nurse as well with the majority of the coaching staff in the summer of 2023. This season, the Raptors hired Darko Rajakovic, an assistant coach from the Memphis Grizzlies to be their new head coach to implement new systems, offensive and defensive philosophies, and to facilitate development of the young Toronto Raptors core.

Coming off high anticipation after the offseason, the Toronto Raptors are 2-4 to start the season. This raises the question: What level of impact can we relaistically expect from these coaching changes? This scenario provides a real-world backdrop for our comprehensive investigation intot he effects of coaching transitions across the NBA.

We seek to answer several questions through exploratory data analysis (EDA):

What is the average number of playoffs clinched within the first 3 years by teams that undergo head coaching changes? How does changing the head coach correlate with the average change in team win percentage in subsequent years? We will also delve into predictive modeling using neural networks (NN). The NN will help us predict the team's future win percentage and determine which seed they could potentially secure in their respective conference. Furthermore, we aim to predict the winningness of the team in subsequent years based on their regular-season records. These predictions will be categorized into:

- High seed team (1-4)
- Low seed team (5-8)
- Out of playoff contention

We can also measure a team's winningness by measuring playoff performance, predicting whether they will:

- Win a championship
- Clinch the conference finals
- Clinch the playoffs

Through this project, we aim to provide data-driven insights into the impact of coaching changes on NBA teams and their future performance, shedding light on the strategies employed in the dynamic world of professional basketball.

## Data Sources

This project uses [Swar's NBA API](https://github.com/swar/nba_api) for the acquisition of data accessible through https://stats.nba.com.

## Collecting the Data

In this section I'll be using the API to collect the data and construct the reporting table.  

In [120]:
# Importing libraries and API endpoints

import pandas as pd
from nba_api.stats.endpoints import commonteamroster
from nba_api.stats.static import teams
from nba_api.stats.endpoints import playoffpicture
from nba_api.stats.endpoints import teamdetails

import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [9]:
# Get a list of team info
team_info = teams.get_teams()
head_coaches_data = []

# Iterate through teams and seasons
for team in team_info:
    team_id = team['id']
    team_name = team['abbreviation']
    for season in range(2005, 2023):
        coach_data = commonteamroster.CommonTeamRoster(team_id=team_id, season=season)
        coach_data_df = coach_data.coaches.get_data_frame()
        seasons = f"{season}-{str(season+1)[-2:]}"
        if not coach_data_df.empty:
            try:
                coach_name = coach_data_df[coach_data_df['COACH_TYPE'] == 'Head Coach']['COACH_NAME'].values[0]
            except IndexError:
                coach_name = 'None'
        head_coaches_data.append({
            'Team ID': team_id,
            'Season': seasons,
            'Team': team_name,
            'Coach': coach_name
        })

head_coaches_df = pd.DataFrame(head_coaches_data)

In [10]:
# finding no data instances - data isn't really clean 
no_data = head_coaches_df[head_coaches_df['Coach'] == 'None']
no_data

Unnamed: 0,Team ID,Season,Team,Coach
63,1610612740,2014-15,NOP,
115,1610612743,2012-13,DEN,
190,1610612747,2015-16,LAL,
331,1610612755,2012-13,PHI,
355,1610612756,2018-19,PHX,
388,1610612758,2015-16,SAC,
423,1610612760,2014-15,OKC,
538,1610612766,2021-22,CHA,


In [11]:
playoff_picture_df = pd.DataFrame(columns=['Season', 'Seed', 'Team ID', 'Wins', 'Losses', 'Win PCT', 'Clinched Playoffs', 'Clinched Conference'])

for season in range(2005, 2023):
    playoff_picture = playoffpicture.PlayoffPicture(season_id='2' + str(season))

    EastConfStandings_df = playoff_picture.east_conf_standings.get_data_frame()
    EastConfPlayoffPicture_df = playoff_picture.east_conf_playoff_picture.get_data_frame()
    east_team_wins = EastConfStandings_df['WINS']
    east_team_losses = EastConfStandings_df['LOSSES']
    east_team_seed = EastConfPlayoffPicture_df['HIGH_SEED_RANK'].combine_first(EastConfPlayoffPicture_df['LOW_SEED_RANK'])
    east_team_pct = EastConfStandings_df['PCT']
    #east_team = EastConfStandings_df['TEAM']
    east_team_id = EastConfStandings_df['TEAM_ID']
    east_team_clinched_playoffs = EastConfStandings_df['CLINCHED_PLAYOFFS']
    east_team_clinched_conference = EastConfStandings_df['CLINCHED_CONFERENCE']

    east_df = pd.DataFrame({
        'Season': f"{season}-{str(season+1)[-2:]}",
        'Team ID': east_team_id,
        'Wins': east_team_wins,
        'Losses': east_team_losses, 
        'Win PCT': east_team_pct,
        'Clinched Playoffs': east_team_clinched_playoffs,
        'Clinched Conference': east_team_clinched_conference
        })
    
    WestConfStandings_df = playoff_picture.west_conf_standings.get_data_frame()
    WestConfPlayoffPicture_df = playoff_picture.west_conf_playoff_picture.get_data_frame()
    west_team_wins = WestConfStandings_df['WINS']
    west_team_losses = WestConfStandings_df['LOSSES']
    west_team_pct = WestConfStandings_df['PCT']
    #west_team = WestConfStandings_df['TEAM']
    west_team_id = WestConfStandings_df['TEAM_ID']
    west_team_clinched_playoffs = WestConfStandings_df['CLINCHED_PLAYOFFS']
    west_team_clinched_conference = WestConfStandings_df['CLINCHED_CONFERENCE']

    west_df = pd.DataFrame({
        'Season': f"{season}-{str(season+1)[-2:]}",
        'Team ID': west_team_id,
        'Wins': west_team_wins,
        'Losses': west_team_losses, 
        'Win PCT': west_team_pct,
        'Clinched Playoffs': west_team_clinched_playoffs,
        'Clinched Conference': west_team_clinched_conference
        })
    east_df.reset_index(drop=True, inplace=True)
    west_df.reset_index(drop=True, inplace=True)

    east_df['Seed'] = east_df.index + 1
    west_df['Seed'] = west_df.index + 1

    playoff_picture_df = pd.concat([playoff_picture_df, east_df, west_df])

playoff_picture_df

  playoff_picture_df = pd.concat([playoff_picture_df, east_df, west_df])


Unnamed: 0,Season,Seed,Team ID,Wins,Losses,Win PCT,Clinched Playoffs,Clinched Conference
0,2005-06,1,1610612765,64,18,0.780,1,1
1,2005-06,2,1610612748,52,30,0.634,1,0
2,2005-06,3,1610612739,50,32,0.610,1,0
3,2005-06,4,1610612751,49,33,0.598,1,0
4,2005-06,5,1610612764,42,40,0.512,1,0
...,...,...,...,...,...,...,...,...
10,2022-23,11,1610612742,38,44,0.463,0,0
11,2022-23,12,1610612762,37,45,0.451,0,0
12,2022-23,13,1610612757,33,49,0.402,0,0
13,2022-23,14,1610612745,22,60,0.268,0,0


In [4]:
data = []
teams = teams.get_teams()
team_id = [team['id'] for team in teams]
all_seasons = [f"{year - 1}-{str(year)[-2:]}" for year in range(2005, 2024)]

for id in team_id:
    champ = teamdetails.TeamDetails(team_id=id)
    champ_df = champ.team_awards_championships.get_data_frame()
    
    if 'YEARAWARDED' in champ_df:
        won_titles = champ_df[champ_df['YEARAWARDED'] >= 2005]
        if not won_titles.empty:
            seasons = won_titles['YEARAWARDED'].apply(lambda year: f"{year - 1}-{str(year)[-2:]}")
            data.extend([(id, season, 1) for season in seasons])

# Fill in missing entries with 0
for id in team_id:
    for season in all_seasons:
        if not any((entry[0] == id and entry[1] == season) for entry in data):
            data.append((id, season, 0))

title_df = pd.DataFrame(data, columns=['Team ID', 'Season', 'Won Title'])
title_df


Unnamed: 0,Team ID,Season,Won Title
0,1610612738,2007-08,1
1,1610612739,2015-16,1
2,1610612742,2010-11,1
3,1610612743,2022-23,1
4,1610612744,2014-15,1
...,...,...,...
565,1610612766,2018-19,0
566,1610612766,2019-20,0
567,1610612766,2020-21,0
568,1610612766,2021-22,0


In [13]:
combined_df = pd.merge(head_coaches_df, playoff_picture_df, on=['Team ID', 'Season'], how='left')
combined_df = pd.merge(combined_df, title_df, how='left')
combined_df = combined_df.sort_values(by='Season')
combined_df

Unnamed: 0,Team ID,Season,Team,Coach,Seed,Wins,Losses,Win PCT,Clinched Playoffs,Clinched Conference,Won Title
0,1610612737,2005-06,ATL,Mike Woodson,14,26,56,0.317,0,0,0
396,1610612759,2005-06,SAS,Gregg Popovich,1,63,19,0.768,1,1,0
36,1610612739,2005-06,CLE,Mike Brown,3,50,32,0.610,1,0,0
378,1610612758,2005-06,SAC,Rick Adelman,8,44,38,0.537,1,0,0
360,1610612757,2005-06,POR,Nate McMillan,15,21,61,0.256,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...
431,1610612760,2022-23,OKC,Mark Daigneault,10,40,42,0.488,0,0,0
449,1610612761,2022-23,TOR,Darko Rajakovic,9,41,41,0.500,0,0,0
467,1610612762,2022-23,UTA,Will Hardy,12,37,45,0.451,0,0,0
503,1610612764,2022-23,WAS,Wes Unseld Jr,12,35,47,0.427,0,0,0


In [24]:
import os
path = os.getcwd()

combined_df.to_csv('/Users/kelvin/Documents/projects-learning/repositories/sports-analytics/coaches.csv', index=False)

## Feature Engineering

**v1 (Current)**
- Coach experience (YoE)
- Coach track record (W/L)
- Historical team performance (T-3 seasons average)

v2
- Player stats

In [119]:
num_prev_seasons = 3

combined_df['Avg Wins T-3 Seasons'] = combined_df.groupby('Team')['Wins'].rolling(window=num_prev_seasons).mean().reset_index(0, drop=True)
combined_df['Avg Losses T-3 Seasons'] = combined_df.groupby('Team')['Losses'].rolling(window=num_prev_seasons).mean().reset_index(0, drop=True)
combined_df['Avg Win PCT T-3 Seasons'] = combined_df.groupby('Team')['Win PCT'].rolling(window=num_prev_seasons).mean().reset_index(0, drop=True)

combined_df['Wins'] = pd.to_numeric(combined_df['Wins'], errors='coerce')
combined_df['Losses'] = pd.to_numeric(combined_df['Losses'], errors='coerce')

combined_df = combined_df.sort_values(by = ['Coach', 'Season'])
combined_df['Agg. Coach Wins'] = combined_df.groupby('Coach')['Wins'].cumsum()
combined_df['Agg. Coach Losses'] = combined_df.groupby('Coach')['Losses'].cumsum()
combined_df['Helper'] = 1
combined_df['Coach Experience'] = combined_df.groupby('Coach')['Helper'].cumsum()
combined_df = combined_df.drop(['Helper', 'Team ID'], axis = 1)
combined_df

Unnamed: 0,Season,Team,Coach,Seed,Wins,Losses,Win PCT,Clinched Playoffs,Clinched Conference,Won Title,Avg Wins T-3 Seasons,Avg Losses T-3 Seasons,Avg Win PCT T-3 Seasons,Coach Experience,Agg. Coach Wins,Agg. Coach Losses
233,2022-23,MIL,Adrian Griffin,1,58,24,0.707,1,1,0,,,,1,58,24
346,2009-10,PHX,Alvin Gentry,3,54,28,0.659,1,0,0,,,,1,54,28
347,2010-11,PHX,Alvin Gentry,10,40,42,0.488,0,0,0,,,,2,94,70
348,2011-12,PHX,Alvin Gentry,10,33,33,0.500,0,0,0,42.333333,34.333333,0.549000,3,127,103
64,2015-16,NOP,Alvin Gentry,12,30,52,0.366,0,0,0,,,,4,157,155
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
503,2022-23,WAS,Wes Unseld Jr,12,35,47,0.427,0,0,0,34.666667,44.000000,0.442000,3,104,132
467,2022-23,UTA,Will Hardy,12,37,45,0.451,0,0,0,38.666667,38.000000,0.506667,1,37,45
69,2020-21,NOP,Willie Green,11,31,41,0.431,0,0,0,35.333333,40.000000,0.465667,1,31,41
70,2021-22,NOP,Willie Green,9,36,46,0.439,0,0,0,32.333333,43.000000,0.429000,2,67,87


In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)