### Modeling Basketball Scores

The goal of this notebook is to create a model that predicts the scores of basketball games from each team's offensive and defensive abilities and home advantage. In the future, I hope to also incorporate other factors, such as:

1. Individual players' offensive and defensive abilities
2. Number of days of rest since last game
3. Distance traveled since last game

In [42]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

In [43]:
games = pd.read_excel('games.xlsx')
games

Unnamed: 0,date,time,away_team,away_score,home_team,home_score,ot,attendance,arena
0,Tue Oct 18 2022,7:30p,Philadelphia 76ers,117,Boston Celtics,126,,19156.0,TD Garden
1,Tue Oct 18 2022,10:00p,Los Angeles Lakers,109,Golden State Warriors,123,,18064.0,Chase Center
2,Wed Oct 19 2022,7:00p,Orlando Magic,109,Detroit Pistons,113,,20190.0,Little Caesars Arena
3,Wed Oct 19 2022,7:00p,Washington Wizards,114,Indiana Pacers,107,,15027.0,Gainbridge Fieldhouse
4,Wed Oct 19 2022,7:30p,Houston Rockets,107,Atlanta Hawks,117,,17878.0,State Farm Arena
...,...,...,...,...,...,...,...,...,...
1225,Sun Apr 9 2023,3:30p,Utah Jazz,117,Los Angeles Lakers,128,,18997.0,Crypto.com Arena
1226,Sun Apr 9 2023,3:30p,New Orleans Pelicans,108,Minnesota Timberwolves,113,,18978.0,Target Center
1227,Sun Apr 9 2023,3:30p,Memphis Grizzlies,100,Oklahoma City Thunder,115,,16601.0,Paycom Center
1228,Sun Apr 9 2023,3:30p,Los Angeles Clippers,119,Phoenix Suns,114,,17071.0,Footprint Center


In [44]:
# Change ot column to count number of overtimes (missing values mean no overtime)
ot_counts = {"OT": 1, "2OT": 2, "3OT": 3, "4OT": 4, "5OT": 5, "6OT": 6}
games["ot"] = games["ot"].map(ot_counts).fillna(0).astype(int)

In [45]:
games

Unnamed: 0,date,time,away_team,away_score,home_team,home_score,ot,attendance,arena
0,Tue Oct 18 2022,7:30p,Philadelphia 76ers,117,Boston Celtics,126,0,19156.0,TD Garden
1,Tue Oct 18 2022,10:00p,Los Angeles Lakers,109,Golden State Warriors,123,0,18064.0,Chase Center
2,Wed Oct 19 2022,7:00p,Orlando Magic,109,Detroit Pistons,113,0,20190.0,Little Caesars Arena
3,Wed Oct 19 2022,7:00p,Washington Wizards,114,Indiana Pacers,107,0,15027.0,Gainbridge Fieldhouse
4,Wed Oct 19 2022,7:30p,Houston Rockets,107,Atlanta Hawks,117,0,17878.0,State Farm Arena
...,...,...,...,...,...,...,...,...,...
1225,Sun Apr 9 2023,3:30p,Utah Jazz,117,Los Angeles Lakers,128,0,18997.0,Crypto.com Arena
1226,Sun Apr 9 2023,3:30p,New Orleans Pelicans,108,Minnesota Timberwolves,113,0,18978.0,Target Center
1227,Sun Apr 9 2023,3:30p,Memphis Grizzlies,100,Oklahoma City Thunder,115,0,16601.0,Paycom Center
1228,Sun Apr 9 2023,3:30p,Los Angeles Clippers,119,Phoenix Suns,114,0,17071.0,Footprint Center


In [57]:
len(games[games['away_team']=='Phoenix Suns'])

41
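The Suns show 41 away games, which is what an 82-game NBA season should give every team (41 home, 41 away). To check all teams at once rather than one at a time, something like the following could work. It is only a sketch on a made-up four-game schedule: `home_away_counts` is a hypothetical helper, and only the `home_team` and `away_team` column names come from the data above.

```python
import pandas as pd

def home_away_counts(games, home_col="home_team", away_col="away_team"):
    """Hypothetical helper: number of home and away games per team."""
    counts = pd.DataFrame({
        "home_games": games[home_col].value_counts(),
        "away_games": games[away_col].value_counts(),
    })
    # Teams missing from one column get NaN after alignment; treat that as zero games
    return counts.fillna(0).astype(int).sort_index()

# Made-up schedule with placeholder team names, for illustration only
toy_games = pd.DataFrame({
    "home_team": ["A", "B", "C", "A"],
    "away_team": ["B", "C", "A", "C"],
})

counts = home_away_counts(toy_games)
print(counts)
```

On the real `games` table, any row where `home_games` or `away_games` differs from 41 would point to missing or duplicated games.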

In [53]:
np.sort(games['away_team'].unique())

array(['Atlanta Hawks', 'Boston Celtics', 'Brooklyn Nets',
       'Charlotte Hornets', 'Chicago Bulls', 'Cleveland Cavaliers',
       'Dallas Mavericks', 'Denver Nuggets', 'Detroit Pistons',
       'Golden State Warriors', 'Houston Rockets', 'Indiana Pacers',
       'Los Angeles Clippers', 'Los Angeles Lakers', 'Memphis Grizzlies',
       'Miami Heat', 'Milwaukee Bucks', 'Minnesota Timberwolves',
       'New Orleans Pelicans', 'New York Knicks', 'Oklahoma City Thunder',
       'Orlando Magic', 'Philadelphia 76ers', 'Phoenix Suns',
       'Portland Trail Blazers', 'Sacramento Kings', 'San Antonio Spurs',
       'Toronto Raptors', 'Utah Jazz', 'Washington Wizards'], dtype=object)

In [102]:
# Function for fitting parameters for offense, defense, and home advantage.
def fit_parameters(game_data,
                   column_names=['home_team','home_score','away_score','away_team','ot','neutral'],
                   model_type='mult',
                   reg_time=48,
                   ot_time=5,
                   no_neutral=True,
                   delta_thresh=1e-7):
    """
    This function takes in scores of games and calculates parameters for offense,
    defense, and home advantage iteratively.
    
    Input:
    game_data: A Pandas DataFrame that contains columns representing teams, scores,
    whether or not the game was played at a neutral location, and the number of overtimes.
    
    column_names: Defines the names of the columns that correspond to each necessary feature.
    The default column_names are:
        home_team: Home team. Can be name or numerical identifier.
        home_score: Score for home team.
        away_score: Score for away team.
        away_team: Away team. Can be name or numerical identifier.
        ot: Number of overtimes played.
        neutral: 1 if the game was played at a neutral location. 0 if not.
    The DataFrame can contain other columns and the order does not matter.
    
    model_type: Type of model to use. Options are:
        mult: Multiplicative model that calculates home score as
            (home offense)*(away defense)*(home advantage)
            and calculates away score as
            (away offense)*(home defense)/(home advantage)
        add: Additive model that calculates home score as
            (home offense) + (away defense) + (home advantage)
            and calculates away score as
            (away offense) + (home defense) - (home advantage)
    
    reg_time: Number of minutes in a regulation game. Default is 48 (NBA length).
    
    ot_time: Number of minutes in an overtime. Default is 5.
    
    no_neutral: Specifies whether none of the games were played at a neutral location. Default is True.
    
    delta_thresh: In order for the iterative fitting of parameters to stop, all changes must
    be below the delta_thresh. Default is 1e-7.
    
    Output:
    A tuple (parameters, h_adv). parameters is a Pandas DataFrame with one row per team
    and the columns 'team', 'offense', and 'defense'. h_adv is the fitted home advantage.
    """
    # Make copy of game_data DataFrame
    game_copy = game_data.copy()
    
    # Get the necessary column names
    h_team, h_score, a_score, a_team, ot, neutral = column_names
    
    # Get list of unique teams
    teams = np.sort(game_copy[h_team].unique())
    
    # Turn team names into numerical IDs
    team_dict = {team: num for num, team in enumerate(teams)}
    game_copy.replace(team_dict, inplace=True)
    
    # Get total points scored and allowed for each team
    pts_scored = np.array([np.sum(game_copy.loc[game_copy[a_team]==team,a_score]) + \
                           np.sum(game_copy.loc[game_copy[h_team]==team,h_score]) for team in team_dict.values()])
    pts_allowed = np.array([np.sum(game_copy.loc[game_copy[a_team]==team,h_score]) + \
                            np.sum(game_copy.loc[game_copy[h_team]==team,a_score]) for team in team_dict.values()])
    
    # Get total home points and total away points by all teams
    if no_neutral:
        home_total = np.sum(game_copy[h_score])
        away_total = np.sum(game_copy[a_score])
    else:
        home_total = np.sum(game_copy.loc[game_copy[neutral]==0,h_score])
        away_total = np.sum(game_copy.loc[game_copy[neutral]==0,a_score])
        
    # Multiplicative model
    if model_type == 'mult':
        # Initialize the parameters
        off_par = np.sqrt([pts / len(game_copy[(game_copy[a_team]==team)|(game_copy[h_team]==team)]) \
                   for team, pts in zip(team_dict.values(), pts_scored)])
        
        def_par = np.sqrt([pts / len(game_copy[(game_copy[a_team]==team)|(game_copy[h_team]==team)]) \
                   for team, pts in zip(team_dict.values(), pts_allowed)])
        
        h_adv = 1.0
        
        end_loop = 0
        # Create variable to track which parameter is changing
        off_def_adv = 0
        while end_loop < 3:
            # Use model to calculate scores for each game, scaled for overtime minutes.
            # The overtime column is looked up through `ot` so custom column_names work.
            time_factor = 1.0 + game_copy[ot]*ot_time/reg_time
            if no_neutral:
                adv_factor = h_adv
            else:
                # Home advantage does not apply to games at a neutral location
                adv_factor = np.where(game_copy[neutral]==1, 1.0, h_adv)
            game_copy[a_score] = off_par[game_copy[a_team]] * def_par[game_copy[h_team]] / adv_factor * time_factor
            game_copy[h_score] = off_par[game_copy[h_team]] * def_par[game_copy[a_team]] * adv_factor * time_factor
                
            # Adjust offensive parameters
            if off_def_adv == 0:
                # Calculate points scored by each team according to the model
                pts_scored_m = np.array([np.sum(game_copy.loc[game_copy[a_team]==team,a_score]) + \
                                         np.sum(game_copy.loc[game_copy[h_team]==team,h_score]) \
                                         for team in team_dict.values()])
                
                # Save old parameter values
                off_par_old = off_par
                
                # Rescale offensive parameters to match actual points scored
                off_par = off_par * pts_scored / pts_scored_m
                
                # Check if changes are within threshold
                if np.max(np.abs(off_par - off_par_old)) < delta_thresh:
                    end_loop += 1
                else:
                    end_loop = 0
                
            # Adjust defensive parameters
            if off_def_adv == 1:
                # Calculate points allowed by each team according to the model
                pts_allowed_m = np.array([np.sum(game_copy.loc[game_copy[a_team]==team,h_score]) + \
                                          np.sum(game_copy.loc[game_copy[h_team]==team,a_score]) \
                                          for team in team_dict.values()])
                
                # Save old parameter values
                def_par_old = def_par
                
                # Rescale defensive parameters to match actual points allowed
                def_par = def_par * pts_allowed / pts_allowed_m
                
                # Check if changes are within threshold
                if np.max(np.abs(def_par - def_par_old)) < delta_thresh:
                    end_loop += 1
                else:
                    end_loop = 0
                
            # Adjust home advantage parameter
            if off_def_adv == 2:
                # Calculate total home and away points according to the model
                if no_neutral:
                    home_total_m = np.sum(game_copy[h_score])
                    away_total_m = np.sum(game_copy[a_score])
                else:
                    home_total_m = np.sum(game_copy.loc[game_copy[neutral]==0,h_score])
                    away_total_m = np.sum(game_copy.loc[game_copy[neutral]==0,a_score])
                    
                # Save old parameter value
                h_adv_old = h_adv
            
                # Rescale home advantage parameter
                h_adv = h_adv * np.sqrt(home_total/away_total) / np.sqrt(home_total_m/away_total_m)
                
                # Check if changes are within threshold
                if np.abs(h_adv - h_adv_old) < delta_thresh:
                    end_loop += 1
                else:
                    end_loop = 0
                
            # Increment off_def_adv
            off_def_adv = (off_def_adv + 1) % 3
        
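    # Additive model
    # NOTE: only the initial estimates are computed here; the iterative fitting
    # for the additive model is not implemented yet.
    if model_type == 'add':
        # Initialize the parameters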
        off_par = np.array([pts / len(game_copy[(game_copy[a_team]==team)|(game_copy[h_team]==team)]) / 2 \
                   for team, pts in zip(team_dict.values(), pts_scored)])
        
        def_par = np.array([pts / len(game_copy[(game_copy[a_team]==team)|(game_copy[h_team]==team)]) / 2 \
                   for team, pts in zip(team_dict.values(), pts_allowed)])
        
        h_adv = 0.0
        
        
    return pd.DataFrame(data={'team':teams,'offense':off_par,'defense':def_par}), h_adv
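For reference, the multiplicative branch of `fit_parameters` can be written out as below. The symbols are my own shorthand for this summary, not names used in the code: $o_i$ and $d_i$ are team $i$'s offense and defense parameters, $\eta$ is the home advantage, $h(g)$ and $a(g)$ are the home and away teams of game $g$, and $n_g$ is its number of overtimes.

```latex
% Model predictions for game g (eta is replaced by 1 at neutral sites)
\hat{s}^{\mathrm{home}}_g = o_{h(g)}\, d_{a(g)}\, \eta
    \left(1 + n_g \frac{t_{\mathrm{ot}}}{t_{\mathrm{reg}}}\right),
\qquad
\hat{s}^{\mathrm{away}}_g = \frac{o_{a(g)}\, d_{h(g)}}{\eta}
    \left(1 + n_g \frac{t_{\mathrm{ot}}}{t_{\mathrm{reg}}}\right)

% One update per pass, cycling offense -> defense -> home advantage
o_i \leftarrow o_i \, \frac{S_i}{\hat{S}_i},
\qquad
d_i \leftarrow d_i \, \frac{A_i}{\hat{A}_i},
\qquad
\eta \leftarrow \eta \, \sqrt{\frac{H / V}{\hat{H} / \hat{V}}}
```

Here $S_i$ and $A_i$ are the actual points scored and allowed by team $i$, $H$ and $V$ are the total home and away points over non-neutral games, and hats mark the same totals computed from the model's predicted scores. The loop stops once three consecutive updates each change their parameters by less than `delta_thresh`.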

In [103]:
par_dict, h_adv = fit_parameters(games)
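To sanity-check the fitting idea independently of the season data, here is a minimal vectorized sketch of the same multiplicative rescaling on a synthetic double round-robin. It is not the function above: `fit_mult_sketch`, the four-team schedule, and the "true" parameters are all made up for illustration, overtime and neutral sites are ignored, and the stopping rule (largest change over one full offense/defense/home-advantage cycle) is a simplification.

```python
import itertools
import numpy as np

def fit_mult_sketch(home, away, home_pts, away_pts, n_teams, tol=1e-10, max_iter=10_000):
    """Hypothetical, simplified version of the multiplicative fit (no overtime, no neutral sites)."""
    home, away = np.asarray(home), np.asarray(away)
    home_pts = np.asarray(home_pts, dtype=float)
    away_pts = np.asarray(away_pts, dtype=float)

    def team_totals(pts_home, pts_away):
        # Points scored and allowed per team, summed over home and away games
        scored = (np.bincount(home, weights=pts_home, minlength=n_teams)
                  + np.bincount(away, weights=pts_away, minlength=n_teams))
        allowed = (np.bincount(home, weights=pts_away, minlength=n_teams)
                   + np.bincount(away, weights=pts_home, minlength=n_teams))
        return scored, allowed

    def predict(off, dfn, adv):
        # Model scores for every game: (home points, away points)
        return off[home] * dfn[away] * adv, off[away] * dfn[home] / adv

    scored, allowed = team_totals(home_pts, away_pts)
    n_games = np.bincount(home, minlength=n_teams) + np.bincount(away, minlength=n_teams)

    # Same starting point as the notebook: square roots of per-game averages
    off = np.sqrt(scored / n_games)
    dfn = np.sqrt(allowed / n_games)
    adv = 1.0

    for _ in range(max_iter):
        old = np.concatenate([off, dfn, [adv]])
        # Rescale offense so modeled points scored match the actual totals
        off = off * scored / team_totals(*predict(off, dfn, adv))[0]
        # Rescale defense so modeled points allowed match the actual totals
        dfn = dfn * allowed / team_totals(*predict(off, dfn, adv))[1]
        # Rescale home advantage so the modeled home/away ratio matches the actual one
        pred_home, pred_away = predict(off, dfn, adv)
        adv = adv * np.sqrt(home_pts.sum() / away_pts.sum()) / np.sqrt(pred_home.sum() / pred_away.sum())
        if np.max(np.abs(np.concatenate([off, dfn, [adv]]) - old)) < tol:
            break
    return off, dfn, adv

# Made-up "true" parameters and a noise-free schedule where every team hosts every other team once
true_off = np.array([10.9, 10.4, 10.7, 10.1])
true_def = np.array([10.3, 10.0, 10.6, 10.8])
true_adv = 1.02
pairs = list(itertools.permutations(range(4), 2))  # (home, away) team IDs
home = np.array([h for h, _a in pairs])
away = np.array([a for _h, a in pairs])
home_pts = true_off[home] * true_def[away] * true_adv
away_pts = true_off[away] * true_def[home] / true_adv

off, dfn, adv = fit_mult_sketch(home, away, home_pts, away_pts, n_teams=4)
fit_home = off[home] * dfn[away] * adv
fit_away = off[away] * dfn[home] / adv
print("fitted home advantage:", adv)
print("largest score error:", max(np.max(np.abs(fit_home - home_pts)), np.max(np.abs(fit_away - away_pts))))
```

Note that only the products are pinned down: multiplying every offense parameter by a constant and dividing every defense parameter by the same constant gives identical predictions, so the fitted `off` and `dfn` need not equal the made-up true values even when the predicted scores do.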

In [106]:
par_dict['ratio'] = par_dict['offense'] / par_dict['defense']

In [111]:
par_dict.sort_values(by='ratio', ascending=False)

Unnamed: 0,team,offense,defense,ratio
1,Boston Celtics,10.879372,10.364058,1.049721
5,Cleveland Cavaliers,10.37535,9.964687,1.041212
22,Philadelphia 76ers,10.660077,10.33349,1.031605
16,Milwaukee Bucks,10.865555,10.60555,1.024516
14,Memphis Grizzlies,10.819151,10.563789,1.024173
7,Denver Nuggets,10.742033,10.536109,1.019545
19,New York Knicks,10.710582,10.510289,1.019057
25,Sacramento Kings,11.160726,11.030157,1.011837
23,Phoenix Suns,10.529895,10.413279,1.011199
18,New Orleans Pelicans,10.544731,10.469742,1.007162


In [105]:
h_adv

1.0109701272460965
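Because the model multiplies the home score by `h_adv` and divides the away score by it, the implied home-to-away scoring ratio for two evenly matched teams is `h_adv` squared, or roughly a 2.2% edge for the home side. The arithmetic can be spelled out as below; the 114-point baseline is a made-up number for illustration, not something taken from the data.

```python
h_adv = 1.0109701272460965  # fitted value shown above

# Home points scale by h_adv and away points by 1 / h_adv,
# so their ratio for evenly matched teams is h_adv squared.
home_away_ratio = h_adv ** 2

baseline = 114.0  # hypothetical neutral-court score, for illustration only
home_pts = baseline * h_adv
away_pts = baseline / h_adv

print(f"home/away scoring ratio: {home_away_ratio:.4f}")
print(f"expected home margin at a {baseline:.0f}-point baseline: {home_pts - away_pts:.2f}")
```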

In [137]:
par_dict.loc[par_dict['team']=='Milwaukee Bucks','defense'].iloc[0]*\
par_dict.loc[par_dict['team']=='Detroit Pistons','offense'].iloc[0]

108.95612218956335
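The product above is the neutral-court estimate of the Pistons' points against the Bucks (Pistons offense times Bucks defense), before home advantage or overtime is applied. A small helper could wrap the full prediction for both teams. This is a sketch under assumptions: `predict_score` is a hypothetical name, its arguments mirror the `reg_time` and `ot_time` defaults of `fit_parameters`, and the two-row table simply copies the Celtics and Bucks rows from the output above.

```python
import pandas as pd

def predict_score(params, h_adv, home, away, ot=0, reg_time=48, ot_time=5, neutral=False):
    """Hypothetical helper: predicted (home, away) points under the multiplicative model."""
    rows = params.set_index("team")
    adv = 1.0 if neutral else h_adv              # no home advantage at a neutral site
    time_factor = 1.0 + ot * ot_time / reg_time  # extra minutes played in overtime
    home_pts = rows.loc[home, "offense"] * rows.loc[away, "defense"] * adv * time_factor
    away_pts = rows.loc[away, "offense"] * rows.loc[home, "defense"] / adv * time_factor
    return home_pts, away_pts

# Two rows copied from the fitted parameter table above
params = pd.DataFrame({
    "team": ["Boston Celtics", "Milwaukee Bucks"],
    "offense": [10.879372, 10.865555],
    "defense": [10.364058, 10.605550],
})
h_adv = 1.0109701272460965

home_pts, away_pts = predict_score(params, h_adv, "Boston Celtics", "Milwaukee Bucks")
print(f"Celtics (home): {home_pts:.1f}, Bucks (away): {away_pts:.1f}")
```

Passing `ot=1` scales both predictions by `1 + 5/48`, matching how the fitting function accounts for overtime minutes.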