# NBA Win Margin Data Analysis

This Study utilizes NBA game data to test if NBA games can be selected that if be would be profitable to bet in the NBA.  


### Betting on NBA games

Sports books require customers to bet at odds of -110.  That means that betters must lay 110 units to win 100.  This ratio requires the better to win 52.38% of their bets to break even betting against the house. 

>>>$ .5238 * 100 =  .4762 * 110 $


This analysis attempts to use past NBA box score data to create a subset of predictions that improve on the 52.38% accuracy against the spread on unseen data.  

In this first notebook the data is transformed so that no information about the game that is unknown before the game is included in the analysis.  Historical moving averages and cumulative season data are computed and game data is merged so that each row is a single game.


In [1]:
import pandas as pd
from datetime import datetime

%matplotlib inline

In [2]:
nba2006 = pd.read_csv('./data/2006-2007_NBA_Box_Score_Team_Stats.csv')
nba2007 = pd.read_csv('./data/2007-2008_NBA_Box_Score_Team_Stats.csv')
nba2008 = pd.read_csv('./data/2008-2009_NBA_Box_Score_Team_Stats.csv')
nba2009 = pd.read_csv('./data/2009-2010_NBA_Box_Score_Team_Stats.csv')
nba2010 = pd.read_csv('./data/2010-2011_NBA_Box_Score_Team_Stats.csv')
nba2011 = pd.read_csv('./data/2011-2012_NBA_Box_Score_Team_Stats.csv')
nba2012 = pd.read_csv('./data/2012-2013_NBA_Box_Score_Team_Stats.csv')
nba2013 = pd.read_csv('./data/2013-2014_NBA_Box_Score_Team-Stats.csv')
nba2014 = pd.read_csv('./data/2014-2015_NBA_Box_Score_Team-Stats.csv')
nba2015 = pd.read_csv('./data/2015-2016_NBA_Box_Score_Team-Stats.csv')
nba2016 = pd.read_csv('./data/2016-2017_NBA_Box_Score_Team-Stats.csv')
nba2017 = pd.read_csv('./data/2017-2018_NBA_Box_Score_Team-Stats.csv')

A couple of column names changed in 2014

In [3]:
nba2006.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2619 entries, 0 to 2618
Data columns (total 50 columns):
DATASET             2618 non-null object
DATE                2618 non-null object
TEAMS               2618 non-null object
VENUE               2618 non-null object
1Q                  2618 non-null float64
2Q                  2618 non-null float64
3Q                  2618 non-null float64
4Q                  2618 non-null float64
OT1                 180 non-null float64
OT2                 34 non-null float64
OT3                 2 non-null float64
OT4                 0 non-null float64
F                   2618 non-null float64
MIN                 2618 non-null float64
FG                  2618 non-null float64
FGA                 2618 non-null float64
3P                  2618 non-null float64
3PA                 2618 non-null float64
FT                  2618 non-null float64
FTA                 2618 non-null float64
OR                  2618 non-null float64
DR                  2618

In [4]:
nba2017.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2624 entries, 0 to 2623
Data columns (total 51 columns):
DATASET             2624 non-null object
DATE                2624 non-null object
TEAMS               2624 non-null object
VENUE               2624 non-null object
1Q                  2624 non-null int64
2Q                  2624 non-null int64
3Q                  2624 non-null int64
4Q                  2624 non-null int64
OT1                 130 non-null float64
OT2                 14 non-null float64
OT3                 2 non-null float64
OT4                 0 non-null float64
F                   2624 non-null int64
MIN                 2624 non-null float64
FG                  2624 non-null int64
FGA                 2624 non-null int64
3P                  2624 non-null int64
3PA                 2624 non-null int64
FT                  2624 non-null int64
FTA                 2624 non-null int64
OR                  2624 non-null int64
DR                  2624 non-null int64
TOT     

##### File Differences

There are a few differences in the yearly files.  Starting in 2013 some of the column names changed and in 2015, one more column name changed.  In addition to that, in 2013 the column TO TO was added.  The code below addresses those changes by renaming and dropping variables so that the years are equivalent.


In [5]:
diff = {'OPENING SPREAD' : 'spread',
        'OPENING TOTAL'  : 'total',
        'CLOSING ODDS'   : 'closing',
        'Unnamed: 36'    : 'unnamed: 35',
        'Unnamed: 37'    : 'unnamed: 36',
        'Unnamed: 38'    : 'unnamed: 37',
        'Unnamed: 39'    : 'unnamed: 38'
       }

nba2013.rename(columns = diff, inplace = True)
nba2014.rename(columns = diff, inplace = True)
nba2015.rename(columns = diff, inplace = True)
nba2015.rename(columns = {'MAIN REF': 'MAIN REFEREE',
                          'CREW'   : 'CREW REFEREES' },inplace = True)
nba2016.rename(columns = diff, inplace = True)
nba2016.rename(columns = {'MAIN REF': 'MAIN REFEREE',
                          'CREW'   : 'CREW REFEREES'}, inplace = True) 
nba2017.rename(columns = diff, inplace = True)
nba2017.rename(columns = {'MAIN REF': 'MAIN REFEREE',
                          'CREW'   : 'CREW REFEREES'}, inplace = True)                

nba2013.drop(columns = ['TO TO'], inplace =True)
nba2014.drop(columns = ['TO TO'], inplace = True)
nba2015.drop(columns = ['TO TO'], inplace = True)
nba2016.drop(columns = ['TO TO'], inplace = True)
nba2017.drop(columns = ['TO TO'], inplace = True)



### NBA csv File Structure

Currently, the NBA data has two records for each game one with the home team data and one with the away team data.  The unit of analysis for this study is the game so the data needs to be rearranged to use the game as the unit of analysis.  Therefore, each even row must be appended to the odd row above it to make the game the unit of analysis

In [6]:


def merge_home_away(df):
    
    
  
    #make all column names lower snake case
    df.columns = [col.lower().replace(' ', '_') for col in df.columns]
    
    #drop any empty data 
    df.date.dropna(inplace = True)
    #replace missing data with zeros
    missing_list = ['ot1', 'ot2', 'ot3', 'ot4']
    
    
    #some missing moneylines going to set them to zero for now look explore later
    df['moneyline'].fillna(0, inplace = True)
    df['movements'].fillna('none', inplace = True)
    
    
    for ot in missing_list:
        df[ot].fillna(0, inplace = True)
    
    #drop unused columns
    df.drop(columns = ['box_score', 
                       'odds',
                       'venue',
                       'halftime',
                       'opening_odds',
                       'movements',
                       'poss',
                       'closing'], inplace = True)
    
    
    
    #split up the rows
    df_road =  df[df.index %2 == 0]
    df_home =  df[df.index %2 == 1] 
    

    #drop unnessecary columns in df1
    
    df_home.drop(columns = ['main_referee'
                        
                           ], inplace = True)
    
    #drop redundant columns in df2
    df_road.drop(columns = ['dataset',
                            'date',
                            'pts',
                            'total',
                            'moneyline'
                    
                           ], inplace = True)
    
    #rename columns df1 and df2 
    
    home =  {'starting_lineup' : 'home_starter1',
             'unnamed:_35'     : 'home_starter2',
             'unnamed:_36'     : 'home_starter3',
             'unnamed:_37'     : 'home_starter4',
             'unnamed:_38'     : 'home_starter5',
             'f'               : 'home_score'
            }
   
    df_home.rename(columns = home, inplace = True) 
    
    away =  {'teams'     : 'away_team',
             '1q'        : 'away_1q',
             '2q'        : 'away_2q',
             '3q'        : 'away_3q',
             '4q'        : 'away_4q',
             'ot1'       : 'away_ot1',
             'ot2'       : 'away_ot2',
             'ot3'       : 'away_ot3',
             'ot4'       : 'away_ot4',
             'f'         : 'away_score',
             'min'       : 'away_min',
             'fg'        : 'away_fg',
             'fga'       : 'away_fga',
             '3p'        : 'away_3p',
             '3pa'       : 'away_3pa',
             'ft'        : 'away_ft',
             'fta'       : 'away_fta',
             'or'        : 'away_or',
             'dr'        : 'away_dr',
             'tot'       : 'away_total_reb',
             'a'         : 'away_assists',
             'pf'        : 'away_fouls',
             'st'        : 'away_steals',
             'or'        : 'away_or',
             'to'        : 'away_turnovers',
             'bl'        : 'away_blocks',
             'poss'      : 'away_poss',
             'pace'      : 'away_pace',
             'oeff'      : 'away_off_eff',
             'deff'      : 'away_def_eff',
             'rest_days' : 'away_rest',
             'spread'    : 'away_spread',
             'starting_lineup' : 'away_starter1',
             'unnamed:_35'     : 'away_starter2',
             'unnamed:_36'     : 'away_starter3',
             'unnamed:_37'     : 'away_starter4',
             'unnamed:_38'     : 'away_starter5',
             'crew_referees'   : 'ref_3',
             'main_referee'    : 'ref_1'

             }
    df_road.rename(columns = away, inplace = True)
    

    
    #reset the indexs to merge the files
    df_home.reset_index(inplace = True)
    df_road.reset_index(inplace = True)
                         
    #merge data so game becomes rather than team unit of analysis                      
    new =  pd.concat([df_home,df_road], axis = 1)
  
    #establish data as a date/time variable
    new['date'] = pd.to_datetime(new['date'])
    
    #add cover information
    new['line_cv'] = new.home_score - new.away_score + new.spread
    new['away_line_cv'] = new.away_score- new.spread - new.home_score
    new['cover'] = new['line_cv'].map(lambda x: 1 if x >0 else 0)
    new['away_cover'] = new['away_line_cv'].map(lambda x : 1 if x > 0 else 0) 
    
    
    #create total info
    new['total_diff'] = new.home_score + new.away_score - new.total
    new['over'] = new['total_diff'].map(lambda x: 
                                       1 if x< 0 else 0)
    new['under'] = new['total_diff'].map(lambda x:
                                        1 if x > 0 else 0)
    new['total_score'] = new.home_score + new.away_score
    
    #add win and score difference info
    
    
    new['home_win_margin'] = (new['home_score'] - new['away_score'])
    new['win'] = new['home_win_margin'].map(lambda x: 1 if x > 0 else 0)
    new['away_win_margin'] = -new['home_win_margin']
    new['away_win'] = new['win'].map(lambda x: 0 if x == 1 else 1)
    
    
    
    #call second function
    new2 = create_home_mov_ave(new)
    new3 = create_away_mov_ave(new2)
    return new3

### Calulating my dependent variable.  

Line_cv is the amount the game went over or under the spread, and then cover is 1 for a home team spread victory and 0 for an away team spread victory.  This leave us with to possible methods of prediction.  We can use regression to predict the cover amount or we can use categorization to predict which category it fall into.  We also could try and predict individual team scores using regression and use those as a prediction

In [7]:
def create_home_mov_ave(df):
    

    
    #add underscores to teams
    df['teams'] = [str(team).replace(' ', '_') for team in df['teams']]
    
    #loop through teams
    teams = ['Atlanta','Boston','Charlotte','Chicago',
             'Cleveland','Dallas','Denver','Detroit',
             'Golden_State','Houston','Indiana','LA_Clippers',
             'LA_Lakers','Memphis','Miami','Milwaukee',
             'Minnesota','New_Jersey','New_Orleans','New_York',
             'Orlando','Philadelphia','Phoenix','Portland',
             'Sacramento','San_Antonio','Seattle','Toronto',
             'Utah','Washington']
    

    columns = ['ot1', 'home_score', 'min', 'fg', 
               'fga', '3p', '3pa', 'ft','fta', 'or', 'dr', 'tot', 
               'a', 'pf', 'st', 'to', 'pts','home_score',
               'pace', 'oeff', 'deff', 'bl', 'win', 'home_win_margin',
               'away_score', 'line_cv', 'spread', 'total_diff', 
               'over', 'under']
    #create empty data frame to put in results
    home_vars = pd.DataFrame()
    
    for team in teams:
        #subset by team and sort by date
        df_team = df[df['teams'] == team]
        df.sort_values('date', inplace = True)
      
        #create 5 game moving averages
        for column in columns:
            column_new = 'mov_5_' + column
            df_team[column_new] = df_team[column].rolling(5).mean().shift(1)
        
            #last results game moving average
            column_l = 'last_' + column
            df_team[column_l] = df_team[column].shift(1)    
        
        
        #create average home scores and win totals
        home_win_pcts = []
        home_win_pct = 0
        
        #create average home win margin scores
        home_ave_win_margins = []
        home_ave_win_margin = 0
        
        for i in range(len(df_team['win'])):
            
            #cumulative win percentages
            home_win_pct += df_team.iloc[i, 82]
            home_win_pcts.append(home_win_pct/(i+1))
            
            #average home win margin
            home_ave_win_margin += df_team.iloc[i,81 ]
            home_ave_win_margins.append(home_ave_win_margin/(i+1))
       
        #assign the home win percents
        df_team['home_win_pct'] = home_win_pcts    
        df_team['home_win_pct'] = df_team.home_win_pct.shift(1)
        
        #Assign average winning margin
        df_team['home_ave_win_margin'] = home_ave_win_margins
        df_team['home_ave_win_margin'] = df_team.home_ave_win_margin.shift(1)
        
        home_vars = home_vars.append(df_team)
    return home_vars 



In [8]:
def create_away_mov_ave(df):
    

    
    #add underscores to teams
    df['away_team'] = [str(team).replace(' ', '_') for team in df['away_team']]
    
    #loop through teams
    teams = ['Atlanta','Boston','Charlotte','Chicago',
             'Cleveland','Dallas','Denver','Detroit',
             'Golden_State','Houston','Indiana','LA_Clippers',
             'LA_Lakers','Memphis','Miami','Milwaukee',
             'Minnesota','New_Jersey','New_Orleans','New_York',
             'Orlando','Philadelphia','Phoenix','Portland',
             'Sacramento','San_Antonio','Seattle','Toronto',
             'Utah','Washington']
    

    columns = ['away_ot1', 
               'away_score', 'away_min', 'away_fg', 'away_fga', 
               'away_3p', 'away_3pa', 'away_ft','away_fta',
               'away_or', 'away_dr', 'away_total_reb', 
               'away_assists', 'away_fouls', 'away_steals', 
               'away_turnovers','away_blocks', 'away_pace',
               'away_off_eff', 'away_cover', 'away_line_cv',
               'away_def_eff', 'away_win', 'away_win_margin',
               'away_spread']
    #create empty data frame to put in results
    away_vars = pd.DataFrame()
    
    for team in teams:
        
        df_team = df[df['away_team'] == team]
        df.sort_values('date', inplace = True)
        for column in columns:
            
            #five game moving average
            column_new = 'mov_5_' + column
            df_team[column_new] = df_team[column].rolling(5).mean().shift(1)
            
            #last game results
            column_l = 'last' + column
            df_team[column_l] = df_team[column].rolling(3).mean().shift(1)
        
        #create average away win totals
        away_win_pcts = []
        away_win_pct = 0
        
        #create average home win margin scores
        away_ave_win_margins = []
        away_ave_win_margin = 0
        
        for i in range(len(df_team['win'])):
            
            #cumulative win percentages
            away_win_pct += df_team.iloc[i, 84]
            away_win_pcts.append(away_win_pct/(i+1))
            
            #average home win margin
            away_ave_win_margin += df_team.iloc[i,83 ]
            away_ave_win_margins.append(away_ave_win_margin/(i+1))
       
        #assign the home win percents
        df_team['away_win_pct'] = away_win_pcts    
        df_team['away_win_pct'] = df_team.away_win_pct.shift(1)
        
        #Assign average winning margin
        df_team['away_ave_win_margin'] = away_ave_win_margins
        df_team['away_ave_win_margin'] = df_team.away_ave_win_margin.shift(1)
        
        
        away_vars = away_vars.append(df_team)
    
        
    return away_vars



In [9]:
def merge_years(df1,df2):
    return pd.concat([df1,df2], axis = 0)

In [10]:
nba2006_trans = merge_home_away(nba2006)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveat

In [11]:

nba2007_trans = merge_home_away(nba2007)
nba2008_trans = merge_home_away(nba2008)
nba2009_trans = merge_home_away(nba2009)
nba2010_trans = merge_home_away(nba2010)
nba2011_trans = merge_home_away(nba2011)
nba2012_trans = merge_home_away(nba2012)
nba2013_trans = merge_home_away(nba2013)
nba2014_trans = merge_home_away(nba2014)


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveat

In [12]:
nba2015_trans = merge_home_away(nba2015)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveat

In [13]:


nba2016_trans = merge_home_away(nba2016)
nba2017_trans = merge_home_away(nba2017)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveat

In [14]:
#combine data set and add a variable for test versus train

nba_combined = merge_years(nba2006_trans,nba2007_trans)
nba_combined = merge_years(nba_combined,nba2008_trans)
nba_combined = merge_years(nba_combined,nba2009_trans)
nba_combined = merge_years(nba_combined,nba2010_trans)
nba_combined = merge_years(nba_combined,nba2011_trans)
nba_combined = merge_years(nba_combined,nba2012_trans)
nba_combined = merge_years(nba_combined,nba2013_trans)
nba_combined = merge_years(nba_combined,nba2014_trans)
nba_combined = merge_years(nba_combined,nba2015_trans)
nba_combined['test'] = 0

#test set
nba_combined_test = merge_years(nba2016_trans, nba2017_trans)
nba_combined_test['test'] = 1
nba_combined = merge_years(nba_combined,nba_combined_test)

In [15]:
nba_combined

Unnamed: 0,index,dataset,date,teams,1q,2q,3q,4q,ot1,ot2,...,lastaway_def_eff,mov_5_away_win,lastaway_win,mov_5_away_win_margin,lastaway_win_margin,mov_5_away_spread,lastaway_spread,away_win_pct,away_ave_win_margin,test
622,1245.0,2006-2007 Regular Season,2007-01-24,Boston,28.0,16.0,18.0,14.0,0.0,0.0,...,,,,,,,,,,0
960,1921.0,2006-2007 Regular Season,2007-03-14,Boston,26.0,24.0,36.0,23.0,0.0,0.0,...,,,,,,,,-26.500000,1.000000,0
590,1181.0,2006-2007 Regular Season,2007-01-20,Charlotte,25.0,23.0,27.0,29.0,0.0,0.0,...,,,,,,,,-9.250000,0.500000,0
1058,2117.0,2006-2007 Regular Season,2007-03-28,Charlotte,25.0,24.0,26.0,26.0,0.0,0.0,...,110.433333,,0.333333,,-11.333333,,4.166667,-5.833333,0.333333,0
615,1231.0,2006-2007 Regular Season,2007-01-23,Chicago,27.0,14.0,29.0,24.0,0.0,0.0,...,117.000000,,0.000000,,-18.000000,,4.500000,-5.125000,0.250000,0
791,1583.0,2006-2007 Regular Season,2007-02-20,Chicago,26.0,22.0,28.0,30.0,0.0,0.0,...,112.633333,0.2,0.000000,-11.2,-13.666667,5.4,6.000000,-5.700000,0.400000,0
51,103.0,2006-2007 Regular Season,2006-11-07,Cleveland,20.0,18.0,33.0,19.0,5.0,0.0,...,107.333333,0.0,0.000000,-17.4,-15.666667,6.1,7.333333,-5.500000,0.333333,0
1187,2375.0,2006-2007 Regular Season,2007-04-14,Cleveland,28.0,28.0,29.0,25.0,0.0,0.0,...,104.533333,0.2,0.333333,-11.4,-8.000000,7.0,8.833333,-3.000000,0.428571,0
844,1689.0,2006-2007 Regular Season,2007-02-26,Dallas,27.0,33.0,27.0,23.0,0.0,0.0,...,112.566667,0.2,0.333333,-14.4,-16.666667,9.0,10.166667,-2.750000,0.375000,0
268,537.0,2006-2007 Regular Season,2006-12-06,Denver,24.0,20.0,31.0,21.0,0.0,0.0,...,117.500000,0.2,0.333333,-16.2,-16.000000,10.8,12.333333,-1.777778,0.333333,0


### Dropping columns that are no longer necessary.  

The NBA data that relates to the particular game that has just been played is not longer necessary so we can drop those columns to make sure that they are not used in the analysis as we do not have access to that data.  Although there are a few current variables that we will have access to i.e. referee data and starter data.  However, all other data can be removed

In [16]:
nba_combined.drop(columns = ['away_1q', 'away_2q', 'away_3q', 'away_4q',
               'away_ot1', 'away_ot2', 'away_ot3', 'away_ot4', 
               'away_min', 'away_fg', 'away_fga', 
               'away_3p', 'away_3pa', 'away_ft','away_fta',
               'away_or', 'away_dr', 'away_total_reb', 
               'away_assists', 'away_fouls', 'away_steals', 
               'away_turnovers','away_blocks', 
               'away_off_eff','away_def_eff', '1q', '2q', '3q', '4q',
               'ot1', 'ot2', 'ot3', 'ot4', 'min', 'fg', 
               'fga', '3p', '3pa', 'ft','fta', 'or', 'dr', 'tot', 
               'a', 'pf', 'st', 'to', 'pts', 'poss',
               'pace', 'oeff', 'deff', 'bl',
               'crew_referee', 'index', 'home_score',
               'starting_lineups', 'home_starter2', 'home_starter3',
               'home_starter4', 'away_score', 'win', 'away_win',
               'away_win_margin', 'home_score_margin', 
               'away_score_margin'              
                            ], inplace = True)

In [17]:
nba_combined.to_csv('./data/nba_combined.csv', index = False)