# Pre-Processing Stage

Below I completed the following pre-processing steps:

* Created the target variable (H Spread Outcome): did the home team win, lose, or tie (push) the spread?
* Ran correlations against the actual home score values to determine the most correlated variables to use
* Filtered the variables down to only the most correlated (above roughly .25 in absolute value; the automated pass in this notebook uses a .3 cutoff, and further features are added by hand)
* Double-checked the data to determine the number of outliers in each column
* Double-checked the game distribution between teams (home and away matchups) to ensure an even distribution
* Dropped columns that may overcomplicate the model (Team name, Year, Date)
* Ensured all final data was available at the start of each game
* Set up Train_Test_Split in preparation for modeling
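
The last two steps in the list (dropping identifier columns, then splitting) are not shown in this part of the notebook. The block below is only a rough sketch of what they might look like on a toy frame. The `games` data, the `test_size`, the `random_state`, and the use of `stratify` are all assumptions made for illustration, not settings taken from the notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the filtered game table; values are illustrative only.
games = pd.DataFrame({
    "Home Team Spread_x": [6.5, -3.0, 19.5, -14.5, 2.5, -7.0, 10.0, -1.5],
    "Total_x": [49.0, 51.5, 55.5, 63.0, 47.0, 58.5, 44.0, 52.0],
    "Home Team": ["A", "B", "C", "D", "A", "B", "C", "D"],
    "Year": [2021] * 8,
    "H Spread Outcome": ["H_Spread_W", "H_Spread_L"] * 4,
})

# Drop identifier-style columns, then separate features from the target.
X = games.drop(columns=["Home Team", "Year", "H Spread Outcome"])
y = games["H Spread Outcome"]

# test_size, random_state and stratify are assumed settings, not the notebook's.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```

Stratifying on the target keeps the win/loss mix similar in both splits, which can matter when one class (here, pushes in the real data) is rare.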

In [1]:
import pandas as pd
import numpy as np
from sklearn import tree, metrics
from sklearn.model_selection import train_test_split
import seaborn as sns
import matplotlib.pyplot as plt
from io import StringIO  
from IPython.display import Image  
!pip install pydotplus 
import pydotplus 
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectKBest, f_regression
import datetime



# Dataset Overview / Remove Null Values 

In [2]:
df2 = pd.read_excel(r'/Users/markclampitt/Documents/Springboard/Capstone2/Data\Upcoming_Final_Game_df.xlsx')

In [3]:
df2.shape

(977, 1593)

# Row 0 is the Georgia vs. Alabama championship game. It has no score entered because it will be used to test the model once developed.

In [4]:
df2.head()

Unnamed: 0,Date,Year,Home Team,Home Team Ranking_x,Away Team,Away Team Ranking_x,Home Team Spread_x,Away Team Spread_x,Total_x,Home Score_x_x,...,Previous H&A Under Odds_Away_Team,Rolling 5 Past H&A Push Odds_Away_Team,Rolling 5 Past H&A Push Odds Avg_Away_Team,Rolling 5 Past H&A Push Odds Max_Away_Team,Rolling 5 Past H&A Push Odds Min_Away_Team,Rolling 3 Past H&A Push Odds_Away_Team,Rolling 3 Past H&A Push Oddss Avg_Away_Team,Rolling 3 Past H&A Push OddsMax_Away_Team,Rolling 3 Past H&A Push Odds Min_Away_Team,Previous H&A Push Odds_Away_Team
0,2022-01-10,2022,Georgia,3,Alabama,1,-2.5,2.5,52.0,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2021-12-04,2021,Alabama,3,Georgia,1,6.5,-6.5,49.0,41.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2021-11-27,2021,LSU,99,Texas A&M,15,6.5,-6.5,27.0,27.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2021-11-27,2021,Auburn,99,Alabama,3,19.5,-19.5,55.5,22.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2021-11-27,2021,Vanderbilt,99,Tennessee,99,31.5,-31.5,63.5,21.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [5]:
df2.describe()

Unnamed: 0,Year,Home Team Ranking_x,Away Team Ranking_x,Home Team Spread_x,Away Team Spread_x,Total_x,Home Score_x_x,Away Score_x_x,Rolling 5 Past Away Score,Rolling 5 Past Away Score Avg,...,Previous H&A Under Odds_Away_Team,Rolling 5 Past H&A Push Odds_Away_Team,Rolling 5 Past H&A Push Odds Avg_Away_Team,Rolling 5 Past H&A Push Odds Max_Away_Team,Rolling 5 Past H&A Push Odds Min_Away_Team,Rolling 3 Past H&A Push Odds_Away_Team,Rolling 3 Past H&A Push Oddss Avg_Away_Team,Rolling 3 Past H&A Push OddsMax_Away_Team,Rolling 3 Past H&A Push Odds Min_Away_Team,Previous H&A Push Odds_Away_Team
count,977.0,977.0,977.0,977.0,977.0,977.0,976.0,976.0,907.0,907.0,...,970.0,942.0,942.0,942.0,942.0,956.0,956.0,956.0,956.0,970.0
mean,2012.903787,58.841351,57.990788,-2.82651,2.82651,51.39304,26.869877,24.196721,120.943771,24.188754,...,0.528866,0.055202,0.01104,0.055202,0.0,0.031381,0.01046,0.031381,0.0,0.008247
std,5.187959,44.375891,44.404771,13.250883,13.250883,8.025355,13.421241,13.896771,42.124725,8.424945,...,0.499424,0.228495,0.045699,0.228495,0.0,0.174436,0.058145,0.174436,0.0,0.090487
min,2004.0,1.0,1.0,-41.5,-36.0,27.0,0.0,0.0,28.0,5.6,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2008.0,11.0,11.0,-12.5,-6.5,45.5,17.0,14.0,92.0,18.4,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,2013.0,99.0,99.0,-3.0,3.0,50.5,26.0,23.0,117.0,23.4,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,2017.0,99.0,99.0,6.5,12.5,56.0,37.0,34.0,147.0,29.4,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,2022.0,99.0,99.0,36.0,41.5,82.5,74.0,72.0,278.0,55.6,...,1.0,1.0,0.2,1.0,0.0,1.0,0.333333,1.0,0.0,1.0


In [6]:
Prediction_Game = df2.loc[0]

In [7]:
Prediction_Game

Date                                           2022-01-10 00:00:00
Year                                                          2022
Home Team                                                  Georgia
Home Team Ranking_x                                              3
Away Team                                                  Alabama
                                                      ...         
Rolling 3 Past H&A Push Odds_Away_Team                         0.0
Rolling 3 Past H&A Push Oddss Avg_Away_Team                    0.0
Rolling 3 Past H&A Push OddsMax_Away_Team                      0.0
Rolling 3 Past H&A Push Odds Min_Away_Team                     0.0
Previous H&A Push Odds_Away_Team                               0.0
Name: 0, Length: 1593, dtype: object

In [8]:
df3= df2[1:]

In [9]:
df3.shape

(976, 1593)

# Created Target Variable

In [10]:
df3 = df3.copy()  # explicit copy so the assignment below does not trigger SettingWithCopyWarning

def spread_outcome(row):
    """Label whether the home team won, lost, or pushed against the spread."""
    adjusted_home = row['Home Score_x_x'] + row['Home Team Spread_x']
    if adjusted_home > row['Away Score_x_x']:
        return 'H_Spread_W'
    if adjusted_home < row['Away Score_x_x']:
        return 'H_Spread_L'
    if adjusted_home == row['Away Score_x_x']:
        return 'H_Spread_Push'
    return ""  # reached only when a score is missing (NaN)

df3['H Spread Outcome'] = df3.apply(spread_outcome, axis=1)


In [11]:
df3['H Spread Outcome'].value_counts()

H_Spread_L       484
H_Spread_W       464
H_Spread_Push     28
Name: H Spread Outcome, dtype: int64
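
The row-wise `apply` that builds `H Spread Outcome` can also be written in vectorized form. The sketch below uses `np.select` on a toy frame; the column names match the notebook, but the rows are made up so that each branch is exercised once.

```python
import numpy as np
import pandas as pd

# Toy rows; the column names match the notebook, the values are made up.
games = pd.DataFrame({
    "Home Score_x_x": [41.0, 20.0, 24.0, np.nan],
    "Away Score_x_x": [24.0, 27.0, 27.0, 10.0],
    "Home Team Spread_x": [6.5, 3.5, 3.0, -2.5],
})

# Home score adjusted by the spread, compared against the away score.
adjusted_home = games["Home Score_x_x"] + games["Home Team Spread_x"]
away = games["Away Score_x_x"]

games["H Spread Outcome"] = np.select(
    [adjusted_home > away, adjusted_home < away, adjusted_home == away],
    ["H_Spread_W", "H_Spread_L", "H_Spread_Push"],
    default="",  # comparisons with NaN are all False, so missing scores fall through
)
print(games["H Spread Outcome"].tolist())
```

On roughly a thousand rows the speed difference is negligible; the main benefit is that the three conditions sit side by side and are easy to audit.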

In [12]:
484/976 #Home Spread Loss %

0.4959016393442623

In [13]:
464/976 #Home Spread Win %

0.47540983606557374

In [14]:
28/976 #Home Spread Push(tie) %

0.028688524590163935
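
The three manual divisions above can be collapsed into one call. As a small sketch, the block below rebuilds a series from the counts reported in the output (484, 464, 28) and lets `value_counts(normalize=True)` compute the shares.

```python
import pandas as pd

# Rebuild the outcome column from the counts reported above (484 / 464 / 28).
outcomes = pd.Series(
    ["H_Spread_L"] * 484 + ["H_Spread_W"] * 464 + ["H_Spread_Push"] * 28,
    name="H Spread Outcome",
)

# normalize=True returns each class's share of all rows instead of raw counts.
shares = outcomes.value_counts(normalize=True)
print(shares.round(4))
```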

# Correlation Tests

In [None]:
# Because the home and away scores determine whether the home team wins or loses against the spread, I ran a correlation test to find the variables most correlated with the scores. Those variables will be used in the final model.



In [20]:
df3.dtypes

Date                                           datetime64[ns]
Year                                                    int64
Home Team                                              object
Home Team Ranking_x                                     int64
Away Team                                              object
                                                    ...      
Rolling 3 Past H&A Push Oddss Avg_Away_Team           float64
Rolling 3 Past H&A Push OddsMax_Away_Team             float64
Rolling 3 Past H&A Push Odds Min_Away_Team            float64
Previous H&A Push Odds_Away_Team                      float64
H Spread Outcome                                       object
Length: 1594, dtype: object

In [103]:
columns_list=df3.columns


In [104]:
Prediction_Variable = df3['Home Score_x_x']
Home_High_Corr_Values = []

In [105]:
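for i in columns_list:
    # Only float64 columns are tested; int and object columns are skipped
    if df3[i].dtypes == 'float64':
        corr = df3[i].corr(Prediction_Variable)
        print(i, ":", corr)
        # Keep features whose correlation with the home score exceeds .3 in either direction
        if abs(corr) > .3:
            Home_High_Corr_Values.append(i)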

Home Team Spread_x : -0.4705450569905662
Away Team Spread_x : 0.4705450569905662
Total_x : 0.3223280984872463
Home Score_x_x : 1.0
Away Score_x_x : -0.06515300572081069
Rolling 5 Past Away Score : -0.10797820700921772
Rolling 5 Past Away Score Avg : -0.10797820700921767
Rolling 5 Past Away Score Max : -0.05625579841822005
Rolling 5 Past Away Score Min : -0.08960629256777507
Rolling 3 Past Away Score : -0.09416260693057879
Rolling 3 Past Away Score Avg : -0.09416260693057875
Rolling 3 Past Away Score Max : -0.055792149105279285
Rolling 3 Past Away Score Min : -0.07453356827482498
Past Away Score : -0.07097380100304561
Rolling 5 Past Away HI Pass yrds : 0.06730476047881796
Rolling 5 Past Away HI Pass yrds Avg : 0.06730476047881793
Rolling 5 Past Away HI Pass yrds Max : 0.10315211853874473
Rolling 5 Past Away HI Pass yrds Min : 0.024423690098601042
Rolling 3 Past Away HI Pass yrds : 0.08224198660280532
Rolling 3 Past Away HI Pass yrds Avg : 0.08224198660280536
Rolling 3 Past Away HI Pass 

Rolling 3 Past Away Completion % Avg : -0.04445362076322818
Rolling 3 Past Away Completion % Max : -0.017094088043737538
Rolling 3 Past Away Completion % Min : -0.06045781959031684
Previous Away Completion % : -0.06014809598286207
Rolling 5 Past Away Total Yards : -0.014917542566106218
Rolling 5 Past Away Total Yards Avg : -0.014917542566106229
Rolling 5 Past Away Total Yards Max : 0.007690329752011826
Rolling 5 Past Away Total Yards Min : -0.010320380930277663
Rolling 3 Past Away Total Yards : -0.005764940624395611
Rolling 3 Past Away Total Yards Avg : -0.0057649406243956034
Rolling 3 Past Away Total Yards Max : 0.022761903218058172
Rolling 3 Past Away Total Yards Min : -0.011763013976520043
Previous Away Total Yards : -0.040654288784787176
Rolling 5 Past Away Passing : 0.0629729280095878
Rolling 5 Past Away Passing Avg : 0.06297292800958777
Rolling 5 Past Away Passing Max : 0.10324847525652665
Rolling 5 Past Away Passing Min : 0.028036381389861174
Rolling 3 Past Away Passing : 0.0710

Rolling 3 Past Home Score : 0.2754325932566715
Rolling 3 Past Home Score Avg : 0.2754325932566716
Rolling 3 Past Home Score Max : 0.2596890561040711
Rolling 3 Past Home Score Min : 0.2291674336248376
Past Home Score : 0.18236083632649344
Rolling 5 Past Home HI Pass yrds : 0.2502214058997243
Rolling 5 Past Home HI Pass yrds Avg : 0.2502214058997243
Rolling 5 Past Home HI Pass yrds Max : 0.19080658485149682
Rolling 5 Past Home HI Pass yrds Min : 0.25199260402285456
Rolling 3 Past Home HI Pass yrds : 0.24161206256719378
Rolling 3 Past Home HI Pass yrds Avg : 0.24161206256719375
Rolling 3 Past Home HI Pass yrds Max : 0.2155177832918897
Rolling 3 Past Home HI Pass yrds Min : 0.2413097299623424
Previous Home HI Pass yrds : 0.1850363680509152
Rolling 5 Past Home HI Rush yrds : 0.0857587368980979
Rolling 5 Past Home HI Rush yrds Avg : 0.0857587368980979
Rolling 5 Past Home HI Rush yrds Max : 0.044458959330495415
Rolling 5 Past Home HI Rush yrds Min : 0.10533718344968157
Rolling 3 Past Home HI 

Rolling 5 Past Home Penalties Max : 0.0490933442860279
Rolling 5 Past Home Penalties Min : 0.025963827730652426
Rolling 3 Past Home Penalties : 0.09838428789752038
Rolling 3 Past Home Penalties Avg : 0.09838428789752038
Rolling 3 Past Home Penalties Max : 0.09661840676289296
Rolling 3 Past Home Penalties Min : 0.04195834998540956
Previous Home Penalties : 0.07034654984213966
Rolling 5 Past Home Penalty Yards : 0.08542385900466357
Rolling 5 Past Home Penalty Yards Avg : 0.08542385900466361
Rolling 5 Past Home Penalty Yards Max : 0.06376268787105648
Rolling 5 Past Home Penalty Yards Min : 0.03665720906940843
Rolling 3 Past Home Penalty Yards : 0.0917555099511354
Rolling 3 Past Home Penalty Yards Avg : 0.09175550995113547
Rolling 3 Past Home Penalty Yards Max : 0.09336708111129632
Rolling 3 Past Home Penalty Yards Min : 0.03874205604403175
Previous Home Penalty Yards : 0.08127610045001993
Rolling 5 Past Home Yards per Penalty : 0.07430286444884789
Rolling 5 Past Home Yards per Penalty Avg

Rolling 3 Past H&A Season T Wins Min_Home_Team : 0.15443966260184022
Previous H&A Season T Wins_Home_Team : 0.1864560154064544
Rolling 5 Past H&A Season T Losses_Home_Team : -0.2931468646265042
Rolling 5 Past H&A Season T Losses Avg_Home_Team : -0.29314686462650413
Rolling 5 Past H&A Season T Losses Max_Home_Team : -0.2928694431188027
Rolling 5 Past H&A Season T Losses Min_Home_Team : -0.21380139791211697
Rolling 3 Past H&A Season T Losses_Home_Team : -0.275251559944329
Rolling 3 Past H&A Season T Losses Avg_Home_Team : -0.2752515599443289
Rolling 3 Past H&A Season T Losses Max_Home_Team : -0.2806529620841787
Rolling 3 Past H&A Season T Losses Min_Home_Team : -0.21117812759378352
Previous H&A Season Losses_Home_Team : -0.23780425682228204
Rolling 5 Past H&A Conf T Wins_Home_Team : 0.2760064907130442
Rolling 5 Past H&A Conf T Wins Avg_Home_Team : 0.27600649071304423
Rolling 5 Past H&A Conf T Wins Max_Home_Team : 0.28463220580600956
Rolling 5 Past H&A Conf T Wins Min_Home_Team : 0.178867

Rolling 3 Past H&A Total Yards_Home_Team : 0.29502939939670375
Rolling 3 Past H&A Total Yards Avg_Home_Team : 0.29502939939670364
Rolling 3 Past H&A Total Yards Max_Home_Team : 0.2756536800272772
Rolling 3 Past H&A Total Yards Min_Home_Team : 0.22515238091574324
Previous H&A Total Yards_Home_Team : 0.23210706418980911
Rolling 5 Past H&A Passing_Home_Team : 0.23534253405133762
Rolling 5 Past H&A Passing Avg_Home_Team : 0.2353425340513377
Rolling 5 Past H&A Passing Max_Home_Team : 0.19596340084807184
Rolling 5 Past H&A Passing Min_Home_Team : 0.20238295854112376
Rolling 3 Past H&A Passing_Home_Team : 0.21824153581752045
Rolling 3 Past H&A Passing Avg_Home_Team : 0.2182415358175204
Rolling 3 Past H&A Passing Max_Home_Team : 0.2171459102736148
Rolling 3 Past H&A Passing Min_Home_Team : 0.17612737406843387
Previous H&A Passing_Home_Team : 0.16760554251384166
Rolling 5 Past H&A Yards per Pass_Home_Team : 0.3192216806624247
Rolling 5 Past H&A Yards per Pass Avg_Home_Team : 0.3192216806624249


Rolling 3 Past H&A Q2 Score Max_Home_Team : 0.13513058240508696
Rolling 3 Past H&A Q2 Score Min_Home_Team : 0.16986989344639805
Previous H&A Q2 Score_Home_Team : 0.11245390428729284
Rolling 5 Past H&A Q3 Score_Home_Team : 0.22989124883508308
Rolling 5 Past H&A Q3 Score Avg_Home_Team : 0.22989124883508316
Rolling 5 Past H&A Q3 Score Max_Home_Team : 0.149227971250734
Rolling 5 Past H&A Q3 Score Min_Home_Team : 0.20415285750913179
Rolling 3 Past H&A Q3 Score_Home_Team : 0.21262081101511743
Rolling 3 Past H&A Q3 Score Avg_Home_Team : 0.21262081101511762
Rolling 3 Past H&A Q3 Score Max_Home_Team : 0.1841999556196819
Rolling 3 Past H&A Q3 Score Min_Home_Team : 0.18728788356461562
Previous H&A Q3 Score_Home_Team : 0.16293244011080488
Rolling 5 Past H&A Q4 Score_Home_Team : 0.205557750626022
Rolling 5 Past H&A Q4 Score Avg_Home_Team : 0.20555775062602202
Rolling 5 Past H&A Q4 Score Max_Home_Team : 0.12699492935565407
Rolling 5 Past H&A Q4 Score Min_Home_Team : 0.13560890641808107
Rolling 3 Pas

Previous H&A 3rd Down Conversions %_Away_Team : -0.03837536330951131
Rolling 5 Past H&A 4th Down Conversions_Away_Team : 0.07186938243488525
Rolling 5 Past H&A 4th Down Conversions Avg_Away_Team : 0.07186938243488532
Rolling 5 Past H&A 4th Down Conversions Max_Away_Team : 0.07564792854333226
Rolling 5 Past H&A 4th Down Conversions Min_Away_Team : -0.0047020488727984215
Rolling 3 Past H&A 4th Down Conversions_Away_Team : 0.03956564676134561
Rolling 3 Past H&A 4th Down Conversions Avg_Away_Team : 0.03956564676134556
Rolling 3 Past H&A 4th Down Conversions Max_Away_Team : 0.03571629480189984
Rolling 3 Past H&A 4th Down Conversions Min_Away_Team : 0.022422466430784473
Previous H&A 4th Down Conversions_Away_Team : 0.034752242847538806
Rolling 5 Past H&A 4th Down Conversion Attempts_Away_Team : 0.10893773172905366
Rolling 5 Past H&A 4th Down Conversion Attempts Avg_Away_Team : 0.10893773172905374
Rolling 5 Past H&A 4th Down Conversion Attempts Max_Away_Team : 0.12250789161906929
Rolling 5 Pa

Rolling 3 Past H&A Q4 Score Max_Away_Team : -0.027277590226999333
Rolling 3 Past H&A Q4 Score Min_Away_Team : -0.027632097700413177
Previous H&A Q4 Score_Away_Team : -0.024368200974888277
Rolling 5 Past H&A Total Odds_Away_Team : 0.1944361883606458
Rolling 5 Past H&A Total Odds Avg_Away_Team : 0.19443618836064577
Rolling 5 Past H&A Total Odds Max_Away_Team : 0.18747025275097715
Rolling 5 Past H&A Total Odds Min_Away_Team : 0.15508777806920646
Rolling 3 Past H&A Total Odds_Away_Team : 0.1858597629907846
Rolling 3 Past H&A Total Odds Avg_Away_Team : 0.1858597629907847
Rolling 3 Past H&A Total Odds Max_Away_Team : 0.18367473227163397
Rolling 3 Past H&A Total Odds Min_Away_Team : 0.1570445013147453
Previous H&A Total Odds_Away_Team : 0.16051812399402418
Rolling 5 Past H&A Spread_Away_Team : 0.25460273247626114
Rolling 5 Past H&A Spread Avg_Away_Team : 0.25460273247626136
Rolling 5 Past H&A Spread Max_Away_Team : 0.21951698409471437
Rolling 5 Past H&A Spread Min_Away_Team : 0.23948963017353

In [106]:
Home_High_Corr_Values

['Home Team Spread_x',
 'Away Team Spread_x',
 'Total_x',
 'Home Score_x_x',
 'Rolling 5 Past H&A Score_Home_Team',
 'Rolling 5 Past H&A Score Avg_Home_Team',
 'Rolling 5 Past H&A 1st Downs_Home_Team',
 'Rolling 5 Past H&A 1st Downs Avg_Home_Team',
 'Rolling 5 Past H&A Total Yards_Home_Team',
 'Rolling 5 Past H&A Total Yards Avg_Home_Team',
 'Rolling 5 Past H&A Yards per Pass_Home_Team',
 'Rolling 5 Past H&A Yards per Pass Avg_Home_Team',
 'Rolling 5 Past H&A Spread_Home_Team',
 'Rolling 5 Past H&A Spread Avg_Home_Team',
 'Rolling 5 Past H&A Spread Min_Home_Team']
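
The column-by-column loop above can be expressed more compactly with `select_dtypes` and `corrwith`. The sketch below runs on synthetic data, and the column names `strong_pos`, `strong_neg`, `noise` and `team` are hypothetical placeholders; only the .3 cutoff is taken from the loop.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df3: two informative float columns, one noise column,
# and one non-float column that should be ignored.
rng = np.random.default_rng(0)
n = 500
target = pd.Series(rng.normal(size=n), name="Home Score_x_x")
frame = pd.DataFrame({
    "strong_pos": target + rng.normal(scale=0.5, size=n),
    "strong_neg": -target + rng.normal(scale=0.5, size=n),
    "noise": rng.normal(size=n),
    "team": ["A", "B"] * (n // 2),
})

# Correlate every float64 column with the target in one call.
float_cols = frame.select_dtypes(include="float64")
corr = float_cols.corrwith(target)

# Keep columns whose absolute correlation exceeds the cutoff used in the loop.
high_corr = corr[corr.abs() > 0.3].index.tolist()
print(high_corr)
```

Using the absolute value handles strongly negative features (such as the home spread) in the same pass as positive ones.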

In [107]:
Home_High_Corr_Values.remove('Rolling 5 Past H&A Score_Home_Team')
Home_High_Corr_Values.remove('Rolling 5 Past H&A 1st Downs_Home_Team')
Home_High_Corr_Values.remove('Rolling 5 Past H&A Total Yards_Home_Team')
Home_High_Corr_Values.remove('Rolling 5 Past H&A Yards per Pass_Home_Team')
Home_High_Corr_Values.remove('Rolling 5 Past H&A Spread_Home_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A HI REC yrds Avg_Home_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A Conf T Wins_Home_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A 3rd Down Conversion % Avg_Home_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A Completion % Avg_Home_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A Q1 Score Avg_Home_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A Ranking Avg_Home_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A Season T Losses_Away_Team')

In [108]:
Home_High_Corr_Values

['Home Team Spread_x',
 'Away Team Spread_x',
 'Total_x',
 'Home Score_x_x',
 'Rolling 5 Past H&A Score Avg_Home_Team',
 'Rolling 5 Past H&A 1st Downs Avg_Home_Team',
 'Rolling 5 Past H&A Total Yards Avg_Home_Team',
 'Rolling 5 Past H&A Yards per Pass Avg_Home_Team',
 'Rolling 5 Past H&A Spread Avg_Home_Team',
 'Rolling 5 Past H&A Spread Min_Home_Team',
 'Rolling 5 Past H&A HI REC yrds Avg_Home_Team',
 'Rolling 5 Past H&A Conf T Wins_Home_Team',
 'Rolling 5 Past H&A 3rd Down Conversion % Avg_Home_Team',
 'Rolling 5 Past H&A Completion % Avg_Home_Team',
 'Rolling 5 Past H&A Q1 Score Avg_Home_Team',
 'Rolling 5 Past H&A Ranking Avg_Home_Team',
 'Rolling 5 Past H&A Season T Losses_Away_Team']
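
In the printed correlations, each "Rolling 5 Past ..." total and its "... Avg" counterpart agree to many decimal places. The `describe()` output suggests why: the mean of `Rolling 5 Past Away Score` (120.94) is five times the mean of `Rolling 5 Past Away Score Avg` (24.19), so the average appears to be the total divided by 5. Pearson correlation does not change under a positive rescaling, so dropping one column of each pair loses no information. The block below illustrates this on made-up numbers; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical five-game totals and the matching per-game averages.
rolling_total = pd.Series(rng.normal(loc=120, scale=40, size=200))
rolling_avg = rolling_total / 5

# A made-up target that depends partly on the rolling total.
home_score = 0.1 * rolling_total + pd.Series(rng.normal(scale=10, size=200))

corr_total = rolling_total.corr(home_score)
corr_avg = rolling_avg.corr(home_score)

# The two values match up to floating-point rounding.
print(corr_total, corr_avg)
```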

In [109]:
Home_High_Corr_Values.append('Away Score_x_x')
Home_High_Corr_Values.append('Rolling 5 Past H&A Score Avg_Away_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A 1st Downs Avg_Away_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A Total Yards Avg_Away_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A Yards per Pass Avg_Away_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A Spread Avg_Away_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A Spread Min_Away_Team')

Home_High_Corr_Values.append('Rolling 5 Past H&A HI REC yrds Avg_Away_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A Conf T Wins_Away_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A 3rd Down Conversion % Avg_Away_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A Completion % Avg_Away_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A Q1 Score Avg_Away_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A Ranking Avg_Away_Team')
Home_High_Corr_Values.append('Rolling 5 Past H&A Season T Losses_Home_Team')

Home_High_Corr_Values.append('Home Team Ranking_x')
Home_High_Corr_Values.append('Away Team Ranking_x')
Home_High_Corr_Values.append('Previous H&A Ranking_Home_Team')
Home_High_Corr_Values.append('Previous H&A Ranking_Away_Team')
Home_High_Corr_Values.append('H Spread Outcome')
Home_High_Corr_Values.append('Date')
Home_High_Corr_Values.append('Home Team')
Home_High_Corr_Values.append('Away Team')
Home_High_Corr_Values.append('Year')

In [110]:
df4 = df3[Home_High_Corr_Values]

In [111]:
df4.head()

Unnamed: 0,Home Team Spread_x,Away Team Spread_x,Total_x,Home Score_x_x,Rolling 5 Past H&A Score Avg_Home_Team,Rolling 5 Past H&A 1st Downs Avg_Home_Team,Rolling 5 Past H&A Total Yards Avg_Home_Team,Rolling 5 Past H&A Yards per Pass Avg_Home_Team,Rolling 5 Past H&A Spread Avg_Home_Team,Rolling 5 Past H&A Spread Min_Home_Team,...,Rolling 5 Past H&A Season T Losses_Home_Team,Home Team Ranking_x,Away Team Ranking_x,Previous H&A Ranking_Home_Team,Previous H&A Ranking_Away_Team,H Spread Outcome,Date,Home Team,Away Team,Year
1,6.5,-6.5,49.0,41.0,37.4,23.6,496.8,9.86,-22.3,-29.0,...,5.0,3,1,3.0,1.0,H_Spread_W,2021-12-04,Alabama,Georgia,2021
2,6.5,-6.5,27.0,27.0,22.8,20.0,358.2,6.18,11.1,2.0,...,21.0,99,15,99.0,11.0,H_Spread_W,2021-11-27,LSU,Texas A&M,2021
3,19.5,-19.5,55.5,22.0,24.6,22.2,399.6,7.7,-1.3,-7.0,...,16.0,99,3,99.0,1.0,H_Spread_W,2021-11-27,Auburn,Alabama,2021
4,31.5,-31.5,63.5,21.0,17.6,16.2,313.8,6.28,22.5,16.0,...,35.0,99,99,99.0,99.0,H_Spread_W,2021-11-27,Vanderbilt,Tennessee,2021
5,-14.5,14.5,63.0,34.0,31.2,25.0,456.0,8.08,2.8,-4.5,...,15.0,25,99,21.0,99.0,H_Spread_W,2021-11-26,Arkansas,Missouri,2021


In [112]:
df4.columns

Index(['Home Team Spread_x', 'Away Team Spread_x', 'Total_x', 'Home Score_x_x',
       'Rolling 5 Past H&A Score Avg_Home_Team',
       'Rolling 5 Past H&A 1st Downs Avg_Home_Team',
       'Rolling 5 Past H&A Total Yards Avg_Home_Team',
       'Rolling 5 Past H&A Yards per Pass Avg_Home_Team',
       'Rolling 5 Past H&A Spread Avg_Home_Team',
       'Rolling 5 Past H&A Spread Min_Home_Team',
       'Rolling 5 Past H&A HI REC yrds Avg_Home_Team',
       'Rolling 5 Past H&A Conf T Wins_Home_Team',
       'Rolling 5 Past H&A 3rd Down Conversion % Avg_Home_Team',
       'Rolling 5 Past H&A Completion % Avg_Home_Team',
       'Rolling 5 Past H&A Q1 Score Avg_Home_Team',
       'Rolling 5 Past H&A Ranking Avg_Home_Team',
       'Rolling 5 Past H&A Season T Losses_Away_Team', 'Away Score_x_x',
       'Rolling 5 Past H&A Score Avg_Away_Team',
       'Rolling 5 Past H&A 1st Downs Avg_Away_Team',
       'Rolling 5 Past H&A Total Yards Avg_Away_Team',
       'Rolling 5 Past H&A Yards per Pass Avg

In [113]:
df4.shape

(976, 40)

In [114]:
df4.isnull().sum()

Home Team Spread_x                                         0
Away Team Spread_x                                         0
Total_x                                                    0
Home Score_x_x                                             0
Rolling 5 Past H&A Score Avg_Home_Team                    35
Rolling 5 Past H&A 1st Downs Avg_Home_Team                41
Rolling 5 Past H&A Total Yards Avg_Home_Team              41
Rolling 5 Past H&A Yards per Pass Avg_Home_Team           41
Rolling 5 Past H&A Spread Avg_Home_Team                   35
Rolling 5 Past H&A Spread Min_Home_Team                   35
Rolling 5 Past H&A HI REC yrds Avg_Home_Team              35
Rolling 5 Past H&A Conf T Wins_Home_Team                  35
Rolling 5 Past H&A 3rd Down Conversion % Avg_Home_Team    41
Rolling 5 Past H&A Completion % Avg_Home_Team             41
Rolling 5 Past H&A Q1 Score Avg_Home_Team                 35
Rolling 5 Past H&A Ranking Avg_Home_Team                  35
Rolling 5 Past H&A Seaso

In [8]:
#Removed all rows with null values 

In [115]:
df4 = df4.dropna()  # reassign rather than use inplace=True, which raises SettingWithCopyWarning on a slice of df3


In [116]:
df4.shape

(928, 40)

In [117]:
#Removing rows with null values took the dataset from 976 rows to 928. 
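
Going from 976 to 928 rows means 48 games (about 4.9%) were dropped. A quick way to do this kind of row accounting is sketched below on a toy frame; the values are made up, and only the two rolling column names are borrowed from the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame: rolling features are NaN until a team has enough prior games.
games = pd.DataFrame({
    "Home Team": ["A", "B", "C", "D"],
    "Rolling 5 Past H&A Score Avg_Home_Team": [np.nan, 24.8, 30.4, 19.8],
    "Rolling 5 Past H&A 1st Downs Avg_Home_Team": [np.nan, np.nan, 22.0, 17.4],
})

rows_before = len(games)
complete = games.dropna()  # keep only rows with no missing values
rows_dropped = rows_before - len(complete)

# Number and share of rows removed, then missing values per column.
print(rows_dropped, rows_dropped / rows_before)
print(games.isnull().sum())
```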

In [119]:
df4.isnull().sum().max()

0

In [120]:
df4['H Spread Outcome'].value_counts()

H_Spread_L       459
H_Spread_W       442
H_Spread_Push     27
Name: H Spread Outcome, dtype: int64

In [125]:
print('Home Spread Loss %: ',459/928 , '\n' 'Home Spread Win %: ', 442/928 , '\n' 'Home Spread Push(tie) %: ', 27/928)

Home Spread Loss %:  0.49461206896551724 
Home Spread Win %:  0.47629310344827586 
Home Spread Push(tie) %:  0.029094827586206896


# Checked data to identify how many rows of data were not outliers to ensure quality 

In [126]:
Describe_df = pd.DataFrame(df4.describe().T)

In [127]:
Describe_df['# Rows <> upper/lower'] = 0
Describe_df['Upper'] = 0.0
Describe_df['Lower'] = 0.0

In [128]:

for i in df4.columns:
    if df4[i].dtypes == 'float64':
        mean = df4[i].mean()
        std = df4[i].std()
        lower = mean - 3 * std
        upper = mean + 3 * std
        # Count rows strictly inside the limits; both masks come from df4 so the indexes match
        count = df4[(df4[i] > lower) & (df4[i] < upper)]['Home Team'].count()
        # A single .loc call avoids chained assignment on Describe_df
        Describe_df.loc[i, '# Rows <> upper/lower'] = count
        Describe_df.loc[i, 'Upper'] = upper
        Describe_df.loc[i, 'Lower'] = lower


In [129]:
#As shown below, for Home Team Spread_x all 928 of 928 rows fall within the upper and lower limits (mean + 3*std / mean - 3*std).

#Overall there are few outliers, which should help the model's performance.

#(The 0's belong to columns that are not float64, which the calculation skips.)

In [130]:
Describe_df

Unnamed: 0,count,mean,std,min,25%,50%,75%,max,# Rows <> upper/lower,Upper,Lower
Home Team Spread_x,928.0,-2.789332,13.285394,-41.5,-12.5,-3.0,6.5,36.0,928,37.066849,-42.645513
Away Team Spread_x,928.0,2.789332,13.285394,-36.0,-6.5,3.0,12.5,41.5,928,42.645513,-37.066849
Total_x,928.0,51.568966,8.111918,27.0,45.5,50.5,56.5,82.5,922,75.904721,27.23321
Home Score_x_x,928.0,26.99569,13.461759,0.0,17.0,27.0,37.0,74.0,927,67.380968,-13.389589
Rolling 5 Past H&A Score Avg_Home_Team,928.0,25.568534,8.52188,6.6,19.8,24.8,30.4,52.2,926,51.134175,0.002894
Rolling 5 Past H&A 1st Downs Avg_Home_Team,928.0,19.651078,3.543245,9.4,17.4,19.4,22.0,31.0,925,30.280812,9.021343
Rolling 5 Past H&A Total Yards Avg_Home_Team,928.0,372.633621,75.66851,191.4,320.15,372.0,422.05,597.4,928,599.639151,145.628091
Rolling 5 Past H&A Yards per Pass Avg_Home_Team,928.0,7.223793,1.478296,3.6,6.24,7.14,8.16,12.86,924,11.65868,2.788906
Rolling 5 Past H&A Spread Avg_Home_Team,928.0,0.259052,9.646568,-29.5,-6.2,0.45,7.1,26.1,926,29.198755,-28.680652
Rolling 5 Past H&A Spread Min_Home_Team,928.0,-11.655711,10.701513,-41.5,-18.0,-12.0,-3.5,16.5,928,20.448828,-43.760251
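
The same three-standard-deviation check can be done for all numeric columns at once, without a loop. The sketch below uses a toy frame with one planted outlier; the data is synthetic and only the column names and the output column labels are borrowed from the notebook.

```python
import numpy as np
import pandas as pd

# Toy numeric frame; the last value in "Total_x" is a planted outlier.
rng = np.random.default_rng(2)
data = pd.DataFrame({
    "Home Team Spread_x": rng.normal(-3, 13, size=200),
    "Total_x": np.append(rng.normal(51, 8, size=199), 150.0),
})

mean = data.mean()
std = data.std()  # sample standard deviation, the pandas default
lower = mean - 3 * std
upper = mean + 3 * std

# Boolean frame: True where a value lies strictly inside its column's limits.
inside = data.gt(lower, axis="columns") & data.lt(upper, axis="columns")

summary = pd.DataFrame({
    "# Rows <> upper/lower": inside.sum(),
    "Upper": upper,
    "Lower": lower,
})
print(summary)
```

Note that a large outlier inflates the standard deviation it is measured against, so this rule is a coarse screen rather than a precise test.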


# Double checked the game distribution between teams below to ensure the matchups were evenly distributed

In [217]:
Team_Matchups = pd.DataFrame(columns=df4['Home Team'].unique(),index=df4['Home Team'].unique())

In [218]:
Team_Matchups

Unnamed: 0,Alabama,LSU,Auburn,Vanderbilt,Arkansas,Mississippi State,South Carolina,Missouri,Ole Miss,Tennessee,Kentucky,Georgia,Texas A&M,Florida
Alabama,,,,,,,,,,,,,,
LSU,,,,,,,,,,,,,,
Auburn,,,,,,,,,,,,,,
Vanderbilt,,,,,,,,,,,,,,
Arkansas,,,,,,,,,,,,,,
Mississippi State,,,,,,,,,,,,,,
South Carolina,,,,,,,,,,,,,,
Missouri,,,,,,,,,,,,,,
Ole Miss,,,,,,,,,,,,,,
Tennessee,,,,,,,,,,,,,,


In [219]:
Teams=df4['Home Team'].unique()

In [220]:
Teams

array(['Alabama', 'LSU', 'Auburn', 'Vanderbilt', 'Arkansas',
       'Mississippi State', 'South Carolina', 'Missouri', 'Ole Miss',
       'Tennessee', 'Kentucky', 'Georgia', 'Texas A&M', 'Florida'],
      dtype=object)

In [221]:
Teams2=Teams

In [222]:
Row = 0
Column= 0
Start = 0
Stop = 14

In [223]:
while Column < Stop:
        a  = df4[(df4['Home Team'] == Teams[Row]) & (df4['Away Team'] == Teams[Column])]
        Team_Matchups.iloc[Row,Column] = a.shape[0]
        Column = Column + 1

In [224]:
Team_Matchups

Unnamed: 0,Alabama,LSU,Auburn,Vanderbilt,Arkansas,Mississippi State,South Carolina,Missouri,Ole Miss,Tennessee,Kentucky,Georgia,Texas A&M,Florida
Alabama,0.0,9.0,9.0,2.0,9.0,9.0,1.0,1.0,9.0,8.0,3.0,3.0,5.0,5.0
LSU,,,,,,,,,,,,,,
Auburn,,,,,,,,,,,,,,
Vanderbilt,,,,,,,,,,,,,,
Arkansas,,,,,,,,,,,,,,
Mississippi State,,,,,,,,,,,,,,
South Carolina,,,,,,,,,,,,,,
Missouri,,,,,,,,,,,,,,
Ole Miss,,,,,,,,,,,,,,
Tennessee,,,,,,,,,,,,,,


In [225]:
# Fill the remaining rows (1 through 13) the same way as row 0 above
for Row in range(1, Stop):
    for Column in range(Stop):
        a = df4[(df4['Home Team'] == Teams[Row]) & (df4['Away Team'] == Teams[Column])]
        Team_Matchups.iloc[Row, Column] = a.shape[0]

# Team_Matchups shows the number of games between each pair of teams. Rows = Home Team, Columns = Away Team

In [238]:
# Some pairs of teams have more matchups than others. That is due to how the conference is divided: there are two divisions within the SEC.

# e.g. There were 8 matchups between Auburn & LSU where Auburn was the home team, and 9 where LSU was the home team.

In [239]:
Team_Matchups

Unnamed: 0,Alabama,LSU,Auburn,Vanderbilt,Arkansas,Mississippi State,South Carolina,Missouri,Ole Miss,Tennessee,Kentucky,Georgia,Texas A&M,Florida
Alabama,0,9,9,2,9,9,1,1,9,8,3,3,5,5
LSU,9,0,9,1,9,8,3,1,9,3,3,4,5,9
Auburn,9,8,0,2,8,9,3,1,9,4,2,11,5,2
Vanderbilt,2,3,2,0,2,2,8,5,8,9,9,8,1,9
Arkansas,8,9,9,2,0,9,5,4,9,3,2,3,5,2
Mississippi State,9,9,8,3,9,0,3,1,9,2,8,2,5,2
South Carolina,3,1,5,9,5,2,0,5,1,9,8,8,4,9
Missouri,2,1,1,5,4,1,5,0,1,5,5,5,2,5
Ole Miss,8,9,8,8,9,9,3,1,0,2,1,3,6,2
Tennessee,8,3,2,8,2,2,9,5,3,0,8,9,1,8
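
The same home-by-away count table can probably be built in a single call rather than cell by cell. The sketch below uses `pd.crosstab` on a small made-up frame; `games` and `teams` are stand-ins I am assuming for `df4` and `Teams`, and the rows are toy data, not real results.

```python
import pandas as pd

# Toy stand-in for df4; only the two team columns matter here.
games = pd.DataFrame({
    "Home Team": ["Alabama", "Alabama", "LSU", "Auburn", "LSU"],
    "Away Team": ["LSU", "LSU", "Alabama", "LSU", "Auburn"],
})
# Toy stand-in for the Teams list; it fixes the row and column order.
teams = ["Alabama", "LSU", "Auburn"]

# Count games per (home, away) pair, then reindex to a square layout.
# Pairs that never met (including a team against itself) become 0.
matchups = (
    pd.crosstab(games["Home Team"], games["Away Team"])
    .reindex(index=teams, columns=teams, fill_value=0)
)
print(matchups)
```

Reindexing on the team list keeps rows and columns in the same order, so the diagonal always lines up with a team against itself.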


In [240]:
df4.sample(10)

Unnamed: 0,Home Team Spread_x,Away Team Spread_x,Total_x,Home Score_x_x,Rolling 5 Past H&A Score Avg_Home_Team,Rolling 5 Past H&A 1st Downs Avg_Home_Team,Rolling 5 Past H&A Total Yards Avg_Home_Team,Rolling 5 Past H&A Yards per Pass Avg_Home_Team,Rolling 5 Past H&A Spread Avg_Home_Team,Rolling 5 Past H&A Spread Min_Home_Team,...,Rolling 5 Past H&A Season T Losses_Home_Team,Home Team Ranking_x,Away Team Ranking_x,Previous H&A Ranking_Home_Team,Previous H&A Ranking_Away_Team,H Spread Outcome,Date,Home Team,Away Team,Year
312,-29.0,29.0,52.5,51.0,35.0,22.6,474.6,8.4,-18.0,-37.0,...,0.0,1,99,1.0,99.0,H_Spread_W,2016-11-12,Alabama,Mississippi State,2016
275,6.5,-6.5,46.0,27.0,25.2,21.0,440.6,8.34,-7.3,-14.0,...,14.0,99,10,99.0,12.0,H_Spread_W,2017-10-14,LSU,Auburn,2017
915,3.0,-3.0,45.5,31.0,23.4,17.8,332.6,6.54,-3.1,-17.0,...,11.0,15,5,20.0,5.0,H_Spread_W,2005-10-01,Alabama,Florida,2005
879,7.0,-7.0,40.0,0.0,12.2,14.8,262.6,4.34,11.4,1.5,...,33.0,99,99,99.0,99.0,H_Spread_L,2006-08-31,Mississippi State,South Carolina,2006
298,-10.0,10.0,69.0,20.0,25.2,23.8,430.2,6.88,0.1,-10.0,...,23.0,99,99,99.0,99.0,H_Spread_L,2016-11-26,Ole Miss,Mississippi State,2016
59,16.0,-16.0,74.0,46.0,40.0,27.2,531.6,10.02,-22.7,-31.5,...,6.0,7,1,6.0,1.0,H_Spread_W,2020-12-19,Florida,Alabama,2020
17,1.0,-1.0,57.5,42.0,21.6,17.6,298.0,5.64,4.7,-4.5,...,3.0,18,99,12.0,99.0,H_Spread_L,2021-11-06,Kentucky,Tennessee,2021
318,2.5,-2.5,52.0,24.0,23.6,21.2,391.0,6.08,8.9,-3.0,...,14.0,99,99,99.0,99.0,H_Spread_L,2016-11-05,Kentucky,Georgia,2016
321,14.5,-14.5,51.0,24.0,12.8,16.4,303.0,6.34,8.7,2.5,...,10.0,99,18,99.0,9.0,H_Spread_W,2016-10-29,South Carolina,Tennessee,2016
699,5.5,-5.5,51.0,33.0,24.4,19.8,386.2,5.5,-4.0,-15.0,...,6.0,99,25,99.0,99.0,H_Spread_W,2009-10-31,Auburn,Ole Miss,2009


# Dropped more columns to avoid overcomplicating the model given the limited data

In [None]:
# Decided to drop Home Team, Away Team, Date and Year: the dataset is already limited, and I didn't want to overcomplicate the decision tree

In [241]:
df4.drop('Home Team',inplace=True,axis=1)
df4.drop('Away Team',inplace=True,axis=1)
df4.drop('Date',inplace=True,axis=1)
df4.drop('Year',inplace=True,axis=1)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df4.drop('Home Team',inplace=True,axis=1)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df4.drop('Away Team',inplace=True,axis=1)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df4.drop('Date',inplace=True,axis=1)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df4.drop('Year',inplace=True,axis=1)
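
The warning above suggests that `df4` was created as a slice of another DataFrame, so in-place edits may or may not reach the original. A minimal sketch of one way to avoid it, assuming the subset came from a filter: take an explicit `.copy()` and drop all identifier columns in one non-in-place call. The names `raw`, `subset`, `id_cols` and `features` are hypothetical, and the rows are toy data.

```python
import pandas as pd

# Toy stand-in for the wider frame that df4 was sliced from.
raw = pd.DataFrame({
    "Home Team": ["Alabama", "LSU"],
    "Away Team": ["LSU", "Auburn"],
    "Date": pd.to_datetime(["2016-11-12", "2017-10-14"]),
    "Year": [2016, 2017],
    "Total_x": [52.5, 46.0],
    "Home Team Spread_x": [-29.0, 6.5],
})

# Assumption: the subset comes from a boolean filter.
# .copy() makes it an independent DataFrame, so later edits are unambiguous.
subset = raw[raw["Year"] >= 2016].copy()

# Drop all identifier columns in one call and keep the returned frame.
id_cols = ["Home Team", "Away Team", "Date", "Year"]
features = subset.drop(columns=id_cols)
print(features.columns.tolist())
```

Assigning the result of `drop` to a new name also leaves `subset` intact, which helps if the team names or dates are needed later for inspection.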


In [242]:
df4.columns

Index(['Home Team Spread_x', 'Away Team Spread_x', 'Total_x', 'Home Score_x_x',
       'Rolling 5 Past H&A Score Avg_Home_Team',
       'Rolling 5 Past H&A 1st Downs Avg_Home_Team',
       'Rolling 5 Past H&A Total Yards Avg_Home_Team',
       'Rolling 5 Past H&A Yards per Pass Avg_Home_Team',
       'Rolling 5 Past H&A Spread Avg_Home_Team',
       'Rolling 5 Past H&A Spread Min_Home_Team',
       'Rolling 5 Past H&A HI REC yrds Avg_Home_Team',
       'Rolling 5 Past H&A Conf T Wins_Home_Team',
       'Rolling 5 Past H&A 3rd Down Conversion % Avg_Home_Team',
       'Rolling 5 Past H&A Completion % Avg_Home_Team',
       'Rolling 5 Past H&A Q1 Score Avg_Home_Team',
       'Rolling 5 Past H&A Ranking Avg_Home_Team',
       'Rolling 5 Past H&A Season T Losses_Away_Team', 'Away Score_x_x',
       'Rolling 5 Past H&A Score Avg_Away_Team',
       'Rolling 5 Past H&A 1st Downs Avg_Away_Team',
       'Rolling 5 Past H&A Total Yards Avg_Away_Team',
       'Rolling 5 Past H&A Yards per Pass Avg

In [243]:
# Dropped the actual scores, so only data that would have been available at the start of the game remains
df4.drop('Away Score_x_x',inplace=True,axis=1)
df4.drop('Home Score_x_x',inplace=True,axis=1)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df4.drop('Away Score_x_x',inplace=True,axis=1)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df4.drop('Home Score_x_x',inplace=True,axis=1)
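
Since the goal is to keep only pre-game information, a small guard can confirm that no post-game column slipped through. This is a sketch, not part of the original notebook: `leaked_columns` is a hypothetical helper, the list contains only the two score columns dropped above, and the row is toy data.

```python
import pandas as pd

# Columns that are only known after the game has been played.
POST_GAME_COLS = ["Home Score_x_x", "Away Score_x_x"]

def leaked_columns(frame, post_game_cols=POST_GAME_COLS):
    """Return the post-game columns that are still present in frame."""
    return [col for col in post_game_cols if col in frame.columns]

# Toy frame with one pre-game column and both score columns.
toy = pd.DataFrame({
    "Total_x": [52.5],
    "Home Score_x_x": [51.0],
    "Away Score_x_x": [3.0],
})

# Drop whatever leaked, then check that nothing post-game is left.
cleaned = toy.drop(columns=leaked_columns(toy))
assert leaked_columns(cleaned) == []
print(cleaned.columns.tolist())
```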


In [244]:
df4.columns

Index(['Home Team Spread_x', 'Away Team Spread_x', 'Total_x',
       'Rolling 5 Past H&A Score Avg_Home_Team',
       'Rolling 5 Past H&A 1st Downs Avg_Home_Team',
       'Rolling 5 Past H&A Total Yards Avg_Home_Team',
       'Rolling 5 Past H&A Yards per Pass Avg_Home_Team',
       'Rolling 5 Past H&A Spread Avg_Home_Team',
       'Rolling 5 Past H&A Spread Min_Home_Team',
       'Rolling 5 Past H&A HI REC yrds Avg_Home_Team',
       'Rolling 5 Past H&A Conf T Wins_Home_Team',
       'Rolling 5 Past H&A 3rd Down Conversion % Avg_Home_Team',
       'Rolling 5 Past H&A Completion % Avg_Home_Team',
       'Rolling 5 Past H&A Q1 Score Avg_Home_Team',
       'Rolling 5 Past H&A Ranking Avg_Home_Team',
       'Rolling 5 Past H&A Season T Losses_Away_Team',
       'Rolling 5 Past H&A Score Avg_Away_Team',
       'Rolling 5 Past H&A 1st Downs Avg_Away_Team',
       'Rolling 5 Past H&A Total Yards Avg_Away_Team',
       'Rolling 5 Past H&A Yards per Pass Avg_Away_Team',
       'Rolling 5 Past 

# Test Train Split

In [264]:
# Features: every column except the target
X = df4.drop('H Spread Outcome',axis=1)
# Target: the home team's outcome against the spread
y = df4['H Spread Outcome']


# Split the data: 75% train, 25% test, with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 42)

In [279]:
X_train.shape

(696, 33)

In [280]:
y_train.shape

(696,)

In [281]:
X_test.shape

(232, 33)

In [282]:
y_test.shape

(232,)
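
The split above is purely random. Because ties against the spread are likely to be rare, it may be worth checking the class balance in each half, or passing `stratify=y` so every outcome keeps roughly the same share in train and test. The sketch below runs on made-up data; the label `H_Spread_P` for a push is a hypothetical name I am assuming, and the features are random numbers.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy features: random numbers standing in for the real columns.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Home Team Spread_x": rng.normal(0, 10, n),
    "Total_x": rng.normal(50, 7, n),
})
# Toy target: 90 wins, 90 losses, 20 pushes ("H_Spread_P" is a made-up label).
y = pd.Series(
    ["H_Spread_W"] * 90 + ["H_Spread_L"] * 90 + ["H_Spread_P"] * 20,
    name="H Spread Outcome",
)

# stratify=y keeps each class's share about equal in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Compare class proportions across the two halves.
print(y_train.value_counts(normalize=True).round(3))
print(y_test.value_counts(normalize=True).round(3))
```

Stratifying changes which rows land in each half compared with the unstratified call, so scores from the two setups are not directly comparable.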