## NFL Prediction Model
---
* The goal of our project was to create a machine learning model to predict the outcomes of the 2019 playoffs and Super Bowl.
* We intend to later use it for a gambling strategy. 

### Data
---
* We used pd.read_html to scrape data from footballdb.com. We originally tried to web scrape using BeautifulSoup, but switched directions due to time constraints.
* We used team offense, defense, passing, rushing, downs, games, and outcomes from the 2012-2019 seasons. 
---
#### Feature Exploration:
* Week number, home team, away team, home score, away score, outcome (W/L), and team-specific stats

In [2]:
# Import Libraries
import requests
import time
import urllib.request
import pandas as pd
import tpot
import tensorflow as tf
from tensorflow import keras
from keras.layers import Dense
import adanet as ad
import matplotlib.pyplot as plt

### Import data

In [3]:
# Import 2012-2019 seasons, weeks, and team stats
season_2019 = pd.read_html('https://www.footballdb.com/games/index.html?lg=NFL&yr=2019')
season_2018 = pd.read_html('https://www.footballdb.com/games/index.html?lg=NFL&yr=2018')
season_2017 = pd.read_html('https://www.footballdb.com/games/index.html?lg=NFL&yr=2017')
season_2016 = pd.read_html('https://www.footballdb.com/games/index.html?lg=NFL&yr=2016')
season_2015 = pd.read_html('https://www.footballdb.com/games/index.html?lg=NFL&yr=2015')
season_2014 = pd.read_html('https://www.footballdb.com/games/index.html?lg=NFL&yr=2014')
season_2013 = pd.read_html('https://www.footballdb.com/games/index.html?lg=NFL&yr=2013')
season_2012 = pd.read_html('https://www.footballdb.com/games/index.html?lg=NFL&yr=2012')

team_d_2019 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2019&type=reg&cat=T&group=D&conf=')
team_d_2018 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2018&type=reg&cat=T&group=D&conf=')
team_d_2017 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2017&type=reg&cat=T&group=D&conf=')
team_d_2016 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2016&type=reg&cat=T&group=D&conf=')
team_d_2015 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2015&type=reg&cat=T&group=D&conf=')
team_d_2014 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2014&type=reg&cat=T&group=D&conf=')
team_d_2013 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2013&type=reg&cat=T&group=D&conf=')
team_d_2012 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2012&type=reg&cat=T&group=D&conf=')

team_o_2019 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2019&type=reg&cat=T&group=O&conf=')
team_o_2018 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2018&type=reg&cat=T&group=O&conf=')
team_o_2017 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2017&type=reg&cat=T&group=O&conf=')
team_o_2016 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2016&type=reg&cat=T&group=O&conf=')
team_o_2015 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2015&type=reg&cat=T&group=O&conf=')
team_o_2014 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2014&type=reg&cat=T&group=O&conf=')
team_o_2013 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2013&type=reg&cat=T&group=O&conf=')
team_o_2012 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2012&type=reg&cat=T&group=O&conf=')

o_passing_2019 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2019&type=reg&cat=P&group=O&conf=')
o_passing_2018 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2018&type=reg&cat=P&group=O&conf=')
o_passing_2017 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2017&type=reg&cat=P&group=O&conf=')
o_passing_2016 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2016&type=reg&cat=P&group=O&conf=')
o_passing_2015 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2015&type=reg&cat=P&group=O&conf=')
o_passing_2014 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2014&type=reg&cat=P&group=O&conf=')
o_passing_2013 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2013&type=reg&cat=P&group=O&conf=')
o_passing_2012 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2012&type=reg&cat=P&group=O&conf=')

o_rushing_2019 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2019&type=reg&cat=R&group=O&conf=')
o_rushing_2018 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2018&type=reg&cat=R&group=O&conf=')
o_rushing_2017 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2017&type=reg&cat=R&group=O&conf=')
o_rushing_2016 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2016&type=reg&cat=R&group=O&conf=')
o_rushing_2015 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2015&type=reg&cat=R&group=O&conf=')
o_rushing_2014 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2014&type=reg&cat=R&group=O&conf=')
o_rushing_2013 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2013&type=reg&cat=R&group=O&conf=')
o_rushing_2012 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2012&type=reg&cat=R&group=O&conf=')

d_passing_2019 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2019&type=reg&cat=P&group=D&conf=')
d_passing_2018 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2018&type=reg&cat=P&group=D&conf=')
d_passing_2017 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2017&type=reg&cat=P&group=D&conf=')
d_passing_2016 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2016&type=reg&cat=P&group=D&conf=')
d_passing_2015 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2015&type=reg&cat=P&group=D&conf=')
d_passing_2014 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2014&type=reg&cat=P&group=D&conf=')
d_passing_2013 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2013&type=reg&cat=P&group=D&conf=')
d_passing_2012 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2012&type=reg&cat=P&group=D&conf=')

d_rushing_2019 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2019&type=reg&cat=R&group=D&conf=')
d_rushing_2018 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2018&type=reg&cat=R&group=D&conf=')
d_rushing_2017 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2017&type=reg&cat=R&group=D&conf=')
d_rushing_2016 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2016&type=reg&cat=R&group=D&conf=')
d_rushing_2015 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2015&type=reg&cat=R&group=D&conf=')
d_rushing_2014 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2014&type=reg&cat=R&group=D&conf=')
d_rushing_2013 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2013&type=reg&cat=R&group=D&conf=')
d_rushing_2012 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2012&type=reg&cat=R&group=D&conf=')

o_downs_2019 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2019&type=reg&cat=W&group=O&conf=')
o_downs_2018 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2018&type=reg&cat=W&group=O&conf=')
o_downs_2017 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2017&type=reg&cat=W&group=O&conf=')
o_downs_2016 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2016&type=reg&cat=W&group=O&conf=')
o_downs_2015 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2015&type=reg&cat=W&group=O&conf=')
o_downs_2014 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2014&type=reg&cat=W&group=O&conf=')
o_downs_2013 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2013&type=reg&cat=W&group=O&conf=')
o_downs_2012 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2012&type=reg&cat=W&group=O&conf=')

d_downs_2019 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2019&type=reg&cat=W&group=D&conf=')
d_downs_2018 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2018&type=reg&cat=W&group=D&conf=')
d_downs_2017 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2017&type=reg&cat=W&group=D&conf=')
d_downs_2016 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2016&type=reg&cat=W&group=D&conf=')
d_downs_2015 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2015&type=reg&cat=W&group=D&conf=')
d_downs_2014 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2014&type=reg&cat=W&group=D&conf=')
d_downs_2013 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2013&type=reg&cat=W&group=D&conf=')
d_downs_2012 = pd.read_html('https://www.footballdb.com/stats/teamstat.html?lg=NFL&yr=2012&type=reg&cat=W&group=D&conf=')
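
The 72 `pd.read_html` calls above differ only in the year and in the `cat`/`group` query parameters, so the URLs could be generated in a loop. This is a minimal sketch, assuming the URL pattern stays exactly as written above. The names `STAT_PAGES`, `season_url`, `stat_url`, and `fetch_all` are hypothetical, and `fetch_all` is defined but not called here because it needs network access.

```python
import pandas as pd

BASE = "https://www.footballdb.com"
YEARS = range(2012, 2020)

# (cat, group) query parameters for each stats page used in this notebook
STAT_PAGES = {
    "team_d": ("T", "D"),
    "team_o": ("T", "O"),
    "o_passing": ("P", "O"),
    "o_rushing": ("R", "O"),
    "d_passing": ("P", "D"),
    "d_rushing": ("R", "D"),
    "o_downs": ("W", "O"),
    "d_downs": ("W", "D"),
}


def season_url(year):
    """URL of the game-results page for one season."""
    return f"{BASE}/games/index.html?lg=NFL&yr={year}"


def stat_url(year, cat, group):
    """URL of one regular-season team-stats page."""
    return (f"{BASE}/stats/teamstat.html?lg=NFL&yr={year}"
            f"&type=reg&cat={cat}&group={group}&conf=")


def fetch_all():
    """Download every page; returns {name: {year: list of DataFrames}}."""
    tables = {"season": {yr: pd.read_html(season_url(yr)) for yr in YEARS}}
    for name, (cat, group) in STAT_PAGES.items():
        tables[name] = {yr: pd.read_html(stat_url(yr, cat, group)) for yr in YEARS}
    return tables
```

Keeping the tables in a dictionary keyed by name and year avoids one variable per page and makes the later wrangling cells easier to loop over.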


### Wrangle data

In [4]:
# Iterate through df and parse weeks and years, drop columns and export to csv
team_defense = [team_d_2012, team_d_2013, team_d_2014, team_d_2015, team_d_2016, team_d_2017, team_d_2018, team_d_2019]
year = ['2012', '2013', '2014', '2015', '2016', '2017', '2018','2019']
fin_def = []
for yr, df in zip(year, team_defense):
    for idx, weeks in enumerate(df):
        def_temp = weeks.drop(columns=['Gms'])
        def_temp['year'] = yr
        fin_def.append(def_temp)
        
# DataFrame.append was removed in pandas 2.0; pd.concat stacks all frames at once
final_defense = pd.concat(fin_def, ignore_index=True)
    
final_defense.to_csv('overall_defense.csv')
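
The wrangling cells in this section all follow the same pattern: drop a column, tag each table with its year, and stack the results. A small helper could express that once. This is a sketch only; `stack_years` is a hypothetical name, and the two tiny tables are made up to stand in for the scraped ones.

```python
import pandas as pd


def stack_years(tables_by_year, drop_cols=()):
    """Concatenate every table of every year, tagging rows with the year."""
    frames = []
    for yr, tables in tables_by_year.items():
        for table in tables:
            frame = table.drop(columns=list(drop_cols)).copy()
            frame["year"] = yr
            frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Tiny made-up tables standing in for the scraped ones
demo = {
    "2012": [pd.DataFrame({"Team": ["A", "B"], "Gms": [16, 16], "Pts": [300, 250]})],
    "2013": [pd.DataFrame({"Team": ["A", "B"], "Gms": [16, 16], "Pts": [280, 310]})],
}
stacked = stack_years(demo, drop_cols=["Gms"])
print(stacked)
```

Working on a copy of each table also keeps the `year` column from being written back into the scraped originals.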

In [5]:
# Iterate through df and parse weeks and years, drop columns and export to csv
team_offense = [team_o_2012, team_o_2013, team_o_2014, team_o_2015, team_o_2016, team_o_2017, team_o_2018, team_o_2019]
year = ['2012', '2013', '2014', '2015', '2016', '2017', '2018','2019']
fin_off = []
for yr, df in zip(year, team_offense):
    for idx, weeks in enumerate(df):
        off_temp = weeks.drop(columns=['Gms'])
        off_temp['year'] = yr
        fin_off.append(off_temp)
        
# DataFrame.append was removed in pandas 2.0; pd.concat stacks all frames at once
final_offense = pd.concat(fin_off, ignore_index=True)
    
final_offense.to_csv('overall_offense.csv')

In [6]:
# Iterate through df and parse weeks and years, drop columns and export to csv
o_pass = [o_passing_2012, o_passing_2013, o_passing_2014, o_passing_2015, o_passing_2016, o_passing_2017, o_passing_2018, o_passing_2019]
year = ['2012', '2013', '2014', '2015', '2016', '2017', '2018','2019']
fin_op = []
for yr, df in zip(year, o_pass):
    for idx, weeks in enumerate(df):
        op_temp = weeks.drop(columns=['Gms'])
        op_temp['year'] = yr
        fin_op.append(op_temp)
        
# DataFrame.append was removed in pandas 2.0; pd.concat stacks all frames at once
final_op = pd.concat(fin_op, ignore_index=True)
    
final_op.to_csv('off_pass.csv')

In [7]:
# Iterate through df and parse weeks and years, drop columns and export to csv
o_rush = [o_rushing_2012, o_rushing_2013, o_rushing_2014, o_rushing_2015, o_rushing_2016, o_rushing_2017, o_rushing_2018, o_rushing_2019]
year = ['2012', '2013', '2014', '2015', '2016', '2017', '2018','2019']
fin_or = []
for yr, df in zip(year, o_rush):
    for idx, weeks in enumerate(df):
        or_temp = weeks.drop(columns=['Gms'])
        or_temp['year'] = yr
        fin_or.append(or_temp)
        
# DataFrame.append was removed in pandas 2.0; pd.concat stacks all frames at once
final_or = pd.concat(fin_or, ignore_index=True)
    
final_or.to_csv('off_rush.csv')

In [8]:
# Iterate through df and parse weeks and years, drop columns and export to csv
d_pass = [d_passing_2012, d_passing_2013, d_passing_2014, d_passing_2015, d_passing_2016, d_passing_2017, d_passing_2018, d_passing_2019]
year = ['2012', '2013', '2014', '2015', '2016', '2017', '2018','2019']
fin_dp = []
for yr, df in zip(year, d_pass):
    for idx, weeks in enumerate(df):
        dp_temp = weeks.drop(columns=['Gms'])
        dp_temp['year'] = yr
        fin_dp.append(dp_temp)
        
# DataFrame.append was removed in pandas 2.0; pd.concat stacks all frames at once
final_dp = pd.concat(fin_dp, ignore_index=True)
    
final_dp.to_csv('def_pass.csv')

In [9]:
# Iterate through df and parse weeks and years, drop columns and export to csv
d_rush = [d_rushing_2012, d_rushing_2013, d_rushing_2014, d_rushing_2015, d_rushing_2016, d_rushing_2017, d_rushing_2018, d_rushing_2019]
year = ['2012', '2013', '2014', '2015', '2016', '2017', '2018','2019']
fin_dr = []
for yr, df in zip(year, d_rush):
    for idx, weeks in enumerate(df):
        dr_temp = weeks.drop(columns=['Gms'])
        dr_temp['year'] = yr
        fin_dr.append(dr_temp)
        
# DataFrame.append was removed in pandas 2.0; pd.concat stacks all frames at once
final_dr = pd.concat(fin_dr, ignore_index=True)
    
final_dr.to_csv('def_rush.csv')

In [10]:
# Iterate through df and parse weeks and years, drop columns and export to csv
o_down = [o_downs_2012, o_downs_2013, o_downs_2014, o_downs_2015, o_downs_2016, o_downs_2017, o_downs_2018, o_downs_2019]
year = ['2012', '2013', '2014', '2015', '2016', '2017', '2018','2019']
fin_od = []
for yr, df in zip(year, o_down):
    for idx, weeks in enumerate(df):
        od_temp = weeks
        od_temp['year'] = yr
        fin_od.append(od_temp)
        
# DataFrame.append was removed in pandas 2.0; pd.concat stacks all frames at once
final_od = pd.concat(fin_od, ignore_index=True)
    
final_od.to_csv('off_downs.csv')

In [11]:
# Iterate through df and parse weeks and years, drop columns and export to csv
d_down = [d_downs_2012, d_downs_2013, d_downs_2014, d_downs_2015, d_downs_2016, d_downs_2017, d_downs_2018, d_downs_2019]
year = ['2012', '2013', '2014', '2015', '2016', '2017', '2018','2019']
fin_dd = []
for yr, df in zip(year, d_down):
    for idx, weeks in enumerate(df):
        dd_temp = weeks
        dd_temp['year'] = yr
        fin_dd.append(dd_temp)
        
# DataFrame.append was removed in pandas 2.0; pd.concat stacks all frames at once
final_dd = pd.concat(fin_dd, ignore_index=True)
    
final_dd.to_csv('def_downs.csv')

In [12]:
# Iterate through df and parse weeks and years, drop columns and export to csv
list_years =[season_2012, season_2013, season_2014, season_2015, season_2016, season_2017, season_2018, season_2019]
year = ['2012', '2013', '2014', '2015', '2016', '2017', '2018','2019']
cleaned_df = []
for yr, df in zip(year, list_years):
    for idx, weeks in enumerate(df):
        df_temp = weeks.drop(columns=['Box', 'Date']).rename(columns={'Unnamed: 2': 'Visitor_Score', 'Unnamed: 4':'Home_Score', 'Unnamed: 5':'OT'})
        df_temp['wk_year'] = 'week_'+str(idx+1)
        df_temp['year'] = yr
        cleaned_df.append(df_temp)

In [13]:
# Add Outcome column. Home team win or tie results in Won, Home team loss results in Lost.
# (DataFrame.append was removed in pandas 2.0, so the weekly frames are stacked with pd.concat.)
final_df = pd.concat(cleaned_df, ignore_index=True)

final_df['Outcome'] = final_df.apply(lambda row: 'Lost' if row.Visitor_Score > row.Home_Score else 'Won', axis=1)
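
The row-wise `apply` can also be written as a single vectorized comparison. This is a minimal sketch on three made-up games, using the same rule: the home team is marked `Lost` only when the visitor scores strictly more, so a tie counts as `Won`.

```python
import numpy as np
import pandas as pd

games = pd.DataFrame({
    "Visitor_Score": [24, 10, 20],
    "Home_Score": [17, 30, 20],
})

# Home team loses only when the visitor scores strictly more; ties count as 'Won'
games["Outcome"] = np.where(games["Visitor_Score"] > games["Home_Score"], "Lost", "Won")
print(games)
```

On a few thousand rows the difference in speed is small, but the vectorized form states the rule in one place.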

In [14]:
# There were only 8 overtime games, and they carry little weight for prediction, so we dropped the overtime column
final_df.drop(columns=['OT'], inplace=True)

In [15]:
# Saving entire dataset to csv
final_df.to_csv('myentiredata.csv')

---
* We used data wrangling methods in Excel Power Query to join and manipulate the exported spreadsheets.
---

In [16]:
path = ("~/Desktop/NFL-Prediction-Model/nfl_complete_csv.csv")
data = pd.read_csv(path)
data

Unnamed: 0,Team,3D_Pct,4D_Pct,3O_Pct,4O_Pct,OP_Pct,OP_Yds,OP_TD,OP_Int,OP_Sack,...,D_TotYds,D_Yds/G,year,Visitor,Visitor_Score,Home,Home_Score,wk_year,year.1,Outcome
0,Arizona CardinalsARI,32.9,44.4,25.2,41.7,55.4,3383.0,11.0,21.0,58.0,...,5405.0,337.8,2012.0,Dallas CowboysDAL,24,New York GiantsNYG,17,week_1,2012,Lost
1,Atlanta FalconsATL,40.5,46.2,45.1,25.0,68.6,4719.0,32.0,14.0,28.0,...,5849.0,365.6,2012.0,Washington RedskinsWAS,40,New Orleans SaintsNO,32,week_1,2012,Lost
2,Baltimore RavensBAL,35.8,50.0,36.9,42.9,59.6,3996.0,22.0,11.0,38.0,...,5615.0,350.9,2012.0,Los Angeles RamsLA,23,Detroit LionsDET,27,week_1,2012,Won
3,Buffalo BillsBUF,44.0,50.0,38.9,62.5,60.5,3430.0,24.0,17.0,30.0,...,5806.0,362.9,2012.0,New England PatriotsNE,34,Tennessee TitansTEN,13,week_1,2012,Lost
4,Carolina PanthersCAR,36.1,52.6,43.1,33.3,58.0,3927.0,19.0,12.0,36.0,...,5329.0,333.1,2012.0,Miami DolphinsMIA,10,Houston TexansHOU,30,week_1,2012,Won
5,Chicago BearsCHI,35.5,62.5,36.5,38.5,59.2,3298.0,21.0,16.0,44.0,...,5050.0,315.6,2012.0,Buffalo BillsBUF,28,New York JetsNYJ,48,week_1,2012,Won
6,Cincinnati BengalsCIN,36.1,41.7,34.1,68.8,62.0,3807.0,28.0,16.0,46.0,...,5115.0,319.7,2012.0,Atlanta FalconsATL,40,Kansas City ChiefsKC,24,week_1,2012,Lost
7,Cleveland BrownsCLE,38.1,63.6,30.7,43.8,58.0,3668.0,16.0,18.0,36.0,...,5821.0,363.8,2012.0,Jacksonville JaguarsJAX,23,Minnesota VikingsMIN,26,week_1,2012,Won
8,Dallas CowboysDAL,40.1,54.5,43.9,72.7,66.0,4992.0,29.0,19.0,36.0,...,5687.0,355.4,2012.0,Philadelphia EaglesPHI,17,Cleveland BrownsCLE,16,week_1,2012,Lost
9,Denver BroncosDEN,30.6,38.9,45.1,60.0,68.4,4671.0,37.0,11.0,21.0,...,4652.0,290.8,2012.0,Indianapolis ColtsIND,21,Chicago BearsCHI,41,week_1,2012,Won


In [17]:
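# Convert the year columns from integer to category
data['year'] = data['year'].astype('category')
data['year.1'] = data['year.1'].astype('category')

# One-hot encode the categorical columns for the machine learning models
# (iloc[:, :-1] leaves out the last column, Outcome)
data = pd.get_dummies(data.iloc[:,:-1])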

In [18]:
# Convert the year column from integer to category
final_df['year'] = final_df['year'].astype('category')
#test['year'] = test['year'].astype('category')

# One-hot encode the categorical columns for the machine learning models
# (iloc[:, :-1] leaves out the last column, Outcome)
final_df = pd.get_dummies(final_df.iloc[:,:-1])
#test_df = pd.get_dummies(x_test.iloc[:,:-1])
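
`pd.get_dummies` one-hot encodes the text and category columns and passes numeric columns through unchanged. It does not rescale anything. A small sketch on two made-up games shows the resulting column layout:

```python
import pandas as pd

games = pd.DataFrame({
    "Visitor": ["Dallas CowboysDAL", "Miami DolphinsMIA"],
    "Home": ["New York GiantsNYG", "Houston TexansHOU"],
    "Visitor_Score": [24, 10],
    "Home_Score": [17, 30],
})

# Numeric columns are kept as-is; each team name becomes its own indicator column
encoded = pd.get_dummies(games)
print(encoded.columns.tolist())
```

Encoding the full frame before splitting it, as the notebook does, keeps the training and testing halves on the same set of indicator columns.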

In [19]:
# Save training dataset to csv
final_df.iloc[:2125].to_csv('train.csv')

In [20]:
# Save testing dataset to csv
final_df.iloc[2125:].to_csv('test.csv')

In [21]:
# Display testing dataset
final_df.iloc[2125:] 

Unnamed: 0,Visitor_Score,Home_Score,Visitor_Arizona CardinalsARI,Visitor_Atlanta FalconsATL,Visitor_Baltimore RavensBAL,Visitor_Buffalo BillsBUF,Visitor_Carolina PanthersCAR,Visitor_Chicago BearsCHI,Visitor_Cincinnati BengalsCIN,Visitor_Cleveland BrownsCLE,...,wk_year_week_8,wk_year_week_9,year_2012,year_2013,year_2014,year_2015,year_2016,year_2017,year_2018,year_2019
2125,19,22,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2126,20,13,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2127,26,20,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2128,17,9,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2129,10,27,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2130,28,12,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2131,31,51,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2132,23,28,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2133,24,35,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2134,20,37,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1


In [22]:
# Read training and testing datasets
test = pd.read_csv("test.csv")
train = pd.read_csv("train.csv")

In [23]:
# Drop unnecessary columns
test.drop(columns=['Unnamed: 0'], inplace=True)
train.drop(columns=['Unnamed: 0'], inplace=True)

In [24]:
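# index=False keeps the row index from coming back as an 'Unnamed: 0' column on the next read
test.to_csv('test.csv', index=False)
train.to_csv('train.csv', index=False)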

### Model testing and training

In [25]:
y_train = train[['Visitor_Score', 'Home_Score']]
x_train = train.drop(['Visitor_Score', 'Home_Score'], axis=1)
y_test = test[['Visitor_Score', 'Home_Score']]
x_test = test.drop(['Visitor_Score', 'Home_Score'], axis=1)

In [26]:
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor, GradientBoostingRegressor

In [27]:
lr = LinearRegression()
rf = RandomForestRegressor(n_estimators=500, max_depth=7)
ab = AdaBoostRegressor()
gr = GradientBoostingRegressor()

In [28]:
lr.fit(x_train.values, y_train.values)
rf.fit(x_train.values, y_train.values)
#ab.fit(x_train.values, y_train.values)
#gr.fit(x_train.values, y_train.values)


RandomForestRegressor(max_depth=7, n_estimators=500)

In [29]:
predicted_train = lr.predict(x_train.values)
predicted_test = lr.predict(x_test.values)

predicted_train_rf = rf.predict(x_train.values)
predicted_test_rf = rf.predict(x_test.values)

#predicted_train_ab = ab.predict(x_train.values)
#predicted_test_ab = ab.predict(x_test.values)

#predicted_train_gr = gr.predict(x_train.values)
#predicted_test_gr = gr.predict(x_test.values)

In [30]:
from sklearn.metrics import r2_score


In [31]:
train_score = r2_score(y_train.values, predicted_train)
test_score = r2_score(y_test.values, predicted_test)

train_score_rf = r2_score(y_train.values, predicted_train_rf)
test_score_rf = r2_score(y_test.values, predicted_test_rf)

### Linear Regression

In [32]:
print(train_score)

0.11453735781631164


In [33]:
print(test_score)

-0.21829612988505215
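
A negative test score means the predictions fit the held-out games worse than always predicting the mean of the observed values. The sketch below computes R² from its definition for a single target with made-up numbers. For a two-column target like the one here, `r2_score` by default averages the per-column scores.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # squared error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # squared error of the mean
    return 1.0 - ss_res / ss_tot


actual = [10, 20, 30]
print(r2(actual, [20, 20, 20]))   # always predicting the mean
print(r2(actual, [30, 20, 10]))   # predictions that move the wrong way
```

The first call lands exactly on the "predict the mean" baseline of 0, and the second falls below it.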


In [34]:
pd.concat([pd.DataFrame(predicted_test, columns=['VS_Pred', 'HS_Pred']),pd.DataFrame(y_test)], axis=1)

Unnamed: 0,VS_Pred,HS_Pred,Visitor_Score,Home_Score
0,17.53125,16.75,19,22
1,16.875,26.59375,20,13
2,23.09375,23.1875,26,20
3,20.34375,15.375,17,9
4,22.96875,22.96875,10,27
5,19.71875,28.5,28,12
6,20.5,25.375,31,51
7,23.40625,23.515625,23,28
8,16.34375,27.125,24,35
9,22.84375,27.375,20,37


---
* Our first attempt at a machine learning model gave a win ratio of .45.
* As you can see, linear regression did worse than flipping a coin, so this model will not work for what we are trying to achieve.
* Even so, it provided useful insight and experience.
---
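
The notebook does not show how the win ratio was computed. One plausible reading, which is an assumption here, is the share of games where the predicted scores pick the same winner as the actual scores. The sketch applies that idea to the ten rows displayed in the output above, not to the full test set, so it is not expected to reproduce the reported figure.

```python
import pandas as pd

# The ten rows shown in the output above (not the full test set)
rows = pd.DataFrame({
    "VS_Pred": [17.53125, 16.875, 23.09375, 20.34375, 22.96875,
                19.71875, 20.5, 23.40625, 16.34375, 22.84375],
    "HS_Pred": [16.75, 26.59375, 23.1875, 15.375, 22.96875,
                28.5, 25.375, 23.515625, 27.125, 27.375],
    "Visitor_Score": [19, 20, 26, 17, 10, 28, 31, 23, 24, 20],
    "Home_Score": [22, 13, 20, 9, 27, 12, 51, 28, 35, 37],
})

# Same rule as the Outcome column: the home team "wins" unless the visitor scores more
pred_home_win = ~(rows["VS_Pred"] > rows["HS_Pred"])
true_home_win = ~(rows["Visitor_Score"] > rows["Home_Score"])

# Fraction of games where predicted and actual winners agree
win_ratio = (pred_home_win == true_home_win).mean()
print(win_ratio)
```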

### Random Forest Regressor

In [35]:
print(train_score_rf)

0.1194768477578308


In [36]:
print(test_score_rf)

-0.06388597984561994


In [37]:
pd.concat([pd.DataFrame(predicted_test_rf, columns=['VS_Pred', 'HS_Pred']),pd.DataFrame(y_test)], axis=1)

Unnamed: 0,VS_Pred,HS_Pred,Visitor_Score,Home_Score
0,21.585693,23.506678,19,22
1,17.043455,29.755151,20,13
2,23.013178,27.602998,26,20
3,22.081311,20.450043,17,9
4,21.616151,23.625729,10,27
5,20.861076,23.957659,28,12
6,21.587391,23.609806,31,51
7,21.562471,23.031915,23,28
8,21.229317,24.927048,24,35
9,21.604069,25.168299,20,37


---
* Our second attempt worked much better and gave us a win ratio of .73.
* This model predicts game outcomes much better than the first and beats flipping a coin.
---

### Updated data training set

In [42]:
data

Unnamed: 0,3D_Pct,4D_Pct,3O_Pct,4O_Pct,OP_Pct,OP_Yds,OP_TD,OP_Int,OP_Sack,OP_NetYds,...,wk_year_week_8,wk_year_week_9,year.1_2012,year.1_2013,year.1_2014,year.1_2015,year.1_2016,year.1_2017,year.1_2018,year.1_2019
0,32.9,44.4,25.2,41.7,55.4,3383.0,11.0,21.0,58.0,3005.0,...,0,0,1,0,0,0,0,0,0,0
1,40.5,46.2,45.1,25.0,68.6,4719.0,32.0,14.0,28.0,4509.0,...,0,0,1,0,0,0,0,0,0,0
2,35.8,50.0,36.9,42.9,59.6,3996.0,22.0,11.0,38.0,3739.0,...,0,0,1,0,0,0,0,0,0,0
3,44.0,50.0,38.9,62.5,60.5,3430.0,24.0,17.0,30.0,3269.0,...,0,0,1,0,0,0,0,0,0,0
4,36.1,52.6,43.1,33.3,58.0,3927.0,19.0,12.0,36.0,3683.0,...,0,0,1,0,0,0,0,0,0,0
5,35.5,62.5,36.5,38.5,59.2,3298.0,21.0,16.0,44.0,2999.0,...,0,0,1,0,0,0,0,0,0,0
6,36.1,41.7,34.1,68.8,62.0,3807.0,28.0,16.0,46.0,3578.0,...,0,0,1,0,0,0,0,0,0,0
7,38.1,63.6,30.7,43.8,58.0,3668.0,16.0,18.0,36.0,3435.0,...,0,0,1,0,0,0,0,0,0,0
8,40.1,54.5,43.9,72.7,66.0,4992.0,29.0,19.0,36.0,4729.0,...,0,0,1,0,0,0,0,0,0,0
9,30.6,38.9,45.1,60.0,68.4,4671.0,37.0,11.0,21.0,4534.0,...,0,0,1,0,0,0,0,0,0,0


In [46]:
data = data.fillna(0)

In [57]:
# Team-stat columns plus the two score columns
target_cols = ['3D_Pct','4D_Pct','3O_Pct','4O_Pct','OP_Pct','OP_Yds','OP_TD','OP_Int','OP_Sack','OP_NetYds','OP_Yds/G','DP_Pct','DP_Yds','DP_TD','DP_Int','DP_Sack','DP_NetYds',
'DP_Yds/G','OR_Yds','OR_Avg','OR_TD','OR_Yds/G','DR_Yds','DR_Avg','DR_TD','DR_Yds/G','O_Total_Pts','O_Pts/G','O_RushYds','O_RYds/G','O_PassYds','O_PYds/G',
'O_TotYds','O_Yds/G','D_Total_Pts','D_Pts/G','D_RushYds','D_RYds/G','D_PassYds','D_PYds/G','D_TotYds','D_Yds/G','Visitor_Score',
'Home_Score']
y_train = data[target_cols]
x_train = data.drop(target_cols, axis=1)
# `test` was built from the games-only frame, so it has none of the team-stat columns;
# the y_test line raises the KeyError shown in the output
y_test = test[target_cols]
x_test = test.drop(target_cols, axis=1)

KeyError: "['OR_TD', 'O_TotYds', 'DP_Int', '4O_Pct', 'OP_TD', 'O_Total_Pts', 'DP_Pct', 'DR_Yds', '3O_Pct', 'OR_Yds', 'DR_Yds/G', 'O_PYds/G', 'OP_Sack', 'OP_Pct', 'D_Yds/G', 'D_TotYds', 'D_RYds/G', 'DP_Sack', 'D_RushYds', 'OP_Yds/G', '4D_Pct', 'DP_TD', 'DP_Yds/G', 'O_RushYds', 'D_PYds/G', 'OP_Int', 'DP_Yds', 'DP_NetYds', 'O_Yds/G', 'OP_NetYds', 'DR_Avg', 'OR_Yds/G', 'OP_Yds', 'O_RYds/G', 'DR_TD', '3D_Pct', 'O_Pts/G', 'O_PassYds', 'D_Pts/G', 'OR_Avg', 'D_Total_Pts', 'D_PassYds'] not in index"
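
The error lists every requested column that the indexed frame does not have. Checking for missing columns before indexing gives the same information without a traceback. This is a minimal sketch; `missing_columns` is a hypothetical helper, and `test_like` is a made-up stand-in that holds only score columns.

```python
import pandas as pd


def missing_columns(frame, wanted):
    """Return the names in `wanted` that are not columns of `frame`."""
    return [col for col in wanted if col not in frame.columns]


# Made-up stand-in: only the score columns are present
test_like = pd.DataFrame({"Visitor_Score": [19], "Home_Score": [22]})
wanted = ["3D_Pct", "OP_Yds", "Visitor_Score", "Home_Score"]

print(missing_columns(test_like, wanted))
```

An empty result means the selection is safe. A non-empty one points at the frame that still needs the stats joined in.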

In [50]:
# Run models
lr.fit(x_train.values, y_train.values)
rf.fit(x_train.values, y_train.values)

RandomForestRegressor(max_depth=7, n_estimators=500)

In [51]:
predicted_train = lr.predict(x_train.values)
predicted_test = lr.predict(x_test.values)

predicted_train_rf = rf.predict(x_train.values)
predicted_test_rf = rf.predict(x_test.values)

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 135 is different from 97)
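
This error indicates that the number of feature columns passed to `predict` differs from the number the model was fitted on (135 versus 97). One common way to make the shapes agree is to reindex the test features onto the training columns. This is a sketch with made-up frames and demo names. Filling absent columns with zeros only fixes the shape and does not supply the missing information.

```python
import pandas as pd

# Made-up frames whose columns differ, as can happen when train and test
# are one-hot encoded from different sources
x_train_demo = pd.DataFrame({"year_2018": [1, 0], "year_2019": [0, 1], "OP_Yds": [3383.0, 4719.0]})
x_test_demo = pd.DataFrame({"year_2019": [1], "wk_year_week_1": [1]})

# Keep exactly the training columns, in training order; absent ones become 0
x_test_aligned = x_test_demo.reindex(columns=x_train_demo.columns, fill_value=0)

print(x_test_aligned)
```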

In [26]:
from sklearn.svm import SVC
classifier = SVC(kernel='linear')
classifier

SVC(kernel='linear')

In [27]:
# Use the train_test_split function to create training and testing subsets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    random_state=1, 
                                                    stratify=y)
X_train.shape

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
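
`stratify=y` needs every class in `y` to appear at least twice so that each class can be split between the two subsets. The definitions of `X` and `y` are not shown in this notebook, so the sketch below uses made-up labels to show how rare classes could be found before splitting. If `y` held a numeric score rather than a label, nearly every value would be its own class and stratifying would not apply.

```python
import pandas as pd

# Made-up labels: 'Tie' occurs only once
y_demo = pd.Series(["Won", "Lost", "Won", "Lost", "Tie"])

# Classes with fewer than two members cannot be stratified
counts = y_demo.value_counts()
too_rare = counts[counts < 2].index.tolist()
print(too_rare)

# One option: keep only rows whose class appears at least twice
keep = y_demo.map(counts) >= 2
y_ok = y_demo[keep]
```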

In [22]:
classifier.fit(x_train, y_train)

NameError: name 'x_train' is not defined