# CS329E Data Analytics Project

**Team Members:** *Bryce Holladay, Joshua Mathew, Austin Rinn, Eddie Castillo*

Using the techniques we learned in class, we attempted to predict the result of a National Football League (NFL) play from conditions that exist before the play begins, such as field position and time remaining in the game.

We used [publicly available play-by-play data from the 2013 through 2019 seasons](http://nflsavant.com/about.php) to build our model. As inputs, the model takes time, down, yards to go, yard line, and offensive formation. The data includes several play outcomes that we tried to predict, including touchdowns, interceptions, sacks, first downs, yards gained, and penalties.

In order to fit the data into our model, we performed several actions to pre-process it, including reformatting time into a linear format and removing non-descriptive data like season year. The results of our model are shown below.

In [147]:
# Use this cell for any notes
# Rubric: https://utexas.instructure.com/courses/1275914/assignments/4897667
import pandas as pd, numpy as np

## Data Preprocessing
Data cleaning, data exploration, and feature engineering

In [148]:
#Read in data from csv
#For building purposes use one season to save processing time.
#For final runs we will switch to compiled data sheet with all seasons.
#Display initial data head

df19 = pd.read_csv('pbp-2019.csv')
df19.head()

Unnamed: 0,GameId,GameDate,Quarter,Minute,Second,OffenseTeam,DefenseTeam,Down,ToGo,YardLine,...,IsTwoPointConversion,IsTwoPointConversionSuccessful,RushDirection,YardLineFixed,YardLineDirection,IsPenaltyAccepted,PenaltyTeam,IsNoPlay,PenaltyType,PenaltyYards
0,2019100605,2019-10-06,1,2,25,OAK,CHI,1,10,50,...,0,0,,50,OPP,0,,0,,0
1,2019100605,2019-10-06,1,1,45,OAK,CHI,2,9,51,...,0,0,RIGHT GUARD,49,OPP,0,,0,,0
2,2019101400,2019-10-14,1,10,34,DET,GB,1,10,84,...,0,0,RIGHT TACKLE,16,OPP,0,,0,,0
3,2019101400,2019-10-14,1,9,55,DET,GB,2,9,85,...,0,0,,15,OPP,0,,0,,0
4,2019101400,2019-10-14,1,9,10,DET,GB,3,3,91,...,0,0,,9,OPP,0,,0,,0


In [150]:
#Convert Quarter, Minute, and Second into a single value in seconds
#Each quarter is 15 minutes (900 seconds)
df19['AbsoluteTime'] = (df19['Quarter']-1)*900 + df19['Minute']*60 + df19['Second']
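
The formula above adds the clock reading to the quarter offset. If `Minute` and `Second` hold the time remaining in the quarter (the kickoff rows at 15:00 in the sample output suggest this, but it is an assumption about the dataset), the value counts down within each quarter instead of up. A minimal sketch of an elapsed-seconds alternative follows; the helper `elapsed_seconds`, the column `ElapsedTime`, and the sample rows are hypothetical and not part of the notebook:

```python
import pandas as pd

QUARTER_SECONDS = 900  # 15-minute regulation quarters

def elapsed_seconds(quarter, minute, second):
    """Seconds elapsed since kickoff, assuming minute/second give the
    time remaining in the current quarter (an assumption about the data)."""
    remaining = minute * 60 + second
    return (quarter - 1) * QUARTER_SECONDS + (QUARTER_SECONDS - remaining)

# Small made-up sample, not rows from pbp-2019.csv
sample = pd.DataFrame({
    "Quarter": [1, 1, 3],
    "Minute": [15, 2, 15],
    "Second": [0, 25, 0],
})
sample["ElapsedTime"] = elapsed_seconds(
    sample["Quarter"], sample["Minute"], sample["Second"]
)
print(sample)
```

Overtime periods are not 15 minutes long, so quarter 5 would need its own period length under this approach.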

In [151]:
#Convert GameDate into just month to represent time of year
#GameDate is formatted as YYYY-MM-DD; extract the month as a two-character string
#A vectorized extract avoids the row-by-row chained assignment that triggers SettingWithCopyWarning
df19['GameDate'] = df19['GameDate'].str.extract(r'-(\d{2})-', expand=False)

In [152]:
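#rename returns a new DataFrame; without assignment it is only displayed here,
#so df19 keeps the GameDate column (as the later outputs show)
df19.rename(columns={"GameDate": "GameMonth"})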

Unnamed: 0,GameId,GameMonth,Quarter,Minute,Second,OffenseTeam,DefenseTeam,Down,ToGo,YardLine,...,IsTwoPointConversionSuccessful,RushDirection,YardLineFixed,YardLineDirection,IsPenaltyAccepted,PenaltyTeam,IsNoPlay,PenaltyType,PenaltyYards,AbsoluteTime
0,2019100605,10,1,2,25,OAK,CHI,1,10,50,...,0,,50,OPP,0,,0,,0,145
1,2019100605,10,1,1,45,OAK,CHI,2,9,51,...,0,RIGHT GUARD,49,OPP,0,,0,,0,105
2,2019101400,10,1,10,34,DET,GB,1,10,84,...,0,RIGHT TACKLE,16,OPP,0,,0,,0,634
3,2019101400,10,1,9,55,DET,GB,2,9,85,...,0,,15,OPP,0,,0,,0,595
4,2019101400,10,1,9,10,DET,GB,3,3,91,...,0,,9,OPP,0,,0,,0,550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42181,2019090803,09,3,7,54,BAL,MIA,0,0,35,...,0,,35,OWN,0,,0,,0,2274
42182,2019090800,09,3,0,0,,LA,0,0,0,...,0,,0,OWN,0,,0,,0,1800
42183,2019090800,09,1,0,0,,LA,0,0,0,...,0,,0,OWN,0,,0,,0,0
42184,2019090500,09,3,15,0,GB,CHI,0,0,35,...,0,,35,OWN,0,,0,,0,2700


##### Drop Data that has no effect or could mislead models

In [153]:
#Purge other data not needed
# No longer need Quarter, Minute, Seconds
# GameID has no effect on the play
# SeriesFirstDown has no description
# NextScore is 0 for every row. Has no effect.
df_19 = df19.drop(['Quarter', 'Minute', 'Second', 'GameId', 'Unnamed: 10', 'Unnamed: 12', 'Unnamed: 16', 'Unnamed: 17', 'SeriesFirstDown', 'NextScore', 'TeamWin', 'Description', 'OffenseTeam', 'DefenseTeam', 'SeasonYear'], axis=1)


##### Drop Data that can only be known after a play. Including this data would be "cheating"

In [124]:
# Yards is information known after the play
df_19 = df_19.drop(['Yards', 'IsIncomplete', 'IsSack', 'IsChallenge', 'IsChallengeReversed', 'Challenger', 'IsMeasurement', 'IsInterception', 'IsFumble', 'IsPenalty', 'IsTwoPointConversion', 'IsTwoPointConversionSuccessful', 'IsPenaltyAccepted', 'PenaltyTeam', 'PenaltyType', 'PenaltyYards', 'YardLineFixed'], axis=1)
df_19.head()

Unnamed: 0,GameDate,Down,ToGo,YardLine,Formation,PlayType,IsRush,IsPass,IsIncomplete,IsTouchdown,PassType,RushDirection,YardLineDirection,IsNoPlay,AbsoluteTime
0,10,1,10,50,NO HUDDLE,PASS,0,1,0,0,SHORT LEFT,,OPP,0,145
1,10,2,9,51,UNDER CENTER,RUSH,1,0,0,0,,RIGHT GUARD,OPP,0,105
2,10,1,10,84,UNDER CENTER,RUSH,1,0,0,0,,RIGHT TACKLE,OPP,0,634
3,10,2,9,85,SHOTGUN,PASS,0,1,0,0,SHORT MIDDLE,,OPP,0,595
4,10,3,3,91,SHOTGUN,PASS,0,1,0,0,SHORT MIDDLE,,OPP,0,550


##### Combine RushDirection and PassType into one descriptive column for the play. The source columns can be dropped afterward

In [154]:
# Combine RushDirection and PassType to get one column with play type
# PlayType is no longer needed: it carries the same information in less detail
df_19['RushDirection'] = df_19['RushDirection'].fillna('')
df_19['PassType'] = df_19['PassType'].fillna('')
df_19['PlayType2'] = df_19['RushDirection'] + df_19['PassType']
df_19 = df_19.drop('PlayType', axis=1)


In [155]:
# The combined column is already named PlayType2, so only the source columns need dropping
df_19 = df_19.drop(['PassType', 'RushDirection', 'YardLineDirection'], axis=1)
df_19.head(50)

Unnamed: 0,GameDate,Down,ToGo,YardLine,Yards,Formation,IsRush,IsPass,IsIncomplete,IsTouchdown,...,IsTwoPointConversion,IsTwoPointConversionSuccessful,YardLineFixed,IsPenaltyAccepted,PenaltyTeam,IsNoPlay,PenaltyType,PenaltyYards,AbsoluteTime,PlayType2
0,10,1,10,50,1,NO HUDDLE,0,1,0,0,...,0,0,50,0,,0,,0,145,SHORT LEFT
1,10,2,9,51,3,UNDER CENTER,1,0,0,0,...,0,0,49,0,,0,,0,105,RIGHT GUARD
2,10,1,10,84,1,UNDER CENTER,1,0,0,0,...,0,0,16,0,,0,,0,634,RIGHT TACKLE
3,10,2,9,85,6,SHOTGUN,0,1,0,0,...,0,0,15,0,,0,,0,595,SHORT MIDDLE
4,10,3,3,91,6,SHOTGUN,0,1,0,0,...,0,0,9,0,,0,,0,550,SHORT MIDDLE
5,11,1,10,56,22,SHOTGUN,0,1,0,0,...,0,0,44,0,,0,,0,447,SHORT MIDDLE
6,11,1,10,78,17,SHOTGUN,0,1,0,0,...,0,0,22,0,,0,,0,404,DEEP LEFT
7,11,2,1,70,0,SHOTGUN,0,0,0,0,...,0,0,30,0,,0,,0,2049,
8,11,1,10,73,0,SHOTGUN,0,0,0,0,...,0,0,27,1,BAL,1,FALSE START,5,2008,
9,11,2,10,51,4,SHOTGUN,1,0,0,0,...,0,0,49,0,,0,,0,736,LEFT END


In [156]:
# Count plays with no rush direction or pass type recorded
empty_play_types = (df_19['PlayType2'] == '').sum()
print(empty_play_types)
df_19.describe()

13334


Unnamed: 0,Down,ToGo,YardLine,Yards,IsRush,IsPass,IsIncomplete,IsTouchdown,IsSack,IsChallenge,...,IsInterception,IsFumble,IsPenalty,IsTwoPointConversion,IsTwoPointConversionSuccessful,YardLineFixed,IsPenaltyAccepted,IsNoPlay,PenaltyYards,AbsoluteTime
count,42186.0,42186.0,42186.0,42186.0,42186.0,42186.0,42186.0,42186.0,42186.0,42186.0,...,42186.0,42186.0,42186.0,42186.0,42186.0,42186.0,42186.0,42186.0,42186.0,42186.0
mean,1.666003,7.302209,45.411321,4.197767,0.286778,0.415351,0.147395,0.031503,0.03001,0.004931,...,0.010074,0.014009,0.088394,0.002702,0.001399,26.886408,0.076163,0.058195,0.628265,1818.090433
std,1.170637,4.98911,26.773955,8.289672,0.452262,0.492788,0.354503,0.174676,0.170617,0.070045,...,0.099866,0.117531,0.283871,0.051914,0.037372,14.270611,0.265261,0.234114,2.633319,1036.803255
min,0.0,0.0,0.0,-23.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,1.0,3.0,25.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,16.0,0.0,0.0,0.0,932.0
50%,1.0,9.0,40.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,29.0,0.0,0.0,0.0,1800.0
75%,2.0,10.0,66.0,6.0,1.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,38.0,0.0,0.0,0.0,2761.0
max,4.0,40.0,99.0,100.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,50.0,1.0,1.0,53.0,4200.0


##### Drop rows where it is not a rush/pass play

In [157]:
# Get names of indexes for which plays are not rush or pass
indexNames = df_19[(df_19['IsRush'] == 0) & (df_19['IsPass'] == 0)].index
 
# Delete these row indexes from dataFrame
df_19.drop(indexNames , inplace=True)
df_19.describe()

Unnamed: 0,Down,ToGo,YardLine,Yards,IsRush,IsPass,IsIncomplete,IsTouchdown,IsSack,IsChallenge,...,IsInterception,IsFumble,IsPenalty,IsTwoPointConversion,IsTwoPointConversionSuccessful,YardLineFixed,IsPenaltyAccepted,IsNoPlay,PenaltyYards,AbsoluteTime
count,29620.0,29620.0,29620.0,29620.0,29620.0,29620.0,29620.0,29620.0,29620.0,29620.0,...,29620.0,29620.0,29620.0,29620.0,29620.0,29620.0,29620.0,29620.0,29620.0,29620.0
mean,1.784571,8.660331,48.803916,6.265496,0.40844,0.59156,0.209926,0.043214,0.0,0.00655,...,0.014281,0.009014,0.078123,0.0,0.0,29.19865,0.064281,0.049865,0.625186,1826.877819
std,0.819517,4.022085,24.38435,9.01832,0.491554,0.491554,0.407262,0.203342,0.0,0.080666,...,0.118648,0.094516,0.268369,0.0,0.0,12.779531,0.245257,0.21767,2.822876,1040.954517
min,0.0,0.0,0.0,-12.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,1.0,6.0,28.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,20.0,0.0,0.0,0.0,939.0
50%,2.0,10.0,45.0,4.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,30.0,0.0,0.0,0.0,1808.0
75%,2.0,10.0,68.0,9.0,1.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,40.0,0.0,0.0,0.0,2774.0
max,4.0,40.0,99.0,100.0,1.0,1.0,1.0,1.0,0.0,1.0,...,1.0,1.0,1.0,0.0,0.0,50.0,1.0,1.0,53.0,4200.0


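##### This took care of most of the empty play types: 12,566 of the 13,334 rows with an empty PlayType2 were removed. The remaining 768 rows are about 2.6% of the 29,620 rush/pass plays, so we drop them as well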

In [158]:
# Get the indexes of plays whose play type is not specified
indexNames = df_19[df_19['PlayType2'] == ''].index
 
# Delete these row indexes from dataFrame
df_19.drop(indexNames , inplace=True)

In [179]:
# Confirm that no plays with an empty play type remain
empty_play_types = (df_19['PlayType2'] == '').sum()
print(empty_play_types)
df_19.describe()

0


Unnamed: 0,Down,ToGo,YardLine,Yards,IsRush,IsPass,IsIncomplete,IsTouchdown,IsSack,IsChallenge,...,IsPenalty,IsTwoPointConversion,IsTwoPointConversionSuccessful,YardLineFixed,IsPenaltyAccepted,IsNoPlay,PenaltyYards,AbsoluteTime,Formation_Code,PlayType_Code
count,28852.0,28852.0,28852.0,28852.0,28852.0,28852.0,28852.0,28852.0,28852.0,28852.0,...,28852.0,28852.0,28852.0,28852.0,28852.0,28852.0,28852.0,28852.0,28852.0,28852.0
mean,1.77797,8.654963,48.778178,6.240157,0.392867,0.607133,0.215514,0.043255,0.0,0.006689,...,0.076182,0.0,0.0,29.184875,0.062353,0.049459,0.612956,1826.642209,3.164009,10.396333
std,0.817211,4.025163,24.397549,9.093324,0.488396,0.488396,0.411185,0.203435,0.0,0.081516,...,0.265294,0.0,0.0,12.78471,0.241799,0.216829,2.812296,1041.035169,0.855022,5.493827
min,0.0,0.0,0.0,-12.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,1.0,6.0,28.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,20.0,0.0,0.0,0.0,939.0,3.0,5.0
50%,2.0,10.0,45.0,4.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,30.0,0.0,0.0,0.0,1805.5,3.0,13.0
75%,2.0,10.0,68.0,9.0,1.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,40.0,0.0,0.0,0.0,2774.0,4.0,15.0
max,4.0,40.0,99.0,100.0,1.0,1.0,1.0,1.0,0.0,1.0,...,1.0,0.0,0.0,50.0,1.0,1.0,53.0,4200.0,5.0,17.0


##### Label Encode the categorical data

In [159]:
#Label Encode
from sklearn.preprocessing import LabelEncoder
# Use a separate encoder per column so each mapping can be recovered later
formation_encoder = LabelEncoder()
playtype_encoder = LabelEncoder()
# Assign numerical values and store them in new columns
df_19['Formation_Code'] = formation_encoder.fit_transform(df_19['Formation'])
df_19['PlayType_Code'] = playtype_encoder.fit_transform(df_19['PlayType2'])
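
Label encoding replaces each category with an integer assigned in sorted order of the category names. To keep the codes interpretable, the mapping can be read back from the encoder's `classes_` attribute. A minimal sketch on made-up formation values (illustrative only, not the full set of categories in the dataset):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative values only; the real Formation column has more categories
formations = pd.Series(["SHOTGUN", "UNDER CENTER", "NO HUDDLE", "SHOTGUN"])

encoder = LabelEncoder()
codes = encoder.fit_transform(formations)

# classes_ is sorted, and each class's position is its integer code
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform recovers the original labels from the codes
restored = encoder.inverse_transform(codes)
print(list(restored))
```

Because the integers imply an order that formations do not have, distance-based models such as KNN may be misled by them; one-hot encoding is a common alternative.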


In [160]:
df_19_encoded = df_19.drop(['Formation', 'PlayType2'], axis=1)

In [161]:
df_19_encoded

Unnamed: 0,GameDate,Down,ToGo,YardLine,Yards,IsRush,IsPass,IsIncomplete,IsTouchdown,IsSack,...,IsTwoPointConversionSuccessful,YardLineFixed,IsPenaltyAccepted,PenaltyTeam,IsNoPlay,PenaltyType,PenaltyYards,AbsoluteTime,Formation_Code,PlayType_Code
0,10,1,10,50,1,0,1,0,0,0,...,0,50,0,,0,,0,145,0,14
1,10,2,9,51,3,1,0,0,0,0,...,0,49,0,,0,,0,105,4,12
2,10,1,10,84,1,1,0,0,0,0,...,0,16,0,,0,,0,634,4,13
3,10,2,9,85,6,0,1,0,0,0,...,0,15,0,,0,,0,595,3,15
4,10,3,3,91,6,0,1,0,0,0,...,0,9,0,,0,,0,550,3,15
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42149,09,2,5,42,-1,1,0,0,0,0,...,0,42,0,,0,,0,1727,4,6
42152,09,1,10,22,0,1,0,0,0,0,...,0,22,0,,0,,0,3544,4,5
42154,09,1,15,40,3,1,0,0,0,0,...,0,40,0,,0,,0,2495,4,11
42159,09,1,10,79,0,0,1,1,0,0,...,0,21,0,,0,,0,1854,4,4


##### Drop Data that can only be known after a play. Including this data would be "cheating"

### Create dataset for predicting touchdowns

In [162]:
# To predict a touchdown, we must drop data that cannot be known prior to the play
df19_isTD = df_19_encoded.drop(['Yards', 'IsIncomplete', 'IsSack', 'IsChallenge', 'IsChallengeReversed', 'Challenger', 'IsMeasurement', 'IsInterception', 'IsFumble', 'IsPenalty', 'IsTwoPointConversion', 'IsTwoPointConversionSuccessful', 'IsPenaltyAccepted', 'PenaltyTeam', 'PenaltyType', 'PenaltyYards', 'YardLineFixed'], axis=1)
df19_isTD.head()

Unnamed: 0,GameDate,Down,ToGo,YardLine,IsRush,IsPass,IsTouchdown,IsNoPlay,AbsoluteTime,Formation_Code,PlayType_Code
0,10,1,10,50,0,1,0,0,145,0,14
1,10,2,9,51,1,0,0,0,105,4,12
2,10,1,10,84,1,0,0,0,634,4,13
3,10,2,9,85,0,1,0,0,595,3,15
4,10,3,3,91,0,1,0,0,550,3,15


### Create dataset for predicting sacks

In [164]:
# To predict a sack, we must drop data that cannot be known prior to the play
df19_isSack = df_19_encoded.drop(['Yards', 'IsIncomplete', 'IsTouchdown', 'IsChallenge', 'IsChallengeReversed', 'Challenger', 'IsMeasurement', 'IsInterception', 'IsFumble', 'IsPenalty', 'IsTwoPointConversion', 'IsTwoPointConversionSuccessful', 'IsPenaltyAccepted', 'PenaltyTeam', 'PenaltyType', 'PenaltyYards', 'YardLineFixed'], axis=1)
df19_isSack.head()

Unnamed: 0,GameDate,Down,ToGo,YardLine,IsRush,IsPass,IsSack,IsNoPlay,AbsoluteTime,Formation_Code,PlayType_Code
0,10,1,10,50,0,1,0,0,145,0,14
1,10,2,9,51,1,0,0,0,105,4,12
2,10,1,10,84,1,0,0,0,634,4,13
3,10,2,9,85,0,1,0,0,595,3,15
4,10,3,3,91,0,1,0,0,550,3,15


### Create dataset for predicting a fumble

In [168]:
# To predict a fumble, we must drop data that cannot be known prior to the play
df19_isFum = df_19_encoded.drop(['Yards', 'IsIncomplete', 'IsTouchdown', 'IsChallenge', 'IsChallengeReversed', 'Challenger', 'IsMeasurement', 'IsInterception', 'IsSack', 'IsPenalty', 'IsTwoPointConversion', 'IsTwoPointConversionSuccessful', 'IsPenaltyAccepted', 'PenaltyTeam', 'PenaltyType', 'PenaltyYards', 'YardLineFixed'], axis=1)
df19_isFum.head()

Unnamed: 0,GameDate,Down,ToGo,YardLine,IsRush,IsPass,IsFumble,IsNoPlay,AbsoluteTime,Formation_Code,PlayType_Code
0,10,1,10,50,0,1,0,0,145,0,14
1,10,2,9,51,1,0,0,0,105,4,12
2,10,1,10,84,1,0,0,0,634,4,13
3,10,2,9,85,0,1,0,0,595,3,15
4,10,3,3,91,0,1,0,0,550,3,15


### Create dataset for predicting an incomplete pass

In [213]:
# To predict an incomplete pass, we must drop data that cannot be known prior to the play
df19_isIC = df_19_encoded.drop(['Yards', 'IsFumble', 'IsTouchdown', 'IsChallenge', 'IsChallengeReversed', 'Challenger', 'IsMeasurement', 'IsInterception', 'IsSack', 'IsPenalty', 'IsTwoPointConversion', 'IsTwoPointConversionSuccessful', 'IsPenaltyAccepted', 'PenaltyTeam', 'PenaltyType', 'PenaltyYards', 'YardLineFixed'], axis=1)
df19_isIC.head()

Unnamed: 0,GameDate,Down,ToGo,YardLine,IsRush,IsPass,IsIncomplete,IsNoPlay,AbsoluteTime,Formation_Code,PlayType_Code
0,10,1,10,50,0,1,0,0,145,0,14
1,10,2,9,51,1,0,0,0,105,4,12
2,10,1,10,84,1,0,0,0,634,4,13
3,10,2,9,85,0,1,0,0,595,3,15
4,10,3,3,91,0,1,0,0,550,3,15


##### An incomplete pass can only occur on a passing play, so drop all rows where IsRush = 1

In [217]:
rows = df19_isIC['IsRush'] == 1
indexNames = df19_isIC[rows].index
df19_isIC = df19_isIC.drop(indexNames)

### Create dataset for predicting an interception


In [222]:
# To predict an interception, we must drop data that cannot be known prior to the play
df19_isINT = df_19_encoded.drop(['Yards', 'IsFumble', 'IsTouchdown', 'IsChallenge', 'IsChallengeReversed', 'Challenger', 'IsMeasurement', 'IsIncomplete', 'IsSack', 'IsPenalty', 'IsTwoPointConversion', 'IsTwoPointConversionSuccessful', 'IsPenaltyAccepted', 'PenaltyTeam', 'PenaltyType', 'PenaltyYards', 'YardLineFixed'], axis=1)
df19_isINT.head()

Unnamed: 0,GameDate,Down,ToGo,YardLine,IsRush,IsPass,IsInterception,IsNoPlay,AbsoluteTime,Formation_Code,PlayType_Code
0,10,1,10,50,0,1,0,0,145,0,14
1,10,2,9,51,1,0,0,0,105,4,12
2,10,1,10,84,1,0,0,0,634,4,13
3,10,2,9,85,0,1,0,0,595,3,15
4,10,3,3,91,0,1,0,0,550,3,15


### Create dataset for predicting Yards gained

In [221]:
# To predict yards gained, we must drop the other outcome columns that cannot be known prior to the play
df19_yardsGain = df_19_encoded.drop(['IsIncomplete', 'IsFumble', 'IsTouchdown', 'IsChallenge', 'IsChallengeReversed', 'Challenger', 'IsMeasurement', 'IsInterception', 'IsSack', 'IsPenalty', 'IsTwoPointConversion', 'IsTwoPointConversionSuccessful', 'IsPenaltyAccepted', 'PenaltyTeam', 'PenaltyType', 'PenaltyYards', 'YardLineFixed'], axis=1)
df19_yardsGain.head()

Unnamed: 0,GameDate,Down,ToGo,YardLine,Yards,IsRush,IsPass,IsNoPlay,AbsoluteTime,Formation_Code,PlayType_Code
0,10,1,10,50,1,0,1,0,145,0,14
1,10,2,9,51,3,1,0,0,105,4,12
2,10,1,10,84,1,1,0,0,634,4,13
3,10,2,9,85,6,0,1,0,595,3,15
4,10,3,3,91,6,0,1,0,550,3,15
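
The per-outcome datasets above all keep the same pre-play columns plus one outcome column. As a sketch of how that repetition could be factored out, the hypothetical helper `make_target_frame` below selects the shared features and a single target. The feature list is taken from the table headers above, and the demo frame is made up:

```python
import pandas as pd

# Pre-play feature columns, copied from the headers of the tables above
FEATURES = ["GameDate", "Down", "ToGo", "YardLine", "IsRush", "IsPass",
            "IsNoPlay", "AbsoluteTime", "Formation_Code", "PlayType_Code"]

def make_target_frame(encoded, target):
    """Return a copy holding the pre-play features plus one outcome column."""
    return encoded[FEATURES + [target]].copy()

# Tiny made-up frame standing in for df_19_encoded
demo = pd.DataFrame({
    "GameDate": ["10", "10"], "Down": [1, 2], "ToGo": [10, 9],
    "YardLine": [50, 51], "IsRush": [0, 1], "IsPass": [1, 0],
    "IsNoPlay": [0, 0], "AbsoluteTime": [145, 105],
    "Formation_Code": [0, 4], "PlayType_Code": [14, 12],
    "IsTouchdown": [0, 0], "IsSack": [0, 0], "Yards": [1, 3],
})

td = make_target_frame(demo, "IsTouchdown")
sack = make_target_frame(demo, "IsSack")
print(list(td.columns))
```

Selecting the columns to keep, instead of listing the columns to drop, also guards against a new post-play column slipping into a dataset unnoticed.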


In [22]:
#Separate labels from features
#Labels will most likely need to be converted into one column, cast as nothing=0, touchdown=1, interception=2, etc.
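
Separating the label from the features could look like the following sketch. It uses a made-up stand-in for `df19_isTD` with a rare positive class, and `stratify=y` is a choice made here (not something the notebook specifies) so that both splits keep the same positive rate:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in for df19_isTD: two features and a rare binary label
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Down": rng.integers(1, 5, size=200),
    "ToGo": rng.integers(1, 11, size=200),
    "IsTouchdown": [1] * 20 + [0] * 180,
})

X = demo.drop(columns="IsTouchdown")   # features
y = demo["IsTouchdown"]                # label

# stratify=y keeps the share of positives close to equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
print(len(X_train), len(X_test))
print(y_train.mean(), y_test.mean())
```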

#### df19_isTD - Use to predict if they will score a touchdown

#### df19_isSack - Use to predict if there will be a sack

#### df19_isFum - Use to predict if there will be a fumble

#### df19_isIC - Use to predict if there will be an incomplete pass

#### df19_isINT - Use to predict if there will be an interception

In [None]:
df19_yardsGain

## Data Analysis

#### Decision Trees

In [24]:
#Perform Decision Trees (Assign 1)
#Report results, including accuracy scores and appropriate visuals
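
As a rough sketch of what this step might look like, the block below fits a shallow decision tree on synthetic data and reports two scores. The data, the depth limit, and `class_weight="balanced"` are all assumptions made for illustration; none of the numbers it prints are results from this project:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: the label depends on the first feature only
rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=(1000, 3))
y = (X[:, 0] >= 90).astype(int)      # roughly 10% positives

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" reweights the rare positive class during training
tree = DecisionTreeClassifier(max_depth=3, class_weight="balanced", random_state=0)
tree.fit(X_train, y_train)
pred = tree.predict(X_test)

acc = accuracy_score(y_test, pred)
bal_acc = balanced_accuracy_score(y_test, pred)
print("accuracy:", acc)
print("balanced accuracy:", bal_acc)
```

Balanced accuracy averages the recall of each class, so it is less flattering than plain accuracy when one outcome is rare.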

#### KNN

In [25]:
#Perform KNN (Assign 2)
#Report results, including accuracy scores and appropriate visuals
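
KNN compares plays by distance, so features on large scales can dominate. In the `describe()` output above, `AbsoluteTime` ranges up to 4200 while `Down` ranges up to 4. A minimal sketch of scaling before KNN, on synthetic data with a similar scale mismatch (the data and `n_neighbors=5` are assumptions, not project settings):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two synthetic features on very different scales
down = rng.integers(1, 5, size=400)      # values 1-4
clock = rng.integers(0, 4200, size=400)  # values 0-4199
X = np.column_stack([down, clock])
y = (down == 4).astype(int)              # label depends only on the small-scale feature

# StandardScaler puts both features on a comparable scale before KNN measures distance
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X[:300], y[:300])
score = knn.score(X[300:], y[300:])
print("scaled KNN accuracy:", score)
```

Fitting the scaler inside the pipeline means it only sees the training rows, which avoids leaking information from the held-out rows.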

#### Naive-Bayes

In [26]:
#Perform Naive-Bayes (Assign 2)
#Report results, including accuracy scores and appropriate visuals

#### SVM

In [27]:
#Perform SVM (Assign 3)
#Report results, including accuracy scores and appropriate visuals

#### Neural Net

In [28]:
#Perform Neural Net (Assign 3)
#Report results, including accuracy scores and appropriate visuals

#### Ensembles

In [29]:
#Perform Ensembles (Assign 3)
#Report results, including accuracy scores and appropriate visuals

## Model Analysis

In [30]:
#Compare accuracy scores and other metrics for our different models.
#How confident are we in the success rates of these various models?
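
When comparing accuracy scores, it helps to know what a trivial model would score. The `describe()` output above reports an `IsTouchdown` mean of about 0.043, so always predicting "no touchdown" would already be right roughly 96% of the time. A small sketch of that baseline, using a hypothetical helper `majority_baseline_accuracy` and made-up labels:

```python
import numpy as np

def majority_baseline_accuracy(y):
    """Accuracy of always predicting the most common class in y."""
    y = np.asarray(y)
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / y.size

# Made-up labels with a 4% positive rate, close to the IsTouchdown mean above
labels = np.array([1] * 4 + [0] * 96)
baseline = majority_baseline_accuracy(labels)
print(baseline)
```

A model is only informative for a rare outcome if it beats this baseline, or if it does better on metrics such as precision and recall for the positive class.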

In [31]:
#Discuss which model was the best.

In [32]:
#Discuss data. What issues may have existed in the data?  What assumptions did we make? What could have made our data better?

In [33]:
#Discuss our project as a whole. How could we have improved project? How might this model be used in real world applications?