# CS329E Data Analytics Project

**Team Members:** *Bryce Holladay, Joshua Mathew, Austin Rinn, Eddie Castillo*

Using the techniques that we have learned in class, we attempted to predict the result of a National Football League (NFL) play based on information available before the play begins, such as field position and time remaining in the game.

We used [publicly available play-by-play data from the years 2013 through 2019](http://nflsavant.com/about.php) to build our model. As inputs, our model takes time, down, yards to go, yard line, and offensive formation. The data includes several play-outcome labels that we tried to predict, including touchdowns, interceptions, sacks, first downs, yards, and penalties.

In order to fit the data into our model, we performed several actions to pre-process it, including reformatting time into a linear format and removing non-descriptive data like season year. The results of our model are shown below.

In [2]:
# Use this cell for any notes
# Rubric: https://utexas.instructure.com/courses/1275914/assignments/4897667
import pandas as pd

## Data Preprocessing
Data cleaning, data exploration, and feature engineering

In [19]:
#Read in data from csv
#For building purposes use one season to save processing time.
#For final runs we will switch to compiled data sheet with all seasons.
#Display initial data head

df19 = pd.read_csv('pbp-2019.csv')
df19.head()

Unnamed: 0,GameId,GameDate,Quarter,Minute,Second,OffenseTeam,DefenseTeam,Down,ToGo,YardLine,...,IsTwoPointConversion,IsTwoPointConversionSuccessful,RushDirection,YardLineFixed,YardLineDirection,IsPenaltyAccepted,PenaltyTeam,IsNoPlay,PenaltyType,PenaltyYards
0,2019100605,2019-10-06,1,2,25,OAK,CHI,1,10,50,...,0,0,,50,OPP,0,,0,,0
1,2019100605,2019-10-06,1,1,45,OAK,CHI,2,9,51,...,0,0,RIGHT GUARD,49,OPP,0,,0,,0
2,2019101400,2019-10-14,1,10,34,DET,GB,1,10,84,...,0,0,RIGHT TACKLE,16,OPP,0,,0,,0
3,2019101400,2019-10-14,1,9,55,DET,GB,2,9,85,...,0,0,,15,OPP,0,,0,,0
4,2019101400,2019-10-14,1,9,10,DET,GB,3,3,91,...,0,0,,9,OPP,0,,0,,0


In [20]:
#Convert time into a standard format
#Combine Quarter, Minute, and Second into a single value in seconds
df19['AbsoluteTime'] = (df19['Quarter']-1)*900 + df19['Minute']*60 + df19['Second']
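
The formula above adds the minutes and seconds shown on the game clock. If `Minute` and `Second` hold the time remaining in the quarter (an assumption, suggested by the kickoff rows recorded at 15:00), the value decreases within each quarter rather than increasing through the game. The following is a minimal sketch of a monotonically increasing alternative. The function name `add_elapsed_time` and the column name `ElapsedTime` are hypothetical, and 15-minute quarters are assumed, so overtime would need separate handling.

```python
import pandas as pd

QUARTER_SECONDS = 15 * 60  # assumed regulation quarter length; overtime differs


def add_elapsed_time(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with ElapsedTime = seconds played since the opening kickoff.

    Assumes Minute/Second are the clock time remaining in the quarter.
    """
    out = df.copy()
    remaining = out["Minute"] * 60 + out["Second"]
    out["ElapsedTime"] = (out["Quarter"] - 1) * QUARTER_SECONDS + (QUARTER_SECONDS - remaining)
    return out


# Small hand-made sample (not taken from the data set)
sample = pd.DataFrame({"Quarter": [1, 1, 3], "Minute": [15, 2, 15], "Second": [0, 25, 0]})
result = add_elapsed_time(sample)
print(result)
```

Working on a copy keeps the original frame unchanged, so both representations can be compared side by side before choosing one.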

In [21]:
#Convert GameDate into just month to represent time of year
#GameDate is formatted YYYY-MM-DD, so the month is the second dash-separated field
#Assign the whole column at once to avoid chained indexing (SettingWithCopyWarning)
df19['GameDate'] = df19['GameDate'].str.split('-').str[1]


In [22]:
#rename returns a new DataFrame, so assign it back to keep the change
df19 = df19.rename(columns={"GameDate": "GameMonth"})
df19

Unnamed: 0,GameId,GameMonth,Quarter,Minute,Second,OffenseTeam,DefenseTeam,Down,ToGo,YardLine,...,IsTwoPointConversionSuccessful,RushDirection,YardLineFixed,YardLineDirection,IsPenaltyAccepted,PenaltyTeam,IsNoPlay,PenaltyType,PenaltyYards,AbsoluteTime
0,2019100605,10,1,2,25,OAK,CHI,1,10,50,...,0,,50,OPP,0,,0,,0,145
1,2019100605,10,1,1,45,OAK,CHI,2,9,51,...,0,RIGHT GUARD,49,OPP,0,,0,,0,105
2,2019101400,10,1,10,34,DET,GB,1,10,84,...,0,RIGHT TACKLE,16,OPP,0,,0,,0,634
3,2019101400,10,1,9,55,DET,GB,2,9,85,...,0,,15,OPP,0,,0,,0,595
4,2019101400,10,1,9,10,DET,GB,3,3,91,...,0,,9,OPP,0,,0,,0,550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42181,2019090803,09,3,7,54,BAL,MIA,0,0,35,...,0,,35,OWN,0,,0,,0,2274
42182,2019090800,09,3,0,0,,LA,0,0,0,...,0,,0,OWN,0,,0,,0,1800
42183,2019090800,09,1,0,0,,LA,0,0,0,...,0,,0,OWN,0,,0,,0,0
42184,2019090500,09,3,15,0,GB,CHI,0,0,35,...,0,,35,OWN,0,,0,,0,2700


In [23]:
#Purge other data not needed
#Quarter, Minute, and Second are now captured by AbsoluteTime; GameId is an identifier, not a feature
#drop returns a new DataFrame, so assign it back to keep the change
df19 = df19.drop(['Quarter', 'Minute', 'Second', 'GameId'], axis=1)
df19


Unnamed: 0,GameMonth,OffenseTeam,DefenseTeam,Down,ToGo,YardLine,Unnamed: 10,SeriesFirstDown,Unnamed: 12,NextScore,...,IsTwoPointConversionSuccessful,RushDirection,YardLineFixed,YardLineDirection,IsPenaltyAccepted,PenaltyTeam,IsNoPlay,PenaltyType,PenaltyYards,AbsoluteTime
0,10,OAK,CHI,1,10,50,,0,,0,...,0,,50,OPP,0,,0,,0,145
1,10,OAK,CHI,2,9,51,,0,,0,...,0,RIGHT GUARD,49,OPP,0,,0,,0,105
2,10,DET,GB,1,10,84,,0,,0,...,0,RIGHT TACKLE,16,OPP,0,,0,,0,634
3,10,DET,GB,2,9,85,,0,,0,...,0,,15,OPP,0,,0,,0,595
4,10,DET,GB,3,3,91,,1,,0,...,0,,9,OPP,0,,0,,0,550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42181,09,BAL,MIA,0,0,35,,1,,0,...,0,,35,OWN,0,,0,,0,2274
42182,09,,LA,0,0,0,,1,,0,...,0,,0,OWN,0,,0,,0,1800
42183,09,,LA,0,0,0,,1,,0,...,0,,0,OWN,0,,0,,0,0
42184,09,GB,CHI,0,0,35,,1,,0,...,0,,35,OWN,0,,0,,0,2700


In [22]:
#Separate labels from features
#Labels will most likely need to be combined into one column, encoded as nothing=0, touchdown=1, interception=2, etc.
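
One possible way to build the single label column described above is sketched below. It assumes the outcomes are stored as 0/1 indicator columns; the names `IsTouchdown`, `IsInterception`, and `IsSack` are assumptions, as are the helper `make_labels` and the output name `PlayResult`. When more than one flag is set on a play, the first column in the list wins, which is a design choice that would need to be checked against the real data.

```python
import numpy as np
import pandas as pd


def make_labels(df: pd.DataFrame, flag_columns: list) -> pd.Series:
    """Collapse 0/1 indicator columns into one integer label.

    0 means no flag is set; the i-th column in flag_columns maps to code i + 1.
    """
    conditions = [(df[col] == 1).to_numpy() for col in flag_columns]
    codes = list(range(1, len(flag_columns) + 1))
    # np.select picks the code of the first condition that is True for each row
    values = np.select(conditions, codes, default=0)
    return pd.Series(values, index=df.index, name="PlayResult")


# Hand-made sample rows (not taken from the data set)
plays = pd.DataFrame({
    "IsTouchdown":    [0, 1, 0, 0],
    "IsInterception": [0, 0, 1, 0],
    "IsSack":         [0, 0, 0, 1],
})
labels = make_labels(plays, ["IsTouchdown", "IsInterception", "IsSack"])
features = plays.drop(columns=["IsTouchdown", "IsInterception", "IsSack"])
print(labels.tolist())
```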

In [23]:
#Confirm and display final data

## Data Analysis

#### Decision Trees

In [24]:
#Perform Decision Trees (Assign 1)
#Report results, including accuracy scores and appropriate visuals
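
As a rough illustration of this step, the sketch below fits a scikit-learn decision tree and reports held-out accuracy. It uses synthetic stand-in features and a toy label rule rather than the project data, so the numbers say nothing about real plays. The feature order and the rule "label is 1 when yards to go is 3 or fewer" are assumptions made only for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-ins for Down, ToGo, YardLine, AbsoluteTime (not real data)
X = np.column_stack([
    rng.integers(1, 5, n),     # Down: 1-4
    rng.integers(1, 21, n),    # ToGo: 1-20
    rng.integers(1, 100, n),   # YardLine: 1-99
    rng.integers(0, 3600, n),  # AbsoluteTime: seconds
])
y = (X[:, 1] <= 3).astype(int)  # toy label rule, assumed for illustration

# Hold out 25% of the rows, keeping the class balance similar in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
accuracy = accuracy_score(y_test, tree.predict(X_test))
print(f"Held-out accuracy: {accuracy:.3f}")
```

Limiting `max_depth` keeps the tree small enough to plot and read, at the cost of some flexibility.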

#### KNN

In [25]:
#Perform KNN (Assign 2)
#Report results, including accuracy scores and appropriate visuals

#### Naive-Bayes

In [26]:
#Perform Naive-Bayes (Assign 2)
#Report results, including accuracy scores and appropriate visuals

#### SVM

In [27]:
#Perform SVM (Assign 3)
#Report results, including accuracy scores and appropriate visuals

#### Neural Net

In [1]:
%%time
#Perform Neural Net (Assign 3)
#Report results, including accuracy scores and appropriate visuals


#### Ensembles

In [29]:
#Perform Ensembles (Assign 3)
#Report results, including accuracy scores and appropriate visuals

## Model Analysis

In [30]:
#Compare accuracy scores and other metrics for our different models.
#How confident are we in the success rates of these various models?
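
One way to compare the models on equal footing is k-fold cross-validation, which gives a mean accuracy and a spread for each model instead of a single score. The sketch below is a minimal example on synthetic data from `make_classification`, not the project data. The choice of three models, five folds, and default hyperparameters is an assumption made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data set (not the NFL data)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
}

# Mean and standard deviation of accuracy across 5 folds for each model
summary = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    summary[name] = (fold_scores.mean(), fold_scores.std())

for name, (mean, std) in summary.items():
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```

A large standard deviation across folds suggests that a model's score depends heavily on which plays it was trained on, which is one signal for how much confidence a single accuracy number deserves.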

In [31]:
#Discuss which model was the best.

In [32]:
#Discuss data. What issues may have existed in the data? What assumptions did we make? What could have made our data better?

In [33]:
#Discuss our project as a whole. How could we have improved the project? How might this model be used in real-world applications?