# NFL Big Data Bowl 2022 Notebook
### Author: Conrad Bituin

## Topic to Analyze

Quantify special teams strategy. Special teams’ coaches are among the most creative and innovative in the league. Compare/contrast how each team game plans. Which strategies yield the best results? What are other strategies that could be adopted?


### References
- [Official Competition Page](https://www.kaggle.com/c/nfl-big-data-bowl-2022/overview)
- [Official Explanation of Data](https://www.kaggle.com/c/nfl-big-data-bowl-2022/data)
- [Beginner Notebook via Kaggle](https://www.kaggle.com/werooring/nfl-big-data-bowl-basic-eda-for-beginner/notebook)
- [Previous Bowl Recaps](https://operations.nfl.com/gameday/analytics/big-data-bowl/past-big-data-bowl-recaps/)

## Background

As of this writing (Week 14 of the 2021 season), NFL kickers have collectively missed 69 extra point attempts (PATs) and 122 field goal attempts, [according to Pro Football Reference](https://www.pro-football-reference.com/). This notebook analyzes the situations that could affect PAT and field goal outcomes. The resulting model will attempt to identify the field conditions under which extra point and field goal attempts are most likely to succeed.

## Acquire Data

In [1]:
# Common imports

import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib import cm
import numpy as np
import pandas as pd

In [4]:
# File imports

# The games.csv contains the teams playing in each game. The key variable is gameId.
games_df = pd.read_csv('../data/games.csv')

# The PFFScoutingData.csv file contains play-level scouting information for each game. The key variables are gameId and playId.
pffscouting_df = pd.read_csv('../data/PFFScoutingData.csv')

# The players.csv file contains player-level information from players that participated in any of the tracking data files. The key variable is nflId.
players_df = pd.read_csv('../data/players.csv')

# The plays.csv file contains play-level information from each game. The key variables are gameId and playId.
plays_df = pd.read_csv('../data/plays.csv')

# The tracking[season].csv files contain player tracking data for each season. The key variables are gameId, playId, and nflId.
tracking_2018_df = pd.read_csv('../data/tracking2018.csv')
tracking_2019_df = pd.read_csv('../data/tracking2019.csv')
tracking_2020_df = pd.read_csv('../data/tracking2020.csv')
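
The comments above name the key variables for each file, which suggests how the tables could be joined. The following is a minimal sketch on toy stand-in frames rather than the real CSVs. All values are made up, and `scoutingNote` is a hypothetical column used only for illustration. The `validate` argument makes pandas raise an error if the assumed key relationship does not hold.

```python
import pandas as pd

# Toy stand-ins for games_df, plays_df, and pffscouting_df (all values are made up).
toy_games = pd.DataFrame({
    "gameId": [1, 2],
    "homeTeamAbbr": ["PHI", "BAL"],
    "visitorTeamAbbr": ["ATL", "BUF"],
})
toy_plays = pd.DataFrame({
    "gameId": [1, 1, 2],
    "playId": [10, 20, 10],
    "specialTeamsPlayType": ["Kickoff", "Field Goal", "Extra Point"],
})
toy_scouting = pd.DataFrame({
    "gameId": [1, 2],
    "playId": [20, 10],
    "scoutingNote": ["a", "b"],  # hypothetical column, for illustration only
})

# Many plays belong to one game, so plays -> games is many-to-one on gameId.
plays_games = toy_plays.merge(
    toy_games, on="gameId", how="left", validate="many_to_one"
)

# Assumed here: at most one scouting row per (gameId, playId) pair.
plays_full = plays_games.merge(
    toy_scouting, on=["gameId", "playId"], how="left", validate="one_to_one"
)
print(plays_full)
```

A left join keeps every play even when no scouting row exists for it, so missing scouting values show up as `NaN` rather than silently dropping plays.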

## Analyze and Describe Data

In [39]:
games_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 764 entries, 0 to 763
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   gameId           764 non-null    int64 
 1   season           764 non-null    int64 
 2   week             764 non-null    int64 
 3   gameDate         764 non-null    object
 4   gameTimeEastern  764 non-null    object
 5   homeTeamAbbr     764 non-null    object
 6   visitorTeamAbbr  764 non-null    object
dtypes: int64(3), object(4)
memory usage: 41.9+ KB


In [24]:
players_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2732 entries, 0 to 2731
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   nflId        2732 non-null   int64 
 1   height       2732 non-null   object
 2   weight       2732 non-null   int64 
 3   birthDate    2715 non-null   object
 4   collegeName  2724 non-null   object
 5   Position     2732 non-null   object
 6   displayName  2732 non-null   object
dtypes: int64(2), object(5)
memory usage: 149.5+ KB


In [45]:
# plays_df.info()
# plays_df.head(40)
plays_df.specialTeamsPlayType.unique() # Will need to target 'Field Goal', 'Extra Point'

array(['Kickoff', 'Punt', 'Field Goal', 'Extra Point'], dtype=object)
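
Since only 'Field Goal' and 'Extra Point' plays are of interest, one way to isolate them is a boolean mask with `isin`. The sketch below uses a toy frame in place of `plays_df`, and the `kick_types` and `kicks` names are illustrative rather than part of the original notebook.

```python
import pandas as pd

# Toy stand-in for plays_df (all values are made up).
toy_plays = pd.DataFrame({
    "gameId": [1, 1, 1, 2],
    "playId": [10, 20, 30, 10],
    "specialTeamsPlayType": ["Kickoff", "Punt", "Field Goal", "Extra Point"],
})

# Keep only the two play types targeted in this analysis.
kick_types = ["Field Goal", "Extra Point"]
is_kick = toy_plays["specialTeamsPlayType"].isin(kick_types)
kicks = toy_plays[is_kick].reset_index(drop=True)

print(kicks["specialTeamsPlayType"].value_counts())
```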

In [27]:
tracking_2018_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12777351 entries, 0 to 12777350
Data columns (total 18 columns):
 #   Column         Dtype  
---  ------         -----  
 0   time           object 
 1   x              float64
 2   y              float64
 3   s              float64
 4   a              float64
 5   dis            float64
 6   o              float64
 7   dir            float64
 8   event          object 
 9   nflId          float64
 10  displayName    object 
 11  jerseyNumber   float64
 12  position       object 
 13  team           object 
 14  frameId        int64  
 15  gameId         int64  
 16  playId         int64  
 17  playDirection  object 
dtypes: float64(9), int64(3), object(6)
memory usage: 1.7+ GB


In [28]:
tracking_2019_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12170933 entries, 0 to 12170932
Data columns (total 18 columns):
 #   Column         Dtype  
---  ------         -----  
 0   time           object 
 1   x              float64
 2   y              float64
 3   s              float64
 4   a              float64
 5   dis            float64
 6   o              float64
 7   dir            float64
 8   event          object 
 9   nflId          float64
 10  displayName    object 
 11  jerseyNumber   float64
 12  position       object 
 13  team           object 
 14  frameId        int64  
 15  gameId         int64  
 16  playId         int64  
 17  playDirection  object 
dtypes: float64(9), int64(3), object(6)
memory usage: 1.6+ GB


In [29]:
tracking_2020_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11821701 entries, 0 to 11821700
Data columns (total 18 columns):
 #   Column         Dtype  
---  ------         -----  
 0   time           object 
 1   x              float64
 2   y              float64
 3   s              float64
 4   a              float64
 5   dis            float64
 6   o              float64
 7   dir            float64
 8   event          object 
 9   nflId          float64
 10  displayName    object 
 11  jerseyNumber   float64
 12  position       object 
 13  team           object 
 14  frameId        int64  
 15  gameId         int64  
 16  playId         int64  
 17  playDirection  object 
dtypes: float64(9), int64(3), object(6)
memory usage: 1.6+ GB
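
Each tracking frame above reports roughly 1.6 to 1.7 GB, and the `+` means the object columns are not fully counted. One possible way to reduce the footprint is to store the float columns as `float32` and the repetitive string columns as `category`. This is a rough sketch on a small synthetic frame. The column subset and the `compact_dtypes` name are illustrative, and whether `float32` precision is acceptable for the tracking coordinates is an assumption that should be checked.

```python
import numpy as np
import pandas as pd

# Small synthetic stand-in for a tracking frame (all values are made up).
n = 1000
rng = np.random.default_rng(0)
toy_tracking = pd.DataFrame({
    "x": rng.uniform(0, 120, n),
    "y": rng.uniform(0, 53.3, n),
    "s": rng.uniform(0, 10, n),
    "team": rng.choice(["home", "away", "football"], n),
    "playDirection": rng.choice(["left", "right"], n),
})

# Illustrative dtype mapping: smaller floats, categoricals for repeated strings.
compact_dtypes = {
    "x": "float32",
    "y": "float32",
    "s": "float32",
    "team": "category",
    "playDirection": "category",
}
small = toy_tracking.astype(compact_dtypes)

# deep=True counts the actual string storage, not just the pointers.
before = toy_tracking.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(f"before: {before} bytes, after: {after} bytes")
```

The same mapping could also be passed as the `dtype` argument of `pd.read_csv`, so the larger representation is never materialized in the first place.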


## Wrangle Data

### Feature Engineering

### Response Engineering
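
Because the goal is to model whether a kick succeeds, the response could be a binary indicator. The sketch below assumes that `plays.csv` has a `specialTeamsResult` column in which successful kicks are labeled 'Kick Attempt Good'. Neither the column nor its labels appear in the output above, so both should be verified with `.unique()` before relying on them. The `kickGood` name is hypothetical.

```python
import pandas as pd

# Toy stand-in for the filtered kick plays.
# The result labels are assumptions, not confirmed by the output above.
toy_kicks = pd.DataFrame({
    "specialTeamsPlayType": ["Field Goal", "Field Goal", "Extra Point", "Extra Point"],
    "specialTeamsResult": [
        "Kick Attempt Good",
        "Kick Attempt No Good",
        "Kick Attempt Good",
        "Blocked Kick Attempt",
    ],
})

# Binary response: 1 if the kick was good, 0 for any other outcome.
toy_kicks["kickGood"] = (
    toy_kicks["specialTeamsResult"] == "Kick Attempt Good"
).astype(int)

# Success rate per play type, as a quick sanity check on the new column.
success_rate = toy_kicks.groupby("specialTeamsPlayType")["kickGood"].mean()
print(success_rate)
```

Treating every outcome other than a good kick as a failure lumps blocked kicks together with misses. Whether that is appropriate depends on the question being asked.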