# NFL Big Data Bowl 2022 Notebook
### Author: Conrad Bituin

## Topic to Analyze

Quantify special teams strategy. Special teams’ coaches are among the most creative and innovative in the league. Compare/contrast how each team game plans. Which strategies yield the best results? What are other strategies that could be adopted?


### References
- [Official Competition Page](https://www.kaggle.com/c/nfl-big-data-bowl-2022/overview)
- [Official Explanation of Data](https://www.kaggle.com/c/nfl-big-data-bowl-2022/data)
- [Beginner Notebook via Kaggle](https://www.kaggle.com/werooring/nfl-big-data-bowl-basic-eda-for-beginner/notebook)
- [Previous Bowl Recaps](https://operations.nfl.com/gameday/analytics/big-data-bowl/past-big-data-bowl-recaps/)

## Background

As of the writing of this notebook (Week 14, 2021 Season), NFL kickers have collectively missed 69 extra point attempts (PATs) and 122 field goal attempts in the 2021 season, [according to Pro Football Reference](https://www.pro-football-reference.com/). This notebook analyzes the situations that could affect PAT and field goal outcomes. The model it produces aims to identify the conditions under which an extra point or field goal attempt is most likely to succeed.

## Acquire Data

In [1]:
# Common imports

import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib import cm
import numpy as np
import pandas as pd

In [None]:
# File imports

# The games.csv contains the teams playing in each game. The key variable is gameId.
games_df = pd.read_csv('../data/games.csv')

# The PFFScoutingData.csv file contains play-level scouting information for each game. The key variables are gameId and playId.
pffscouting_df = pd.read_csv('../data/PFFScoutingData.csv')

# The players.csv file contains player-level information from players that participated in any of the tracking data files. The key variable is nflId.
players_df = pd.read_csv('../data/players.csv')

# The plays.csv file contains play-level information from each game. The key variables are gameId and playId.
plays_df = pd.read_csv('../data/plays.csv')

# The tracking[season].csv files contain player tracking data from season [season]. The key variables are gameId, playId, and nflId.
tracking_2018_df = pd.read_csv('../data/tracking2018.csv')
tracking_2019_df = pd.read_csv('../data/tracking2019.csv')
tracking_2020_df = pd.read_csv('../data/tracking2020.csv')
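
The tracking files hold one row per tracked player per frame, so they are much larger than the other inputs. If memory becomes a constraint, one option is to load only the columns this notebook uses later. The sketch below is an assumption about how that could look: the helper name `load_tracking`, the `extra` column, and the in-memory sample rows are made up for illustration, while the kept column names are the ones referenced elsewhere in this notebook.

```python
import io
import pandas as pd

# Columns referenced later in this notebook; extend the list if more are needed.
TRACKING_COLS = ['gameId', 'playId', 'nflId', 'frameId', 'position']

def load_tracking(source):
    """Read a tracking file, keeping only the columns in TRACKING_COLS (hypothetical helper)."""
    return pd.read_csv(source, usecols=TRACKING_COLS)

# Tiny stand-in for a tracking file (made-up rows, not real data).
sample_csv = io.StringIO(
    "gameId,playId,nflId,frameId,position,extra\n"
    "1,10,100,1,K,0.5\n"
    "1,10,100,2,K,0.6\n"
    "1,10,200,1,LS,0.1\n"
)
sample_tracking = load_tracking(sample_csv)
print(sample_tracking.columns.tolist())
```

With the real files, the same helper would be called with a path such as `'../data/tracking2018.csv'`.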

## Analyze and Describe Data

### Summary 

Based on the data provided for consideration, there are a total of:
- 764 games
- 2732 players
- 4435 plays
  - 1986 plays were Field Goal attempts
  - 2345 plays were Extra Point attempts 


### Overview of Games, Plays, and Players

In [None]:
print('Dataframe: Games')
games_df.head()

In [None]:
print('Dataframe: Players')
players_df.head()

In [None]:
print('Dataframe: Plays')
plays_df.head()

### Creating Test Sets

In [None]:
from sklearn.model_selection import train_test_split

fg_df = plays_df[plays_df.specialTeamsPlayType == 'Field Goal']
pat_df = plays_df[plays_df.specialTeamsPlayType == 'Extra Point']

fg_train_set, fg_test_set = train_test_split(fg_df, test_size=0.2, random_state=42)
fg_train = fg_train_set.copy()
fg_test = fg_test_set.copy()

pat_train_set, pat_test_set = train_test_split(pat_df, test_size=0.2, random_state=42)
pat_train = pat_train_set.copy()
pat_test = pat_test_set.copy()
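
Made kicks greatly outnumber misses, so a purely random split can leave the training and test sets with different miss rates. One option, not used in the split above, is to stratify on a binary good/not-good flag. The sketch below uses made-up rows; the `toy_` names and the `'Kick Attempt No Good'` label are illustrative assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for fg_df: 100 kicks, 80 good and 20 not (made-up data).
toy_fg = pd.DataFrame({
    'playId': range(100),
    'specialTeamsResult': ['Kick Attempt Good'] * 80 + ['Kick Attempt No Good'] * 20,
})

# Stratify on a binary flag rather than the raw result column, because
# stratification fails when any category has fewer than two rows.
is_good = toy_fg['specialTeamsResult'] == 'Kick Attempt Good'

toy_train, toy_test = train_test_split(
    toy_fg, test_size=0.2, random_state=42, stratify=is_good
)

train_rate = (toy_train['specialTeamsResult'] == 'Kick Attempt Good').mean()
test_rate = (toy_test['specialTeamsResult'] == 'Kick Attempt Good').mean()
print(train_rate, test_rate)
```

Both splits should end up with roughly the same share of good kicks as the full toy table.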

In [None]:
# Preparing Data for Visualization
%matplotlib inline

# playId is only unique within a game, so plays are counted with size() (row counts)
# rather than nunique(), which would merge plays that share a playId across games.
# The count column keeps the name 'playId' so the plotting cells below work unchanged.

# Field Goal Plays
fg_df_by_quarter = fg_train.groupby('quarter').size().reset_index(name='playId')
fg_df_by_down = fg_train.groupby('down').size().reset_index(name='playId')

fg_success_by_kicker = fg_train[fg_train.specialTeamsResult == 'Kick Attempt Good'].groupby('kickerId').size().reset_index(name='playId')
fg_total_by_kicker = fg_train.groupby('kickerId').size().reset_index(name='playId')

fg_success_by_team = fg_train[fg_train.specialTeamsResult == 'Kick Attempt Good'].groupby('possessionTeam').size().reset_index(name='playId')
fg_total_by_team = fg_train.groupby('possessionTeam').size().reset_index(name='playId')

# PAT Plays
pat_df_by_quarter = pat_train.groupby('quarter').size().reset_index(name='playId')
pat_df_by_down = pat_train.groupby('down').size().reset_index(name='playId')

pat_success_by_kicker = pat_train[pat_train.specialTeamsResult == 'Kick Attempt Good'].groupby('kickerId').size().reset_index(name='playId')
pat_total_by_kicker = pat_train.groupby('kickerId').size().reset_index(name='playId')

pat_success_by_team = pat_train[pat_train.specialTeamsResult == 'Kick Attempt Good'].groupby('possessionTeam').size().reset_index(name='playId')
pat_total_by_team = pat_train.groupby('possessionTeam').size().reset_index(name='playId')
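
The separate "successful" and "total" tables above can also be folded into a single success-rate table per kicker. Because `playId` repeats across games, the sketch counts rows rather than distinct play IDs. It runs on made-up rows; the names `toy_kicks`, `attempts`, `made`, and `success_rate`, and the `'Kick Attempt No Good'` label, are illustrative assumptions.

```python
import pandas as pd

# Made-up kicks; note that playId 10 appears in three different games.
toy_kicks = pd.DataFrame({
    'gameId':   [1, 1, 2, 2, 3],
    'playId':   [10, 20, 10, 30, 10],
    'kickerId': [7.0, 7.0, 7.0, 9.0, 9.0],
    'specialTeamsResult': [
        'Kick Attempt Good', 'Kick Attempt No Good', 'Kick Attempt Good',
        'Kick Attempt Good', 'Kick Attempt Good',
    ],
})

# Flag each kick as made or not, then aggregate attempts and makes per kicker.
made_flag = toy_kicks['specialTeamsResult'].eq('Kick Attempt Good')
kicker_summary = (
    toy_kicks.assign(made=made_flag)
    .groupby('kickerId')
    .agg(attempts=('made', 'size'), made=('made', 'sum'))
)
kicker_summary['success_rate'] = kicker_summary['made'] / kicker_summary['attempts']
print(kicker_summary)
```

The same pattern applies to `possessionTeam` by swapping the grouping column.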


#### Visualizations of Field Goal and Extra Point Attempts

##### Field Goals

In [None]:
plt.figure(figsize=(15,10))

# Field Goals per Quarter
plt.subplot(1,2,1)
plt.bar(fg_df_by_quarter.quarter, fg_df_by_quarter.playId)
plt.title('Number of Field Goals per Quarter (2018-2020 Seasons)')
plt.xlabel('Quarter')
plt.ylabel('Number of Field Goals Attempted')
plt.xlim(0, 6)


# Field Goals per Down
plt.subplot(1,2,2)
plt.bar(fg_df_by_down.down, fg_df_by_down.playId)
plt.title('Number of Field Goals per Down (2018-2020 Seasons)')
plt.xlabel('Down')
plt.ylabel('Number of Field Goals Attempted')
plt.xlim(0, 5)
plt.show()


In [None]:
plt.figure(figsize=(15, 5))

plt.plot(fg_success_by_kicker.kickerId, fg_success_by_kicker.playId, label='Number of Successful Field Goals')
plt.plot(fg_total_by_kicker.kickerId, fg_total_by_kicker.playId, label='Number of Attempted Field Goals')
plt.xlabel('Kicker ID')
plt.ylabel('Number of Field Goals')

plt.legend()
plt.suptitle('Field Goals by Kicker ID, 2018-2020 Seasons')
plt.show()

In [None]:
plt.figure(figsize=(15, 5))

plt.plot(fg_success_by_team.possessionTeam, fg_success_by_team.playId, label='Number of Successful Field Goals')
plt.plot(fg_total_by_team.possessionTeam, fg_total_by_team.playId, label='Number of Attempted Field Goals')
plt.xlabel('Team')
plt.ylabel('Number of Field Goals')

plt.legend()
plt.suptitle('Field Goals by Team, 2018-2020 Seasons')
plt.show()

##### PATs

In [None]:
# PATs per Quarter
plt.figure(figsize=(15,10))

plt.bar(pat_df_by_quarter.quarter, pat_df_by_quarter.playId)
plt.title('Number of PATs per Quarter (2018-2020 Seasons)')
plt.xlabel('Quarter')
plt.ylabel('Number of PATs Attempted')
plt.xlim(0, 6)

plt.show()

In [None]:
plt.figure(figsize=(15, 5))

plt.plot(pat_success_by_kicker.kickerId, pat_success_by_kicker.playId, label='Number of Successful PATs')
plt.plot(pat_total_by_kicker.kickerId, pat_total_by_kicker.playId, label='Number of Attempted PATs')
plt.xlabel('Kicker ID')
plt.ylabel('Number of PATs')

plt.legend()
plt.suptitle('PATs by Kicker ID, 2018-2020 Seasons')
plt.show()

In [None]:
plt.figure(figsize=(15, 5))

plt.plot(pat_success_by_team.possessionTeam, pat_success_by_team.playId, label='Number of Successful PATs')
plt.plot(pat_total_by_team.possessionTeam, pat_total_by_team.playId, label='Number of Attempted PATs')
plt.xlabel('Team')
plt.ylabel('Number of PATs')

plt.legend()
plt.suptitle('PATs by Team, 2018-2020 Seasons')
plt.show()

### Overview of Tracking Data for NFL Seasons 2018-2020

In [None]:
print('Dataframe: 2018 Season')
tracking_2018_df.head()

In [None]:
print('Dataframe: 2019 Season')
tracking_2019_df.head()

In [None]:
print('Dataframe: 2020 Season')
tracking_2020_df.head()

## Wrangle Data

In [None]:
# fg_df has 2657 records

# The predicted responses should follow the categories found in `specialTeamsResult`. 
# Ultimately, this notebook will attempt to predict scenarios where `specialTeamsResult` will be `Kick Attempt Good`.

fg_train.specialTeamsResult.unique()

In [None]:
# Combining season tracking data for holistic view of potential features that impact the success of the FG/PAT

season_tracking_df = pd.concat([tracking_2018_df, tracking_2019_df, tracking_2020_df])

In [None]:
# Reducing the season tracking data to the kicker's initial position (first frame of each play)
kicker_initial_position = season_tracking_df[(season_tracking_df.frameId == 1) &
                                             (season_tracking_df.position == 'K')]

# playId is only unique within a game, so tracking rows are matched to plays on gameId and playId together.
# Filtering on gameId alone would keep kicker rows from every play in a game that had a kick.

# Initial kicker position for Field Goal plays
fg_initial_position = kicker_initial_position.merge(fg_df[['gameId', 'playId']], on=['gameId', 'playId'])

# Initial kicker position for PAT plays
pat_initial_position = kicker_initial_position.merge(pat_df[['gameId', 'playId']], on=['gameId', 'playId'])
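
Since `playId` is only unique within a game, tracking rows need to be paired with kick plays on `gameId` and `playId` together. The sketch below uses made-up rows to show how a `gameId`-only filter and a two-key merge differ; all `toy_` names are illustrative.

```python
import pandas as pd

# Toy plays: game 1 has a field goal on play 10 only (made-up data).
toy_fg_plays = pd.DataFrame({'gameId': [1], 'playId': [10]})

# Toy tracking rows: the kicker is tracked on plays 10 and 20 of game 1.
toy_tracking = pd.DataFrame({
    'gameId':   [1, 1, 1],
    'playId':   [10, 20, 10],
    'frameId':  [1, 1, 2],
    'position': ['K', 'K', 'K'],
})

first_frame_kickers = toy_tracking[
    (toy_tracking.frameId == 1) & (toy_tracking.position == 'K')
]

# Filtering on gameId alone also keeps the kicker's row from play 20.
by_game_only = first_frame_kickers[first_frame_kickers.gameId.isin(toy_fg_plays.gameId)]

# An inner merge on both keys keeps only the field goal play.
by_game_and_play = first_frame_kickers.merge(
    toy_fg_plays, on=['gameId', 'playId'], how='inner'
)

print(len(by_game_only), len(by_game_and_play))
```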

In [None]:
fg_num_feats = ['yardsToGo', 'kickLength', 'preSnapHomeScore', 'preSnapVisitorScore', 'playResult', 'absoluteYardlineNumber']

fg_cat_feats = ['quarter', 'down', 'kickerId']

In [None]:
# pat_df has 3488 records

# The predicted responses should follow the categories found in `specialTeamsResult`. 
# Ultimately, this notebook will attempt to predict scenarios where `specialTeamsResult` will be `Kick Attempt Good`.

pat_train.specialTeamsResult.unique()

### Feature Engineering

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer

num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy = 'mean')),
    ('std_scaler', StandardScaler())
])

full_pipeline = ColumnTransformer([
    ('num', num_pipeline, fg_num_feats),
])

In [None]:
# Select the numeric and categorical feature columns defined above
fg_prepared = fg_train[fg_num_feats + fg_cat_feats]
fg_features = full_pipeline.fit_transform(fg_prepared)
fg_features
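
`fg_cat_feats` is defined and `OneHotEncoder` is imported above, but `full_pipeline` only transforms the numeric columns. One possible way to bring the categorical columns in is sketched below on made-up rows. The names `cat_pipeline` and `toy_pipeline`, the `most_frequent` imputation, and `handle_unknown='ignore'` are assumptions, not choices made in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

toy_num_feats = ['kickLength']
toy_cat_feats = ['quarter', 'kickerId']

# Made-up training rows with one missing kick length.
toy_train = pd.DataFrame({
    'kickLength': [25.0, 40.0, np.nan, 52.0],
    'quarter':    [1, 2, 2, 4],
    'kickerId':   [7.0, 9.0, 7.0, 9.0],
})

num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('std_scaler', StandardScaler()),
])

# Categorical columns: fill gaps with the most common value, then one-hot encode.
# handle_unknown='ignore' encodes unseen categories (e.g. a new kicker) as all zeros.
cat_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('one_hot', OneHotEncoder(handle_unknown='ignore')),
])

toy_pipeline = ColumnTransformer([
    ('num', num_pipeline, toy_num_feats),
    ('cat', cat_pipeline, toy_cat_feats),
])

toy_features = toy_pipeline.fit_transform(toy_train)
print(toy_features.shape)
```

The output has one scaled numeric column plus one indicator column per distinct quarter and kicker in the toy data.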

### Response Engineering

In [None]:
# Field goals: one-hot encode specialTeamsResult and keep the 'Kick Attempt Good' indicator as the response
fg_labels = pd.get_dummies(fg_train, columns=['specialTeamsResult']).rename(columns={'specialTeamsResult_Kick Attempt Good': 'fg_attempt_good'})
fg_response = fg_labels.fg_attempt_good

# PATs: same encoding, keeping the 'Kick Attempt Good' indicator as the response
pat_labels = pd.get_dummies(pat_train, columns=['specialTeamsResult']).rename(columns={'specialTeamsResult_Kick Attempt Good': 'pat_attempt_good'})
pat_response = pat_labels.pat_attempt_good

## Model Data

In [None]:
from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression(solver='lbfgs', random_state=42)
log_reg.fit(fg_features, fg_response)

In [None]:
log_reg.coef_

In [None]:
log_reg.intercept_
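
The raw `coef_` array follows the column order passed to the transformer, so labeling it with feature names makes it easier to read. Because the inputs are standardized, each coefficient is the change in log-odds per one standard deviation of that feature. The sketch below fits a small model on synthetic data; the data-generating rule and all `toy_` names are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
toy_feats = ['kickLength', 'yardsToGo']

# Synthetic data: longer kicks are less likely to be good; yardsToGo is pure noise.
toy_X = pd.DataFrame({
    'kickLength': rng.uniform(20, 60, size=500),
    'yardsToGo': rng.integers(1, 15, size=500).astype(float),
})
p_good = 1 / (1 + np.exp((toy_X['kickLength'] - 50) / 5))
toy_y = (rng.uniform(size=500) < p_good).astype(int)

toy_scaled = StandardScaler().fit_transform(toy_X)
toy_model = LogisticRegression(solver='lbfgs', random_state=42).fit(toy_scaled, toy_y)

# Label each coefficient and convert log-odds to odds ratios.
coef_table = pd.Series(toy_model.coef_[0], index=toy_feats, name='log_odds_per_sd')
odds_ratios = np.exp(coef_table).rename('odds_ratio_per_sd')
print(pd.concat([coef_table, odds_ratios], axis=1))
```

An odds ratio below 1 means the feature lowers the odds of a good kick as it increases.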

In [None]:
fg_prepared_preds = fg_test[['yardsToGo', 'kickLength', 'preSnapHomeScore', 'preSnapVisitorScore', 'playResult', 'absoluteYardlineNumber', 'quarter', 'down', 'kickerId']]
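# Reuse the imputer and scaler fitted on the training set; refitting on the test set would leak test statistics
fg_predict = full_pipeline.transform(fg_prepared_preds)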

print('Predictions:', log_reg.predict(fg_predict))

In [None]:
fg_labels = pd.get_dummies(fg_test, columns=['specialTeamsResult']).rename(columns={'specialTeamsResult_Kick Attempt Good': 'fg_attempt_good'})
fg_labels = fg_labels.fg_attempt_good
print('Labels:', list(fg_labels))

In [None]:
from sklearn.metrics import accuracy_score

print("accuracy score: ", accuracy_score(fg_labels, log_reg.predict(fg_predict)))
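
Since made kicks greatly outnumber misses, accuracy alone can look high even for a model that always predicts "good". A majority-class baseline and a confusion matrix give useful context. The sketch below uses made-up labels with an assumed 85/15 split; the `toy_` and `baseline_` names are illustrative.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, confusion_matrix

# Made-up labels: 1 = kick good, 0 = not good (85 good, 15 not).
toy_labels = np.array([1] * 85 + [0] * 15)
toy_inputs = np.zeros((100, 1))  # the baseline ignores the features

# Baseline that always predicts the most frequent class.
baseline = DummyClassifier(strategy='most_frequent').fit(toy_inputs, toy_labels)
baseline_preds = baseline.predict(toy_inputs)

baseline_accuracy = accuracy_score(toy_labels, baseline_preds)
baseline_balanced = balanced_accuracy_score(toy_labels, baseline_preds)
baseline_confusion = confusion_matrix(toy_labels, baseline_preds)  # rows: true, columns: predicted

print(baseline_accuracy, baseline_balanced)
print(baseline_confusion)
```

A fitted model is only informative about misses if it beats this baseline on a metric such as balanced accuracy, which averages recall over both classes.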

## Solution

The solution is still to be determined.