# NFL Tackling Analysis

### Kaggle Competition: [NFL Big Data Bowl 2024](https://www.kaggle.com/competitions/nfl-big-data-bowl-2024/overview)

### nflverse data: [nflverse-data GitHub](https://github.com/nflverse/nflverse-data/releases/tag/pbp)

In [24]:
import pandas as pd
import numpy as np

## File descriptions:

Game data: The `games.csv` file contains the teams playing in each game. The key variable is **`gameId`**.

Play data: The `plays.csv` file contains play-level information from each game. The key variables are **`gameId`** and **`playId`**.

Player data: The `players.csv` file contains player-level information from players that participated in any of the tracking data files. The key variable is **`nflId`**.

Tackles data: The `tackles.csv` file contains player-level tackle information for each game and play. The key variables are **`gameId`**, **`playId`**, and **`nflId`**.

Tracking data: Files `tracking_week_[week].csv` contain player tracking data from week number `[week]`. The key variables are **`gameId`**, **`playId`**, and **`nflId`**.
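
The key variables listed above are what link the files together. Below is a minimal sketch of that linkage using small toy frames in place of the real CSVs (the values and the `down` column are illustrative only, not taken from the data). `validate="many_to_one"` asks pandas to raise an error if the right-hand keys are not unique, which is a cheap way to confirm that `playId` only identifies a play together with `gameId`.

```python
import pandas as pd

# Toy stand-ins for games.csv, plays.csv, and tackles.csv (illustrative values only).
games = pd.DataFrame({"gameId": [1, 2], "homeTeamAbbr": ["LA", "ATL"]})
plays = pd.DataFrame({"gameId": [1, 1, 2], "playId": [56, 80, 56], "down": [1, 2, 1]})
tackles = pd.DataFrame(
    {"gameId": [1, 2], "playId": [56, 56], "nflId": [42816, 46232], "tackle": [1, 1]}
)

# Plays join to games on gameId alone; every play belongs to exactly one game.
plays_games = plays.merge(games, on="gameId", how="left", validate="many_to_one")

# Tackles join to plays on the composite key (gameId, playId),
# because playId 56 appears in both games.
tackles_plays = tackles.merge(
    plays_games, on=["gameId", "playId"], how="left", validate="many_to_one"
)
print(tackles_plays)
```

Joining on `playId` alone would match play 56 from both games, which is why the composite key is needed.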

# EDA Part 1: initial look at CSV and cleaning dtypes if necessary

## Game data

- **`gameId`**: Game identifier, unique (numeric)
- `season`: Season of game
- `week`: Week of game
- `gameDate`: Game Date (time, mm/dd/yyyy)
- `gameTimeEastern`: Start time of game (time, HH:MM:SS, Eastern Time)
- `homeTeamAbbr`: Home team abbreviation (text)
- `visitorTeamAbbr`: Visiting team abbreviation (text)
- `homeFinalScore`: Total points scored by the home team in the game (numeric)
- `visitorFinalScore`: Total points scored by the visiting team in the game (numeric)

In [36]:
df_games = pd.read_csv("data/games.csv")
df_games.head()

Unnamed: 0,gameId,season,week,gameDate,gameTimeEastern,homeTeamAbbr,visitorTeamAbbr,homeFinalScore,visitorFinalScore
0,2022090800,2022,1,09/08/2022,20:20:00,LA,BUF,10,31
1,2022091100,2022,1,09/11/2022,13:00:00,ATL,NO,26,27
2,2022091101,2022,1,09/11/2022,13:00:00,CAR,CLE,24,26
3,2022091102,2022,1,09/11/2022,13:00:00,CHI,SF,19,10
4,2022091103,2022,1,09/11/2022,13:00:00,CIN,PIT,20,23


In [37]:
df_games.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 136 entries, 0 to 135
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   gameId             136 non-null    int64 
 1   season             136 non-null    int64 
 2   week               136 non-null    int64 
 3   gameDate           136 non-null    object
 4   gameTimeEastern    136 non-null    object
 5   homeTeamAbbr       136 non-null    object
 6   visitorTeamAbbr    136 non-null    object
 7   homeFinalScore     136 non-null    int64 
 8   visitorFinalScore  136 non-null    int64 
dtypes: int64(5), object(4)
memory usage: 9.7+ KB


In [38]:
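# gameDate is documented as mm/dd/yyyy, so state the format explicitly instead of relying on inference.
df_games['gameDate'] = pd.to_datetime(df_games['gameDate'], format='%m/%d/%Y')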
df_games.head()

Unnamed: 0,gameId,season,week,gameDate,gameTimeEastern,homeTeamAbbr,visitorTeamAbbr,homeFinalScore,visitorFinalScore
0,2022090800,2022,1,2022-09-08,20:20:00,LA,BUF,10,31
1,2022091100,2022,1,2022-09-11,13:00:00,ATL,NO,26,27
2,2022091101,2022,1,2022-09-11,13:00:00,CAR,CLE,24,26
3,2022091102,2022,1,2022-09-11,13:00:00,CHI,SF,19,10
4,2022091103,2022,1,2022-09-11,13:00:00,CIN,PIT,20,23


In [51]:
df_games.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 136 entries, 0 to 135
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   gameId             136 non-null    int64         
 1   season             136 non-null    int64         
 2   week               136 non-null    int64         
 3   gameDate           136 non-null    datetime64[ns]
 4   gameTimeEastern    136 non-null    object        
 5   homeTeamAbbr       136 non-null    object        
 6   visitorTeamAbbr    136 non-null    object        
 7   homeFinalScore     136 non-null    int64         
 8   visitorFinalScore  136 non-null    int64         
dtypes: datetime64[ns](1), int64(5), object(3)
memory usage: 9.7+ KB


In [52]:
df_games.to_csv('data/games_clean.csv', index=False)

In [53]:
df1 = pd.read_csv('data/games_clean.csv')
df1.head()

Unnamed: 0,gameId,season,week,gameDate,gameTimeEastern,homeTeamAbbr,visitorTeamAbbr,homeFinalScore,visitorFinalScore
0,2022090800,2022,1,2022-09-08,20:20:00,LA,BUF,10,31
1,2022091100,2022,1,2022-09-11,13:00:00,ATL,NO,26,27
2,2022091101,2022,1,2022-09-11,13:00:00,CAR,CLE,24,26
3,2022091102,2022,1,2022-09-11,13:00:00,CHI,SF,19,10
4,2022091103,2022,1,2022-09-11,13:00:00,CIN,PIT,20,23
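
One thing to keep in mind when reloading the cleaned file: CSV stores no dtype information, so `gameDate` comes back as plain text unless `read_csv` is asked to parse it. The sketch below illustrates this with an in-memory CSV holding two rows from the output above; `csv_text`, `plain`, and `parsed` are names made up for the example.

```python
import io

import pandas as pd

# Two rows in the same layout as games_clean.csv (only the columns needed here).
csv_text = "gameId,gameDate\n2022090800,2022-09-08\n2022091100,2022-09-11\n"

# Default read: the date column is not converted to datetime.
plain = pd.read_csv(io.StringIO(csv_text))

# Asking for date parsing restores a datetime64 column.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["gameDate"])

print(plain["gameDate"].dtype)
print(parsed["gameDate"].dtype)
```

The same `parse_dates` argument should apply to `birthDate` when reloading `players_clean.csv`.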


## Player data
- **`nflId`**: Player identification number, unique across players (numeric)
- `height`: Player height (text)
- `weight`: Player weight (numeric)
- `birthDate`: Date of birth (YYYY-MM-DD)
- `collegeName`: Player college (text)
- `position`: Official player position (text)
- `displayName`: Player name (text)

In [46]:
df_players = pd.read_csv("data/players.csv")
df_players.head()

Unnamed: 0,nflId,height,weight,birthDate,collegeName,position,displayName
0,25511,6-4,225,1977-08-03,Michigan,QB,Tom Brady
1,29550,6-4,328,1982-01-22,Arkansas,T,Jason Peters
2,29851,6-2,225,1983-12-02,California,QB,Aaron Rodgers
3,30842,6-6,267,1984-05-19,UCLA,TE,Marcedes Lewis
4,33084,6-4,217,1985-05-17,Boston College,QB,Matt Ryan


In [47]:
df_players.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1683 entries, 0 to 1682
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   nflId        1683 non-null   int64 
 1   height       1683 non-null   object
 2   weight       1683 non-null   int64 
 3   birthDate    1204 non-null   object
 4   collegeName  1683 non-null   object
 5   position     1683 non-null   object
 6   displayName  1683 non-null   object
dtypes: int64(2), object(5)
memory usage: 92.2+ KB


In [48]:
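# Split "feet-inches" strings such as "6-4" once, then convert to total inches.
height_parts = df_players['height'].str.split('-', expand=True).astype(int)
df_players['height_inch'] = height_parts[0] * 12 + height_parts[1]
df_players['height_cm'] = df_players['height_inch'] * 2.54
# Approximate pounds-to-kilograms conversion (1 kg is about 2.205 lb).
df_players['weight_kg'] = df_players['weight'] / 2.205
# format='mixed' (pandas 2.0+) infers the format per value, in case birthDate uses more than one format.
df_players['birthDate'] = pd.to_datetime(df_players['birthDate'], format='mixed')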
df_players.head()

Unnamed: 0,nflId,height,weight,birthDate,collegeName,position,displayName,height_inch,height_cm,weight_kg
0,25511,6-4,225,1977-08-03,Michigan,QB,Tom Brady,76,193.04,102.040816
1,29550,6-4,328,1982-01-22,Arkansas,T,Jason Peters,76,193.04,148.752834
2,29851,6-2,225,1983-12-02,California,QB,Aaron Rodgers,74,187.96,102.040816
3,30842,6-6,267,1984-05-19,UCLA,TE,Marcedes Lewis,78,198.12,121.088435
4,33084,6-4,217,1985-05-17,Boston College,QB,Matt Ryan,76,193.04,98.412698
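
As a worked check of the unit conversions, here is a small sketch on two toy rows. It differs from the cell above in one respect, which is an assumption of this example rather than the notebook's method: it uses the exact pound definition (0.45359237 kg) instead of dividing by 2.205, so the kilogram values differ slightly in the second decimal place.

```python
import pandas as pd

LB_TO_KG = 0.45359237  # exact definition of the international pound
IN_TO_CM = 2.54        # exact definition of the inch

# Toy rows in the same "feet-inches" / pounds format as players.csv.
players = pd.DataFrame({"height": ["6-4", "6-2"], "weight": [225, 225]})

# "6-4" -> columns 0 (feet) and 1 (inches) as integers.
feet_inches = players["height"].str.split("-", expand=True).astype(int)

# 6 ft 4 in = 6 * 12 + 4 = 76 in = 193.04 cm
players["height_cm"] = (feet_inches[0] * 12 + feet_inches[1]) * IN_TO_CM

# 225 lb * 0.45359237 kg/lb, roughly 102.06 kg (versus 102.04 with 225 / 2.205)
players["weight_kg"] = players["weight"] * LB_TO_KG

print(players)
```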


In [50]:
pd.set_option("display.precision", 2)
df_players_final = df_players[['nflId', 'displayName', 'birthDate', 'collegeName', 'position', 'height_cm', 'weight_kg']]
df_players_final.head()

Unnamed: 0,nflId,displayName,birthDate,collegeName,position,height_cm,weight_kg
0,25511,Tom Brady,1977-08-03,Michigan,QB,193.04,102.04
1,29550,Jason Peters,1982-01-22,Arkansas,T,193.04,148.75
2,29851,Aaron Rodgers,1983-12-02,California,QB,187.96,102.04
3,30842,Marcedes Lewis,1984-05-19,UCLA,TE,198.12,121.09
4,33084,Matt Ryan,1985-05-17,Boston College,QB,193.04,98.41


In [54]:
df_players_final.to_csv('data/players_clean.csv', index=False)
df_players_final.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1683 entries, 0 to 1682
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   nflId        1683 non-null   int64         
 1   displayName  1683 non-null   object        
 2   birthDate    1204 non-null   datetime64[ns]
 3   collegeName  1683 non-null   object        
 4   position     1683 non-null   object        
 5   height_cm    1683 non-null   float64       
 6   weight_kg    1683 non-null   float64       
dtypes: datetime64[ns](1), float64(2), int64(1), object(3)
memory usage: 92.2+ KB


In [55]:
df2 = pd.read_csv('data/players_clean.csv')
df2.head()

Unnamed: 0,nflId,displayName,birthDate,collegeName,position,height_cm,weight_kg
0,25511,Tom Brady,1977-08-03,Michigan,QB,193.04,102.04
1,29550,Jason Peters,1982-01-22,Arkansas,T,193.04,148.75
2,29851,Aaron Rodgers,1983-12-02,California,QB,187.96,102.04
3,30842,Marcedes Lewis,1984-05-19,UCLA,TE,198.12,121.09
4,33084,Matt Ryan,1985-05-17,Boston College,QB,193.04,98.41


## Play data
- **`gameId`**: Game identifier, unique (numeric)
- **`playId`**: Play identifier, not unique across games (numeric)
- `ballCarrierId`: The nflId of the ball carrier (receiver of the handoff, receiver of pass or the QB scrambling) on the play. This is the player that the defense is attempting to tackle. (numeric)
- `ballCarrierDisplayName`: The displayName of the ball carrier on the play (text)
- `playDescription`: Description of play (text)
- `quarter`: Game quarter (numeric)
- `down`: Down (numeric)
- `yardsToGo`: Distance needed for a first down (numeric)
- `possessionTeam`: Team abbr of team on offense with possession of ball (text)
- `defensiveTeam`: Team abbr of team on defense (text)
- `yardlineSide`: Team abbreviation for the side of the field where the line of scrimmage is located (text)
- `yardlineNumber`: Yard line at line-of-scrimmage (numeric)
- `gameClock`: Time on clock of play (MM:SS)
- `preSnapHomeScore`: Home score prior to the play (numeric)
- `preSnapVisitorScore`: Visiting team score prior to the play (numeric)
- `passResult`: Dropback outcome of the play (C: Complete pass, I: Incomplete pass, S: Quarterback sack, IN: Intercepted pass, R: Scramble, text)
- `passLength`: The distance beyond the LOS that the ball traveled not including yards into the endzone. If thrown behind LOS, the value is negative. (numeric)
- `penaltyYards`: Yards gained by the offense by penalty (numeric)
- `prePenaltyPlayResult`: Net yards gained by the offense, before penalty yardage (numeric)
- `playResult`: Net yards gained by the offense, including penalty yardage (numeric)
- `playNullifiedByPenalty`: Whether or not an accepted penalty on the play cancels the play outcome. Y stands for yes and N stands for no. (text)
- `absoluteYardlineNumber`: Distance from end zone for possession team (numeric)
- `offenseFormation`: Formation used by possession team (text)
- `defendersInTheBox`: Number of defenders in close proximity to line-of-scrimmage (numeric)
- `passProbability`: NGS probability of next play being pass (as opposed to rush) based off model without tracking data inputs (numeric)
- `preSnapHomeTeamWinProbability`: The win probability of the home team before the play (numeric)
- `preSnapVisitorTeamWinProbability`: The win probability of the visiting team before the play (numeric)
- `homeTeamWinProbabilityAdded`: Win probability delta for home team (numeric)
- `visitorTeamWinProbilityAdded`: Win probability delta for visitor team; the column name is misspelled this way in the data (numeric)
- `expectedPoints`: Expected points on this play (numeric)
- `expectedPointsAdded`: Delta of expected points on this play (numeric)
- `foulName[i]`: Name of the i-th penalty committed during the play. i ranges between 1 and 2 (text)
- `foulNFLId[i]`: nflId of the player who committed the i-th penalty during the play. i ranges between 1 and 2 (numeric)
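
Since `gameClock` is stored as `MM:SS` text, it usually needs converting before it can be used numerically. A minimal sketch, assuming every value follows the `MM:SS` pattern; `clock_to_seconds` is a hypothetical helper and not part of the dataset or of pandas.

```python
import pandas as pd


def clock_to_seconds(clock: pd.Series) -> pd.Series:
    """Convert 'MM:SS' strings to seconds remaining in the quarter."""
    parts = clock.str.split(":", expand=True).astype(int)
    return parts[0] * 60 + parts[1]


# Example clock values in the documented MM:SS format.
clock = pd.Series(["7:52", "13:12", "00:05"])
seconds = clock_to_seconds(clock)
print(seconds.tolist())  # [472, 792, 5]
```

On the real frame this would be applied as `clock_to_seconds(df_plays['gameClock'])`.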

In [149]:
df_plays = pd.read_csv("data/plays.csv")
df_plays.head()

Unnamed: 0,gameId,playId,ballCarrierId,ballCarrierDisplayName,playDescription,quarter,down,yardsToGo,possessionTeam,defensiveTeam,...,preSnapHomeTeamWinProbability,preSnapVisitorTeamWinProbability,homeTeamWinProbabilityAdded,visitorTeamWinProbilityAdded,expectedPoints,expectedPointsAdded,foulName1,foulName2,foulNFLId1,foulNFLId2
0,2022100908,3537,48723,Parker Hesse,(7:52) (Shotgun) M.Mariota pass short middle t...,4,1,10,ATL,TB,...,0.98,0.02,-0.00611,0.00611,2.36,0.98,,,,
1,2022091103,3126,52457,Chase Claypool,(7:38) (Shotgun) C.Claypool right end to PIT 3...,4,1,10,PIT,CIN,...,0.16,0.84,-0.0109,0.0109,1.73,-0.26,,,,
2,2022091111,1148,42547,Darren Waller,(8:57) D.Carr pass short middle to D.Waller to...,2,2,5,LV,LAC,...,0.76,0.24,-0.0374,0.0374,1.31,1.13,,,,
3,2022100212,2007,46461,Mike Boone,(13:12) M.Boone left tackle to DEN 44 for 7 ya...,3,2,10,DEN,LV,...,0.62,0.38,-0.00245,0.00245,1.64,-0.04,,,,
4,2022091900,1372,47857,Devin Singletary,(8:33) D.Singletary right guard to TEN 32 for ...,2,1,10,BUF,TEN,...,0.84,0.16,0.00105,-0.00105,3.69,-0.17,,,,


In [150]:
df_plays['noHuddle'] = df_plays['playDescription'].str.contains('No Huddle').astype(int)

In [151]:
df_plays['playNullifiedByPenalty'] = df_plays['playNullifiedByPenalty'].map({'Y': 1, 'N': 0})

In [152]:
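# Assign the result back instead of calling fillna(inplace=True) on a column selection,
# which newer pandas versions warn about and may not apply to df_plays.
df_plays['defendersInTheBox'] = df_plays['defendersInTheBox'].fillna(6).astype(int)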

In [153]:
# The nullable Int64 dtype keeps missing values as <NA>, so no fillna step is needed before the cast.
df_plays['passLength'] = df_plays['passLength'].astype('Int64')
df_plays['penaltyYards'] = df_plays['penaltyYards'].astype('Int64')

In [154]:
df_plays['expectedPointsAdded'] = df_plays['expectedPointsAdded'].fillna(0.30)
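
The fill values in the cells above (6 defenders, 0.30 expected points added) are hard-coded. One possible alternative, shown here only as a sketch on toy data, is to derive the fill value from the column itself and to record which rows were imputed. The `defendersInTheBox_missing` flag column is an assumption of this example and does not exist in the dataset.

```python
import numpy as np
import pandas as pd

# Toy column with one missing value (illustrative values only).
plays = pd.DataFrame({"defendersInTheBox": [6.0, 7.0, np.nan, 7.0, 8.0]})

# Record which rows were missing before filling, so the imputation stays visible.
plays["defendersInTheBox_missing"] = plays["defendersInTheBox"].isna()

# Fill with the observed median and assign the result back to the column.
median_box = plays["defendersInTheBox"].median()
plays["defendersInTheBox"] = plays["defendersInTheBox"].fillna(median_box).astype(int)

print(plays)
```

Whether a median, a constant, or leaving the value missing is most appropriate depends on how the column is used downstream.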

In [157]:
df_plays.drop(columns = ['foulNFLId1', 'foulNFLId2'], inplace=True)

In [158]:
df_plays.to_csv('data/plays_cleaned.csv', index=False)
df_plays.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12486 entries, 0 to 12485
Data columns (total 34 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   gameId                            12486 non-null  int64  
 1   playId                            12486 non-null  int64  
 2   ballCarrierId                     12486 non-null  int64  
 3   ballCarrierDisplayName            12486 non-null  object 
 4   playDescription                   12486 non-null  object 
 5   quarter                           12486 non-null  int64  
 6   down                              12486 non-null  int64  
 7   yardsToGo                         12486 non-null  int64  
 8   possessionTeam                    12486 non-null  object 
 9   defensiveTeam                     12486 non-null  object 
 10  yardlineSide                      12319 non-null  object 
 11  yardlineNumber                    12486 non-null  int64  
 12  game

## Tackles data
- **`gameId`**: Game identifier, unique (numeric)
- **`playId`**: Play identifier, not unique across games (numeric)
- **`nflId`**: Player identification number, unique across players (numeric)
- `tackle`: Indicator for whether the given player made a tackle on the play (binary)
- `assist`: Indicator for whether the given player made an assist tackle on the play (binary)
- `forcedFumble`: Indicator for whether the given player forced a fumble on the play (binary)
- `pff_missedTackle`: Provided by Pro Football Focus (PFF). Indicator for whether the given player missed a tackle on the play (binary)
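
Because every indicator is binary, per-player totals are simple sums over the rows. The sketch below aggregates a toy frame with the same columns. The `missed_rate` definition (missed tackles divided by tackles, assists, and misses combined) is an assumption made for illustration, not a metric defined by the competition.

```python
import pandas as pd

# Toy rows with the same columns as tackles.csv (illustrative values only).
tackles = pd.DataFrame(
    {
        "gameId": [1, 1, 1, 2],
        "playId": [101, 101, 393, 101],
        "nflId": [42816, 46232, 42816, 42816],
        "tackle": [1, 0, 1, 0],
        "assist": [0, 1, 0, 0],
        "forcedFumble": [0, 0, 0, 0],
        "pff_missedTackle": [0, 0, 0, 1],
    }
)

indicator_cols = ["tackle", "assist", "forcedFumble", "pff_missedTackle"]

# Sum the binary indicators per player.
per_player = tackles.groupby("nflId")[indicator_cols].sum()

# Hypothetical rate: share of tackle opportunities that were missed.
attempts = per_player["tackle"] + per_player["assist"] + per_player["pff_missedTackle"]
per_player["missed_rate"] = per_player["pff_missedTackle"] / attempts

print(per_player)
```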

In [160]:
df_tackles = pd.read_csv("data/tackles.csv")
df_tackles.head()

Unnamed: 0,gameId,playId,nflId,tackle,assist,forcedFumble,pff_missedTackle
0,2022090800,101,42816,1,0,0,0
1,2022090800,393,46232,1,0,0,0
2,2022090800,486,40166,1,0,0,0
3,2022090800,646,47939,1,0,0,0
4,2022090800,818,40107,1,0,0,0


In [166]:
df_tackles.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17426 entries, 0 to 17425
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   gameId            17426 non-null  int64
 1   playId            17426 non-null  int64
 2   nflId             17426 non-null  int64
 3   tackle            17426 non-null  int64
 4   assist            17426 non-null  int64
 5   forcedFumble      17426 non-null  int64
 6   pff_missedTackle  17426 non-null  int64
dtypes: int64(7)
memory usage: 953.1 KB


## Tracking data
Files `tracking_week_[week].csv` contain player tracking data from week `[week]`.

- **`gameId`**: Game identifier, unique (numeric)
- **`playId`**: Play identifier, not unique across games (numeric)
- **`nflId`**: Player identification number, unique across players. When value is NA, row corresponds to ball. (numeric)
- `displayName`: Player name (text)
- `frameId`: Frame identifier for each play, starting at 1 (numeric)
- `time`: Time stamp of play (time, yyyy-mm-dd, hh:mm:ss)
- `jerseyNumber`: Jersey number of player (numeric)
- `club`: Team abbreviation of corresponding player (text)
- `playDirection`: Direction that the offense is moving (left or right)
- `x`: Player position along the long axis of the field, 0 - 120 yards. See Figure 1 below. (numeric)
- `y`: Player position along the short axis of the field, 0 - 53.3 yards. See Figure 1 below. (numeric)
- `s`: Speed in yards/second (numeric)
- `a`: Acceleration in yards/second^2 (numeric)
- `dis`: Distance traveled from prior time point, in yards (numeric)
- `o`: Player orientation (deg), 0 - 360 degrees (numeric)
- `dir`: Angle of player motion (deg), 0 - 360 degrees (numeric)
- `event`: Tagged play details, including moment of ball snap, pass release, pass catch, tackle, etc (text)

![field_position](static/field_positioning.png)
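
Because `playDirection` can be either left or right, a common preprocessing step is to rotate left-moving plays by 180 degrees so that the offense always moves toward increasing `x`. The sketch below assumes the 0 to 120 and 0 to 53.3 yard ranges given in the field list; `standardize_direction` is a hypothetical helper, and treating the angles as rotating with the field is an assumption of this example.

```python
import pandas as pd

FIELD_LENGTH = 120.0  # yards, long axis including end zones
FIELD_WIDTH = 53.3    # yards, short axis


def standardize_direction(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which plays moving left are rotated to move right."""
    out = df.copy()
    left = out["playDirection"] == "left"
    # Rotate positions 180 degrees about the centre of the field.
    out.loc[left, "x"] = FIELD_LENGTH - out.loc[left, "x"]
    out.loc[left, "y"] = FIELD_WIDTH - out.loc[left, "y"]
    # Rotate orientation and motion angles by the same 180 degrees.
    for col in ("o", "dir"):
        out.loc[left, col] = (out.loc[left, col] + 180.0) % 360.0
    return out


# One left-moving and one right-moving row, using the documented column names.
tracking = pd.DataFrame(
    {
        "playDirection": ["left", "right"],
        "x": [88.37, 25.82],
        "y": [27.27, 28.56],
        "o": [231.74, 62.16],
        "dir": [147.9, 327.7],
    }
)
standardized = standardize_direction(tracking)
print(standardized)
```

The function leaves `playDirection` unchanged so that the original direction can still be recovered.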

In [210]:
df_tracking_1 = pd.read_csv("data/tracking_week_1.csv")
# Rows with a missing nflId correspond to the ball; use sentinel values so both columns can be cast to int.
df_tracking_1['nflId'] = df_tracking_1['nflId'].fillna(10000).astype(int)
df_tracking_1['jerseyNumber'] = df_tracking_1['jerseyNumber'].fillna(999).astype(int)
df_tracking_1.to_csv('data/tracking_1_clean.csv', index=False)
df_tracking_1.head()

Unnamed: 0,gameId,playId,nflId,displayName,frameId,time,jerseyNumber,club,playDirection,x,y,s,a,dis,o,dir,event
0,2022090800,56,35472,Rodger Saffold,1,2022-09-08 20:24:05.200000,76,BUF,left,88.37,27.27,1.62,1.15,0.16,231.74,147.9,
1,2022090800,56,35472,Rodger Saffold,2,2022-09-08 20:24:05.299999,76,BUF,left,88.47,27.13,1.67,0.61,0.17,230.98,148.53,pass_arrived
2,2022090800,56,35472,Rodger Saffold,3,2022-09-08 20:24:05.400000,76,BUF,left,88.56,27.01,1.57,0.49,0.15,230.98,147.05,
3,2022090800,56,35472,Rodger Saffold,4,2022-09-08 20:24:05.500000,76,BUF,left,88.64,26.9,1.44,0.89,0.14,232.38,145.42,
4,2022090800,56,35472,Rodger Saffold,5,2022-09-08 20:24:05.599999,76,BUF,left,88.72,26.8,1.29,1.24,0.13,233.36,141.95,


In [211]:
df_tracking_2 = pd.read_csv("data/tracking_week_2.csv")
df_tracking_2['nflId'] = df_tracking_2['nflId'].fillna(10000).astype(int)
df_tracking_2['jerseyNumber'] = df_tracking_2['jerseyNumber'].fillna(999).astype(int)
df_tracking_2.to_csv('data/tracking_2_clean.csv', index=False)
df_tracking_2.head()

Unnamed: 0,gameId,playId,nflId,displayName,frameId,time,jerseyNumber,club,playDirection,x,y,s,a,dis,o,dir,event
0,2022091500,55,40011,Travis Kelce,1,2022-09-15 20:16:32.700000,87,KC,left,87.2,24.9,0.0,0.0,0.0,263.11,138.55,
1,2022091500,55,40011,Travis Kelce,2,2022-09-15 20:16:32.799999,87,KC,left,87.2,24.9,0.0,0.0,0.0,263.11,142.54,
2,2022091500,55,40011,Travis Kelce,3,2022-09-15 20:16:32.900000,87,KC,left,87.2,24.9,0.0,0.0,0.0,262.47,143.82,
3,2022091500,55,40011,Travis Kelce,4,2022-09-15 20:16:33.000000,87,KC,left,87.2,24.9,0.0,0.0,0.0,262.47,149.71,
4,2022091500,55,40011,Travis Kelce,5,2022-09-15 20:16:33.099999,87,KC,left,87.2,24.9,0.01,0.15,0.0,262.47,309.38,


In [212]:
df_tracking_3 = pd.read_csv("data/tracking_week_3.csv")
df_tracking_3['nflId'] = df_tracking_3['nflId'].fillna(10000).astype(int)
df_tracking_3['jerseyNumber'] = df_tracking_3['jerseyNumber'].fillna(999).astype(int)
df_tracking_3.to_csv('data/tracking_3_clean.csv', index=False)
df_tracking_3.head()

Unnamed: 0,gameId,playId,nflId,displayName,frameId,time,jerseyNumber,club,playDirection,x,y,s,a,dis,o,dir,event
0,2022092200,56,35449,Tyson Alualu,1,2022-09-22 20:16:26.500000,94,PIT,left,84.1,23.83,0.0,0.0,0.0,97.54,272.43,
1,2022092200,56,35449,Tyson Alualu,2,2022-09-22 20:16:26.599999,94,PIT,left,84.1,23.83,0.0,0.0,0.0,97.54,269.87,
2,2022092200,56,35449,Tyson Alualu,3,2022-09-22 20:16:26.700000,94,PIT,left,84.1,23.83,0.0,0.0,0.0,97.54,269.98,
3,2022092200,56,35449,Tyson Alualu,4,2022-09-22 20:16:26.799999,94,PIT,left,84.1,23.83,0.0,0.0,0.01,97.54,284.87,
4,2022092200,56,35449,Tyson Alualu,5,2022-09-22 20:16:26.900000,94,PIT,left,84.1,23.83,0.0,0.0,0.0,97.54,281.79,


In [213]:
df_tracking_4 = pd.read_csv("data/tracking_week_4.csv")
df_tracking_4['nflId'] = df_tracking_4['nflId'].fillna(10000).astype(int)
df_tracking_4['jerseyNumber'] = df_tracking_4['jerseyNumber'].fillna(999).astype(int)
df_tracking_4.to_csv('data/tracking_4_clean.csv', index=False)
df_tracking_4.head()

Unnamed: 0,gameId,playId,nflId,displayName,frameId,time,jerseyNumber,club,playDirection,x,y,s,a,dis,o,dir,event
0,2022092900,57,42654,La'el Collins,1,2022-09-29 20:16:00.099999,71,CIN,left,86.21,30.88,0.0,0.0,0.0,262.6,246.29,
1,2022092900,57,42654,La'el Collins,2,2022-09-29 20:16:00.200000,71,CIN,left,86.21,30.88,0.0,0.0,0.0,263.32,234.76,
2,2022092900,57,42654,La'el Collins,3,2022-09-29 20:16:00.299999,71,CIN,left,86.21,30.87,0.01,0.23,0.01,263.32,173.85,
3,2022092900,57,42654,La'el Collins,4,2022-09-29 20:16:00.400000,71,CIN,left,86.21,30.86,0.07,0.69,0.01,263.32,191.83,
4,2022092900,57,42654,La'el Collins,5,2022-09-29 20:16:00.500000,71,CIN,left,86.21,30.85,0.19,1.01,0.02,263.32,192.85,


In [214]:
df_tracking_5 = pd.read_csv("data/tracking_week_5.csv")
df_tracking_5['nflId'] = df_tracking_5['nflId'].fillna(10000).astype(int)
df_tracking_5['jerseyNumber'] = df_tracking_5['jerseyNumber'].fillna(999).astype(int)
df_tracking_5.to_csv('data/tracking_5_clean.csv', index=False)
df_tracking_5.head()

Unnamed: 0,gameId,playId,nflId,displayName,frameId,time,jerseyNumber,club,playDirection,x,y,s,a,dis,o,dir,event
0,2022100600,90,33084,Matt Ryan,1,2022-10-06 20:17:04.799999,2,IND,left,90.42,23.74,0.11,0.04,0.03,271.98,257.76,
1,2022100600,90,33084,Matt Ryan,2,2022-10-06 20:17:04.900000,2,IND,left,90.39,23.74,0.14,0.06,0.03,272.84,256.68,
2,2022100600,90,33084,Matt Ryan,3,2022-10-06 20:17:05.000000,2,IND,left,90.36,23.73,0.17,0.09,0.03,272.84,254.91,
3,2022100600,90,33084,Matt Ryan,4,2022-10-06 20:17:05.099999,2,IND,left,90.32,23.73,0.19,0.11,0.04,275.8,260.06,
4,2022100600,90,33084,Matt Ryan,5,2022-10-06 20:17:05.200000,2,IND,left,90.28,23.72,0.2,0.13,0.04,275.8,257.79,


In [215]:
df_tracking_6 = pd.read_csv("data/tracking_week_6.csv")
df_tracking_6['nflId'] = df_tracking_6['nflId'].fillna(10000).astype(int)
df_tracking_6['jerseyNumber'] = df_tracking_6['jerseyNumber'].fillna(999).astype(int)
df_tracking_6.to_csv('data/tracking_6_clean.csv', index=False)
df_tracking_6.head()

Unnamed: 0,gameId,playId,nflId,displayName,frameId,time,jerseyNumber,club,playDirection,x,y,s,a,dis,o,dir,event
0,2022101300,54,42488,Bobby McCain,1,2022-10-13 20:16:18.799999,20,WAS,left,68.86,37.6,3.95,2.63,0.39,175.81,254.51,
1,2022101300,54,42488,Bobby McCain,2,2022-10-13 20:16:18.900000,20,WAS,left,68.48,37.47,4.07,2.75,0.4,180.36,249.5,
2,2022101300,54,42488,Bobby McCain,3,2022-10-13 20:16:19.000000,20,WAS,left,68.1,37.31,4.17,2.83,0.41,181.03,244.73,
3,2022101300,54,42488,Bobby McCain,4,2022-10-13 20:16:19.099999,20,WAS,left,67.73,37.11,4.27,2.98,0.42,182.99,239.75,
4,2022101300,54,42488,Bobby McCain,5,2022-10-13 20:16:19.200000,20,WAS,left,67.37,36.88,4.4,2.92,0.43,182.99,235.32,


In [216]:
df_tracking_7 = pd.read_csv("data/tracking_week_7.csv")
df_tracking_7['nflId'] = df_tracking_7['nflId'].fillna(10000).astype(int)
df_tracking_7['jerseyNumber'] = df_tracking_7['jerseyNumber'].fillna(999).astype(int)
df_tracking_7.to_csv('data/tracking_7_clean.csv', index=False)
df_tracking_7.head()

Unnamed: 0,gameId,playId,nflId,displayName,frameId,time,jerseyNumber,club,playDirection,x,y,s,a,dis,o,dir,event
0,2022102000,56,37084,J.J. Watt,1,2022-10-20 20:16:19.099999,99,ARI,left,87.64,23.49,3.95,2.68,0.4,113.01,67.5,
1,2022102000,56,37084,J.J. Watt,2,2022-10-20 20:16:19.200000,99,ARI,left,88.02,23.63,4.08,2.35,0.41,118.7,71.05,pass_arrived
2,2022102000,56,37084,J.J. Watt,3,2022-10-20 20:16:19.299999,99,ARI,left,88.44,23.74,4.21,2.07,0.44,114.82,75.53,
3,2022102000,56,37084,J.J. Watt,4,2022-10-20 20:16:19.400000,99,ARI,left,88.86,23.82,4.2,2.07,0.43,121.02,79.59,
4,2022102000,56,37084,J.J. Watt,5,2022-10-20 20:16:19.500000,99,ARI,left,89.28,23.88,4.15,2.08,0.42,124.76,82.73,


In [217]:
df_tracking_8 = pd.read_csv("data/tracking_week_8.csv")
df_tracking_8['nflId'] = df_tracking_8['nflId'].fillna(10000).astype(int)
df_tracking_8['jerseyNumber'] = df_tracking_8['jerseyNumber'].fillna(999).astype(int)
df_tracking_8.to_csv('data/tracking_8_clean.csv', index=False)
df_tracking_8.head()

Unnamed: 0,gameId,playId,nflId,displayName,frameId,time,jerseyNumber,club,playDirection,x,y,s,a,dis,o,dir,event
0,2022102700,68,38557,Kevin Zeitler,1,2022-10-27 20:16:37.099999,70,BAL,right,25.82,28.56,0.92,1.22,0.1,62.16,327.7,
1,2022102700,68,38557,Kevin Zeitler,2,2022-10-27 20:16:37.200000,70,BAL,right,25.78,28.64,0.87,1.12,0.09,59.23,337.03,
2,2022102700,68,38557,Kevin Zeitler,3,2022-10-27 20:16:37.299999,70,BAL,right,25.77,28.72,0.78,1.14,0.08,58.48,348.42,pass_arrived
3,2022102700,68,38557,Kevin Zeitler,4,2022-10-27 20:16:37.400000,70,BAL,right,25.77,28.79,0.72,1.23,0.07,57.03,1.0,
4,2022102700,68,38557,Kevin Zeitler,5,2022-10-27 20:16:37.500000,70,BAL,right,25.79,28.86,0.7,1.26,0.07,54.68,15.53,


In [218]:
df_tracking_9 = pd.read_csv("data/tracking_week_9.csv")
df_tracking_9['nflId'] = df_tracking_9['nflId'].fillna(10000).astype(int)
df_tracking_9['jerseyNumber'] = df_tracking_9['jerseyNumber'].fillna(999).astype(int)
df_tracking_9.to_csv('data/tracking_9_clean.csv', index=False)
df_tracking_9.head()

Unnamed: 0,gameId,playId,nflId,displayName,frameId,time,jerseyNumber,club,playDirection,x,y,s,a,dis,o,dir,event
0,2022110300,55,38542,Fletcher Cox,1,2022-11-03 20:16:30.400000,91,PHI,right,35.31,21.25,0.25,0.21,0.01,275.05,263.18,
1,2022110300,55,38542,Fletcher Cox,2,2022-11-03 20:16:30.500000,91,PHI,right,35.3,21.25,0.21,0.2,0.02,270.08,264.09,
2,2022110300,55,38542,Fletcher Cox,3,2022-11-03 20:16:30.599999,91,PHI,right,35.29,21.25,0.17,0.18,0.01,267.61,264.78,
3,2022110300,55,38542,Fletcher Cox,4,2022-11-03 20:16:30.700000,91,PHI,right,35.31,21.24,0.1,0.15,0.02,263.43,250.8,
4,2022110300,55,38542,Fletcher Cox,5,2022-11-03 20:16:30.799999,91,PHI,right,35.31,21.25,0.07,0.12,0.01,262.28,258.57,
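
The nine weekly cells above apply the same steps to each file, so they could be folded into one function and a loop. The following is a sketch of that refactor rather than the notebook's own code: `clean_tracking`, `clean_all_weeks`, and the two constants are hypothetical names, while the sentinel values and file paths mirror the cells above.

```python
from pathlib import Path

import pandas as pd

BALL_NFL_ID = 10000   # sentinel used above for rows with no nflId (the ball)
MISSING_JERSEY = 999  # sentinel used above for rows with no jersey number


def clean_tracking(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing ids with sentinel values and cast both columns to int."""
    out = df.copy()
    out["nflId"] = out["nflId"].fillna(BALL_NFL_ID).astype(int)
    out["jerseyNumber"] = out["jerseyNumber"].fillna(MISSING_JERSEY).astype(int)
    return out


def clean_all_weeks(data_dir: str = "data", weeks=range(1, 10)) -> dict:
    """Clean every weekly tracking file that exists and write the cleaned copy."""
    frames = {}
    for week in weeks:
        src = Path(data_dir) / f"tracking_week_{week}.csv"
        if not src.exists():
            continue  # skip weeks that have not been downloaded
        frames[week] = clean_tracking(pd.read_csv(src))
        frames[week].to_csv(Path(data_dir) / f"tracking_{week}_clean.csv", index=False)
    return frames


# Small in-memory check: one player row and one row with missing ids.
demo = pd.DataFrame({"nflId": [35472.0, None], "jerseyNumber": [76.0, None]})
cleaned = clean_tracking(demo)
print(cleaned)
```

Calling `clean_all_weeks()` would then replace the nine cells, returning a dictionary keyed by week number.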


# nflverse play-by-play data

[field descriptions](https://www.nflfastr.com/articles/field_descriptions.html)

In [35]:
df_verse = pd.read_csv('data/play_by_play_2022.csv', low_memory=False)
df_verse.head()

Unnamed: 0,play_id,game_id,old_game_id,home_team,away_team,season_type,week,posteam,posteam_type,defteam,...,out_of_bounds,home_opening_kickoff,qb_epa,xyac_epa,xyac_mean_yardage,xyac_median_yardage,xyac_success,xyac_fd,xpass,pass_oe
0,1,2022_01_BAL_NYJ,2022091107,NYJ,BAL,REG,1,,,,...,0,1,0.0,,,,,,,
1,43,2022_01_BAL_NYJ,2022091107,NYJ,BAL,REG,1,NYJ,home,BAL,...,0,1,-0.443521,,,,,,,
2,68,2022_01_BAL_NYJ,2022091107,NYJ,BAL,REG,1,NYJ,home,BAL,...,0,1,1.468819,,,,,,0.440373,-44.037291
3,89,2022_01_BAL_NYJ,2022091107,NYJ,BAL,REG,1,NYJ,home,BAL,...,0,1,-0.492192,0.727261,6.988125,6.0,0.60693,0.227598,0.389904,61.009598
4,115,2022_01_BAL_NYJ,2022091107,NYJ,BAL,REG,1,NYJ,home,BAL,...,0,1,-0.325931,,,,,,0.443575,-44.357494


In [4]:
df_verse.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50147 entries, 0 to 50146
Columns: 372 entries, play_id to pass_oe
dtypes: float64(179), int64(41), object(152)
memory usage: 142.3+ MB


In [16]:
week1 = df_verse[df_verse['old_game_id'] == 2022090800]
week1.head()

Unnamed: 0,play_id,game_id,old_game_id,home_team,away_team,season_type,week,posteam,posteam_type,defteam,...,out_of_bounds,home_opening_kickoff,qb_epa,xyac_epa,xyac_mean_yardage,xyac_median_yardage,xyac_success,xyac_fd,xpass,pass_oe
179,1,2022_01_BUF_LA,2022090800,LA,BUF,REG,1,,,,...,0,0,0.0,,,,,,,
180,41,2022_01_BUF_LA,2022090800,LA,BUF,REG,1,BUF,away,LA,...,0,0,0.0,,,,,,,
181,56,2022_01_BUF_LA,2022090800,LA,BUF,REG,1,BUF,away,LA,...,0,0,0.217842,0.499397,3.927394,2.0,0.734816,0.265285,0.515357,48.46431
182,80,2022_01_BUF_LA,2022090800,LA,BUF,REG,1,BUF,away,LA,...,0,0,0.590252,,,,,,0.483545,51.645491
183,101,2022_01_BUF_LA,2022090800,LA,BUF,REG,1,BUF,away,LA,...,0,0,0.449645,,,,,,0.46302,-46.301982
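
In the output above, `old_game_id` takes the same values as the Big Data Bowl `gameId`, and `play_id` 56 and 101 also appear as `playId` values in the tracking and tackles data. That suggests the two sources can be linked on those pairs of columns. A minimal sketch on toy frames, assuming the correspondence holds for every play (the `down` values are illustrative; the `qb_epa` values are copied from the rows above):

```python
import pandas as pd

# Toy stand-in for the Big Data Bowl plays table.
bdb_plays = pd.DataFrame(
    {"gameId": [2022090800, 2022090800], "playId": [56, 101], "down": [1, 1]}
)

# Toy stand-in for the nflverse play-by-play table.
verse = pd.DataFrame(
    {
        "old_game_id": [2022090800, 2022090800, 2022090800],
        "play_id": [56, 80, 101],
        "qb_epa": [0.217842, 0.590252, 0.449645],
    }
)

# Left join keeps every Big Data Bowl play; the indicator column
# shows whether a matching nflverse row was found.
linked = bdb_plays.merge(
    verse,
    left_on=["gameId", "playId"],
    right_on=["old_game_id", "play_id"],
    how="left",
    indicator=True,
)
print(linked[["gameId", "playId", "qb_epa", "_merge"]])
```

Rows where `_merge` is `left_only` would flag plays with no nflverse match. If `old_game_id` is ever read as text, it would need casting to an integer before the merge.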
