# The 4th Down Equation: Analyzing the Interaction Between Offensive EPA and Defensive Pass-Prevention Tendencies

**Team Members:** Nathan Spear, Hayden Kellington, Shivam Sharma

**Date:** 02/11/2026

---

## Research Question

How does going for it on 4th down affect Expected Points Added (EPA) on the play, and does this effect differ based on a defense’s pass-prevention tendencies, measured by opponents’ average depth of target (aDOT) allowed and EPA allowed on deep passing plays?

**Expected Outcomes:**
- Quantifying the "Go for it" Impact: We aim to demonstrate that going for it on 4th down generally results in a positive Expected Points Added (EPA) compared to more conservative play-calling, particularly in short-yardage situations.


- Defensive Shell Sensitivity: We test whether offensive success on 4th down is significantly lower against defenses with a low average depth of target (aDOT) allowed, a profile that suggests a "bend-but-don't-break" or deep-pass-prevention style.


- The "Deep-Pass EPA" Correlation: We want to test the hypothesis that teams facing a defense with a high Deep-Pass EPA allowed will see a higher marginal benefit (EPA gain) when going for it on 4th down passing plays.


- Interaction Effects: We aim to demonstrate that the decision to go for it is not "one size fits all" but is statistically dependent on defensive tendencies, showing that an offense's EPA outcome on 4th down varies predictably when interacting with a defense's historical pass-prevention metrics.


- Data-Driven Decision Making: Ultimately, we hope to provide a framework that moves beyond "offensive confidence" to show that specific defensive metrics derived from play-by-play data (such as aDOT allowed and deep-pass EPA allowed) should be primary factors in a team’s 4th-down strategy.
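
One way to make the interaction hypothesis concrete is a linear model in which EPA depends on the go-for-it decision, the defense's aDOT allowed, and their product. The sketch below is only an illustration of that idea on synthetic data: the variable names (`go`, `def_adot`, `adot_c`) and the simulated coefficients are assumptions made for the example, not estimates from NFL data and not necessarily the model this project will use.

```python
import numpy as np

# Synthetic illustration only: none of these numbers come from NFL data.
rng = np.random.default_rng(42)
n = 2000
go = rng.integers(0, 2, size=n).astype(float)      # 1 = went for it, 0 = kicked
def_adot = rng.normal(loc=8.0, scale=1.0, size=n)  # hypothetical aDOT allowed by the defense
adot_c = def_adot - def_adot.mean()                # centered so the "go" effect is at average aDOT

# Assumed coefficients for the simulation: intercept, go, aDOT, go x aDOT
true_beta = np.array([0.0, 0.30, 0.05, 0.20])

# Design matrix with an explicit interaction column
X = np.column_stack([np.ones(n), go, adot_c, go * adot_c])
epa = X @ true_beta + rng.normal(scale=1.0, size=n)

# Ordinary least squares fit
beta_hat, *_ = np.linalg.lstsq(X, epa, rcond=None)
for name, b in zip(["intercept", "go", "adot_c", "go:adot_c"], beta_hat):
    print(f"{name:>10}: {b: .3f}")
```

A non-zero coefficient on the `go:adot_c` column is what "the effect of going for it depends on the defense" means in this setup. With real data, a regression library that reports standard errors would be needed to judge significance.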

---

## Data Source

**Dataset Name:** nflverse (specifically nflfastR play-by-play data)

**Link:** https://github.com/nflverse/nflfastR-data

**Description:** 
A comprehensive dataset containing play-by-play information for every NFL game. It includes advanced metrics like Expected Points Added (EPA), Win Probability (WP), and specific play details such as air yards, pass location, and personnel.
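
As a brief reminder of what the target metric measures, EPA is the change in expected points (EP) produced by a play, from the offense's perspective. On a scoring play, the points scored take the place of the post-play EP:

$$
\text{EPA} = \text{EP}_{\text{after}} - \text{EP}_{\text{before}}
$$

For example, the first fourth-down row in the Data Loading output is a made field goal with `ep` = 2.864426 and `epa` = 0.135574, which is consistent with

$$
3 - 2.864426 = 0.135574
$$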

- Number of observations: Approximately 45,000 to 50,000 plays per regular season. Filtering to 4th downs leaves 8,569 plays across the 2024 and 2025 seasons loaded in this notebook.


- Number of features: 397 columns per play in the loaded data, covering everything from game state (time, score, yard line) to advanced efficiency metrics.

Key variables:


- epa: Expected Points Added on the play (Target Variable).


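- go_for_it (derived): Binary indicator (1 if the team went for it, 0 otherwise). The loaded play-by-play columns include fourth_down_converted and fourth_down_failed but no fourth_down_attempt column, so this indicator must be derived, for example from play_type.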


- air_yards: Distance in yards from the line of scrimmage to the intended target, measured in the air (negative for throws behind the line); used to calculate aDOT.


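- pass_length / pass_location: pass_length (short or deep) is used to identify deep passing plays; pass_location (left, middle, or right) records the side of the field targeted.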


- defteam: The defensive team, used to aggregate defensive tendency metrics.


- yardline_100: Distance to the opponent's end zone.

- Time period covered: nflverse play-by-play data begins with the 1999 season. This project uses the 2024 and 2025 seasons.


- Data collection method: Scraped and cleaned play-by-play data sourced directly from the NFL’s Game Statistics and Information System (GSIS).
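
The key variables above can be combined into the two defensive tendency metrics named in the research question. The following is a minimal sketch, assuming aDOT allowed is the mean `air_yards` on pass attempts against a defense and deep-pass EPA allowed is the mean `epa` on plays where `pass_length` is `deep`. The helper `defensive_tendencies` and the small `passes` table are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical pass plays; values are made up for illustration.
passes = pd.DataFrame({
    "defteam":      ["BUF", "BUF", "BUF", "ARI", "ARI", "ARI"],
    "pass_attempt": [1, 1, 1, 1, 1, 1],
    "air_yards":    [4.0, 22.0, 10.0, 2.0, 6.0, 31.0],
    "pass_length":  ["short", "deep", "short", "short", "short", "deep"],
    "epa":          [0.2, 1.5, -0.4, 0.1, -0.3, -0.9],
})

def defensive_tendencies(pbp: pd.DataFrame) -> pd.DataFrame:
    """Per-defense aDOT allowed and mean EPA allowed on deep passes."""
    # Keep pass attempts that have a recorded target depth
    targets = pbp[(pbp["pass_attempt"] == 1) & pbp["air_yards"].notna()]
    adot = targets.groupby("defteam")["air_yards"].mean().rename("adot_allowed")
    # Restrict to plays flagged as deep for the second metric
    deep = targets[targets["pass_length"] == "deep"]
    deep_epa = deep.groupby("defteam")["epa"].mean().rename("deep_epa_allowed")
    return pd.concat([adot, deep_epa], axis=1)

tendencies = defensive_tendencies(passes)
print(tendencies)
```

When these metrics are joined back onto fourth-down plays, computing them only from games played before the play in question would keep the outcome being modeled from leaking into its own predictor.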


**Citation:** 
Carl, S., & Baldwin, B. (2024). nflfastR: Functions to Efficiently Access NFL Play by Play Data. https://www.nflverse.com/.

---

## Setup and Imports

In [2]:
# Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import scipy.stats as stats

!pip install nfl_data_py
import nfl_data_py as nfl

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

# For reproducibility
np.random.seed(42)

print("Imports successful!")

Defaulting to user installation because normal site-packages is not writeable
Collecting nfl_data_py
  Downloading nfl_data_py-0.3.3-py3-none-any.whl.metadata (12 kB)
Collecting pandas<2.0,>=1.0 (from nfl_data_py)
  Downloading pandas-1.5.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Collecting fastparquet>0.5 (from nfl_data_py)
  Downloading fastparquet-2025.12.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (4.4 kB)
Collecting cramjam>=2.3 (from fastparquet>0.5->nfl_data_py)
  Downloading cramjam-2.11.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.6 kB)
Downloading nfl_data_py-0.3.3-py3-none-any.whl (13 kB)
Downloading fastparquet-2025.12.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (1.8 MB)

---

## Data Loading

Load the 2024 and 2025 play-by-play data, filter to fourth-down plays, and perform an initial inspection.

In [6]:
# Load play-by-play data for the 2024 and 2025 seasons
df = nfl.import_pbp_data([2024, 2025])

# Keep only fourth-down plays; .copy() avoids chained-assignment warnings later
df_4th = df[df['down'] == 4].copy()
print(f"Successfully loaded {len(df_4th)} fourth-down plays!")

# Display basic information
print(f"Dataset shape: {df_4th.shape}")
print("\nFirst few rows:")
display(df_4th.head())

2024 done.
2025 done.
Downcasting floats.
Successfully loaded 8569 fourth-down plays!
Dataset shape: (8569, 397)

First few rows:


Selected columns from the 397-column output:

| index | play_id | game_id | posteam | defteam | yardline_100 | ydstogo | play_type | yards_gained | ep | epa |
|---|---|---|---|---|---|---|---|---|---|---|
| 31 | 823.0 | 2024_01_ARI_BUF | ARI | BUF | 11.0 | 4.0 | field_goal | 0.0 | 2.864426 | 0.135574 |
| 43 | 1120.0 | 2024_01_ARI_BUF | BUF | ARI | 19.0 | 19.0 | field_goal | 0.0 | 2.52625 | 0.47375 |
| 93 | 2360.0 | 2024_01_ARI_BUF | ARI | BUF | 74.0 | 14.0 | punt | 0.0 | -1.852098 | 0.077613 |
| 97 | 2468.0 | 2024_01_ARI_BUF | BUF | ARI | 66.0 | 7.0 | punt | 0.0 | -0.675595 | 0.210434 |
| 103 | 2632.0 | 2024_01_ARI_BUF | BUF | ARI | 12.0 | 1.0 | run | 1.0 | 3.894738 | 1.343717 |
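
The first rows above mix field goals, punts, and a run, so a go-for-it indicator has to be derived before any comparison of EPA by decision. The sketch below shows one possible labeling rule based on `play_type`. The rule itself is an assumption (run or pass counts as going for it, punt or field goal as kicking, and everything else is left unlabeled), and the `fourth` table uses made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical fourth-down plays; epa values are made up for illustration.
fourth = pd.DataFrame({
    "play_type": ["field_goal", "punt", "run", "pass", "no_play"],
    "epa":       [0.14, 0.08, 1.34, -2.10, 0.50],
})

GO_TYPES = {"run", "pass"}           # assumption: the offense tried to convert
KICK_TYPES = {"punt", "field_goal"}  # assumption: the conservative choices

def label_decision(play_type: pd.Series) -> pd.Series:
    """Return 1.0 for go-for-it, 0.0 for a kick, NaN for anything else."""
    out = pd.Series(np.nan, index=play_type.index, dtype="float64")
    out[play_type.isin(GO_TYPES)] = 1.0
    out[play_type.isin(KICK_TYPES)] = 0.0
    return out

fourth["go_for_it"] = label_decision(fourth["play_type"])

# Drop plays that are neither a go-for-it attempt nor a kick (e.g. penalties)
decisions = fourth.dropna(subset=["go_for_it"])
print(decisions.groupby("go_for_it")["epa"].agg(["count", "mean"]))
```

Leaving ambiguous plays as NaN rather than forcing them into a category keeps penalties and kneel-downs out of both groups. Whether fake punts and aborted kicks need separate handling is a choice the real analysis would have to make.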


In [7]:
# Examine dataset structure: column count, dtypes, and memory usage
print("Dataset Info:")
df_4th.info()

# Summary statistics for the numeric columns
print("\n" + "="*50)
print("Summary Statistics:")
display(df_4th.describe())

# Data type of every column
print("\n" + "="*50)
print("Data Types:")
display(df_4th.dtypes)

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
Index: 8569 entries, 31 to 98253
Columns: 397 entries, play_id to defense_numbers
dtypes: float32(205), int32(7), int64(1), object(184)
memory usage: 19.1+ MB

Summary Statistics:


Unnamed: 0,play_id,week,yardline_100,quarter_seconds_remaining,half_seconds_remaining,game_seconds_remaining,quarter_end,drive,sp,qtr,down,goal_to_go,ydstogo,ydsnet,yards_gained,shotgun,no_huddle,qb_dropback,qb_kneel,qb_spike,qb_scramble,air_yards,yards_after_catch,kick_distance,home_timeouts_remaining,away_timeouts_remaining,timeout,posteam_timeouts_remaining,defteam_timeouts_remaining,total_home_score,total_away_score,posteam_score,defteam_score,score_differential,posteam_score_post,defteam_score_post,score_differential_post,no_score_prob,opp_fg_prob,opp_safety_prob,opp_td_prob,fg_prob,safety_prob,td_prob,extra_point_prob,two_point_conversion_prob,ep,epa,total_home_epa,total_away_epa,total_home_rush_epa,total_away_rush_epa,total_home_pass_epa,total_away_pass_epa,air_epa,yac_epa,comp_air_epa,comp_yac_epa,total_home_comp_air_epa,total_away_comp_air_epa,total_home_comp_yac_epa,total_away_comp_yac_epa,total_home_raw_air_epa,total_away_raw_air_epa,total_home_raw_yac_epa,total_away_raw_yac_epa,wp,def_wp,home_wp,away_wp,wpa,vegas_wpa,vegas_home_wpa,home_wp_post,away_wp_post,vegas_wp,vegas_home_wp,total_home_rush_wpa,total_away_rush_wpa,total_home_pass_wpa,total_away_pass_wpa,air_wpa,yac_wpa,comp_air_wpa,comp_yac_wpa,total_home_comp_air_wpa,total_away_comp_air_wpa,total_home_comp_yac_wpa,total_away_comp_yac_wpa,total_home_raw_air_wpa,total_away_raw_air_wpa,total_home_raw_yac_wpa,total_away_raw_yac_wpa,punt_blocked,first_down_rush,first_down_pass,first_down_penalty,third_down_converted,third_down_failed,fourth_down_converted,fourth_down_failed,incomplete_pass,touchback,interception,punt_inside_twenty,punt_in_endzone,punt_out_of_bounds,punt_downed,punt_fair_catch,kickoff_inside_twenty,kickoff_in_endzone,kickoff_out_of_bounds,kickoff_downed,kickoff_fair_catch,fumble_forced,fumble_not_forced,fumble_out_of_bounds,solo_tackle,safety,penalty,tackled_for_loss,fumble_lost,own_kickoff_recovery,own_kickoff_recovery_td,qb_hit,rush_attempt,pass_attempt,sack,touchdown,pass_touchdown,ru
sh_touchdown,return_touchdown,extra_point_attempt,two_point_attempt,field_goal_attempt,kickoff_attempt,punt_attempt,fumble,complete_pass,assist_tackle,lateral_reception,lateral_rush,lateral_return,lateral_recovery,passing_yards,receiving_yards,rushing_yards,lateral_receiving_yards,lateral_rushing_yards,tackle_with_assist,fumble_recovery_1_yards,fumble_recovery_2_yards,return_yards,penalty_yards,replay_or_challenge,defensive_two_point_attempt,defensive_two_point_conv,defensive_extra_point_attempt,defensive_extra_point_conv,season,cp,cpoe,series,series_success,order_sequence,play_deleted,special_teams_play,fixed_drive,drive_play_count,drive_first_downs,drive_inside20,drive_ended_with_score,drive_quarter_start,drive_quarter_end,drive_yards_penalized,drive_play_id_started,drive_play_id_ended,away_score,home_score,result,total,spread_line,total_line,div_game,temp,wind,aborted_play,success,passer_jersey_number,rusher_jersey_number,receiver_jersey_number,pass,rush,first_down,special,play,jersey_number,out_of_bounds,home_opening_kickoff,qb_epa,xyac_epa,xyac_mean_yardage,xyac_median_yardage,xyac_success,xyac_fd,xpass,pass_oe,defenders_in_box,number_of_pass_rushers,n_offense,n_defense,ngs_air_yards,time_to_throw
count,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8541.0,8569.0,8569.0,8549.0,8569.0,8569.0,8569.0,999.0,566.0,6248.0,8569.0,8569.0,8541.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,998.0,998.0,8541.0,8541.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8564.0,8564.0,8569.0,8489.0,8489.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,998.0,998.0,8541.0,8541.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8569.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,8541.0,566.0,566.0,668.0,5.0,1.0,8541.0,122.0,0.0,8541.0,929.0,8569.0,8541.0,8541.0,8541.0,8541.0,8569.0,972.0,972.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,8569.0,5580.0,5580.0,8569.0,8569.0,1226.0,681.0,1098.0,8569.0,8569.0,8541.0,8569.0,8569.0,1838.0,8569.0,8569.0,8569.0,791.0,791.0,791.0,791.0,791.0,2296.0,1838.0,8549.0,8549.0,8567.0,8567.0,0.0,1056.0
mean,2217.985352,9.946552,47.471699,403.379395,781.801636,1664.47168,0.0,11.047497,0.232583,2.603804,4.0,0.059867,7.631462,26.023457,0.962651,0.158828,0.02474,0.134519,0.000583,0.0,0.008052,10.056056,3.985866,44.716709,2.55374,2.539736,0.002107,2.555491,2.537986,11.755164,10.931964,10.297001,11.620609,-1.323608,11.039911,11.647217,-0.607305,0.147461,0.126965,0.002087,0.19944,0.354521,0.003345,0.163847,0.0,0.0,0.436353,0.054041,-0.089658,0.089658,-0.339657,0.339657,-0.235375,0.235375,1.992313,-1.713068,0.108531,0.062282,-0.30405,0.30405,-0.129466,0.129466,-0.782431,0.782431,0.456312,-0.456312,0.471039,0.528961,0.540845,0.459155,0.003925,0.005752,0.00059,0.54284,0.457042,0.468528,0.527969,-0.020069,0.020069,-0.023491,0.023491,0.047536,-0.038822,0.00502,-0.000996,-0.007447,0.007447,-0.014586,0.014586,-0.010951,0.010951,-0.012564,0.012564,0.002576,0.055965,0.058775,0.016392,0.0,0.0,0.114741,0.090388,0.045077,0.039094,0.00562,0.197167,0.000351,0.05058,0.057487,0.132654,0.0,0.0,0.0,0.0,0.0,0.004566,0.011474,0.001756,0.247629,0.000351,0.108769,0.006791,0.005971,0.0,0.0,0.027046,0.078211,0.127034,0.010069,0.024119,0.014167,0.005503,0.003278,0.0,0.0,0.244351,0.0,0.487179,0.015923,0.066269,0.08348,0.000585,0.000117,0.0,0.000937,11.899293,11.736749,3.238024,17.0,5.0,0.012411,1.409836,,2.147641,7.544672,0.007469,0.0,0.0,0.0,0.0,2024.500642,0.562714,1.959059,27.941183,0.132454,2217.981445,0.0,0.485588,11.022757,6.608356,1.584666,0.244953,0.314622,2.510328,2.612907,-1.093593,2041.953979,2236.816406,21.382892,23.267826,1.884934,44.650718,1.47561,44.423912,0.345781,58.143547,7.945878,0.001517,0.551173,9.631321,19.662262,32.785065,0.143074,0.07142,0.130196,0.72914,0.267943,13.371599,0.065819,0.523515,0.054041,1.041579,4.615991,2.558786,0.892437,0.892438,0.648972,2.597676,1.338519,0.628495,11.0007,10.984125,,2.908494
std,1262.482666,5.602199,24.115273,265.885742,533.047852,1047.310791,0.0,6.480571,0.422499,1.129239,0.0,0.237254,5.743312,25.178387,4.391374,0.365535,0.155337,0.341216,0.02415,0.0,0.089376,9.991424,5.841841,10.667015,0.795331,0.802871,0.045861,0.780684,0.817049,10.006568,9.413441,9.479876,9.841817,10.425443,9.586317,9.848873,10.605558,0.221643,0.084625,0.002781,0.141895,0.322738,0.003413,0.11726,0.0,0.0,1.658266,1.760754,12.600397,12.600397,5.647017,5.647017,10.528348,10.528348,2.490145,3.731405,0.768429,0.535228,6.100604,6.100604,6.246104,6.246104,9.422334,9.422334,12.195885,12.195885,0.294513,0.294513,0.293101,0.293101,0.063162,0.061224,0.061473,0.293888,0.29387,0.318216,0.318543,0.160764,0.160764,0.252621,0.252621,0.22742,0.237323,0.065264,0.065447,0.183721,0.183721,0.259002,0.259002,0.233415,0.233415,0.340673,0.340673,0.050687,0.229877,0.23521,0.126982,0.0,0.0,0.318734,0.286758,0.207489,0.193826,0.07476,0.397869,0.018742,0.219157,0.23278,0.339209,0.0,0.0,0.0,0.0,0.0,0.067421,0.106507,0.041871,0.431648,0.018741,0.311377,0.082129,0.077044,0.0,0.0,0.16223,0.268507,0.333022,0.099844,0.153423,0.118183,0.073983,0.057168,0.0,0.0,0.429718,0.0,0.499846,0.125181,0.24877,0.276612,0.02419,0.01082,0.0,0.030593,9.5648,9.403234,6.471609,12.509996,,0.110711,10.816834,,6.876728,4.138555,0.086107,0.0,0.0,0.0,0.0,0.500029,0.17074,46.531933,16.267923,0.338988,1262.487793,0.0,0.499842,6.464931,3.631732,1.617728,0.430094,0.464396,1.136141,1.129809,6.804781,1261.534058,1267.788452,9.379467,9.959643,14.044799,13.307194,5.804098,4.131613,0.47565,17.995192,4.259469,0.038922,0.497397,5.287752,18.810759,33.716732,0.350159,0.257541,0.336542,0.444423,0.442903,13.104755,0.247981,0.499476,1.760754,1.340334,1.982297,1.795572,0.217038,0.217039,0.334542,32.8339,2.641616,1.585507,0.06833,0.138306,,1.303301
min,120.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,4.0,0.0,1.0,-35.0,-34.0,0.0,0.0,0.0,0.0,0.0,0.0,-11.0,-2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-46.0,0.0,0.0,-46.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-3.566449,-11.448006,-49.984192,-57.706867,-26.954866,-25.150322,-47.86475,-48.468239,-5.501755,-10.995464,-4.347542,-0.606222,-30.07353,-23.872591,-27.928616,-29.372158,-45.642963,-58.451015,-66.813263,-61.069199,5.1e-05,7.2e-05,5.1e-05,5.6e-05,-0.798678,-0.798678,-0.798678,0.0,0.0,5e-06,7e-06,-1.180774,-0.750705,-1.164552,-0.91291,-0.976861,-0.974364,-0.938268,-0.974364,-1.362034,-1.644913,-1.949073,-1.799456,-1.963236,-1.884989,-2.434113,-2.29543,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-4.0,-4.0,-34.0,0.0,5.0,0.0,-1.0,,-10.0,0.0,0.0,0.0,0.0,0.0,0.0,2024.0,0.153086,-87.35775,1.0,0.0,120.0,0.0,0.0,1.0,3.0,0.0,0.0,0.0,1.0,1.0,-50.0,38.0,120.0,0.0,0.0,-38.0,9.0,-16.5,32.5,0.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-11.448006,-0.492335,0.531096,0.0,0.043952,0.043952,0.064369,-99.435692,0.0,0.0,10.0,9.0,,0.6
25%,1107.0,5.0,28.0,160.0,285.0,725.0,0.0,5.0,0.0,2.0,4.0,0.0,3.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,38.0,2.0,2.0,0.0,2.0,2.0,3.0,3.0,3.0,3.0,-7.0,3.0,3.0,-7.0,0.007969,0.042015,0.000511,0.050179,0.12483,0.000572,0.050162,0.0,0.0,-0.788434,-0.508501,-7.289443,-6.615476,-3.486079,-2.803197,-6.148725,-5.337955,1.685926,-5.679723,0.0,0.0,-3.939864,-3.101908,-3.69855,-3.342233,-6.017684,-4.475548,-6.110551,-6.801548,0.236624,0.305993,0.320486,0.226114,-0.015028,-0.013249,-0.018537,0.331188,0.226497,0.184579,0.246698,-0.10728,-0.072565,-0.190292,-0.152419,0.0,-0.064984,0.0,0.0,-0.06074,-0.039913,-0.126155,-0.105061,-0.063716,-0.042373,-0.158941,-0.131884,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,5.0,0.0,10.0,5.0,0.0,0.0,,0.0,5.0,0.0,0.0,0.0,0.0,0.0,2024.0,0.426402,-44.389366,14.0,0.0,1107.0,0.0,0.0,5.0,3.0,0.0,0.0,0.0,1.0,2.0,-5.0,928.0,1130.0,15.0,17.0,-6.0,36.0,-3.0,41.5,0.0,45.0,5.0,0.0,0.0,6.0,7.0,7.0,0.0,0.0,0.0,0.0,0.0,6.0,0.0,0.0,-0.508501,0.268027,3.489671,1.0,0.976941,0.976941,0.260508,-16.457098,0.0,0.0,11.0,11.0,,2.1
50%,2220.0,10.0,50.0,386.0,764.0,1713.0,0.0,11.0,0.0,3.0,4.0,0.0,6.0,19.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,1.5,46.0,3.0,3.0,0.0,3.0,3.0,10.0,10.0,7.0,10.0,0.0,10.0,10.0,0.0,0.039475,0.141084,0.001619,0.215247,0.162505,0.002526,0.170916,0.0,0.0,0.033692,0.073676,-0.358674,0.358674,-0.250488,0.250488,-0.519722,0.519722,2.680659,0.0,0.0,0.0,-0.252454,0.252454,0.0,0.0,-0.615899,0.615899,0.074568,-0.074568,0.461073,0.538927,0.564737,0.435263,0.000393,0.000701,3.8e-05,0.566792,0.433208,0.446268,0.545186,-0.013127,0.013127,-0.011121,0.011121,0.0,0.0,0.0,0.0,0.0,0.0,-0.005391,0.005391,0.0,0.0,-0.003234,0.003234,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.5,9.0,2.0,21.0,5.0,0.0,0.0,,0.0,5.0,0.0,0.0,0.0,0.0,0.0,2025.0,0.582401,24.800581,28.0,0.0,2220.0,0.0,0.0,11.0,6.0,1.0,0.0,0.0,3.0,3.0,0.0,2065.0,2235.0,21.0,23.0,2.0,44.0,2.5,44.5,0.0,60.0,7.0,0.0,1.0,9.0,16.0,17.0,0.0,0.0,0.0,1.0,0.0,10.0,0.0,1.0,0.073676,0.378523,4.312233,2.0,0.998945,0.998962,0.801019,2.280158,0.0,0.0,11.0,11.0,,2.6
75%,3313.0,15.0,66.0,630.0,1241.0,2559.0,0.0,16.0,0.0,4.0,4.0,0.0,10.0,45.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,14.0,6.0,52.0,3.0,3.0,0.0,3.0,3.0,17.0,17.0,16.0,17.0,4.0,17.0,17.0,5.0,0.194371,0.193363,0.002965,0.327949,0.675531,0.005004,0.255335,0.0,0.0,1.957857,0.762431,6.615476,7.289443,2.803197,3.486079,5.337955,6.148725,3.684535,0.230419,0.0,0.0,3.101908,3.939864,3.342233,3.69855,4.475548,6.017684,6.801548,6.110551,0.694007,0.763376,0.773886,0.679514,0.025821,0.026754,0.020128,0.773503,0.668691,0.743519,0.810084,0.072565,0.10728,0.152419,0.190292,0.01875,0.012735,0.0,0.0,0.039913,0.06074,0.105061,0.126155,0.042373,0.063716,0.131884,0.158941,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,16.0,16.0,4.0,21.0,5.0,0.0,0.0,,0.0,10.0,0.0,0.0,0.0,0.0,0.0,2025.0,0.708076,38.025158,42.0,0.0,3313.0,0.0,1.0,16.0,9.0,3.0,0.0,1.0,4.0,4.0,0.0,3125.0,3333.0,28.0,30.0,10.0,52.0,5.5,47.5,1.0,72.0,10.0,0.0,1.0,15.0,27.0,81.0,0.0,0.0,0.0,1.0,1.0,17.0,0.0,1.0,0.762431,0.934838,5.411363,3.0,1.0,1.0,0.955742,11.10696,0.0,0.0,11.0,11.0,,3.3
max,5321.0,22.0,99.0,900.0,1785.0,3577.0,0.0,29.0,1.0,5.0,4.0,1.0,40.0,99.0,55.0,1.0,1.0,1.0,1.0,0.0,1.0,49.0,41.0,84.0,3.0,3.0,1.0,3.0,3.0,55.0,51.0,52.0,55.0,46.0,52.0,55.0,46.0,1.0,0.375928,0.095494,0.470153,0.997602,0.053067,0.682404,0.0,0.0,5.120186,7.264766,57.706867,49.984192,25.150322,26.954866,48.468239,47.86475,7.038276,7.679814,5.991952,7.679814,23.872591,30.07353,29.372158,27.928616,58.451015,45.642963,61.069199,66.813263,0.999928,0.999949,0.999944,0.999949,0.627027,0.627027,0.738177,1.0,1.0,0.999991,0.999995,0.750705,1.180774,0.91291,1.164552,0.975666,1.0,0.975666,0.938521,1.644913,1.362034,1.799456,1.949073,1.884989,1.963236,2.29543,2.434113,1.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,52.0,52.0,55.0,33.0,5.0,1.0,102.0,,96.0,34.0,1.0,0.0,0.0,0.0,0.0,2025.0,0.873578,79.368149,69.0,1.0,5321.0,0.0,1.0,29.0,21.0,9.0,1.0,1.0,5.0,5.0,61.0,5160.0,5321.0,51.0,55.0,46.0,90.0,19.5,56.5,1.0,93.0,22.0,1.0,1.0,39.0,89.0,89.0,1.0,1.0,1.0,1.0,1.0,89.0,1.0,1.0,7.264766,5.749337,14.706789,11.0,1.0,1.0,0.997794,91.723885,11.0,9.0,16.0,14.0,,11.5



Data Types:


play_id              float32
game_id               object
old_game_id_x         object
home_team             object
away_team             object
                      ...   
defense_names         object
offense_positions     object
defense_positions     object
offense_numbers       object
defense_numbers       object
Length: 397, dtype: object

**Initial Observations:**

- Noise & Complexity: With 397 columns, most data (like jersey numbers or lateral stats) is irrelevant and needs to be dropped to focus on EPA and defensive metrics.

- Missing Values (Sparsity): Metrics like air_yards are missing in ~60% of rows because they only apply to passes. We have to filter for pass_attempt == 1 before calculating aDOT to avoid skewed results.

- Data Integrity: Roughly 1,000 rows are missing EPA (likely timeouts or penalties) and should be removed. Binary indicators such as fourth_down_converted are stored as floats, which is still valid input for regression.

- Range Validation: EPA values (-12.7 to +8.5) and yard lines (1 to 99) are within expected NFL ranges. The 4th-down conversion rate is currently low (~0.6%) because the data includes all downs; this will normalize once filtered for 4th-down attempts only.
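
The filtering described in these observations can be sketched as follows. This is a minimal illustration on a tiny made-up frame, not our actual pipeline: the column names (`down`, `defteam`, `pass_attempt`, `air_yards`, `epa`, `fourth_down_converted`, `fourth_down_failed`) follow nflfastR conventions, while `toy`, `adot_allowed`, and `conversion_rate` are hypothetical names and every value is invented.

```python
import numpy as np
import pandas as pd

# Toy play-by-play rows (invented values, nflfastR-style column names).
toy = pd.DataFrame({
    "down": [1, 2, 4, 4, 4, 3],
    "defteam": ["KC", "KC", "KC", "BUF", "BUF", "BUF"],
    "pass_attempt": [1, 0, 1, 1, 0, 1],
    "air_yards": [8.0, np.nan, 12.0, 3.0, np.nan, 5.0],
    "epa": [0.4, np.nan, 1.9, -1.2, 0.7, 0.1],
    "fourth_down_converted": [0.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    "fourth_down_failed": [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
})

# Drop rows with no EPA, as noted under Data Integrity.
plays = toy.dropna(subset=["epa"])

# aDOT allowed per defense: pass attempts with recorded air yards only.
passes = plays[(plays["pass_attempt"] == 1) & plays["air_yards"].notna()]
adot_allowed = passes.groupby("defteam")["air_yards"].mean()

# Conversion rate among actual 4th-down attempts (converted or failed).
attempts = plays[
    (plays["down"] == 4)
    & ((plays["fourth_down_converted"] == 1) | (plays["fourth_down_failed"] == 1))
]
conversion_rate = attempts["fourth_down_converted"].mean()

print(adot_allowed)
print(f"4th-down conversion rate: {conversion_rate:.1%}")
```

Restricting the denominator to plays flagged as converted or failed is what keeps punts, field goals, and other downs from diluting the rate.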

---

## Data Cleaning

**TODO:** Clean and preprocess the data

### Missing Values Analysis

In [None]:
# TODO: Check for missing values
if df is not None:
    missing = df.isnull().sum()
    missing_pct = (missing / len(df)) * 100
    
    missing_df = pd.DataFrame({
        'Missing Count': missing,
        'Percentage': missing_pct
    }).sort_values('Percentage', ascending=False)
    
    print("Missing Values Summary:")
    display(missing_df[missing_df['Missing Count'] > 0])
    
    # Visualize missing data
    if missing.sum() > 0:
        plt.figure(figsize=(10, 6))
        missing_df[missing_df['Missing Count'] > 0]['Percentage'].plot(kind='barh')
        plt.xlabel('Percentage Missing')
        plt.title('Missing Values by Column')
        plt.tight_layout()
        plt.show()

In [None]:
# TODO: Handle missing values
# Strategy options:
# 1. Drop rows: df = df.dropna()
# 2. Drop columns: df = df.drop(columns=['col_name'])
# 3. Fill with mean/median: df['col'] = df['col'].fillna(df['col'].mean())
# 4. Fill with mode: df['col'] = df['col'].fillna(df['col'].mode()[0])
# 5. Forward/backward fill: df = df.ffill()  (or df.bfill(); fillna(method=...) is deprecated in pandas 2.x)

# df_clean = df.copy()
# TODO: Implement your cleaning strategy here
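
As one possible way to apply options 1 and 2 to this dataset, the sketch below keeps a short list of analysis columns and drops rows that lack EPA, in line with the Initial Observations. The frame `df_toy`, the list `keep_cols`, and all values are assumptions for illustration; `jersey_number` simply stands in for the many columns we do not need.

```python
import numpy as np
import pandas as pd

# Toy frame (invented values); one row is missing EPA.
df_toy = pd.DataFrame({
    "epa": [0.5, np.nan, -1.1, 2.0],
    "air_yards": [7.0, np.nan, np.nan, 15.0],
    "defteam": ["DAL", "DAL", "PHI", "PHI"],
    "jersey_number": [12, 88, 11, 4],
})

# Hypothetical shortlist of columns relevant to the research question.
keep_cols = ["epa", "air_yards", "defteam"]

# Option 2 (drop columns), then option 1 restricted to the EPA column.
df_clean = df_toy[keep_cols].dropna(subset=["epa"]).reset_index(drop=True)

print(df_clean.isna().sum())
```

Here `air_yards` is deliberately left as NaN on non-pass plays rather than filled, since an imputed value would enter any later aDOT average.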

### Duplicate Detection

In [None]:
# TODO: Check for duplicates
if df is not None:
    duplicates = df.duplicated().sum()
    print(f"Number of duplicate rows: {duplicates}")
    
    if duplicates > 0:
        print("\nDuplicate rows:")
        display(df[df.duplicated(keep=False)])
        
        # TODO: Decide whether to keep or remove duplicates
        # df_clean = df_clean.drop_duplicates()

### Data Type Conversions

In [None]:
# TODO: Convert data types as needed
# Examples:
# df_clean['date_column'] = pd.to_datetime(df_clean['date_column'])
# df_clean['category_column'] = df_clean['category_column'].astype('category')
# df_clean['numeric_column'] = pd.to_numeric(df_clean['numeric_column'], errors='coerce')

pass

### Outlier Detection

In [None]:
# TODO: Detect outliers in numeric columns
# Common methods:
# 1. IQR method
# 2. Z-score method
# 3. Visual inspection with box plots

# Example: Box plots for numeric columns
if df is not None:
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    
    if len(numeric_cols) > 0:
        # TODO: Create box plots for numeric columns
        # fig, axes = plt.subplots(len(numeric_cols), 1, figsize=(10, 3*len(numeric_cols)))
        # for i, col in enumerate(numeric_cols):
        #     df.boxplot(column=col, ax=axes[i])
        # plt.tight_layout()
        # plt.show()
        pass
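
The IQR method listed above can be sketched on a single numeric column. This is a minimal example on invented EPA values, using the conventional 1.5 × IQR fences; the variable names are hypothetical.

```python
import pandas as pd

# Invented EPA values with one extreme play.
epa = pd.Series([-0.5, 0.1, 0.3, 0.4, 0.6, 0.9, 6.5], name="epa")

# Quartiles and interquartile range.
q1, q3 = epa.quantile([0.25, 0.75])
iqr = q3 - q1

# Conventional 1.5 * IQR fences.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag values outside the fences for inspection.
outliers = epa[(epa < lower) | (epa > upper)]
print(outliers)
```

Flagged rows are candidates for inspection rather than automatic removal, because a large EPA swing may be a legitimate play.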

### Feature Engineering (Optional)

In [None]:
# TODO: Create new features if needed
# Examples:
# - Combine existing features
# - Extract date components (year, month, day of week)
# - Bin continuous variables
# - Encode categorical variables

pass
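
For this project, two candidate features are a "go for it" indicator and its interaction with a defensive metric. The sketch below assumes that going for it means a 4th-down play whose `play_type` is `pass` or `run`; that definition, the names `go_for_it`, `def_adot_allowed`, and `go_x_adot`, and all values are assumptions for illustration.

```python
import pandas as pd

# Toy 4th-down plays (invented).
fourth = pd.DataFrame({
    "play_type": ["pass", "punt", "run", "field_goal"],
    "defteam": ["SF", "SF", "NYJ", "NYJ"],
})

# Hypothetical per-defense aDOT allowed, computed elsewhere (invented values).
adot_allowed = pd.Series({"SF": 7.5, "NYJ": 9.0})

# 1 if the offense ran a play instead of kicking, else 0.
fourth["go_for_it"] = fourth["play_type"].isin(["pass", "run"]).astype(int)

# Attach the defensive metric to each play by defense.
fourth["def_adot_allowed"] = fourth["defteam"].map(adot_allowed)

# Interaction term: the defensive metric only "switches on" for go-for-it plays.
fourth["go_x_adot"] = fourth["go_for_it"] * fourth["def_adot_allowed"]

print(fourth)
```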

In [None]:
# TODO: Save cleaned dataset (optional)
# df_clean.to_csv('data/cleaned_data.csv', index=False)
# print("Cleaned data saved!")

**Cleaning Summary:**

TODO: Document what cleaning steps were performed and why:
- Missing values: [strategy used]
- Duplicates: [action taken]
- Outliers: [how handled]
- Feature engineering: [new features created]

---

## Exploratory Data Analysis

**TODO:** Explore the data to understand patterns, relationships, and distributions

### Univariate Analysis

In [None]:
# TODO: Analyze distributions of individual variables

# Numeric variables - histograms and density plots
# if df_clean is not None:
#     numeric_cols = df_clean.select_dtypes(include=[np.number]).columns
#     
#     for col in numeric_cols:
#         fig, axes = plt.subplots(1, 2, figsize=(12, 4))
#         
#         df_clean[col].hist(bins=30, ax=axes[0], edgecolor='black')
#         axes[0].set_title(f'Histogram of {col}')
#         axes[0].set_xlabel(col)
#         
#         df_clean[col].plot(kind='density', ax=axes[1])
#         axes[1].set_title(f'Density Plot of {col}')
#         axes[1].set_xlabel(col)
#         
#         plt.tight_layout()
#         plt.show()

pass

In [None]:
# Categorical variables - bar charts
# if df_clean is not None:
#     categorical_cols = df_clean.select_dtypes(include=['object', 'category']).columns
#     
#     for col in categorical_cols:
#         plt.figure(figsize=(10, 5))
#         df_clean[col].value_counts().plot(kind='bar', edgecolor='black')
#         plt.title(f'Distribution of {col}')
#         plt.xlabel(col)
#         plt.ylabel('Count')
#         plt.xticks(rotation=45)
#         plt.tight_layout()
#         plt.show()

pass

### Bivariate Analysis

In [None]:
# TODO: Explore relationships between pairs of variables

# Correlation matrix for numeric variables
# if df_clean is not None:
#     numeric_cols = df_clean.select_dtypes(include=[np.number]).columns
#     
#     if len(numeric_cols) > 1:
#         plt.figure(figsize=(10, 8))
#         correlation_matrix = df_clean[numeric_cols].corr()
#         sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, 
#                     square=True, linewidths=1)
#         plt.title('Correlation Matrix')
#         plt.tight_layout()
#         plt.show()

pass

In [None]:
# TODO: Scatter plots for key variable pairs
# plt.figure(figsize=(8, 6))
# plt.scatter(df_clean['var1'], df_clean['var2'], alpha=0.5)
# plt.xlabel('Variable 1')
# plt.ylabel('Variable 2')
# plt.title('Relationship between Var1 and Var2')
# plt.show()

pass

In [None]:
# TODO: Group comparisons
# if df_clean is not None:
#     # Example: Compare numeric variable across categories
#     # df_clean.groupby('category_col')['numeric_col'].describe()
#     
#     # Box plot by group
#     # plt.figure(figsize=(10, 6))
#     # sns.boxplot(x='category_col', y='numeric_col', data=df_clean)
#     # plt.xticks(rotation=45)
#     # plt.tight_layout()
#     # plt.show()
    
pass

### Multivariate Analysis

In [None]:
# TODO: Explore relationships among multiple variables

# Pair plot for selected variables
# if df_clean is not None:
#     # Select key columns for pair plot
#     # key_cols = ['col1', 'col2', 'col3', 'target']
#     # sns.pairplot(df_clean[key_cols], hue='target')
#     # plt.show()
    
pass

**EDA Findings:**

TODO: Summarize key insights from your exploratory analysis:
- What are the main patterns in the data?
- Are there any unexpected findings?
- Which variables seem most relevant to your research question?
- Are there any data quality issues that need addressing?

---

## Modeling and Analysis

**TODO:** Build and evaluate models to answer your research question

### Data Preparation for Modeling

In [None]:
# TODO: Prepare data for modeling
# from sklearn.model_selection import train_test_split
# from sklearn.preprocessing import StandardScaler, LabelEncoder

# Define features and target
# X = df_clean[['feature1', 'feature2', 'feature3']]
# y = df_clean['target']

# Split data
# X_train, X_test, y_train, y_test = train_test_split(
#     X, y, test_size=0.2, random_state=42
# )

# Scale features (if needed)
# scaler = StandardScaler()
# X_train_scaled = scaler.fit_transform(X_train)
# X_test_scaled = scaler.transform(X_test)

pass

### Model 1: [Model Name]

**TODO:** Describe the model and why you chose it

In [None]:
# TODO: Train your first model
# from sklearn.linear_model import LogisticRegression
# from sklearn.ensemble import RandomForestClassifier
# from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# model = LogisticRegression(random_state=42)
# model.fit(X_train_scaled, y_train)

pass
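
The template cell above imports classifiers, but EPA is a continuous outcome. One option that matches the research question is a linear model with an interaction term, EPA ~ go + aDOT + go × aDOT. The sketch below fits that form by ordinary least squares with NumPy on synthetic, noise-free data, so the known coefficients are recovered exactly. The data, coefficients, and variable names are all invented, and this is not a result from our dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors: go-for-it flag and defensive aDOT allowed.
go = rng.integers(0, 2, n).astype(float)
adot = rng.normal(8.0, 1.0, n)

# Synthetic EPA built from known coefficients (no noise, for illustration).
true_beta = np.array([0.2, 0.5, -0.1, 0.3])
X = np.column_stack([np.ones(n), go, adot, go * adot])
epa = X @ true_beta

# Ordinary least squares; beta[3] is the interaction coefficient.
beta, *_ = np.linalg.lstsq(X, epa, rcond=None)
print(dict(zip(["intercept", "go", "adot", "go_x_adot"], beta.round(3))))
```

In this form, the coefficient on `go_x_adot` measures how the EPA effect of going for it changes per unit of the defensive metric, which is the quantity the research question asks about.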

In [None]:
# TODO: Make predictions and evaluate
# y_pred = model.predict(X_test_scaled)

# print("Model Performance:")
# print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
# print("\nClassification Report:")
# print(classification_report(y_test, y_pred))

# Confusion Matrix
# cm = confusion_matrix(y_test, y_pred)
# plt.figure(figsize=(8, 6))
# sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
# plt.title('Confusion Matrix')
# plt.ylabel('True Label')
# plt.xlabel('Predicted Label')
# plt.show()

pass

### Model 2: [Model Name]

**TODO:** Describe your second model approach

In [None]:
# TODO: Train and evaluate second model

pass

### Model Comparison

In [None]:
# TODO: Compare model performance
# Create a comparison table or visualization

# results_df = pd.DataFrame({
#     'Model': ['Model 1', 'Model 2'],
#     'Accuracy': [acc1, acc2],
#     'Precision': [prec1, prec2],
#     'Recall': [rec1, rec2],
#     'F1-Score': [f1_1, f1_2]
# })
# display(results_df)

pass

### Feature Importance (Optional)

In [None]:
# TODO: Analyze feature importance (if applicable)
# if hasattr(model, 'feature_importances_'):
#     importance_df = pd.DataFrame({
#         'Feature': X.columns,
#         'Importance': model.feature_importances_
#     }).sort_values('Importance', ascending=False)
#     
#     plt.figure(figsize=(10, 6))
#     plt.barh(importance_df['Feature'], importance_df['Importance'])
#     plt.xlabel('Importance')
#     plt.title('Feature Importance')
#     plt.gca().invert_yaxis()
#     plt.tight_layout()
#     plt.show()

pass

**Modeling Results:**

TODO: Summarize your modeling findings:
- Which model performed best and why?
- What are the most important predictors?
- Are there any limitations or concerns with the models?
- Do the results answer your research question?

---

## Conclusions and Future Work

**TODO:** Summarize your project and findings

### Key Findings

TODO: List your main discoveries:
1. 
2. 
3. 

### Limitations

TODO: What are the limitations of this analysis?
- 
- 

### Future Work

TODO: What could be done to extend or improve this analysis?
- 
- 

### Recommendations

TODO: Based on your findings, what actions or decisions do you recommend?
- 
- 

---

## References

TODO: List all data sources, papers, and resources used:

1. 
2. 
3. 