# NFL Big Data Bowl 2024: Tackling

This year's competition sets a general goal: create metrics that assign value to elements of tackling. You can access the NFL’s Next Gen Stats data as in previous competitions. This year's player tracking data covers Weeks 1-9 of the 2022 NFL season. The data show the location, speed, and acceleration of all 22 players on the field, along with the football's location. Additional PFF scouting data and NFL advanced stats such as expected points and win probability are also included.

Your challenge is generating actionable, practical, and novel insights from player tracking data corresponding to tackling. Examples include, but are not limited to:

- Predictions of tackle time, probability, and/or location
- Tackle range: angle of pursuit, speed and acceleration, closing speed
- Player evaluation (e.g., yards saved, tackle value, missed tackles)
- Credit assignment (e.g., one player makes a tackle because of another player's actions, blocks shed, area of influence)
- Tackle type (solo vs. gang, open field vs. in the trenches, etc.)
- Team and player roles and responsibilities (setting the edge, filling gaps, etc.)

_Coaching presentation track._ This track aims to analyze and present data in a submission designed for coaches. We encourage participants interested in this track to partner with a coach (or current/former player), though this isn’t required.


**Dashboard Objectives:**

**Defensive Stop Probability (American football):** This metric estimates the likelihood that a defender makes a successful tackle, based on the defender's location, speed, and acceleration relative to the ball carrier. The same idea could be adapted to soccer by redefining "tackle" as "defensive intervention" and using the equivalent location, speed, and acceleration data.

1. **Evaluate Defensive Player Performance in Crucial Tackling Situations:** The primary objective of the dashboard is to enable coaches to assess how defensive players perform in critical tackling situations in American football. These situations are defined based on the likelihood of a player making a successful tackle. Coaches should be able to understand which defensive players excel in specific tackle situations and who may need improvement.

2. **Enhance In-Game Decision-Making:** Coaches should be able to use the dashboard during American football games to make real-time decisions about player rotations and strategic adjustments. For example, they might review how specific defensive players perform in critical tackle situations and decide whether to make substitutions.

3. **Identify Strengths and Weaknesses:** The dashboard should help coaches identify the strengths and weaknesses of defensive players in different tackle situations. Coaches should be able to view metrics that highlight areas where a player excels and areas for improvement when it comes to making successful tackles.

**Dashboard Features:**

1. **Tackle Situation Selection:** Coaches should be able to choose from a range of tackle situations for analysis. These situations are defined based on the likelihood of a player making a successful tackle. Coaches can select specific scenarios that align with critical defensive moments during a game.

2. **Player Selection:** Coaches should have the option to select specific defensive players or player groups (e.g., positions) for analysis. This allows coaches to focus on individual players or assess entire units in terms of their tackle performance.

3. **Metric Display:** The dashboard should display key metrics related to defensive stop probability. Metrics might include tackle success rates, historical performance trends in specific tackle situations, and comparative data for selected players or tackle scenarios.

4. **Interactive Visualizations:** The dashboard should feature interactive visualizations that allow coaches to explore the data. Visual elements may include charts, heatmaps, tables, and trend graphs that provide insights into defensive stop probabilities.

**Defensive Stop Probability Components:**

Player Characteristics: The metric considers the physical attributes and capabilities of the defensive player, such as speed and acceleration. These attributes influence the defender's ability to reach and tackle the ball carrier.

Player Location: The location of the defensive player relative to the ball carrier is a critical factor. The metric assesses how close the defender is to the ball carrier and whether the defender is in a favorable position to make a tackle.

Proximity to the Ball Carrier: The metric evaluates how close the defensive player is to the ball carrier at a specific point in time, using the distance between the two players and their relative trajectories.

Tackle Success Prediction: The Defensive Stop Probability metric uses statistical models to predict the probability that a tackle succeeds. The prediction is based on historical data, player attributes, and the current in-game situation.

Game Situation: The metric takes into account the context of the game, including the down, distance to the first down or goal, field position, and other situational variables. Different game situations can affect the probability of a successful tackle.

## Data Pre-processing

In [5]:
import pandas as pd
import numpy as np

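# Load the competition CSV files into pandas DataFrames
games_df = pd.read_csv("games.csv")
player_df = pd.read_csv("players.csv")
plays_df = pd.read_csv("plays.csv")
tackles_df = pd.read_csv("tackles.csv")
# low_memory=False reads the large tracking file in one pass so column dtypes are inferred consistently
week1_df = pd.read_csv("tracking_week_1.csv", low_memory=False)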

In [6]:
games_df.head()

Unnamed: 0,gameId,season,week,gameDate,gameTimeEastern,homeTeamAbbr,visitorTeamAbbr,homeFinalScore,visitorFinalScore
0,2022090800,2022,1,09/08/2022,20:20:00,LA,BUF,10,31
1,2022091100,2022,1,09/11/2022,13:00:00,ATL,NO,26,27
2,2022091101,2022,1,09/11/2022,13:00:00,CAR,CLE,24,26
3,2022091102,2022,1,09/11/2022,13:00:00,CHI,SF,19,10
4,2022091103,2022,1,09/11/2022,13:00:00,CIN,PIT,20,23


In [7]:
player_df.head()

Unnamed: 0,nflId,height,weight,birthDate,collegeName,position,displayName
0,25511,6-4,225,1977-08-03,Michigan,QB,Tom Brady
1,29550,6-4,328,1982-01-22,Arkansas,T,Jason Peters
2,29851,6-2,225,1983-12-02,California,QB,Aaron Rodgers
3,30842,6-6,267,1984-05-19,UCLA,TE,Marcedes Lewis
4,33084,6-4,217,1985-05-17,Boston College,QB,Matt Ryan


In [8]:
plays_df.head()

Unnamed: 0,gameId,playId,ballCarrierId,ballCarrierDisplayName,playDescription,quarter,down,yardsToGo,possessionTeam,defensiveTeam,yardlineSide,yardlineNumber,gameClock,preSnapHomeScore,preSnapVisitorScore,passResult,passLength,penaltyYards,prePenaltyPlayResult,playResult,playNullifiedByPenalty,absoluteYardlineNumber,offenseFormation,defendersInTheBox,passProbability,preSnapHomeTeamWinProbability,preSnapVisitorTeamWinProbability,homeTeamWinProbabilityAdded,visitorTeamWinProbilityAdded,expectedPoints,expectedPointsAdded,foulName1,foulName2,foulNFLId1,foulNFLId2
0,2022100908,3537,48723,Parker Hesse,(7:52) (Shotgun) M.Mariota pass short middle t...,4,1,10,ATL,TB,ATL,41,7:52,21,7,C,6.0,,9,9,N,69,SHOTGUN,7.0,0.747284,0.976785,0.023215,-0.00611,0.00611,2.360609,0.981955,,,,
1,2022091103,3126,52457,Chase Claypool,(7:38) (Shotgun) C.Claypool right end to PIT 3...,4,1,10,PIT,CIN,PIT,34,7:38,14,20,,,,3,3,N,76,SHOTGUN,7.0,0.416454,0.160485,0.839515,-0.010865,0.010865,1.733344,-0.263424,,,,
2,2022091111,1148,42547,Darren Waller,(8:57) D.Carr pass short middle to D.Waller to...,2,2,5,LV,LAC,LV,30,8:57,10,3,C,11.0,,15,15,N,40,I_FORM,6.0,0.267933,0.756661,0.243339,-0.037409,0.037409,1.312855,1.133666,,,,
3,2022100212,2007,46461,Mike Boone,(13:12) M.Boone left tackle to DEN 44 for 7 ya...,3,2,10,DEN,LV,DEN,37,13:12,19,16,,,,7,7,N,47,SINGLEBACK,6.0,0.592704,0.620552,0.379448,-0.002451,0.002451,1.641006,-0.04358,,,,
4,2022091900,1372,47857,Devin Singletary,(8:33) D.Singletary right guard to TEN 32 for ...,2,1,10,BUF,TEN,TEN,35,8:33,7,7,,,,3,3,N,75,I_FORM,7.0,0.470508,0.83629,0.16371,0.001053,-0.001053,3.686428,-0.167903,,,,


In [9]:
tackles_df.head()

Unnamed: 0,gameId,playId,nflId,tackle,assist,forcedFumble,pff_missedTackle
0,2022090800,101,42816,1,0,0,0
1,2022090800,393,46232,1,0,0,0
2,2022090800,486,40166,1,0,0,0
3,2022090800,646,47939,1,0,0,0
4,2022090800,818,40107,1,0,0,0


In [10]:
week1_df.head()

Unnamed: 0,gameId,playId,nflId,displayName,frameId,time,jerseyNumber,club,playDirection,x,y,s,a,dis,o,dir,event
0,2022090800,56,35472.0,Rodger Saffold,1,2022-09-08 20:24:05.200000,76.0,BUF,left,88.37,27.27,1.62,1.15,0.16,231.74,147.9,
1,2022090800,56,35472.0,Rodger Saffold,2,2022-09-08 20:24:05.299999,76.0,BUF,left,88.47,27.13,1.67,0.61,0.17,230.98,148.53,pass_arrived
2,2022090800,56,35472.0,Rodger Saffold,3,2022-09-08 20:24:05.400000,76.0,BUF,left,88.56,27.01,1.57,0.49,0.15,230.98,147.05,
3,2022090800,56,35472.0,Rodger Saffold,4,2022-09-08 20:24:05.500000,76.0,BUF,left,88.64,26.9,1.44,0.89,0.14,232.38,145.42,
4,2022090800,56,35472.0,Rodger Saffold,5,2022-09-08 20:24:05.599999,76.0,BUF,left,88.72,26.8,1.29,1.24,0.13,233.36,141.95,


## Feature Engineering

In [None]:
# Distance to the Ball Carrier: Calculate the distance between the defender and the ball carrier.
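
One possible way to fill in this cell is sketched below. It assumes the column names shown in the `head()` outputs above (`ballCarrierId` and `defensiveTeam` in the plays table; `club`, `x`, `y`, `frameId` in the tracking table). The helper name `add_distance_to_ball_carrier`, the output columns `bc_x`, `bc_y`, `dist_to_bc`, and the toy rows are all hypothetical and exist only to make the example self-contained.

```python
import numpy as np
import pandas as pd


def add_distance_to_ball_carrier(tracking: pd.DataFrame, plays: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (frame, defender) with the Euclidean distance to the ball carrier."""
    keys = ["gameId", "playId"]
    # Attach the ball carrier id and the defensive team to every tracking row of the play.
    merged = tracking.merge(plays[keys + ["ballCarrierId", "defensiveTeam"]], on=keys, how="inner")
    # The ball carrier's own rows provide the reference position in each frame.
    is_carrier = merged["nflId"] == merged["ballCarrierId"]
    carrier = merged.loc[is_carrier, keys + ["frameId", "x", "y"]]
    carrier = carrier.rename(columns={"x": "bc_x", "y": "bc_y"})
    out = merged.merge(carrier, on=keys + ["frameId"], how="inner")
    # Keep defenders only; this also drops the football rows and offensive teammates.
    out = out[out["club"] == out["defensiveTeam"]].copy()
    out["dist_to_bc"] = np.hypot(out["x"] - out["bc_x"], out["y"] - out["bc_y"])
    return out


# Toy data (not real tracking rows): one frame, one ball carrier, one defender, one teammate.
toy_tracking = pd.DataFrame({
    "gameId": [1, 1, 1],
    "playId": [10, 10, 10],
    "nflId": [100.0, 200.0, 300.0],
    "frameId": [1, 1, 1],
    "club": ["BUF", "LA", "BUF"],
    "x": [50.0, 53.0, 40.0],
    "y": [20.0, 24.0, 20.0],
})
toy_plays = pd.DataFrame({
    "gameId": [1], "playId": [10], "ballCarrierId": [100], "defensiveTeam": ["LA"],
})

result = add_distance_to_ball_carrier(toy_tracking, toy_plays)
print(result[["nflId", "dist_to_bc"]])
```

With the real files, the call would presumably be `add_distance_to_ball_carrier(week1_df, plays_df)`. Distances are in yards, the unit of the tracking coordinates.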


In [None]:
# Relative Speed: Compute the relative speed between the defender and the ball carrier, indicating the closing speed.
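
A minimal sketch of closing speed follows, defined here as the rate at which the gap between defender and ball carrier shrinks (positive when the defender is gaining). It assumes a per-frame table that already pairs each defender with the ball carrier, using hypothetical column names (`def_x`, `def_s`, `def_dir`, `bc_x`, and so on). It also assumes the tracking `dir` angle is measured in degrees clockwise from the positive y-axis, which should be checked against the competition's data dictionary.

```python
import numpy as np
import pandas as pd


def velocity_components(speed, direction_deg):
    """Split speed into (vx, vy), assuming dir is degrees clockwise from the +y axis."""
    theta = np.radians(direction_deg)
    return speed * np.sin(theta), speed * np.cos(theta)


def closing_speed(pairs: pd.DataFrame) -> pd.Series:
    """Relative velocity projected onto the defender-to-carrier line (yards/second)."""
    dvx, dvy = velocity_components(pairs["def_s"], pairs["def_dir"])
    bvx, bvy = velocity_components(pairs["bc_s"], pairs["bc_dir"])
    # Vector from the defender to the ball carrier.
    dx = pairs["bc_x"] - pairs["def_x"]
    dy = pairs["bc_y"] - pairs["def_y"]
    # Avoid division by zero when both players share the same coordinates.
    dist = np.hypot(dx, dy).replace(0, np.nan)
    return ((dvx - bvx) * dx + (dvy - bvy) * dy) / dist


# Toy data (hypothetical): a stationary carrier 10 yards away along +x.
# Row 0: defender runs straight at the carrier; row 1: defender runs directly away.
pairs = pd.DataFrame({
    "def_x": [0.0, 0.0], "def_y": [0.0, 0.0],
    "def_s": [5.0, 5.0], "def_dir": [90.0, 270.0],
    "bc_x": [10.0, 10.0], "bc_y": [0.0, 0.0],
    "bc_s": [0.0, 0.0], "bc_dir": [0.0, 0.0],
})
pairs["closing_speed"] = closing_speed(pairs)
print(pairs["closing_speed"])
```

Projecting onto the line between the players, rather than subtracting the two scalar speeds, keeps a defender running parallel to the ball carrier from being credited with closing the gap.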

In [None]:
# Relative Acceleration: Calculate the relative acceleration, which indicates how quickly the defender is accelerating toward the ball carrier.
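
The tracking column `a` is a scalar, so one hedged option is to approximate relative acceleration as the frame-to-frame change in closing speed. The sketch below assumes a hypothetical `closing_speed` column already exists for each defender and frame, and that frames are 0.1 seconds apart, as the timestamps in the tracking sample suggest. The function and output column names are illustrative.

```python
import pandas as pd

FRAME_SECONDS = 0.1  # assumed frame spacing, based on the sample timestamps


def add_closing_acceleration(df: pd.DataFrame) -> pd.DataFrame:
    """Finite-difference estimate of how fast closing speed is changing (yards/second^2)."""
    df = df.sort_values(["gameId", "playId", "nflId", "frameId"]).copy()
    per_defender = df.groupby(["gameId", "playId", "nflId"])["closing_speed"]
    # diff() is NaN on the first frame of each defender's play, which has no previous frame.
    df["closing_accel"] = per_defender.diff() / FRAME_SECONDS
    return df


# Toy data (hypothetical): one defender over three consecutive frames.
toy = pd.DataFrame({
    "gameId": [1, 1, 1],
    "playId": [10, 10, 10],
    "nflId": [200.0, 200.0, 200.0],
    "frameId": [1, 2, 3],
    "closing_speed": [1.0, 1.5, 2.5],
})
accel = add_closing_acceleration(toy)
print(accel[["frameId", "closing_accel"]])
```

Finite differences amplify tracking noise, so smoothing over a few frames may be worth considering before using this feature in a model.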

In [None]:
# Historical Tackle Success Rate: Compute each player's historical tackle success rate based on past data.
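
A minimal sketch using the columns in the tackles table (`tackle`, `assist`, `pff_missedTackle`). The definition of success rate used here, (tackles + assists) / (tackles + assists + missed tackles), is an assumption, not a definition given by the competition. The function name and toy rows are hypothetical.

```python
import pandas as pd


def tackle_success_rate(tackles: pd.DataFrame) -> pd.DataFrame:
    """Aggregate tackle outcomes per player and compute an assumed success rate."""
    per_player = tackles.groupby("nflId")[["tackle", "assist", "pff_missedTackle"]].sum()
    made = per_player["tackle"] + per_player["assist"]
    attempts = made + per_player["pff_missedTackle"]
    per_player["attempts"] = attempts
    # Leave the rate undefined (NaN) for players with no recorded attempts.
    per_player["success_rate"] = (made / attempts).where(attempts > 0)
    return per_player.reset_index()


# Toy data (hypothetical): player 1 has a tackle, a miss, and an assist; player 2 has one miss.
toy_tackles = pd.DataFrame({
    "nflId": [1, 1, 1, 2],
    "tackle": [1, 0, 0, 0],
    "assist": [0, 0, 1, 0],
    "pff_missedTackle": [0, 1, 0, 1],
})
rates = tackle_success_rate(toy_tackles)
print(rates)
```

Aggregating over the whole table lets a play's own outcome leak into its "historical" feature. For modeling, a version that only uses games before the current one would be safer.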

In [None]:
# Game Situation Variables: Incorporate game-specific factors like down, distance, field position, and other relevant variables that may affect tackle success.
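
The sketch below shows one way to turn play-level columns into model inputs. It uses columns that appear in the plays table above (`quarter`, `down`, `yardsToGo`, `absoluteYardlineNumber`, `defendersInTheBox`, `gameClock`, `offenseFormation`). The choice of features, the derived column `quarter_seconds_left`, and the function name are illustrative assumptions. The sketch also assumes `gameClock` is never missing.

```python
import pandas as pd


def game_situation_features(plays: pd.DataFrame) -> pd.DataFrame:
    """Build a per-play table of situational features."""
    base_cols = ["gameId", "playId", "quarter", "down", "yardsToGo",
                 "absoluteYardlineNumber", "defendersInTheBox"]
    feats = plays[base_cols].copy()
    # gameClock is "M:SS"; convert it to seconds remaining in the quarter.
    mins_secs = plays["gameClock"].str.split(":", expand=True).astype(int)
    feats["quarter_seconds_left"] = mins_secs[0] * 60 + mins_secs[1]
    # One-hot encode the offensive formation.
    formation = pd.get_dummies(plays["offenseFormation"], prefix="formation")
    return pd.concat([feats, formation], axis=1)


# Toy rows modeled on the plays.head() output above (a subset of columns).
toy_plays = pd.DataFrame({
    "gameId": [2022100908, 2022100212],
    "playId": [3537, 2007],
    "quarter": [4, 3],
    "down": [1, 2],
    "yardsToGo": [10, 10],
    "absoluteYardlineNumber": [69, 47],
    "defendersInTheBox": [7.0, 6.0],
    "gameClock": ["7:52", "13:12"],
    "offenseFormation": ["SHOTGUN", "SINGLEBACK"],
})
feats = game_situation_features(toy_plays)
print(feats)
```

These play-level features can then be joined to frame-level features on `gameId` and `playId`.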

## Model Development

Choose an appropriate model for predicting Defensive Stop Probability. Some common options are:
- Logistic Regression: A simple model suitable for binary classification problems. It can predict the probability of a successful tackle.
- Decision Trees and Random Forests: Decision trees can capture non-linear relationships in the data, while random forests provide an ensemble approach for improved accuracy.
- Gradient Boosting: Algorithms like XGBoost or LightGBM are powerful for classification tasks and can be fine-tuned for better performance.
- Deep Learning Models: Neural networks, such as feedforward or convolutional neural networks (CNNs), can capture complex relationships in the data.
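
As a starting point, the first option above could be sketched with scikit-learn as follows. The data here are synthetic, generated from a made-up relationship between distance, closing speed, and tackle outcome, so the coefficients and the resulting score say nothing about real tackling. The feature names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Synthetic features standing in for engineered ones (assumption, not real data).
dist_to_bc = rng.uniform(0, 15, n)   # yards
closing_speed = rng.normal(2, 3, n)  # yards/second

# Made-up ground truth: closer, faster-closing defenders tackle more often.
logit = 1.5 - 0.4 * dist_to_bc + 0.3 * closing_speed
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
X = np.column_stack([dist_to_bc, closing_speed])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardize features, then fit a logistic regression baseline.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Column 1 of predict_proba is the estimated probability of a successful tackle.
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"Held-out ROC AUC on synthetic data: {auc:.3f}")
```

With real tracking data, rows from the same play or game are correlated, so a grouped split (for example by `gameId`) would give a more honest estimate than a random split. A baseline like this also gives the tree-based and neural models something to be compared against.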