<div style="text-align: center; font-size: 30px; text-decoration: underline;">Defensive Stopping Power: Considering Yards Allowed After Every Contact</div>
<div style="text-align: center; font-size: 20px;" markdown="1">Undergraduate Track Submission</div>
<div style="text-align: center; font-size: 20px;" markdown="1">Allan Paiz</div> 
<div style="text-align: center; font-size: 20px;" markdown="1">University of South Carolina</div>
<div style="text-align: center; font-size: 20px;" markdown="1">apaiz@email.sc.edu</div>

# 1. Introduction
Traditional defensive evaluations in football often rely on basic metrics like tackle and assist counts, which do not fully capture a defensive player's overall impact. Despite advances in technology that make complex ideas easier to explore, the development of a baseline contact model has been overlooked. This project aims to establish that foundational model.

I started by developing a framework to identify **Potential Impact Defenders (PID)**: defenders likely to come into contact with the ball carrier. Building upon this, the project introduces an **Artificial Neural Network (ANN)**, named **The Contact Model (TCM)**, which identifies every instance of defender-ball carrier contact during a play. Using The Contact Model, two metrics are created: **Yards Allowed After Contact (YAAC)** and **Defensive Stopping Power (DSP)**. These metrics mirror the concept of Yards After Contact, used for ball carriers, but apply it to defensive players, calculating YAAC from the moment of contact until the play concludes.

This project is centered around two principal objectives:

1. **Developing The Contact Model**: This involves creating a model capable of identifying contact between the ball carrier and defenders throughout a play.
2. **Leveraging The Contact Model**: Utilizing this model, the paper introduces metrics such as Yards Allowed After Contact (YAAC) and Defensive Stopping Power (DSP). 

# 2. Motivation
In the early stages of developing a tackle metric, I encountered a significant roadblock: accurately identifying all instances of defender-ball carrier contact within the available data sets. While this task seems straightforward when watching a game or analyzing game footage, it proves to be far more complex when working solely with tracking data. This complexity initially stalled my original ideas.
So the primary goal became to discern any *meaningful* defender-ball carrier contact during a play. One silver lining in the tracking data is the *first_contact* flag within the *event* column. However, a closer examination revealed several limitations of the *first_contact* flag:
1. **Lack of Defender Identification**: While this flag indicates the initial contact, it fails to specify the defender involved.
2. **Inconsistent Definition** (see *Figure 1*): The criteria for what constitutes *first_contact* vary widely between plays. The variation reflects the nature of tackles in football, ranging from minimal touches like fingertip contacts to full body extensions, bear hugs, and everything in between.
3. **No Subsequent Contacts**: The flag only marks the first instance of contact and overlooks any subsequent interactions. This limitation significantly undermines the ability to represent the dynamics of a play.

<div style="width: 75%; text-align: center; margin-top: 1em; margin-bottom: 2em; margin: 0 auto;">
  <img src="https://raw.githubusercontent.com/allanpaiz/Defensive_Stopping_Power/main/figures/first_contact.gif">
</div>
<div style="text-align: center;" markdown="1"> <strong>Figure 1.</strong> Showing the variety of <em>first_contact</em> triggers. </div>

# 3. Developing The Contact Model
**The Contact Model (TCM)** is an **Artificial Neural Network (ANN)** that identifies and classifies defenders based on their potential impact on the ball carrier. Using this classification, TCM pinpoints moments of defender-ball carrier contact throughout a play.

### 3.1 Potential Impact Defender 
To enhance the *first_contact* flag, I needed to identify defenders who were in contact with the ball carrier and possessed the characteristics to make *meaningful* contact, that is, contact that resulted in some form of recognition, such as a box score credit. **Potential Impact Defenders (PID)** are those defenders who have a high likelihood of being in contact and influencing the ball carrier's progress during a play. The primary criteria for determining a PID are:
1. **Proximity to the Ball Carrier**: Identifying the closest defender at the time of *first_contact*, using Euclidean distance.
2. **Box Score Association**: The closest defender must be credited in the box score or datasets with a tackle, assist, forced fumble, or missed tackle, demonstrating actual involvement in the play.
- Defenders meeting only the proximity criterion, without a direct association with the play's outcome, are labeled as **Non-Impact Defenders (NID)**. (Note: The NID classification is only relevant in the context of *first_contact*.)
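
The two criteria above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function name, the input layout, and the credit labels in `CREDITED_EVENTS` are hypothetical stand-ins for whatever the tracking and box score data provide.

```python
import math

# Hypothetical labels for box-score credits that count as involvement.
CREDITED_EVENTS = {"tackle", "assist", "forced_fumble", "missed_tackle"}


def label_first_contact_defender(ball_carrier, defenders, credits):
    """Label the closest defender at first_contact as 'PID' or 'NID'.

    ball_carrier: (x, y) position at the first_contact frame.
    defenders:    dict of defender id -> (x, y) at the same frame.
    credits:      dict of defender id -> set of credits earned on the play.
    Returns (closest defender id, label).
    """
    bx, by = ball_carrier
    # Criterion 1: closest defender by Euclidean distance.
    closest = min(
        defenders,
        key=lambda d: math.hypot(defenders[d][0] - bx, defenders[d][1] - by),
    )
    # Criterion 2: that defender must also be credited on the play.
    credited = bool(credits.get(closest, set()) & CREDITED_EVENTS)
    return closest, ("PID" if credited else "NID")


defenders = {"D1": (31.0, 25.0), "D2": (30.4, 26.1), "D3": (36.0, 20.0)}
# D2 is closest and credited with a tackle, so D2 is a PID.
print(label_first_contact_defender((30.0, 26.0), defenders, {"D2": {"tackle"}}))
# D2 is still closest, but only D1 was credited, so D2 is a NID.
print(label_first_contact_defender((30.0, 26.0), defenders, {"D1": {"tackle"}}))
```

Note that only the closest defender receives a label in this sketch, which matches the note that NID is only meaningful at *first_contact*.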

### 3.2 Feature & Model Motivation
Inspired by Dmitry Gordeev and Philipp Singer, the 2020 Big Data Bowl winners with their *The Zoo* solution, I decided to integrate each defender's spatial and velocity data relative to the ball carrier. This model aligns with the concept that 'defenders are trying to tackle the ball carrier,' making 'the whole setup much more straightforward.' [[1](#a)]
The model assesses each defender's relationship with the ball carrier independently, disregarding interactions between defenders and other offensive players. I trained the model on every frame flagged as *first_contact* in the tracking data, focusing on predicting the likelihood of a PID.
I chose an ANN for its ability to ‘learn anything’ [[2](#b)], which suits the complexity of the data and the myriad ways two opposing players can come into contact (see *Figure 2*).

<div style="width: 100%; text-align: center; margin-top: 1em; margin-bottom: 2em; margin: 0 auto;">
  <img src="https://raw.githubusercontent.com/allanpaiz/Defensive_Stopping_Power/main/figures/features.png">
</div>
<div style="text-align: center;" markdown="1"> <strong>Figure 2.</strong> Spatial and Velocity Heatmaps, data used as features for the model.</div>

### 3.3 The Neural Network
Developing The Contact Model involved the following steps:
- **Data Integration**: Combining all the tracking data to train the model on as many variations as possible.
- **Focus on First Contact**: The initial filtering process centered on frames where *first_contact* occurred.
- **Standardization of Data**:  Standardizing the direction of tracking data for input simplification.
- **Assigning Ball Carrier Data**: Each defender was assigned the spatial data of the play's ball carrier.
- **PID Identification and Labeling**: PIDs were identified and labeled, based on the predefined criteria.
- **Velocity Vectors and Differences**: Computing and assigning velocity vectors and differences to each defender.
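
The standardization and relative-feature steps above might look roughly like the sketch below. It is an illustration under stated assumptions rather than the project's pipeline: the column names (`x`, `y`, `s`, `dir`), the 120 by 53.3 yard coordinate system, and the convention that `dir` is measured in degrees clockwise from the positive y-axis are assumptions about the tracking data, and the exact feature set used by TCM may differ.

```python
import math

FIELD_LENGTH = 120.0  # yards, including both end zones (assumed)
FIELD_WIDTH = 53.3    # yards (assumed)


def standardize(row, play_direction):
    """Flip a tracking row so the offense always moves toward increasing x."""
    if play_direction == "left":
        return {
            **row,
            "x": FIELD_LENGTH - row["x"],
            "y": FIELD_WIDTH - row["y"],
            "dir": (row["dir"] + 180.0) % 360.0,
        }
    return dict(row)


def velocity(row):
    """Split speed into (vx, vy); assumes dir is degrees clockwise from +y."""
    rad = math.radians(row["dir"])
    return row["s"] * math.sin(rad), row["s"] * math.cos(rad)


def relative_features(defender, carrier):
    """Defender position and velocity expressed relative to the ball carrier."""
    dvx, dvy = velocity(defender)
    cvx, cvy = velocity(carrier)
    dx = defender["x"] - carrier["x"]
    dy = defender["y"] - carrier["y"]
    return {
        "dx": dx,
        "dy": dy,
        "dist": math.hypot(dx, dy),
        "dvx": dvx - cvx,
        "dvy": dvy - cvy,
    }


# Hypothetical frame from a play moving left; both rows are flipped first.
carrier = standardize({"x": 80.0, "y": 20.0, "s": 6.0, "dir": 270.0}, "left")
defender = standardize({"x": 77.0, "y": 24.0, "s": 4.0, "dir": 90.0}, "left")
features = relative_features(defender, carrier)
print(features)
```

Each defender-ball carrier pair produces one such feature row, which keeps the input independent of the other players on the field.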

Because only *first_contact* frames were targeted, the dataset comprised only defender-ball carrier relationships. Trained on 80% of the data, the model achieved 96.48% accuracy in identifying PIDs on the test split.
The neural network draws inspiration from *The Zoo's* Convolutional Neural Network, but I opted for a more straightforward approach, creating a feed-forward neural network. The network includes the following:
- **Single Hidden Layer**: Utilizing the ReLU activation function for its simplicity.
- **Loss Function**: Binary Crossentropy, aligning with the binary nature of the target variable.
- **Optimizer**: 'Adam' was selected, considering the presumed noisiness of the data.
- **Output Layer**: A sigmoid activation function was used, translating the output into a probability.
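
To make the architecture concrete, here is a dependency-free sketch of the forward pass and loss for a network of this shape: one ReLU hidden layer, one sigmoid output, and binary cross-entropy. The layer sizes and weights are hypothetical and chosen by hand for illustration; the real model learns its weights during training, and the Adam optimization step is omitted here since a deep learning library would normally handle it.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden ReLU layer followed by a single sigmoid output unit."""
    hidden = [
        relu(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def binary_crossentropy(y_true, p, eps=1e-7):
    """Loss for one example; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


# Hypothetical hand-picked weights: 2 input features, 2 hidden units.
w_hidden = [[-1.0, 0.0], [0.0, -1.0]]
b_hidden = [1.0, 1.0]
w_out = [2.0, 2.0]
b_out = -2.0

# Small relative distances yield a high score, large ones a low score.
near = forward([0.1, 0.1], w_hidden, b_hidden, w_out, b_out)
far = forward([3.0, 3.0], w_hidden, b_hidden, w_out, b_out)
print(round(near, 3), round(far, 3))
print(binary_crossentropy(1, near) < binary_crossentropy(1, far))
```

The sigmoid output is what later sections treat as a probability-like score per defender.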

### 3.4 Predicted Contact Score (PCS)
Building on the success of TCM, I expanded its application to predict PIDs across all defender-ball carrier relationships in every frame of every play. This led to the introduction of the **Predicted Contact Score (PCS)**.
PCS is a metric generated by TCM for each defender in a play. It provides a distribution, indicating the likelihood of a defender being in contact with the ball carrier at any given moment during a play. This score offers a novel perspective on the game, quantifying the defensive effort in a way that traditional metrics can't capture.

The charts below showcase TCM and PCS in action (see *Figure 3*). Each line in these graphs represents a defender, and every peak within these lines signifies a moment of defender-ball carrier contact. No two graphs are the same; the diversity reflects the nature of football. Some graphs are dense with lines, while others end quickly. These visualizations are not just data; they're narratives of each play.


<div style="text-align: center;" markdown="1"> Using your imagination, can you spot the 'tush-push'? </div>

<div style="width: 100%; text-align: center; margin-top: 1em; margin-bottom: 2em; margin: 0 auto;">
  <img src="https://raw.githubusercontent.com/allanpaiz/Defensive_Stopping_Power/main/figures/pcs_distro.png">
</div>
<div style="text-align: center;" markdown="1"> <strong>Figure 3.</strong> A visual representation showing the variation in PCS distributions per play.</div>


# 4. Analysis & Uses
The true innovation of The Contact Model lies in its ability to do what we instinctively do with our eyes but in a more structured and quantifiable manner (see *Figure 4*). The play shown is a great example as it includes a little bit of everything. It starts with an extended arm triggering *first_contact*, continues with two consecutive missed tackles, and finishes with a combined tackle to end the play.
<div style="width: 100%; text-align: center; margin-top: 1em; margin-bottom: 2em; margin: 0 auto;">
  <img src="https://raw.githubusercontent.com/allanpaiz/Defensive_Stopping_Power/main/figures/graphs.gif">
</div>
<div style="text-align: center;" markdown="1"> <strong>Figure 4.</strong> PCS animation with the corresponding play animation. </div>



### 4.1 Yards Allowed After Contact (YAAC)
Utilizing the PCS distribution, the **Predicted Contact Moment (PCM)** is identified as the peak score within this distribution. **YAAC** is a metric designed to assign responsibility for the play's outcome to any defender identified by The Contact Model. The steps to calculate YAAC are as follows:
1. **Spot the Moment of Contact**: The location of the contact on the field, denoted as *contact_X*, is determined using the PCM. 
    - Note: I only evaluated PCMs that passed my 25% threshold. This is an arbitrary value, chosen by stabilizing the distribution of all PCS values.
2. **Final Ball Carrier Location**: At the *event* of a *tackle*, *out-of-bounds*, or *touchdown*, the final position of the ball carrier is recorded as *final_X*.
3. **Assigning YAAC**: YAAC is calculated for each defender with a PCM. This calculation represents the distance (in yards) the ball carrier traveled from the point of contact to the conclusion of the play. The formula is as follows:
$$\text{YAAC} = \text{final}_X - \text{contact}_X$$
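
The three steps above can be sketched as follows. This is a minimal illustration, assuming one PCM per defender per play (the single highest PCS), a standardized left-to-right coordinate frame, and hypothetical field names (`defender`, `pcs`, `carrier_x`); whether the threshold comparison is strict or inclusive is also an assumption.

```python
PCS_THRESHOLD = 0.25  # the 25% cutoff described above


def yaac_for_play(frames, final_x, threshold=PCS_THRESHOLD):
    """Compute YAAC per defender for one play.

    frames:  list of dicts with hypothetical keys 'defender', 'pcs', and
             'carrier_x' (ball carrier x at that frame).
    final_x: ball carrier x at the tackle, out-of-bounds, or touchdown event.
    """
    peaks = {}
    for frame in frames:
        best = peaks.get(frame["defender"])
        # PCM: the frame with the highest PCS for this defender.
        if best is None or frame["pcs"] > best["pcs"]:
            peaks[frame["defender"]] = frame
    # Keep only PCMs that pass the threshold, then apply the YAAC formula.
    return {
        defender: final_x - peak["carrier_x"]
        for defender, peak in peaks.items()
        if peak["pcs"] >= threshold
    }


frames = [
    {"defender": "D1", "pcs": 0.10, "carrier_x": 40.0},
    {"defender": "D1", "pcs": 0.80, "carrier_x": 42.0},
    {"defender": "D1", "pcs": 0.30, "carrier_x": 44.0},
    {"defender": "D2", "pcs": 0.20, "carrier_x": 43.0},
    {"defender": "D3", "pcs": 0.60, "carrier_x": 47.0},
]
print(yaac_for_play(frames, final_x=47.5))  # {'D1': 5.5, 'D3': 0.5}
```

In this example D2 never passes the threshold and receives no YAAC, while D1 is charged with the 5.5 yards gained after the peak contact score.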

<div style="width: 100%; text-align: center; margin-top: 1em; margin-bottom: 2em; margin: 0 auto;">
  <img src="https://raw.githubusercontent.com/allanpaiz/Defensive_Stopping_Power/main/figures/ContactPredictionGraph.gif">
</div>
<div style="text-align: center;" markdown="1"> <strong>Figure 4.</strong> Animation visualizing YAAC.</div>




With TCM and YAAC, we can now begin to evaluate players, compare performances, build teams, and generate scouting reports using three per-defender aggregates:
- *yaac_count*: The total number of PCMs and YAAC calculations, useful as a measure of involvement when paired with snap counts.
- *yaac_total*: The sum of all yards allowed by a defender.
- *yaac_avg*: The average yards allowed after making contact with a ball carrier.
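
As a rough sketch, these aggregates could be computed from a list of per-contact YAAC values. The function name and the `(defender_id, yaac)` input layout are hypothetical.

```python
from collections import defaultdict


def summarize_yaac(records):
    """Aggregate YAAC per defender.

    records: iterable of (defender_id, yaac) pairs, one per PCM.
    """
    values = defaultdict(list)
    for defender, yaac in records:
        values[defender].append(yaac)
    return {
        defender: {
            "yaac_count": len(v),
            "yaac_total": sum(v),
            "yaac_avg": sum(v) / len(v),
        }
        for defender, v in values.items()
    }


summary = summarize_yaac([("D1", 5.5), ("D3", 0.5), ("D1", 2.5)])
print(summary["D1"])  # {'yaac_count': 2, 'yaac_total': 8.0, 'yaac_avg': 4.0}
```

Because *yaac_avg* divides by *yaac_count*, defenders with very few contacts can show extreme averages, so pairing the average with the count is a sensible reading habit.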

Using gameplay data from weeks 1 through 9, we examine the *yaac_avg* leaderboard for defensive linemen (See *Figure 5*).


<div style="width: 100%; text-align: center; margin-top: 1em; margin-bottom: 2em; margin: 0 auto;">
  <img src="https://raw.githubusercontent.com/allanpaiz/Defensive_Stopping_Power/main/figures/DL_leaderboard.png">
</div>
<div style="text-align: center;" markdown="1"> <strong>...</strong></div>
<div style="width: 100%; text-align: center; margin-top: 1em; margin-bottom: 2em; margin: 0 auto;">
  <img src="https://raw.githubusercontent.com/allanpaiz/Defensive_Stopping_Power/main/figures/DL_bottom.png">
</div>
<div style="text-align: center;" markdown="1"> <strong>Figure 5.</strong> Defensive Linemen Average YAAC leaderboard.</div>




### 4.2 Defensive Stopping Power (DSP)
Defensive Stopping Power is built on YAAC. While YAAC begins as a metric for individual defenders, DSP extends this to evaluate a team's collective efficiency.
To compute DSP, we define the following components:
- **Team YAAC (TYAAC)**: This is the aggregate of YAAC by all defenders in a single play.
- **Yards Allowed (YA)**: The official yardage gained by the offense during a play.
- **Yards Allowed Adjusted (YAA)**: The sum of TYAAC and YA, offering a performance-based adjustment on Yards Allowed.
- **Yards Conceded per Player (YCP)**: YAA divided by the number of defenders involved in the play.
- **Defender Ratio (DR)**: The ratio of the number of defenders in a play to the total number of defensive players (11).

Calculating DSP:
$$\text{DSP} = \text{DR} \times \text{YCP}$$
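
A worked example for a single play may help. The sketch below follows the component definitions literally, under two assumptions of mine: the "defenders involved" are those with a YAAC value on the play, and DSP is left undefined when no defender made contact.

```python
TOTAL_DEFENDERS = 11


def defensive_stopping_power(yaac_by_defender, yards_allowed):
    """DSP for one play, built from the components defined above."""
    n = len(yaac_by_defender)  # defenders involved (assumed: those with a YAAC)
    if n == 0:
        return None  # assumption: no contact means no DSP for the play
    tyaac = sum(yaac_by_defender.values())  # Team YAAC
    yaa = tyaac + yards_allowed             # Yards Allowed Adjusted
    ycp = yaa / n                           # Yards Conceded per Player
    dr = n / TOTAL_DEFENDERS                # Defender Ratio
    return dr * ycp


# Two defenders made contact (5.5 and 0.5 YAAC) on a 5-yard gain:
# TYAAC = 6.0, YAA = 11.0, YCP = 5.5, DR = 2/11.
dsp = defensive_stopping_power({"D1": 5.5, "D3": 0.5}, yards_allowed=5.0)
print(dsp)
```

If the same defender count is used in both YCP and DR, as it is here, the count cancels and DSP reduces to YAA divided by 11.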

As a metric, DSP captures a team's defensive efficiency, shifting the focus from individual to collective performance, which matters in a sport that relies on team strategy and coordination. It can support strategic analysis or serve as a benchmark for assessing the strength of your favorite team’s defense and how it compares to the rest of the league.

<div style="width: 100%; text-align: center; margin-top: 1em; margin-bottom: 2em; margin: 0 auto;">
  <img src="https://raw.githubusercontent.com/allanpaiz/Defensive_Stopping_Power/main/figures/team_dsp.png">
</div>
<div style="text-align: center;" markdown="1"> <strong>Figure 6.</strong> Team DSP vs Points Allowed (Weeks 1-9).</div>


# 5. Conclusion, Limitations, & Future Work
- Refinement of PID Concept: While the PID concept is promising, it requires further refinement with more specific data. The present definition of PID comes with limitations, particularly in accurately identifying the *first_contact* defender and in the assumptions about contact intensity and outcomes. This is a major area needing improvement.
- Model Implementation: The Contact Model, given my current experience in deep learning, is basic in its construction yet proves to be effective in its application. Unfortunately, extensive testing, tuning, and validation with larger datasets are outside of the scope of this project.
- Path Forward: Future enhancements should focus on redefining defender-ball carrier contact, optimizing TCM parameters, expanding the dataset, and creating new robust defensive metrics.

In conclusion, while this project represents an initial step, it sets the stage for deeper understanding and more sophisticated interpretations of an effective NFL defense.

[**GitHub Repository**](https://github.com/allanpaiz/Defensive_Stopping_Power)

# 6. Resources
* [1]<a id="a"></a> https://medium.com/kaggle-blog/from-football-newbies-to-nfl-data-champions-a-winners-interview-with-the-zoo-391793168714
* [2]<a id="b"></a> https://www.youtube.com/watch?v=0QczhVg5HaI