# To Breakdown or not to Breakdown: An Open Field Tackle Analysis

## Introduction

Tackling is a fundamental part of American football: a decisive defensive skill that can shape the course of a play or even an entire game. Tackles occur in countless scenarios, each of which demands a different response from the defender. Arguably the most difficult of these scenarios occurs in the open field. Without the leverage of teammates or the sideline, proper technique is critical to ensure execution at the point of attack. One such technique is what this paper will refer to as “breaking down” for the tackle: slowing down and adopting a balanced stance, which makes it easier for the defender to react to the movements of the ball carrier. The assumed tradeoff of a break down is that the defender increases the likelihood of completing the tackle in exchange for conceding time and distance to the ball carrier. The purpose of this report is to isolate open field scenarios and test whether breaking down for a tackle is associated with the outcome of that tackle.

## Methodology

### Data Preparation

All data used in this analysis was supplied as part of the 2024 NFL Big Data Bowl. Before breakdowns can be analyzed, all open field tackle situations within the ‘Tracking’ dataset need to be identified. This paper defines an open field tackle as one that satisfies the following requirements while the defender is within five yards of the ball carrier:

<ol>
<li> No other players are within the tackling lane. This is verified by checking that all players not involved in the open field tackle are outside the box with corners $(x_{min}-1,y_{min}-1)$ and $(x_{max}+1,y_{max}+1)$, built from the positions of the tackler and the ball carrier. The aim is to verify that there is ample space to consider the event as having occurred in the open field. </li>
<br/>
<div>
<img src="Diagrams/checkOpenField.png" style="width:400px; height:600px;"/>
</div>
<br/>
<li> Neither the ball carrier nor the tackler is within 5 yards of either sideline. This is to eliminate defenders utilizing the sideline to force a one-way go. </li>

<li> The x distance between the defender and the endzone that he is guarding is less than the x distance between the ball carrier and the same endzone. This requirement aims to remove chase scenarios from the sampled set. </li>

<li> The orientation of both the ball carrier and the tackler is within 45 degrees of the line perpendicular to the line of scrimmage before the ball is snapped. The intention is to eliminate scenarios where the ball carrier bounces horizontally and instigates a chase from the defender. </li>
<br/>
<div>
<img src="Diagrams/checkFaceForward.png" style="width:600px; height:400px;"/>
</div>
</ol>

Note that requirements 2 through 4 are not applied when the distance between the ball carrier and the defender is less than 1 yard. A collision is assumed at distances under 1 yard, and the resulting contact could falsely violate those requirements.
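
The four requirements can be sketched as a single per-frame check. This is a minimal illustration under stated assumptions, not the notebook's implementation: the `Player` record, the function names, the `goal_x` argument (x coordinate of the goal line the defense is guarding), and the `line_deg` argument (orientation angle of the line perpendicular to the line of scrimmage) are all hypothetical, and the 53.3 yard field width with `y` measured from one sideline is an assumption about the tracking coordinates.

```python
from dataclasses import dataclass
from math import hypot

FIELD_WIDTH = 53.3  # yards; assumed sideline-to-sideline extent of y


@dataclass
class Player:
    x: float  # yards along the length of the field
    y: float  # yards from one sideline
    o: float  # orientation in degrees


def angle_from_line(o_deg, line_deg):
    """Smallest angle (0 to 90 degrees) between an orientation and an undirected line."""
    diff = abs(o_deg - line_deg) % 180.0
    return min(diff, 180.0 - diff)


def is_open_field(tackler, carrier, others, goal_x, line_deg=90.0):
    """Check one frame against the four open field requirements."""
    # Requirement 1: nobody else inside the box padded by 1 yard on each side.
    x_lo, x_hi = sorted((tackler.x, carrier.x))
    y_lo, y_hi = sorted((tackler.y, carrier.y))
    for p in others:
        if x_lo - 1 < p.x < x_hi + 1 and y_lo - 1 < p.y < y_hi + 1:
            return False

    # Inside 1 yard a collision is assumed, so requirements 2 to 4 are skipped.
    if hypot(tackler.x - carrier.x, tackler.y - carrier.y) < 1.0:
        return True

    # Requirement 2: both players at least 5 yards from either sideline.
    for p in (tackler, carrier):
        if p.y < 5.0 or p.y > FIELD_WIDTH - 5.0:
            return False

    # Requirement 3: the defender is closer (in x) to the endzone he guards.
    if abs(tackler.x - goal_x) >= abs(carrier.x - goal_x):
        return False

    # Requirement 4: both players face within 45 degrees of the given line.
    return all(angle_from_line(p.o, line_deg) <= 45.0 for p in (tackler, carrier))
```

In practice the check would be run on every frame in which the two players are within five yards of each other, with `others` holding the remaining 20 players on the field.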

<br/>

Each _tackle_ and _pff_missedTackle_ within the _tackles_ dataset is then independently held against the above requirements by iterating through each defender in the _tackles_ dataset. On plays where an assist is recorded, only _pff_missedTackle_ events are evaluated, since an assist implies that the tackle depended on multiple defenders. Additionally, all identified open field tackle scenarios where a defensive lineman (NT, DT, DE) is the potential tackler have been removed. The result is a set of __225 open field scenarios with a tackle success rate of 67.1%__.

The key features identified during the data scrubbing process are the mean acceleration of the ball carrier (_cAccMean_) and of the potential tackler (_tAccMean_). Both are calculated within the 1 yd to 5 yd range. Both are also derived from differences in the _s_ (speed) column, so that deceleration is captured as a negative value and acceleration as a positive value. The tackler and ball carrier acceleration values are then used to construct categorical variables, defined as follows, where the prefix $x$ stands for either the carrier ($c$) or the tackler ($t$):
<br/>

<ul>
<li> $\text{xSlowed} = (\text{xAccMean} \leq -1\,\frac{yd}{s^{2}})$ </li>
<li> $\text{xSteady} = (-1\,\frac{yd}{s^{2}} < \text{xAccMean} < 1\,\frac{yd}{s^{2}})$ </li>
<li> $\text{xIncrease} = (\text{xAccMean} \geq 1\,\frac{yd}{s^{2}})$ </li>
</ul>
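
The acceleration feature and its three categories can be sketched as follows. This is an illustration under assumptions rather than the notebook's code: the function names and the sample speeds are hypothetical, the speeds are assumed to be consecutive tracking frames taken from the 1 yd to 5 yd window, and dividing by a 0.1 s frame interval is an assumption about how frame-to-frame differences in _s_ are converted to $\frac{yd}{s^{2}}$.

```python
from statistics import mean

FRAME_SECONDS = 0.1  # assumed spacing between consecutive tracking frames


def mean_acceleration(speeds):
    """Mean frame-to-frame change in speed (yd/s), expressed in yd/s^2."""
    diffs = [later - earlier for earlier, later in zip(speeds, speeds[1:])]
    return mean(diffs) / FRAME_SECONDS


def categorize(acc_mean, threshold=1.0):
    """Map a mean acceleration onto the Slowed / Steady / Increase labels."""
    if acc_mean <= -threshold:
        return "Slowed"
    if acc_mean >= threshold:
        return "Increase"
    return "Steady"


# Hypothetical tackler speeds (yd/s) while closing from 5 yd to 1 yd.
tackler_speeds = [7.0, 6.7, 6.3, 6.0, 5.8]
t_acc_mean = mean_acceleration(tackler_speeds)
print(round(t_acc_mean, 2), categorize(t_acc_mean))
```

Because the differences are signed, a defender who sheds speed before contact produces a negative mean and lands in the Slowed category, which this report treats as a proxy for breaking down.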

### Analysis
As previously stated, the intention of this report is to understand whether there is an association between the defender breaking down for a tackle and the outcome of that tackle. The method chosen to test this is the Chi-Square Test of Association. The variables to be tested are _tSlowed_, which indicates that the potential tackler decelerated significantly within the final 5 yards, and the resulting tackle indicator. For the purposes of this report, we will assume that _tSlowed_ is indicative of the defender breaking down. Before proceeding with this test, the sample must be confirmed to be large enough. This can be done by validating that every expected frequency has a value of at least 5.

In [53]:
display(slow_expected)

| tSlowed | tackle = 0 | tackle = 1 |
|---|---|---|
| 0 | 26.604444 | 55.031111 |
| 1 | 46.071111 | 95.297778 |


With expected frequency counts validated, prerequisites are met to proceed with the test. An alpha value of 0.05 is assumed. The hypotheses for this test are stated below:
<br/>
<br/>
H0: The defender breaking down is not associated with the tackle outcome
<br/>
H1: The defender breaking down is associated with the tackle outcome
<br/>
<br/>
Running the test against the stated hypotheses returns $\chi^2 = 1.449$ and $p = 0.229$. This p-value is well in excess of our alpha, so we fail to reject the null hypothesis. Now that the test has been completed, more investigative work is warranted. Below is a matrix of the tackle success rate for each combination of ball carrier and tackler acceleration behavior within 5 yards of contact.


In [54]:
display(dfAccMatrix)

| | tSlowed | tSteady | tIncrease |
|---|---|---|---|
| cSlowed | 0.68 | 0.73 | 0.50 |
| cSteady | 0.63 | 0.81 | 0.60 |
| cIncrease | 0.25 | 0.84 | 0.82 |
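
A matrix of this kind can be produced by grouping scenarios on the two category labels and averaging the tackle indicator. The sketch below uses a handful of made-up records and a hypothetical function name; it illustrates the grouping only and does not reproduce the rates reported here.

```python
from collections import defaultdict

# Made-up scenarios: (carrier category, tackler category, tackle made?)
scenarios = [
    ("cSlowed", "tSlowed", 1),
    ("cSlowed", "tSlowed", 0),
    ("cSteady", "tSteady", 1),
    ("cSteady", "tSteady", 1),
    ("cIncrease", "tSlowed", 0),
    ("cIncrease", "tSteady", 1),
]


def success_matrix(rows):
    """Return {(carrier, tackler): share of successful tackles} per cell."""
    made = defaultdict(int)
    total = defaultdict(int)
    for carrier_cat, tackler_cat, tackle in rows:
        made[(carrier_cat, tackler_cat)] += tackle
        total[(carrier_cat, tackler_cat)] += 1
    return {cell: made[cell] / total[cell] for cell in total}


matrix = success_matrix(scenarios)
for cell, rate in sorted(matrix.items()):
    print(cell, round(rate, 2))
```

With the scenarios held in a pandas DataFrame, `pd.crosstab` with its `values` and `aggfunc="mean"` arguments yields the same kind of table directly. Cells backed by only a few scenarios give unstable rates, so the count behind each cell is worth inspecting alongside the rate.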


The matrix above indicates that situations where the tackler held a roughly constant speed through the last 5 yards (_tSteady_) outperformed the other tackler acceleration categories for every carrier acceleration profile. Performing the same Chi-Square Test of Association for _tSteady_ (the expected frequencies are again sufficient) gives $\chi^2 = 3.967$ and $p = 0.046$. Since this is below our alpha of 0.05, the null hypothesis is rejected: _tSteady_ is associated with the tackle outcome.
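
For reference, a test of association on a 2x2 table can be written out directly. The sketch below is illustrative only: the observed counts are hypothetical, the function name is made up, and it computes the Pearson statistic without a continuity correction. The notebook code obtains its statistics from `sklearn.feature_selection.chi2`, which scores a feature against a class label and is not guaranteed to match a contingency-table calculation like this one.

```python
from math import erfc, sqrt


def chi_square_2x2(observed):
    """Pearson chi-square test of association for a 2x2 table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected count per cell = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    stat = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )

    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value, expected


# Hypothetical counts: rows = tSteady (0, 1), columns = tackle (0, 1).
observed = [[40, 60], [20, 80]]
stat, p_value, expected = chi_square_2x2(observed)
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}")
print("all expected counts >= 5:", all(e >= 5 for row in expected for e in row))
```

A library routine such as `scipy.stats.chi2_contingency` performs the same calculation for tables of any size and, by default, applies a continuity correction to 2x2 tables, so its statistic will be somewhat smaller than the uncorrected value computed here.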

## Discussion
Deceleration by the defender was not significantly associated with the tackle outcome, while maintaining a steady speed was. These results seem promising for defensive minds who believe that taking away time and space is key to securing an open field tackle. Due to the limited sample size and unmeasured factors, further investigation is needed to confirm and refine these initial results. The break down is one of many actions prescribed in most open field tackle procedures. Other factors include, but are not limited to, helmet placement (cross-body vs. rugby style), lowering the center of mass, wrapping up the ball carrier, driving through the point of contact, and maintaining horizontal leverage. Some of these are measurable with the provided data set and some are not. Regardless, additional data and feature development would allow for more precise and significant results via more advanced analytical methods.


## References
<br/>
Anton, Howard, et al. Applied Finite Mathematics. Brooks/Cole Thomson Learning, 2001.
<br/>
<br/>
Durrett, Richard. Elementary Probability for Applications. Cambridge University Press, 2016.
<br/>
<br/>
Lopez, Michael, Thompson Bliss, Ally Blake, Andrew Patton, Jonathan McWilliams, Addison Howard, and Will Cukierski. NFL Big Data Bowl 2024. Kaggle, 2023. https://kaggle.com/competitions/nfl-big-data-bowl-2024

### Git

https://github.com/ammoore45/NFLBDB2024

In [51]:


import warnings

import numpy as np
import pandas as pd
from IPython.display import display
from sklearn.feature_selection import chi2

# Silence library warnings in the rendered notebook
warnings.filterwarnings('ignore')

dfOFT = pd.read_csv('C:\\Users\\austi\\Documents\\2024_Data_Bowl\\Data_Exploration\\chi_data.csv')

X = dfOFT[['tSlowed']]
y = dfOFT['tackle']

# Observed counts of tackle outcome (columns) by tSlowed (rows)
slow_observed = pd.crosstab(X.tSlowed, y, margins=False)

# Expected count per cell = row total * column total / number of scenarios (225).
# Built as a float frame so no values are written into an integer table.
slow_expected = pd.DataFrame(
    np.outer(slow_observed.sum(axis=1), slow_observed.sum(axis=0)) / 225,
    index=slow_observed.index,
    columns=slow_observed.columns,
)
   

# Chi-square statistic and p-value for tSlowed against the tackle outcome
chi_scores = chi2(X, y)
chi_values = pd.Series(chi_scores[0], index=X.columns)
chi_slow = chi_values.iloc[0].item()  # positional access; the index holds column names

p_values = pd.Series(chi_scores[1], index=X.columns)
p_slow = p_values.iloc[0].item()

# Tackle success rate by carrier (rows) and tackler (columns) acceleration category
dfAccMatrix = pd.DataFrame(
    {'tSlowed': [0.68, 0.63, 0.25], 'tSteady': [0.73, 0.81, 0.84], 'tIncrease': [0.50, 0.60, 0.82]},
    index=['cSlowed', 'cSteady', 'cIncrease'],
)
dfAccMatrix = dfAccMatrix.style.set_caption("Tackle Success Rate Matrix").background_gradient(cmap='BuGn')
display(dfAccMatrix)



X2 = dfOFT[['tSteady']]

# Observed counts of tackle outcome (columns) by tSteady (rows)
stead_observed = pd.crosstab(X2.tSteady, y, margins=False)

# Expected count per cell = row total * column total / number of scenarios (225)
stead_expected = pd.DataFrame(
    np.outer(stead_observed.sum(axis=1), stead_observed.sum(axis=0)) / 225,
    index=stead_observed.index,
    columns=stead_observed.columns,
)

# Chi-square statistic and p-value for tSteady against the tackle outcome
chi_scores = chi2(X2, y)
chi_values = pd.Series(chi_scores[0], index=X2.columns)
chi_stead = chi_values.iloc[0].item()  # positional access; the index holds column names

p_values = pd.Series(chi_scores[1], index=X2.columns)
p_stead = p_values.iloc[0].item()

