## Make a DENT: Defensive Efficacy in Normalized Tackles (Undergraduate Track)

Across sports, one of the hardest aspects of a match to track is defensive output. So much of defense happens in the abstract, in the moments around the play. Most importantly, you cannot determine exactly what a good defensive play prevented, precisely because it was prevented. All there is to analyze are the moments when the defender makes contact with the offense: a fielder diving for a fly ball, a center closing out to block a shot in basketball, a linebacker making a tackle in football. You can look at the speed, positioning, and physicality of these plays, but that gives you only part of the picture. What impact do these plays have on the rest of the match?

Defensively, tackling is the most important skill in a sport often called "tackle football." A single great tackle can be the difference between winning and losing a championship, as the St. Louis Rams' victory in Super Bowl XXXIV shows. Of course, what makes Mike Jones' tackle of Kevin Dyson so great is not necessarily its technique, but its impact.

Following this philosophy, we broke our analysis into two parts: technique and impact. The first gave us a baseline number, and the second was a multiplier on that baseline.

The baseline has three parts: (1) physical, (2) process, and (3) success. The physical component depends on speed and angle, and is essentially a measure of how aggressive the tackle was. We define an aggressive tackle as one where the defender moves toward the ball carrier rather than waiting for him, which we reward because it stops the play with fewer yards gained by the offense. We categorized types of tackles as shown in the diagram on the side, where the green arrow is the tackler and the blue arrow is the ball carrier. The angle between the tackler's and the ball carrier's directions of motion plays an important role in how difficult the tackle is, and the defender's speed matters more than the ball carrier's in determining the skill of the tackle. So the physical component combines the ball carrier's speed, the square of the defender's speed, and the cosine of the angle between their directions of motion.

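
$$
\text{physical} =
\begin{cases}
s_D^2\, s_O \cos\theta_{\min}, & \theta_{\min} \le 90^\circ \\
1.5\,\sqrt{s_D^2\, s_O}\,\cos\theta_{\min}, & \theta_{\min} > 90^\circ
\end{cases}
$$

where $s_D$ is the defender's speed, $s_O$ the ball carrier's speed, and $\theta_{\min}$ the smaller angle between their directions of motion, each averaged over the five frames up to the tackle. Plays that end with the ball carrier out of bounds receive half of this score.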
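
To make the physical component concrete, here is a minimal Python sketch that follows the appendix's `physical` function and its five-frame averaging. The helper names `relative_angle` and `physical_score` are hypothetical (they are not in the notebook), and the inputs are assumed to be plain lists of speeds (yards per second) and directions (degrees) for the frames leading up to the tackle.

```python
import numpy as np


def relative_angle(dir_carrier, dir_tackler):
    """Smaller angle in degrees (0 to 180) between two headings given in degrees."""
    diff = np.abs(np.asarray(dir_carrier, dtype=float) - np.asarray(dir_tackler, dtype=float)) % 360
    return np.minimum(diff, 360 - diff)


def physical_score(o_speeds, d_speeds, o_dirs, d_dirs, out_of_bounds=False):
    """Hypothetical helper: physical component from the frames before the tackle.

    Speeds are averaged over the window, the relative angle is the mean of the
    per-frame relative angles, and the piecewise rule mirrors the appendix's
    `physical` function, with half credit when the play ends out of bounds.
    """
    s_o = float(np.mean(o_speeds))
    s_d = float(np.mean(d_speeds))
    theta = float(np.mean(relative_angle(o_dirs, d_dirs)))
    factor = s_d ** 2 * s_o
    if theta <= 90:
        score = factor * np.cos(np.radians(theta))
    else:
        # Negative cosine: scale the square root of the factor instead of the factor itself
        score = 1.5 * np.sqrt(factor) * np.cos(np.radians(theta))
    return 0.5 * score if out_of_bounds else score
```

With the ball carrier at 4 yd/s and the tackler at 6 yd/s, a tackler moving in the same direction scores 144, while one moving in the opposite direction scores -18, so under this formula the sign of the score is set entirely by the angle.
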


The process component measures how effective the tackle was once it began. A better tackle ends the play without letting the ball carrier go any further, so we used the distance between first contact and the ball carrier hitting the ground. To find first contact, we used the tracking data to identify when the tackler first came within one yard of the ball carrier, and we stopped recording when the tracking data marked the ball carrier as tackled.

However, forward progress complicated this. A ball carrier dragged back 7 yards from where the ball was ultimately spotted was counted in favor of the tackler, which it should not be. We solved this by using the result of the play rather than the tracking data. That introduced another edge case: the ball carrier steps slightly out of bounds, the officials miss it, and the call is later overturned on replay review. The tracking data still places the end of the play where the ball carrier went down, while the plays data places it where he actually went out of bounds, which heavily favored the tackler. We once again fixed this by choosing, for $x_{\text{end}}$, whichever of the two candidate spots (tracking data or play result) gives the smaller distance:

$$\text{process} = \sqrt{(x_{\text{start}} - x_{\text{end}})^2} = \lvert x_{\text{start}} - x_{\text{end}} \rvert$$
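
A sketch of the process measurement, under simplifying assumptions: first contact is taken as the first frame in which the tackler is within one yard of the ball carrier (as described above; the appendix loop tracks this in a more elaborate way), positions are given as (x, y) pairs per frame, and `direction` is +1 or -1 depending on which way the offense is moving. `first_contact_frame` and `yards_after_contact` are hypothetical helper names.

```python
import numpy as np


def first_contact_frame(off_xy, def_xy, radius=1.0):
    """Index of the first frame where the defender is within `radius` yards of the ball carrier, else None."""
    off_xy = np.asarray(off_xy, dtype=float)
    def_xy = np.asarray(def_xy, dtype=float)
    dist = np.hypot(off_xy[:, 0] - def_xy[:, 0], off_xy[:, 1] - def_xy[:, 1])
    hits = np.flatnonzero(dist < radius)
    return int(hits[0]) if hits.size else None


def yards_after_contact(x_start, x_end_tracking, x_end_spot, direction=1):
    """Signed yards from first contact to the end spot that is closer to the contact point.

    x_end_tracking: where the tracking data has the play ending.
    x_end_spot: where the play result (plays data) puts the ball.
    direction: +1 if the offense moves toward increasing x, -1 otherwise.
    """
    dragged = direction * (x_end_tracking - x_start)
    spotted = direction * (x_end_spot - x_start)
    return dragged if abs(dragged) < abs(spotted) else spotted
```

Taking the candidate with the smaller absolute distance is the same tie-break the appendix applies to its `distDragged` and `distSpot` values.
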

Finally, we factored in the success of the tackle. A tackle with a higher physical-plus-process score is a more difficult tackle, so it is penalized less for failing than a safer tackle (which should not fail at the NFL level). A successful tackle keeps its baseline score, and an unsuccessful tackle is scored as

$$\text{baseline} = 0.5\,\bigl(\text{physical} + \text{process} - \max(\text{physical}) - \max(\text{process})\bigr)$$

where the maxima are taken over all graded plays.
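
The success adjustment is short enough to show directly. This mirrors the appendix's `success` function under a hypothetical name, `tackle_baseline`, and assumes the two maxima have already been computed over all graded plays.

```python
def tackle_baseline(successful, physical, process, max_physical, max_process):
    """Baseline score after the success adjustment (hypothetical name for the appendix's `success`)."""
    total = physical + process
    if successful:
        return total
    # Missed tackles are shifted below zero; the larger physical + process was,
    # the smaller the resulting penalty.
    return 0.5 * (total - max_physical - max_process)
```

For example, with dataset maxima of 100 (physical) and 20 (process), a missed tackle that scored 80 + 10 becomes -15, while a missed 20 + 5 tackle becomes -47.5.
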

Of course, this is only the baseline score for a tackle. The difficult part of quantifying defense is measuring what happens afterward, such as how important the tackle was and how much pressure surrounded it, because a good defensive play necessarily obscures what would have happened. With that in mind, we considered a variety of variables in building the second half of our metric, including the down, the time elapsed in the game, and the pre-snap win probability.

A metric for the leverage of the play and the quality of its result (the leveraged result) gives us an overall score that adheres more closely to what makes a good tackle. A tackle that occurs two yards past the line to gain is worse than one that occurs two yards short of it. Often what separates those two tackles are factors unrelated to the tackle itself, such as the defender's positioning before contact. However, the point of this metric is to help teams identify defenders who make a significant positive impact through their tackling, which necessarily includes positioning and other defensive playmaking.


The multiplier depends not on the total yardage gained but on the yardage gained relative to the yardage needed for a first down (or a touchdown, in goal-to-go situations). Combined with a piecewise definition that breaks at the line to gain, this creates a large drop-off for tackles made past the line to gain.

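
The piecewise factor can be sketched as follows. It restates the distance branch of the appendix's `efficacyCalculator` under a hypothetical function name, `distance_factor`; note that it is applied to the rest of the multiplier rather than used on its own.

```python
import numpy as np


def distance_factor(play_result, yards_to_go):
    """Piecewise factor on yards gained relative to the line to gain (hypothetical helper)."""
    scaled = play_result / yards_to_go - 1  # < 0 short of the line to gain, >= 0 at or past it
    if scaled < 0:
        return -np.cbrt(0.5 * scaled + 0.5) + 1.79
    return -np.cbrt(0.005 * scaled) + 0.5
```

Just short of the line to gain the factor is close to 1 (about 0.996 as the gain approaches `yards_to_go`), while at the line to gain it drops to 0.5; a tackle for no gain gets 1.79, and tackles for a loss get more.
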
The multiplier is then scaled linearly by the time elapsed in the game when the play occurs, reflecting the greater pressure on later plays and their greater ability to change the outcome. It is also scaled by how close the game is, using the home team's pre-snap win probability. Using the change in win probability as a factor would, in our view, be misguided, because the tackler cannot be credited with all of that change. Time left in the game is one driver of win-probability swings, so we isolated it, along with other events that drive those swings, such as allowing a first down (scaled by the down on which the play occurs).

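
A sketch of the time and game-state scaling, assuming the data's `quarter` column (5 for overtime) and `gameClock` column (a "MM:SS" string of time left in the period), plus the home team's pre-snap win probability. It follows the appendix's `efficacyCalculator`, which weights by time elapsed and by how close the game is; the helper names are hypothetical.

```python
def seconds_into_game(quarter, game_clock):
    """Elapsed seconds, given the quarter (5 = overtime) and a "MM:SS" clock of time left in the period."""
    minutes, seconds = (int(part) for part in game_clock.split(":"))
    period_length = 10 * 60 if quarter == 5 else 15 * 60  # regular-season overtime is 10 minutes
    return (quarter - 1) * 15 * 60 + period_length - (minutes * 60 + seconds)


def time_scale(quarter, game_clock):
    """Linear time weight: 1 at kickoff, rising to 1 + 3600 / 840 (about 5.29) at the end of regulation."""
    return 1 + seconds_into_game(quarter, game_clock) / 840


def closeness(home_win_prob):
    """1 for a 50/50 game, falling linearly to 0 as either team's win probability approaches 1."""
    return 1 - 2 * abs(0.5 - home_win_prob)
```

In the appendix these two weights multiply the expected-points and down terms before the distance factor and the final log transform are applied.
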
Multiplying a tackle's baseline score by its leverage score gives its DENT: Defensive Efficacy in Normalized Tackles. The name honors Richard Dent, the Hall of Fame defensive end who helped lead the Chicago Bears' famed "Monsters of the Midway" defense to the Lombardi Trophy in the 1985 season, and is a tribute to our own home on the Midway in Chicago. A player's DENT+ takes their average DENT across all tackles, scaled so that the league average is 100, just like ERA+ and OPS+.

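
A minimal pandas sketch of DENT and DENT+, assuming a per-play DataFrame with the notebook's `nflId`, `physProcWSuccess` (baseline) and `efficacy` (leverage) columns. `add_dent_columns` and `player_dent_plus` are hypothetical names, and the default 10-play minimum mirrors the threshold used in the appendix tables.

```python
import pandas as pd


def add_dent_columns(df):
    """Add per-play DENT (baseline x leverage) and DENT+ (DENT scaled so the per-play mean is 100)."""
    out = df.copy()
    out["DENT"] = out["physProcWSuccess"] * out["efficacy"]
    out["DENTplus"] = out["DENT"] * 100 / out["DENT"].mean()
    return out


def player_dent_plus(df, min_plays=10):
    """Mean DENT+ per player, keeping players with at least `min_plays` graded plays, best first."""
    per_player = df.groupby("nflId").agg(plays=("DENT", "count"), DENTplus=("DENTplus", "mean"))
    return per_player[per_player["plays"] >= min_plays].sort_values("DENTplus", ascending=False)
```

Because the scaling happens per play before averaging, a player's DENT+ is simply their mean DENT expressed as a percentage of the league's per-play mean.
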
DENT+ does not adjust for player position. pDENT+ compares only players at the same position, so that the average value for each position is 100.

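
Positional normalization can be sketched with a grouped `transform`, which broadcasts each position's mean DENT back onto that position's rows. This assumes a per-play DataFrame with `position` and `DENT` columns; `add_position_dent_plus` is a hypothetical name.

```python
import pandas as pd


def add_position_dent_plus(df):
    """pDENT+: each play's DENT divided by the mean DENT of the player's position, times 100."""
    out = df.copy()
    position_mean = out.groupby("position")["DENT"].transform("mean")
    out["pDENTplus"] = out["DENT"] * 100 / position_mean
    return out
```

Averaging `pDENTplus` within any position then gives exactly 100, which is the property pDENT+ is meant to have.
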
One issue in implementing DENT was that some missed tackles included in the dataset occur before the relevant tracking data starts. Roughly 70 plays were excluded from our analysis because the attempted tackle occurred before the final ball carrier had the ball, most likely a missed sack attempt. We do not have sufficient data to grade these missed tackles, so we exclude them.


# Results

**Physical:**
The physical component tends to favor players who make tackles downfield in the open field, namely defensive backs (cornerbacks and safeties). As a result, players with weaker coverage skills often rise near the top of the list, since they allow catches at high speed and then make the tackle. Keep in mind that this project measures players' ability to tackle, not to cover, and making a high-speed downfield tackle after getting beaten on a route is still worth crediting. However, we acknowledge that for cornerbacks and safeties the data is noisy at best.

In [None]:
plt.figure(figsize=(8, 6))
plt.bar(positions.position, positions.DENTplus, color='skyblue')

plt.xlabel('Positions')
plt.ylabel('DENT+')
plt.title('DENT+ by Position')

plt.show()


The chart above shows how DENT+ differs across positions, and how defensive linemen get the short end of the stick in their scores.

**Leverage:**
The highest-rated play on our combined leveraged-result metric (not DENT) is Quandre Diggs' third-quarter forced fumble on 4th and goal at the 1-yard line against the Broncos in Week 1. This makes sense: it is a very high-leverage play, and the result is clearly good for the defense. Interestingly, the second-highest play on the leveraged-result metric is another forced fumble by the Seahawks against the Broncos on 4th and goal at the 1-yard line, this one forced by Uchenna Nwosu. By contrast, the lowest leveraged-result score came on a 50-yard gain on 3rd and 1 in Falcons vs. Browns. That gain stemmed from poor coverage by the defender rather than the tackle itself, but the result still counts against the defender. However, this aspect is normalized through the efficiency portion of our statistic.

**DENT and DENT+:**
The top 5 players in terms of average DENT and DENT+ per tackle over the timespan of this data are Benjamin St-Juste, Jaelan Phillips, Darnay Holmes, Tre Flowers, and A.J. Parker. 
The best-performing position by average DENT and DENT+ was defensive back. This may reflect that their tackles are less frequent and more valuable. The metric does not attempt to measure the quality of a defensive back's coverage: getting beaten in coverage and then making a good tackle produces a good DENT score but is still a bad outcome for the defense.


In [None]:
# pDENT+: each play's DENT scaled by the mean DENT of the player's position, so every position averages 100
dentData["pDENTplus"] = dentData.DENT * 100 / dentData.groupby("position")["DENT"].transform("mean")
grouped = dentData.groupby("nflId").agg({
    'displayName': 'first',
    'team': 'first',
    'efficacy': 'mean',
    'physProcWSuccess': 'mean',
    'tacklesMade': 'count',
    'position': 'first',
    "DENT": 'mean',
    "DENTplus": 'mean',
    "pDENTplus": 'mean'
})
grouped.loc[grouped.tacklesMade >= 10].sort_values("pDENTplus", ascending = False).head()

# Top Individual Game pDENT+ performances

In [None]:
# pDENT+: each play's DENT scaled by the mean DENT of the player's position, so every position averages 100
dentData["pDENTplus"] = dentData.DENT * 100 / dentData.groupby("position")["DENT"].transform("mean")
grouped = dentData.groupby(["gameId", "nflId"]).agg({
    'displayName': 'first',
    'team': 'first',
    'efficacy': 'mean',
    'physProcWSuccess': 'mean',
    'tacklesMade': 'count',
    'position': 'first',
    "DENT": 'mean',
    "DENTplus": 'mean',
    "pDENTplus": 'mean'
})
grouped.loc[grouped.tacklesMade >= 10].sort_values("pDENTplus", ascending = False)[:20]

# Appendix

In [None]:
import numpy as np
import pandas as pd
import altair as alt
import matplotlib.pyplot as plt

In [None]:
players_df = pd.read_csv("/kaggle/input/nfl-big-data-bowl-2024/players.csv", encoding='unicode_escape')
games_df = pd.read_csv("/kaggle/input/nfl-big-data-bowl-2024/games.csv", encoding='unicode_escape')
plays_df = pd.read_csv("/kaggle/input/nfl-big-data-bowl-2024/plays.csv", encoding='unicode_escape')
tackles_df = pd.read_csv("/kaggle/input/nfl-big-data-bowl-2024/tackles.csv", encoding='unicode_escape')
process_df = pd.read_csv("/kaggle/input/processdata/processData.csv", encoding='unicode_escape')


# Tracking data is split into one file per week (weeks 1-9); read them all and concatenate once
tracking_df = pd.concat(
    [pd.read_csv(f"/kaggle/input/nfl-big-data-bowl-2024/tracking_week_{week}.csv", encoding='unicode_escape')
     for week in range(1, 10)],
    ignore_index=True,
)


In [None]:
common_columns = ['gameId', 'playId']

all_data = pd.merge(plays_df, tackles_df, on=common_columns, how='left')

all_data = pd.merge(all_data, games_df, on='gameId', how='left')

all_data = pd.merge(all_data, players_df, on='nflId', how='left')

all_data = pd.merge(all_data, process_df, on=['gameId', 'playId', 'nflId'], how='left')

# Physical

In [None]:
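def physical(o_speed, d_speed, rel_angle):
    # Ball-carrier speed times the square of the defender's speed, weighted by the cosine of the relative angle
    factor = d_speed * d_speed * o_speed
    if rel_angle <= 90:
        return factor*np.cos(np.radians(rel_angle))
    else:
        # Beyond 90 degrees the cosine is negative; 1.5 * sqrt(factor) makes the penalty grow more slowly with speed
        return 1.5*np.sqrt(factor)*np.cos(np.radians(rel_angle))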

In [None]:
tackles = tackles_df.loc[tackles_df.tackle == 1].copy().reset_index(drop=True)
tackles.loc[:,'rel_angle'] = 0
tackles.loc[:,'O_speed'] = 0
tackles.loc[:,'D_speed'] = 0
tackles.loc[:,'physical'] = 0
for gameID in tackles.gameId.unique():
    playIds = tackles.loc[tackles.gameId == gameID].playId.unique()
    for playID in playIds:
        tackler = tackles_df.loc[tackles_df.gameId == gameID].loc[tackles_df.playId == playID].loc[tackles_df.tackle==True]['nflId'].values[0]
        ball_carrier = plays_df.loc[plays_df.gameId == gameID].loc[plays_df.playId == playID]['ballCarrierId'].values[0]
        temp_df = tracking_df.loc[tracking_df.gameId == gameID].loc[tracking_df.playId == playID]
        tackler_df = temp_df.loc[temp_df.nflId == tackler][['nflId','s','dir', 'event']].reset_index(drop=True)
        bc_df = temp_df.loc[temp_df.nflId == ball_carrier][['nflId','s','dir', 'event']].reset_index(drop=True)
        if len(tackler_df.loc[(tackler_df.event=='tackle')|(tackler_df.event=='out_of_bounds')]) == 0:
            continue
        else:
            if len(tackler_df.loc[(tackler_df.event=='tackle')]) != 0:
                tackle_frame = tackler_df.loc[tackler_df.event=='tackle'].index[0]
                x = 1
            elif len(tackler_df.loc[(tackler_df.event=='out_of_bounds')]) != 0:
                tackle_frame = tackler_df.loc[tackler_df.event=='out_of_bounds'].index[0]
                x = 0.5
            # Average over the five frames up to and including the tackle frame (fewer if the tackle comes earlier)
            window = range(max(0, tackle_frame-4), tackle_frame+1)
            o_speed = bc_df.iloc[window.start:window.stop]['s'].mean()
            d_speed = tackler_df.iloc[window.start:window.stop]['s'].mean()
            count = 0
            for i in window:
                temp = abs(bc_df['dir'][i] - tackler_df['dir'][i])
                count += min(temp, 360-temp)
            angle = count/len(window)
            index = tackles.loc[tackles.gameId == gameID].loc[tackles.playId == playID].index
            tackles.loc[index,'rel_angle']= angle
            tackles.loc[index,'O_speed'] = o_speed
            tackles.loc[index,'D_speed']= d_speed
            tackles.loc[index,'physical'] =  x*physical(o_speed,d_speed,angle)
tackles

In [None]:
def missed_tackle_frame(bc_df, tackler_df):
    # Returns (True, 0) if the defender never gets within 2 yards of the ball carrier,
    # otherwise (False, row index of the first frame where they do)
    merged_df = pd.merge(bc_df, tackler_df, on = 'frameId', how = 'outer')
    merged_df['distance'] = np.sqrt((merged_df['x_x']-merged_df['x_y'])**2+(merged_df['y_x']-merged_df['y_y'])**2)
    close = merged_df.query('distance<2')
    if len(close) == 0:
        return True, 0
    return False, close.index[0]

In [None]:
# Physical score for missed tackles, measured around the first frame where the defender is within 2 yards of the ball carrier
missed_tackles = tackles_df.loc[tackles_df.pff_missedTackle == 1].copy()
missed_tackles.loc[:,'rel_angle'] = 0
missed_tackles.loc[:,'O_speed'] = 0
missed_tackles.loc[:,'D_speed'] = 0
missed_tackles.loc[:,'physical'] = 0
for gameID in missed_tackles.gameId.unique():
    playIds = missed_tackles.loc[missed_tackles.gameId == gameID].playId.unique()
    for playID in playIds:
        tackler = tackles_df.loc[tackles_df.gameId == gameID].loc[tackles_df.playId == playID].loc[tackles_df.pff_missedTackle==True]['nflId'].values[0]
        ball_carrier = plays_df.loc[plays_df.gameId == gameID].loc[plays_df.playId == playID]['ballCarrierId'].values[0]
        temp_df = tracking_df.loc[tracking_df.gameId == gameID].loc[tracking_df.playId == playID]
        tackler_df = temp_df.loc[temp_df.nflId == tackler][['x','y','nflId','s','dir', 'event','frameId']].reset_index(drop=True)
        bc_df = temp_df.loc[temp_df.nflId == ball_carrier][['x','y','nflId','s','dir', 'event','frameId']].reset_index(drop=True)
        empty, missedtackle_frame = missed_tackle_frame(bc_df, tackler_df)
        if empty:
            continue
        else:
            o_speed = bc_df.iloc[max(0,missedtackle_frame-4):missedtackle_frame+1]['s'].mean()
            d_speed = tackler_df.iloc[max(0,missedtackle_frame-4):missedtackle_frame+1]['s'].mean()
            count1 = 0
            count2 = 0
            for i in np.arange(max(0,missedtackle_frame-4),missedtackle_frame+1):
                temp = abs(bc_df['dir'][i] - tackler_df['dir'][i])
                count1 += min(temp, 360-temp)
                count2 += 1
            angle = count1/count2
            index = missed_tackles.loc[missed_tackles.gameId == gameID].loc[missed_tackles.playId == playID].index
            missed_tackles.loc[index,'rel_angle']= angle
            missed_tackles.loc[index,'O_speed'] = o_speed
            missed_tackles.loc[index,'D_speed']= d_speed
            missed_tackles.loc[index,'physical'] = physical(o_speed,d_speed,angle)
missed_tackles

In [None]:
assists = tackles_df.loc[tackles_df.assist == 1].copy()
assists['physical'] = None
for gameID in assists.gameId.unique():
    playIds = assists.loc[assists.gameId == gameID].playId.unique()
    for playID in playIds:
        temp_df = assists.loc[assists.gameId==gameID].loc[assists.playId==playID]
        temp_phys = tackles.loc[tackles.gameId == gameID].loc[tackles.playId == playID].loc[tackles.tackle == 1]
        if len(temp_phys) != 0:
            phys = temp_phys['physical'].values[0]
        else:
            phys = 0
        index = assists.loc[assists.gameId == gameID].loc[assists.playId == playID].index
        for i in index:
            assists.at[i, 'physical']= phys/(1+len(temp_df))
assists

In [None]:
physical_data = pd.concat(
    [tackles[['gameId','playId', 'nflId', 'tackle', 'assist', 'pff_missedTackle', 'physical']].query('physical != 0'),
     missed_tackles[['gameId','playId', 'nflId', 'tackle', 'assist', 'pff_missedTackle', 'physical']].query('physical != 0'),
     assists[['gameId','playId', 'nflId', 'tackle', 'assist', 'pff_missedTackle', 'physical']].query('physical != 0')])
physical_data

# Process and Success

In [None]:
#all_data["YACwT"] = None
#all_data["YACwS"] = None
#all_data["trueYACON"] = None

In [None]:
# Set to all_data.gameId.unique() to recompute; left empty because the results are loaded from processData.csv
matchesToRun = []
for match in matchesToRun:
    curMatch = all_data.loc[all_data.gameId == match]
    for play in curMatch.playId.unique():
        curPlay = curMatch.loc[curMatch.playId == play]
        tackler = curPlay.nflId_y.unique()[0]

        startingLine = curPlay.absoluteYardlineNumber.unique()[0]
        result = curPlay.prePenaltyPlayResult.unique()[0]
        if curPlay.foulNFLId1.unique()[0] == tackler or curPlay.foulNFLId2.unique()[0] == tackler:
            result = curPlay.playResult.unique()[0]

        playDir = curPlay.playDirection.unique()[0]
        direct = 1
        if playDir == "left":
            direct = -1
        
        endSpot = startingLine + direct*result
        ballCarry = curPlay.ballCarrierId.unique()
        playTrack = tracking_df.loc[tracking_df.gameId == curMatch.gameId[curMatch.index[0]]].loc[tracking_df.playId == play]
        tackleMissed = 0
        if len(curPlay.loc[curPlay.nflId_y == tackler]) > 0:
            if curPlay.loc[curPlay.nflId_y == tackler].pff_missedTackle.unique()[0] == 1:
                tackleMissed = 1
        defense = playTrack.loc[playTrack.nflId == tackler][['frameId', 'x', 'y', 'event']]
        offense = playTrack.loc[playTrack.nflId == ballCarry[0]][['frameId', 'x', 'y', 'event']]
        finalFrame = defense.loc[defense.event == "tackle"].frameId.unique()
        if len(finalFrame) > 0:
            try:
                finalFrame = finalFrame[0]
                curFrame = 1
                curDist = 10
                # True once the defender has first closed within 1 yard of the ball carrier (set once per play)
                closedDist = False
                for frame in range(1, finalFrame+1):
                    dX = defense.loc[defense.frameId == frame]['x']
                    dX = dX[dX.index[0]]
                    dY = defense.loc[defense.frameId == frame]['y']
                    dY = dY[dY.index[0]]
                    oX = offense.loc[offense.frameId == frame]['x']
                    oX = oX[oX.index[0]]
                    oY = offense.loc[offense.frameId == frame]['y']
                    oY = oY[oY.index[0]]
                    dist = np.sqrt((oX-dX)**2 + (oY-dY)**2)
                    if dist < curDist and not closedDist:
                        curDist = dist
                        curFrame = frame
                        if dist < 1:
                            closedDist = True
                    if (dist > 2) and tackleMissed:
                        closedDist = False
                        curDist = dist
                starting = offense.loc[offense.frameId == curFrame]['x']
                ending = offense.loc[offense.frameId == finalFrame]['x']
                distDragged = direct * (ending - starting[starting.index[0]])
                distSpot = direct * (endSpot - starting[starting.index[0]])
                if np.abs(distDragged.values[0]) < np.abs(distSpot):
                    trueYac = distDragged.values[0]
                else:
                    trueYac = distSpot
                all_data.loc[(all_data['playId'] == int(play)) & (all_data['gameId'] == int(match)), 'YACwS'] = distSpot
                all_data.loc[(all_data['playId'] == int(play)) & (all_data['gameId'] == int(match)), 'YACwT'] = distDragged.values[0]
                all_data.loc[(all_data['playId'] == int(play)) & (all_data['gameId'] == int(match)), 'trueYACON'] = trueYac
            except Exception:
                all_data.loc[(all_data['playId'] == int(play)) & (all_data['gameId'] == int(match)), 'YACwT'] = -100


In [None]:
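# Success adjustment: successful tackles keep physical + process; missed tackles are shifted
# below zero by the dataset maxima, so harder (higher-scoring) attempts are penalized less
def success(wasSuccessful, physical, process, maxPhysical, maxProcess):
    if wasSuccessful: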
        tackleScore = physical + process
    else:
        tackleScore = .5 * (physical + process - maxPhysical - maxProcess)
    return tackleScore

In [None]:
process_df.rename(columns={"nflId_y": "nflId"}, inplace = True)
physical_data.loc[:, 'process'] = 0
for index, row in process_df.iterrows():
    gameId = row['gameId']
    playId = row['playId']
    process_value = row['trueYACON']

    physical_data.loc[(physical_data['gameId'] == gameId) & (physical_data['playId'] == playId), 'process'] = process_value
physical_data.loc[:, 'physProcWSuccess'] = 0
physical_data['successful'] = ~physical_data['pff_missedTackle'].astype(bool)
maxProcess = physical_data['process'].max()
maxPhysical = physical_data['physical'].max()
physical_data['physProcWSuccess'] = physical_data.apply(lambda row: success(row['successful'], row['physical'], row['process'], maxPhysical, maxProcess), axis=1)


# Efficacy Below

In [None]:
tackles_all_stats = all_data.loc[all_data.tackle == 1].copy()

forcedFumbleArray = []
for ind in tackles_all_stats.index:
    gameTacklesDf = tackles_df[tackles_df['gameId'] == tackles_all_stats['gameId'][ind]]
    playTacklesDf = gameTacklesDf[gameTacklesDf['playId'] == tackles_all_stats['playId'][ind]]
    forcedFumbleTotal = playTacklesDf['forcedFumble'].sum()
    if forcedFumbleTotal != 0:
        forcedFumbleValue = True
    else:
        forcedFumbleValue = False
    forcedFumbleArray.append(forcedFumbleValue)
tackles_all_stats.loc[:,'forcedFumble'] = forcedFumbleArray


In [None]:
def efficacyCalculator(row):
    expectedPoints = row['expectedPoints']+7
    down = row['down']
    secondsIntoGame = (row['quarter']-1)*15*60
    minutes, seconds = (int(part) for part in row['gameClock'].split(':'))

    if row['quarter'] == 5:
        secondsIntoGame = secondsIntoGame + ((10*60)-minutes*60-seconds)
    else:
        secondsIntoGame = secondsIntoGame + ((15*60)-minutes*60-seconds)

    winProb = row['preSnapHomeTeamWinProbability']
    forcedFumble = row['forcedFumble']
    playResult = row['playResult']
    yardsToGo = row['yardsToGo']

    multiplier = expectedPoints*down*(1+secondsIntoGame/840)

    multiplier *= (1 - 2*np.abs(.5-winProb))
    if forcedFumble:
        multiplier = multiplier + 7

    scaledDistance = (playResult/yardsToGo)-1
    if scaledDistance < 0:
        multiplier = multiplier * (-(np.cbrt(0.5*scaledDistance+0.5))+1.79)
    else:
        multiplier = multiplier * (-((0.005*scaledDistance)**(1/3))+0.5)

    # Log-compress the multiplier; values with multiplier + 8 <= 1 (including those where the log is undefined) floor at 0
    if multiplier + 8 <= 1:
        return 0
    return np.log(multiplier + 8)

efficacyArray = []
for i in range(len(tackles_all_stats.index)):
    efficacyArray.append(efficacyCalculator(tackles_all_stats.iloc[i]))
tackles_all_stats.loc[:,'efficacy'] = efficacyArray
tackles_all_stats.sort_values("efficacy", ascending = False)

# Efficacy and Efficiency Combined

In [None]:
dentData = pd.merge(physical_data, tackles_all_stats, on=['gameId', 'playId', 'nflId'], how='left')

In [None]:
dentData.loc[dentData.tackle_y == 1].sort_values("physProcWSuccess", ascending = False)

In [None]:
plt.scatter(dentData.efficacy, dentData.physProcWSuccess)
plt.xlim(left=2)
plt.ylim(bottom=0)
plt.xlabel("Leverage and Result")
plt.ylabel("Efficiency")
plt.title("Efficiency vs Leverage/Result")
plt.show()

# Original DENT (Defensive Efficacy in Normalized Tackles)

In [None]:
dentData["DENT"] = dentData.efficacy * dentData.physProcWSuccess
dentData.rename(columns={'defensiveTeam': 'team', "playId": 'tacklesMade'}, inplace=True)
grouped = dentData.groupby("nflId").agg({
    'displayName': 'first',
    'team': 'first',
    'efficacy': 'mean',
    'physProcWSuccess': 'mean',
    'tacklesMade': 'count',
    'position': 'first',
    "DENT": 'mean'
})
grouped.loc[grouped.tacklesMade >= 10].sort_values("DENT", ascending = False)[:20]

In [None]:
dentData["DENTplus"] = dentData.DENT * 100 / dentData.DENT.mean()
dentData.rename(columns={'defensiveTeam': 'team', "playId": 'tacklesMade'}, inplace=True)
grouped = dentData.groupby("nflId").agg({
    'displayName': 'first',
    'team': 'first',
    'efficacy': 'mean',
    'physProcWSuccess': 'mean',
    'tacklesMade': 'count',
    'position': 'first',
    "DENT": 'mean',
    "DENTplus": 'mean'
})
grouped.loc[grouped.tacklesMade >= 10].sort_values("DENTplus", ascending = False)[:20]

# DENT by positional category

In [None]:
positions = dentData.groupby("position").agg({
    'efficacy': 'mean',
    'physProcWSuccess': 'mean',
    'tacklesMade': 'count',
    'position': 'first',
    "DENT": 'mean',
    "DENTplus": 'mean'
})
positions.loc[positions.tacklesMade >= 10].sort_values("DENTplus", ascending = False)[:20]

# Positionally adjusted DENTplus (pDENTplus)

In [None]:
# pDENT+: each play's DENT scaled by the mean DENT of the player's position, so every position averages 100
dentData["pDENTplus"] = dentData.DENT * 100 / dentData.groupby("position")["DENT"].transform("mean")
grouped = dentData.groupby("nflId").agg({
    'displayName': 'first',
    'team': 'first',
    'efficacy': 'mean',
    'physProcWSuccess': 'mean',
    'tacklesMade': 'count',
    'position': 'first',
    "DENT": 'mean',
    "DENTplus": 'mean',
    "pDENTplus": 'mean'
})
grouped.loc[grouped.tacklesMade >= 10].sort_values("pDENTplus", ascending = False).head()

# Top Individual Game pDENT+ performances

In [None]:
# pDENT+: each play's DENT scaled by the mean DENT of the player's position, so every position averages 100
dentData["pDENTplus"] = dentData.DENT * 100 / dentData.groupby("position")["DENT"].transform("mean")
grouped = dentData.groupby(["gameId", "nflId"]).agg({
    'displayName': 'first',
    'team': 'first',
    'efficacy': 'mean',
    'physProcWSuccess': 'mean',
    'tacklesMade': 'count',
    'position': 'first',
    "DENT": 'mean',
    "DENTplus": 'mean',
    "pDENTplus": 'mean'
})
grouped.loc[grouped.tacklesMade >= 10].sort_values("pDENTplus", ascending = False)[:20]

# High Volume Tacklers

In [None]:
grouped.loc[grouped.tacklesMade >= 40].sort_values("pDENTplus", ascending = False)[:20]

Citation: @misc{nfl-big-data-bowl-2024,
    author = {Michael Lopez and Thompson Bliss and Ally Blake and Andrew Patton and Jonathan McWilliams and Addison Howard and Will Cukierski},
    title = {NFL Big Data Bowl 2024},
    publisher = {Kaggle},
    year = {2023},
    url = {https://kaggle.com/competitions/nfl-big-data-bowl-2024}
}