## I. Introduction
Once the ball is in the air, all bets are off. Receivers and defenders must close space, adjust their paths, and get themselves into position before the ball arrives. For quarterbacks and play callers, understanding which players can reliably reach the catch point offers valuable insight into how much of the field they can influence on a given play.

We define this ability through a metric called **Presence**.

Presence measures how likely a player is to reach the landing point of the ball based on their movement at each frame. Instead of focusing on whether the pass is ultimately caught or defended, Presence isolates the ability to arrive on time and *be in position* to make a play, regardless of the final outcome.


## II. Motivation
We focus this project on contested catches because whether a player makes a play is, to some extent, independent of whether they got there. A defensive back could have a hand on the ball and the receiver could still come down with it. While the ball is in the air, what matters most is *getting there*; catching or defending it is the next step and is outside the scope of this project. Understanding the space on the field that a player can cover changes the approach offenses can take, and lets a quarterback know where to put the ball to both maximize his receiver's chances and minimize the chance of a PBU or interception.

If we know that a receiver can cover more space horizontally than the DB lined up on him, then in- or out-breaking routes become much more effective. If he has more range vertically, then we may want to consider deeper go or post routes. If a safety can't crash to the spot we want to target in time, then we can feel much more confident that our play design will work. This approach to understanding player presence can guide QB decision making and offensive play calling.

## III. Defining Presence

You could put a hundred different players in the same situation, with the same combination of factors, where they need to get to the ball. Depending on how difficult the play is, only some of them may be able to reach the ball and be in position to make a play.

We define success as being within a yard and a half of the ball's landing spot at the end of the play. A frame is considered a "success frame" if the player is within that yard and a half in the play's final frame, and a "failure frame" otherwise.
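As a minimal sketch of this labeling rule, the snippet below assumes each play is available as a time-ordered list of `(x, y)` player positions in yards, and that the threshold is inclusive. The function names and input layout are hypothetical, not taken from our code.

```python
import math

REACH_RADIUS_YARDS = 1.5  # success threshold stated in the text


def reached_landing_spot(final_x, final_y, ball_x, ball_y,
                         radius=REACH_RADIUS_YARDS):
    """True if the final-frame position is within `radius` yards of the
    landing spot (inclusive threshold is an assumption)."""
    return math.hypot(final_x - ball_x, final_y - ball_y) <= radius


def label_frames(frames, ball_x, ball_y):
    """Give every frame of a play the same label, decided by the final frame.

    frames: time-ordered list of (x, y) positions for one player on one play.
    """
    final_x, final_y = frames[-1]
    success = reached_landing_spot(final_x, final_y, ball_x, ball_y)
    return [success] * len(frames)


# Hypothetical positions: the player finishes about 1.41 yards from the ball.
labels = label_frames([(0.0, 0.0), (5.0, 5.0), (10.0, 10.0)], 11.0, 11.0)
```

Because the label depends only on the final frame, early frames where the player is still far from the ball can still be success frames.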

For instance, on this play, Tracy Walker III got a hand on the ball even though Christian Watson caught it. We would still consider this a successful rep for Walker, because Watson had to make a spectacular play to come down with the ball. Cameron Sutton, however, did not get to the ball on this rep, so he is counted as not reaching the spot.

![](https://github.com/MPuram12/BigDataBowl2025/blob/main/Videos/Watson%20Video.gif?raw=true)


## IV. Modeling

Predicting whether a player will reach their spot in time is a complicated question that we try to break down as simply as possible. We essentially frame the likelihood in terms of how much movement the player needs, given their current context (speed, direction, acceleration), in the time remaining until the ball lands. Most importantly, we include player identity in the training process in order to learn the "effect" that a player has on their own likelihood.

Features:
- Time to ball landing (seconds).
- Distance to ball (yards).
- Velocity (yards/second).
- Angle between current path and optimal path to ball.
- Acceleration (yards/second^2).
- Player Identity.

Target: 
- Whether or not the player was within 1.5 yards of the ball's landing spot in the final frame.
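The kinematic features above can be sketched as follows. This is an illustration under stated assumptions rather than our exact pipeline: it assumes 10 frames per second, positions in yards, and a heading measured in degrees clockwise from the +y axis. The function and key names are hypothetical.

```python
import math

FRAMES_PER_SECOND = 10.0  # assumption: tracking data sampled at 10 Hz


def frame_features(x, y, speed, heading_deg, accel, ball_x, ball_y, frames_left):
    """Build the kinematic features for one player in one frame."""
    dx, dy = ball_x - x, ball_y - y
    distance = math.hypot(dx, dy)
    # Bearing of the straight line to the ball, in the same (assumed)
    # convention as heading_deg: degrees clockwise from the +y axis.
    bearing_deg = math.degrees(math.atan2(dx, dy)) % 360.0
    # Smallest absolute difference between the two directions, in [0, 180].
    angle_off_path = abs((heading_deg - bearing_deg + 180.0) % 360.0 - 180.0)
    return {
        "time_to_landing": frames_left / FRAMES_PER_SECOND,
        "distance": distance,
        "speed": speed,
        "angle_off_path": angle_off_path,
        "accel": accel,
    }


# Hypothetical frame: ball lands 10 yards away along +x, 20 frames from now.
feats = frame_features(x=0.0, y=0.0, speed=7.5, heading_deg=350.0, accel=1.2,
                       ball_x=10.0, ball_y=0.0, frames_left=20)
```

Wrapping the angle difference into the 0 to 180 degree range keeps headings such as 350 and 10 degrees close together instead of 340 degrees apart.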

We wanted a model that makes effective use of player identity. After trying a few different gradient boosting models, we landed on H2O's deeplearning function. It excels at using a categorical feature like player ID, so we created a 4-layer network that reads in spatiotemporal and kinematic information for each player and outputs a prediction for whether or not they will reach the ball by the end of the play.

Using this model, we can create predictions both with and without player ID; the prediction without player ID is treated as the "average" likelihood for that scenario. For instance, on the Christian Watson play above, he had a 96.49% chance of reaching the ball, while the average likelihood was 98.83%, meaning just about any player in his scenario at the time of the throw could have reached the ball.
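The comparison between the two predictions is a simple difference. A minimal sketch using the Watson numbers quoted above (the function name is hypothetical):

```python
def player_effect(p_with_id, p_without_id):
    """Difference between the prediction that knows the player and the
    'average player' prediction for the same frame, as probabilities."""
    return p_with_id - p_without_id


# Christian Watson at the time of the throw, using the values from the text.
watson_effect = player_effect(0.9649, 0.9883)
print(f"{watson_effect:+.2%}")  # prints -2.34%
```

A negative value means the model expected this player to be slightly less likely than an average player to reach the ball in that situation. A positive value means the player's identity raised the prediction.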

We trained our model on 80% of all output frames included in the data. Player ID was by a large margin the biggest explanatory variable for the variance in outcomes. After aggregating across all player indicator variables (one for each player), ID accounts for 97.2% of total model importance, followed by distance (1.03%), time (0.75%), and angle, speed, and acceleration (all below 0.5%). This indicates that who the player is matters far more to the model than any measured spatiotemporal or kinematic variable, which is exactly the goal of this project: our model learns to rely on these player IDs for its predictions. On a frame-by-frame basis, however, player ID ranks fifth (out of six) in importance, ahead of only acceleration, and each player ID matters a bit more or a bit less as the model learns to weigh them differently.
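The aggregation step can be sketched as below. It assumes the model reports one importance value per input column and that one-hot player columns are named like `player_id.<level>`; that naming convention and the toy numbers are hypothetical, not our model's actual output.

```python
from collections import defaultdict


def aggregate_importance(importances, sep="."):
    """Sum importances over one-hot columns that share a base feature name,
    then normalize so the shares add up to 1."""
    totals = defaultdict(float)
    for column, value in importances.items():
        feature = column.split(sep, 1)[0]  # "player_id.P7" -> "player_id"
        totals[feature] += value
    grand_total = sum(totals.values())
    return {feature: value / grand_total for feature, value in totals.items()}


# Hypothetical toy numbers, not the model's real importances.
toy = {f"player_id.P{i}": 0.05 for i in range(50)}
toy.update({"distance": 0.40, "time": 0.30})
shares = aggregate_importance(toy)
```

In this toy case every individual player column is less important than distance or time, yet the player columns together hold about 78% of the total. That is the same pattern as player ID dominating in aggregate while ranking low for any single indicator.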

Overall, we found our model to be highly accurate, correctly predicting whether or not a player would reach the ball by the end of the play 97.59% of the time. Our AUC (Area Under the ROC Curve) is 0.9971, meaning our model can almost perfectly separate positive frames from negative frames across all thresholds.
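For readers less familiar with these two metrics, here is a small self-contained sketch of how they can be computed. It uses a 0.5 decision threshold (an assumption) and the pairwise definition of AUC. The labels and probabilities are hypothetical toy values, not our model's output.

```python
def accuracy(labels, probs, threshold=0.5):
    """Share of frames whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return hits / len(labels)


def auc(labels, probs):
    """Probability that a random positive frame is scored above a random
    negative frame (ties count as half). Quadratic time, fine for a demo."""
    pos = [p for y, p in zip(labels, probs) if y]
    neg = [p for y, p in zip(labels, probs) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy frames: 1 = reached the spot, 0 = did not.
toy_labels = [1, 1, 0, 0, 1]
toy_probs = [0.9, 0.8, 0.3, 0.6, 0.4]
toy_accuracy = accuracy(toy_labels, toy_probs)
toy_auc = auc(toy_labels, toy_probs)
```

Accuracy depends on the chosen threshold, while AUC only depends on how the frames are ranked, which is why it describes separation across all thresholds.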

![Model performance statistics](https://github.com/MPuram12/BigDataBowl2025/blob/main/Images/model%20stats.png?raw=true)

## V. Results

### Leaderboards
Using this new "effect", which is the difference between the predictions with and without player ID, we can identify the top-performing players on a frame-by-frame basis. These players consistently add significant value to their reach probabilities, and their names carry weight within our model.

Starting with our top 10 Cornerbacks, Wide Receivers, Safeties, and Tight Ends:

![Top 10 players by position](https://github.com/MPuram12/BigDataBowl2025/blob/main/Images/top10.png?raw=true)


And our bottom 10:

![Bottom 10 players by position](https://github.com/MPuram12/BigDataBowl2025/blob/main/Images/bottom10.png?raw=true)
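A rough sketch of how a leaderboard like these can be built from per-frame effects is shown below. It assumes a player's score is the mean of their per-frame effects and applies a minimum frame count. The function name, input layout, and toy values are hypothetical.

```python
from collections import defaultdict


def presence_leaderboard(frame_effects, min_frames=500):
    """Average the per-frame effect for each player and rank the players.

    frame_effects: iterable of (player, effect) pairs, one per frame.
    Players with fewer than `min_frames` frames are left out.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for player, effect in frame_effects:
        totals[player] += effect
        counts[player] += 1
    scores = {p: totals[p] / counts[p] for p in totals if counts[p] >= min_frames}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data with a lowered threshold so the example stays small.
toy = [("A", 0.10), ("A", 0.06), ("B", -0.20), ("B", -0.10), ("C", 0.50)]
board = presence_leaderboard(toy, min_frames=2)
```

Player "C" has the largest single-frame effect but is dropped for having too few frames, which is the reason for a minimum sample size.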

### Player Analysis (min 500 frames)

#### Starling Thomas V (+5.59%)
- Thomas V stands out among cornerbacks with a presence score of 0.056. His case also shows, however, that presence alone doesn't make an exceptional player. Thomas had a pretty tough rookie season, coming in as a UDFA and giving up 5 touchdowns, 37 receptions, and a 131.0 passer rating on 49 targets in 12 games (7 starts). He improved in these categories in 2024, allowing 34 completions on 58 targets, 3 TDs, and a 100.6 rating, but allowed more yards per completion in 17 games (15 starts). His ability to stay in the play helped him improve in his sophomore season. Unfortunately, we couldn't see what he had in store for 2025, as he tore his ACL in training camp.


#### Brandon Aiyuk (+6.6%)
- Aiyuk ranks highest among all players in presence, with a score of 0.066. In 2023, Aiyuk had a career year, going for 1,342 yards, 75 catches, and 7 TDs on 105 targets. He only ran a 4.5 at his pro day, but he is considered an elite route runner who excels at getting to his spot on time. Since then, he has battled a season-ending ACL injury that shortened his 2024 campaign, and he hasn't played a snap yet in 2025.


#### Gabe Davis (-14.52%)
- Davis has the lowest presence among qualified players. He has a career 55% completion rate when targeted, and he caught just 45 of 81 targets with only 3 drops and an 84.7 passer rating when targeted. Davis was frequently highlighted in the 2020 draft as having a limited route tree and lacking the ability to explode into routes. These traits are ultimately what limit his presence on the field, making him the lowest-graded player on this metric.


### Play Analysis

Not only can we understand which players take up more space, we can also apply this to live plays. Take the earlier Christian Watson catch, for example. When the ball was thrown, he had a beat on both of his defenders, and there was a small window that Love could throw into that would allow *only* Watson to have a chance at the ball. Instead, Love wasn't able to get the ball far enough, and Watson had to make a spectacular grab to come down with it. The spot Love threw to was one of very few locations that gave *both* defenders a chance to make a play on the ball.


![Animation of the Christian Watson play](https://github.com/MPuram12/BigDataBowl2025/blob/main/Videos/Watson%20Animation.gif?raw=true)






On the flip side, we will look at the completion in which the receiver had the biggest "effect" on the prediction: Jayden Reed in Week 10 versus the Pittsburgh Steelers. Reed was running a crosser, and Love threw the ball right around when he started making his break, meaning none of the involved players were running toward the landing zone yet.

Link to the play if you're curious: https://youtu.be/kZah4221wfE?si=uDOCMoBTLvfOodmt&t=702  

<div style="display: flex; align-items: center;">
  <img src="https://github.com/MPuram12/BigDataBowl2025/blob/main/Videos/Reed%20Animation.gif?raw=true" style="height: 350px; margin-right: 20px;">
  <img src="https://github.com/MPuram12/BigDataBowl2025/blob/main/Images/reed%20fig.png?raw=true" style="height: 350px;">
</div>

At the time of the throw, Reed had a 90.34% reach probability. Had we not known who the receiver was, our model would have predicted a 20.68% chance, a difference of about 70 percentage points. At that moment, only Reed's reachable zone contains the landing spot for the football, which is exactly what QBs should be striving for. Trenton Thompson has the next-best chance with a 22.62% reach likelihood, but he never really has a chance to reach the ball here. With only a second left, Reed's zone is still the only one containing the landing spot, giving him a nearly guaranteed chance at an uncontested catch. He goes on to complete the catch and run for an additional 11 yards before being pushed out of bounds.


## VI. Wrap-Up

### Limitations
- Spotty data; some catches have receivers listed as >3 yards away from the landing spot.
- No ball tracking, so take the animations with a grain of salt.
- It's not entirely clear when the ball is caught on a given play, which adds further uncertainty to the timing shown in the animations.
- There's not much pre-existing info about whether the pass was “contested” by the defender, so we had to use our own (slightly arbitrary) definition.


### Conclusion
Presence provides a way to evaluate pass plays through the lens of movement and positioning rather than outcomes alone. By estimating whether players can realistically reach the landing point of the ball, the metric highlights differences in range and spatial control that are not always reflected in traditional statistics.

Across individual plays and controlled comparisons, Presence aligns with known player profiles. Players with stronger movement abilities tend to exhibit larger reachable areas and higher reach probabilities, while those with weaker movement abilities show a more constrained spatial influence. The ability to get to the ball is the most important part of the forward pass, and this project provides a framework for identifying exceptional players and identifying throwing windows. 

### Looking Ahead
We didn't have time to do it for this competition, but we would have loved to make this interactive. Being able to draw out route trees and see where the available zones will be is where we see this project's potential, and it could allow play-callers to get a bit more aggressive when targeting defenders with low presence scores.


## VII. Appendix
- All of our code can be found [here](https://github.com/MPuram12/BigDataBowl2025)
- Final word count: 1995
- Final figure count: 7

*We would like to give a special thanks to Dr. Scott Powers (Rice University) for his guidance and support throughout this process!*
