## I. Introduction
Once the ball is in the air, all bets are off. Receivers and defenders must close space, adjust their paths, and get themselves into position before the ball arrives. For quarterbacks and play callers, understanding which players can reliably reach the catch point offers valuable insight into how much of the field they can influence on a given play.

We define this ability through a metric called **Presence**.

Presence measures how likely a player is to reach the landing point of the ball, based on their movement at each frame. Instead of focusing on whether the pass is ultimately caught or defended, Presence isolates the ability to arrive on time and *be in position* to make a play, regardless of the final outcome.


## II. Motivation
We focus this project on contested catches because whether a player makes a play is, to some extent, independent of whether he got there. A defensive back could have a hand on the ball and the receiver could still come down with it. While the ball is in the air, what matters most is *getting there*; catching or defending the pass is the next step and is outside the scope of this project. Understanding how much of the field a player can cover changes how an offense can attack, and helps a quarterback place the ball where it maximizes his receiver's chances while minimizing the chance of a pass breakup (PBU) or interception.

If we know that a receiver can cover more space horizontally than the DB lined up on him, then in- or out-breaking routes become much more effective. If he has more range vertically, then we may want to consider deeper go or post routes. If a safety can't crash to the spot we want to target in time, then we can feel much more confident that our play design will work. This approach to understanding player presence lets us guide QB decision making and offensive play calling.

## III. Defining Presence

You could put a hundred different players in the same situation, where they need to get to the ball. Depending on how difficult the play is, only some of them may be able to reach the ball and be in position to make a play.

We define success as being within a yard and a half (1.5 yards) of the ball's landing spot at the end of the play. Every frame of a player's rep is labeled a "success frame" if that player is within 1.5 yards of the landing spot in the final frame, and a "failure frame" otherwise.
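
As a minimal sketch of this labeling rule, the snippet below marks every frame of a rep based on the final frame. The dictionary keys `x` and `y`, the function names, and the choice to count exactly 1.5 yards as a success are assumptions for illustration, not the project's actual code.

```python
import math

REACH_RADIUS_YARDS = 1.5  # threshold from the definition above


def reached_landing_spot(final_x, final_y, ball_x, ball_y, radius=REACH_RADIUS_YARDS):
    """Return True if the final-frame position is within `radius` yards of the landing spot."""
    return math.hypot(final_x - ball_x, final_y - ball_y) <= radius


def label_frames(frames, ball_x, ball_y):
    """Give every frame of one player's rep the same label, decided by the final frame.

    `frames` is a time-ordered list of dicts with hypothetical keys "x" and "y" (yards).
    """
    last = frames[-1]
    success = reached_landing_spot(last["x"], last["y"], ball_x, ball_y)
    # Copy each frame and attach the rep-level outcome to it.
    return [dict(frame, success=success) for frame in frames]


# Made-up positions for a single rep (not real tracking data).
rep = [{"x": 30.0, "y": 20.0}, {"x": 34.0, "y": 22.0}, {"x": 37.0, "y": 24.0}]
labeled = label_frames(rep, ball_x=38.0, ball_y=24.5)
```

Because the label comes from the final frame only, an early frame where the player is still far from the ball can be a "success frame" as long as the player eventually gets there.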

For instance, on this play, Tracy Walker III got a hand on the ball even though Christian Watson caught it. We still consider this a successful rep for Walker, because Watson had to make a spectacular play to come down with the ball. Cameron Sutton, however, did not get to the ball on this rep, so he is counted as not reaching the spot.

As we will show later, this was a poor throw by Jordan Love. His receiver had a step on both defenders, and the underthrow made this a more difficult play for Watson while also giving both defenders a chance to defend the pass.

![](https://github.com/MPuram12/BigDataBowl2025/blob/main/Watson%20Catch.gif?raw=true)


## IV. Modeling

Predicting whether a player will reach their spot in time is a complicated question that we try to break down as simply as possible. We essentially frame likelihood in terms of the distance a player must cover and the change in direction they must make, given how quickly they are moving or accelerating and how much time remains until the ball lands. Additionally, we include player identity in the training process in order to learn the "effect" that a player has on their likelihood.

Features:
- Time to ball landing (seconds).
- Distance to ball (yards).
- Velocity (yards/second).
- Angle between current path and optimal path to ball.
- Acceleration (yards/second^2).
- Player Identity.

Target: 
- Whether or not player was within 1.5 yards of the ball landing spot in the final frame.
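
The geometric features above can be sketched for a single frame as follows. This is an illustration under stated assumptions: the function and argument names are hypothetical, velocity is passed as x and y components, and the "optimal path" is taken to be the straight line from the player to the landing spot. Player identity is categorical and is left out here.

```python
import math


def frame_features(px, py, vx, vy, accel, ball_x, ball_y, time_to_landing):
    """Build the numeric feature values for one player in one frame.

    Positions are in yards, velocity components in yards/second,
    accel in yards/second^2, and time_to_landing in seconds.
    """
    dx, dy = ball_x - px, ball_y - py
    distance = math.hypot(dx, dy)
    speed = math.hypot(vx, vy)
    # Angle between the current direction of travel and the straight line to the ball.
    # It is undefined for a stationary player or one already on the spot; use 0.0 there.
    if distance == 0.0 or speed == 0.0:
        angle_deg = 0.0
    else:
        cos_angle = (vx * dx + vy * dy) / (speed * distance)
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
        angle_deg = math.degrees(math.acos(cos_angle))
    return {
        "time_to_landing": time_to_landing,
        "distance_to_ball": distance,
        "speed": speed,
        "angle_to_ball_deg": angle_deg,
        "acceleration": accel,
    }


# Made-up frame: player at the origin running along +x, ball landing 4 yards to the side.
features = frame_features(px=0.0, py=0.0, vx=3.0, vy=0.0, accel=1.0,
                          ball_x=0.0, ball_y=4.0, time_to_landing=2.0)
```

In this made-up frame the player is running perpendicular to the line to the ball, so the angle feature comes out at 90 degrees.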


Using this model, we can generate predictions both with and without player ID; the prediction without player ID is treated as the "average" likelihood for that scenario.

## V. Results

### Leaderboards
Using this new "effect", which is the difference between the predictions with and without player ID, we can identify the top-performing players on a frame-by-frame basis. These players consistently add significant value to their reach probabilities, and their name carries weight in our model.

Starting with our top 10 Cornerbacks, Wide Receivers, Safeties, and Tight Ends:

![Top 10 cornerbacks, wide receivers, safeties, and tight ends](Images/top10.png)


And our bottom 10:

![Bottom 10 cornerbacks, wide receivers, safeties, and tight ends](Images/bottom10.png)

### Player Analysis


<video controls src="Videos/Chase Animation.mp4" title="Title"></video>

Effect = prob(reach | pID) - prob(reach | no pID)

Week 5 2023, Bengals @ Cardinals, 13:47 remaining in 3rd Quarter

At the moment of the throw:

prob(reach | J Chase) = .951

prob(reach | Avg Player) = .264

Effect = .687

![](https://github.com/MPuram12/BigDataBowl2025/blob/main/chase%20play%20video.gif?raw=true)


In this play, our model immediately recognizes Ja'Marr Chase as far more likely than average to make this play (95% with player ID, as opposed to 26% without it). It also recognizes, as soon as the ball is thrown, that none of the defenders has a chance of making this play.

![](https://github.com/MPuram12/BigDataBowl2025/blob/main/chase%20play.gif?raw=true)

We can further visualize this effect by comparing an elite player such as Chase with a player with one of the lower effects, Wan'Dale Robinson.

Chase's reachable area is deeper, wider, and larger overall in this setting (2 seconds of air time, moving at 15 mph).

![](https://github.com/MPuram12/BigDataBowl2025/blob/main/heatmaps.png?raw=true)


![alt text](<maps3.png>)
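
One way to picture how a reachable-area map like these could be built is to score a grid of candidate landing spots around a player. The sketch below is hypothetical: `reach_grid`, `toy_predict`, and the grid extents are made up for illustration, and `toy_predict` is a constant-speed stand-in, not our trained model. The only fixed facts it relies on are the 1.5-yard threshold and the unit conversion (15 mph is about 7.33 yards/second).

```python
import math

MPH_TO_YARDS_PER_SECOND = 1760 / 3600  # 1 mile = 1760 yards, 1 hour = 3600 seconds


def reach_grid(predict, speed_mph, air_time_s, xs, ys):
    """Score each candidate landing spot (x, y) for a player at the origin moving along +x.

    `predict` is any callable (distance, angle_deg, speed, time) -> probability.
    Returns a dict mapping (x, y) to that probability.
    """
    speed = speed_mph * MPH_TO_YARDS_PER_SECOND
    grid = {}
    for x in xs:
        for y in ys:
            distance = math.hypot(x, y)
            angle_deg = abs(math.degrees(math.atan2(y, x)))  # 0 means straight ahead
            grid[(x, y)] = predict(distance, angle_deg, speed, air_time_s)
    return grid


def toy_predict(distance, angle_deg, speed, time):
    """Stand-in predictor: 1.0 if constant-speed travel gets within 1.5 yards, else 0.0.

    It ignores the angle, which a trained model would presumably not do.
    """
    return 1.0 if distance <= speed * time + 1.5 else 0.0


# Same setting as the comparison above: 2 seconds of air time, moving at 15 mph.
grid = reach_grid(toy_predict, speed_mph=15, air_time_s=2.0,
                  xs=range(-10, 31, 5), ys=range(-20, 21, 5))
```

Swapping `toy_predict` for a real model's prediction function, once with and once without player ID, would give two grids whose difference is the player's effect at each spot.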

Additionally, we can rank players by their total effect, which can be interpreted as the total value a player added across all of their frames. Because total effect isn't the most intuitive stat, the tables show each player's average effect instead, but the top 10 is still selected by total effect. Below are the top ten cornerbacks, wide receivers, and safeties.
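
To make the distinction between total and average effect concrete, here is a small sketch that aggregates per-frame effects by player. The function names, the tuple layout, and the sample probabilities are invented for illustration and are not taken from our data.

```python
from collections import defaultdict


def summarize_effects(frame_predictions):
    """Aggregate per-frame effects into total and average effect per player.

    `frame_predictions` is an iterable of (player, p_with_id, p_without_id) tuples,
    one per frame, where effect = p_with_id - p_without_id.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for player, p_with_id, p_without_id in frame_predictions:
        totals[player] += p_with_id - p_without_id
        counts[player] += 1
    return {
        player: {
            "total_effect": totals[player],
            "average_effect": totals[player] / counts[player],
            "frames": counts[player],
        }
        for player in totals
    }


def top_n_by_total(summary, n=10):
    """Return the n player names with the largest total effect, best first."""
    return sorted(summary, key=lambda p: summary[p]["total_effect"], reverse=True)[:n]


# Made-up frames: Player A appears three times, Player B once.
sample = [
    ("Player A", 0.9, 0.5),
    ("Player A", 0.8, 0.6),
    ("Player A", 0.7, 0.6),
    ("Player B", 0.9, 0.4),
]
summary = summarize_effects(sample)
ranking = top_n_by_total(summary, n=2)
```

In this made-up sample, Player A ranks first by total effect because of the larger number of frames, even though Player B has the higher average effect.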

![](https://github.com/MPuram12/BigDataBowl2025/blob/main/leaderboards.png?raw=true)

![alt text](CB_table.png)![alt text](WR_table.png)![alt text](S_table.png)

![alt text](CB_table_bottom.png)![alt text](WR_table_bottom.png)![alt text](S_table_bottom.png)

## VI. Limitations
- Spotty data: on some catches, the player is listed as more than 3 yards away from the landing spot.
- The model doesn't perform well when a player makes a sudden last-second movement, such as diving, especially back toward the ball.
- No ball tracking.
- It is unclear when the ball is caught on a given play.
- There is little pre-existing information about whether a pass was "contested" by a defender, so we had to use our own definition.


For instance, Reynolds was listed as more than 3 yards away from the ball on this catch. Our model treated him as not reaching the ball on this play and put his probability of reaching it at 0.


![](https://github.com/MPuram12/BigDataBowl2025/blob/main/reynolds%20catch.png?raw=true)

## VII. Next Steps
- Animate player areas
- Create Kaggle Report
- More visualization work
- Model improvements (if possible)
