Effective Separation Index (ESI): A Context-Aware Receiver Metric
NFL Big Data Bowl 2026 – University Track Submission

Author: Qudsiya Siddique
Date: November 2025

📘 1. Introduction

This notebook walks through the end-to-end workflow for building the Effective Separation Index (ESI), a normalized measure of how effectively a receiver creates space after the pass is thrown.
The metric is computed from the NFL Next Gen Stats tracking data provided for the Big Data Bowl 2026.
It adjusts raw receiver–defender separation for both route difficulty and defensive coverage tightness.

In [None]:
# Core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import glob, os


🧩 Data Loading (Weeks 1–2)
The weekly tracking files (`input_2023_w*.csv`, `output_2023_w*.csv`) are stored in the `train` directory,  
while `supplementary_data.csv` sits in its parent folder.  
Only **Weeks 1 and 2** are loaded here, which keeps the runtime short and reproducible in public Kaggle environments.
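
If more weeks are needed later, the file lists can be generated from week numbers instead of being typed by hand. The following is a minimal sketch: `weekly_paths` is a hypothetical helper (not part of the competition files), and it assumes the two-digit `input_2023_wNN.csv` / `output_2023_wNN.csv` naming described above. The `"train"` directory in the example is a placeholder.

```python
import os


def weekly_paths(base_dir, prefix, weeks, season=2023):
    """Build paths such as <base_dir>/input_2023_w01.csv for each week.

    Hypothetical helper: assumes the two-digit week naming used above.
    """
    return [
        os.path.join(base_dir, f"{prefix}_{season}_w{week:02d}.csv")
        for week in weeks
    ]


# Placeholder directory; swap in the real train path when running on Kaggle.
example_inputs = weekly_paths("train", "input", weeks=[1, 2])
example_outputs = weekly_paths("train", "output", weeks=[1, 2])
print(example_inputs)
print(example_outputs)
```

Extending the analysis to more weeks then only requires changing the `weeks` argument.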

In [None]:
root_path = "/kaggle/input/nfl-big-data-bowl-2026-analytics/114239_nfl_competition_files_published_analytics_final"
train_path = os.path.join(root_path, "train")

# --- Load only Week 1 and Week 2 input/output files ---
input_files = [
    os.path.join(train_path, "input_2023_w01.csv"),
    os.path.join(train_path, "input_2023_w02.csv")
]

output_files = [
    os.path.join(train_path, "output_2023_w01.csv"),
    os.path.join(train_path, "output_2023_w02.csv")
]



In [None]:
# --- Load supplementary from parent folder ---
supplementary_path = os.path.join(root_path, "supplementary_data.csv")

# --- Load dataframes ---
input_df = pd.concat([pd.read_csv(f, low_memory=False) for f in input_files], ignore_index=True)
output_df = pd.concat([pd.read_csv(f, low_memory=False) for f in output_files], ignore_index=True)
supp_df = pd.read_csv(supplementary_path, low_memory=False)


In [None]:
input_df.head()

In [None]:
supp_df.head()

In [None]:
output_df.head()

In [None]:
output_df["game_id"] = output_df["game_id"].astype(str).str.replace(r"\.0$", "", regex=True)
output_df["play_id"] = output_df["play_id"].astype(str).str.replace(r"\.0$", "", regex=True) 

In [None]:
supp_df["game_id"] = supp_df["game_id"].astype(str).str.replace(r"\.0$", "", regex=True)
supp_df["play_id"] = supp_df["play_id"].astype(str).str.replace(r"\.0$", "", regex=True)
print("Output sample IDs:", output_df[["game_id", "play_id"]].head(2).values)
print("Supp sample IDs:", supp_df[["game_id", "play_id"]].head(2).values)

🔗 Data Merging and Player Role Filtering
This step combines the **output tracking data** with the **supplementary play context** using  
`game_id` and `play_id` as merge keys.  

Then, player metadata (names, positions, and roles) is merged from the input data.  
Finally, we filter to keep only two roles relevant to the pass-in-air event:
- **Targeted Receiver**
- **Defensive Coverage**
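
The merge on `game_id` and `play_id` can be sketched with two optional safety checks, shown here on toy frames rather than the competition data. `normalize_ids` is a hypothetical helper that mirrors the string cleanup used in this notebook. `validate="many_to_one"` and `indicator=True` are standard `DataFrame.merge` options: the first raises if the play-level table has duplicate keys, and the second adds a `_merge` column that flags tracking rows with no play context.

```python
import pandas as pd


def normalize_ids(df, cols=("game_id", "play_id")):
    """Return a copy with ID columns as strings without a trailing '.0'."""
    out = df.copy()
    for col in cols:
        out[col] = out[col].astype(str).str.replace(r"\.0$", "", regex=True)
    return out


# Toy data: IDs arrive as floats on one side and as ints on the other.
toy_output = pd.DataFrame({
    "game_id": [2023090700.0, 2023090700.0, 2023091000.0],
    "play_id": [101.0, 101.0, 55.0],
    "x": [10.0, 11.0, 30.0],
})
toy_supp = pd.DataFrame({
    "game_id": [2023090700],
    "play_id": [101],
    "pass_result": ["C"],
})

toy_merged = normalize_ids(toy_output).merge(
    normalize_ids(toy_supp),
    on=["game_id", "play_id"],
    how="left",
    validate="many_to_one",  # expect one supplementary row per play
    indicator=True,          # adds a `_merge` column
)

# Rows marked "left_only" found no matching play in the supplementary table.
unmatched = int((toy_merged["_merge"] == "left_only").sum())
print(f"Tracking rows without play context: {unmatched}")
```

A non-zero unmatched count on the real data would point to an ID formatting mismatch or to plays missing from the supplementary file.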

In [None]:
merge_keys = ["game_id", "play_id"]
merged_df = output_df.merge(supp_df, on=merge_keys, how="left")
print("Merged shape:", merged_df.shape)

In [None]:
input_df.columns

In [None]:
merged_df.columns

In [None]:
player_info_cols = [
    "game_id", "play_id", "nfl_id",
    "player_name", "player_side", "player_position", "player_role"
]

player_info = input_df[player_info_cols].drop_duplicates()
for df in [player_info, merged_df]:
    df["game_id"] = df["game_id"].astype(str).str.replace(r"\.0$", "", regex=True)
    df["play_id"] = df["play_id"].astype(str).str.replace(r"\.0$", "", regex=True)


In [None]:
merged_df = merged_df.merge(
    player_info,
    on=["game_id", "play_id", "nfl_id"],
    how="left"
)
print("New merged_df shape:", merged_df.shape)
print("Sample columns:", [col for col in merged_df.columns if 'player' in col or 'role' in col])


In [None]:
roles_of_interest = ["Targeted Receiver", "Defensive Coverage"]
filtered_df = merged_df[merged_df["player_role"].isin(roles_of_interest)].copy()
print(f"Filtered data shape: {filtered_df.shape}")
print("Unique roles:", filtered_df['player_role'].dropna().unique())

### 📐 Receiver–Defender Separation Calculation
This step computes the **Euclidean distance** between the targeted receiver and every player in the **Defensive Coverage** role  
within the same play and frame.  
For each frame, we retain only the *minimum distance*, which measures how closely the nearest defender is covering the receiver.

The formula used is:

\[
\text{distance} = \sqrt{(x_r - x_d)^2 + (y_r - y_d)^2}
\]

where \( (x_r, y_r) \) are receiver coordinates and \( (x_d, y_d) \) are defender coordinates.
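
As a small worked example of the formula, the sketch below places one receiver and two defenders in a single frame and keeps the nearest-defender distance. The coordinates are toy values chosen for easy arithmetic, not taken from the dataset.

```python
import numpy as np

# Toy frame: one receiver and two defenders, coordinates in yards.
receiver_xy = np.array([30.0, 20.0])
defenders_xy = np.array([
    [33.0, 24.0],  # offset (3, 4) from the receiver -> 5.0 yards away
    [30.0, 22.0],  # offset (0, 2) from the receiver -> 2.0 yards away
])

# Euclidean distance from the receiver to each defender.
distances = np.sqrt(((defenders_xy - receiver_xy) ** 2).sum(axis=1))

# The per-frame separation is the distance to the nearest defender.
min_separation = distances.min()
print(distances, min_separation)
```

Here the second defender is nearest, so the frame's separation is 2.0 yards.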


In [None]:
receivers_df = filtered_df[filtered_df["player_role"] == "Targeted Receiver"].copy()
defenders_df = filtered_df[filtered_df["player_role"] == "Defensive Coverage"].copy()

print("Receivers:", receivers_df.shape, " | Defenders:", defenders_df.shape)

In [None]:
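# Pair the targeted receiver with every coverage defender in the same frame.
pairs_df = receivers_df.merge(
    defenders_df,
    on=["game_id", "play_id", "frame_id"],
    suffixes=("_rec", "_def")
)

# Euclidean distance (yards) for each receiver-defender pair.
pairs_df["distance"] = np.sqrt(
    (pairs_df["x_rec"] - pairs_df["x_def"])**2 +
    (pairs_df["y_rec"] - pairs_df["y_def"])**2
)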

In [None]:
# Keep only the nearest defender's distance for each frame.
separation_df = (
    pairs_df.groupby(["game_id", "play_id", "frame_id"], as_index=False)["distance"]
    .min()
    .rename(columns={"distance": "min_separation"})
)

# Attach the per-frame minimum back onto the receiver rows.
receivers_df = receivers_df.merge(
    separation_df,
    on=["game_id", "play_id", "frame_id"],
    how="left"
)

In [None]:
print("Receiver dataset shape:", receivers_df.shape)
print(receivers_df[["game_id", "play_id", "frame_id", "player_name", "min_separation"]].head())

The resulting dataset now includes a new column, **`min_separation`**,  
which quantifies the smallest distance (in yards) between the targeted receiver and the nearest defender for each frame.  
This metric forms the foundation for evaluating how “open” a receiver is throughout the play.


### 🧹 Data Cleaning for Exploratory Analysis
Before analyzing separation patterns, we remove any rows missing key contextual information:
- **`min_separation`**: the computed distance between receiver and nearest defender  
- **`route_of_targeted_receiver`**: the type of route run  
- **`team_coverage_man_zone`**: whether the defense used man or zone coverage  
- **`pass_result`**: the outcome of the play (Complete, Incomplete, etc.)  

This ensures that all visualizations are based on complete, interpretable records.
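
Before dropping rows, it can help to see how much data each column costs. The sketch below runs on a toy frame. `missing_report` and `REQUIRED_COLS` are hypothetical names introduced for illustration, and the column names are the four listed above.

```python
import numpy as np
import pandas as pd

REQUIRED_COLS = [
    "min_separation",
    "route_of_targeted_receiver",
    "team_coverage_man_zone",
    "pass_result",
]


def missing_report(df, cols=REQUIRED_COLS):
    """Share of missing values per required column (0.0 to 1.0)."""
    return df[cols].isna().mean()


# Toy frame with one missing separation and one missing route.
toy = pd.DataFrame({
    "min_separation": [2.5, np.nan, 1.0, 4.2],
    "route_of_targeted_receiver": ["GO", "SLANT", None, "FLAT"],
    "team_coverage_man_zone": ["MAN_COVERAGE"] * 4,
    "pass_result": ["C", "I", "C", "C"],
})

print(missing_report(toy))

# Same dropna call as in the cleaning step, applied to the toy frame.
toy_clean = toy.dropna(subset=REQUIRED_COLS)
print(f"Kept {len(toy_clean)} of {len(toy)} rows")
```

If one column accounts for most of the dropped rows, it may be worth checking whether the missingness is tied to particular play types before interpreting the plots.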


In [None]:
receivers = receivers_df.copy()

receivers_clean = receivers.dropna(
    subset=["min_separation", "route_of_targeted_receiver", "team_coverage_man_zone", "pass_result"]
)
receivers_clean.shape

In [None]:
route_sep = (
    receivers_clean.groupby("route_of_targeted_receiver")["min_separation"]
    .mean()
    .sort_values(ascending=False)
    .head(15)
)


### 📊 Exploratory Data Analysis (EDA): Receiver Separation Patterns
In this section, we visualize how the **receiver–defender separation** varies across three important contextual dimensions:
1. **Route Type** – Different passing routes create different amounts of space.
2. **Coverage Scheme** – Man vs Zone coverage affects defensive positioning.
3. **Pass Result** – Whether greater separation translates into completed passes.

Each plot highlights the role of movement, scheme, and execution in shaping passing outcomes.


In [None]:
plt.figure(figsize=(10,5))
sns.barplot(x=route_sep.values, y=route_sep.index, palette="viridis")
plt.title("Average Receiver Separation by Route Type", fontsize=14)
plt.xlabel("Average Minimum Separation (yards)")
plt.ylabel("Route Type")
plt.tight_layout()
plt.show()

In [None]:
plt.figure(figsize=(6,4))
sns.boxplot(
    data=receivers_clean,
    x="team_coverage_man_zone",
    y="min_separation",
    palette="coolwarm"
)
plt.title("Separation Distribution by Coverage Type", fontsize=13)
plt.xlabel("Coverage Type")
plt.ylabel("Min Separation (yards)")
plt.tight_layout()
plt.show()

In [None]:
plt.figure(figsize=(6,4))
sns.boxplot(
    data=receivers_clean,
    x="pass_result",
    y="min_separation",
    order=["C", "I"],
    palette="Set2"
)
plt.title("Receiver Separation vs Pass Result", fontsize=13)
plt.xlabel("Pass Result (C = Complete, I = Incomplete)")
plt.ylabel("Min Separation (yards)")
plt.tight_layout()
plt.show()

In [None]:
summary = (
    receivers_clean.groupby(["team_coverage_man_zone", "pass_result"])["min_separation"]
    .agg(["mean", "std", "count"])
    .reset_index()
    .sort_values("mean", ascending=False)
)

display(summary.head(10))

#### 🧩 Observations
- **Short routes** such as *Screen* and *Flat* generate the largest separation, consistent with quick-release plays.
- **Zone coverage** produces greater average spacing but with higher variability.
- **Completed passes** generally occur when receivers achieve **1–2 yards more separation** than on incomplete or intercepted passes.

These trends are consistent with the spatial intuition behind receiver performance and motivate the **Effective Separation Index (ESI)** metric defined in the next step.
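
The summaries above are computed over frames, so plays with more frames carry more weight. One way to check the completion gap without that weighting is to collapse each play to a single value first. This is a sketch on toy data, not the method used in this notebook.

```python
import pandas as pd

# Toy frame-level data: one completed play (3 frames), one incomplete (2 frames).
toy_frames = pd.DataFrame({
    "game_id": ["1"] * 5,
    "play_id": ["10", "10", "10", "20", "20"],
    "pass_result": ["C", "C", "C", "I", "I"],
    "min_separation": [3.0, 4.0, 5.0, 1.0, 2.0],
})

# Collapse frames to one mean separation per play so long plays do not dominate.
per_play = (
    toy_frames
    .groupby(["game_id", "play_id", "pass_result"], as_index=False)["min_separation"]
    .mean()
)

# Compare the play-level averages by pass result.
by_result = per_play.groupby("pass_result")["min_separation"].mean()
gap = by_result["C"] - by_result["I"]
print(by_result)
print(f"Complete minus incomplete: {gap:.2f} yards")
```

On the real data, comparing the frame-level and play-level gaps shows how sensitive the 1–2 yard observation is to the unit of analysis.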


### 🧮 Defining the Effective Separation Index (ESI)
While raw separation reflects the physical distance between receiver and defender,  
it doesn’t account for **play difficulty**: some routes are inherently easier to separate on,  
and some coverages are naturally tighter.

To make the metric context-aware, we normalize the separation using two factors:

| Factor | Description | Typical Range |
|:-------|:-------------|:---------------|
| **Route Difficulty (RD)** | Adjusts for how hard it is to create separation on the route. Deep routes (e.g., *Go*, *Post*) are penalized slightly more. | 0.8 – 1.3 |
| **Coverage Tightness (CT)** | Adjusts for how restrictive the defense is. Man coverage is treated as tighter than zone. | 1.0 – 1.2 |

The **Effective Separation Index (ESI)** is defined as:
\[
\text{ESI} = \frac{\text{min\_separation}}{\text{Route Difficulty} \times \text{Coverage Tightness}}
\]

Higher ESI → greater effectiveness at creating space *relative to context*.
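
A short worked example of the formula, applying the same 3.0 yards of raw separation in two contexts. The function name `effective_separation_index` is introduced here for illustration, and the factor values are picked from the ranges in the table above.

```python
def effective_separation_index(min_separation, route_difficulty, coverage_tightness):
    """ESI = min_separation / (route_difficulty * coverage_tightness)."""
    return min_separation / (route_difficulty * coverage_tightness)


# Same raw separation (3.0 yards) in two contexts.
# Factor values are illustrative, taken from the ranges in the table.
esi_go_man = effective_separation_index(
    3.0, route_difficulty=1.3, coverage_tightness=1.2
)
esi_flat_zone = effective_separation_index(
    3.0, route_difficulty=0.8, coverage_tightness=1.0
)
print(round(esi_go_man, 2), round(esi_flat_zone, 2))  # 1.92 3.75
```

Under this definition, a larger product of the two factors lowers the ESI obtained from the same raw separation.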


In [None]:
route_difficulty_map = {
    "SCREEN": 0.8, "FLAT": 0.8, "ANGLE": 1.0, "WHEEL": 1.0,
    "CROSS": 1.0, "OUT": 1.0, "HITCH": 0.9, "CORNER": 1.2,
    "POST": 1.2, "SLANT": 1.1, "IN": 1.1, "GO": 1.3
}

coverage_map = {
    "MAN_COVERAGE": 1.2,
    "ZONE_COVERAGE": 1.0
}

In [None]:

receivers_esi = receivers_df.copy()

receivers_esi["route_difficulty"] = receivers_esi["route_of_targeted_receiver"].map(route_difficulty_map).fillna(1.0)
receivers_esi["coverage_tightness"] = receivers_esi["team_coverage_man_zone"].map(coverage_map).fillna(1.0)


In [None]:
receivers_esi["ESI"] = receivers_esi["min_separation"] / (receivers_esi["route_difficulty"] * receivers_esi["coverage_tightness"])
print(receivers_esi[["route_of_targeted_receiver", "team_coverage_man_zone", "min_separation", "ESI"]].head())
esi_summary = (
    receivers_esi.groupby("route_of_targeted_receiver")["ESI"]
    .mean()
    .sort_values(ascending=False)
)
display(esi_summary.head(10))

Each receiver frame now has an **ESI value**, representing how well the player created separation  
given the route’s inherent difficulty and defensive coverage type.

This normalization allows us to compare performances across different routes and coverages fairly,  
providing a more meaningful metric of “getting open.”
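
Because ESI is stored per frame, player-level averages can be dominated by one or two long plays. The sketch below, on toy data, averages within each play first and then applies a minimum-play filter. `MIN_PLAYS` is a hypothetical threshold introduced for illustration, not a value used elsewhere in this notebook.

```python
import pandas as pd

# Toy frame-level ESI: player A has two plays, player B has one.
toy_esi = pd.DataFrame({
    "player_name": ["A", "A", "A", "A", "B", "B"],
    "game_id": ["1"] * 6,
    "play_id": ["10", "10", "20", "20", "30", "30"],
    "ESI": [2.0, 4.0, 1.0, 3.0, 5.0, 7.0],
})

MIN_PLAYS = 2  # hypothetical cut-off for a stable player average

# Step 1: one ESI value per (player, play).
play_level = (
    toy_esi
    .groupby(["player_name", "game_id", "play_id"], as_index=False)["ESI"]
    .mean()
)

# Step 2: average across plays and count how many plays back each average.
player_level = (
    play_level.groupby("player_name")["ESI"]
    .agg(mean_esi="mean", n_plays="count")
)

# Step 3: keep only players with enough targets.
qualified = player_level[player_level["n_plays"] >= MIN_PLAYS]
print(qualified)
```

In this toy example player B has the higher average but only one play, so only player A survives the filter.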


### 📊 Analyzing the Effective Separation Index (ESI)
Now that the **Effective Separation Index (ESI)** has been computed,  
we can analyze how this context-aware metric varies across:
1. **Coverage Type** – How does ESI differ between man and zone coverage?
2. **Player** – Which receivers consistently generate effective separation?
3. **Route Type** – Which routes are most efficient when normalized for difficulty?
4. **Pass Result** – How strongly does ESI correlate with successful completions?

These analyses demonstrate the interpretability and practical utility of ESI  
as a unified measure of receiver effectiveness.


In [None]:
plt.figure(figsize=(6,4))
sns.boxplot(data=receivers_esi, x="team_coverage_man_zone", y="ESI", palette="mako")
plt.title("Effective Separation Index by Coverage Type", fontsize=13)
plt.xlabel("Coverage Type")
plt.ylabel("ESI (Normalized Separation)")
plt.tight_layout()
plt.show()

In [None]:
player_esi = receivers_esi.groupby("player_name")["ESI"].mean().sort_values(ascending=False).head(10)
display(player_esi)


In [None]:
# Copy so that adding columns later does not trigger SettingWithCopyWarning.
esi_df = receivers_esi.dropna(subset=["ESI", "pass_result"]).copy()
top_players = (
    esi_df.groupby("player_name")["ESI"]
    .mean()
    .sort_values(ascending=False)
    .head(10)
)

In [None]:
plt.figure(figsize=(8,5))
sns.barplot(x=top_players.values, y=top_players.index, palette="crest")
plt.title("Top 10 Players by Effective Separation Index (ESI)", fontsize=14)
plt.xlabel("Average ESI (Normalized Separation)")
plt.ylabel("Player Name")
plt.tight_layout()
plt.show()

In [None]:
top_routes = (
    esi_df.groupby("route_of_targeted_receiver")["ESI"]
    .mean()
    .sort_values(ascending=False)
    .head(10)
)

plt.figure(figsize=(8,5))
sns.barplot(x=top_routes.values, y=top_routes.index, palette="viridis")
plt.title("Top 10 Routes by Effective Separation Index (ESI)", fontsize=14)
plt.xlabel("Average ESI")
plt.ylabel("Route Type")
plt.tight_layout()
plt.show()

In [None]:
# Binary completion flag: 1 for complete ("C"), 0 for every other result.
esi_df["is_complete"] = (esi_df["pass_result"] == "C").astype(int)
corr = esi_df[["ESI", "is_complete"]].corr().iloc[0,1]

plt.figure(figsize=(6,4))
sns.boxplot(data=esi_df, x="pass_result", y="ESI", order=["C","I"], palette="coolwarm")
plt.title(f"ESI Distribution by Pass Result (corr = {corr:.2f})", fontsize=13)
plt.xlabel("Pass Result (C = Complete, I = Incomplete)")
plt.ylabel("Effective Separation Index (ESI)")
plt.tight_layout()
plt.show()

In [None]:
summary = (
    esi_df.groupby("pass_result")["ESI"]
    .agg(["mean","std","count"])
    .reset_index()
)
display(summary)

#### 🧩 Key Observations
- **Short lateral routes** such as *Screen* and *Flat* retain high ESI values even after normalization,  
  consistent with their efficiency in creating quick space.
- **Deep routes** (*Post*, *Corner*, *Go*) show lower ESI, reflecting tighter coverage and greater difficulty.
- **Top receivers** in this two-week sample (e.g., Lawrence Cager, Damien Harris) demonstrate strong spatial efficiency across contexts.
- The **positive correlation (r ≈ 0.30)** between frame-level ESI and pass completion suggests that the metric captures  
  meaningful performance information: higher ESI is associated with a higher rate of completed passes.
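
For reference, the correlation between a continuous metric and a 0/1 outcome is an ordinary Pearson correlation (often called point-biserial). The sketch below computes it on toy play-level values, which are made up for illustration and are not results from this notebook.

```python
import numpy as np

# Toy play-level values (not from the dataset).
esi_per_play = np.array([1.0, 2.0, 3.0, 4.0])
is_complete = np.array([0, 0, 1, 1])

# Pearson correlation between ESI and the binary completion flag.
r = np.corrcoef(esi_per_play, is_complete)[0, 1]

# r squared is the share of variance in one variable linearly
# associated with the other.
print(f"r = {r:.3f}, r^2 = {r**2:.3f}")
```

By the same arithmetic, a correlation of 0.30 corresponds to an r² of about 0.09, so ESI alone accounts for roughly 9% of the variance in completion.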
