
# Offensive DNA of NFL Teams: K-Means Clustering

Every franchise puts its own spin on moving the ball. Some air it out, others grind on the ground. In this
notebook we use team-level summaries built from 2023 play-by-play data to quantify each team's style, then let
k-means clustering group similar offenses together.



## The k-means objective
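Given feature vectors \(x_i\) for \(n\) teams, k-means partitions the teams into \(k\) groups. Each team gets a
cluster assignment \(c_i \in \{1, \dots, k\}\). Each cluster \(j\) has a center \(\mu_j\), which is the mean of its
members. The algorithm chooses both to minimize the within-cluster sum of squared distances:
\[ \sum_{i=1}^n \lVert x_i - \mu_{c_i} \rVert^2 \]
Teams in the same cluster therefore sit close together in feature space. Here, that means they play a similar
brand of football.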
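
To make the objective concrete, here is a minimal NumPy sketch of Lloyd's algorithm, the standard alternating procedure for k-means. It assigns each point to its nearest center, moves each center to the mean of its points, and repeats until nothing changes. The six points are made-up (pass_rate, epa_per_play) pairs, not real teams. The function name `kmeans_lloyd` is hypothetical. This sketch starts from randomly chosen data points, whereas scikit-learn's `KMeans` defaults to k-means++ initialization. Either way, the result is only a local minimum of the objective.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm for the k-means objective (a teaching sketch)."""
    rng = np.random.default_rng(seed)
    # Initialize: pick k distinct data points at random as starting centers.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: c_i = nearest center under squared Euclidean distance.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: mu_j = mean of the points currently assigned to cluster j
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and the objective: sum_i ||x_i - mu_{c_i}||^2.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    objective = d2[np.arange(len(X)), labels].sum()
    return labels, centers, objective

# Six made-up (pass_rate, epa_per_play) pairs forming two obvious groups.
X = np.array([[0.50, -0.10], [0.52, -0.08], [0.51, -0.12],
              [0.65, 0.15], [0.66, 0.12], [0.64, 0.14]])
labels, centers, obj = kmeans_lloyd(X, k=2)
print(labels, obj)
```

On this toy input, each group of three lands in its own cluster. No iteration increases the objective, which is why the loop stops once the centers stop moving.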


In [None]:

import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# load team-level summary data
teams = pd.read_csv('../data/team_summary_2023.csv')
print(f"Loaded {len(teams)} teams")
teams.head()



## Clustering by play style
We'll cluster teams using two simple features:

* `pass_rate` – fraction of plays that are passes
* `epa_per_play` – average expected points added per play

Together they capture **tendency** (how often an offense chooses to throw) and **efficiency** (how well its
plays work on average).
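
The notebook reads these features from a precomputed CSV and does not show how that file was built. Here is a hedged sketch of deriving both features from play-level rows with a pandas groupby. The column names `posteam`, `pass` and `epa` are assumed to follow nflfastR-style play-by-play. The rows themselves are invented. A real pipeline would also have to decide which plays count, for example by dropping kneels, spikes, and special teams.

```python
import pandas as pd

# Hypothetical play-level rows. Column names mirror nflfastR-style
# play-by-play (posteam, pass, epa), but the values are made up.
plays = pd.DataFrame({
    "posteam": ["AAA", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "BBB"],
    "pass":    [1, 1, 0, 1, 0, 0, 1, 0],   # 1 = pass play, 0 = run (assumed coding)
    "epa":     [0.8, -0.4, 0.1, 0.3, 0.2, -0.1, -0.5, 0.0],
})

# One row per offense: share of plays that are passes, and mean EPA per play.
team_features = (
    plays.groupby("posteam")
         .agg(pass_rate=("pass", "mean"),
              epa_per_play=("epa", "mean"))
         .reset_index()
)
print(team_features)
```

Because `pass` is a 0/1 indicator, its mean is exactly the fraction of plays that are passes.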


In [None]:

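from sklearn.preprocessing import StandardScaler

# Standardize so both features weigh equally in the squared-distance objective
X = StandardScaler().fit_transform(teams[['pass_rate', 'epa_per_play']])
km = KMeans(n_clusters=3, n_init=10, random_state=0)  # 10 restarts, keep the best fit
teams['cluster'] = km.fit_predict(X)
teams.sort_values('cluster').head()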
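
The choice of three clusters is a judgment call. One common sanity check is the silhouette score. It compares each point's mean distance to its own cluster with its mean distance to the nearest other cluster; higher is better, and the maximum is 1. This sketch scores k = 2 through 6 on synthetic, standardized two-feature data with three planted groups of ten "teams". With only 32 real teams the scores may be much closer together, so treat them as a guide rather than a verdict.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (pass_rate, epa_per_play): three planted groups.
# These values are made up and are NOT the 2023 team data.
rng = np.random.default_rng(42)
group_centers = np.array([[0.52, -0.05], [0.60, 0.10], [0.66, -0.10]])
X = np.vstack([c + rng.normal(scale=[0.01, 0.02], size=(10, 2)) for c in group_centers])
X_scaled = StandardScaler().fit_transform(X)

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
best_k = max(scores, key=scores.get)
print("highest silhouette at k =", best_k)
```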



## Visualizing the clusters
Plotting pass rate against EPA per play, with points colored by cluster and labeled by team, makes the groupings easy to see.


In [None]:

plt.figure(figsize=(8,6))
for c, group in teams.groupby('cluster'):
    plt.scatter(group['pass_rate'], group['epa_per_play'], label=f'Cluster {c}')
for _, row in teams.iterrows():
    plt.text(row['pass_rate']+0.001, row['epa_per_play']+0.001, row['posteam'], fontsize=8)
plt.xlabel('Pass Rate')
plt.ylabel('EPA per Play')
plt.legend()
plt.title('Offensive Play Style Clusters (2023)')
plt.tight_layout()
plt.show()
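
K-means cluster numbers are arbitrary. "Cluster 0" carries no meaning and can change with the seed or the data. A small profile table helps interpret the plot. The sketch below summarizes each cluster's size and mean features, then renumbers the clusters in order of mean pass rate. It runs on a hypothetical six-team frame called `toy`. The same calls would apply to `teams` once its `cluster` column exists.

```python
import pandas as pd

# Hypothetical frame standing in for `teams` after clustering (made-up values).
toy = pd.DataFrame({
    "posteam": ["T1", "T2", "T3", "T4", "T5", "T6"],
    "pass_rate": [0.66, 0.64, 0.58, 0.57, 0.50, 0.52],
    "epa_per_play": [0.10, 0.12, 0.00, -0.02, -0.08, -0.05],
    "cluster": [2, 2, 0, 0, 1, 1],  # raw k-means labels: arbitrary integers
})

# Profile each cluster by team count and mean feature values, sorted by pass rate.
profile = (toy.groupby("cluster")
              .agg(n_teams=("posteam", "count"),
                   pass_rate=("pass_rate", "mean"),
                   epa_per_play=("epa_per_play", "mean"))
              .sort_values("pass_rate"))

# Renumber clusters 0..k-1 from most run-heavy to most pass-heavy.
order = {old: new for new, old in enumerate(profile.index)}
toy["style_rank"] = toy["cluster"].map(order)
print(profile)
print(toy[["posteam", "cluster", "style_rank"]])
```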



## Takeaways
K-means separates pass-heavy attacks from run-heavy units and everything in between. The features here are
simple, but the approach scales: add metrics such as pace, formation usage, or motion rate to build a richer
fingerprint of each team's offensive DNA. Standardize them first, because they come in different units.
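
As a sketch of that extension, a scikit-learn pipeline can standardize any number of features before clustering. The columns `pace` and `motion_rate` are hypothetical placeholders, and every value is random rather than NFL data. Standardizing matters more as features multiply. Without it, k-means' squared distances favor whichever metric has the largest numeric range. Here, that would be `pace`, measured in tens of seconds.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical extra metrics: these columns are NOT in the notebook's CSV.
feature_cols = ["pass_rate", "epa_per_play", "pace", "motion_rate"]

rng = np.random.default_rng(7)
toy = pd.DataFrame({
    "pass_rate": rng.uniform(0.50, 0.66, 32),
    "epa_per_play": rng.uniform(-0.20, 0.15, 32),
    "pace": rng.uniform(26, 32, 32),          # e.g. seconds per play (made-up scale)
    "motion_rate": rng.uniform(0.2, 0.6, 32),
})

# Scale every feature to mean 0 / std 1, then cluster; one object does both.
model = make_pipeline(StandardScaler(),
                      KMeans(n_clusters=3, n_init=10, random_state=0))
toy["cluster"] = model.fit_predict(toy[feature_cols])
print(toy["cluster"].value_counts().sort_index())
```

Keeping the scaler inside the pipeline means new rows are scaled with the same means and standard deviations before `model.predict` assigns them to a cluster.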
