# Analyzing running routes within each cluster

## Importing the libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

## Loading the data

We need the plays and player-play datasets, along with the cluster assignments produced by our clustering model.

In [4]:
plays = pd.read_csv('data/plays.csv')

In [2]:
player_play = pd.read_csv('data/player_play.csv')

In [3]:
movers_clusters = pd.read_csv('data/movers_clusters.csv')

## Some data preprocessing

We only need a few columns for this analysis, so let's clean the data a little.

In [6]:
players = player_play
players_merged = pd.merge(players, plays, on=['gameId', 'playId'])
# Keep only players from the team in possession of the ball
players_merged_filtered = players_merged[players_merged['teamAbbr'] == players_merged['possessionTeam']]
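
To see what the merge and the possession-team filter do, here is a minimal sketch on made-up rows. The IDs, teams, and the `toy_*` names are invented for illustration; only the column names come from the datasets above.

```python
import pandas as pd

# Toy stand-ins for player_play.csv and plays.csv (all values are invented).
toy_player_play = pd.DataFrame({
    "gameId": [1, 1, 1, 1],
    "playId": [10, 10, 20, 20],
    "nflId": [100, 200, 100, 200],
    "teamAbbr": ["BUF", "LA", "BUF", "LA"],
})
toy_plays = pd.DataFrame({
    "gameId": [1, 1],
    "playId": [10, 20],
    "possessionTeam": ["BUF", "LA"],
})

# Attach the play-level columns to every player row of that play.
toy_merged = pd.merge(toy_player_play, toy_plays, on=["gameId", "playId"])

# Keep only the players whose team has the ball on that play.
toy_offense = toy_merged[toy_merged["teamAbbr"] == toy_merged["possessionTeam"]]
print(toy_offense[["playId", "nflId", "teamAbbr"]])
```

Each play keeps only the row of the player on the possession team, which is the same effect the filter has on the real data.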

In [22]:
# List of columns to keep
columns_to_keep = [
    "gameId", "playId", "nflId", "teamAbbr", "routeRan"
]

# Keep only these columns from the dataset
players_filtered = players_merged_filtered[columns_to_keep]

In [23]:
players_filtered.head()

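|   | gameId | playId | nflId | teamAbbr | routeRan |
|---|---|---|---|---|---|
| 0 | 2022090800 | 56 | 35472 | BUF | NaN |
| 1 | 2022090800 | 56 | 42392 | BUF | NaN |
| 2 | 2022090800 | 56 | 42489 | BUF | IN |
| 3 | 2022090800 | 56 | 44875 | BUF | NaN |
| 4 | 2022090800 | 56 | 44985 | BUF | OUT |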


We dropped every column except the identifiers and `routeRan`, the route run by each player. Next, we keep only the players who made a pre-snap motion and attach their clusters.

In [24]:
players_with_clusters = pd.merge(players_filtered, movers_clusters, on=['gameId', 'playId', 'nflId'])

Since we assigned clusters only to players that were in motion at the moment of the snap, rows that have no cluster represent players that had no pre-snap movement, so we can drop them. We will also drop all the outliers detected during clustering (cluster value of -1).

In [25]:
players_with_clusters = players_with_clusters.dropna(subset=['cluster'])
players_with_clusters = players_with_clusters.loc[players_with_clusters['cluster'] != -1]
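
One detail worth keeping in mind: `pd.merge` performs an inner join by default, so rows without a match in the clusters table are already excluded by the merge itself, and `dropna` then acts mostly as a safeguard. The sketch below illustrates the difference between the inner and left joins on invented rows; the `toy_*` names and all values are assumptions made for the example.

```python
import pandas as pd

# Toy player rows and toy cluster assignments (all values are invented).
toy_players = pd.DataFrame({
    "gameId": [1, 1, 1],
    "playId": [10, 10, 20],
    "nflId": [100, 200, 300],
    "routeRan": ["IN", None, "OUT"],
})
toy_clusters = pd.DataFrame({
    "gameId": [1, 1],
    "playId": [10, 20],
    "nflId": [100, 300],
    "cluster": [2, -1],  # -1 marks an outlier
})

keys = ["gameId", "playId", "nflId"]

# Default (inner) join: the player without a cluster is dropped by the merge.
inner = pd.merge(toy_players, toy_clusters, on=keys)

# Left join: the same player is kept, with NaN in the cluster column.
left = pd.merge(toy_players, toy_clusters, on=keys, how="left")

# Same cleanup as above: drop rows without a cluster, then drop outliers.
cleaned = left.dropna(subset=["cluster"])
cleaned = cleaned.loc[cleaned["cluster"] != -1]
print(cleaned)
```

With either join, the cleaned result contains only players that have a real (non-outlier) cluster.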

Now, let's look at the routeRan column.

In [26]:
players_with_clusters.head()

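|   | gameId | playId | nflId | teamAbbr | routeRan | cluster |
|---|---|---|---|---|---|---|
| 0 | 2022090800 | 80 | 47857 | BUF | NaN | 1 |
| 2 | 2022090800 | 212 | 47879 | BUF | IN | 0 |
| 3 | 2022090800 | 236 | 52536 | BUF | CORNER | 3 |
| 4 | 2022090800 | 299 | 44881 | LA | NaN | 6 |
| 5 | 2022090800 | 343 | 44881 | LA | NaN | 6 |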


Let's convert all NaN values to 'Not Ran'.

In [27]:
players_with_clusters['routeRan'] = players_with_clusters['routeRan'].fillna('Not Ran')
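
As a quick sanity check of this step, the sketch below applies the same `fillna` to a few invented rows and counts the route labels per cluster. The `toy` frame and its values are assumptions made for the example, not data from the notebook.

```python
import pandas as pd

# Toy rows with missing routes (all values are invented).
toy = pd.DataFrame({
    "cluster": [0, 0, 1, 1],
    "routeRan": ["IN", None, None, "CORNER"],
})

# Replace missing routes with an explicit label.
toy["routeRan"] = toy["routeRan"].fillna("Not Ran")

# Count how often each route label appears within each cluster.
route_counts = toy.groupby("cluster")["routeRan"].value_counts()
print(route_counts)
```

After the fill, 'Not Ran' is counted like any other route label instead of being silently skipped as a missing value.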