## Play Extraction: Script README

Short script to:
- format raw game table data from Athena
- extract a specific play from the formatted game table
- determine which player has possession of the ball at each timestamp
- save the resulting DataFrame as a Python pickle for faster reads and writes. The pickle is saved to `play_dfs/playDF_<PLAYID>.pickle`, relative to the notebook's working directory.

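Ball possession is stored as a boolean (`has_ball`) based on how close a player's (x, y) coordinates are to the ball's (x, y) coordinates. A player is flagged when both the x offset and the y offset from the ball are smaller than `BALL_CARRIER_DISTANCE_THRESHOLD`, which is set in the script constants (default 0.5). Because the two axes are checked independently, this is a square window around the ball rather than a circular distance, and more than one player can be flagged at the same timestamp.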
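A minimal sketch of the rule on a toy frame (made-up values, same column names as the script), next to a Euclidean-distance variant for comparison. The Euclidean check is an alternative shown for contrast, not what the script does:

```python
import numpy as np
import pandas as pd

BALL_CARRIER_DISTANCE_THRESHOLD = 0.5

# Toy frame (hypothetical values): three players at one timestamp, ball at (10.0, 20.0)
frame = pd.DataFrame({
    "player_x": [10.2, 10.4, 12.0],
    "player_y": [20.3, 20.4, 20.0],
    "ball_x": [10.0, 10.0, 10.0],
    "ball_y": [20.0, 20.0, 20.0],
})

dx = (frame["player_x"] - frame["ball_x"]).abs()
dy = (frame["player_y"] - frame["ball_y"]).abs()

# Per-axis (square) test, as in the script: both offsets under the threshold
frame["has_ball_box"] = (dx < BALL_CARRIER_DISTANCE_THRESHOLD) & (dy < BALL_CARRIER_DISTANCE_THRESHOLD)

# Euclidean (circular) alternative: straight-line distance under the threshold
frame["has_ball_radius"] = np.hypot(dx, dy) < BALL_CARRIER_DISTANCE_THRESHOLD

print(frame[["has_ball_box", "has_ball_radius"]])
```

The second player is 0.4 away on each axis, so the square test flags them while the circular test does not (straight-line distance is about 0.57).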


To load a pickled DataFrame, use the following code snippet:

import pandas as pd

df = pd.read_pickle('data.pickle')
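A quick round trip on a small stand-in DataFrame (hypothetical values, written to a temporary directory instead of `data.pickle`) shows that pickling keeps column dtypes such as `datetime64` and `bool`:

```python
import os
import tempfile

import pandas as pd

# Small stand-in for a play DataFrame (hypothetical values)
play_df = pd.DataFrame({
    "playid": [1789, 1789],
    "player_time": pd.to_datetime(["2019-09-22 21:22:08.000", "2019-09-22 21:22:08.100"]),
    "player_x": [27.63, 27.61],
    "has_ball": [False, False],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pickle")
    play_df.to_pickle(path)    # write
    df = pd.read_pickle(path)  # read back

# Dtypes survive the round trip
print(df.dtypes)
```

This is the main advantage over CSV, where timestamps come back as plain strings unless they are parsed explicitly.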

In [1]:
import pandas as pd

# Show every column and row when displaying DataFrames
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [9]:
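### SCRIPT CONSTANTS ###

# Max x offset and max y offset between a player and the ball for the player to be flagged as carrying it
BALL_CARRIER_DISTANCE_THRESHOLD = 0.5

PLAYID = 1789        # play to extract
GAMEID = 2019092209  # game whose tracking data is loaded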

In [10]:
# Load the full tracking table for GAMEID
df = pd.read_pickle(f'all_tracking_data-{GAMEID}.pickle')

In [11]:
# Cast ID, coordinate and timestamp columns to proper types
df['gameid'] = df['gameid'].astype('int64')
df['playid'] = df['playid'].astype('int64')
for col in ['player_x', 'player_y', 'ball_x', 'ball_y']:
    df[col] = df[col].astype('float')
df['player_time'] = pd.to_datetime(df['player_time'])

# Flag a player as the ball carrier when both |dx| and |dy| to the ball are under the threshold
dx = (df['player_x'] - df['ball_x']).abs()
dy = (df['player_y'] - df['ball_y']).abs()
df['has_ball'] = (dx < BALL_CARRIER_DISTANCE_THRESHOLD) & (dy < BALL_CARRIER_DISTANCE_THRESHOLD)

# Integer grid cell for each player position
df['rounded_x'] = df['player_x'].round().astype(int)
df['rounded_y'] = df['player_y'].round().astype(int)
df['rounded_coord'] = list(zip(df['rounded_x'], df['rounded_y']))

df.dtypes

year                                       object
gameid                                      int64
starttime                                  object
endtime                                    object
playid                                      int64
play_type                                  object
nflid                                      object
esbid                                      object
team_abbr                                  object
firstname                                  object
lastname                                   object
positiongroup                              object
position                                   object
height                                    float64
weight                                    float64
player_time                        datetime64[ns]
player_x                                  float64
player_y                                  float64
player_z                                   object
player_directional_acceleration            object
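The dtypes above show some columns (for example `player_z` and `player_directional_acceleration`) still stored as `object`. A minimal sketch of coercing such columns with `pd.to_numeric(errors='coerce')`, assuming unparseable values should become `NaN` rather than raise as `.astype('float')` would; the helper `coerce_numeric`, the column list and the sample values are illustrative, not part of the script:

```python
import pandas as pd

# Columns to convert; this list is an assumption, adjust to the actual table
NUMERIC_COLUMNS = ["player_x", "player_y", "player_z", "ball_x", "ball_y"]


def coerce_numeric(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Return a copy with `columns` converted to floats; unparseable strings become NaN."""
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Hypothetical raw rows where every value arrived as a string
raw = pd.DataFrame({
    "player_x": ["27.63", "27.53"],
    "player_y": ["28.92", "29.78"],
    "player_z": ["0", ""],
    "ball_x": ["33.94", "33.94"],
    "ball_y": ["29.6", "29.6"],
})
clean = coerce_numeric(raw, NUMERIC_COLUMNS)
print(clean.dtypes)
```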


In [12]:
# Keep only the rows for the selected play, in time order
play_df = df[df['playid'] == PLAYID].sort_values(by='player_time')
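The cell above filters on `playid` alone, which is fine as long as the loaded pickle holds a single game (its file name is keyed by `GAMEID`). If the table ever covers several games, a sketch that filters on both IDs, assuming play IDs are only unique within a game, could look like this; `extract_play` and the sample rows are hypothetical:

```python
import pandas as pd


def extract_play(df: pd.DataFrame, game_id: int, play_id: int) -> pd.DataFrame:
    """Rows for one play, ordered by time, then by player for a stable order within a frame."""
    mask = (df["gameid"] == game_id) & (df["playid"] == play_id)
    return df.loc[mask].sort_values(["player_time", "nflid"]).reset_index(drop=True)


# Hypothetical rows: the same play ID appears in two different games
tracking = pd.DataFrame({
    "gameid": [2019092209, 2019092209, 2019092300, 2019092209],
    "playid": [1789, 1789, 1789, 1790],
    "nflid": ["2562736", "2555295", "1111111", "2555295"],
    "player_time": pd.to_datetime([
        "2019-09-22 21:22:08.0",
        "2019-09-22 21:22:08.0",
        "2019-09-23 20:00:00.0",
        "2019-09-22 21:23:00.0",
    ]),
})
play = extract_play(tracking, 2019092209, 1789)
print(play)
```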

In [14]:
# Rows where a player is flagged as having the ball
play_df[play_df['has_ball']]
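`has_ball` can be true for several players, or for none, in the same frame. A sketch of picking at most one carrier per timestamp is below: the player nearest to the ball by Euclidean distance, kept only if within the threshold. This nearest-player rule is an assumption layered on top of the script, and `ball_carrier_per_frame` and the `"A"`/`"B"` IDs are hypothetical:

```python
import numpy as np
import pandas as pd

THRESHOLD = 0.5  # mirrors BALL_CARRIER_DISTANCE_THRESHOLD


def ball_carrier_per_frame(play_df: pd.DataFrame, threshold: float = THRESHOLD) -> pd.DataFrame:
    """One row per timestamp: nflid of the closest player, or NaN if nobody is within `threshold`."""
    dist = np.hypot(play_df["player_x"] - play_df["ball_x"],
                    play_df["player_y"] - play_df["ball_y"])
    with_dist = play_df.assign(ball_dist=dist)
    # Row label of the closest player in each frame (assumes a unique index)
    closest = with_dist.loc[with_dist.groupby("player_time")["ball_dist"].idxmin()]
    closest = closest.set_index("player_time")
    carrier = closest["nflid"].where(closest["ball_dist"] < threshold)
    return pd.DataFrame({"carrier_nflid": carrier, "ball_dist": closest["ball_dist"]})


# Hypothetical play: two players over two frames
frames = pd.DataFrame({
    "player_time": pd.to_datetime(["2019-09-22 21:22:08.0"] * 2 + ["2019-09-22 21:22:08.1"] * 2),
    "nflid": ["A", "B", "A", "B"],
    "player_x": [10.1, 12.0, 15.0, 16.0],
    "player_y": [20.0, 20.0, 20.0, 20.0],
    "ball_x": [10.0, 10.0, 10.0, 10.0],
    "ball_y": [20.0, 20.0, 20.0, 20.0],
})
result = ball_carrier_per_frame(frames)
print(result)
```

In the second frame the nearest player is 5 units away, so no carrier is assigned.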

Unnamed: 0,year,gameid,starttime,endtime,playid,play_type,nflid,esbid,team_abbr,firstname,lastname,positiongroup,position,height,weight,player_time,player_x,player_y,player_z,player_directional_acceleration,player_direction_in_degrees,player_orientation_in_degrees,player_speed,player_delta_acceleration,home_away,offense_or_defense,ball_time,ball_x,ball_y,has_ball,rounded_x,rounded_y,rounded_coord
633847,2019,2019092209,2019-09-22T21:22:07.958,2019-09-22T21:22:36.958,1789,pass,2555295,SHE495754,NYG,Sterling,Shepard,WR,WR,70.0,201.0,2019-09-22 21:22:08.000,27.63,28.92,0,0.78,122.68,39.59,0.31,0.39,0,1,2019-09-22T21:22:08.000,33.94,29.6,False,28,29,"(28, 29)"
271416,2019,2019092209,2019-09-22T21:22:07.958,2019-09-22T21:22:36.958,1789,pass,2562736,SLA500077,NYG,Darius,Slayton,WR,WR,73.0,190.0,2019-09-22 21:22:08.000,27.53,29.78,0,0.62,67.41,107.75,0.53,0.32,0,1,2019-09-22T21:22:08.000,33.94,29.6,False,28,30,"(28, 30)"
230031,2019,2019092209,2019-09-22T21:22:07.958,2019-09-22T21:22:36.958,1789,pass,2555331,COR258226,NYG,Cody,Core,WR,WR,75.0,205.0,2019-09-22 21:22:08.000,27.55,28.59,0,0.88,7.46,285.9,0.31,0.09,0,1,2019-09-22T21:22:08.000,33.94,29.6,False,28,29,"(28, 29)"
632744,2019,2019092209,2019-09-22T21:22:07.958,2019-09-22T21:22:36.958,1789,pass,2555295,SHE495754,NYG,Sterling,Shepard,WR,WR,70.0,201.0,2019-09-22 21:22:08.100,27.61,28.9,0,1.53,234.71,62.96,0.31,1.47,0,1,2019-09-22T21:22:08.100,33.95,29.61,False,28,29,"(28, 29)"
271113,2019,2019092209,2019-09-22T21:22:07.958,2019-09-22T21:22:36.958,1789,pass,2562736,SLA500077,NYG,Darius,Slayton,WR,WR,73.0,190.0,2019-09-22 21:22:08.100,27.58,29.81,0,0.34,65.98,115.39,0.48,-0.03,0,1,2019-09-22T21:22:08.100,33.95,29.61,False,28,30,"(28, 30)"
229816,2019,2019092209,2019-09-22T21:22:07.958,2019-09-22T21:22:36.958,1789,pass,2555331,COR258226,NYG,Cody,Core,WR,WR,75.0,205.0,2019-09-22 21:22:08.100,27.55,28.62,0,0.7,351.96,270.62,0.3,0.04,0,1,2019-09-22T21:22:08.100,33.95,29.61,False,28,29,"(28, 29)"
230028,2019,2019092209,2019-09-22T21:22:07.958,2019-09-22T21:22:36.958,1789,pass,2555331,COR258226,NYG,Cody,Core,WR,WR,75.0,205.0,2019-09-22 21:22:08.200,27.55,28.64,0,0.56,343.42,270.62,0.25,-0.17,0,1,2019-09-22T21:22:08.200,33.94,29.61,False,28,29,"(28, 29)"
271170,2019,2019092209,2019-09-22T21:22:07.958,2019-09-22T21:22:36.958,1789,pass,2562736,SLA500077,NYG,Darius,Slayton,WR,WR,73.0,190.0,2019-09-22 21:22:08.200,27.61,29.83,0,0.36,62.18,115.39,0.39,-0.34,0,1,2019-09-22T21:22:08.200,33.94,29.61,False,28,30,"(28, 30)"
633755,2019,2019092209,2019-09-22T21:22:07.958,2019-09-22T21:22:36.958,1789,pass,2555295,SHE495754,NYG,Sterling,Shepard,WR,WR,70.0,201.0,2019-09-22 21:22:08.200,27.6,28.89,0,0.85,251.11,90.25,0.18,0.77,0,1,2019-09-22T21:22:08.200,33.94,29.61,False,28,29,"(28, 29)"
633836,2019,2019092209,2019-09-22T21:22:07.958,2019-09-22T21:22:36.958,1789,pass,2555295,SHE495754,NYG,Sterling,Shepard,WR,WR,70.0,201.0,2019-09-22 21:22:08.300,27.57,28.89,0,0.86,268.02,99.33,0.25,0.77,0,1,2019-09-22T21:22:08.300,33.96,29.62,False,28,29,"(28, 29)"


In [8]:
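import os

# Save the play to play_dfs/playDF_<PLAYID>.pickle, creating the folder if it does not exist
save_name = f'playDF_{PLAYID}'
os.makedirs('play_dfs', exist_ok=True)
play_df.to_pickle(os.path.join('play_dfs', save_name + '.pickle'))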
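To export every play in the game instead of only `PLAYID`, a sketch that reuses the script's `playDF_<PLAYID>.pickle` naming; the helper `save_all_plays` is hypothetical, and the demo writes to a temporary directory:

```python
import os
import tempfile

import pandas as pd


def save_all_plays(df: pd.DataFrame, out_dir: str = "play_dfs") -> list:
    """Write one time-sorted pickle per play and return the file paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for play_id, play in df.groupby("playid"):
        path = os.path.join(out_dir, f"playDF_{play_id}.pickle")
        play.sort_values("player_time").to_pickle(path)
        paths.append(path)
    return paths


# Hypothetical game with two plays
tracking = pd.DataFrame({
    "playid": [1789, 1789, 1790],
    "player_time": pd.to_datetime([
        "2019-09-22 21:22:08.1", "2019-09-22 21:22:08.0", "2019-09-22 21:23:00.0",
    ]),
    "player_x": [27.61, 27.63, 30.0],
})

with tempfile.TemporaryDirectory() as tmp:
    paths = sorted(save_all_plays(tracking, tmp))
    names = [os.path.basename(p) for p in paths]
    counts = [len(pd.read_pickle(p)) for p in paths]

print(names, counts)
```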