# Preparing for Individual Analysis


I have a very large, imbalanced dataset. It covers football players: 37 of the 2,442 players were injured, and each player was involved in multiple plays.
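To put the imbalance in numbers, the short sketch below computes the injured share, the ratio of non-injured to injured players, and "balanced" class weights of the form n_samples / (n_classes × n_in_class), the heuristic scikit-learn uses for `class_weight="balanced"`. The counts come from the sentence above; treating them as player-level labels is an assumption, since a model may end up working at the play level instead.

```python
# Player-level counts quoted in the text above.
n_total = 2442
n_injured = 37
n_not_injured = n_total - n_injured  # 2405

# Share of players who were injured (about 1.5%).
injured_share = n_injured / n_total

# Non-injured players per injured player.
imbalance_ratio = n_not_injured / n_injured

# "Balanced" class weights: n_samples / (n_classes * n_in_class).
n_classes = 2
weight_injured = n_total / (n_classes * n_injured)
weight_not_injured = n_total / (n_classes * n_not_injured)

print(f"injured share: {injured_share:.2%}")
print(f"imbalance ratio: {imbalance_ratio:.0f}:1")
print(f"class weights: injured={weight_injured:.2f}, not injured={weight_not_injured:.3f}")
```

With roughly 65 non-injured players per injured one, a model that always predicts "no injury" would already be about 98.5% accurate, which is why weighting or resampling is worth considering.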

To complicate things further, the recorded tracking data does not start and stop at the beginning and end of each play, so each play's start and end times need to be determined individually.
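One possible way to estimate each play's active window, sketched below under assumptions, is to find the first and last sustained stretch in which the player moves faster than a threshold. The helper name `find_active_window`, the threshold of 1.0 (in whatever units the `Speed` column uses) and the requirement of 3 consecutive samples are hypothetical values to tune, not something taken from the data. Missing speeds (the first `Speed` value of each play is null) are treated as "not moving".

```python
def find_active_window(times, speeds, threshold=1.0, min_run=3):
    """Estimate (start_time, end_time) of sustained motion, or None if there is none.

    times, speeds: equal-length sequences for one play, ordered by time.
    threshold: hypothetical speed above which the player counts as moving.
    min_run: number of consecutive moving samples needed to count as sustained.
    """
    # Flag each sample as moving; None (missing speed) counts as not moving.
    moving = [s is not None and s > threshold for s in speeds]

    # Start: first index that begins a run of min_run moving samples.
    start = None
    for i in range(len(moving) - min_run + 1):
        if all(moving[i:i + min_run]):
            start = i
            break
    if start is None:
        return None

    # End: last index that closes a run of min_run moving samples.
    end = None
    for i in range(len(moving) - 1, min_run - 2, -1):
        if all(moving[i - min_run + 1:i + 1]):
            end = i
            break

    return times[start], times[end]


times = [round(0.1 * i, 1) for i in range(10)]
speeds = [None, 0.2, 1.5, 2.0, 2.5, 3.0, 0.5, 2.0, 0.2, 0.1]
print(find_active_window(times, speeds))  # (0.2, 0.5)
```

Requiring a run of samples rather than a single crossing keeps one noisy reading (like the isolated 2.0 at 0.7 s above) from stretching the window.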


Two things I think I should investigate: 

1. Injured players: compare the games/plays in which they were injured with the games/plays in which they were not. I'm going to refer to this as "Individual Performance Analysis".

2. Injured players in the plays where they were injured, compared with other players in plays that did not result in injury. To do this, I would like to stratify by player position, so that the positions that appear most often in injury plays are equally represented among the non-injury plays. This will be referred to as "Cross-Player Analysis".
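The position-stratified sampling in item 2 could look roughly like the sketch below: for each position, draw as many non-injury plays as there are injury plays at that position (or a fixed multiple of that). The function name, the `(play_key, position)` tuple layout and the `per_case` multiplier are illustrative assumptions; the fixed seed only makes the draw reproducible.

```python
import random
from collections import Counter, defaultdict


def sample_controls_by_position(injury_plays, non_injury_plays, per_case=1, seed=0):
    """Draw non-injury plays so each position matches its count among injury plays.

    injury_plays, non_injury_plays: lists of (play_key, position) tuples.
    per_case: number of control plays to draw per injury play at a position.
    If a position has too few candidates, every available candidate is used.
    """
    rng = random.Random(seed)

    # How many injury plays occurred at each position.
    needed = Counter(position for _, position in injury_plays)

    # Group candidate control plays by position.
    pool = defaultdict(list)
    for play_key, position in non_injury_plays:
        pool[position].append(play_key)

    controls = []
    for position, count in needed.items():
        candidates = pool[position]
        k = min(len(candidates), count * per_case)
        controls.extend((key, position) for key in rng.sample(candidates, k))
    return controls
```

Drawing more than one control per injury play (`per_case > 1`) keeps the position mix while giving the model more negative examples to learn from.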

I will be using machine learning to analyze and predict the conditions indicative of injury. 


This notebook looks at the first of these: Individual Performance Analysis.

In [1]:
import polars as pl
import os

In [2]:
path = "F:/Data/Clean_Data/"
file = "All_Tracking.parquet"
All_Tracking = pl.read_parquet(os.path.join(path, file))
All_Tracking.head()

PlayKey,time,x,y,dir,o,Angle_Diff,Displacement,Speed,omega_dir,omega_o,omega_diff,Position,p_magnitude,L_dir,L_diff,J_magnitude,torque,torque_internal,InjuryType,InjuryKey,PlayerActivity,ImpactType,OpponentKey
str,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,cat,f32,f32,f32,f32,f32,f32,str,str,cat,cat,str
"""23259-144-2342""",0.0,35.419998,30.889999,152.799988,122.390015,30.41,,,,,,"""OLB""",,,,,,,"""No Injury""","""32410-144-2342""","""Blocked""","""Helmet-to-body""",
"""23259-144-2342""",0.1,35.41,30.889999,173.179993,123.769989,49.41,0.009998,0.099983,3.556983,0.24085,3.316133,"""OLB""",10.888172,7.065558,4.610995,,,,"""No Injury""","""32410-144-2342""","""Blocked""","""Helmet-to-body""",
"""23259-144-2342""",0.2,35.400002,30.879999,172.160004,125.140015,47.02,0.014141,0.141411,-0.178022,0.239115,0.417137,"""OLB""",15.39967,-0.353622,0.580018,10.890249,-74.191803,-40.309769,"""No Injury""","""32410-144-2342""","""Blocked""","""Helmet-to-body""",
"""23259-144-2342""",0.3,35.400002,30.85,178.660004,126.179993,52.48,0.029999,0.299988,1.134465,0.18151,0.952954,"""OLB""",32.668671,2.25349,1.325058,24.348551,26.071119,7.450394,"""No Injury""","""32410-144-2342""","""Blocked""","""Helmet-to-body""",
"""23259-144-2342""",0.4,35.419998,30.84,157.75,126.920013,30.83,0.022358,0.223578,-3.649485,0.129158,3.778644,"""OLB""",24.347622,-7.249302,5.254103,30.797871,-95.027924,39.290459,"""No Injury""","""32410-144-2342""","""Blocked""","""Helmet-to-body""",


PlayKey combines the player ID (GSISID) with the game/play information, so we need to split it into separate columns.

In [3]:
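# PlayKey has the form "PlayerID-Game-Play", e.g. "23259-144-2342".
All_Tracking_Split = All_Tracking.with_columns([
    # First field: the player ID (GSISID)
    pl.col("PlayKey").str.split('-').list.first().alias('PlayerID')
    # Remaining fields, rejoined with '-': the game/play identifier
    , pl.col("PlayKey").str.split('-').list.slice(1).list.join('-').alias('GamePlay')
])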

All_Tracking_Split.head()

PlayKey,time,x,y,dir,o,Angle_Diff,Displacement,Speed,omega_dir,omega_o,omega_diff,Position,p_magnitude,L_dir,L_diff,J_magnitude,torque,torque_internal,InjuryType,InjuryKey,PlayerActivity,ImpactType,OpponentKey,PlayerID,GamePlay
str,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,cat,f32,f32,f32,f32,f32,f32,str,str,cat,cat,str,str,str
"""23259-144-2342""",0.0,35.419998,30.889999,152.799988,122.390015,30.41,,,,,,"""OLB""",,,,,,,"""No Injury""","""32410-144-2342""","""Blocked""","""Helmet-to-body""",,"""23259""","""144-2342"""
"""23259-144-2342""",0.1,35.41,30.889999,173.179993,123.769989,49.41,0.009998,0.099983,3.556983,0.24085,3.316133,"""OLB""",10.888172,7.065558,4.610995,,,,"""No Injury""","""32410-144-2342""","""Blocked""","""Helmet-to-body""",,"""23259""","""144-2342"""
"""23259-144-2342""",0.2,35.400002,30.879999,172.160004,125.140015,47.02,0.014141,0.141411,-0.178022,0.239115,0.417137,"""OLB""",15.39967,-0.353622,0.580018,10.890249,-74.191803,-40.309769,"""No Injury""","""32410-144-2342""","""Blocked""","""Helmet-to-body""",,"""23259""","""144-2342"""
"""23259-144-2342""",0.3,35.400002,30.85,178.660004,126.179993,52.48,0.029999,0.299988,1.134465,0.18151,0.952954,"""OLB""",32.668671,2.25349,1.325058,24.348551,26.071119,7.450394,"""No Injury""","""32410-144-2342""","""Blocked""","""Helmet-to-body""",,"""23259""","""144-2342"""
"""23259-144-2342""",0.4,35.419998,30.84,157.75,126.920013,30.83,0.022358,0.223578,-3.649485,0.129158,3.778644,"""OLB""",24.347622,-7.249302,5.254103,30.797871,-95.027924,39.290459,"""No Injury""","""32410-144-2342""","""Blocked""","""Helmet-to-body""",,"""23259""","""144-2342"""
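For a single key, the split above amounts to cutting at the first hyphen. The plain-Python sketch below (the helper name `split_play_key` is hypothetical) does the same with `str.partition` and also rejects keys that do not have three hyphen-separated integers. That format is an assumption based on the keys shown here, but checking it can catch a malformed key before it silently produces a wrong GamePlay.

```python
import re

# Assumed key format, based on the keys shown above: three hyphen-separated integers.
PLAY_KEY_PATTERN = re.compile(r"\d+-\d+-\d+")


def split_play_key(play_key):
    """Split 'PlayerID-Game-Play' into (PlayerID, 'Game-Play')."""
    if not PLAY_KEY_PATTERN.fullmatch(play_key):
        raise ValueError(f"unexpected PlayKey format: {play_key!r}")
    player_id, _, game_play = play_key.partition("-")
    return player_id, game_play


print(split_play_key("23259-144-2342"))  # ('23259', '144-2342')
```

The same format check can be run over the whole column in polars with `pl.col("PlayKey").str.contains(r"^\d+-\d+-\d+$")` before splitting.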


In [4]:
len(All_Tracking_Split)

61972

In [7]:
All_Tracking_Split.n_unique("PlayerID")

132

I need to find the PlayerIDs of all injured players. This should be a list of 37 unique IDs.

In [15]:
injured_players = All_Tracking_Split.filter(pl.col("InjuryType") == 'Concussion').select(['PlayerID']).unique()
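Since the expected count is known, it is worth asserting it rather than trusting the filter. The sketch below mirrors the filter on a list of row dictionaries, a stand-in for the DataFrame; `injured_player_ids` and `check_injured_count` are hypothetical helpers and the rows are toy data. In the notebook itself the number of unique IDs is `injured_players.height`.

```python
EXPECTED_INJURED = 37  # number of injured players quoted in the introduction


def injured_player_ids(rows):
    """Return the unique PlayerIDs that have at least one 'Concussion' row."""
    return {row["PlayerID"] for row in rows if row["InjuryType"] == "Concussion"}


def check_injured_count(player_ids, expected=EXPECTED_INJURED):
    """Raise if the number of unique injured players differs from the expected count."""
    if len(player_ids) != expected:
        raise ValueError(f"expected {expected} injured players, found {len(player_ids)}")


# Toy rows: one uninjured player and one injured player with two rows.
rows = [
    {"PlayerID": "p1", "InjuryType": "No Injury"},
    {"PlayerID": "p2", "InjuryType": "Concussion"},
    {"PlayerID": "p2", "InjuryType": "Concussion"},
]
ids = injured_player_ids(rows)
check_injured_count(ids, expected=1)  # passes for this toy sample
```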

Now that I have the IDs of the injured players, I want to find all of the games and plays they've been in.

In [17]:
injured_player_plays = All_Tracking_Split.join(injured_players, on='PlayerID', how='inner')

I want to know how many plays each of these players has been in, so I group by PlayerID and count the unique GamePlay values.

In [25]:
unique_gameplays = injured_player_plays.group_by('PlayerID').agg(
    pl.col('GamePlay').n_unique().alias('UniqueGamePlays')
)

unique_gameplays

PlayerID,UniqueGamePlays
str,u32
"""33838""",1
"""32214""",1
"""31950""",2
"""30171""",1
"""32783""",1
…,…
"""31313""",1
"""32410""",1
"""27654""",1
"""33813""",1
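For Individual Performance Analysis, only injured players who also appear in at least one other play allow an injured-vs-not-injured comparison within the same player, and most of the visible rows above show a single unique play. The sketch below reproduces the join plus `group_by`/`n_unique` count on plain row dictionaries and then lists the players who meet a minimum play count; the helper names, the toy rows and `min_plays=2` are assumptions.

```python
from collections import defaultdict


def unique_plays_per_player(rows, player_ids):
    """Count distinct GamePlay values for each player in player_ids."""
    plays = defaultdict(set)
    for row in rows:
        if row["PlayerID"] in player_ids:  # same effect as the inner join on PlayerID
            plays[row["PlayerID"]].add(row["GamePlay"])
    return {pid: len(game_plays) for pid, game_plays in plays.items()}


def players_with_comparison_plays(counts, min_plays=2):
    """Players with enough distinct plays for a within-player comparison."""
    return sorted(pid for pid, n in counts.items() if n >= min_plays)


# Toy rows: p1 has two distinct plays, p2 has one, p3 is not in the injured set.
rows = [
    {"PlayerID": "p1", "GamePlay": "1-10"},
    {"PlayerID": "p1", "GamePlay": "1-10"},
    {"PlayerID": "p1", "GamePlay": "2-20"},
    {"PlayerID": "p2", "GamePlay": "3-30"},
    {"PlayerID": "p3", "GamePlay": "4-40"},
]
counts = unique_plays_per_player(rows, {"p1", "p2"})
print(counts)  # {'p1': 2, 'p2': 1}
print(players_with_comparison_plays(counts))  # ['p1']
```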


In [29]:
path = "F:/Data/Clean_Data/"
file = "TrackingConcussions.parquet"
df = pl.read_parquet(os.path.join(path, file))
df.head()

PlayKey,time,x,y,o,dir,GSISID,PlayerActivity,ImpactType,OpponentKey,InjuryKey,Angle_Diff,Displacement,Speed,vx,vy,omega_dir,omega_o,omega_diff,Position,Height_m,Weight_kg,Chest_rad_m,px,py,moment,moment_upper,p_magnitude,L_dir,L_diff,Jx,Jy,J_magnitude,torque,torque_internal
str,f32,f32,f32,f32,f32,i32,cat,cat,str,str,f32,f32,f32,f32,f32,f32,f32,f32,cat,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32
"""31023-29-538""",0.0,89.660004,46.279999,97.72998,165.809998,31023,"""Tackling""","""Helmet-to-body""","""31941-29-538""","""31023-29-538""",68.080002,,,,,,,,"""WR""",1.88,90.699997,0.191,,,1.654413,1.158089,,,,,,,,
"""31023-29-538""",0.1,89.650002,46.299999,96.940002,172.790009,31023,"""Tackling""","""Helmet-to-body""","""31941-29-538""","""31023-29-538""",75.849998,0.022362,0.22362,-0.100021,0.200005,1.218243,-0.137876,1.356119,"""WR""",1.88,90.699997,0.191,-9.071938,18.140415,1.654413,1.158089,20.282375,2.015477,1.570507,,,,,
"""31023-29-538""",0.2,89.639999,46.290001,96.100006,179.269989,31023,"""Tackling""","""Helmet-to-body""","""31941-29-538""","""31023-29-538""",83.169998,0.014142,0.141425,-0.100021,-0.099983,1.13097,-0.146608,1.277578,"""WR""",1.88,90.699997,0.191,-9.071938,-9.068478,1.654413,1.158089,12.82721,1.871091,1.479549,0.0,-27.208893,27.208893,-1.443858,-0.909575
"""31023-29-538""",0.3,89.629997,46.27,95.220001,175.76001,31023,"""Tackling""","""Helmet-to-body""","""31941-29-538""","""31023-29-538""",80.540001,0.022362,0.22362,-0.100021,-0.200005,-0.612605,-0.15359,0.459014,"""WR""",1.88,90.699997,0.191,-9.071938,-18.140415,1.654413,1.158089,20.282375,-1.013501,0.531579,0.0,-9.071938,9.071938,-28.845924,-9.479699
"""31023-29-538""",0.4,89.610001,46.27,93.910004,172.970001,31023,"""Tackling""","""Helmet-to-body""","""31941-29-538""","""31023-29-538""",79.059998,0.019997,0.199966,-0.199966,0.0,-0.486951,-0.228637,0.258313,"""WR""",1.88,90.699997,0.191,-18.136955,0.0,1.654413,1.158089,18.136955,-0.805618,0.29915,-9.065018,18.140415,20.27928,2.078832,-2.324294


In [30]:
len(df)

39777

In [31]:
len(All_Tracking)

61972
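The two files differ in more than row count; their columns differ too. The sketch below compares the column names printed in the two `head()` outputs above, copied by hand so the example is self-contained. In the notebook, `set(All_Tracking.columns) - set(df.columns)` gives the same comparison directly.

```python
# Column names copied from the head() outputs of All_Tracking.parquet
# and TrackingConcussions.parquet above.
all_tracking_cols = (
    "PlayKey,time,x,y,dir,o,Angle_Diff,Displacement,Speed,omega_dir,omega_o,"
    "omega_diff,Position,p_magnitude,L_dir,L_diff,J_magnitude,torque,"
    "torque_internal,InjuryType,InjuryKey,PlayerActivity,ImpactType,OpponentKey"
).split(",")
concussion_cols = (
    "PlayKey,time,x,y,o,dir,GSISID,PlayerActivity,ImpactType,OpponentKey,"
    "InjuryKey,Angle_Diff,Displacement,Speed,vx,vy,omega_dir,omega_o,omega_diff,"
    "Position,Height_m,Weight_kg,Chest_rad_m,px,py,moment,moment_upper,"
    "p_magnitude,L_dir,L_diff,Jx,Jy,J_magnitude,torque,torque_internal"
).split(",")

# Columns present in one file but not the other.
only_in_all_tracking = sorted(set(all_tracking_cols) - set(concussion_cols))
only_in_concussions = sorted(set(concussion_cols) - set(all_tracking_cols))

print(only_in_all_tracking)  # ['InjuryType']
print(only_in_concussions)
```

Note also that TrackingConcussions stores GSISID as an integer (i32), while the PlayerID produced by the split in In [3] is a string, so the types would need to match before joining the two on player ID.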

In [36]:
path = "F:/Data/Processing_data/concussion_output"
file = "NGS-2016-post.parquet"
df = pl.read_parquet(os.path.join(path, file))
df.head()

PlayKey,x,y,o,dir,GSISID,time,Angle_Diff,Position,Height_m,Weight_kg,Chest_rad_m,Displacement,Speed,vx,vy,omega_dir,omega_o,omega_diff,px,py,moment,moment_upper,p_magnitude,L_dir,L_diff,Jx,Jy,J_magnitude,torque,torque_internal
str,f32,f32,f32,f32,i32,f32,f32,cat,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32
"""19714-322-1031""",8.26,23.6,-77.75,68.830002,19714,0.0,146.580002,"""P""",1.88,97.519997,0.191,,,,,,,,,,0.296469,0.207528,,,,,,,,
"""19714-322-1031""",8.27,23.620001,-79.32,57.130005,19714,0.1,136.449997,"""P""",1.88,97.519997,0.191,0.022361,0.223612,0.100002,0.200005,-2.042036,-0.274017,1.768019,9.752223,19.504446,0.296469,0.207528,21.806633,-0.6054,0.366914,,,,,
"""19714-322-1031""",8.26,23.639999,-80.57,57.630005,19714,0.2,138.199997,"""P""",1.88,97.519997,0.191,0.022359,0.223595,-0.100002,0.199986,0.087267,-0.218166,0.305433,-9.752223,19.502586,0.296469,0.207528,21.80497,0.025872,0.063386,-19.504446,-0.00186,19.504446,6.312719,-3.03528
"""19714-322-1031""",8.26,23.66,-81.660004,56.130005,19714,0.3,137.789993,"""P""",1.88,97.519997,0.191,0.02,0.200005,0.0,0.200005,-0.2618,-0.190241,0.071558,0.0,19.504446,0.296469,0.207528,19.504446,-0.077615,0.01485,9.752223,0.00186,9.752223,-1.034873,-0.485355
"""19714-322-1031""",8.25,23.65,-83.580002,52.259995,19714,0.4,135.839996,"""P""",1.88,97.519997,0.191,0.014142,0.141425,-0.100002,-0.100002,-0.675443,-0.335103,0.34034,-9.752223,-9.752223,0.296469,0.207528,13.791726,-0.200248,0.07063,-9.752223,-29.256668,30.839235,-1.226325,0.557798


In [37]:
len(df)

1175924
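At about 1.2 million rows, a per-play summary is a more useful first look than the raw length. The sketch below works on `(PlayKey, time)` pairs, uses a hypothetical helper name and toy keys, and reports the number of samples and the recorded time range for each play. With the 0.1 s spacing visible in the `time` column, the span is roughly (samples − 1) × 0.1 s when no samples are missing. These are the raw recording windows, not the play boundaries that still need to be determined.

```python
from collections import defaultdict


def play_time_spans(records):
    """Summarise (play_key, time) pairs as {play_key: (n_samples, t_min, t_max)}."""
    times = defaultdict(list)
    for play_key, t in records:
        times[play_key].append(t)
    return {key: (len(ts), min(ts), max(ts)) for key, ts in times.items()}


# Toy records for two plays.
records = [
    ("A-1-1", 0.0),
    ("A-1-1", 0.1),
    ("A-1-1", 0.2),
    ("B-2-5", 0.0),
]
print(play_time_spans(records))  # {'A-1-1': (3, 0.0, 0.2), 'B-2-5': (1, 0.0, 0.0)}
```

On the full file, the equivalent polars step would be a `group_by("PlayKey")` aggregation of the row count and the min/max of `time`.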