### Introduction to week 5 workbook: Creating a (simple) global metric

In week 5 we looked at the many metrics that have been developed to try to capture “global indicators” like sleep quality, body battery, VO2max, etc.  These are exciting -- but we also need to be mindful that they are hard to test and validate in some (many!) cases.  With that said -- I'm going to "play around" a little with the data to make a simple global metric.

### Import Libraries and Dataset 

In [1]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sn
import matplotlib.dates as mdates
import datetime
import plotly.graph_objects as go

import scipy.stats as stats

df = pd.read_csv("../Data/FH.csv")

In [2]:
df.columns

Index(['Unnamed: 0', 'Timestamp', 'Seconds', 'Velocity', 'Acceleration',
       'Odometer', 'Latitude', 'Longitude', 'Heart Rate', 'Player Load',
       'AthleteID'],
      dtype='object')

In [3]:
df.head()

,Unnamed: 0,Timestamp,Seconds,Velocity,Acceleration,Odometer,Latitude,Longitude,Heart Rate,Player Load,AthleteID
0,0,9/30/2018 12:21:49 PM,0.0,0.06,-0.041234,0.0,42.263222,-83.741055,122,0.0,Athlete 1
1,1,9/30/2018 12:21:49 PM,0.1,0.06,-0.025926,0.0,42.263223,-83.741055,122,0.0,Athlete 1
2,2,9/30/2018 12:21:49 PM,0.2,0.06,-0.011945,0.0,42.263223,-83.741055,122,0.0,Athlete 1
3,3,9/30/2018 12:21:49 PM,0.3,0.09,0.048539,0.0,42.263223,-83.741055,122,0.0,Athlete 1
4,4,9/30/2018 12:21:49 PM,0.4,0.08,0.021406,0.0,42.263223,-83.741055,122,0.0,Athlete 1


In [4]:
df.drop(['Unnamed: 0', 'Latitude', 'Longitude', 'Heart Rate'], axis=1, inplace=True) # we can drop the previous index column ('unnamed') and some others

In [5]:
master = df.set_index(['Timestamp', 'AthleteID'])  # use a multi-index so that both the timestamp and the player ID live in the index
master.head()

Timestamp,AthleteID,Seconds,Velocity,Acceleration,Odometer,Player Load
9/30/2018 12:21:49 PM,Athlete 1,0.0,0.06,-0.041234,0.0,0.0
9/30/2018 12:21:49 PM,Athlete 1,0.1,0.06,-0.025926,0.0,0.0
9/30/2018 12:21:49 PM,Athlete 1,0.2,0.06,-0.011945,0.0,0.0
9/30/2018 12:21:49 PM,Athlete 1,0.3,0.09,0.048539,0.0,0.0
9/30/2018 12:21:49 PM,Athlete 1,0.4,0.08,0.021406,0.0,0.0


In [6]:
master['farthest'] = master['Odometer'].diff(300)  # distance covered (in meters) over the last 300 samples, i.e. a 30 second window at 0.1 s per sample -- this gives us some perspective on the anaerobic capacity of the player


In [7]:
master['ThreeMinuteDistance'] = master['Odometer'].diff(1800)  # distance covered (in meters) over the last 1800 samples, i.e. a 3 minute window at 0.1 s per sample

In [8]:
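# The odometer is cumulative, so a negative difference means the window crossed a reset
# (presumably where one athlete's recording ends and the next begins) -- treat those as missing
master.loc[master['ThreeMinuteDistance'] < 0, 'ThreeMinuteDistance'] = np.nan
master.loc[master['farthest'] < 0, 'farthest'] = np.nan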
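
Because `diff` runs over the whole frame, a window can straddle two athletes' recordings, which is what the negative-value masking above cleans up. A minimal alternative sketch, assuming rows are ordered by time within each athlete, computes the difference inside each athlete's own recording instead. The toy frame and the `window_distance` helper are hypothetical and only illustrate the idea.

```python
import pandas as pd

def window_distance(frame, samples):
    """Distance covered over the previous `samples` rows, computed per athlete."""
    # groupby keeps each athlete's odometer separate, so a window never
    # mixes the end of one recording with the start of the next
    return frame.groupby("AthleteID")["Odometer"].diff(samples)

# Toy data: two athletes, five samples each (hypothetical values)
toy = pd.DataFrame({
    "AthleteID": ["Athlete 1"] * 5 + ["Athlete 2"] * 5,
    "Odometer": [0.0, 1.0, 2.5, 4.0, 6.0, 0.0, 0.5, 1.5, 3.0, 5.0],
})

toy["recent"] = window_distance(toy, samples=2)
print(toy)
```

With the multi-indexed `master`, `master.groupby('AthleteID')['Odometer'].diff(300)` should behave the same way, since `groupby` also accepts index level names.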

The rationale for the simple performance measure is that field hockey requires high velocity -- but also considerable stamina.  One could also weight the different measures differently, for example to emphasize speed over stamina or the reverse.
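
As a rough sketch of the weighting idea, the snippet below combines z-scores with unequal weights. The weights, the `weighted_metric` helper, and the four-athlete sample values are all hypothetical; none of them come from the dataset.

```python
import pandas as pd
from scipy import stats

def weighted_metric(maxima, weights, offset=10.0):
    """Weighted sum of per-column z-scores, shifted by `offset`."""
    total = pd.Series(0.0, index=maxima.index)
    for column, weight in weights.items():
        # zscore standardizes each measure across athletes (mean 0, sd 1)
        total += weight * stats.zscore(maxima[column].to_numpy())
    return total + offset

# Hypothetical per-athlete maxima, for illustration only
maxima = pd.DataFrame(
    {
        "Velocity": [7.2, 6.8, 6.5, 7.4],
        "farthest": [150.0, 120.0, 118.0, 177.0],
        "ThreeMinuteDistance": [520.0, 480.0, 450.0, 560.0],
    },
    index=["A", "B", "C", "D"],
)

# Assumed weights: speed counts twice as much as each distance measure
weights = {"Velocity": 0.5, "farthest": 0.25, "ThreeMinuteDistance": 0.25}
maxima["Metric"] = weighted_metric(maxima, weights)
print(maxima["Metric"].round(2))
```

Setting every weight to 1 reduces this to a plain sum of z-scores plus the offset. How to choose the weights is a judgment call that the data alone cannot settle.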

In [9]:
master.head()

Timestamp,AthleteID,Seconds,Velocity,Acceleration,Odometer,Player Load,farthest,ThreeMinuteDistance
9/30/2018 12:21:49 PM,Athlete 1,0.0,0.06,-0.041234,0.0,0.0,,
9/30/2018 12:21:49 PM,Athlete 1,0.1,0.06,-0.025926,0.0,0.0,,
9/30/2018 12:21:49 PM,Athlete 1,0.2,0.06,-0.011945,0.0,0.0,,
9/30/2018 12:21:49 PM,Athlete 1,0.3,0.09,0.048539,0.0,0.0,,
9/30/2018 12:21:49 PM,Athlete 1,0.4,0.08,0.021406,0.0,0.0,,


In [10]:
MaxValues_df = master.groupby('AthleteID').agg(['max'])  # per-athlete maximum of every column
print(MaxValues_df)

            Seconds Velocity Acceleration  Odometer Player Load farthest  \
                max      max          max       max         max      max   
AthleteID                                                                  
Athlete 1   8759.98     7.24     5.981192  10036.75       891.5   151.38   
Athlete 10  8749.93     6.79     5.580821   7269.15       615.6   150.52   
Athlete 11  8749.93     6.56     3.825559   4791.84       449.0   122.10   
Athlete 12  8749.91     7.38     4.360430   8551.21       734.4   177.37   
Athlete 13  8749.93     6.33     7.034368   6345.99       609.2   118.57   
Athlete 14  7855.38     6.58     4.908402   9906.41       951.3   118.65   
Athlete 15  8749.93     6.61    11.318196   5001.16       540.0   115.69   
Athlete 17  8749.93     7.71     5.889059   9896.45       888.5   121.59   
Athlete 18  8759.92     6.63     5.557953   6326.56       688.5   111.38   
Athlete 19  8749.92     6.85     4.089531   7952.18       781.6   134.38   
Athlete 2   

In [11]:
MaxValues_df['zscores_vel'] = stats.zscore(MaxValues_df['Velocity'])
MaxValues_df['zscores_far'] = stats.zscore(MaxValues_df['farthest'])
MaxValues_df['zscores_three'] = stats.zscore(MaxValues_df['ThreeMinuteDistance'])
print(MaxValues_df)

            Seconds Velocity Acceleration  Odometer Player Load farthest  \
                max      max          max       max         max      max   
AthleteID                                                                  
Athlete 1   8759.98     7.24     5.981192  10036.75       891.5   151.38   
Athlete 10  8749.93     6.79     5.580821   7269.15       615.6   150.52   
Athlete 11  8749.93     6.56     3.825559   4791.84       449.0   122.10   
Athlete 12  8749.91     7.38     4.360430   8551.21       734.4   177.37   
Athlete 13  8749.93     6.33     7.034368   6345.99       609.2   118.57   
Athlete 14  7855.38     6.58     4.908402   9906.41       951.3   118.65   
Athlete 15  8749.93     6.61    11.318196   5001.16       540.0   115.69   
Athlete 17  8749.93     7.71     5.889059   9896.45       888.5   121.59   
Athlete 18  8759.92     6.63     5.557953   6326.56       688.5   111.38   
Athlete 19  8749.92     6.85     4.089531   7952.18       781.6   134.38   
Athlete 2   

In [12]:
# adding up the z-scores to provide my simple global metric! 

MaxValues_df['Metric'] = stats.zscore(MaxValues_df['Velocity']) + stats.zscore(MaxValues_df['farthest'])+stats.zscore(MaxValues_df['ThreeMinuteDistance']) + 10

print(MaxValues_df)

            Seconds Velocity Acceleration  Odometer Player Load farthest  \
                max      max          max       max         max      max   
AthleteID                                                                  
Athlete 1   8759.98     7.24     5.981192  10036.75       891.5   151.38   
Athlete 10  8749.93     6.79     5.580821   7269.15       615.6   150.52   
Athlete 11  8749.93     6.56     3.825559   4791.84       449.0   122.10   
Athlete 12  8749.91     7.38     4.360430   8551.21       734.4   177.37   
Athlete 13  8749.93     6.33     7.034368   6345.99       609.2   118.57   
Athlete 14  7855.38     6.58     4.908402   9906.41       951.3   118.65   
Athlete 15  8749.93     6.61    11.318196   5001.16       540.0   115.69   
Athlete 17  8749.93     7.71     5.889059   9896.45       888.5   121.59   
Athlete 18  8759.92     6.63     5.557953   6326.56       688.5   111.38   
Athlete 19  8749.92     6.85     4.089531   7952.18       781.6   134.38   
Athlete 2   

We added 10 to the summed z-scores because no one wants a negative performance score.  However, this individual athlete was likely not participating fully in the practices or games at the time.  All other scores are near 10 -- which makes sense, since each set of z-scores sums to zero across the athletes and the metric therefore averages 10.
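
The centering on 10 can be checked directly: each z-score column has mean zero, so the sum of three of them also has mean zero, and adding 10 shifts the average to exactly 10. A small check, using randomly generated values rather than the workbook's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Three hypothetical measures for 20 athletes (random, not real data)
measures = rng.normal(loc=[7.0, 130.0, 500.0], scale=[0.4, 20.0, 40.0], size=(20, 3))

z = stats.zscore(measures, axis=0)   # standardize each column across athletes
metric = z.sum(axis=1) + 10          # same construction as the Metric column

print(np.allclose(z.sum(axis=0), 0.0))   # each z-score column sums to ~0
print(np.isclose(metric.mean(), 10.0))   # so the metric averages 10
```

Individual scores can still sit well away from 10; only their average is pinned there.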