# Missing RPE Data

RPE data is missing for some of the game dates. However, we are mainly concerned with the acute/chronic ratio, which we can still compute for those dates.
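For reference, the quantities involved can be written as follows. The 7-day and 30-day windows are inferred from the sample rows printed in this notebook (a single load of 540 gives an AcuteLoad of 540/7 ≈ 77.1 and a ChronicLoad of 540/30 = 18.0), so treat them as an assumption rather than a documented definition. Here $L_t$ is a player's DailyLoad on day $t$, with days before the first record counted as 0.

$$
A_t = \frac{1}{7}\sum_{i=0}^{6} L_{t-i}, \qquad
C_t = \frac{1}{30}\sum_{i=0}^{29} L_{t-i}, \qquad
R_t = \frac{A_t}{C_t}
$$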

In [1]:
import pandas as pd
import numpy as np
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo
from scipy.stats import pointbiserialr
import matplotlib.pyplot as plt

In [2]:
np.random.seed(5151)
rpe_df = pd.read_csv('./processed_data/processed_rpe.csv')
print(rpe_df.head())


   Unnamed: 0        Date  PlayerID  Training SessionType  Duration  RPE  \
0           0  2017-08-01        15         1    Strength      60.0  4.0   
1           1  2017-08-01         1         1       Speed      60.0  3.0   
2           2  2017-08-01         1         1    Strength      90.0  4.0   
3           3  2017-08-01         3         1       Speed      45.0  5.0   
4           4  2017-08-01         3         1    Strength      90.0  5.0   

   SessionLoad  DailyLoad  AcuteChronicRatio  ObjectiveRating  FocusRating  \
0        240.0      300.0                4.3              6.0          7.0   
1        180.0      540.0                4.3              0.0          0.0   
2        360.0      540.0                4.3              0.0          0.0   
3        225.0      675.0                4.3              7.0          7.0   
4        450.0      675.0                4.3              7.0          7.0   

   BestOutOfMyself  AcuteLoad  ChronicLoad  
0              3.0       42.9

## Drop Duplicate Dates

Duplicate dates occur when a player has multiple training sessions on the same date. We only care about the daily load, so we can drop the duplicate sessions.

In [3]:
rpe_df = rpe_df[['Date', 'PlayerID', 'DailyLoad', 'AcuteChronicRatio', 'AcuteLoad', 'ChronicLoad']].copy()
rpe_df.head()

Unnamed: 0,Date,PlayerID,DailyLoad,AcuteChronicRatio,AcuteLoad,ChronicLoad
0,2017-08-01,15,300.0,4.3,42.9,10.0
1,2017-08-01,1,540.0,4.3,77.1,18.0
2,2017-08-01,1,540.0,4.3,77.1,18.0
3,2017-08-01,3,675.0,4.3,96.4,22.5
4,2017-08-01,3,675.0,4.3,96.4,22.5


In [4]:
rpe_df = rpe_df.drop_duplicates()
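
A quick sanity check after this step is to confirm that each (Date, PlayerID) pair now appears only once. The sketch below runs on a small made-up frame (`toy` is not part of the real data) and only illustrates the idea:

```python
import pandas as pd

# Toy frame: player 1 has two sessions on the same date, so the daily row repeats.
toy = pd.DataFrame({
    'Date': ['2017-08-01', '2017-08-01', '2017-08-01'],
    'PlayerID': [1, 1, 3],
    'DailyLoad': [540.0, 540.0, 675.0],
})

deduped = toy.drop_duplicates()

# True if any (Date, PlayerID) pair still appears more than once.
has_repeats = deduped.duplicated(subset=['Date', 'PlayerID']).any()
print(len(deduped), has_repeats)
```

If `has_repeats` were true on the real data, two rows for the same player and date would disagree in some daily column, which would be worth inspecting before going further.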

## Fill In Missing RPE Data

In [5]:
rpe_df[rpe_df['AcuteChronicRatio'].isnull()]

Unnamed: 0,Date,PlayerID,DailyLoad,AcuteChronicRatio,AcuteLoad,ChronicLoad
265,2017-08-13,6,0.0,,0.0,0.0


This row should not be in the dataset: it is the first entry for this player, and they are not training (the daily load is 0).

In [6]:
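# .copy() avoids a SettingWithCopyWarning when the Date column is reassigned in the next cell
rpe_df = rpe_df[rpe_df['AcuteChronicRatio'].notnull()].copy()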

In [7]:
rpe_df['Date'] = pd.to_datetime(rpe_df['Date'])

Add a row for every (date, player) combination that is missing from the RPE data, with all values set to 0:

In [16]:
dates = pd.date_range(start=rpe_df['Date'].min(), end=rpe_df['Date'].max())
players = rpe_df['PlayerID'].unique()
idx = pd.MultiIndex.from_product((dates, players), names=['Date', 'PlayerID'])

# Reindex onto the full (Date, PlayerID) grid; combinations with no record get 0 in every column.
rpe_df = (
    rpe_df.set_index(['Date', 'PlayerID'])
    .reindex(idx, fill_value=0)
    .reset_index()
    .sort_values(by=['Date', 'PlayerID'])
)



Unnamed: 0,Date,PlayerID,DailyLoad,AcuteChronicRatio,AcuteLoad,ChronicLoad,newAcuteLoad
0,2017-08-01,1,540.0,4.3,77.1,18.0,77.1
1,2017-08-01,2,0.0,0.0,0.0,0.0,0.0
2,2017-08-01,3,675.0,4.3,96.4,22.5,96.4
3,2017-08-01,4,0.0,0.0,0.0,0.0,0.0
4,2017-08-01,5,330.0,4.3,47.1,11.0,47.1
5,2017-08-01,6,0.0,0.0,0.0,0.0,0.0
6,2017-08-01,7,0.0,0.0,0.0,0.0,0.0
7,2017-08-01,8,345.0,4.3,49.3,11.5,49.3
8,2017-08-01,9,405.0,4.3,57.9,13.5,57.9
9,2017-08-01,10,640.0,4.3,91.4,21.3,91.4
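
The reindexing step can be easier to follow on a tiny example. The sketch below uses a made-up three-row frame (`toy` and `full` are illustrative names) and the same `MultiIndex.from_product` and `reindex` calls as the cell above:

```python
import pandas as pd

# Toy data: three records spread over three dates and two players.
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2017-08-01', '2017-08-03', '2017-08-02']),
    'PlayerID': [1, 1, 2],
    'DailyLoad': [540.0, 720.0, 300.0],
})

# Every calendar date in the observed range, crossed with every player.
dates = pd.date_range(start=toy['Date'].min(), end=toy['Date'].max())
players = toy['PlayerID'].unique()
idx = pd.MultiIndex.from_product((dates, players), names=['Date', 'PlayerID'])

# Missing (Date, PlayerID) combinations are created and filled with 0.
full = (
    toy.set_index(['Date', 'PlayerID'])
    .reindex(idx, fill_value=0)
    .reset_index()
    .sort_values(by=['Date', 'PlayerID'])
)
print(full)
```

Three dates times two players gives six rows, three of which are new zero-load rows. Note that `fill_value=0` applies to every column, which is why the added rows in the output above also show 0.0 for AcuteChronicRatio, AcuteLoad and ChronicLoad until those are recomputed.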


Use a rolling window to recompute the acute load; the same approach applies to the chronic load:

In [21]:
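# Sum each player's DailyLoad over the trailing 7 calendar days (current date included).
past7Days = rpe_df.groupby('PlayerID').rolling('7d', on='Date')['DailyLoad'].sum().reset_index()
# The acute load is that 7-day sum divided by 7.
past7Days['newAcuteLoad'] = (past7Days['DailyLoad'] / 7.).round(1)
past7Days = past7Days.drop(columns='DailyLoad')

# Attach the recomputed acute load to the main frame.
# Note: re-running this cell merges newAcuteLoad again and produces _x/_y suffixed duplicates.
rpe_df = pd.merge(rpe_df, past7Days, how='left', on=['Date', 'PlayerID'])

rpe_df[rpe_df['PlayerID'] == 1]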

Unnamed: 0,Date,PlayerID,DailyLoad,AcuteChronicRatio,AcuteLoad,ChronicLoad,newAcuteLoad
0,2017-08-01,1,540.0,4.3,77.1,18.0,77.1
17,2017-08-02,1,0.0,4.3,77.1,18.0,77.1
34,2017-08-03,1,0.0,4.3,77.1,18.0,77.1
51,2017-08-04,1,0.0,0.0,0.0,0.0,77.1
68,2017-08-05,1,0.0,0.0,0.0,0.0,77.1
85,2017-08-06,1,0.0,0.0,0.0,0.0,77.1
102,2017-08-07,1,720.0,4.3,180.0,42.0,180.0
119,2017-08-08,1,0.0,0.0,0.0,0.0,102.9
136,2017-08-09,1,0.0,0.0,0.0,0.0,102.9
153,2017-08-10,1,0.0,0.0,0.0,0.0,102.9
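
The cell above recomputes only the acute load. A minimal sketch of how the chronic load and the ratio might be recomputed the same way is shown below. It assumes a 30-day chronic window, which is inferred from the sample rows (540/30 = 18.0, and 77.1/18.0 rounds to the 4.3 shown in AcuteChronicRatio) rather than stated anywhere in the notebook. The helper `trailing_load` and the column names `acute`, `chronic` and `ratio` are made up for illustration.

```python
import pandas as pd

# Toy daily loads for one player, matching the PlayerID 1 rows shown above.
toy = pd.DataFrame({
    'Date': pd.date_range('2017-08-01', periods=10),
    'PlayerID': 1,
    'DailyLoad': [540.0, 0, 0, 0, 0, 0, 720.0, 0, 0, 0],
})

def trailing_load(df, days, name):
    """Per-player sum of DailyLoad over the trailing `days` days, divided by `days`."""
    out = (
        df.groupby('PlayerID')
        .rolling(f'{days}D', on='Date')['DailyLoad']
        .sum()
        .div(days)
        .round(1)
        .rename(name)
        .reset_index()
    )
    return out[['Date', 'PlayerID', name]]

acute = trailing_load(toy, 7, 'acute')
chronic = trailing_load(toy, 30, 'chronic')  # 30-day window is an assumption

out = toy.merge(acute, on=['Date', 'PlayerID']).merge(chronic, on=['Date', 'PlayerID'])
# A chronic load of 0 gives NaN or inf here, so such rows need separate handling.
out['ratio'] = (out['acute'] / out['chronic']).round(1)
print(out)
```

On this toy input the first row reproduces the values from the original data (acute 77.1, chronic 18.0, ratio 4.3), and the 2017-08-08 row gives an acute load of 102.9, matching `newAcuteLoad` above.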


In [None]:
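gps_df = pd.read_csv('./processed_data/processed_gps.csv')
gps_df = gps_df.drop(columns=['Unnamed: 0'])
# read_csv loads Date as strings, while rpe_df['Date'] is datetime64; convert so the merge keys have the same dtype.
gps_df['Date'] = pd.to_datetime(gps_df['Date'])
gps_df.head()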


In [None]:
merged_df = gps_df.merge(rpe_df, how='left', on=['Date', 'PlayerID'])
merged_df.head()
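
After a left merge like this, it can be useful to check how many GPS rows found no RPE match. The sketch below uses two made-up frames (`gps_toy` and `rpe_toy`, with a fictional player 99) together with the `indicator` and `validate` options of `DataFrame.merge`:

```python
import pandas as pd

gps_toy = pd.DataFrame({
    'Date': pd.to_datetime(['2017-08-01', '2017-08-02']),
    'PlayerID': [1, 99],  # player 99 is a made-up ID with no RPE rows
})
rpe_toy = pd.DataFrame({
    'Date': pd.to_datetime(['2017-08-01']),
    'PlayerID': [1],
    'DailyLoad': [540.0],
})

# indicator=True adds a _merge column; validate raises if the RPE keys are not unique.
check = gps_toy.merge(
    rpe_toy, how='left', on=['Date', 'PlayerID'],
    indicator=True, validate='many_to_one',
)
unmatched = (check['_merge'] == 'left_only').sum()
print(unmatched)
```

A non-zero count on the real data would point to GPS dates outside the RPE date range, or to players who never appear in the RPE file.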
