### Libraries

In [10]:
# Libraries
import pandas as pd
import numpy as np
import gspread
import datetime

import os
os.chdir("..")  
from src import config
from src import help_functions as hf

# Configs
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)

### Import and quick check data

In [None]:
# Import and quick check Training data 
googleDrive_client = gspread.authorize(config.DRIVE_CREDENTIALS)
training_data, _ = hf.import_google_sheet(googleDrive_client=googleDrive_client, filename=config.DRIVE_TP_LOG_FILENAMES[0], sheet_index=0)

# "Clean" data
for col in training_data.columns:
    try:
        training_data[col] = training_data[col].apply(hf.safe_convert_to_numeric)
    except ValueError:
        pass 

# Date & Datetime
training_data["Date"] = pd.to_datetime(training_data[["Year", "Month", "Day"]]).dt.date
training_data["Datetime"] = pd.to_datetime(training_data[["Year", "Month", "Day"]])
training_data = training_data.sort_values(by="Date").reset_index(drop=True)

# About
print("Training data about:")
print("-----------------------------------------------------")
print("Todays date: {}".format(datetime.datetime.today().date()))
print("Date range: {} to {}".format(training_data["Date"].min(), training_data["Date"].max()))
print("Duplicated rows = {}".format(training_data[training_data.duplicated(keep=False)].shape[0]))
print("Missing dates = {}".format([d for d in pd.date_range(start=training_data["Date"].min(), end=training_data["Date"].max()).date if d not in training_data["Date"].values]))

print("\nDifferent activities and their counts:")
print("-------------------------------------")
activities_count_time = (
    training_data
    .groupby("Activity type")[["Duration [h]"]]
    .agg(
        count=("Duration [h]", "count"),
        total_duration=("Duration [h]", "sum")
        )
    .reset_index()
    .sort_values(by="total_duration", ascending=False)
    )

for _, row in activities_count_time.iterrows():
    print("{} ~> {:.2f} hours ({} act.)".format(row["Activity type"], row["total_duration"], row["count"]))

Data about:
-----------------------------------------------------
Todays date: 2025-08-23
Date range: 2024-09-13 to 2025-08-22
Duplicated rows = 0
Missing dates = []

Different activities and their counts:
-------------------------------------
Trail Running ~> 276.44 hours (149 act.)
Road Biking ~> 80.82 hours (33 act.)
Running ~> 78.25 hours (76 act.)
Indoor Biking ~> 71.04 hours (51 act.)
Mountain Biking ~> 16.99 hours (9 act.)
Hiking ~> 15.35 hours (6 act.)
Road biking ~> 3.74 hours (2 act.)
Lap Swimming ~> 0.21 hours (1 act.)


In [16]:
# Import and quick check Daily data
googleDrive_client = gspread.authorize(config.DRIVE_CREDENTIALS)
daily_data, _ = hf.import_google_sheet(googleDrive_client=googleDrive_client, filename=config.DRIVE_TP_LOG_FILENAMES[1], sheet_index=0)

# "Clean" data
for col in daily_data.columns:
    try:
        daily_data[col] = daily_data[col].apply(hf.safe_convert_to_numeric)
    except ValueError:
        pass 

# Date & Datetime
daily_data["Date"] = pd.to_datetime(daily_data[["Year", "Month", "Day"]]).dt.date
daily_data["Datetime"] = pd.to_datetime(daily_data[["Year", "Month", "Day"]])
daily_data = daily_data.sort_values(by="Date").reset_index(drop=True)

# About
print("Daily data about:")
print("-----------------------------------------------------")
print("Todays date: {}".format(datetime.datetime.today().date()))
print("Date range: {} to {}".format(daily_data["Date"].min(), daily_data["Date"].max()))
print("Duplicated rows = {}".format(daily_data[daily_data.duplicated(keep=False)].shape[0]))
print("Missing dates = {}".format([d for d in pd.date_range(start=daily_data["Date"].min(), end=daily_data["Date"].max()).date if d not in daily_data["Date"].values]))

Daily data about:
-----------------------------------------------------
Todays date: 2025-08-23
Date range: 2024-04-15 to 2025-08-22
Duplicated rows = 0
Missing dates = []


### Notes

Data preparation:
- We take all activities into account, whether or not they were real training (this includes Hiking, Swimming, and cycling with my girlfriend).
- Given the above, and since I generally have only one "real" workout per day, we calculate the total training load for each day, so that each sample is one day. This also absorbs the "not real" workouts mentioned above. If there could be several serious workouts in a day, we would instead treat each workout as one sample and, to avoid distorting the TL data, discard everything that is not a real workout.
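
The per-day aggregation described above can be sketched as follows. This is a minimal illustration, not the notebook's actual pipeline: the `TL` column name and the toy values are assumptions, and filling days without activities with zero load is also an assumption about how rest days should be treated.

```python
import pandas as pd

# Toy activity log; "TL" is a hypothetical per-activity training-load column.
activities = pd.DataFrame({
    "Date": pd.to_datetime(["2025-01-01", "2025-01-01", "2025-01-03"]),
    "Activity type": ["Trail Running", "Hiking", "Road Biking"],
    "TL": [120.0, 30.0, 90.0],
})

# One sample per day: sum the load of every activity logged on that date.
daily_tl = activities.groupby("Date")["TL"].sum()

# Assumption: days without any activity count as zero load,
# so the index covers every calendar day in the range.
full_range = pd.date_range(daily_tl.index.min(), daily_tl.index.max(), freq="D")
daily_tl = daily_tl.reindex(full_range, fill_value=0.0)
```

Summing per date means a hike on the same day as a run simply adds to that day's load, which matches the "one sample per day" choice above.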

Formal definitions:
- $TL_t$ - training load on day $t$
- $TL_{\text{avg},t} = \frac{1}{n} \sum_{i=1}^{n} TL_{t-i}$ - recent average load over the previous n days
- $TL_{\text{max},t} = \max(TL_{t-1}, TL_{t-2}, \dots, TL_{t-n})$ - recent peak load over the previous n days
- $RTL_{\text{avg},t} = \frac{TL_t}{TL_{\text{avg},t}}$ - relative to recent average
- $RTL_{\text{peak},t} = \frac{TL_t}{TL_{\text{max},t}}$ - relative to recent peak
- $RTL^*_t = \alpha \cdot RTL_{\text{avg},t} + (1 - \alpha) \cdot RTL_{\text{peak},t} \quad 0 \le \alpha \le 1$ - Composite Metric
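
A minimal pandas sketch of these definitions, assuming a daily TL series with one value per day. The function name, the output column names, and the defaults `n=7` and `alpha=0.5` are illustrative assumptions, not values fixed by this notebook.

```python
import pandas as pd

def relative_training_load(tl: pd.Series, n: int = 7, alpha: float = 0.5) -> pd.DataFrame:
    """Rolling baseline, rolling peak and composite RTL* for a daily TL series."""
    # shift(1) excludes day t itself, so each window covers days t-1 ... t-n.
    previous = tl.shift(1)
    tl_avg = previous.rolling(window=n, min_periods=n).mean()
    tl_max = previous.rolling(window=n, min_periods=n).max()

    rtl_avg = tl / tl_avg    # relative to recent average
    rtl_peak = tl / tl_max   # relative to recent peak
    rtl_star = alpha * rtl_avg + (1 - alpha) * rtl_peak  # composite metric

    return pd.DataFrame({
        "TL_avg": tl_avg,
        "TL_max": tl_max,
        "RTL_avg": rtl_avg,
        "RTL_peak": rtl_peak,
        "RTL_star": rtl_star,
    })

# Toy data: with n=3 only the fourth day has a complete history window.
tl = pd.Series([100.0, 200.0, 300.0, 200.0], name="TL")
metrics = relative_training_load(tl, n=3, alpha=0.5)
```

The first n days have no complete history and come out as NaN. A window whose loads are all zero would make the ratios infinite or undefined, so such days need separate handling.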

Notes:
- Using a rolling average (TL_avg) or rolling peak (TL_max) captures your recent training state. It contextualizes today’s session: a 300 TRIMP session might be heavy for someone who has only done 200 TRIMP sessions recently, but normal for someone consistently doing 400.
- RTL_avg shows relative load compared to baseline adaptation (sustained training).
- RTL_peak shows relative load compared to recent maximum stress, highlighting spikes that may be riskier.
- Combining them with a weight α gives a balanced view, accounting for both consistency and acute stress.
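
As a worked instance of the first note, assume one athlete's recent window contains only 200 TRIMP sessions and another's only 400 TRIMP sessions (illustrative numbers, so the rolling average equals the session value). The same 300 TRIMP session then gives:

$$
RTL_{\text{avg}} = \frac{300}{200} = 1.5 \quad \text{(above baseline)}, \qquad RTL_{\text{avg}} = \frac{300}{400} = 0.75 \quad \text{(below baseline)}
$$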

Potential improvements:
- Instead of weighting the last n days equally, you could weight recent days more (exponentially decaying weights, in the spirit of an exponential moving average): $TL_{\text{EMA},t} = \frac{\sum_{i=1}^{n} TL_{t-i} \cdot \lambda^{i-1}}{\sum_{i=1}^{n} \lambda^{i-1}}, \quad 0 < \lambda \le 1$.
- Nonlinear combination of average and peak. Sometimes spikes should count more than their linear weight: $RTL^*_t = \left(RTL_{\text{avg},t}\right)^\beta \cdot \left(RTL_{\text{peak},t}\right)^{1-\beta}, \quad \beta \in [0,1]$.
- Multi-metric composite. Instead of using just TL, we could combine distance, duration, and intensity into a single RTL vector: $\mathbf{RTL}_t = [RTL_{TL,t}, RTL_{D,t}, RTL_{T,t}, RTL_{HR,t}]$
- Fatigue / readiness adjustment. Relative load could be normalized by recovery metrics (HRV, sleep, resting HR): $RTL_{\text{adj},t} = \frac{RTL_t}{1 + f(\text{HRV}_t, \text{rest}_t)}$
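
The first two improvements can be sketched with NumPy as below. The function names and the default values of `lam` and `beta` are assumptions for illustration only. The history is passed most recent day first, so `history[0]` is $TL_{t-1}$.

```python
import numpy as np

def weighted_recent_load(history, lam=0.8):
    """Exponentially weighted mean of past loads.

    history[0] is day t-1, history[1] is day t-2, and so on.
    """
    history = np.asarray(history, dtype=float)
    # Weights lambda^(i-1) for i = 1..n: the most recent day gets weight 1.
    weights = lam ** np.arange(len(history))
    return float(np.sum(history * weights) / np.sum(weights))

def geometric_rtl(rtl_avg, rtl_peak, beta=0.5):
    """Nonlinear (weighted geometric) combination of the two ratios."""
    return rtl_avg ** beta * rtl_peak ** (1 - beta)

# With lam = 1 all days weigh the same, so this reduces to the plain average.
plain = weighted_recent_load([100.0, 200.0, 300.0], lam=1.0)
# With lam = 0.5 the most recent day (100) dominates the older one (200).
recent_heavy = weighted_recent_load([100.0, 200.0], lam=0.5)
combined = geometric_rtl(1.0, 0.64, beta=0.5)
```

A smaller `lam` makes the baseline react faster to the last few days, at the cost of a noisier reference value.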

Benefits:
- Captures context: It tells you if today’s session is heavy relative to your current state.
- Helps avoid overtraining: spikes above the recent peak are immediately visible.
- Enables better training decisions: you can plan lighter or heavier sessions relative to recent load.
- Makes metrics comparable across athletes or periods by using ratios rather than absolute values.

Additional: Using RTL for smarter periodization
- The classic idea: you don’t just train hard every day. You want to match training stress to recent load and recovery, which is essentially auto-regulated periodization.
- Low recent load - higher intensity is safe - RTL will be below 1, signaling that a spike is acceptable.
- High recent load - reduce intensity - If RTL is already above 1 (or near peak), adding a hard session risks overtraining or injury.
- Goal: structured variation - microcycle approach. 
- This is where the above potential improvements help: 
    - Exponential weighting for recent load - metric is sensitive to very recent fatigue.
    - Nonlinear combination of avg and peak - Spikes in load are highlighted more strongly.
    - Multi-metric RTL - You might have a “distance-heavy” day but low intensity, or vice versa. (High RTL_distance - limit long runs & High RTL_HR - limit high-intensity intervals).
    - Recovery-adjusted RTL - You can have a “hard” day only if both recent load is low and body is ready.

$$
\text{Next session intensity} =
\begin{cases} 
\text{Hard}, & \text{if } RTL_{\text{adj}} < 0.8 \\[2mm]
\text{Moderate}, & \text{if } 0.8 \le RTL_{\text{adj}} \le 1.2 \\[1mm]
\text{Easy / Recovery}, & \text{if } RTL_{\text{adj}} > 1.2
\end{cases}
$$

You can tune the thresholds based on athlete type, sport, or training phase. This produces an adaptive low-low-hard pattern automatically.
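
The decision rule above translates directly into a small function. This is a sketch with the 0.8 and 1.2 thresholds exposed as tunable parameters; the function and parameter names are assumptions for illustration.

```python
def next_session_intensity(rtl_adj: float, low: float = 0.8, high: float = 1.2) -> str:
    """Map an adjusted relative training load to a suggested next-session intensity."""
    if rtl_adj < low:
        return "Hard"
    if rtl_adj <= high:  # low <= rtl_adj <= high
        return "Moderate"
    return "Easy / Recovery"

# Illustrative values only.
suggestions = [next_session_intensity(x) for x in (0.5, 0.8, 1.2, 1.3)]
```

Both boundary values (0.8 and 1.2) fall into the "Moderate" band, matching the inequalities in the case definition.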