# Rehab Strength Dashboard – EDA Baseline

## Purpose
This notebook establishes the exploratory data analysis (EDA) foundation
for understanding training load, sleep behavior, and recovery signals
during rehabilitation.

The goal is not prediction, but clarity:
- What is normal?
- What is noise?
- What could indicate risk?


## Data Dictionary Reference

The full data dictionary for this project is maintained in:

`data_schema/data_dictionary.md`

This notebook assumes the definitions, granularity,
and limitations described there.


In [51]:
import pandas as pd
import numpy as np

## WORKOUTS

In [52]:
workouts = pd.read_csv("/Users/polux9589/Desktop/gym-ml-performance/data/raw/strong.csv")  # Load workouts data
workouts.info()

<class 'pandas.DataFrame'>
RangeIndex: 6327 entries, 0 to 6326
Data columns (total 12 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Date           6327 non-null   str    
 1   Workout Name   6327 non-null   str    
 2   Duration       6327 non-null   str    
 3   Exercise Name  6327 non-null   str    
 4   Set Order      6327 non-null   str    
 5   Weight         6327 non-null   float64
 6   Reps           6327 non-null   float64
 7   Distance       6327 non-null   int64  
 8   Seconds        6327 non-null   float64
 9   Notes          185 non-null    str    
 10  Workout Notes  0 non-null      float64
 11  RPE            4458 non-null   float64
dtypes: float64(5), int64(1), str(6)
memory usage: 593.3 KB
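The `Duration` column is read as a string (values look like `1h 25m`, and the sleep export uses `8h 31min`), so it cannot be summarized numerically yet. A minimal sketch of one way to convert it to minutes follows. The helper name `duration_to_minutes` is hypothetical, and the pattern assumes only hour and minute parts appear; anything else returns `None`.

```python
import re

# Hypothetical helper, not part of the notebook.
# Assumes durations contain only an hours part ("1h") and/or a minutes part ("25m", "31min").
_DURATION_RE = re.compile(r"^\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*m(?:in)?)?\s*$")

def duration_to_minutes(text):
    """Return total minutes for strings like '1h 25m' or '8h 31min'; None if unparseable."""
    match = _DURATION_RE.match(text)
    if match is None or not any(match.groups()):
        return None
    hours, minutes = (int(part) if part else 0 for part in match.groups())
    return hours * 60 + minutes

print(duration_to_minutes("1h 25m"))
print(duration_to_minutes("8h 31min"))
```

Applied to the real data, this could be used as `workouts["Duration"].map(duration_to_minutes)`; rows that come back as `None` would point to formats the pattern does not cover.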


In [53]:
workouts.columns

Index(['Date', 'Workout Name', 'Duration', 'Exercise Name', 'Set Order',
       'Weight', 'Reps', 'Distance', 'Seconds', 'Notes', 'Workout Notes',
       'RPE'],
      dtype='str')

In [54]:
workouts.describe(include="all")

statistic,Date,Workout Name,Duration,Exercise Name,Set Order,Weight,Reps,Distance,Seconds,Notes,Workout Notes,RPE
count,6327,6327,6327,6327,6327,6327.0,6327.0,6327.0,6327.0,185,0.0,4458.0
unique,199,17,61,70,11,,,,,37,,
top,2025-12-19 10:49:20,CHEST & BACK,1h 25m,Incline Bench Press (Barbell),Rest Timer,,,,,Machine brand: LifeFitness,,
freq,67,984,332,327,1615,,,,,46,,
mean,,,,,,67.315157,7.954481,0.0,34.392919,,,8.02288
std,,,,,,67.659043,5.259003,0.0,59.984666,,,1.079067
min,,,,,,0.0,0.0,0.0,0.0,,,6.0
25%,,,,,,0.0,0.0,0.0,0.0,,,7.0
50%,,,,,,50.0,10.0,0.0,0.0,,,8.0
75%,,,,,,105.0,12.0,0.0,100.0,,,9.0
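The most frequent `Set Order` value is `Rest Timer` (1615 of 6327 rows), which suggests that rest periods are stored as rows alongside working sets. If those rows carry zero weight and reps (an assumption, not verified here), they would pull the `Weight` and `Reps` quartiles toward 0. A minimal sketch on a toy frame with illustrative values shows how the two row types could be separated before summarizing:

```python
import pandas as pd

# Toy frame mimicking a few export columns; the values are illustrative, not real data.
toy = pd.DataFrame({
    "Exercise Name": ["Incline Bench Press (Barbell)"] * 4,
    "Set Order": ["1", "Rest Timer", "2", "Rest Timer"],
    "Weight": [50.0, 0.0, 50.0, 0.0],
    "Reps": [10.0, 0.0, 10.0, 0.0],
    "Seconds": [0.0, 100.0, 0.0, 100.0],
})

# Split rest-timer rows from working sets.
is_rest = toy["Set Order"].eq("Rest Timer")
sets = toy.loc[~is_rest].copy()
rests = toy.loc[is_rest].copy()

# Coerce the remaining labels to numbers; non-numeric labels become NaN instead of raising.
sets["set_number"] = pd.to_numeric(sets["Set Order"], errors="coerce")

print(sets[["Weight", "Reps"]].describe())
```

Running `describe()` on `sets` alone gives load statistics that are not diluted by rest rows, while `rests["Seconds"]` can be studied separately as rest duration.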


In [55]:
workouts.shape

(6327, 12)

In [56]:
# Parse the timestamp, then keep only the calendar date (time of day is dropped)
workouts["Date"] = pd.to_datetime(workouts["Date"], format="%Y-%m-%d %H:%M:%S")
workouts["Date"] = workouts["Date"].dt.date
workouts["Date"]

0       2024-09-22
1       2024-09-22
2       2024-09-22
3       2024-09-22
4       2024-09-22
           ...    
6322    2026-01-26
6323    2026-01-26
6324    2026-01-26
6325    2026-01-26
6326    2026-01-26
Name: Date, Length: 6327, dtype: object
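Note that the result above has `dtype: object`: `.dt.date` returns plain Python `date` objects, so the `.dt` accessor and time-based resampling are no longer available on the column. As an alternative sketch (not what the notebook does), `.dt.normalize()` also drops the time of day but keeps a datetime dtype. The timestamps below are made up for illustration.

```python
import pandas as pd

# Illustrative timestamps in the same format as the export.
stamps = pd.Series(
    pd.to_datetime(
        ["2024-09-22 10:49:20", "2026-01-26 18:05:00"],
        format="%Y-%m-%d %H:%M:%S",
    )
)

as_date = stamps.dt.date        # Python date objects, stored with dtype object
as_day = stamps.dt.normalize()  # timestamps set to midnight, dtype stays datetime64

print(as_date.dtype)
print(as_day.dtype)
```

Which form is preferable depends on what follows: `date` objects are convenient for display, while a datetime column is usually easier to group, resample, and join on.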

## SLEEP

In [57]:
sleep = pd.read_excel("/Users/polux9589/Desktop/gym-ml-performance/data/raw/Sleep_Garmin.xlsx")  # Load sleep data
sleep.info()

<class 'pandas.DataFrame'>
RangeIndex: 42 entries, 0 to 41
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   Sleep Score 4 Weeks  42 non-null     datetime64[us]
 1   Score                42 non-null     int64         
 2   Resting Heart Rate   42 non-null     int64         
 3   Body Battery         42 non-null     int64         
 4   Pulse Ox             42 non-null     str           
 5   Respiration          42 non-null     float64       
 6   HRV Status           42 non-null     int64         
 7   Quality              42 non-null     str           
 8   Duration             42 non-null     str           
 9   Sleep Need           42 non-null     str           
 10  Bedtime              42 non-null     object        
 11  Wake Time            42 non-null     object        
dtypes: datetime64[us](1), float64(1), int64(4), object(2), str(4)
memory usage: 4.1+ KB


In [58]:
sleep.shape

(42, 12)

In [59]:
sleep.columns

Index(['Sleep Score 4 Weeks', 'Score', 'Resting Heart Rate', 'Body Battery',
       'Pulse Ox', 'Respiration', 'HRV Status', 'Quality', 'Duration',
       'Sleep Need', 'Bedtime', 'Wake Time'],
      dtype='str')

In [60]:
sleep.sample(10)

index,Sleep Score 4 Weeks,Score,Resting Heart Rate,Body Battery,Pulse Ox,Respiration,HRV Status,Quality,Duration,Sleep Need,Bedtime,Wake Time
1,2025-12-17,83,45,63,--,15.64,47,Good,8h 31min,8h 0min,22:11:00,06:54:00
9,2025-12-25,78,42,58,--,14.45,46,Fair,6h 8min,8h 0min,01:39:00,07:49:00
7,2025-12-23,87,44,62,--,15.61,44,Good,7h 37min,8h 20min,22:27:00,06:13:00
35,2026-01-20,81,42,51,93.38,14.87,46,Good,7h 1min,7h 40min,00:24:00,07:59:00
30,2026-01-15,86,44,56,91.75,15.79,45,Good,7h 38min,8h 0min,22:15:00,05:55:00
31,2026-01-16,79,41,49,93.85,14.78,46,Fair,6h 12min,8h 0min,00:30:00,06:46:00
26,2026-01-11,86,43,49,93.48,16.02,46,Good,7h 40min,8h 0min,22:31:00,06:37:00
5,2025-12-21,87,43,66,--,15.7,43,Good,8h 1min,8h 0min,22:53:00,07:00:00
32,2026-01-17,79,43,54,93.38,14.94,47,Fair,6h 37min,8h 30min,23:34:00,06:30:00
34,2026-01-19,86,43,51,93.06,15.58,45,Good,8h 24min,8h 0min,22:31:00,07:21:00
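In the sample above, `Pulse Ox` shows `--` on several nights, which is why the column is read as a string even though `info()` reports 42 non-null values. A minimal sketch, assuming `--` is the only placeholder for a missing reading, converts the column to numbers and makes the gaps explicit:

```python
import pandas as pd

# Illustrative values in the same shape as the sample above.
pulse_ox = pd.Series(["--", "93.38", "91.75", "--"], name="Pulse Ox")

# Anything that is not a number (here "--") becomes NaN.
pulse_ox_num = pd.to_numeric(pulse_ox, errors="coerce")

print(pulse_ox_num.isna().sum())  # number of nights without a reading
print(pulse_ox_num.mean())        # mean over the nights that have one
```

On the real column this would be `pd.to_numeric(sleep["Pulse Ox"], errors="coerce")`. Because `errors="coerce"` silently turns any unexpected text into `NaN`, it is worth comparing the `NaN` count against the count of `--` values afterwards.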


In [61]:
sleep.describe(include="all")

statistic,Sleep Score 4 Weeks,Score,Resting Heart Rate,Body Battery,Pulse Ox,Respiration,HRV Status,Quality,Duration,Sleep Need,Bedtime,Wake Time
count,42,42.0,42.0,42.0,42,42.0,42.0,42,42,42,42,42
unique,,,,,15,,,3,39,9,36,34
top,,,,,--,,,Good,8h 3min,8h 0min,22:00:00,06:15:00
freq,,,,,26,,,25,2,24,3,5
mean,2026-01-12 00:00:00,84.214286,42.666667,54.619048,,15.417381,46.333333,,,,,
min,2025-12-16 00:00:00,63.0,40.0,30.0,,14.45,43.0,,,,,
25%,2025-12-26 06:00:00,79.25,42.0,51.0,,14.925,46.0,,,,,
50%,2026-01-05 12:00:00,86.0,42.5,54.0,,15.46,46.0,,,,,
75%,2026-01-15 18:00:00,87.0,43.0,58.0,,15.785,47.0,,,,,
max,2026-10-22 00:00:00,98.0,48.0,68.0,,16.81,48.0,,,,,
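The date column's maximum of 2026-10-22 sits roughly nine months after its 75th percentile (2026-01-15) and after the last HRV date (2026-01-26). The mean (2026-01-12) is also exactly 6.5 days later than the HRV mean (2026-01-05 12:00), which is the shift a single row moved by 273 days would cause across 42 rows (273 / 42 = 6.5). This is consistent with one mistyped date, for example 2026-01-22 entered as 2026-10-22, but that is an inference and should be confirmed against the source file. A minimal sketch for flagging such rows, using the HRV date range as an assumed valid window:

```python
import pandas as pd

# Illustrative dates; the middle one mimics the suspicious maximum.
dates = pd.Series(pd.to_datetime(["2026-01-21", "2026-10-22", "2026-01-23"]))

# Assumed valid window, taken from the HRV table's min and max dates.
start = pd.Timestamp("2025-12-16")
end = pd.Timestamp("2026-01-26")

# Keep only the rows that fall outside the window (bounds are inclusive).
out_of_range = dates[~dates.between(start, end)]
print(out_of_range)
```

On the real table the same check would run on `sleep["Sleep Score 4 Weeks"]`, which holds the dates despite its name.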


## HRV

In [62]:
hrv = pd.read_excel("/Users/polux9589/Desktop/gym-ml-performance/data/raw/HRV_Status.xlsx")  # Load HRV data
hrv.info()

<class 'pandas.DataFrame'>
RangeIndex: 42 entries, 0 to 41
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Date           42 non-null     datetime64[us]
 1   Overnight HRV  42 non-null     str           
 2   Baseline       42 non-null     str           
 3   7d Avg         42 non-null     str           
 4   Stress         41 non-null     float64       
 5   RHR            42 non-null     int64         
dtypes: datetime64[us](1), float64(1), int64(1), str(3)
memory usage: 2.1 KB


In [63]:
hrv.columns

Index(['Date', 'Overnight HRV', 'Baseline', '7d Avg', 'Stress', 'RHR'], dtype='str')

In [64]:
hrv.shape

(42, 6)

In [65]:
hrv.describe(include="all")

statistic,Date,Overnight HRV,Baseline,7d Avg,Stress,RHR
count,42,42,42,42,41.0,42.0
unique,,14,5,6,,
top,,47ms,45ms - 53ms,46ms,,
freq,,5,18,14,,
mean,2026-01-05 12:00:00,,,,25.804878,42.666667
min,2025-12-16 00:00:00,,,,19.0,40.0
25%,2025-12-26 06:00:00,,,,23.0,42.0
50%,2026-01-05 12:00:00,,,,25.0,42.5
75%,2026-01-15 18:00:00,,,,29.0,43.0
max,2026-01-26 00:00:00,,,,38.0,48.0
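`Overnight HRV`, `Baseline`, and `7d Avg` are stored as text with a unit suffix (`47ms`, `45ms - 53ms`), so `describe()` only reports counts and modes for them. A minimal sketch of how they could be parsed into numbers follows. The helper names `parse_ms` and `parse_ms_range` are hypothetical, and the patterns assume the two formats seen above are the only ones present.

```python
import re

# Hypothetical helpers, not part of the notebook.
_MS_RE = re.compile(r"(\d+)\s*ms")

def parse_ms(text):
    """Return the first millisecond value in strings like '47ms'; None if absent."""
    match = _MS_RE.search(text)
    return int(match.group(1)) if match else None

def parse_ms_range(text):
    """Return (low, high) for strings like '45ms - 53ms'; None for any other shape."""
    values = [int(value) for value in _MS_RE.findall(text)]
    return tuple(values) if len(values) == 2 else None

print(parse_ms("47ms"))
print(parse_ms_range("45ms - 53ms"))
```

With numeric values available, for example via `hrv["Overnight HRV"].map(parse_ms)`, each night's HRV could be compared against its baseline band rather than read off as text.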
