# Model Monitoring for Whoop Recovery Prediction System

**Part 2**: Applying Evidently AI for data and model monitoring

This notebook implements:
- **Data Drift**: Input feature distribution monitoring (reference vs current)
- **Regression Performance**: MAE, RMSE, R², error distribution
- **Target Drift**: Recovery score distribution shifts
- **Prediction Drift**: Model output distribution over time

Based on: [Evidently Bicycle Demand Monitoring](https://github.com/evidentlyai/community-examples/blob/main/tutorials/bicycle_demand_monitoring_setup.ipynb)

In [1]:
try:
    import evidently
    print(f"Evidently version: {evidently.__version__}")
except ImportError:
    !pip install evidently
    import evidently

Evidently version: 0.7.20


In [15]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

# Evidently 0.7+ imports (new API)
from evidently import Report, Dataset, DataDefinition, Regression
from evidently.presets import DataDriftPreset, RegressionPreset

## Load Whoop Data and Create Reference/Current Splits

In [16]:
# Load Whoop dataset
df = pd.read_csv('whoop_fitness_dataset_100k.csv')
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['user_id', 'date']).reset_index(drop=True)

print(f"Dataset shape: {df.shape}")
print(f"Date range: {df['date'].min()} to {df['date'].max()}")
df.head()

Dataset shape: (100000, 39)
Date range: 2023-01-01 00:00:00 to 2024-02-03 00:00:00


|  | user_id | date | day_of_week | age | gender | weight_kg | height_cm | fitness_level | primary_sport | recovery_score | ... | activity_strain | avg_heart_rate | max_heart_rate | activity_calories | hr_zone_1_min | hr_zone_2_min | hr_zone_3_min | hr_zone_4_min | hr_zone_5_min | workout_time_of_day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USER_00001 | 2023-01-01 | Sunday | 56 | Female | 53.9 | 167.2 | Beginner | Weight Training | 57.8 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 1 | USER_00001 | 2023-01-02 | Monday | 56 | Female | 53.9 | 167.2 | Beginner | Weight Training | 62.6 | ... | 7.47 | 121.2 | 137.6 | 371.0 | 11.9 | 18.0 | 15.9 | 11.6 | 4.6 | Evening |
| 2 | USER_00001 | 2023-01-03 | Tuesday | 56 | Female | 53.9 | 167.2 | Beginner | Weight Training | 81.5 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 3 | USER_00001 | 2023-01-04 | Wednesday | 56 | Female | 53.9 | 167.2 | Beginner | Weight Training | 55.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 4 | USER_00001 | 2023-01-05 | Thursday | 56 | Female | 53.9 | 167.2 | Beginner | Weight Training | 62.5 | ... | 15.27 | 136.4 | 152.8 | 623.0 | 3.7 | 6.0 | 13.7 | 18.6 | 9.0 | Evening |


In [17]:
# Split into reference (training period) and current (production simulation)
# Reference: First 70% of data (simulates training data)
# Current: Last 30% of data (simulates production data)

split_date = df['date'].quantile(0.7)
reference_data = df[df['date'] <= split_date].copy()
current_data = df[df['date'] > split_date].copy()

print(f"Reference data: {len(reference_data)} rows ({reference_data['date'].min()} to {reference_data['date'].max()})")
print(f"Current data: {len(current_data)} rows ({current_data['date'].min()} to {current_data['date'].max()})")

Reference data: 70247 rows (2023-01-01 00:00:00 to 2023-09-03 00:00:00)
Current data: 29753 rows (2023-09-04 00:00:00 to 2024-02-03 00:00:00)
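Because the cut is made on a date rather than on a row index, every row sharing a calendar day lands on the same side. That is why the row shares come out close to, but not exactly, 70/30. The sketch below reproduces the idea on a toy frame; `time_split` and the `toy_*` names are hypothetical helpers for illustration, not part of the notebook.

```python
import pandas as pd

def time_split(frame, date_col="date", reference_share=0.7):
    """Split rows into reference/current at a date quantile (no shuffling)."""
    cutoff = frame[date_col].quantile(reference_share)
    reference = frame[frame[date_col] <= cutoff].copy()
    current = frame[frame[date_col] > cutoff].copy()
    return reference, current, cutoff

# Toy data: two users observed on the same ten days
toy = pd.DataFrame({
    "user_id": ["A"] * 10 + ["B"] * 10,
    "date": list(pd.date_range("2023-01-01", periods=10)) * 2,
})
toy_ref, toy_cur, toy_cutoff = time_split(toy)

print(f"cutoff: {toy_cutoff}")
print(f"reference rows: {len(toy_ref)}, current rows: {len(toy_cur)}")
```

Every reference date precedes every current date, so the current period never leaks into the training window.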


## Train Simple Model for Monitoring Demo

In [18]:
from sklearn.ensemble import RandomForestRegressor

# Feature columns for monitoring
numerical_features = ['day_strain', 'sleep_hours', 'sleep_efficiency', 'hrv', 'resting_heart_rate', 
                     'respiratory_rate', 'activity_duration_min', 'activity_strain']
categorical_features = ['day_of_week', 'activity_type']
feature_cols = numerical_features + categorical_features
target = 'recovery_score'
prediction = 'prediction'

# Fix the label-to-code mapping on the reference period so that the same label
# always maps to the same integer in every later split
category_levels = {
    col: reference_data[col].astype('category').cat.categories
    for col in categorical_features
}

def build_model_frame(frame):
    """Return encoded features plus the target column for one data split."""
    out = frame[numerical_features].copy()
    for col in categorical_features:
        # Labels not seen in the reference period (and NaN) are encoded as -1
        out[col] = pd.Categorical(frame[col], categories=category_levels[col]).codes
    out['target'] = frame[target]
    return out

# Train on the reference period only
ref_model = build_model_frame(reference_data)
model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(ref_model[feature_cols], ref_model['target'])

# Reference predictions are in-sample, so reference errors understate the true error
ref_model[prediction] = model.predict(ref_model[feature_cols])

# Current data with predictions
curr_model = build_model_frame(current_data)
curr_model[prediction] = model.predict(curr_model[feature_cols])

# Evidently 0.7: Create Dataset objects with DataDefinition for reports.
# The integer-coded columns are declared as categorical so that drift checks
# treat them as labels rather than as continuous values.
data_def = DataDefinition(
    regression=[Regression(target="target", prediction="prediction")],
    numerical_columns=numerical_features,
    categorical_columns=categorical_features,
)
ref_dataset = Dataset.from_pandas(ref_model, data_definition=data_def)
curr_dataset = Dataset.from_pandas(curr_model, data_definition=data_def)

print("Model trained. Sample predictions:")
print(ref_model[['target', 'prediction']].head(10))

Model trained. Sample predictions:
   target  prediction
0    57.8      62.622
1    62.6      65.968
2    81.5      78.356
3    55.0      59.998
4    62.5      67.154
5    49.9      54.640
6    50.0      52.358
7    64.2      67.944
8    56.2      58.848
9    72.1      71.684
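
Integer category codes only mean the same thing across splits when every split is encoded against one shared category list. The sketch below uses toy activity labels (hypothetical values, not taken from the dataset) to show how encoding each split on its own can silently remap labels.

```python
import pandas as pd

toy_reference = pd.Series(["Running", "Cycling", "Yoga"])
toy_current = pd.Series(["Running", "Yoga"])  # no "Cycling" in this window

# Independent encoding: categories are re-derived per split, so codes shift
independent_codes = toy_current.astype("category").cat.codes.tolist()

# Shared encoding: reuse the reference categories for every split
toy_levels = toy_reference.astype("category").cat.categories
reference_codes = pd.Categorical(toy_reference, categories=toy_levels).codes.tolist()
shared_codes = pd.Categorical(toy_current, categories=toy_levels).codes.tolist()

print(reference_codes)    # [1, 0, 2] -> Cycling=0, Running=1, Yoga=2
print(independent_codes)  # [0, 1]    -> Running is now 0, which meant Cycling
print(shared_codes)       # [1, 2]    -> consistent with the reference mapping
```

With the shared list, a label missing from the reference period is encoded as -1 instead of taking over another label's code.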


## 1. Regression Performance Report

In [20]:
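# In Evidently 0.7, run() returns the evaluation result (a Snapshot). The Report
# object only holds the configuration and does not render in the notebook.
regression_report = Report(metrics=[RegressionPreset()])
regression_snapshot = regression_report.run(current_data=curr_dataset, reference_data=ref_dataset)
regression_snapshot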
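
As a cross-check on the headline numbers, MAE, RMSE, and R² can be computed by hand from their textbook definitions. The sketch below does this with NumPy on the first five target/prediction pairs from the sample output earlier in the notebook. `regression_metrics` is a hypothetical helper, and the preset may report additional statistics beyond these.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Textbook MAE, RMSE, R² and mean error for one batch of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    ss_res = float(np.sum(errors ** 2))                      # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))    # total sum of squares
    return {
        "mae": float(np.mean(np.abs(errors))),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        "r2": 1.0 - ss_res / ss_tot,
        "mean_error": float(errors.mean()),  # positive means over-prediction
    }

# First five rows of the sample predictions printed after training
sample_metrics = regression_metrics(
    [57.8, 62.6, 81.5, 55.0, 62.5],
    [62.622, 65.968, 78.356, 59.998, 67.154],
)
print(sample_metrics)
```

Applying the same function to the reference and current frames gives a quick sanity check on the gap between the two periods.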

## 2. Data Drift Report

In [21]:
# Keep the Snapshot returned by run(); displaying the Report object only shows its repr
drift_report = Report(metrics=[DataDriftPreset()])
drift_snapshot = drift_report.run(current_data=curr_dataset, reference_data=ref_dataset)
drift_snapshot
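
To build intuition for what a per-column drift score measures, the sketch below computes the Population Stability Index (PSI) between a reference sample and a current sample. PSI is used here as an illustrative stand-in. Evidently selects its own test per column, which is not necessarily PSI. The function name, the synthetic HRV-like values, and the 0.1 rule of thumb are assumptions for this example.

```python
import numpy as np

def population_stability_index(reference, current, bins=10):
    """PSI between two 1-D samples, using reference quantiles as bin edges."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior edges at reference quantiles: each reference bin holds ~equal mass
    interior = np.quantile(reference, np.linspace(0, 1, bins + 1)[1:-1])
    ref_idx = np.searchsorted(interior, reference, side="right")
    cur_idx = np.searchsorted(interior, current, side="right")
    ref_share = np.bincount(ref_idx, minlength=bins) / len(reference)
    cur_share = np.bincount(cur_idx, minlength=bins) / len(current)
    # Avoid log(0) and division by zero for empty bins
    ref_share = np.clip(ref_share, 1e-6, None)
    cur_share = np.clip(cur_share, 1e-6, None)
    return float(np.sum((cur_share - ref_share) * np.log(cur_share / ref_share)))

rng = np.random.default_rng(0)
toy_reference_hrv = rng.normal(60, 10, 5000)   # synthetic reference sample
toy_stable_hrv = rng.normal(60, 10, 5000)      # same distribution
toy_shifted_hrv = rng.normal(70, 10, 5000)     # mean shifted by one standard deviation

psi_stable = population_stability_index(toy_reference_hrv, toy_stable_hrv)
psi_shifted = population_stability_index(toy_reference_hrv, toy_shifted_hrv)
print(f"PSI (same distribution): {psi_stable:.4f}")
print(f"PSI (shifted mean):      {psi_shifted:.4f}")
```

A common rule of thumb treats PSI below 0.1 as stable, which the unshifted sample satisfies while the shifted one does not.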

## 3. Combined Report: Performance + Drift

In [22]:
# Both presets in one report; the Snapshot returned by run() is what renders
combined_report = Report(metrics=[RegressionPreset(), DataDriftPreset()])
combined_snapshot = combined_report.run(current_data=curr_dataset, reference_data=ref_dataset)
combined_snapshot

## 4. Simulate Time-Based Monitoring (Weekly Windows)

In [23]:
# Split current data into weekly windows (simulating production over time).
# ISO week numbers restart in January, so order the weeks by date rather than by
# number; otherwise weeks 1-5 of 2024 would sort ahead of weeks 36-52 of 2023.
# Each week number occurs in only one year within the current period.
current_data['week'] = current_data['date'].dt.isocalendar().week
weeks = list(current_data.sort_values('date')['week'].unique())

print(f"Weeks in current period: {weeks}")

feature_cols = numerical_features + ['day_of_week', 'activity_type']
weekly_snapshots = {}

# Run monitoring for each week
for week in weeks[:3]:  # First 3 weeks as example
    week_data = current_data[current_data['week'] == week]
    week_model = week_data[numerical_features].copy()
    for col in ['day_of_week', 'activity_type']:
        # Encode against the reference categories so codes match the training data
        levels = reference_data[col].astype('category').cat.categories
        week_model[col] = pd.Categorical(week_data[col], categories=levels).codes
    week_model['target'] = week_data[target]
    week_model['prediction'] = model.predict(week_model[feature_cols])

    week_dataset = Dataset.from_pandas(week_model, data_definition=data_def)
    week_report = Report(metrics=[RegressionPreset(), DataDriftPreset()])
    # A bare expression inside a loop is not displayed, so keep each Snapshot
    # for later inspection (for example weekly_snapshots[week].save_html(...))
    weekly_snapshots[week] = week_report.run(current_data=week_dataset, reference_data=ref_dataset)
    print(f"\n--- Week {week} Report ---")

Weeks in current period: [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 1, 2, 3, 4, 5]

--- Week 36 Report ---

--- Week 37 Report ---

--- Week 38 Report ---
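
ISO week numbers repeat every year, so once a monitoring period spans more than twelve months a bare week number no longer identifies a window. One option, sketched below on toy data, is to key each row by the Monday that starts its week. The `toy_days` frame and its constant `abs_error` column are hypothetical and only illustrate the grouping.

```python
import pandas as pd

# Toy daily data spanning a year boundary (hypothetical values)
toy_days = pd.DataFrame({"date": pd.date_range("2023-12-18", "2024-01-14", freq="D")})
toy_days["abs_error"] = 5.0

# ISO week numbers alone lose the year, so sorting them scrambles the timeline
iso_weeks_sorted = sorted(int(w) for w in toy_days["date"].dt.isocalendar().week.unique())

# Keying each row by the start of its Monday-to-Sunday week keeps windows chronological
toy_days["week_start"] = toy_days["date"].dt.to_period("W").dt.start_time
weekly_mae = toy_days.groupby("week_start")["abs_error"].mean()

print(iso_weeks_sorted)            # [1, 2, 51, 52]
print(weekly_mae.index.tolist())   # four Mondays in chronological order
```

Iterating over `toy_days.groupby("week_start")` then yields one chronological window per week, each of which could be passed to a report.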


## 5. Save Report for Dashboard

In [24]:
# Save report as HTML for sharing (Evidently 0.7: run() returns Snapshot with save_html)
report_result = combined_report.run(current_data=curr_dataset, reference_data=ref_dataset)
report_result.save_html("whoop_monitoring_report.html")
print("Report saved to: whoop_monitoring_report.html")
print("Open in browser to view interactive dashboard.")

Report saved to: whoop_monitoring_report.html
Open in browser to view interactive dashboard.


## Summary

This monitoring setup provides:
- **I/O Distribution Decay**: DataDriftPreset detects input feature distribution shifts
- **Model Outcome Drift**: RegressionPreset tracks MAE, RMSE, R², error distribution
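- **Anomalies**: Evidently marks each column as drifted or not and reports the share of drifted columns; performance degradation shows up as a gap between reference and current metrics

For production: run the reports on a daily or weekly schedule, log the metrics to MLflow, and alert when the share of drifted columns exceeds 0.5 or MAE exceeds 15 recovery-score points. Treat both thresholds as starting points to tune against observed behavior.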
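
The alerting rule can be sketched as a small check that takes the two numbers as plain floats. `AlertThresholds` and `check_alerts` are hypothetical names, and the input values are made up for illustration. How the drift share and MAE are read out of a report is deliberately left out here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AlertThresholds:
    """Thresholds from the summary: drift share 0.5 and MAE 15."""
    max_drift_share: float = 0.5
    max_mae: float = 15.0

def check_alerts(drift_share, mae, thresholds=AlertThresholds()):
    """Return a list of human-readable alert messages (empty when all is well)."""
    alerts = []
    if drift_share > thresholds.max_drift_share:
        alerts.append(
            f"Drift share {drift_share:.2f} exceeds {thresholds.max_drift_share:.2f}"
        )
    if mae > thresholds.max_mae:
        alerts.append(f"MAE {mae:.2f} exceeds {thresholds.max_mae:.2f}")
    return alerts

# Hypothetical weekly values
quiet_week = check_alerts(drift_share=0.20, mae=6.0)
noisy_week = check_alerts(drift_share=0.60, mae=16.5)
print(quiet_week)
print(noisy_week)
```

Keeping the thresholds in one object makes them easy to tune without touching the check itself.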