# Garmin Health Data + RouterSense Network Analysis

**Objective:** Analyze correlations between phone usage patterns and physiological responses

**Data Sources:**
- Garmin: Heart rate, stress level, body battery (Nov 18, 2025)
- RouterSense: Network activity, app usage (Nov 18, 2025)

**Analysis Goals:**
1. Load and merge datasets
2. Explore data quality and coverage
3. Calculate correlations
4. Identify patterns and insights
5. Visualize relationships

## 1. Setup and Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import json
import warnings

warnings.filterwarnings('ignore')

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

# Set plot style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

print("✓ Libraries loaded successfully")

✓ Libraries loaded successfully


## 2. Load Garmin Health Data

In [3]:
# Load minute-level health data
garmin_minute = pd.read_csv('../output/garmin_parsed/garmin_minute_health.csv')
garmin_minute['datetime'] = pd.to_datetime(garmin_minute['datetime'])

print(f"📊 Garmin Minute-Level Data Loaded")
print(f"   Records: {len(garmin_minute):,}")
print(f"   Time range: {garmin_minute['datetime'].min()} to {garmin_minute['datetime'].max()}")
print(f"\n   Data coverage:")
print(f"   - Heart Rate: {(garmin_minute['heart_rate'].notna().sum() / len(garmin_minute) * 100):.1f}%")
print(f"   - Stress Level: {(garmin_minute['stress_level'].notna().sum() / len(garmin_minute) * 100):.1f}%")
print(f"   - Body Battery: {(garmin_minute['body_battery'].notna().sum() / len(garmin_minute) * 100):.1f}%")

garmin_minute.head(10)

📊 Garmin Minute-Level Data Loaded
   Records: 3,179
   Time range: 2025-11-18 05:00:00+00:00 to 2025-11-19 05:00:00+00:00

   Data coverage:
   - Heart Rate: 36.8%
   - Stress Level: 44.3%
   - Body Battery: 45.2%


Unnamed: 0,datetime,heart_rate,stress_level,body_battery
0,2025-11-18 05:00:00+00:00,,,
1,2025-11-18 05:00:00+00:00,,,
2,2025-11-18 05:01:00+00:00,,,
3,2025-11-18 05:01:00+00:00,,34.0,18.0
4,2025-11-18 05:02:00+00:00,89.0,,
5,2025-11-18 05:02:00+00:00,,61.0,18.0
6,2025-11-18 05:03:00+00:00,99.0,,
7,2025-11-18 05:03:00+00:00,,,
8,2025-11-18 05:03:00+00:00,,65534.0,18.0
9,2025-11-18 05:04:00+00:00,93.0,,
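The preview above shows two things that would distort later aggregates: several rows share one timestamp, each carrying a different metric, and `stress_level` contains 65534, far outside Garmin's 0 to 100 stress scale. Below is a minimal sketch of a cleaning step. It assumes (the source does not confirm this) that 65534 is a "no reading" placeholder and that rows sharing a timestamp can be collapsed by taking the first non-null value per column. The helper `clean_minute_health` is hypothetical, and the sample frame simply mirrors a few preview rows.

```python
import numpy as np
import pandas as pd

# Assumption: valid Garmin stress scores lie in 0-100; anything else
# (such as 65534) is treated as a missing reading.
STRESS_MIN, STRESS_MAX = 0, 100


def clean_minute_health(df: pd.DataFrame) -> pd.DataFrame:
    """Mask out-of-range stress values, then keep one row per timestamp."""
    out = df.copy()
    # Values outside the valid range become NaN.
    out["stress_level"] = out["stress_level"].where(
        out["stress_level"].between(STRESS_MIN, STRESS_MAX)
    )
    # GroupBy.first() returns the first non-null value in each column,
    # which merges rows that carry different metrics for the same minute.
    return out.groupby("datetime", as_index=False).first()


# Sample rows mirroring the preview above.
sample = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2025-11-18 05:02:00+00:00", "2025-11-18 05:02:00+00:00",
        "2025-11-18 05:03:00+00:00", "2025-11-18 05:03:00+00:00",
    ]),
    "heart_rate": [89.0, np.nan, 99.0, np.nan],
    "stress_level": [np.nan, 61.0, np.nan, 65534.0],
    "body_battery": [np.nan, 18.0, np.nan, 18.0],
})
cleaned = clean_minute_health(sample)
print(cleaned)
```

Running a step like this before the hourly aggregation would also make the coverage percentages refer to minutes rather than raw rows.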


In [4]:
# Load hourly activity data
garmin_activity = pd.read_csv('../output/garmin_parsed/garmin_hourly_activity.csv')
garmin_activity['datetime'] = pd.to_datetime(garmin_activity['datetime'])

print(f"📊 Garmin Hourly Activity Data Loaded")
print(f"   Records: {len(garmin_activity):,}")
print(f"   Time range: {garmin_activity['datetime'].min()} to {garmin_activity['datetime'].max()}")

garmin_activity.head()

📊 Garmin Hourly Activity Data Loaded
   Records: 25
   Time range: 2025-11-18 05:00:00+00:00 to 2025-11-19 05:00:00+00:00


Unnamed: 0,datetime,steps,calories,active_minutes,activity_intensity_avg
0,2025-11-18 05:00:00+00:00,2.01787e-43,0,0,58
1,2025-11-18 06:00:00+00:00,1.601684e-42,20,0,93
2,2025-11-18 07:00:00+00:00,3.0464229999999997e-42,57,0,84
3,2025-11-18 08:00:00+00:00,5.780355999999999e-42,121,0,83
4,2025-11-18 09:00:00+00:00,0.0,0,0,51
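The `steps` values on the order of 1e-43 are not plausible step counts. One possible explanation, which is an assumption and not something the source confirms, is that the parser read unsigned 32-bit integers as 32-bit floats: a float32 whose bit pattern is a small integer n has the value n · 2^-149. Under that assumption, the sketch below maps the values back to integers. The helper name `recover_uint_from_subnormal` is hypothetical.

```python
import numpy as np

# Smallest positive float32 (subnormal). A float32 whose bit pattern is an
# integer n below 2**23 has the value n * 2**-149.
FLOAT32_SUBNORMAL_STEP = 2.0 ** -149


def recover_uint_from_subnormal(values):
    """Map subnormal float32 values back to the integers they encode.

    Only meaningful under the assumption that the source integers were
    misread as float32; values of 2**-126 or larger are not handled here.
    """
    arr = np.asarray(values, dtype=np.float64)
    return np.rint(arr / FLOAT32_SUBNORMAL_STEP).astype(np.int64)


# The five `steps` values from the preview above.
raw_steps = [2.01787e-43, 1.601684e-42, 3.0464229999999997e-42,
             5.780355999999999e-42, 0.0]
recovered = recover_uint_from_subnormal(raw_steps)
print(recovered.tolist())  # prints [144, 1143, 2174, 4125, 0]
```

Whether the recovered numbers are the true hourly step counts would need checking against the Garmin app or the original export before they are used in any analysis.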


## 3. Aggregate Garmin to Hourly

In [5]:
# Aggregate minute health data to hourly
garmin_hourly_health = garmin_minute.groupby(pd.Grouper(key='datetime', freq='H')).agg({
    'heart_rate': ['mean', 'min', 'max', 'std'],
    'stress_level': ['mean', 'min', 'max'],
    'body_battery': ['mean', 'min', 'max']
}).reset_index()

# Flatten column names
garmin_hourly_health.columns = ['datetime', 
                                  'heart_rate_avg', 'heart_rate_min', 'heart_rate_max', 'heart_rate_std',
                                  'stress_avg', 'stress_min', 'stress_max',
                                  'body_battery_avg', 'body_battery_min', 'body_battery_max']

print(f"✓ Aggregated to hourly: {len(garmin_hourly_health)} hours")
garmin_hourly_health.head()

✓ Aggregated to hourly: 25 hours


Unnamed: 0,datetime,heart_rate_avg,heart_rate_min,heart_rate_max,heart_rate_std,stress_avg,stress_min,stress_max,body_battery_avg,body_battery_min,body_battery_max
0,2025-11-18 05:00:00+00:00,81.644444,73.0,99.0,5.452921,1207.892857,15.0,65534.0,17.457627,16.0,18.0
1,2025-11-18 06:00:00+00:00,82.981818,76.0,100.0,5.09717,8770.983333,21.0,65534.0,15.4,14.0,16.0
2,2025-11-18 07:00:00+00:00,77.36,70.0,96.0,7.261191,10940.566667,13.0,65534.0,13.516667,13.0,15.0
3,2025-11-18 08:00:00+00:00,71.08,62.0,89.0,7.000991,7660.2,3.0,65534.0,15.266667,15.0,17.0
4,2025-11-18 09:00:00+00:00,71.114286,65.0,81.0,4.357356,12.883333,6.0,40.0,21.3,17.0,26.0
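Because heart rate, stress, and body battery are each present in well under half of the minute rows, the hourly means above can rest on very different numbers of readings. A minimal sketch, using made-up readings and not the real dataset, of carrying a per-hour sample count alongside the mean so thinly covered hours can be spotted:

```python
import numpy as np
import pandas as pd

# Made-up minute readings spanning two hours (not from the dataset).
minute_demo = pd.DataFrame({
    "datetime": pd.date_range("2025-11-18 05:00", periods=120,
                              freq="min", tz="UTC"),
    "heart_rate": [80.0] * 45 + [np.nan] * 65 + [75.0] * 10,
})

hourly_demo = (
    minute_demo
    .groupby(pd.Grouper(key="datetime", freq="60min"))
    .agg(
        heart_rate_avg=("heart_rate", "mean"),
        heart_rate_n=("heart_rate", "count"),  # count() ignores NaN
    )
    .reset_index()
)
print(hourly_demo)
```

Hours with a low `heart_rate_n` could then be flagged or excluded before correlations are computed.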


In [None]:
# Merge health and activity data
garmin_hourly = garmin_hourly_health.merge(garmin_activity, on='datetime', how='outer')

print(f"✓ Combined Garmin data: {len(garmin_hourly)} hourly records")
print(f"\nColumns: {list(garmin_hourly.columns)}")

garmin_hourly.head()

## 4. Load RouterSense Data

In [None]:
# Load RouterSense data
phone_data = pd.read_csv('../data/phone_overall_activities.csv')

print(f"📊 RouterSense Data Loaded")
print(f"   Total records: {len(phone_data):,}")
print(f"\nColumns: {list(phone_data.columns)}")

phone_data.head()

In [None]:
# Parse datetime and metadata
phone_data['Time'] = pd.to_datetime(phone_data['Time'])

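# Parse metadata JSON; missing or malformed entries become an empty dict
def parse_metadata(metadata_str):
    try:
        return json.loads(metadata_str) if pd.notna(metadata_str) else {}
    except (ValueError, TypeError):  # json.JSONDecodeError subclasses ValueError
        return {}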

phone_data['metadata_dict'] = phone_data['Metadata'].apply(parse_metadata)
phone_data['domain'] = phone_data['metadata_dict'].apply(lambda x: x.get('domain', 'unknown'))
phone_data['app'] = phone_data['metadata_dict'].apply(lambda x: x.get('app', 'unknown'))

print(f"✓ Parsed metadata")
print(f"   Unique domains: {phone_data['domain'].nunique()}")
print(f"   Unique apps: {phone_data['app'].nunique()}")
print(f"   Time range: {phone_data['Time'].min()} to {phone_data['Time'].max()}")

## 5. Filter RouterSense for Nov 18, 2025

In [None]:
# Filter for November 18, 2025 only
target_date = pd.to_datetime('2025-11-18')
phone_nov18 = phone_data[
    (phone_data['Time'].dt.date == target_date.date())
].copy()

print(f"📅 Filtered RouterSense for November 18, 2025")
print(f"   Records: {len(phone_nov18):,} (from {len(phone_data):,} total)")
print(f"   Time range: {phone_nov18['Time'].min()} to {phone_nov18['Time'].max()}")
print(f"   Total data: {phone_nov18['Total (MB)'].sum():.2f} MB")

phone_nov18.head()
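The Garmin timestamps are timezone-aware UTC and run from 05:00 to 05:00 the next day, which suggests they cover one local calendar day at UTC-5. If the RouterSense `Time` column parses as timezone-naive local time (an assumption; the source does not show its format), a plain date filter and a later merge on `datetime` can misalign or fail. The sketch below shows one way to align the two. The zone `America/New_York` and the helper names are hypothetical.

```python
import pandas as pd

# Assumption: RouterSense timestamps are naive local times in this zone.
LOCAL_TZ = "America/New_York"  # hypothetical; replace with the real zone


def to_utc(naive_local: pd.Series, tz: str = LOCAL_TZ) -> pd.Series:
    """Attach a local timezone to naive timestamps and convert to UTC."""
    return naive_local.dt.tz_localize(tz).dt.tz_convert("UTC")


def filter_local_day(df: pd.DataFrame, col: str, day: str,
                     tz: str = LOCAL_TZ) -> pd.DataFrame:
    """Keep rows whose UTC timestamp falls within one local calendar day."""
    start = pd.Timestamp(day, tz=tz).tz_convert("UTC")
    end = start + pd.Timedelta(days=1)
    return df[(df[col] >= start) & (df[col] < end)].copy()


# Small illustration with made-up timestamps (not from the dataset).
demo = pd.DataFrame({"Time": pd.to_datetime([
    "2025-11-17 23:30:00", "2025-11-18 00:15:00", "2025-11-18 23:45:00",
])})
demo["Time"] = to_utc(demo["Time"])
demo_nov18 = filter_local_day(demo, "Time", "2025-11-18")
print(demo_nov18)
```

With both sources in UTC, the hourly bins on each side line up and the merge key has a single dtype.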

## 6. Aggregate RouterSense to Hourly

In [None]:
# Aggregate to hourly
phone_hourly = phone_nov18.groupby(pd.Grouper(key='Time', freq='H')).agg({
    'Total (MB)': 'sum',
    'Sent (MB)': 'sum',
    'Received (MB)': 'sum',
    'domain': 'count',  # Number of connections
    'app': lambda x: x.nunique()  # Unique apps per hour
}).reset_index()

# Rename columns
phone_hourly.columns = ['datetime', 'total_mb', 'sent_mb', 'received_mb', 'connection_count', 'unique_apps']

print(f"✓ Aggregated RouterSense to hourly: {len(phone_hourly)} hours")
print(f"   Total data: {phone_hourly['total_mb'].sum():.2f} MB")
print(f"   Total connections: {phone_hourly['connection_count'].sum():,}")

phone_hourly.head()

## 7. Merge Garmin + RouterSense Data

In [None]:
# Merge datasets on datetime
combined = phone_hourly.merge(garmin_hourly, on='datetime', how='inner')

# Add time features
combined['hour'] = combined['datetime'].dt.hour
combined['day_of_week'] = combined['datetime'].dt.dayofweek
combined['is_weekend'] = combined['day_of_week'].isin([5, 6])

print(f"🔗 Merged Dataset Created")
print(f"   Records: {len(combined)} hours")
print(f"   Date: {combined['datetime'].dt.date.unique()}")
print(f"\n📊 Available Metrics:")
print(f"   Network: total_mb, sent_mb, received_mb, connection_count, unique_apps")
print(f"   Health: heart_rate_avg, stress_avg, body_battery_avg")
print(f"   Activity: steps, calories, active_minutes, activity_intensity_avg")

combined.head(10)
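An inner merge silently drops any hour that appears in only one dataset. A minimal sketch, on made-up frames and not the real data, of auditing this with an outer merge and pandas' `indicator=True` option, which adds a `_merge` column naming the side each row came from:

```python
import pandas as pd

# Made-up hourly frames for illustration only.
hours = pd.date_range("2025-11-18 05:00", periods=4, freq="60min", tz="UTC")
phone_demo = pd.DataFrame({"datetime": hours[:3],
                           "total_mb": [1.2, 0.4, 3.1]})
garmin_demo = pd.DataFrame({"datetime": hours[1:],
                            "heart_rate_avg": [82.9, 77.4, 71.1]})

# Outer merge keeps every hour; `_merge` is 'both', 'left_only' or 'right_only'.
audit = phone_demo.merge(garmin_demo, on="datetime",
                         how="outer", indicator=True)
unmatched = audit.loc[audit["_merge"] != "both", ["datetime", "_merge"]]
print(unmatched)
```

Any rows listed in `unmatched` are hours the inner merge would discard.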

In [None]:
# Data quality check
print("📊 Data Quality Summary\n")
print(combined.describe())

In [None]:
# Check for missing values
print("⚠️  Missing Values:\n")
missing = combined.isnull().sum()
missing_pct = (missing / len(combined) * 100).round(1)
missing_df = pd.DataFrame({'Count': missing, 'Percentage': missing_pct})
print(missing_df[missing_df['Count'] > 0])

## 8. Exploratory Data Analysis

In [None]:
# Hourly patterns
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Network usage by hour
axes[0, 0].plot(combined['hour'], combined['total_mb'], marker='o', linewidth=2)
axes[0, 0].set_title('Network Usage by Hour', fontsize=14, fontweight='bold')
axes[0, 0].set_xlabel('Hour of Day')
axes[0, 0].set_ylabel('Total MB')
axes[0, 0].grid(True, alpha=0.3)

# Heart rate by hour
axes[0, 1].plot(combined['hour'], combined['heart_rate_avg'], marker='o', color='red', linewidth=2)
axes[0, 1].set_title('Heart Rate by Hour', fontsize=14, fontweight='bold')
axes[0, 1].set_xlabel('Hour of Day')
axes[0, 1].set_ylabel('Heart Rate (BPM)')
axes[0, 1].grid(True, alpha=0.3)

# Stress by hour
axes[1, 0].plot(combined['hour'], combined['stress_avg'], marker='o', color='orange', linewidth=2)
axes[1, 0].set_title('Stress Level by Hour', fontsize=14, fontweight='bold')
axes[1, 0].set_xlabel('Hour of Day')
axes[1, 0].set_ylabel('Stress Level')
axes[1, 0].grid(True, alpha=0.3)

# Body battery by hour
axes[1, 1].plot(combined['hour'], combined['body_battery_avg'], marker='o', color='green', linewidth=2)
axes[1, 1].set_title('Body Battery by Hour', fontsize=14, fontweight='bold')
axes[1, 1].set_xlabel('Hour of Day')
axes[1, 1].set_ylabel('Body Battery')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("✓ Hourly patterns visualized")

## 9. Correlation Analysis

In [None]:
# Select metrics for correlation
correlation_cols = [
    'total_mb', 'connection_count', 'unique_apps',
    'heart_rate_avg', 'stress_avg', 'body_battery_avg',
    'steps', 'calories', 'active_minutes'
]

# Calculate correlation matrix
corr_matrix = combined[correlation_cols].corr()

# Visualize correlation heatmap
plt.figure(figsize=(12, 10))
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm', center=0,
            square=True, linewidths=1, cbar_kws={"shrink": 0.8})
plt.title('Correlation Matrix: Network Usage vs Health Metrics', fontsize=16, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()

print("✓ Correlation matrix generated")

In [None]:
# Key correlations with network usage
print("🔍 Key Correlations with Network Usage (Total MB):\n")
network_corr = corr_matrix['total_mb'].sort_values(ascending=False)
print(network_corr)
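With a single day of data, each coefficient rests on at most 24 hourly points, and fewer where a metric is missing. A minimal sketch of reporting the number of complete pairs next to each Pearson r; the helper `corr_with_counts` is hypothetical and the numbers are made up:

```python
import numpy as np
import pandas as pd


def corr_with_counts(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Pearson r of each other column with `target`, plus complete-pair counts."""
    rows = []
    for col in df.columns.drop(target):
        pair = df[[target, col]].dropna()  # keep rows where both are present
        r = pair[target].corr(pair[col]) if len(pair) > 2 else np.nan
        rows.append({"metric": col, "r": r, "n": len(pair)})
    return (pd.DataFrame(rows)
            .set_index("metric")
            .sort_values("r", ascending=False))


# Made-up numbers for illustration only.
demo = pd.DataFrame({
    "total_mb": [1.0, 2.0, 3.0, 4.0, 5.0],
    "heart_rate_avg": [70.0, 72.0, np.nan, 76.0, 78.0],
    "stress_avg": [30.0, 25.0, 20.0, 15.0, 10.0],
})
summary = corr_with_counts(demo, "total_mb")
print(summary)
```

A large |r| backed by only a handful of pairs deserves more caution than the heatmap alone conveys.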

## 10. Scatter Plots: Network vs Health

In [None]:
# Create scatter plots
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Network vs Heart Rate
axes[0, 0].scatter(combined['total_mb'], combined['heart_rate_avg'], alpha=0.6, s=100)
axes[0, 0].set_title('Network Usage vs Heart Rate', fontsize=14, fontweight='bold')
axes[0, 0].set_xlabel('Total MB')
axes[0, 0].set_ylabel('Heart Rate (BPM)')
axes[0, 0].grid(True, alpha=0.3)

# Add trend line (drop rows where either value is missing so x and y stay paired)
trend = combined[['total_mb', 'heart_rate_avg']].dropna()
z = np.polyfit(trend['total_mb'], trend['heart_rate_avg'], 1)
p = np.poly1d(z)
axes[0, 0].plot(trend['total_mb'], p(trend['total_mb']), "r--", alpha=0.8, linewidth=2)

# Network vs Stress
axes[0, 1].scatter(combined['total_mb'], combined['stress_avg'], alpha=0.6, s=100, color='orange')
axes[0, 1].set_title('Network Usage vs Stress Level', fontsize=14, fontweight='bold')
axes[0, 1].set_xlabel('Total MB')
axes[0, 1].set_ylabel('Stress Level')
axes[0, 1].grid(True, alpha=0.3)

# Network vs Body Battery
axes[1, 0].scatter(combined['total_mb'], combined['body_battery_avg'], alpha=0.6, s=100, color='green')
axes[1, 0].set_title('Network Usage vs Body Battery', fontsize=14, fontweight='bold')
axes[1, 0].set_xlabel('Total MB')
axes[1, 0].set_ylabel('Body Battery')
axes[1, 0].grid(True, alpha=0.3)

# Connections vs Stress
axes[1, 1].scatter(combined['connection_count'], combined['stress_avg'], alpha=0.6, s=100, color='purple')
axes[1, 1].set_title('Connection Count vs Stress Level', fontsize=14, fontweight='bold')
axes[1, 1].set_xlabel('Connection Count')
axes[1, 1].set_ylabel('Stress Level')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("✓ Scatter plots generated")

## 11. Time Series Comparison

In [None]:
# Normalize data for comparison
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
normalized = combined[['total_mb', 'heart_rate_avg', 'stress_avg', 'body_battery_avg']].copy()
normalized_scaled = pd.DataFrame(
    scaler.fit_transform(normalized),
    columns=normalized.columns,
    index=combined.index
)

# Plot normalized time series
plt.figure(figsize=(15, 8))
plt.plot(combined['datetime'], normalized_scaled['total_mb'], label='Network Usage', linewidth=2, marker='o')
plt.plot(combined['datetime'], normalized_scaled['heart_rate_avg'], label='Heart Rate', linewidth=2, marker='s')
plt.plot(combined['datetime'], normalized_scaled['stress_avg'], label='Stress Level', linewidth=2, marker='^')
plt.plot(combined['datetime'], normalized_scaled['body_battery_avg'], label='Body Battery', linewidth=2, marker='d')

plt.title('Normalized Time Series: Network Usage vs Health Metrics', fontsize=16, fontweight='bold')
plt.xlabel('Time', fontsize=12)
plt.ylabel('Normalized Value (0-1)', fontsize=12)
plt.legend(fontsize=11, loc='best')
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print("✓ Time series comparison generated")

## 12. Summary Statistics

In [None]:
print("📊 SUMMARY STATISTICS\n")
print("=" * 70)

print("\n🌐 Network Usage:")
print(f"   Total data: {combined['total_mb'].sum():.2f} MB")
print(f"   Average per hour: {combined['total_mb'].mean():.2f} MB")
print(f"   Peak hour: {combined.loc[combined['total_mb'].idxmax(), 'datetime']}")
print(f"   Peak usage: {combined['total_mb'].max():.2f} MB")
print(f"   Total connections: {combined['connection_count'].sum():,}")

print("\n❤️  Heart Rate:")
print(f"   Average: {combined['heart_rate_avg'].mean():.1f} BPM")
print(f"   Min: {combined['heart_rate_min'].min():.1f} BPM")
print(f"   Max: {combined['heart_rate_max'].max():.1f} BPM")

print("\n😰 Stress Level:")
print(f"   Average: {combined['stress_avg'].mean():.1f}")
print(f"   Min: {combined['stress_min'].min():.1f}")
print(f"   Max: {combined['stress_max'].max():.1f}")

print("\n🔋 Body Battery:")
print(f"   Average: {combined['body_battery_avg'].mean():.1f}")
print(f"   Min: {combined['body_battery_min'].min():.1f}")
print(f"   Max: {combined['body_battery_max'].max():.1f}")

print("\n👟 Activity:")
print(f"   Total steps: {combined['steps'].sum():,.0f}")
print(f"   Total calories: {combined['calories'].sum():,.0f}")
print(f"   Total active minutes: {combined['active_minutes'].sum():.0f}")

print("\n" + "=" * 70)

## 13. Save Combined Dataset

In [None]:
# Save combined dataset
output_path = '../output/analysis_results/combined_garmin_routersense_nov18.csv'
combined.to_csv(output_path, index=False)

print(f"✓ Combined dataset saved to: {output_path}")
print(f"   Records: {len(combined)}")
print(f"   Columns: {len(combined.columns)}")

## 14. Key Insights

**Analysis Summary:**

This notebook successfully:
1. ✅ Loaded Garmin health data (heart rate, stress, body battery)
2. ✅ Loaded RouterSense network data
3. ✅ Filtered for November 18, 2025
4. ✅ Aggregated both datasets to hourly
5. ✅ Merged datasets on datetime
6. ✅ Calculated correlations
7. ✅ Generated visualizations

**Next Steps:**
- Add more days of Garmin data for better statistical power
- Analyze specific apps/domains vs health metrics
- Investigate time-lagged correlations
- Add sleep data analysis
- Identify high-stress triggers
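
The time-lagged correlation idea from the list above could be sketched as follows: shift the network series by k hours and correlate it with a health metric at each lag. The helper `lagged_corr` is hypothetical, and the series are made up so that the expected lag is known in advance.

```python
import pandas as pd


def lagged_corr(x: pd.Series, y: pd.Series, max_lag: int = 3) -> pd.Series:
    """Pearson r between x shifted forward by k steps and y, for k = 0..max_lag.

    A lag of k pairs x at hour t with y at hour t + k.
    """
    return pd.Series(
        {k: x.shift(k).corr(y) for k in range(max_lag + 1)}, name="r"
    )


# Toy series: y repeats x two hours later (made-up values).
x = pd.Series([1.0, 5.0, 2.0, 8.0, 3.0, 9.0, 4.0, 7.0])
y = x.shift(2)
result = lagged_corr(x, y, max_lag=3)
print(result)
```

On real hourly data, a peak at a non-zero lag would hint that a physiological response trails network activity, though a single day gives too few points to read much into it.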