# 🏥 Health Tracker Data Analysis Notebook

**Comprehensive Health Data Analysis and Visualization**

This notebook provides detailed analysis of health tracking data including:
- Daily health metrics trends
- Wearable device data visualization
- Progress tracking over time
- Health insights and recommendations

**Tech Stack:** Jupyter Notebook, SQLite, Pandas, NumPy, SciPy, scikit-learn, Matplotlib, Seaborn

---

## 📦 Import Required Libraries

In [None]:
# Core data science libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
from IPython.display import display  # implicit in Jupyter; explicit so the cells also run as a script
import warnings
warnings.filterwarnings('ignore')

# Statistical analysis
from scipy import stats
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Database connection
import sqlite3
import json

# Configure matplotlib for better plots
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 12

print("✅ Libraries imported successfully!")
print(f"📊 Pandas version: {pd.__version__}")
print(f"🔢 NumPy version: {np.__version__}")
print(f"📈 Matplotlib version: {plt.matplotlib.__version__}")

## 🗄️ Data Loading and Initial Analysis

In [None]:
def load_health_data(db_path='health_tracker.db', user_id=1, days=90):
    """
    Load health data for one user from the SQLite database.
    Returns two DataFrames: manual entries (health_metric) and wearable readings (wearable_data).
    """
    # Calculate date range; SQLite compares dates as ISO 'YYYY-MM-DD' strings
    end_date = datetime.now().date()
    start_date = (end_date - timedelta(days=days)).isoformat()
    
    # Query manual health metrics
    manual_query = """
    SELECT 
        DATE(recorded_at) as date,
        metric_type,
        value,
        value_secondary,
        unit,
        notes
    FROM health_metric 
    WHERE user_id = ? AND DATE(recorded_at) >= ?
    ORDER BY recorded_at
    """
    
    # Query wearable data
    wearable_query = """
    SELECT 
        date_recorded as date,
        device_type,
        data_type,
        value,
        unit
    FROM wearable_data 
    WHERE user_id = ? AND date_recorded >= ?
    ORDER BY date_recorded
    """
    
    # Open read-only: a missing file raises sqlite3.OperationalError instead of
    # silently creating an empty database next to the notebook
    conn = sqlite3.connect(f'file:{db_path}?mode=ro', uri=True)
    try:
        manual_df = pd.read_sql_query(manual_query, conn, params=[user_id, start_date])
        wearable_df = pd.read_sql_query(wearable_query, conn, params=[user_id, start_date])
    finally:
        conn.close()  # release the connection even if a query fails (e.g. missing table)
    
    return manual_df, wearable_df

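def transform_health_data(manual_df, wearable_df):
    """
    Transform long-format health data into one row per day with one column per metric.
    Raises ValueError if neither source has any rows.
    """
    # Pivot each non-empty source: rows = dates, columns = metric names, same-day values averaged
    manual_pivot = pd.DataFrame()
    if not manual_df.empty:
        manual_pivot = manual_df.pivot_table(
            index='date', 
            columns='metric_type', 
            values='value', 
            aggfunc='mean'
        ).reset_index()
    
    wearable_pivot = pd.DataFrame()
    if not wearable_df.empty:
        wearable_pivot = wearable_df.pivot_table(
            index='date', 
            columns='data_type', 
            values='value', 
            aggfunc='mean'
        ).reset_index()
    
    # Merge datasets; the outer join keeps days present in only one source.
    # A metric name present in both sources would get _x/_y suffixes.
    if not manual_pivot.empty and not wearable_pivot.empty:
        health_data = pd.merge(manual_pivot, wearable_pivot, on='date', how='outer')
    elif not manual_pivot.empty:
        health_data = manual_pivot
    elif not wearable_pivot.empty:
        health_data = wearable_pivot
    else:
        # Empty tables should fall back to sample data rather than produce an empty frame
        raise ValueError("No health records found for this user and date range")
    
    # Convert date column
    health_data['date'] = pd.to_datetime(health_data['date'])
    health_data = health_data.sort_values('date').reset_index(drop=True)
    
    return health_data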

# Load and transform data
try:
    manual_df, wearable_df = load_health_data()
    health_data = transform_health_data(manual_df, wearable_df)
    
    print(f"✅ Data loaded successfully!")
    print(f"📅 Date range: {health_data['date'].min()} to {health_data['date'].max()}")
    print(f"📊 Total records: {len(health_data)}")
    print(f"📈 Available metrics: {list(health_data.columns[1:])}")
    
    # Display first few rows
    display(health_data.head())
    
except Exception as e:
    print(f"⚠️ Could not load data from the database ({e}). Creating sample data for demonstration...")
    
    # Create sample data for demonstration: 91 consecutive days ending today, at midnight
    dates = pd.date_range(end=pd.Timestamp.today().normalize(), periods=91, freq='D')
    
    np.random.seed(42)  # For reproducible results
    
    health_data = pd.DataFrame({
        'date': dates,
        'steps': np.random.normal(8500, 2500, len(dates)).astype(int),
        'heart_rate': np.random.normal(75, 12, len(dates)).astype(int),
        'sleep_hours': np.round(np.random.normal(7.5, 1.2, len(dates)), 1),
        'weight': np.round(np.random.normal(70, 2, len(dates)) + np.linspace(0, -1, len(dates)), 1),
        'calories_burned': np.random.normal(2200, 400, len(dates)).astype(int),
        'active_minutes': np.random.normal(45, 20, len(dates)).astype(int)
    })
    
    # Ensure realistic bounds
    health_data['steps'] = health_data['steps'].clip(1000, 20000)
    health_data['heart_rate'] = health_data['heart_rate'].clip(50, 120)
    health_data['sleep_hours'] = health_data['sleep_hours'].clip(4.0, 12.0)
    health_data['weight'] = health_data['weight'].clip(50, 100)
    health_data['calories_burned'] = health_data['calories_burned'].clip(1500, 3500)
    health_data['active_minutes'] = health_data['active_minutes'].clip(0, 180)
    
    print(f"📊 Sample data created with {len(health_data)} records")
    display(health_data.head())

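The loading and reshaping steps above can be reproduced end to end against a throwaway in-memory database. This is a minimal sketch: the table is a hypothetical cut-down `health_metric` with only the columns the query needs, and the rows are made up for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical cut-down schema; the real health_tracker.db may have more columns.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE health_metric (user_id INTEGER, recorded_at TEXT, metric_type TEXT, value REAL)'
)
conn.executemany(
    'INSERT INTO health_metric VALUES (?, ?, ?, ?)',
    [
        (1, '2024-03-01 07:30:00', 'weight', 70.4),
        (1, '2024-03-01 21:00:00', 'weight', 70.0),   # second reading on the same day
        (1, '2024-03-02 23:00:00', 'sleep_hours', 7.5),
        (1, '2024-02-01 07:30:00', 'weight', 72.0),   # before the cutoff date
        (2, '2024-03-01 07:30:00', 'weight', 80.0),   # a different user
    ],
)

# The cutoff is an ISO 'YYYY-MM-DD' string, which compares correctly with DATE(...)
query = """
    SELECT DATE(recorded_at) AS date, metric_type, value
    FROM health_metric
    WHERE user_id = ? AND DATE(recorded_at) >= ?
    ORDER BY recorded_at
"""
long_df = pd.read_sql_query(query, conn, params=[1, '2024-02-15'])
conn.close()

# Long -> wide: one row per day, one column per metric; same-day readings are averaged
wide_df = long_df.pivot_table(
    index='date', columns='metric_type', values='value', aggfunc='mean'
).reset_index()
print(wide_df)
```

Days on which a metric was not recorded come out as `NaN` in the wide table, which is why later cells guard with `notna()` and `dropna()`.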
## 📊 Exploratory Data Analysis (EDA)

In [None]:
# Data overview and statistics
print("📋 HEALTH DATA SUMMARY")
print("=" * 50)

# Basic info
print(f"Total days tracked: {len(health_data)}")
print(f"Date range: {health_data['date'].min().strftime('%Y-%m-%d')} to {health_data['date'].max().strftime('%Y-%m-%d')}")
print("\n📈 DESCRIPTIVE STATISTICS")
print("=" * 30)

# Select numeric columns for analysis
numeric_cols = health_data.select_dtypes(include=[np.number]).columns
display(health_data[numeric_cols].describe().round(2))

# Missing data analysis
print("\n❌ MISSING DATA ANALYSIS")
print("=" * 30)
missing_data = health_data.isnull().sum()
missing_percent = (missing_data / len(health_data)) * 100
missing_df = pd.DataFrame({
    'Missing Count': missing_data,
    'Missing %': missing_percent.round(2)
})
missing_only = missing_df[missing_df['Missing Count'] > 0]
if missing_only.empty:
    print("No missing values")
else:
    display(missing_only)

# Data quality checks
print("\n🔍 DATA QUALITY CHECKS")
print("=" * 30)

quality_checks = []
if 'steps' in health_data.columns:
    avg_steps = health_data['steps'].mean()
    quality_checks.append(f"Average daily steps: {avg_steps:.0f}")
    quality_checks.append(f"Days with 10k+ steps: {(health_data['steps'] >= 10000).sum()}")

if 'sleep_hours' in health_data.columns:
    avg_sleep = health_data['sleep_hours'].mean()
    quality_checks.append(f"Average sleep hours: {avg_sleep:.1f}")
    quality_checks.append(f"Nights with 7-9 hours: {((health_data['sleep_hours'] >= 7) & (health_data['sleep_hours'] <= 9)).sum()}")

if 'heart_rate' in health_data.columns:
    avg_hr = health_data['heart_rate'].mean()
    quality_checks.append(f"Average resting heart rate: {avg_hr:.0f} BPM")

for check in quality_checks:
    print(f"• {check}")

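The missing-data table and the quality checks above reduce to a few pandas idioms: `isna().mean()` gives the missing fraction per column directly, and `Series.between` expresses an inclusive range check. A small sketch on a made-up four-day frame:

```python
import numpy as np
import pandas as pd

# Made-up four-day sample with one gap in each column
frame = pd.DataFrame({
    'steps': [12000, np.nan, 8000, 10500],
    'sleep_hours': [7.0, 6.5, np.nan, 9.0],
})

# Percent missing per column: isna().mean() equals isna().sum() / len(frame)
missing_pct = frame.isna().mean().mul(100).round(2)

# Inclusive 7-9 h check; NaN compares as False, so missing nights are not counted
good_nights = int(frame['sleep_hours'].between(7, 9).sum())

# 10k-step days, reported against the days that actually have a reading
tracked_days = int(frame['steps'].notna().sum())
days_10k = int((frame['steps'] >= 10000).sum())

print(missing_pct.to_dict(), good_nights, f"{days_10k}/{tracked_days}")
```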
## 📈 Time Series Analysis and Trends

In [None]:
# Create comprehensive time series visualizations
fig, axes = plt.subplots(2, 2, figsize=(18, 12))
fig.suptitle('Health Metrics Time Series Analysis', fontsize=16, fontweight='bold')

# 1. Daily Steps Trend
if 'steps' in health_data.columns:
    axes[0, 0].plot(health_data['date'], health_data['steps'], marker='o', linewidth=2, markersize=4, alpha=0.7)
    axes[0, 0].axhline(y=10000, color='red', linestyle='--', alpha=0.8, label='10K Goal')
    
    # Add 7-day rolling average
    health_data['steps_7day'] = health_data['steps'].rolling(window=7, center=True).mean()
    axes[0, 0].plot(health_data['date'], health_data['steps_7day'], color='orange', linewidth=3, label='7-day Average')
    
    axes[0, 0].set_title('Daily Steps Trend', fontsize=14, fontweight='bold')
    axes[0, 0].set_ylabel('Steps')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)

# 2. Sleep Pattern Analysis
if 'sleep_hours' in health_data.columns:
    # Bar plot for sleep hours
    colors = ['red' if x < 7 else 'green' if x <= 9 else 'orange' for x in health_data['sleep_hours']]
    axes[0, 1].bar(health_data['date'], health_data['sleep_hours'], color=colors, alpha=0.6, width=1)
    axes[0, 1].axhline(y=7, color='green', linestyle='--', alpha=0.8, label='Min Recommended')
    axes[0, 1].axhline(y=9, color='green', linestyle='--', alpha=0.8, label='Max Recommended')
    axes[0, 1].set_title('Sleep Hours Pattern', fontsize=14, fontweight='bold')
    axes[0, 1].set_ylabel('Hours')
    axes[0, 1].legend()
    axes[0, 1].grid(True, alpha=0.3)

# 3. Resting Heart Rate Trend
if 'heart_rate' in health_data.columns:
    axes[1, 0].plot(health_data['date'], health_data['heart_rate'], marker='s', linewidth=2, markersize=3, color='crimson')
    
    # Shade the 60-100 BPM normal resting range
    axes[1, 0].axhspan(60, 100, alpha=0.2, color='green', label='Normal Range')
    
    # 7-day rolling average
    health_data['hr_7day'] = health_data['heart_rate'].rolling(window=7, center=True).mean()
    axes[1, 0].plot(health_data['date'], health_data['hr_7day'], color='darkred', linewidth=3, label='7-day Average')
    
    axes[1, 0].set_title('Resting Heart Rate Trend', fontsize=14, fontweight='bold')
    axes[1, 0].set_ylabel('BPM')
    axes[1, 0].legend()
    axes[1, 0].grid(True, alpha=0.3)

# 4. Weight Progress (if available) or Calories Burned
if 'weight' in health_data.columns and health_data['weight'].notna().sum() > 10:
    weight_data = health_data.dropna(subset=['weight'])
    axes[1, 1].plot(weight_data['date'], weight_data['weight'], marker='D', linewidth=2, markersize=4, color='green')
    
    # Add linear trend line; x = days since the first reading, so gaps between weigh-ins are handled correctly
    x_days = (weight_data['date'] - weight_data['date'].min()).dt.days.to_numpy()
    z = np.polyfit(x_days, weight_data['weight'], 1)
    p = np.poly1d(z)
    axes[1, 1].plot(weight_data['date'], p(x_days), "--", color='red', alpha=0.8, linewidth=2, label='Trend Line')
    
    axes[1, 1].set_title('Weight Progress', fontsize=14, fontweight='bold')
    axes[1, 1].set_ylabel('Weight (kg)')
    axes[1, 1].legend()
    axes[1, 1].grid(True, alpha=0.3)
    
elif 'calories_burned' in health_data.columns:
    axes[1, 1].plot(health_data['date'], health_data['calories_burned'], marker='v', linewidth=2, markersize=3, color='purple')
    axes[1, 1].set_title('Daily Calories Burned', fontsize=14, fontweight='bold')
    axes[1, 1].set_ylabel('Calories')
    axes[1, 1].grid(True, alpha=0.3)

# Format x-axis for all subplots
for ax in axes.flat:
    ax.tick_params(axis='x', rotation=45)
    
plt.tight_layout()
plt.show()

# Calculate and display trend analysis
print("\n📊 TREND ANALYSIS SUMMARY")
print("=" * 40)

trend_analysis = {}

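for metric in ['steps', 'heart_rate', 'sleep_hours', 'weight']:
    if metric in health_data.columns and health_data[metric].notna().sum() > 10:
        valid = health_data.dropna(subset=[metric])
        # Regress on days since the first reading (not row position) so the slope is truly "per day"
        x = (valid['date'] - valid['date'].min()).dt.days.to_numpy()
        slope, intercept, r_value, p_value, std_err = stats.linregress(x, valid[metric])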
        
        trend_direction = "Increasing" if slope > 0 else "Decreasing" if slope < 0 else "Stable"
        significance = "Significant" if p_value < 0.05 else "Not significant"
        
        trend_analysis[metric] = {
            'slope': slope,
            'direction': trend_direction,
            'r_squared': r_value**2,
            'significance': significance
        }
        
        print(f"\n{metric.upper()}:")
        print(f"  • Trend: {trend_direction} ({slope:.4f} per day)")
        print(f"  • R²: {r_value**2:.3f}")
        print(f"  • Statistical significance: {significance}")

# Weekly pattern analysis
print("\n📅 WEEKLY PATTERN ANALYSIS")
print("=" * 35)

health_data['day_of_week'] = health_data['date'].dt.day_name()
health_data['week_number'] = health_data['date'].dt.isocalendar().week

if 'steps' in health_data.columns:
    weekly_steps = health_data.groupby('day_of_week')['steps'].mean().reindex(
        ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    )
    print("\nAverage steps by day of week:")
    for day, steps in weekly_steps.items():
        print(f"  {day}: {steps:.0f} steps")

## 🎯 Goal Achievement Analysis

In [None]:
# Define health goals
HEALTH_GOALS = {
    'daily_steps': 10000,
    'sleep_hours_min': 7,
    'sleep_hours_max': 9,
    'max_resting_hr': 100,
    'min_active_minutes': 30
}

# Calculate goal achievements
goal_achievements = {}

# Steps goal
if 'steps' in health_data.columns:
    steps_goal_met = health_data['steps'] >= HEALTH_GOALS['daily_steps']
    goal_achievements['Steps (10K+)'] = {
        'achieved_days': steps_goal_met.sum(),
        'total_days': len(health_data['steps'].dropna()),
        'percentage': (steps_goal_met.sum() / len(health_data['steps'].dropna())) * 100
    }

# Sleep goal
if 'sleep_hours' in health_data.columns:
    sleep_goal_met = ((health_data['sleep_hours'] >= HEALTH_GOALS['sleep_hours_min']) & 
                      (health_data['sleep_hours'] <= HEALTH_GOALS['sleep_hours_max']))
    goal_achievements['Sleep (7-9h)'] = {
        'achieved_days': sleep_goal_met.sum(),
        'total_days': len(health_data['sleep_hours'].dropna()),
        'percentage': (sleep_goal_met.sum() / len(health_data['sleep_hours'].dropna())) * 100
    }

# Heart rate goal (staying within normal range)
if 'heart_rate' in health_data.columns:
    hr_goal_met = health_data['heart_rate'] <= HEALTH_GOALS['max_resting_hr']
    goal_achievements['Heart Rate (≤100 BPM)'] = {
        'achieved_days': hr_goal_met.sum(),
        'total_days': len(health_data['heart_rate'].dropna()),
        'percentage': (hr_goal_met.sum() / len(health_data['heart_rate'].dropna())) * 100
    }

# Create goal achievement visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Bar chart of goal achievement percentages
goals = list(goal_achievements.keys())
percentages = [goal_achievements[goal]['percentage'] for goal in goals]
colors = ['green' if p >= 80 else 'orange' if p >= 60 else 'red' for p in percentages]

bars = ax1.bar(goals, percentages, color=colors, alpha=0.7)
ax1.set_title('Goal Achievement Rates', fontsize=14, fontweight='bold')
ax1.set_ylabel('Achievement Percentage (%)')
ax1.set_ylim(0, 100)
ax1.axhline(y=80, color='green', linestyle='--', alpha=0.5, label='Excellent (80%+)')
ax1.axhline(y=60, color='orange', linestyle='--', alpha=0.5, label='Good (60%+)')

# Add percentage labels on bars
for bar, pct in zip(bars, percentages):
    ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1, 
             f'{pct:.1f}%', ha='center', va='bottom', fontweight='bold')

ax1.legend()
ax1.grid(True, alpha=0.3)

# Goal achievement over time (steps example)
if 'steps' in health_data.columns:
    health_data['steps_goal_met'] = health_data['steps'] >= HEALTH_GOALS['daily_steps']
    
    # Calculate rolling 7-day achievement rate
    health_data['goal_rate_7day'] = health_data['steps_goal_met'].rolling(window=7, center=True).mean() * 100
    
    ax2.plot(health_data['date'], health_data['goal_rate_7day'], linewidth=3, color='blue', label='7-day Achievement Rate')
    ax2.axhline(y=80, color='green', linestyle='--', alpha=0.5, label='Target (80%)')
    ax2.fill_between(health_data['date'], health_data['goal_rate_7day'], alpha=0.3, color='blue')
    
    ax2.set_title('Steps Goal Achievement Over Time', fontsize=14, fontweight='bold')
    ax2.set_ylabel('Achievement Rate (%)')
    ax2.set_ylim(0, 100)
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    ax2.tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

# Print detailed goal analysis
print("\n🎯 DETAILED GOAL ANALYSIS")
print("=" * 35)

for goal_name, achievement in goal_achievements.items():
    print(f"\n{goal_name}:")
    print(f"  ✅ Achieved: {achievement['achieved_days']}/{achievement['total_days']} days")
    print(f"  📊 Success rate: {achievement['percentage']:.1f}%")
    
    if achievement['percentage'] >= 80:
        print(f"  🏆 Status: Excellent! Keep up the great work!")
    elif achievement['percentage'] >= 60:
        print(f"  👍 Status: Good progress, room for improvement")
    else:
        print(f"  📈 Status: Needs attention and improvement")

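The three goal blocks above repeat the same achieved/total/percentage bookkeeping. One possible refactor folds it into a single helper; `goal_summary` is a hypothetical name, and this sketch also returns `NaN` explicitly when a metric has no readings at all:

```python
import numpy as np
import pandas as pd


def goal_summary(values: pd.Series, met: pd.Series) -> dict:
    """Achieved/total/percentage for one goal, counting only days that have a reading."""
    observed = values.notna()
    total = int(observed.sum())
    achieved = int((met & observed).sum())
    percentage = achieved / total * 100 if total else float('nan')
    return {'achieved_days': achieved, 'total_days': total, 'percentage': percentage}


# Made-up readings with one missing day in each series
steps = pd.Series([12000, 8000, np.nan, 10000])
sleep = pd.Series([7.5, np.nan, 6.0, 9.0])

steps_goal = goal_summary(steps, steps >= 10000)
sleep_goal = goal_summary(sleep, sleep.between(7, 9))
print(steps_goal, sleep_goal)
```

With this helper, the heart-rate entry becomes `goal_summary(hr, hr <= 100)`, so the three goals differ only in their condition.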
## 🔗 Correlation Analysis

In [None]:
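# Correlation analysis between the raw health metrics.
# Exclude helper columns created in earlier cells (rolling averages, ISO week number),
# which would otherwise show up as trivially "strong" correlations with their source metric.
numeric_cols = [
    col for col in health_data.select_dtypes(include=[np.number]).columns
    if not col.endswith('_7day') and col != 'week_number'
]
correlation_matrix = health_data[numeric_cols].corr()

# Create correlation heatmap (boolean mask hides the redundant upper triangle)
plt.figure(figsize=(12, 8))
mask = np.triu(np.ones_like(correlation_matrix, dtype=bool), k=1)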
sns.heatmap(correlation_matrix, 
            mask=mask,
            annot=True, 
            cmap='RdBu_r', 
            center=0, 
            square=True, 
            fmt='.2f',
            cbar_kws={'shrink': 0.8})
plt.title('Health Metrics Correlation Matrix', fontsize=16, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()

# Find strong correlations
print("\n🔗 NOTABLE CORRELATIONS (|r| > 0.5)")
print("=" * 35)

strong_correlations = []
for i in range(len(correlation_matrix.columns)):
    for j in range(i+1, len(correlation_matrix.columns)):
        corr_value = correlation_matrix.iloc[i, j]
        if abs(corr_value) > 0.5:  # notable-correlation threshold; 0.5-0.6 is labeled "Moderate" below
            metric1 = correlation_matrix.columns[i]
            metric2 = correlation_matrix.columns[j]
            strong_correlations.append((metric1, metric2, corr_value))

if strong_correlations:
    for metric1, metric2, corr in sorted(strong_correlations, key=lambda x: abs(x[2]), reverse=True):
        direction = "Positive" if corr > 0 else "Negative"
        strength = "Very Strong" if abs(corr) > 0.8 else "Strong" if abs(corr) > 0.6 else "Moderate"
        print(f"  • {metric1} ↔ {metric2}: {corr:.3f} ({direction} {strength})")
else:
    print("  No strong correlations found (|r| > 0.5)")

# Scatter plots for interesting correlations
if len(strong_correlations) > 0:
    fig, axes = plt.subplots(1, min(3, len(strong_correlations)), figsize=(15, 5))
    if len(strong_correlations) == 1:
        axes = [axes]
    
    for idx, (metric1, metric2, corr) in enumerate(strong_correlations[:3]):
        valid_data = health_data.dropna(subset=[metric1, metric2])
        
        if len(axes) > idx:
            axes[idx].scatter(valid_data[metric1], valid_data[metric2], alpha=0.6, s=50)
            axes[idx].set_xlabel(metric1.replace('_', ' ').title())
            axes[idx].set_ylabel(metric2.replace('_', ' ').title())
            axes[idx].set_title(f'{metric1} vs {metric2}\nr = {corr:.3f}', fontweight='bold')
            
            # Add least-squares trend line (x sorted so the dashed line is drawn left to right)
            z = np.polyfit(valid_data[metric1], valid_data[metric2], 1)
            p = np.poly1d(z)
            x_line = np.sort(valid_data[metric1].to_numpy())
            axes[idx].plot(x_line, p(x_line), "--", color='red', alpha=0.8)
            axes[idx].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

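The pair search above can also be written with `itertools.combinations`, which visits each unordered pair once (the upper triangle) and makes the exclusion of derived columns explicit. A sketch under the assumption that helper columns follow this notebook's `_7day` / `week_number` naming; `notable_pairs` is a hypothetical name:

```python
from itertools import combinations

import pandas as pd


def notable_pairs(df, threshold=0.5, exclude_suffix='_7day', exclude=('week_number',)):
    """Metric pairs with |Pearson r| > threshold, strongest first, skipping derived helper columns."""
    cols = [
        c for c in df.select_dtypes(include='number').columns
        if not c.endswith(exclude_suffix) and c not in exclude
    ]
    corr = df[cols].corr()
    pairs = []
    for a, b in combinations(cols, 2):  # each unordered pair exactly once
        r = corr.loc[a, b]
        if abs(r) > threshold:  # NaN (e.g. a constant column) compares False and is skipped
            pairs.append((a, b, float(r)))
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)


# Made-up data: b is an exact multiple of a, c is only weakly related (r = 0.1),
# and a_7day mimics a rolling-average helper that should be ignored
demo = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [2, 4, 6, 8, 10],
    'c': [2, 5, 1, 4, 3],
    'a_7day': [1, 2, 3, 4, 5],
})
print(notable_pairs(demo))
```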
## 🏃‍♀️ Activity Pattern Analysis

In [None]:
# Activity level categorization and analysis
if 'steps' in health_data.columns:
    # Categorize activity levels by daily steps:
    # < 5000 Sedentary, 5000-7499 Low Active, 7500-9999 Somewhat Active, 10000-12499 Active, 12500+ Highly Active.
    # pd.cut leaves missing days as NaN (an if/elif chain would label NaN 'Highly Active').
    activity_order = ['Sedentary', 'Low Active', 'Somewhat Active', 'Active', 'Highly Active']
    health_data['activity_level'] = pd.cut(
        health_data['steps'],
        bins=[-np.inf, 5000, 7500, 10000, 12500, np.inf],
        labels=activity_order,
        right=False,
    )
    
    # Create activity analysis visualization
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))
    fig.suptitle('Activity Pattern Analysis', fontsize=16, fontweight='bold')
    
    # 1. Activity level distribution (colors keyed to level, not to frequency rank)
    activity_colors = dict(zip(activity_order, ['red', 'orange', 'yellow', 'lightgreen', 'green']))
    activity_counts = health_data['activity_level'].value_counts().reindex(activity_order, fill_value=0)
    activity_counts = activity_counts[activity_counts > 0]
    ax1.pie(activity_counts.values, labels=list(activity_counts.index), autopct='%1.1f%%', 
            colors=[activity_colors[level] for level in activity_counts.index], startangle=90)
    ax1.set_title('Activity Level Distribution', fontweight='bold')
    
    # 2. Steps by day of week
    day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    daily_steps = health_data.groupby('day_of_week')['steps'].agg(['mean', 'std']).reindex(day_order)
    
    ax2.bar(daily_steps.index, daily_steps['mean'], yerr=daily_steps['std'], 
            capsize=5, alpha=0.7, color='skyblue')
    ax2.axhline(y=10000, color='red', linestyle='--', alpha=0.8, label='10K Goal')
    ax2.set_title('Average Steps by Day of Week', fontweight='bold')
    ax2.set_ylabel('Steps')
    ax2.legend()
    ax2.tick_params(axis='x', rotation=45)
    ax2.grid(True, alpha=0.3)
    
    # 3. Activity consistency over time (weekly averages)
    health_data['week_start'] = health_data['date'] - pd.to_timedelta(health_data['date'].dt.dayofweek, unit='d')
    weekly_avg = health_data.groupby('week_start')['steps'].mean()
    
    ax3.plot(weekly_avg.index, weekly_avg.values, marker='o', linewidth=2, markersize=6)
    ax3.axhline(y=10000, color='red', linestyle='--', alpha=0.8, label='10K Goal')
    ax3.fill_between(weekly_avg.index, weekly_avg.values, alpha=0.3)
    ax3.set_title('Weekly Average Steps Trend', fontweight='bold')
    ax3.set_ylabel('Average Steps')
    ax3.legend()
    ax3.grid(True, alpha=0.3)
    ax3.tick_params(axis='x', rotation=45)
    
    # 4. Steps distribution histogram
    ax4.hist(health_data['steps'], bins=20, alpha=0.7, color='lightcoral', edgecolor='black')
    ax4.axvline(x=health_data['steps'].mean(), color='blue', linestyle='--', 
                linewidth=2, label=f'Average: {health_data["steps"].mean():.0f}')
    ax4.axvline(x=10000, color='red', linestyle='--', linewidth=2, label='10K Goal')
    ax4.set_title('Steps Distribution', fontweight='bold')
    ax4.set_xlabel('Daily Steps')
    ax4.set_ylabel('Frequency')
    ax4.legend()
    ax4.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Activity statistics
    print("\n🏃‍♀️ ACTIVITY PATTERN INSIGHTS")
    print("=" * 40)
    
    print(f"Average daily steps: {health_data['steps'].mean():.0f}")
    print(f"Most active day: {daily_steps['mean'].idxmax()} ({daily_steps['mean'].max():.0f} steps)")
    print(f"Least active day: {daily_steps['mean'].idxmin()} ({daily_steps['mean'].min():.0f} steps)")
    print(f"Steps variability (coefficient of variation; lower = more consistent): {(health_data['steps'].std() / health_data['steps'].mean()):.2f}")
    
    # Weekend vs weekday analysis
    health_data['is_weekend'] = health_data['date'].dt.dayofweek >= 5
    weekend_avg = health_data[health_data['is_weekend']]['steps'].mean()
    weekday_avg = health_data[~health_data['is_weekend']]['steps'].mean()
    
    print(f"\nWeekend vs Weekday:")
    print(f"  Weekend average: {weekend_avg:.0f} steps")
    print(f"  Weekday average: {weekday_avg:.0f} steps")
    print(f"  Difference: {abs(weekend_avg - weekday_avg):.0f} steps")
    
    if weekend_avg > weekday_avg:
        print(f"  📈 You're more active on weekends!")
    else:
        print(f"  📈 You're more active on weekdays!")

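The `week_start` and `is_weekend` columns above rely on Monday being day 0. The same rules in plain `datetime`, as a minimal sketch with hypothetical helper names and a made-up week of step counts:

```python
from datetime import date, timedelta
from statistics import fmean


def week_start(d: date) -> date:
    """Monday of the week containing d; date.weekday() is 0 for Monday ... 6 for Sunday."""
    return d - timedelta(days=d.weekday())


def is_weekend(d: date) -> bool:
    """Saturday or Sunday, the same rule as pandas' `dt.dayofweek >= 5`."""
    return d.weekday() >= 5


def weekend_weekday_means(records):
    """Average value on weekend vs weekday dates for (date, value) pairs."""
    weekend = [v for d, v in records if is_weekend(d)]
    weekday = [v for d, v in records if not is_weekend(d)]
    return fmean(weekend), fmean(weekday)


# Made-up week: Monday 8 Jan 2024 to Sunday 14 Jan 2024
records = [(date(2024, 1, 8) + timedelta(days=i), steps)
           for i, steps in enumerate([7000, 7200, 6800, 7100, 6900, 11000, 12000])]
print(week_start(date(2024, 1, 10)), weekend_weekday_means(records))
```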
## 😴 Sleep Quality Analysis

In [None]:
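if 'sleep_hours' in health_data.columns:
    # Also defined in the activity cell; redefined here so this cell runs even without step data
    day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    health_data['is_weekend'] = health_data['date'].dt.dayofweek >= 5
    
    # Sleep quality categorization; missing nights stay NaN instead of falling through to 'Excessive'
    def categorize_sleep(hours):
        if pd.isna(hours):
            return np.nan
        if hours < 6:
            return 'Poor (< 6h)'
        elif hours < 7:
            return 'Fair (6-7h)'
        elif hours <= 9:
            return 'Good (7-9h)'
        else:
            return 'Excessive (> 9h)'
    
    health_data['sleep_quality'] = health_data['sleep_hours'].apply(categorize_sleep)
    
    # Sleep analysis visualization
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))
    fig.suptitle('Sleep Quality Analysis', fontsize=16, fontweight='bold')
    
    # 1. Sleep quality distribution (colors keyed to category, not to frequency rank)
    sleep_order = ['Poor (< 6h)', 'Fair (6-7h)', 'Good (7-9h)', 'Excessive (> 9h)']
    sleep_colors = dict(zip(sleep_order, ['red', 'orange', 'green', 'blue']))
    sleep_counts = health_data['sleep_quality'].value_counts().reindex(sleep_order, fill_value=0)
    sleep_counts = sleep_counts[sleep_counts > 0]
    ax1.pie(sleep_counts.values, labels=list(sleep_counts.index), autopct='%1.1f%%', 
            colors=[sleep_colors[quality] for quality in sleep_counts.index], startangle=90)
    ax1.set_title('Sleep Quality Distribution', fontweight='bold')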
    
    # 2. Sleep hours by day of week
    daily_sleep = health_data.groupby('day_of_week')['sleep_hours'].agg(['mean', 'std']).reindex(day_order)
    
    ax2.bar(daily_sleep.index, daily_sleep['mean'], yerr=daily_sleep['std'], 
            capsize=5, alpha=0.7, color='purple')
    ax2.axhspan(7, 9, alpha=0.2, color='green', label='Recommended Range')
    ax2.set_title('Average Sleep Hours by Day', fontweight='bold')
    ax2.set_ylabel('Hours')
    ax2.legend()
    ax2.tick_params(axis='x', rotation=45)
    ax2.grid(True, alpha=0.3)
    
    # 3. Sleep trend over time
    ax3.plot(health_data['date'], health_data['sleep_hours'], marker='o', 
             linewidth=1, markersize=4, alpha=0.7, color='indigo')
    
    # Add 7-day rolling average
    health_data['sleep_7day'] = health_data['sleep_hours'].rolling(window=7, center=True).mean()
    ax3.plot(health_data['date'], health_data['sleep_7day'], color='red', linewidth=3, label='7-day Average')
    
    ax3.axhspan(7, 9, alpha=0.2, color='green', label='Recommended')
    ax3.set_title('Sleep Hours Trend', fontweight='bold')
    ax3.set_ylabel('Hours')
    ax3.legend()
    ax3.grid(True, alpha=0.3)
    ax3.tick_params(axis='x', rotation=45)
    
    # 4. Sleep hours histogram
    ax4.hist(health_data['sleep_hours'], bins=15, alpha=0.7, color='teal', edgecolor='black')
    ax4.axvline(x=health_data['sleep_hours'].mean(), color='red', linestyle='--', 
                linewidth=2, label=f'Average: {health_data["sleep_hours"].mean():.1f}h')
    ax4.axvspan(7, 9, alpha=0.3, color='green', label='Recommended Range')
    ax4.set_title('Sleep Hours Distribution', fontweight='bold')
    ax4.set_xlabel('Hours')
    ax4.set_ylabel('Frequency')
    ax4.legend()
    ax4.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Sleep statistics
    print("\n😴 SLEEP QUALITY INSIGHTS")
    print("=" * 35)
    
    avg_sleep = health_data['sleep_hours'].mean()
    optimal_nights = ((health_data['sleep_hours'] >= 7) & (health_data['sleep_hours'] <= 9)).sum()
    total_nights = len(health_data['sleep_hours'].dropna())
    
    print(f"Average nightly sleep: {avg_sleep:.1f} hours")
    print(f"Optimal sleep nights: {optimal_nights}/{total_nights} ({(optimal_nights/total_nights)*100:.1f}%)")
    print(f"Sleep consistency (std): ±{health_data['sleep_hours'].std():.1f} hours")
    
    # Weekend vs weekday sleep
    weekend_sleep = health_data[health_data['is_weekend']]['sleep_hours'].mean()
    weekday_sleep = health_data[~health_data['is_weekend']]['sleep_hours'].mean()
    
    print(f"\nWeekend vs Weekday Sleep:")
    print(f"  Weekend average: {weekend_sleep:.1f} hours")
    print(f"  Weekday average: {weekday_sleep:.1f} hours")
    print(f"  Difference: {abs(weekend_sleep - weekday_sleep):.1f} hours")
    
    # Sleep recommendations
    print("\n💡 SLEEP RECOMMENDATIONS:")
    if avg_sleep < 7:
        print(f"  🛏️ Try to increase sleep duration - aim for 7-9 hours")
    elif avg_sleep > 9:
        print(f"  ⏰ Consider reducing sleep time - 7-9 hours is optimal")
    else:
        print(f"  ✅ Great sleep duration! Keep it up!")
    
    if health_data['sleep_hours'].std() > 1.5:
        print(f"  📅 Work on sleep consistency - try to sleep and wake at regular times")
    
    if weekend_sleep > weekday_sleep + 1:
        print(f"  🔄 You're catching up on sleep on weekends - consider earlier weekday bedtime")

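The sleep recommendations above are simple threshold rules, so they can be pulled out into a pure function that is easy to test. A stdlib sketch using the same thresholds (7-9 h range, 1.5 h spread, 1 h weekend catch-up); `sleep_recommendations` and its short flag strings are hypothetical:

```python
from statistics import fmean, stdev


def sleep_recommendations(weekday_hours, weekend_hours):
    """Rule-of-thumb flags using the thresholds from the sleep cell (7-9 h, 1.5 h spread, 1 h catch-up)."""
    all_hours = list(weekday_hours) + list(weekend_hours)
    average = fmean(all_hours)
    spread = stdev(all_hours)  # sample std (ddof=1), like pandas Series.std()
    flags = []
    if average < 7:
        flags.append('increase duration')
    elif average > 9:
        flags.append('reduce duration')
    if spread > 1.5:
        flags.append('improve consistency')
    if fmean(weekend_hours) > fmean(weekday_hours) + 1:
        flags.append('earlier weekday bedtime')
    return flags


# Made-up week: short weekday nights, long weekend lie-ins
print(sleep_recommendations([6.0] * 5, [8.5, 8.5]))
```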
## 💡 Health Insights and Recommendations
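The overall health score computed in this section adds four components worth up to 25 points each. Assuming step, sleep and heart-rate data are all present (so the maximum is 100 and dividing by `max_score` changes nothing), it can be written as below, with $\bar{s}$ the mean daily steps, $n_{[a,b]}$ the number of days whose reading lies in $[a, b]$, $n$ the number of days with a reading for that metric, and $\mathrm{CV} = \sigma/\mu$ the coefficient of variation:

$$
S = \underbrace{\min\!\left(25,\ 25\,\frac{\bar{s}}{10000}\right)}_{\text{steps}}
  + \underbrace{25\,\frac{n^{\text{sleep}}_{[7,9]}}{n^{\text{sleep}}}}_{\text{sleep}}
  + \underbrace{25\,\frac{n^{\text{HR}}_{[60,100]}}{n^{\text{HR}}}}_{\text{heart rate}}
  + \underbrace{12.5\max\left(0,\,1-\mathrm{CV}_{\text{steps}}\right) + 12.5\max\left(0,\,1-\mathrm{CV}_{\text{sleep}}\right)}_{\text{consistency}}
$$

For instance, an average of 8,500 steps earns $25 \times 0.85 = 21.25$ step points, and a step CV of 0.3 earns $12.5 \times 0.7 = 8.75$ consistency points.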

In [None]:
# Comprehensive health insights generator
def generate_health_insights(data):
    insights = []
    recommendations = []
    
    # Activity insights
    if 'steps' in data.columns:
        avg_steps = data['steps'].mean()
        # Compare the mean of the last 7 days with the mean of the first 7 days
        step_trend = 'increasing' if data['steps'].iloc[-7:].mean() > data['steps'].iloc[:7].mean() else 'decreasing'
        
        insights.append(f"📊 Your average daily steps: {avg_steps:.0f}")
        insights.append(f"📈 Steps trend (last 7 days vs first 7 days): {step_trend}")
        
        if avg_steps < 5000:
            recommendations.append("🚶‍♀️ PRIORITY: Increase daily activity - start with 10-minute walks")
        elif avg_steps < 8000:
            recommendations.append("🎯 Good progress! Try to reach 8,000+ steps daily")
        elif avg_steps < 10000:
            recommendations.append("💪 Almost there! Add 15-20 minutes of walking to hit 10K")
        else:
            recommendations.append("🏆 Excellent activity level! Consider varying your activities")
    
    # Sleep insights
    if 'sleep_hours' in data.columns:
        avg_sleep = data['sleep_hours'].mean()
        sleep_consistency = data['sleep_hours'].std()
        
        insights.append(f"😴 Your average sleep: {avg_sleep:.1f} hours")
        insights.append(f"📊 Sleep consistency: ±{sleep_consistency:.1f} hours variation")
        
        if avg_sleep < 7:
            recommendations.append("🛏️ PRIORITY: Increase sleep duration - aim for 7-9 hours")
        elif avg_sleep > 9:
            recommendations.append("⏰ Consider if you're oversleeping - 7-9 hours is optimal")
        
        if sleep_consistency > 1.5:
            recommendations.append("📅 Improve sleep schedule consistency - regular bedtime helps")
    
    # Heart rate insights
    if 'heart_rate' in data.columns:
        avg_hr = data['heart_rate'].mean()
        
        insights.append(f"❤️ Average resting heart rate: {avg_hr:.0f} BPM")
        
        if avg_hr > 90:
            recommendations.append("❤️ Elevated resting HR (average > 90 BPM) - consider stress management")
        elif avg_hr < 60:
            recommendations.append("💪 Low resting HR (average < 60 BPM), often a sign of good cardiovascular fitness")
    
    # Weight insights (if available): mean of the last 7 readings vs the first 7 readings
    if 'weight' in data.columns and data['weight'].notna().sum() > 7:
        weight = data['weight'].dropna()
        weight_change = weight.iloc[-7:].mean() - weight.iloc[:7].mean()
        insights.append(f"⚖️ Weight change (last 7 vs first 7 readings): {weight_change:+.1f} kg")
        
        if abs(weight_change) > 1:
            recommendations.append("⚖️ Weight changed by more than 1 kg - monitor nutrition and activity")
    
    # Correlation insights
    if 'steps' in data.columns and 'sleep_hours' in data.columns:
        corr = data['steps'].corr(data['sleep_hours'])
        if abs(corr) > 0.3:
            direction = "positively" if corr > 0 else "negatively"
            insights.append(f"🔗 Your activity and sleep are {direction} correlated (r={corr:.2f})")
    
    return insights, recommendations

# Generate insights
insights, recommendations = generate_health_insights(health_data)

print("\n💡 PERSONALIZED HEALTH INSIGHTS")
print("=" * 45)

for i, insight in enumerate(insights, 1):
    print(f"{i}. {insight}")

print("\n🎯 ACTIONABLE RECOMMENDATIONS")
print("=" * 35)

for i, rec in enumerate(recommendations, 1):
    print(f"{i}. {rec}")

# Health score calculation
def calculate_health_score(data):
    score = 0
    max_score = 0
    
    # Steps score (25 points)
    if 'steps' in data.columns:
        avg_steps = data['steps'].mean()
        step_score = min(25, (avg_steps / 10000) * 25)
        score += step_score
        max_score += 25
    
    # Sleep score (25 points)
    if 'sleep_hours' in data.columns:
        optimal_nights = ((data['sleep_hours'] >= 7) & (data['sleep_hours'] <= 9)).sum()
        total_nights = len(data['sleep_hours'].dropna())
        sleep_score = (optimal_nights / total_nights) * 25 if total_nights > 0 else 0
        score += sleep_score
        max_score += 25
    
    # Heart rate score (25 points)
    if 'heart_rate' in data.columns:
        normal_days = ((data['heart_rate'] >= 60) & (data['heart_rate'] <= 100)).sum()
        total_days = len(data['heart_rate'].dropna())
        hr_score = (normal_days / total_days) * 25 if total_days > 0 else 0
        score += hr_score
        max_score += 25
    
    # Consistency score (12.5 points per available metric; lower coefficient of variation = more consistent).
    # Only metrics that are present add to max_score, so a missing metric isn't scored as 0.
    for col in ('steps', 'sleep_hours'):
        if col in data.columns:
            col_mean = data[col].mean()
            cv = data[col].std() / col_mean if col_mean > 0 else 1
            score += max(0, (1 - cv) * 12.5)
            max_score += 12.5
    
    return (score / max_score) * 100 if max_score > 0 else 0

health_score = calculate_health_score(health_data)

print(f"\n🏆 OVERALL HEALTH SCORE")
print("=" * 30)
print(f"Your Health Score: {health_score:.1f}/100")

if health_score >= 80:
    print("🌟 Excellent! You're maintaining great health habits!")
elif health_score >= 65:
    print("👍 Good progress! Small improvements can make a big difference")
elif health_score >= 50:
    print("📈 Room for improvement - focus on consistency")
else:
    print("🎯 Let's work together to improve your health metrics!")

# Create a health score visualization
fig, ax = plt.subplots(1, 1, figsize=(10, 6))
categories = ['Steps\nGoal', 'Sleep\nQuality', 'Heart Rate\nRange', 'Consistency']
scores = []

# Calculate individual scores
if 'steps' in health_data.columns:
    step_score = min(100, (health_data['steps'].mean() / 10000) * 100)
    scores.append(step_score)
else:
    scores.append(0)

if 'sleep_hours' in health_data.columns:
    optimal_nights = ((health_data['sleep_hours'] >= 7) & (health_data['sleep_hours'] <= 9)).sum()
    total_nights = len(health_data['sleep_hours'].dropna())
    sleep_score = (optimal_nights / total_nights) * 100 if total_nights > 0 else 0
    scores.append(sleep_score)
else:
    scores.append(0)

if 'heart_rate' in health_data.columns:
    normal_days = ((health_data['heart_rate'] >= 60) & (health_data['heart_rate'] <= 100)).sum()
    total_days = len(health_data['heart_rate'].dropna())
    hr_score = (normal_days / total_days) * 100 if total_days > 0 else 0
    scores.append(hr_score)
else:
    scores.append(0)

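# Consistency score: same CV-based rule as calculate_health_score, scaled to 100
# and averaged over the metrics that are present (lower CV = higher score)
consistency_parts = []
for col in ['steps', 'sleep_hours']:
    if col in health_data.columns and health_data[col].mean() > 0:
        cv = health_data[col].std() / health_data[col].mean()
        consistency_parts.append(max(0, (1 - cv) * 100))
consistency = np.mean(consistency_parts) if consistency_parts else 0
scores.append(consistency)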

colors = ['red' if s < 50 else 'orange' if s < 75 else 'green' for s in scores]
bars = ax.bar(categories, scores, color=colors, alpha=0.7)
ax.set_ylim(0, 100)
ax.set_ylabel('Score (%)')
ax.set_title('Health Metrics Breakdown', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)

# Add score labels on bars
for bar, score in zip(bars, scores):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 2, 
             f'{score:.0f}%', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()
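For reference, the score computed by `calculate_health_score` can be written out as follows, where $\bar{x}_{\text{steps}}$ is the mean daily step count, each $N$ counts non-missing rows, and $\mathrm{CV} = \sigma / \mu$ is the coefficient of variation of a metric:

$$
S = 100 \cdot \frac{s_{\text{steps}} + s_{\text{sleep}} + s_{\text{HR}} + s_{\text{cons}}}{M}
$$

$$
s_{\text{steps}} = \min\!\left(25,\ 25 \cdot \frac{\bar{x}_{\text{steps}}}{10000}\right), \quad
s_{\text{sleep}} = 25 \cdot \frac{N_{7 \le h \le 9}}{N_{\text{sleep}}}, \quad
s_{\text{HR}} = 25 \cdot \frac{N_{60 \le \text{hr} \le 100}}{N_{\text{HR}}}
$$

$$
s_{\text{cons}} = 12.5 \cdot \max(0,\ 1 - \mathrm{CV}_{\text{steps}}) + 12.5 \cdot \max(0,\ 1 - \mathrm{CV}_{\text{sleep}})
$$

A term is 0 when its column is missing, and $M$ is the `max_score` the function accumulates. For example, an average of 8,000 steps per day gives $s_{\text{steps}} = \min(25,\ 25 \cdot 8000 / 10000) = 20$ points.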

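The correlation insight in `generate_health_insights` reports any $|r| > 0.3$, but with only a couple of weeks of data a correlation of that size may not be statistically significant. A minimal sketch of adding a significance check with `scipy.stats.pearsonr` (SciPy is already a dependency of this notebook) is below; the helper name `steps_sleep_correlation` and the `alpha=0.05` cutoff are illustrative choices, not part of the original analysis.

```python
import numpy as np
import pandas as pd
from scipy import stats

def steps_sleep_correlation(data: pd.DataFrame, threshold: float = 0.3, alpha: float = 0.05):
    """Return (r, p_value, is_notable) for daily steps vs. sleep_hours.

    Hypothetical helper. Rows missing either metric are dropped first,
    because pearsonr does not skip NaN values on its own.
    """
    paired = data[['steps', 'sleep_hours']].dropna()
    if len(paired) < 3:
        # Too few paired days for a meaningful correlation
        return float('nan'), float('nan'), False
    r, p_value = stats.pearsonr(paired['steps'], paired['sleep_hours'])
    # Flag only correlations that are both large enough and unlikely under r = 0
    is_notable = abs(r) > threshold and p_value < alpha
    return float(r), float(p_value), bool(is_notable)
```

Usage in this notebook would be `r, p, notable = steps_sleep_correlation(health_data)`.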
## 📊 Summary Report Generation

In [None]:
# Generate comprehensive summary report
def generate_summary_report(data):
    report = {}
    report['analysis_date'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
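    # pd.to_datetime works whether 'date' holds strings (as returned by SQLite) or datetimes
    dates = pd.to_datetime(data['date'])
    report['data_period'] = {
        'start': dates.min().strftime('%Y-%m-%d'),
        'end': dates.max().strftime('%Y-%m-%d'),
        'total_days': int(dates.dt.normalize().nunique())  # distinct calendar days, not rows
    }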
    
    # Activity summary
    if 'steps' in data.columns:
        steps_data = data['steps'].dropna()
        report['activity'] = {
            'avg_daily_steps': int(steps_data.mean()),
            'max_daily_steps': int(steps_data.max()),
            'days_above_10k': int((steps_data >= 10000).sum()),
            'goal_achievement_rate': round((steps_data >= 10000).mean() * 100, 1)
        }
    
    # Sleep summary
    if 'sleep_hours' in data.columns:
        sleep_data = data['sleep_hours'].dropna()
        report['sleep'] = {
            'avg_sleep_hours': round(sleep_data.mean(), 1),
            'sleep_consistency_std': round(sleep_data.std(), 1),
            'optimal_sleep_nights': int(((sleep_data >= 7) & (sleep_data <= 9)).sum()),
            'optimal_sleep_rate': round(((sleep_data >= 7) & (sleep_data <= 9)).mean() * 100, 1)
        }
    
    # Heart rate summary
    if 'heart_rate' in data.columns:
        hr_data = data['heart_rate'].dropna()
        report['heart_rate'] = {
            'avg_resting_hr': int(hr_data.mean()),
            'hr_variability': round(hr_data.std(), 1),
            'days_normal_range': int(((hr_data >= 60) & (hr_data <= 100)).sum()),
            'normal_range_rate': round(((hr_data >= 60) & (hr_data <= 100)).mean() * 100, 1)
        }
    
    # Overall health score
    report['health_score'] = round(calculate_health_score(data), 1)
    
    return report

# Generate and display report
summary_report = generate_summary_report(health_data)

print("\n📋 COMPREHENSIVE HEALTH SUMMARY REPORT")
print("=" * 50)
print(f"Generated: {summary_report['analysis_date']}")
print(f"Data Period: {summary_report['data_period']['start']} to {summary_report['data_period']['end']}")
print(f"Total Days Analyzed: {summary_report['data_period']['total_days']}")

if 'activity' in summary_report:
    activity = summary_report['activity']
    print(f"\n🚶‍♀️ ACTIVITY METRICS:")
    print(f"  • Average daily steps: {activity['avg_daily_steps']:,}")
    print(f"  • Maximum daily steps: {activity['max_daily_steps']:,}")
    print(f"  • Days achieving 10K+ steps: {activity['days_above_10k']}")
    print(f"  • Goal achievement rate: {activity['goal_achievement_rate']}%")

if 'sleep' in summary_report:
    sleep = summary_report['sleep']
    print(f"\n😴 SLEEP METRICS:")
    print(f"  • Average sleep duration: {sleep['avg_sleep_hours']} hours")
    print(f"  • Sleep consistency: ±{sleep['sleep_consistency_std']} hours")
    print(f"  • Optimal sleep nights (7-9h): {sleep['optimal_sleep_nights']}")
    print(f"  • Optimal sleep rate: {sleep['optimal_sleep_rate']}%")

if 'heart_rate' in summary_report:
    hr = summary_report['heart_rate']
    print(f"\n❤️ HEART RATE METRICS:")
    print(f"  • Average resting heart rate: {hr['avg_resting_hr']} BPM")
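    # Standard deviation of daily readings; this is not HRV (beat-to-beat interval variation)
    print(f"  • Day-to-day heart rate variation (std): ±{hr['hr_variability']} BPM")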
    print(f"  • Days in normal range (60-100): {hr['days_normal_range']}")
    print(f"  • Normal range rate: {hr['normal_range_rate']}%")

print(f"\n🏆 OVERALL HEALTH SCORE: {summary_report['health_score']}/100")

# Save report to JSON (json is imported in the first cell)
report_path = f'health_report_{datetime.now().strftime("%Y%m%d_%H%M%S")}.json'
with open(report_path, 'w') as f:
    json.dump(summary_report, f, indent=2)

print(f"\n💾 Report saved to {report_path} for future reference.")
print("\n✅ Health data analysis complete! Use these insights to improve your wellness journey.")
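`json.dump` raises `TypeError` on NumPy integer scalars such as `np.int64`, and by default writes `NaN` (not valid standard JSON) for a missing average. The report above avoids the first problem with explicit `int(...)` casts; if you add new fields, a small recursive converter like the hypothetical `to_json_safe` below is one way to keep the saved file valid.

```python
import json
import math
import numpy as np

def to_json_safe(obj):
    """Recursively convert NumPy types to built-ins and non-finite floats to None."""
    if isinstance(obj, dict):
        # JSON object keys must be strings
        return {str(key): to_json_safe(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_json_safe(item) for item in obj]
    if isinstance(obj, np.ndarray):
        return to_json_safe(obj.tolist())
    if isinstance(obj, np.generic):
        obj = obj.item()  # np.int64 -> int, np.float64 -> float, np.bool_ -> bool
    if isinstance(obj, float) and not math.isfinite(obj):
        return None  # NaN/inf would otherwise be written as non-standard JSON
    return obj
```

Saving with `json.dump(to_json_safe(summary_report), f, indent=2, allow_nan=False)` makes `json` raise `ValueError` if a non-finite value still slips through.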

## 🎯 Next Steps and Action Plan

Based on your health data analysis, here are the recommended next steps:

### 📈 **Immediate Actions (This Week)**
1. **Set Daily Goals**: Use the dashboard to set realistic daily targets
2. **Track Consistently**: Ensure daily data entry for accurate analysis
3. **Focus on Priority Areas**: Address the lowest-scoring health metrics first

### 📊 **Short-term Goals (Next Month)**
1. **Improve Goal Achievement**: Aim for 80%+ achievement rate in key metrics
2. **Increase Consistency**: Reduce variability in sleep and activity patterns
3. **Monitor Trends**: Weekly review of progress using this notebook
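For the weekly review, one option is to roll the daily rows up into calendar weeks with pandas `resample`, tracking the step-goal achievement rate and sleep consistency week by week. This sketch assumes `health_data` has one row per day with `date`, `steps`, and `sleep_hours` columns, as used elsewhere in this notebook; `weekly_summary` and its 10,000-step default are illustrative.

```python
import pandas as pd

def weekly_summary(data: pd.DataFrame, step_goal: int = 10000) -> pd.DataFrame:
    """Roll daily rows up into calendar weeks ending on Sunday (pandas 'W' = 'W-SUN')."""
    df = data.copy()
    df['date'] = pd.to_datetime(df['date'])
    # 1.0 if the goal was met, 0.0 if not, NaN if steps were not recorded that day
    df['met_step_goal'] = df['steps'].ge(step_goal).astype(float).where(df['steps'].notna())
    weekly = df.set_index('date').resample('W')
    summary = pd.DataFrame({
        'avg_steps': weekly['steps'].mean(),
        'step_goal_rate_pct': weekly['met_step_goal'].mean() * 100,
        'avg_sleep_hours': weekly['sleep_hours'].mean(),
        # Coefficient of variation: lower values mean more consistent sleep
        'sleep_cv': weekly['sleep_hours'].std() / weekly['sleep_hours'].mean(),
    })
    return summary.round(2)
```

For example, `weekly_summary(health_data).tail(4)` shows the last four weeks side by side.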

### 🏆 **Long-term Vision (3+ Months)**
1. **Achieve Health Score 80+**: Maintain excellent health habits
2. **Establish Routines**: Build sustainable healthy lifestyle patterns
3. **Advanced Analysis**: Incorporate additional health metrics and wearable data

### 🔄 **Regular Analysis Schedule**
- **Daily**: Quick dashboard check and goal monitoring
- **Weekly**: Run this notebook for trend analysis
- **Monthly**: Comprehensive health report and goal adjustment
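Because each run of the summary cell writes a timestamped `health_report_YYYYMMDD_HHMMSS.json` file, the monthly review can compare saved reports over time. A minimal sketch, assuming the reports sit in one directory and keep the keys produced by `generate_summary_report`; `load_report_history` is a hypothetical helper:

```python
import glob
import json
import os
import pandas as pd

REPORT_COLUMNS = ['file', 'analysis_date', 'health_score', 'avg_daily_steps', 'avg_sleep_hours']

def load_report_history(directory='.'):
    """Load saved health_report_*.json files into one row per report, oldest first."""
    rows = []
    for path in glob.glob(os.path.join(directory, 'health_report_*.json')):
        with open(path) as f:
            report = json.load(f)
        rows.append({
            'file': os.path.basename(path),
            'analysis_date': pd.to_datetime(report.get('analysis_date')),
            'health_score': report.get('health_score'),
            # Sections are optional: a report omits them when the metric is missing
            'avg_daily_steps': report.get('activity', {}).get('avg_daily_steps'),
            'avg_sleep_hours': report.get('sleep', {}).get('avg_sleep_hours'),
        })
    history = pd.DataFrame(rows, columns=REPORT_COLUMNS)
    history = history.sort_values('analysis_date').reset_index(drop=True)
    # Change in health score relative to the previous saved report
    history['score_change'] = pd.to_numeric(history['health_score'], errors='coerce').diff()
    return history
```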

---

**Remember**: This analysis is for informational purposes only. Consult healthcare professionals for medical advice and significant health concerns.

### 📚 **Resources for Health Improvement**
- **Activity**: WHO recommends at least 150 minutes of moderate-intensity aerobic activity per week for adults
- **Sleep**: The National Sleep Foundation recommends 7-9 hours per night for adults
- **Heart Rate**: Consult a healthcare provider for personalized targets
- **Nutrition**: Consider tracking nutrition alongside activity for a more complete view of your health
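The WHO activity target works out to roughly 21.4 minutes per day (150 / 7), or for example 30 minutes on 5 days. This notebook does not analyse activity minutes, so the sketch below assumes a hypothetical `active_minutes` column holding daily moderate-intensity minutes and checks each calendar week against the 150-minute threshold (vigorous minutes, which WHO weights more heavily, are not handled here).

```python
import pandas as pd

# WHO adult guideline: at least 150 minutes/week of moderate-intensity activity
WHO_WEEKLY_MODERATE_MINUTES = 150

def weekly_activity_vs_who(data: pd.DataFrame, minutes_col: str = 'active_minutes') -> pd.DataFrame:
    """Sum daily moderate-activity minutes per calendar week and flag weeks meeting the target.

    Assumes a hypothetical `active_minutes` column; it is not among the
    metrics analysed elsewhere in this notebook.
    """
    df = data.copy()
    df['date'] = pd.to_datetime(df['date'])
    # min_count=1 leaves weeks with no recorded minutes as NaN instead of 0
    weekly = df.set_index('date')[minutes_col].resample('W').sum(min_count=1)
    return pd.DataFrame({
        'active_minutes': weekly,
        'meets_who_target': weekly >= WHO_WEEKLY_MODERATE_MINUTES,
    })
```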
