# Health Data Simulation
We simulate realistic health data for 100 users with features like steps, heart rate, sleep quality, workout type, and workout duration.

A derived `performance` metric is computed as a weighted sum of sleep quality (weight 0.5), normalized steps (weight 0.3), and normalized workout duration (weight 0.2), plus a small amount of Gaussian noise (standard deviation 0.05) for variability.
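
Written out, the weighted sum implemented in the simulation cell is shown below. The symbols $s_i$ (sleep quality), $d_i$ (workout duration in minutes), and $\varepsilon_i$ (noise) are notation introduced here for illustration; the coefficients and divisors are taken from the code.

$$
\text{performance}_i = 0.5\, s_i + 0.3\,\frac{\text{steps}_i}{20000} + 0.2\,\frac{d_i}{120} + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0,\ 0.05^2)
$$

Dividing steps by 20000 and duration by 120 puts both terms on roughly the same 0 to 1 scale as sleep quality, so the weights are directly comparable.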


A custom function calculates the Pearson correlation between performance and three key metrics:
- Sleep Quality: The role of recovery in performance.
- Steps: The impact of daily physical activity.
- Workout Duration: The influence of time spent exercising.

In [1]:
import numpy as np

# Step 1: Simulating Health Data
np.random.seed(42)  # For reproducibility

# Number of users
num_users = 100

# Generate data
data = {
    "user_id": [f"user_{i+1}" for i in range(num_users)],
    "steps": np.random.randint(1000, 20000, size=num_users),  # 1000-19999 (upper bound is exclusive)
    "heart_rate": np.random.randint(50, 100, size=num_users),  # 50-99; not used in the performance score
    "sleep_quality": np.random.rand(num_users),  # Between 0 and 1
    "workout_type": np.random.choice(["Cardio", "Strength", "None"], size=num_users, p=[0.4, 0.4, 0.2]),
    "workout_duration": np.random.randint(0, 120, size=num_users)  # Minutes, 0-119; drawn independently of workout_type
}

# Derive performance score based on sleep quality, steps, and workout (simulate correlation)
data["performance"] = (
    0.5 * data["sleep_quality"] +
    0.3 * (data["steps"] / 20000) +
    0.2 * (data["workout_duration"] / 120) +
    np.random.normal(0, 0.05, num_users)
)


# Trends Analysis
**Correlation Analysis:**
We compute the Pearson correlation between performance and each of sleep quality, steps, and workout duration. Each coefficient measures the strength and direction of a linear relationship, which gives a basis for prioritizing recommendations.
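
As a sanity check on a hand-written correlation function, the result can be compared against `np.corrcoef`. The sketch below is illustrative only: `pearson_correlation` is a hypothetical name, the data is a toy sample rather than the notebook's dataset, and the guard for constant inputs is an added assumption.

```python
import numpy as np

def pearson_correlation(x, y):
    """Pearson correlation coefficient computed from centered values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    denominator = np.sqrt(np.sum(x_centered**2) * np.sum(y_centered**2))
    if denominator == 0:
        return float("nan")  # Undefined when either input is constant
    return float(np.sum(x_centered * y_centered) / denominator)

# Toy data (an assumption for illustration, not the notebook's dataset)
rng = np.random.default_rng(0)
x = rng.random(200)
y = 0.5 * x + rng.normal(0, 0.05, 200)

manual = pearson_correlation(x, y)
reference = float(np.corrcoef(x, y)[0, 1])
print(f"manual={manual:.6f}, np.corrcoef={reference:.6f}")
```

The two values should agree to within floating-point error, since both implement the same formula.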

**Summary Statistics:**
Metrics like average steps, sleep quality, and workout duration give a high-level overview of user behavior and health trends, helping us identify baseline patterns across the dataset.

**User Grouping:**
Users are segmented into categories based on activity levels (steps) and sleep quality, enabling us to identify at-risk groups (e.g., sedentary users with poor sleep) and tailor recommendations accordingly.
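
The segmentation can also be expressed with boolean masks instead of a loop. This is a minimal sketch, assuming the thresholds used in the analysis cell (more than 8000 steps counts as active, sleep quality above 0.6 counts as well rested); `group_counts` and the example values are hypothetical.

```python
import numpy as np

STEPS_THRESHOLD = 8000  # More than this many steps counts as "active"
SLEEP_THRESHOLD = 0.6   # Sleep quality above this counts as "well rested"

def group_counts(steps, sleep_quality):
    """Count users in each activity/sleep segment using boolean masks."""
    steps = np.asarray(steps)
    sleep_quality = np.asarray(sleep_quality)
    active = steps > STEPS_THRESHOLD
    rested = sleep_quality > SLEEP_THRESHOLD
    return {
        "Active and Well Rested": int(np.sum(active & rested)),
        "Active but Poor Sleep": int(np.sum(active & ~rested)),
        "Sedentary and Poor Sleep": int(np.sum(~active & ~rested)),
        "Sedentary but Well Rested": int(np.sum(~active & rested)),
    }

# Four made-up users, one per segment (exactly 8000 steps is not "active")
example_steps = [12000, 9000, 3000, 8000]
example_sleep = [0.9, 0.3, 0.2, 0.7]
counts = group_counts(example_steps, example_sleep)
print(counts)
```

Because the four masks are mutually exclusive and exhaustive, the counts always sum to the number of users.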

**Workout Impact Analysis:**
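Performance is averaged by workout type (Cardio, Strength, None) to compare groups. In this simulation, workout type does not enter the performance formula, so differences between the group averages reflect sampling variation rather than a real effect of the activity type. With real data, the same comparison would show which type of structured exercise is associated with higher performance.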
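
Group means alone do not show whether the differences between workout types exceed sampling noise. One hedged way to gauge this is to report a standard error next to each mean. The helper `mean_and_sem_by_group` and the toy values below are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def mean_and_sem_by_group(values, labels):
    """Return count, mean, and standard error of the mean for each label."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    summary = {}
    for label in np.unique(labels):
        group = values[labels == label]
        n = group.size
        # The sample standard deviation (ddof=1) needs at least two points
        sem = group.std(ddof=1) / np.sqrt(n) if n > 1 else float("nan")
        summary[str(label)] = {"n": int(n), "mean": float(group.mean()), "sem": float(sem)}
    return summary

# Toy performance scores (made up for the example)
toy_labels = ["Cardio", "Cardio", "Strength", "Strength", "None", "None"]
toy_perf = [0.50, 0.60, 0.40, 0.60, 0.30, 0.50]
summary = mean_and_sem_by_group(toy_perf, toy_labels)
for label, stats in summary.items():
    print(f"{label}: n={stats['n']}, mean={stats['mean']:.2f}, sem={stats['sem']:.2f}")
```

If the gap between two group means is small relative to their standard errors, the difference is plausibly noise.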


---



**Why Is This Analysis Important?**
This structured approach combines individual-level insights with population-wide trends. By correlating key metrics, segmenting users, and comparing workout types, we can ground recommendations in the data and tailor them to each user.

In [2]:
# Step 2: Analyze Trends
def compute_correlation(x, y):
    """Compute the Pearson correlation coefficient between two arrays."""
    x_mean, y_mean = np.mean(x), np.mean(y)
    numerator = np.sum((x - x_mean) * (y - y_mean))
    denominator = np.sqrt(np.sum((x - x_mean)**2) * np.sum((y - y_mean)**2))
    return numerator / denominator

# Correlation Analysis
correlation_sleep_performance = compute_correlation(data["sleep_quality"], data["performance"])
correlation_steps_performance = compute_correlation(data["steps"], data["performance"])
correlation_workout_performance = compute_correlation(data["workout_duration"], data["performance"])

# Detailed Analysis
def analyze_trends(data):
    """Analyze and summarize key trends in the dataset."""
    total_users = len(data["user_id"])
    avg_steps = np.mean(data["steps"])
    avg_sleep_quality = np.mean(data["sleep_quality"])
    avg_workout_duration = np.mean(data["workout_duration"])

    print(f"Total Users: {total_users}")
    print(f"Average Steps: {avg_steps:.2f}")
    print(f"Average Sleep Quality: {avg_sleep_quality:.2f}")
    print(f"Average Workout Duration: {avg_workout_duration:.2f} minutes")

    print("\nCorrelations:")
    print(f"Sleep Quality vs. Performance: {correlation_sleep_performance:.2f}")
    print(f"Steps vs. Performance: {correlation_steps_performance:.2f}")
    print(f"Workout Duration vs. Performance: {correlation_workout_performance:.2f}")

    # Group users based on activity levels and sleep quality
    groups = {"Active and Well Rested": 0, "Active but Poor Sleep": 0,
              "Sedentary and Poor Sleep": 0, "Sedentary but Well Rested": 0}

    for i in range(total_users):
        steps = data["steps"][i]
        sleep = data["sleep_quality"][i]

        if steps > 8000 and sleep > 0.6:
            groups["Active and Well Rested"] += 1
        elif steps > 8000 and sleep <= 0.6:
            groups["Active but Poor Sleep"] += 1
        elif steps <= 8000 and sleep <= 0.6:
            groups["Sedentary and Poor Sleep"] += 1
        else:
            groups["Sedentary but Well Rested"] += 1

    print("\nUser Groups:")
    for group, count in groups.items():
        print(f"{group}: {count} users")

    # Analyze the impact of workout type on performance
    performance_by_workout = {"Cardio": [], "Strength": [], "None": []}

    for i in range(total_users):
        workout = data["workout_type"][i]
        performance_by_workout[workout].append(data["performance"][i])

    avg_performance = {k: np.mean(v) if v else 0 for k, v in performance_by_workout.items()}

    print("\nAverage Performance by Workout Type:")
    for workout, avg_perf in avg_performance.items():
        print(f"{workout}: {avg_perf:.2f}")

analyze_trends(data)

Total Users: 100
Average Steps: 9637.30
Average Sleep Quality: 0.51
Average Workout Duration: 61.88 minutes

Correlations:
Sleep Quality vs. Performance: 0.83
Steps vs. Performance: 0.41
Workout Duration vs. Performance: 0.39

User Groups:
Active and Well Rested: 27 users
Active but Poor Sleep: 31 users
Sedentary and Poor Sleep: 23 users
Sedentary but Well Rested: 19 users

Average Performance by Workout Type:
Cardio: 0.54
Strength: 0.49
None: 0.43


# User Insights Generation
Personalized recommendations are at the heart of user engagement and behavior change. Each user receives one recommendation, taken from the first rule that matches, with rules checked in this order:
- Poor sleep quality (below 0.4) → Suggestions for better sleep hygiene.
- Low steps (below 5,000) → Encouragement to be more physically active.
- No workout (type `None` with zero duration) → Recommendations to incorporate exercise into their routine.
- None of the above → Positive reinforcement.
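
The rule order matters because only the first matching rule produces advice. A minimal sketch of that first-match logic for a single user follows; `advise` is a hypothetical helper, and the thresholds and messages mirror those in the insights cell.

```python
def advise(sleep_quality, steps, workout_type, workout_duration):
    """Return the advice from the first matching rule (hypothetical helper)."""
    # Each entry pairs a condition with its message, in priority order
    rules = [
        (sleep_quality < 0.4,
         "Improve your sleep quality by maintaining a consistent sleep schedule."),
        (steps < 5000,
         "Increase your daily steps for better performance. Aim for at least 8000 steps."),
        (workout_type == "None" and workout_duration == 0,
         "Incorporate regular workouts into your routine for better health outcomes."),
    ]
    for matched, advice in rules:
        if matched:
            return advice
    return "Keep up the good work!"

# A user with both poor sleep and low steps only receives the sleep advice
print(advise(sleep_quality=0.3, steps=3000, workout_type="None", workout_duration=0))
```

A variant that collects every matching message instead of returning early would surface all areas for improvement at once, at the cost of longer output per user.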

In [3]:
# Step 3: Generate Insights
def generate_insights(data):
    """Provide actionable insights for each user based on their data."""
    insights = []
    for i in range(len(data["user_id"])):
        user = {
            "user_id": data["user_id"][i],
            "steps": data["steps"][i],
            "sleep_quality": data["sleep_quality"][i],
            "performance": data["performance"][i],
            "workout_type": data["workout_type"][i],
            "workout_duration": data["workout_duration"][i]
        }

        if user["sleep_quality"] < 0.4:
            advice = "Improve your sleep quality by maintaining a consistent sleep schedule."
        elif user["steps"] < 5000:
            advice = "Increase your daily steps for better performance. Aim for at least 8000 steps."
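        # Note: workout_duration is simulated independently of workout_type,
        # so a "None" user usually has a non-zero duration and this branch rarely fires.
        elif user["workout_type"] == "None" and user["workout_duration"] == 0: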
            advice = "Incorporate regular workouts into your routine for better health outcomes."
        else:
            advice = "Keep up the good work!"

        insights.append({"user_id": user["user_id"], "advice": advice})
    return insights

# Generate insights for users
user_insights = generate_insights(data)

# Step 4: Display Sample Output
print("\nCorrelation between Sleep Quality and Performance:", correlation_sleep_performance)
print("Correlation between Steps and Performance:", correlation_steps_performance)
print("Correlation between Workout Duration and Performance:", correlation_workout_performance)

print("\nSample Insights for Users:")
for insight in user_insights[:5]:
    print(insight)



Correlation between Sleep Quality and Performance: 0.8250194166643615
Correlation between Steps and Performance: 0.4139199851627264
Correlation between Workout Duration and Performance: 0.39300844812364616

Sample Insights for Users:
{'user_id': 'user_1', 'advice': 'Keep up the good work!'}
{'user_id': 'user_2', 'advice': 'Increase your daily steps for better performance. Aim for at least 8000 steps.'}
{'user_id': 'user_3', 'advice': 'Keep up the good work!'}
{'user_id': 'user_4', 'advice': 'Improve your sleep quality by maintaining a consistent sleep schedule.'}
{'user_id': 'user_5', 'advice': 'Keep up the good work!'}


# Performance Analysis: Key Health Factors and Insights

In this simulated dataset, sleep quality has the strongest association with performance, with a positive correlation of 0.83. Daily steps (correlation: 0.41) and workout duration (correlation: 0.39) are also positively associated with performance, though less strongly. This ordering follows the weights built into the simulation (0.5, 0.3, and 0.2), so the analysis recovers the assumed relationships rather than establishing a causal effect. Of the five sample users shown, three receive positive reinforcement, one is advised to improve sleep, and one to increase daily steps. Encouraging better sleep hygiene and more physical activity, while reinforcing positive habits for users who already do well, gives a comprehensive approach to improving user outcomes.
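
The reported correlations can be cross-checked against what the simulation's formula implies. The sketch below assumes the three inputs are independent and uniformly distributed over the ranges used in the data-generation cell; the helper name and variable names are hypothetical.

```python
import math

def discrete_uniform_variance(low, high):
    """Variance of integers drawn uniformly from low..high-1 (randint convention)."""
    n = high - low
    return (n**2 - 1) / 12

# Variance of each input after the scaling applied in the performance formula
variances = {
    "sleep_quality": 1 / 12,  # Continuous uniform on [0, 1)
    "steps": discrete_uniform_variance(1000, 20000) / 20000**2,
    "workout_duration": discrete_uniform_variance(0, 120) / 120**2,
}
weights = {"sleep_quality": 0.5, "steps": 0.3, "workout_duration": 0.2}
noise_variance = 0.05**2

# Variance of a weighted sum of independent terms
performance_variance = sum(weights[k] ** 2 * variances[k] for k in weights) + noise_variance

# corr(X_k, performance) = w_k * sd(X_k) / sd(performance) under independence
expected = {k: weights[k] * math.sqrt(variances[k] / performance_variance) for k in weights}
for name, value in expected.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the expected correlations come out to roughly 0.79, 0.45, and 0.32, so the observed values (0.83, 0.41, 0.39) are in line with what a sample of 100 users would be expected to produce.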







