# 🧪 Synthetic Data Hands-On: Simulated Health Dataset
This notebook demonstrates how to generate synthetic health data with NumPy and pandas. The generated data includes age, height, weight, BMI, and a binary hypertension indicator derived from BMI.

## ✅ Step 1: Generate Random Patient Data

In [1]:
import pandas as pd
import numpy as np

np.random.seed(42)
n = 100  # number of patients

df = pd.DataFrame({
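    "age": np.random.randint(18, 80, size=n),  # integers 18 to 79 (upper bound is exclusive)
    "height_cm": np.random.normal(loc=170, scale=10, size=n).round(1),  # mean 170 cm, SD 10 cm
    "weight_kg": np.random.normal(loc=70, scale=15, size=n).round(1),  # mean 70 kg, SD 15 kg, drawn independently of height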
})

# Calculate BMI (kg / m^2) and flag hypertension (1) when BMI > 25, otherwise 0
df["bmi"] = (df["weight_kg"] / (df["height_cm"] / 100) ** 2).round(1)
df["hypertension"] = (df["bmi"] > 25).astype(int)

df.head()

|   | age | height_cm | weight_kg | bmi | hypertension |
|---|---|---|---|---|---|
| 0 | 56 | 177.0 | 63.4 | 20.2 | 0 |
| 1 | 69 | 168.3 | 75.7 | 26.7 | 1 |
| 2 | 46 | 160.9 | 99.0 | 38.2 | 1 |
| 3 | 32 | 181.9 | 71.2 | 21.5 | 0 |
| 4 | 60 | 177.9 | 63.8 | 20.2 | 0 |


## ✅ Step 2: Summarize and Save the Data

In [2]:
df.describe()

|   | age | height_cm | weight_kg | bmi | hypertension |
|---|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 50.27 | 170.21 | 69.036 | 24.131 | 0.41 |
| std | 19.176403 | 9.863866 | 15.712374 | 6.585256 | 0.494311 |
| min | 19.0 | 145.0 | 26.8 | 10.5 | 0.0 |
| 25% | 34.75 | 162.925 | 57.85 | 19.975 | 0.0 |
| 50% | 51.5 | 169.75 | 70.55 | 23.0 | 0.0 |
| 75% | 68.0 | 177.05 | 78.525 | 28.9 | 1.0 |
| max | 79.0 | 196.6 | 104.1 | 41.2 | 1.0 |
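
The summary reports a minimum weight of 26.8 kg and a minimum BMI of 10.5, which are implausible for adults aged 18 to 79. Unbounded normal draws produce such tails. A minimal sketch of a plausibility check follows. The helper name `flag_implausible` and its cutoffs (weight under 40 kg, BMI outside 15 to 45) are illustrative assumptions, not clinical thresholds, and the sample rows are hand-made rather than taken from the generated data.

```python
import pandas as pd


def flag_implausible(df, min_weight_kg=40.0, bmi_range=(15.0, 45.0)):
    """Return a boolean Series marking rows with implausible adult values.

    The default cutoffs are illustrative assumptions, not clinical thresholds.
    """
    low_bmi, high_bmi = bmi_range
    too_light = df["weight_kg"] < min_weight_kg
    # between() is inclusive on both ends; negate it to find out-of-range BMIs.
    bmi_out_of_range = ~df["bmi"].between(low_bmi, high_bmi)
    return too_light | bmi_out_of_range


# Tiny hand-made example; in the notebook, pass the generated df instead.
sample = pd.DataFrame({
    "weight_kg": [63.4, 26.8, 99.0],
    "bmi": [20.2, 10.5, 38.2],
})
mask = flag_implausible(sample)
print(mask.tolist())
print(f"{mask.sum()} of {len(sample)} rows flagged")
```

Flagged rows could then be dropped, clipped, or redrawn, depending on how the synthetic data will be used.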


In [None]:
df.to_csv("synthetic_health_data.csv", index=False)
print("Synthetic data saved as 'synthetic_health_data.csv'")
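
To confirm that the file was written as intended, one option is to read it back and compare it with the in-memory frame. The sketch below assumes a small stand-in frame with the same columns as `df` and writes to a temporary directory so it does not touch the notebook's output file.

```python
import os
import tempfile

import pandas as pd

# Small stand-in frame with the same columns as the notebook's df.
original = pd.DataFrame({
    "age": [56, 69],
    "height_cm": [177.0, 168.3],
    "weight_kg": [63.4, 75.7],
    "bmi": [20.2, 26.7],
    "hypertension": [0, 1],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "synthetic_health_data.csv")
    # index=False keeps the row index out of the file, so no extra
    # unnamed column appears when the CSV is read back.
    original.to_csv(path, index=False)
    restored = pd.read_csv(path)

# Raises AssertionError if columns, dtypes, or values differ
# (floats are compared approximately by default).
pd.testing.assert_frame_equal(original, restored)
print("Round trip OK:", list(restored.columns))
```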

## 💬 Discussion
- How realistic is this dataset?
- What assumptions did we make?
- What additional variables or complexity could make this more useful?
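
As one possible starting point for the last question, the sketch below relaxes two assumptions of the generator above: that height and weight are independent, and that hypertension is a deterministic function of BMI. It uses NumPy's `default_rng` generator rather than the notebook's `np.random.seed`, so its draws will not match the table in Step 1. The correlation of 0.5 and the logistic coefficients are made-up values chosen for illustration, not estimates from real data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000

# Assumed correlation between height and weight; 0.5 is illustrative only.
rho = 0.5
sd_height, sd_weight = 10.0, 15.0
cov = [
    [sd_height**2, rho * sd_height * sd_weight],
    [rho * sd_height * sd_weight, sd_weight**2],
]
height_weight = rng.multivariate_normal(mean=[170.0, 70.0], cov=cov, size=n)

df2 = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),  # upper bound is exclusive
    "height_cm": height_weight[:, 0].round(1),
    "weight_kg": height_weight[:, 1].round(1),
})
df2["bmi"] = (df2["weight_kg"] / (df2["height_cm"] / 100) ** 2).round(1)

# Logistic model: probability of hypertension rises with age and BMI.
# The intercept and slopes are made up for this sketch.
logit = -7.0 + 0.05 * df2["age"] + 0.15 * df2["bmi"]
prob = 1.0 / (1.0 + np.exp(-logit))
df2["hypertension"] = rng.binomial(1, prob.to_numpy())

print(df2[["height_cm", "weight_kg"]].corr().round(2))
print("Hypertension rate:", round(float(df2["hypertension"].mean()), 2))
```

Because the indicator is now sampled from a probability, two patients with the same BMI can have different outcomes, which is closer to how a real cohort behaves than a hard cutoff at 25.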