Realistic sample dataset for a fictional health and fitness app called PulseFit, built for Snowflake ingestion and modeling practice.
- 500 users, 12 trainers
- ~9 months of activity (Jul 2025 – Apr 2026)
- Realistic signup curves (January bump), churn, upgrades/downgrades
- Engagement tiers (power/regular/light) drive volume across all fact tables
- Correlated behavior: goals influence workout types, meal calories, and weight trajectory
- Intentional messiness: missing country/birth_date, null body_fat_pct and waist_inches, null sleep_hours
| File | Rows (approx) | Format | Notes |
|---|---|---|---|
users.csv |
500 | CSV | dim_user |
trainers.csv |
12 | CSV | dim_trainer (includes 1 Inactive) |
subscriptions.csv |
~450 | CSV | dim_subscription with upgrade/downgrade/churn paths |
goals.csv |
~650 | CSV | user goals with realistic status mix |
workouts.csv |
~15k | CSV | fact_workout, trainer_id sometimes null |
nutrition_logs.csv |
~30k | CSV | fact_nutrition, meal-level |
body_metrics.csv |
~4k | CSV | fact_body_metric, sparse by design |
daily_activity.csv |
~60k | CSV | fact_activity, daily cadence |
chat_messages.json |
~2k | NDJSON | trainer/user messages, Premium/Elite users only |
app_events.json |
~40k | NDJSON | session-grouped with nested metadata |
- CSV ingestion into Snowflake (
COPY INTOfrom internal stage) - JSON ingestion into
VARIANTcolumns +LATERAL FLATTEN - Medallion modeling: raw → staging → facts/dims
- Cohort retention and churn analysis (signup_date + subscription end_date)
- Engagement analytics (sessions per user, DAU/WAU/MAU from app_events)
- Funnel analysis (app_open → workout_logged, app_open → meal_logged)
- Goal attainment vs. activity volume
- Trainer performance (message volume, assigned users, retention of coached users)
- Upload files to an internal stage (
PUT file://... @my_stage). - Create raw tables mirroring CSV schemas.
- Load JSON files into
VARIANTcolumns. - Flatten JSON with
LATERAL FLATTENinto staging tables. - Build dims (user, trainer, date) and facts (workout, nutrition, activity, event).
- Build reporting marts: retention, engagement, coaching impact, revenue.
- ~2% of users have blank
country - ~1% have blank
birth_date - ~8% of body_metrics rows have null
body_fat_pct - ~12% have null
waist_inches - ~5% of daily_activity rows have null
sleep_hours - Some emails have embedded numbers or underscores
- Subscription history includes multi-row users (downgrades/upgrades) requiring window functions to resolve current plan
- Trainer
T3011is Inactive but may appear on older workouts