14 Days of Real-World Data Engineering Challenges with dbt + DuckDB
Learn data engineering by fixing the same problems you'll face in production.
# Setup
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# Verify
dbt debug
# Start Day 1
cat docs/day01.md
dbt seed --select day01
# Build your solution in models/day01_schema_chaos/
dbt run --select day01
dbt test --select day01

| Day | Challenge | Difficulty | Topic |
|---|---|---|---|
| 1 | Schema Chaos | ★ | JSON parsing, mixed formats |
| 2 | Duplicate Detective | ★ | Deduplication, window functions |
| 3 | Late Arrivals | ★★ | Late-arriving data, date math |
| 4 | NULL Nightmare | ★★ | NULL handling, data cleaning |
| 5 | SCD Surprise | ★★★ | Slowly changing dimensions |
| 6 | Testing Triumph | ★★ | dbt tests, data quality |
| 7 | Incremental Intro | ★★★ | Incremental models |
| 8 | Fan-out Fiasco | ★★ | Join explosions, grain |
| 9 | Timezone Terror | ★★★ | Timezone conversions |
| 10 | Source Freshness | ★★ | Freshness tests, monitoring |
| 11 | Data Contracts | ★★★ | Schema enforcement |
| 12 | Metrics Mayhem | ★★★ | Metrics layer, semantic models |
| 13 | Refactoring Rage | ★★ | Safe refactoring, impact analysis |
| 14 | The Great Migration | ★★★★ | Full pipeline migration |
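
To give a feel for the kind of problem in the table, here is a minimal sketch of the Day 2 topic (deduplication with a window function). It is not the reference solution. The table `raw_orders`, its columns, and the sample rows are hypothetical, and Python's built-in `sqlite3` stands in for DuckDB so the example runs with no extra packages (it assumes a bundled SQLite of 3.25 or newer, which is when window functions arrived). In the project itself, the same `ROW_NUMBER()` pattern would presumably live in a dbt model.

```python
import sqlite3

# Hypothetical raw rows: order 1 was loaded twice, with different load times.
ROWS = [
    (1, "pending", "2024-01-01 09:00:00"),
    (1, "shipped", "2024-01-02 10:00:00"),
    (2, "pending", "2024-01-01 11:00:00"),
]

# Rank rows within each order_id by recency, then keep only the newest one.
DEDUP_SQL = """
    SELECT order_id, status, loaded_at
    FROM (
        SELECT
            *,
            ROW_NUMBER() OVER (
                PARTITION BY order_id
                ORDER BY loaded_at DESC
            ) AS rn
        FROM raw_orders
    ) AS ranked
    WHERE rn = 1
    ORDER BY order_id
"""


def dedupe_latest(rows):
    """Load rows into an in-memory table and return the latest row per order_id."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE raw_orders (order_id INTEGER, status TEXT, loaded_at TEXT)"
        )
        con.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
        return con.execute(DEDUP_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in dedupe_latest(ROWS):
        print(row)
```

Ordering by `loaded_at DESC` keeps the most recent version of each order. The timestamps are stored as ISO-formatted text here, which sorts correctly as strings; a real model would more likely use a proper timestamp column.
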

advent-of-data/
├── docs/ # Challenge descriptions
├── seeds/ # Raw data per day
├── models/ # Your solutions go here
├── solutions/ # Reference solutions
├── tests/ # Custom tests
└── macros/ # Reusable SQL
- Zero infrastructure: everything runs locally, no cloud account needed
- Real skills: the same patterns you would use in production
- Fast: DuckDB runs in-process, so local builds and tests finish quickly

Happy data engineering!