Companion code repository for Data Transformation: The Definitive Guide by Andrew Madson (O'Reilly, 2026).
SQL-Sushi Co. is the fictional restaurant chain that runs through every chapter of the book. The company operates 35 locations across five metro areas, uses three POS systems (Clover, Square, Toast), and also sells through a mix of delivery platforms. The chapters progressively build, replace, and extend a single data pipeline that transforms raw POS exports into analytics-ready tables. This repo holds the runnable code for every chapter that has it.
Each chapter that includes runnable code has its own directory at the top level. The directory name combines the chapter number with a shortened form of the title, in lowercase with underscores:
sql-sushi/
├── chapter_12_scheduling_sql_pipelines_python/
│ └── sqlsushi-pipeline-v1/ # the Python-runner pipeline
├── chapter_13_workflow_orchestration_airflow/ (coming)
│ └── sqlsushi-pipeline-v2/ # Airflow replaces the runner
├── chapter_14_sql_frameworks_dbt/ (coming)
│ └── sqlsushi-pipeline-v3/ # dbt replaces the SQL files
├── chapter_15_beyond_sql_spark/ (coming)
│ └── sqlsushi-pipeline-v4/ # Spark replaces Postgres
└── ...
Each chapter directory is self-contained. You can cd into one and run that chapter's code without setting up any other chapter. The pipeline-version directories (sqlsushi-pipeline-vN/) let you compare the same logical pipeline across the four progressively more sophisticated implementations the book walks through.
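To see which pipeline versions are present in your checkout, a small script can walk the layout described above. This is a minimal sketch, not part of the repository: the function name `find_pipeline_versions` is hypothetical, and it assumes only the `chapter_*/sqlsushi-pipeline-vN/` naming shown in the tree.

```python
import re
from pathlib import Path

# Matches directory names such as "sqlsushi-pipeline-v1" and captures the version number.
PIPELINE_DIR = re.compile(r"^sqlsushi-pipeline-v(\d+)$")


def find_pipeline_versions(repo_root):
    """Return {version: path} for pipeline directories one level below a chapter directory.

    Hypothetical helper: it relies only on the directory naming convention,
    not on anything inside the directories.
    """
    versions = {}
    for candidate in Path(repo_root).glob("chapter_*/sqlsushi-pipeline-v*"):
        match = PIPELINE_DIR.match(candidate.name)
        if candidate.is_dir() and match:
            versions[int(match.group(1))] = candidate
    # Sort by version number so v1 is listed before v2, and so on.
    return dict(sorted(versions.items()))


if __name__ == "__main__":
    # Run from the repository root.
    for version, path in find_pipeline_versions(".").items():
        print(f"v{version}: {path}")
```

Chapters marked "(coming)" have no directory yet, so they simply do not appear in the result.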
| # | Title | Code |
|---|---|---|
| 1 | Business Challenges and the State of Data Today | no code |
| 2 | Specification Writing | no code |
| 3 | Reproducibility | coming |
| 4 | Backfilling and Reprocessing | coming |
| 5 | Incremental Models | coming |
| 6 | Streaming Data Transformation | coming |
| 7 | Testing and Data Quality | coming |
| 8 | Version Control | no code |
| 9 | CI/CD for Data Pipelines | coming |
| 10 | Observability and Monitoring | coming |
| 11 | Scalability and Performance | coming |
| 12 | Scheduling SQL Pipelines with Python | chapter_12_scheduling_sql_pipelines_python/ |
| 13 | Workflow Orchestration with Apache Airflow | coming |
| 14 | SQL Transformation Frameworks: dbt and SQLMesh | coming |
| 15 | Beyond SQL: Complex Transformations with Apache Spark | coming |
| 16 | Real-Time Data Transformation Patterns | coming |
| 17 | End-to-End Case Study | coming |
| 18 | AI-Assisted Data Engineering | coming |
Each chapter's directory has its own README.md with the specific commands to run that chapter's code, the libraries it pins, and the outputs you should see. For example, to run the Chapter 12 pipeline from the repository root:
cd chapter_12_scheduling_sql_pipelines_python/sqlsushi-pipeline-v1
cp .env.example .env
docker compose up -d
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python scripts/build_seed_data.py
python scripts/generate_data.py
python scripts/runner.py

Roughly 30 seconds from git clone to a populated analytics.fact_sales. See chapter_12_scheduling_sql_pipelines_python/sqlsushi-pipeline-v1/README.md for what to expect.
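If one of the commands above fails, it can help to confirm that the tools and files they refer to are actually present. The sketch below is an assumption-laden preflight check, not something shipped in the repository: `missing_tools` and `missing_files` are hypothetical helpers, and the file list is taken only from the paths named in the commands above. Finding `docker` on PATH does not by itself confirm that the `docker compose` subcommand is installed.

```python
import shutil
from pathlib import Path

# Executables the commands above invoke directly.
REQUIRED_TOOLS = ("docker", "python3")

# Files the commands above read, relative to sqlsushi-pipeline-v1/.
REQUIRED_FILES = (
    ".env.example",
    "requirements.txt",
    "scripts/build_seed_data.py",
    "scripts/generate_data.py",
    "scripts/runner.py",
)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if which(tool) is None]


def missing_files(project_dir, names=REQUIRED_FILES):
    """Return the expected files that do not exist under project_dir (hypothetical helper)."""
    root = Path(project_dir)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from inside sqlsushi-pipeline-v1/.
    print("missing tools:", missing_tools())
    print("missing files:", missing_files("."))
```

Two empty lists mean the basic prerequisites are in place; the chapter README remains the reference for the expected output.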
Found a bug, typo, or factual error in the book or the code? Please open an issue using the Erratum template. Include the chapter, page or section, and a clear description.
See CONTRIBUTING.md for how to suggest improvements. Pull requests that make the code more correct, more reproducible, or more aligned with current best practices are welcome.
The code in this repository is MIT licensed. See LICENSE. The text of Data Transformation: The Definitive Guide is © O'Reilly Media and is not distributed here.