andymadson/sql-sushi

sql-sushi

Companion code repository for Data Transformation: The Definitive Guide by Andrew Madson (O'Reilly, 2026).

SQL-Sushi Co. is the fictional restaurant chain that runs through every chapter of the book. The company operates 35 locations across five metros, using three POS systems (Clover, Square, Toast) plus a mix of delivery platforms. The chapters progressively build, replace, and extend a single data pipeline that transforms raw POS exports into analytics-ready tables. This repo holds the runnable code for every chapter that has it.

How the repo is organized

Each chapter that includes runnable code has its own directory at the top level. The directory name combines the chapter number with a shortened form of the chapter title, lowercase with underscores:

sql-sushi/
├── chapter_12_scheduling_sql_pipelines_python/
│   └── sqlsushi-pipeline-v1/        # the Python-runner pipeline
├── chapter_13_workflow_orchestration_airflow/   (coming)
│   └── sqlsushi-pipeline-v2/        # Airflow replaces the runner
├── chapter_14_sql_frameworks_dbt/   (coming)
│   └── sqlsushi-pipeline-v3/        # dbt replaces the SQL files
├── chapter_15_beyond_sql_spark/     (coming)
│   └── sqlsushi-pipeline-v4/        # Spark replaces Postgres
└── ...

Each chapter directory is self-contained. You can cd into one and run that chapter's code without setting up any other chapter. The pipeline-version directories (sqlsushi-pipeline-vN/) let you compare the same logical pipeline across the four progressively more sophisticated implementations the book walks through.
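To line the pipeline versions up for comparison, the layout above can be walked with a few lines of Python. This is a minimal sketch rather than anything shipped in the repo: the function name `find_pipeline_versions` is hypothetical, and it assumes only the `chapter_*/sqlsushi-pipeline-vN/` naming shown in the tree.

```python
import re
from pathlib import Path

# Matches the pipeline-version directory names shown in the tree above.
PIPELINE_DIR = re.compile(r"^sqlsushi-pipeline-v(\d+)$")


def find_pipeline_versions(repo_root):
    """Return {version: path} for each chapter_*/sqlsushi-pipeline-vN directory.

    Hypothetical helper: assumes the directory layout described in this README.
    """
    versions = {}
    for chapter_dir in sorted(Path(repo_root).glob("chapter_*")):
        if not chapter_dir.is_dir():
            continue
        for child in chapter_dir.iterdir():
            match = PIPELINE_DIR.match(child.name)
            if match and child.is_dir():
                versions[int(match.group(1))] = child
    # Order by version number so v1..vN print in sequence.
    return dict(sorted(versions.items()))


if __name__ == "__main__":
    for version, path in find_pipeline_versions(".").items():
        print(f"v{version}: {path}")
```

Run from the repo root, this prints one line per pipeline version that is present. Chapters still marked as coming simply do not appear.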

Chapters

| # | Title | Code |
|---|-------|------|
| 1 | Business Challenges and the State of Data Today | no code |
| 2 | Specification Writing | no code |
| 3 | Reproducibility | coming |
| 4 | Backfilling and Reprocessing | coming |
| 5 | Incremental Models | coming |
| 6 | Streaming Data Transformation | coming |
| 7 | Testing and Data Quality | coming |
| 8 | Version Control | no code |
| 9 | CI/CD for Data Pipelines | coming |
| 10 | Observability and Monitoring | coming |
| 11 | Scalability and Performance | coming |
| 12 | Scheduling SQL Pipelines with Python | chapter_12_scheduling_sql_pipelines_python/ |
| 13 | Workflow Orchestration with Apache Airflow | coming |
| 14 | SQL Transformation Frameworks: dbt and SQLMesh | coming |
| 15 | Beyond SQL: Complex Transformations with Apache Spark | coming |
| 16 | Real-Time Data Transformation Patterns | coming |
| 17 | End-to-End Case Study | coming |
| 18 | AI-Assisted Data Engineering | coming |

Each chapter's directory has its own README.md with the specific commands to run that chapter's code, the libraries it pins, and the outputs you should see.

Quick start (Chapter 12)

cd chapter_12_scheduling_sql_pipelines_python/sqlsushi-pipeline-v1
cp .env.example .env
docker compose up -d

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

python scripts/build_seed_data.py
python scripts/generate_data.py
python scripts/runner.py

Roughly 30 seconds from git clone to a populated analytics.fact_sales. See chapter_12_scheduling_sql_pipelines_python/sqlsushi-pipeline-v1/README.md for what to expect.
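If one of the quick-start commands fails, a small preflight check can show whether a file those commands rely on is missing. The sketch below is illustrative and not part of the repo: `missing_quickstart_files` is a hypothetical helper, and the file list is taken only from the commands shown above, so it does not check the Docker Compose file, whose name this README does not state.

```python
from pathlib import Path

# Files the quick-start commands read or execute, relative to the
# sqlsushi-pipeline-v1/ directory. The list comes from this README only.
QUICKSTART_FILES = (
    ".env",
    "requirements.txt",
    "scripts/build_seed_data.py",
    "scripts/generate_data.py",
    "scripts/runner.py",
)


def missing_quickstart_files(pipeline_dir="."):
    """Return the quick-start files that are absent under pipeline_dir."""
    root = Path(pipeline_dir)
    return [name for name in QUICKSTART_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_quickstart_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All quick-start files are present.")
```

Run it from inside sqlsushi-pipeline-v1/. A missing `.env` usually means the `cp .env.example .env` step was skipped.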

Reporting errata

Found a bug, typo, or factual error in the book or the code? Please open an issue using the Erratum template. Include the chapter, page or section, and a clear description.

Contributing

See CONTRIBUTING.md for how to suggest improvements. Pull requests that make the code more correct, more reproducible, or more aligned with current best practices are welcome.

License

The code in this repository is MIT licensed. See LICENSE. The text of Data Transformation: The Definitive Guide is © O'Reilly Media and is not distributed here.
