Production-grade dbt patterns across four focused sub-projects: medallion modeling, data quality testing, incremental strategies, and performance & cost optimization.
This repository is published exclusively for educational, academic, and demonstrative purposes.
- All data is fictional — generated programmatically with deterministic seeds. No client data, no real company data, no production data, no PII.
- The code is teaching material, not a consulting deliverable, not a client implementation, not production-tuned configuration.
- No warranty — distributed under Apache 2.0 AS IS. Anyone using this code in a real environment does so at their own risk.
- No confidential information of clients, partners, or third parties is included.
Full legal terms: see DISCLAIMER.md and LICENSE.
dbt is the de facto standard for transformation inside the modern warehouse (Snowflake, BigQuery, Redshift, Databricks SQL, DuckDB). These examples mirror what a real analytics engineering team ships to production — not toy notebooks.
Each sub-project is a self-contained dbt project that runs out of the box against DuckDB (zero credentials needed), and is documented for porting to Snowflake / BigQuery / Redshift / Databricks.
| Project | Topic | What it shows |
|---|---|---|
| dbt_medallion | Staging → Intermediate → Marts | Sources, contracts, snapshots, exposures, macros, full warehouse-native medallion |
| dbt_testing | Data quality that doesn't ping at 1 AM | Generic + singular tests, dbt-expectations, unit tests (dbt 1.8+), Elementary patterns |
| dbt_incremental | Incremental strategies side-by-side | merge vs insert_overwrite vs append vs microbatch (dbt 1.9+), backfill safety, late-arriving data |
| dbt_performance | Cost & performance optimization | Materialization choice, clustering, query tagging, warehouse spend monitoring with metadata |
```bash
# Requires Python 3.10+
pip install dbt-duckdb
cd dbt_medallion
dbt deps
dbt seed
dbt build
dbt docs generate && dbt docs serve
```

Each sub-project ships with a `profiles.yml.example` pointing at a local DuckDB file. To run against Snowflake / BigQuery / Redshift, install the matching adapter package (e.g. `dbt-snowflake`), copy that file to `~/.dbt/profiles.yml` and swap the adapter block; instructions are inside each sub-project's README.
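A minimal sketch of that profile setup, assuming the profile is named `dbt_medallion` (it must match the `profile:` key in the sub-project's `dbt_project.yml`; check the shipped `profiles.yml.example` for the real name). Instead of overwriting `~/.dbt/profiles.yml`, it writes to a scratch directory and points dbt at it through the `DBT_PROFILES_DIR` environment variable. The commented-out Snowflake block shows the kind of adapter block you would swap in; its role, database and warehouse names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: create a DuckDB profile in a scratch dir and point dbt at it.
# The profile name "dbt_medallion" is an assumption; it must match
# the profile: key in dbt_project.yml.
set -euo pipefail

DBT_PROFILES_DIR="$(mktemp -d)"
export DBT_PROFILES_DIR

cat > "$DBT_PROFILES_DIR/profiles.yml" <<'YAML'
dbt_medallion:
  target: dev
  outputs:
    dev:
      type: duckdb          # dbt-duckdb adapter, no credentials needed
      path: dev.duckdb      # local database file, created on first run
      threads: 4
    # To port to Snowflake (requires `pip install dbt-snowflake`),
    # replace the dev block above with something like:
    # dev:
    #   type: snowflake
    #   account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
    #   user: "{{ env_var('SNOWFLAKE_USER') }}"
    #   password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
    #   role: TRANSFORMER
    #   database: ANALYTICS
    #   warehouse: TRANSFORMING
    #   schema: dbt_dev
    #   threads: 4
YAML

echo "profiles written to $DBT_PROFILES_DIR"
```

From inside the sub-project, `dbt debug` then confirms that dbt can find the profile and connect with it.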
| Adapter | Status | Notes |
|---|---|---|
| DuckDB | ✅ Default for examples | Zero credentials, runs in CI |
| Snowflake | ✅ Documented | Performance examples (dbt_performance) include Snowflake-specific patterns |
| BigQuery | ✅ Documented | Incremental microbatch examples include partition/cluster recommendations |
| Redshift | ✅ Documented | merge strategy notes for Redshift's limitations |
| Databricks SQL | ✅ Compatible | Pairs naturally with the databricks-examples companion repo |
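For a feel of what the adapter-specific patterns in the table can look like, this sketch writes two hypothetical model headers: one using dbt-snowflake's `cluster_by` and `query_tag` configs, and one BigQuery microbatch model (dbt 1.9+) with `partition_by` and `cluster_by`, which dbt-bigquery expects for microbatch. Model, column and tag names are made up, each file only compiles on its own adapter, and exact requirements may vary by adapter version.

```bash
#!/usr/bin/env bash
# Sketch: adapter-specific model configs. Model, column and tag names are hypothetical.
set -euo pipefail

WORKDIR="$(mktemp -d)"
mkdir -p "$WORKDIR/models"

# Snowflake (dbt-snowflake): clustering key, plus a query tag so warehouse
# spend for these queries can be attributed in query history.
cat > "$WORKDIR/models/fct_orders_snowflake.sql" <<'SQL'
{{ config(
    materialized='table',
    cluster_by=['order_date'],
    query_tag='dbt_performance'
) }}
select * from {{ ref('stg_orders') }}
SQL

# BigQuery (dbt-bigquery, dbt 1.9+): microbatch splits the work into one
# batch per day of ordered_at, writing into a day-partitioned, clustered table.
cat > "$WORKDIR/models/fct_orders_bigquery.sql" <<'SQL'
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='ordered_at',
    begin='2024-01-01',
    batch_size='day',
    partition_by={'field': 'ordered_at', 'data_type': 'timestamp', 'granularity': 'day'},
    cluster_by=['customer_id']
) }}
select * from {{ ref('stg_orders') }}
SQL

echo "wrote models to $WORKDIR/models"
```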
- One project = one topic. No cross-imports between sub-projects.
- Seeds are deterministic. Fake data is generated with fixed seeds so test results are reproducible.
- Every model has a `description`. If `dbt docs generate` shows blanks, that's a bug.
- Every public model has at least one `not_null` and one `unique` test on its primary key. Period.
- No `dbt run` in CI, only `dbt build`. Because tests are not optional.
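A sketch of a schema file that satisfies these conventions, written from the shell so it can be dropped into any sub-project. `dim_customers` and `customer_id` are hypothetical names; `data_tests:` is the dbt 1.8+ spelling of the older `tests:` key.

```bash
#!/usr/bin/env bash
# Sketch: a schema file following the conventions above.
# dim_customers and customer_id are hypothetical names.
set -euo pipefail

WORKDIR="$(mktemp -d)"
mkdir -p "$WORKDIR/models/marts"

cat > "$WORKDIR/models/marts/schema.yml" <<'YAML'
version: 2

models:
  - name: dim_customers
    description: "One row per customer."
    columns:
      - name: customer_id
        description: "Primary key."
        data_tests:          # dbt 1.8+ key; older versions use `tests:`
          - not_null
          - unique
YAML

# In CI, run the whole DAG with `dbt build` (not `dbt run`) so that a
# failing not_null/unique test fails the job and skips downstream models.
echo "wrote $WORKDIR/models/marts/schema.yml"
```

Because `dbt build` runs tests in DAG order, a failing `unique` test on `dim_customers` fails the job and skips everything downstream of it, which `dbt run` alone would never catch.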
Curious how a real team runs dbt in production (Kubernetes, GitHub Actions, approval gates)? See docs/DEPLOYMENT_PATTERN.md — explanatory, not executable.
Apache 2.0 — see LICENSE.
Built by Napoli Data — Data & AI Engineering. We publish in the open what others charge for. If this is what we give away, imagine what we ship under contract.
Companion repo (Lakehouse / PySpark): databricks-examples