NapoliD/dbt-examples

dbt Examples — Warehouse-native Analytics Engineering

Production-grade dbt patterns across four focused sub-projects: medallion modeling, data quality testing, incremental strategies, and performance & cost optimization.

⚠️ Educational & academic use only

This repository is published exclusively for educational, academic, and demonstrative purposes.

  • All data is fictional — generated programmatically with deterministic seeds. No client data, no real company data, no production data, no PII.
  • The code is teaching material, not a consulting deliverable, not a client implementation, not production-tuned configuration.
  • No warranty — distributed under Apache 2.0 AS IS. Anyone using this code in a real environment does so at their own risk.
  • No confidential information of clients, partners, or third parties is included.

Full legal terms: see DISCLAIMER.md and LICENSE.


Why this repo exists

dbt is the de facto standard for transformation inside the modern warehouse (Snowflake, BigQuery, Redshift, Databricks SQL, DuckDB). These examples mirror what a real analytics engineering team ships to production — not toy notebooks.

Each sub-project is a self-contained dbt project that runs out of the box against DuckDB (zero credentials needed), and is documented for porting to Snowflake / BigQuery / Redshift / Databricks.


Sub-projects

| Project | Topic | What it shows |
| --- | --- | --- |
| dbt_medallion | Staging → Intermediate → Marts | Sources, contracts, snapshots, exposures, macros, full warehouse-native medallion |
| dbt_testing | Data quality that doesn't ping at 1 AM | Generic + singular tests, dbt-expectations, unit tests (dbt 1.8+), Elementary patterns |
| dbt_incremental | Incremental strategies side-by-side | merge vs insert_overwrite vs append vs microbatch (1.9), backfill safety, late-arriving data |
| dbt_performance | Cost & performance optimization | Materialization choice, clustering, query tagging, warehouse spend monitoring with metadata |
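
To make the microbatch and late-arriving-data topics concrete, here is a minimal sketch of how such a model could be configured in a properties file. The model name, column name, and dates are hypothetical and not taken from this repository, and adapter support for microbatch should be checked against the adapter's own documentation.

```yaml
# models/schema.yml (sketch; model and column names are hypothetical)
version: 2

models:
  - name: fct_events                     # hypothetical model name
    config:
      materialized: incremental
      incremental_strategy: microbatch   # available from dbt 1.9
      event_time: event_occurred_at      # hypothetical timestamp column used to slice batches
      batch_size: day                    # one batch per day
      begin: "2024-01-01"                # assumed earliest date to backfill from
      lookback: 3                        # reprocess the 3 most recent batches to pick up late-arriving rows
```

A larger lookback trades extra compute for more tolerance of late rows, which is the kind of trade-off the dbt_incremental project compares across strategies.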

Quick start (zero credentials)

# Requires Python 3.10+
pip install dbt-duckdb

cd dbt_medallion
dbt deps
dbt seed
dbt build
dbt docs generate && dbt docs serve

Each sub-project ships with a profiles.yml.example pointing at a local DuckDB file. To run against Snowflake / BigQuery / Redshift, install the matching adapter package (for example, dbt-snowflake), copy that file to ~/.dbt/profiles.yml, and swap the adapter block. Instructions are in each sub-project's README.
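
For orientation, a DuckDB profile typically looks like the sketch below. The profile name, target name, and database path are assumptions for illustration; the actual values live in each sub-project's profiles.yml.example.

```yaml
# profiles.yml (sketch; the profile name and database path are assumptions)
dbt_medallion:            # must match the `profile:` key in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: duckdb        # adapter provided by the dbt-duckdb package
      path: dev.duckdb    # local database file, created on first run
      threads: 4
```

Porting to another warehouse means replacing the `dev` output with that adapter's connection settings (type, account or project, credentials) while keeping the profile name unchanged.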


Tested adapter matrix

| Adapter | Status | Notes |
| --- | --- | --- |
| DuckDB | ✅ Default for examples | Zero credentials, runs in CI |
| Snowflake | ✅ Documented | Performance examples (dbt_performance) include Snowflake-specific patterns |
| BigQuery | ✅ Documented | Incremental microbatch examples include partition/cluster recommendations |
| Redshift | ✅ Documented | merge strategy notes for Redshift's limitations |
| Databricks SQL | ✅ Compatible | Pairs naturally with the databricks-examples companion repo |

Repository conventions

  • One project = one topic. No cross-imports between sub-projects.
  • Seeds are deterministic. Fake data is generated with fixed seeds so test results are reproducible.
  • Every model has a description. If dbt docs generate shows blanks, that's a bug.
  • Every public model has at least one not_null and one unique test on its primary key. Period.
  • No dbt run in CI, only dbt build, which runs seeds, snapshots, models, and their tests together in DAG order. Because tests are not optional.
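
The description and primary-key conventions above can be sketched in a properties file like the one below. The model and column names are hypothetical, not taken from this repository.

```yaml
# models/marts/schema.yml (sketch; model and column names are hypothetical)
version: 2

models:
  - name: dim_customers                  # hypothetical public model
    description: "One row per customer."
    columns:
      - name: customer_id                # hypothetical primary key column
        description: "Primary key of the customer dimension."
        data_tests:                      # dbt 1.8+ key; earlier versions use `tests:`
          - unique
          - not_null
```

With this in place, dbt build fails as soon as the key contains a duplicate or a null, before any downstream model is built on top of it.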

How dbt is operated in production

Curious how a real team runs dbt in production (Kubernetes, GitHub Actions, approval gates)? See docs/DEPLOYMENT_PATTERN.md — explanatory, not executable.


License

Apache 2.0 — see LICENSE.


About

Built by Napoli Data — Data & AI Engineering. We publish in the open what others charge for. If this is what we give away, imagine what we ship under contract.

Companion repo (Lakehouse / PySpark): databricks-examples
