This project demonstrates how to properly configure and organize a dbt project for Materialize, showcasing best practices for building real-time data objects.
This example teaches you how to:
- Configure dbt models as views, indexed views, and materialized views in Materialize
- Set up sources using Materialize's load generators
- Create sinks to export data to Kafka
- Organize your project with proper layer separation (staging, intermediate, marts)
- Use dbt selectors for controlled deployment
- Implement comprehensive testing strategies
- Deploy with CI/CD and blue/green deployments using GitHub Actions
The project uses an auction house scenario to illustrate these concepts, implementing flipper detection, health monitoring, and suspicious activity alerts as practical examples of real-time data transformations.
This project includes complete CI/CD workflows for automated testing and zero-downtime deployments:
- dbt CI - Validates SQL compilation on every pull request
- Deploy to QA - Auto-deploys to the QA environment when code merges to `main`
- Blue/Green Production - Zero-downtime production deployments using native dbt-materialize macros
- Scheduled Cleanup - Automatically removes old deployment environments
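As a hedged sketch of the pull-request check (the project's actual workflow files under .github/workflows/ are the source of truth; the Python version and validation command here are assumptions):

```yaml
# Illustrative sketch of a dbt CI job, not the project's actual workflow
name: dbt CI
on:
  pull_request:

jobs:
  compile:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dbt-materialize
        run: pip install dbt-materialize
      - name: Install dbt packages
        run: dbt deps
      - name: Validate SQL compilation
        # Compiling against a live environment needs connection details,
        # typically supplied to the runner as environment variables from repository secrets.
        run: dbt compile --profiles-dir . --profile materialize_auction_house
```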
Deploy to QA:

```bash
git push origin main   # Automatic deployment
```

Deploy to Production:

```bash
git tag v1.0.0
git push origin v1.0.0
gh release create v1.0.0   # Triggers blue/green deployment
```

- 📘 CI/CD Setup Guide - Complete setup instructions for GitHub Actions
- 🔄 Blue/Green Deployment - Zero-downtime deployment guide (includes two workflow options)
- ✅ Deployment Checklist - Pre and post-deployment validation
- 🧪 Testing Guide - Comprehensive test suite for workflows and deployments
This project showcases key Materialize features with practical examples:
- Views - Staging and intermediate transformations without storage overhead
- Indexes - Query optimization on intermediate views
- Materialized Views - Real-time maintained analytics in the marts layer
- Sources - Load generator creating streaming auction data
- Sinks - Kafka exports with DEBEZIUM envelope format
📚 See MATERIALIZE_CONCEPTS.md for detailed explanations and examples of each concept.
This project follows dbt best practices for organizing transformations into distinct layers. Understanding this structure is key to building maintainable real-time data pipelines.
```
materialize_auction_house/
├── dbt_project.yml        # Project configuration - defines project name, version, and model defaults
├── profiles.yml           # Connection profiles for dev/staging/production environments
├── packages.yml           # External package dependencies (e.g., dbt-utils)
├── selectors.yml          # Custom selectors for deployment patterns
├── models/                # All data transformations
│   ├── sources/           # Source definitions (Materialize-specific)
│   │   └── *.sql          # CREATE SOURCE statements
│   ├── staging/           # Raw data preparation layer
│   │   ├── _sources.yml   # Source configurations and tests
│   │   ├── schema.yml     # Model documentation and tests
│   │   └── stg_*.sql      # Staging transformations (views)
│   ├── intermediate/      # Business logic layer
│   │   ├── schema.yml     # Model documentation and tests
│   │   └── int_*.sql      # Intermediate transformations (views + indexes)
│   ├── marts/             # Analytics layer
│   │   ├── schema.yml     # Model documentation and tests
│   │   └── *.sql          # Final models (materialized views)
│   └── sinks/             # Data export layer (Materialize-specific)
│       └── sink_*.sql     # Kafka sink definitions
├── seeds/                 # Static reference data
│   └── *.csv              # CSV files loaded into tables
├── scripts/               # Utility scripts
│   ├── setup.sh           # Environment setup
│   └── *.sql              # Database initialization
└── docker-compose.yml     # Local development environment
```
The main project configuration file, dbt_project.yml (dbt docs), defines:
- Project metadata (name, version)
- Model materialization defaults by directory
- Variable definitions for environment-specific settings
- Cluster assignments for Materialize workloads
Example from this project:
```yaml
models:
  materialize_auction_house:
    staging:
      +materialized: view        # Simple views for data prep
      +schema: staging
      +cluster: compute
    intermediate:
      +materialized: view        # Views that can be indexed
      +schema: intermediate
      +cluster: compute
    marts:
      +schema: marts
      +cluster: compute          # Materialized views defined per-model
```

The profiles.yml file defines connection parameters (dbt docs) for different environments:
- Development (local Materialize)
- Staging (test environment)
- Production (Materialize Cloud)
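For local development against the Docker Compose stack, a minimal `profiles.yml` output might look like the sketch below (the target name, schema, and cluster are assumptions; adjust to match the actual file):

```yaml
materialize_auction_house:
  target: dev
  outputs:
    dev:
      type: materialize
      host: localhost
      port: 6875
      user: materialize
      pass: ""
      database: materialize
      schema: public
      cluster: compute      # assumed default cluster for local dev
      sslmode: disable
```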
The selectors.yml file contains custom selector definitions (dbt docs) for deployment patterns:
- Run only sources, only transformations, or only sinks
- Exclude certain model types by default
- Group related models for testing
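A hedged sketch of what one such selector definition can look like (the project's actual selectors.yml may use different methods or names):

```yaml
selectors:
  - name: transformations
    description: Staging, intermediate, and marts models only
    definition:
      union:
        - method: path
          value: models/staging
        - method: path
          value: models/intermediate
        - method: path
          value: models/marts
```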
- Staging Layer (`models/staging/`)
  - One-to-one with source tables
  - Minimal transformations (renaming, casting)
  - Always views (no persistence needed)
  - Naming convention: `stg_<source>_<table>`
- Intermediate Layer (`models/intermediate/`)
  - Business logic and complex joins
  - Can be indexed for performance (see the sketches after this list)
  - Reusable transformations
  - Naming convention: `int_<entity>_<verb>`
- Marts Layer (`models/marts/`)
  - Final consumption-ready models
  - Often materialized views for real-time results
  - Optimized for specific use cases
  - Naming convention: `fct_` (facts) or `dim_` (dimensions)
- Sources (`models/sources/`)
  - Materialize-specific source definitions
  - Load generators, Kafka, PostgreSQL, etc.
  - Managed by dbt for better lineage
- Sinks (`models/sinks/`)
  - Materialize-specific data exports
  - Stream results to Kafka topics
  - Real-time data pipelines
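The sketches below illustrate, under stated assumptions, how models in these layers can be configured with the dbt-materialize adapter. Model, column, topic, and connection names are hypothetical unless they appear elsewhere in this README, and the materialized-view keyword (`materialized_view` vs. the legacy `materializedview`) depends on your adapter version, so treat these as illustrations rather than copies of the project's models.

A source model (source and sink materializations take the body of the corresponding CREATE statement):

```sql
-- models/sources/auction_load_generator.sql (illustrative)
{{ config(materialized='source', cluster='sources') }}

FROM LOAD GENERATOR AUCTION (TICK INTERVAL '1s')
FOR ALL TABLES
```

An intermediate view with an index maintained by Materialize:

```sql
-- models/intermediate/int_winning_bids.sql (illustrative)
{{ config(
    materialized='view',
    indexes=[{'columns': ['auction_id', 'buyer_id']}]
) }}

select auction_id, buyer_id, amount, bid_time
from {{ ref('stg_auction_bids') }}        -- hypothetical staging model
where is_winning_bid
```

A marts model maintained as a materialized view:

```sql
-- models/marts/fct_flip_counts.sql (illustrative, simplified)
{{ config(materialized='materialized_view', cluster='compute') }}

select buyer_id, count(*) as total_flips
from {{ ref('int_winning_bids') }}
group by buyer_id
```

A Kafka sink streaming a mart to a topic:

```sql
-- models/sinks/sink_flippers.sql (illustrative)
{{ config(materialized='sink', cluster='sinks') }}

FROM {{ ref('fct_auction_flippers') }}
INTO KAFKA CONNECTION kafka_connection (TOPIC 'auction-flippers')   -- topic name assumed
FORMAT JSON
ENVELOPE DEBEZIUM
```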
- Docker and Docker Compose
- dbt-materialize adapter: `pip install dbt-materialize`
- psql (PostgreSQL client) for running setup scripts
- Start the services:

  ```bash
  docker compose up -d
  ```

  This starts:
  - Materialize (port 6875)
  - Redpanda/Kafka (port 9092)
  - Redpanda Console (port 8080, a web UI for Kafka)
- Run the setup script:

  ```bash
  psql "postgresql://materialize@localhost:6875/materialize" -f scripts/setup_materialize.sql
  ```

  Or use the automated setup:

  ```bash
  ./scripts/setup.sh
  ```
- Install dbt dependencies:

  ```bash
  pip install dbt-materialize
  dbt deps
  ```
- Test the connection:

  ```bash
  dbt debug --profiles-dir . --profile materialize_auction_house
  ```

- Run the project:

  ```bash
  dbt seed --profiles-dir . --profile materialize_auction_house                          # Load known flippers
  dbt run --selector sources_only --profiles-dir . --profile materialize_auction_house   # Create sources (first time only)
  dbt run --profiles-dir . --profile materialize_auction_house                           # Create all models except sources
  dbt test --profiles-dir . --profile materialize_auction_house                          # Run tests
  ```
  Or use selectors:

  ```bash
  dbt run --selector sources_only --profiles-dir . --profile materialize_auction_house       # Only create sources
  dbt run --profiles-dir . --profile materialize_auction_house                               # Default: all models except sources/sinks
  dbt run --selector transformations --profiles-dir . --profile materialize_auction_house    # Only staging, intermediate, marts
  dbt run --selector sinks_only --profiles-dir . --profile materialize_auction_house         # Only create sinks
  ```
- Materialize Console: `psql "postgresql://materialize@localhost:6875/materialize"`
- Redpanda Console: http://localhost:8080 (view topics and messages)
- View streaming data:

  ```bash
  # Connect to Materialize
  psql "postgresql://materialize@localhost:6875/materialize"
  ```

  ```sql
  -- Watch flippers in real-time
  COPY (SUBSCRIBE (SELECT * FROM public_marts.fct_auction_flippers)) TO STDOUT;
  ```
This project includes comprehensive tests for workflows and deployments:
```bash
# Quick test using Make
make test             # Run all tests (workflow + deployment)
make test-workflows   # Validate GitHub Actions workflows only
make test-deployment  # Test blue/green deployment macros
make test-dbt         # Run dbt data quality tests

# Manual test execution
./scripts/test_workflows.sh              # Workflow validation
./scripts/test_blue_green_deployment.sh  # Deployment integration tests
```

What's tested:
- ✅ GitHub Actions workflow structure and syntax
- ✅ Blue/green deployment macros (`deploy_init`, `deploy_promote`, etc.)
- ✅ Atomic swap and rollback capabilities
- ✅ Cluster hydration checks
- ✅ Data quality via dbt tests
See TESTING.md for complete testing documentation.
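As a hedged sketch of how the native macros are driven (the exact sequence, targets, and any required vars are documented in the Blue/Green Deployment guide above):

```bash
dbt run-operation deploy_init      # create a parallel "green" deployment environment
# ...build the project into the new environment and wait for cluster hydration...
dbt run-operation deploy_promote   # atomically swap the green environment into production
```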
```bash
make clean               # Clean all artifacts
docker compose down -v   # Stop and remove containers/volumes
```

For Materialize Cloud or production deployments:
- Update `profiles.yml` with your Materialize Cloud credentials:

  ```yaml
  materialize_auction_house:
    outputs:
      production:
        type: materialize
        host: <your-host>.aws.materialize.cloud
        port: 6875
        user: <your-email>
        pass: <your-app-password>
        database: materialize
        schema: public
        cluster: production_cluster
        sslmode: require
  ```
- Initialize Materialize resources:

  ```sql
  -- Create clusters
  CREATE CLUSTER sources REPLICAS (r1 (SIZE = '25cc'));
  CREATE CLUSTER compute REPLICAS (r1 (SIZE = '25cc'));
  CREATE CLUSTER sinks REPLICAS (r1 (SIZE = '25cc'));

  -- Create source
  CREATE SOURCE auction_house
    IN CLUSTER sources
    FROM LOAD GENERATOR AUCTION (TICK INTERVAL '1s', AS OF 100000)
    FOR ALL TABLES;

  -- Configure Kafka for production
  CREATE SECRET kafka_password AS 'your-password';
  CREATE CONNECTION kafka_connection TO KAFKA (
      BROKER 'your-kafka-broker:9092',
      SASL MECHANISMS = 'PLAIN',
      SASL USERNAME = 'user',
      SASL PASSWORD = SECRET kafka_password
  );
  ```
- Run dbt against production:

  ```bash
  dbt run --target production --profiles-dir . --profile materialize_auction_house
  dbt test --target production --profiles-dir . --profile materialize_auction_house
  ```
```bash
# Run all models
dbt run --profiles-dir . --profile materialize_auction_house

# Run models by layer
dbt run --select staging --profiles-dir . --profile materialize_auction_house        # Raw data preparation
dbt run --select intermediate --profiles-dir . --profile materialize_auction_house   # Business logic
dbt run --select marts --profiles-dir . --profile materialize_auction_house          # Final analytics

# Run tests
dbt test --profiles-dir . --profile materialize_auction_house

# Load seed data
dbt seed --profiles-dir . --profile materialize_auction_house

# Generate documentation
dbt docs generate --profiles-dir . --profile materialize_auction_house

# Serve documentation locally
dbt docs serve --profiles-dir . --profile materialize_auction_house --port 8080
```

The documentation includes:
- Model Lineage: Visual DAG showing data flow from sources to sinks
- Model Documentation: Descriptions and column-level details for all models
- Test Results: Data quality test results integrated into the docs
- Schema Information: Complete catalog of tables, views, and materialized views
The project includes pre-configured selectors for common deployment patterns:
- `default`: Runs all transformations (excludes sources and sinks)
- `sources_only`: Only create/update sources
- `sinks_only`: Only create/update sinks
- `transformations`: Only staging, intermediate, and marts
- `staging_only`: Only staging models
- `intermediate_only`: Only intermediate models
- `marts_only`: Only mart models
- `flippers`: Models related to flipper detection
- `metrics`: Models related to metrics and monitoring
- `no_sinks`: Everything except sinks
Example usage:
```bash
# Initial setup - create sources first
dbt run --selector sources_only --profiles-dir . --profile materialize_auction_house

# Regular deployment - transformations only
dbt run --selector transformations --profiles-dir . --profile materialize_auction_house

# Update only metrics models
dbt run --selector metrics --profiles-dir . --profile materialize_auction_house
```

- Views: Efficient transformations without persistence (staging/, intermediate/)
- Materialized Views: Incrementally maintained results (marts/)
- Indexes: Multi-column indexes for join optimization (int_winning_bids)
- Sources: Load generator for streaming data (auction_load_generator)
- Sinks: Real-time Kafka exports with DEBEZIUM envelope (sinks/)
- Clusters: Workload isolation (sources, compute, sinks)
- Live auction status tracking with window functions
- Continuous flipper detection using self-joins (see the sketch after this list)
- Streaming health metrics with hourly aggregations
- Real-time suspicious activity alerts with pattern matching
- Modular model architecture with clear layer separation
- 60+ data quality tests including custom tests for real-time data
- Full documentation with column descriptions
- Environment-specific configurations (dev/staging/production)
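As a conceptual sketch of the self-join approach to flipper detection mentioned above (not the project's actual model; relation and column names are hypothetical):

```sql
-- A "flipper" wins an auction and then resells the same item at a profit
-- within the configured threshold window.
select
    buy.buyer_id                  as flipper_id,
    count(*)                      as total_flips,
    avg(sell.amount - buy.amount) as avg_profit
from winning_bids buy
join auctions resale
  on resale.seller_id = buy.buyer_id
 and resale.item_id   = buy.item_id
 and resale.end_time >= buy.bid_time
 and resale.end_time <= buy.bid_time + interval '8 days'   -- flipper_threshold_days
join winning_bids sell
  on sell.auction_id = resale.auction_id
where sell.amount >= buy.amount * 1.20                     -- min_profit_margin
group by buy.buyer_id
```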
This project includes custom test macros designed for real-time data validation:
- `value_in_range`: Validates that numeric values fall within expected bounds
- `outlier_detection`: Uses z-score method to identify statistical outliers
- `data_recency`: Ensures data has been updated within a specified time window
- `monotonic_increase`: Verifies that values only increase over time (useful for counters)
- `referential_integrity_cascade`: Tests relationships across multiple table levels
Example usage in models/marts/schema.yml:
```yaml
- name: total_flips
  tests:
    - value_in_range:
        min_value: 1
        max_value: 1000
    - outlier_detection:
        z_threshold: 3.5
        config:
          severity: warn  # Warning instead of error for expected outliers
```
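A generic test such as `value_in_range` follows the standard dbt generic-test pattern; the project's actual macro may differ, but a minimal sketch looks like this:

```sql
-- macros/tests/value_in_range.sql (illustrative sketch)
{% test value_in_range(model, column_name, min_value, max_value) %}

-- Return the rows that violate the bounds; dbt fails the test if any are returned.
select *
from {{ model }}
where {{ column_name }} < {{ min_value }}
   or {{ column_name }} > {{ max_value }}

{% endtest %}
```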
Key variables in dbt_project.yml:
- `flipper_threshold_days`: Days to consider for flip detection (default: 8)
- `min_profit_margin`: Minimum profit margin for flippers (default: 0.20)
- Cluster names for sources, compute, and sinks
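These live in the `vars:` block of dbt_project.yml and are read in models with `var()`. A hedged sketch (the cluster variable names are assumptions):

```yaml
vars:
  flipper_threshold_days: 8     # default noted above
  min_profit_margin: 0.20       # default noted above
  sources_cluster: sources      # assumed variable name
  compute_cluster: compute      # assumed variable name
  sinks_cluster: sinks          # assumed variable name
```

In a model, a threshold is then referenced as `{{ var('flipper_threshold_days') }}`.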
This example project is provided as-is for demonstration purposes.