pg-warehouse
Mirror PostgreSQL → DuckDB. Build versioned analytical releases. Export Parquet. No cloud required.
Getting data out of PostgreSQL for analytics or ML usually means stitching together Python scripts, cron jobs, and a cloud warehouse you don't need. pg-warehouse replaces that with a single binary: sync tables into an embedded DuckDB, build versioned analytical releases, and export to Parquet or CSV. Everything runs locally, on your machine, with no external dependencies.
| Features | Exist ? |
|---|---|
| Analytical Storage (Columnar + Optimized for Reads) | ✅ |
| Separation from OLTP (Workload Isolation) | ✅ |
| SQL Analytical Engine | ✅ |
| Data Transformation Layer (ETL / ELT) | ✅ |
| Durable Analytical Storage (Files or Tables) | ✅ |
| pg-warehouse | pg_dump | Airbyte | dbt | |
|---|---|---|---|---|
| PostgreSQL sync | Full, incremental, CDC | Full only | Full, incremental, CDC | -- |
| Local analytics | DuckDB (columnar) | -- | -- | DuckDB adapter |
| Parquet/CSV export | Built-in | -- | Via connectors | Via packages |
| Model graph (DAG) | Built-in | -- | -- | Core strength |
| Contracts | Built-in | -- | -- | Via packages |
| Infrastructure | Single binary | Single binary | Docker + Java | Python + adapter |
| Cloud required | No | No | Optional | Optional |
A local-first Data Warehouse engine that mirrors PostgreSQL data into DuckDB using native PostgreSQL CDC. Best for teams that want CDC + SQL transforms + Parquet without standing up Kafka, Spark, or a warehouse.
pg-warehouse uses three DuckDB files following Medallion Architecture:
| File | Layer | Purpose |
|---|---|---|
raw.duckdb |
Bronze | CDC black box. Deduped PostgreSQL mirror. CDC owns it exclusively. |
silver.duckdb |
Silver | Development platform. v0 = raw snapshot, v1 = your models. |
feature.duckdb |
Gold | Analytics output. v0 = silver snapshot, v1 = your features. |
Models use ref() for dependencies. pg-warehouse resolves the DAG and builds in the correct order.
| Doc | Description |
|---|---|
| Architecture | Hexagonal design, layers, port interfaces |
| Quickstart | Full walkthrough with examples |
| State Database | SQLite schema and semantics |
| CDC Guide | Logical replication setup and lifecycle |
| Sync Modes | Full vs. incremental vs. CDC |
| Configuration | YAML reference |
| Open-Core Strategy | OSS vs. commercial boundary |
| Development Workflow | Models, contracts, DAG-resolved builds |
| Multi-DuckDB Architecture | Zero-downtime CDC with three DuckDB files |
pg-warehouse refresh # snapshot raw → silver v0
pg-warehouse build # build all models in DAG orderhttps://github.com/burnside-project/pg-warehouse/blob/main/docs/08-development-workflow.md
Homebrew (macOS / Linux):
brew install burnside-project/tap/pg-warehouseGo install:
go install github.com/burnside-project/pg-warehouse/cmd/pg-warehouse@latestDownload binary — see Releases for Linux, macOS, and Windows (amd64/arm64).
Build from source:
git clone https://github.com/burnside-project/pg-warehouse.git
cd pg-warehouse
make buildpg-warehouse runs from a single working directory. All paths in pg-warehouse.yml are relative to this directory.
~/pg-warehouse/ # Working directory
├── pg-warehouse # Binary
├── pg-warehouse.yml # Configuration
├── raw.duckdb # CDC black box
├── silver.duckdb # Silver development platform
├── feature.duckdb # Feature analytics output
├── .pgwh/
│ └── state.db # SQLite state (sync/CDC/builds)
├── models/
│ ├── silver/ # Silver layer SQL models
│ └── features/ # Feature layer SQL models
├── contracts/ # Data contracts (YAML)
├── releases/ # Release definitions (YAML)
├── out/ # Parquet/CSV exports
└── cdc.log # CDC log (when running via nohup)
1. Initialize — creates DuckDB files, state DB, and scaffolds directories:
$ mkdir -p ~/pg-warehouse && cd ~/pg-warehouse
$ pg-warehouse init --config pg-warehouse.yml2. Start CDC — stream changes from PostgreSQL:
$ pg-warehouse cdc setup --config pg-warehouse.yml
$ nohup pg-warehouse cdc start --config pg-warehouse.yml > cdc.log 2>&1 &3. Refresh — snapshot raw data into silver v0:
$ pg-warehouse refresh4. Build — build all models in DAG order:
$ pg-warehouse build5. Validate — check contracts, models, DAG, releases:
$ pg-warehouse validateSync
- Full table snapshots
- Incremental sync via watermark columns
- CDC streaming via PostgreSQL logical replication (pglogrepl)
- Automatic sync mode detection per table
Analytical Releases
- Models with
ref()andsource()dependency declarations - DAG-resolved build ordering (topological sort, cycle detection)
- Data contracts (YAML) for schema validation
- Named releases with versioned model bundles
- Terraform-style
--planwith SQL validation via EXPLAIN - Build history and promotion tracking in
_meta - Partial builds:
build --select model_name
Multi-DuckDB Architecture
- 3-file isolation: raw (CDC), silver (transforms), feature (analytics)
- Zero-downtime CDC — pipeline never stops CDC
- Reserved schema protection (raw, stage, v0, _meta) with DANGER enforcement
- CDC guardrails: max_lag_bytes, drop_slot_on_exit, health checks
Export
- Parquet export
- CSV export
Developer Experience
- Single binary, zero external dependencies
-
validatecommand for contracts, models, DAG, and SQL -
graphcommand to visualize model dependencies -
historycommand for build and promotion audit trail - SQLite state tracking that survives warehouse rebuilds
-
repaircommand for fixing orphaned builds
| Command | What it does |
|---|---|
refresh |
Snapshot raw.duckdb → silver.duckdb v0 |
validate |
Check contracts, models, DAG, releases, SQL syntax |
build |
Build all models in DAG order |
build --release X --version Y |
Build a specific release |
build --select model_name |
Build one model + its dependencies |
graph |
Show model dependency DAG |
history |
Build + promotion history |
contracts list |
List data contracts |
release list |
List releases |
promote --release X --version Y --env E |
Promote to environment |
inspect tables |
List all DuckDB tables |
cdc status |
Check CDC health |
repair |
Fix orphaned builds, stale locks |
A complete working example with 14 source tables, 6 models, contracts, and releases:
pg-warehouse refresh
pg-warehouse buildSee examples/ecommerce-recipe/README.md for full details including dashboard and AI Q&A.
pg-warehouse uses hexagonal architecture with clean port/adapter separation. The CLI layer (Cobra) calls services that depend only on port interfaces. Adapters for PostgreSQL, DuckDB, SQLite, Parquet, and CSV implement those interfaces. New sources, warehouses, and exporters plug in without changing business logic.
See docs/01-architecture.md for the full design.
The open-source edition covers the full developer workflow: sync, CDC, DuckDB, model graph, contracts, releases, and local export. Production operations -- scheduling, cloud storage export (S3/GCS/Iceberg), remote state, RBAC, and lineage -- are commercial.
See docs/07-open-core.md for the boundary details.
- GitHub Issues -- Bugs and feature requests
- GitHub Discussions -- Questions and ideas
- Contributing -- Development setup and guidelines
- Code of Conduct
- Security Policy
Apache License 2.0 -- Copyright 2025-2026 Burnside Project










