# pg-warehouse

> Mirror PostgreSQL → DuckDB. Build versioned analytical releases. Export Parquet. No cloud required.


## Why pg-warehouse?

Getting data out of PostgreSQL for analytics or ML usually means stitching together Python scripts, cron jobs, and a cloud warehouse you don't need. pg-warehouse replaces that with a single binary: sync tables into an embedded DuckDB, build versioned analytical releases, and export to Parquet or CSV. Everything runs locally with no external dependencies.

## What makes pg-warehouse a local-first data warehouse?

- Analytical storage (columnar, optimized for reads)
- Separation from OLTP (workload isolation)
- SQL analytical engine
- Data transformation layer (ETL/ELT)
- Durable analytical storage (files or tables)

## Quick comparison

|                    | pg-warehouse           | pg_dump       | Airbyte                | dbt             |
|--------------------|------------------------|---------------|------------------------|-----------------|
| PostgreSQL sync    | Full, incremental, CDC | Full only     | Full, incremental, CDC | --              |
| Local analytics    | DuckDB (columnar)      | --            | --                     | DuckDB adapter  |
| Parquet/CSV export | Built-in               | --            | Via connectors         | Via packages    |
| Model graph (DAG)  | Built-in               | --            | --                     | Core strength   |
| Contracts          | Built-in               | --            | --                     | Via packages    |
| Infrastructure     | Single binary          | Single binary | Docker + Java          | Python + adapter |
| Cloud required     | No                     | No            | Optional               | Optional        |

## What does it solve?

pg-warehouse is a local-first data warehouse engine that mirrors PostgreSQL data into DuckDB using native PostgreSQL CDC. It is best for teams that want CDC + SQL transforms + Parquet without standing up Kafka, Spark, or a warehouse.

## How does it work?

pg-warehouse uses three DuckDB files following the Medallion Architecture:

| File             | Layer  | Purpose                                                           |
|------------------|--------|-------------------------------------------------------------------|
| `raw.duckdb`     | Bronze | CDC black box. Deduped PostgreSQL mirror. CDC owns it exclusively. |
| `silver.duckdb`  | Silver | Development platform. v0 = raw snapshot, v1 = your models.         |
| `feature.duckdb` | Gold   | Analytics output. v0 = silver snapshot, v1 = your features.        |

Models use `ref()` for dependencies. pg-warehouse resolves the DAG and builds in the correct order.
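For illustration, a pair of models might look like the sketch below. The model names, columns, and the exact `ref()`/`source()` call syntax are assumptions for this example, not taken from the project's docs; see the development workflow doc for the real form.

```sql
-- models/silver/orders_clean.sql  (hypothetical model; syntax is illustrative)
SELECT order_id, customer_id, total_amount, created_at
FROM source('orders')            -- reads from the silver v0 snapshot
WHERE status <> 'cancelled';

-- models/features/daily_revenue.sql
SELECT created_at::DATE AS day, SUM(total_amount) AS revenue
FROM ref('orders_clean')         -- declares a DAG edge on orders_clean
GROUP BY 1;
```

The `ref('orders_clean')` call is what lets pg-warehouse place `daily_revenue` after `orders_clean` in the build order.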


**Refresh** — snapshot raw CDC data into silver v0 (14 tables, ~12s)

![Refresh](1.png)

**Build** — resolve the DAG and build all 6 models: silver → feature → Parquet

![Build](2.png)

**Validate** — check contracts, models, graph, releases (zero errors = green)

![Validate](3.png)

**Graph** — show the model dependency DAG (who depends on whom)

![Graph](4.png)

**Contracts** — list registered data shape contracts

![Contracts](5.png)

**Releases** — list versioned release definitions

![Releases](6.png)

**Promote** — deploy a release to an environment (dev/staging/prod)

![Promote](7.png)

**Inspect** — list all tables across all DuckDB files

![Inspect](8.png)

**CDC Status** — check replication health (lag, slot, streaming)

![CDC Status](9.png)

**Repair** — fix orphaned builds and stale locks

![Repair](10.png)

**Pipeline History** — build and promotion history

![Pipeline History](11.png)

## Documentation

| Doc                       | Description                                  |
|---------------------------|----------------------------------------------|
| Architecture              | Hexagonal design, layers, port interfaces    |
| Quickstart                | Full walkthrough with examples               |
| State Database            | SQLite schema and semantics                  |
| CDC Guide                 | Logical replication setup and lifecycle      |
| Sync Modes                | Full vs. incremental vs. CDC                 |
| Configuration             | YAML reference                               |
| Open-Core Strategy        | OSS vs. commercial boundary                  |
| Development Workflow      | Models, contracts, DAG-resolved builds       |
| Multi-DuckDB Architecture | Zero-downtime CDC with three DuckDB files    |
```shell
pg-warehouse refresh     # snapshot raw → silver v0
pg-warehouse build       # build all models in DAG order
```

Full development workflow: https://github.com/burnside-project/pg-warehouse/blob/main/docs/08-development-workflow.md

## Install

**Homebrew (macOS / Linux):**

```shell
brew install burnside-project/tap/pg-warehouse
```

**Go install:**

```shell
go install github.com/burnside-project/pg-warehouse/cmd/pg-warehouse@latest
```

**Download binary** — see Releases for Linux, macOS, and Windows (amd64/arm64).

**Build from source:**

```shell
git clone https://github.com/burnside-project/pg-warehouse.git
cd pg-warehouse
make build
```

## Deployment Layout

pg-warehouse runs from a single working directory. All paths in `pg-warehouse.yml` are relative to this directory.

```
~/pg-warehouse/                  # Working directory
├── pg-warehouse                 # Binary
├── pg-warehouse.yml             # Configuration
├── raw.duckdb                   # CDC black box
├── silver.duckdb                # Silver development platform
├── feature.duckdb               # Feature analytics output
├── .pgwh/
│   └── state.db                 # SQLite state (sync/CDC/builds)
├── models/
│   ├── silver/                  # Silver-layer SQL models
│   └── features/                # Feature-layer SQL models
├── contracts/                   # Data contracts (YAML)
├── releases/                    # Release definitions (YAML)
├── out/                         # Parquet/CSV exports
└── cdc.log                      # CDC log (when running via nohup)
```
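As a sketch, a minimal `pg-warehouse.yml` could look like the following. Every key name below is an illustrative assumption, not the tool's actual schema; consult the Configuration doc for the real YAML reference.

```yaml
# Hypothetical configuration sketch -- key names are assumptions.
postgres:
  dsn: postgres://app:secret@localhost:5432/shop
warehouse:
  raw: raw.duckdb
  silver: silver.duckdb
  feature: feature.duckdb
tables:
  - name: orders             # CDC streaming via logical replication
    mode: cdc
  - name: products           # full table snapshot
    mode: full
  - name: events             # incremental sync via a watermark column
    mode: incremental
    watermark: updated_at
export:
  dir: out/
```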

## Quickstart (2 minutes)

**1. Initialize** — creates the DuckDB files and state DB, and scaffolds the directories:

```shell
$ mkdir -p ~/pg-warehouse && cd ~/pg-warehouse
$ pg-warehouse init --config pg-warehouse.yml
```

**2. Start CDC** — stream changes from PostgreSQL:

```shell
$ pg-warehouse cdc setup --config pg-warehouse.yml
$ nohup pg-warehouse cdc start --config pg-warehouse.yml > cdc.log 2>&1 &
```

**3. Refresh** — snapshot raw data into silver v0:

```shell
$ pg-warehouse refresh
```

**4. Build** — build all models in DAG order:

```shell
$ pg-warehouse build
```

**5. Validate** — check contracts, models, DAG, and releases:

```shell
$ pg-warehouse validate
```

## Features

### Sync

- Full table snapshots
- Incremental sync via watermark columns
- CDC streaming via PostgreSQL logical replication (pglogrepl)
- Automatic sync-mode detection per table

### Analytical Releases

- Models with `ref()` and `source()` dependency declarations
- DAG-resolved build ordering (topological sort, cycle detection)
- Data contracts (YAML) for schema validation
- Named releases with versioned model bundles
- Terraform-style `--plan` with SQL validation via `EXPLAIN`
- Build history and promotion tracking in `_meta`
- Partial builds: `build --select model_name`

### Multi-DuckDB Architecture

- Three-file isolation: raw (CDC), silver (transforms), feature (analytics)
- Zero-downtime CDC — the pipeline never stops CDC
- Reserved-schema protection (`raw`, `stage`, `v0`, `_meta`) with DANGER enforcement
- CDC guardrails: `max_lag_bytes`, `drop_slot_on_exit`, health checks

### Export

- Parquet export
- CSV export

### Developer Experience

- Single binary, zero external dependencies
- `validate` command for contracts, models, DAG, and SQL
- `graph` command to visualize model dependencies
- `history` command for a build and promotion audit trail
- SQLite state tracking that survives warehouse rebuilds
- `repair` command for fixing orphaned builds

## Commands

| Command                                   | What it does                                      |
|-------------------------------------------|---------------------------------------------------|
| `refresh`                                 | Snapshot raw.duckdb → silver.duckdb v0            |
| `validate`                                | Check contracts, models, DAG, releases, SQL syntax |
| `build`                                   | Build all models in DAG order                     |
| `build --release X --version Y`           | Build a specific release                          |
| `build --select model_name`               | Build one model plus its dependencies             |
| `graph`                                   | Show the model dependency DAG                     |
| `history`                                 | Build and promotion history                       |
| `contracts list`                          | List data contracts                               |
| `release list`                            | List releases                                     |
| `promote --release X --version Y --env E` | Promote to an environment                         |
| `inspect tables`                          | List all DuckDB tables                            |
| `cdc status`                              | Check CDC health                                  |
| `repair`                                  | Fix orphaned builds, stale locks                  |

## E-Commerce Recipe

A complete working example with 14 source tables, 6 models, contracts, and releases:

```shell
pg-warehouse refresh
pg-warehouse build
```

See `examples/ecommerce-recipe/README.md` for full details, including the dashboard and AI Q&A.

## Architecture

pg-warehouse uses a hexagonal architecture with clean port/adapter separation. The CLI layer (Cobra) calls services that depend only on port interfaces; adapters for PostgreSQL, DuckDB, SQLite, Parquet, and CSV implement those interfaces. New sources, warehouses, and exporters plug in without changing business logic.

See docs/01-architecture.md for the full design.

## Open Core

The open-source edition covers the full developer workflow: sync, CDC, DuckDB, model graph, contracts, releases, and local export. Production operations -- scheduling, cloud storage export (S3/GCS/Iceberg), remote state, RBAC, and lineage -- are commercial.

See docs/07-open-core.md for the boundary details.


## License

Apache License 2.0 -- Copyright 2025-2026 Burnside Project
