This guide walks you through setting up a small warehouse in Postgres using dbt, from database DDL to dbt project initialization, sources, staging, intermediate, and mart models, plus tests and docs. All steps are explicit and copy/paste-ready.
- Target warehouse: Postgres
- Example domain: simple e-commerce (customers, orders, payments)
- Schemas used: `raw` (source tables) and `analytics` (dbt models)
- Postgres 12+ installed and running locally (or accessible remotely).
- Python 3.9+ with `pip`.
- dbt for Postgres: `dbt-core` and `dbt-postgres`.
- Ability to run `psql` (or a GUI like pgAdmin; commands below use `psql`).
Install dbt packages in a virtual environment (recommended):
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install dbt-core dbt-postgres
Verify:
dbt --version
Connect to your Postgres instance as a superuser (e.g., `postgres`) and create a dedicated role, database, and schemas:
-- 2.1 Create role and database
CREATE ROLE dbt_user WITH LOGIN PASSWORD 'dbt_password';
CREATE DATABASE dbt_tutorial OWNER dbt_user;
-- Connect to the new database
\c dbt_tutorial
-- 2.2 Create schemas
CREATE SCHEMA IF NOT EXISTS raw AUTHORIZATION dbt_user;
CREATE SCHEMA IF NOT EXISTS analytics AUTHORIZATION dbt_user;
-- 2.3 Grants (simple approach for local dev)
GRANT USAGE ON SCHEMA raw, analytics TO dbt_user;
GRANT CREATE ON SCHEMA analytics TO dbt_user; -- dbt builds models here
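Optionally, sanity-check the grants before moving on (still connected as the superuser):

-- Confirm dbt_user can build and drop objects in analytics
SET ROLE dbt_user;
CREATE TABLE analytics.smoke_test (id integer);
DROP TABLE analytics.smoke_test;
RESET ROLE;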
Create raw source tables and load small sample data:
-- 2.4 Raw tables
CREATE TABLE IF NOT EXISTS raw.customers (
id INTEGER PRIMARY KEY,
first_name TEXT,
last_name TEXT,
email TEXT,
created_at TIMESTAMP WITHOUT TIME ZONE
);
CREATE TABLE IF NOT EXISTS raw.orders (
id INTEGER PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES raw.customers(id),
order_date TIMESTAMP WITHOUT TIME ZONE,
status TEXT, -- e.g., placed, shipped, completed, returned, cancelled
total_amount NUMERIC(10,2)
);
CREATE TABLE IF NOT EXISTS raw.payments (
id INTEGER PRIMARY KEY,
order_id INTEGER NOT NULL REFERENCES raw.orders(id),
payment_method TEXT, -- e.g., credit_card, bank_transfer, coupon, gift_card
amount NUMERIC(10,2) NOT NULL,
paid_at TIMESTAMP WITHOUT TIME ZONE
);
-- 2.5 Sample data
INSERT INTO raw.customers (id, first_name, last_name, email, created_at) VALUES
(1, 'Alice', 'Nguyen', 'alice@example.com', '2023-01-05 10:00:00'),
(2, 'Brian', 'Smith', 'brian@example.com', '2023-02-10 09:13:00'),
(3, 'Carla', 'Lopez', 'carla@example.com', '2023-03-21 15:45:00'),
(4, 'Diego', 'Martins', 'diego@example.com', '2023-04-12 08:05:00'),
(5, 'Emily', 'Zhao', 'emily@example.com', '2023-05-30 20:30:00')
ON CONFLICT DO NOTHING;
INSERT INTO raw.orders (id, user_id, order_date, status, total_amount) VALUES
(100, 1, '2023-06-01 10:15:00', 'placed', 100.00),
(101, 1, '2023-06-02 13:00:00', 'completed', 60.00),
(102, 2, '2023-06-03 16:20:00', 'returned', 35.00),
(103, 3, '2023-06-04 18:40:00', 'placed', 75.00),
(104, 5, '2023-06-05 08:05:00', 'cancelled', 150.00)
ON CONFLICT DO NOTHING;
INSERT INTO raw.payments (id, order_id, payment_method, amount, paid_at) VALUES
(1000, 100, 'credit_card', 60.00, '2023-06-01 10:20:00'),
(1001, 100, 'gift_card', 40.00, '2023-06-01 10:21:00'),
(1002, 101, 'credit_card', 60.00, '2023-06-02 13:05:00'),
(1003, 102, 'credit_card', 35.00, '2023-06-03 16:25:00'),
(1004, 104, 'credit_card', 150.00, '2023-06-05 08:07:00')
ON CONFLICT DO NOTHING;
Notes:
- We intentionally include varied statuses and multiple payments per order to demonstrate downstream modeling.
- In real projects, raw tables come from ingestion tools; here we create them directly for learning.
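To confirm the load and see that variety at a glance, a quick check in `psql`:

-- Paid amount and payment count per order (order 103 has no payments)
select o.id as order_id, o.status, o.total_amount,
       coalesce(sum(p.amount), 0) as paid,
       count(p.id) as payment_count
from raw.orders o
left join raw.payments p on p.order_id = o.id
group by o.id, o.status, o.total_amount
order by o.id;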
From your chosen workspace directory, initialize a new dbt project (keep your virtualenv active):
mkdir -p ~/workspace && cd ~/workspace
dbt init dbt_pg_tutorial
During `dbt init`, choose `postgres` and provide connection details. If you prefer editing manually, configure your profile as below.

Edit `~/.dbt/profiles.yml` and add a profile named `dbt_pg_tutorial` to match your project name:
# ~/.dbt/profiles.yml
dbt_pg_tutorial:
target: dev
outputs:
dev:
type: postgres
host: 127.0.0.1
user: dbt_user
password: dbt_password
port: 5432
dbname: dbt_tutorial
schema: analytics # dbt creates/uses this schema for models
threads: 4
keepalives_idle: 0
Verify connectivity:
cd dbt_pg_tutorial
dbt debug
Inside the project folder you’ll have `dbt_project.yml` and a `models/` directory. We’ll organize models by layer:
models/
├── sources.yml            # source definitions for raw tables
├── schema.yml             # tests and docs for models (added below)
├── staging/               # cleaned replicas of sources
│   ├── stg_customers.sql
│   ├── stg_orders.sql
│   └── stg_payments.sql
├── intermediate/          # business logic joins/aggregations
│   └── int_order_payments.sql
└── marts/                 # final dimensional models
    ├── dim_customers.sql
    └── fct_orders.sql
Ensure `dbt_project.yml` recognizes these folders (dbt’s default is fine). Example minimal content:
# dbt_project.yml (inside dbt_pg_tutorial)
name: 'dbt_pg_tutorial'
version: '1.0.0'
config-version: 2
profile: 'dbt_pg_tutorial'
models:
dbt_pg_tutorial:
+materialized: view # default; override per-model when needed
staging:
+schema: staging # optional: place staging views in analytics_staging
intermediate:
+schema: intermediate
marts:
+materialized: table
+schema: marts
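With this config and dbt’s default schema-naming macro, the folder-level `+schema` value is appended to the target schema, so models land in `analytics_staging`, `analytics_intermediate`, and `analytics_marts`. After a run you can confirm this from `psql`:

-- List the schemas dbt created (default naming: <target_schema>_<custom_schema>)
select schema_name
from information_schema.schemata
where schema_name like 'analytics%'
order by schema_name;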
Create `models/sources.yml` to point dbt at the raw tables:
version: 2
sources:
- name: raw
database: dbt_tutorial
schema: raw
tables:
- name: customers
- name: orders
- name: payments
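For reference, `{{ source('raw', 'customers') }}` compiles to a fully qualified relation; the compiled SQL under `target/compiled/` looks roughly like:

-- Approximate compiled form of the source() reference
select * from "dbt_tutorial"."raw"."customers"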
Staging models standardize column names and types and apply light cleaning. Note that dbt models should not end with a semicolon, because dbt wraps each query in its own DDL when materializing it.
Create `models/staging/stg_customers.sql`:
{{ config(materialized='view') }}
with source as (
select * from {{ source('raw', 'customers') }}
),
renamed as (
select
id as customer_id,
lower(trim(first_name)) as first_name,
lower(trim(last_name)) as last_name,
lower(trim(email)) as email,
created_at
from source
)
select * from renamed
Create `models/staging/stg_orders.sql`:
{{ config(materialized='view') }}
with source as (
select * from {{ source('raw', 'orders') }}
),
casted as (
select
id as order_id,
user_id as customer_id,
order_date::timestamp as order_timestamp,
lower(status) as status,
total_amount::numeric(10,2) as total_amount
from source
)
select * from casted
Create `models/staging/stg_payments.sql`:
{{ config(materialized='view') }}
with source as (
select * from {{ source('raw', 'payments') }}
),
casted as (
select
id as payment_id,
order_id,
lower(payment_method) as payment_method,
amount::numeric(10,2) as amount,
paid_at::timestamp as paid_at
from source
)
select * from casted
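Staging is also the natural place for defensive cleaning. Purely as an illustration (the primary key on `raw.payments` already rules out duplicate ids here), a Postgres `DISTINCT ON` dedupe step could look like:

-- Hypothetical dedupe: keep the most recent row per payment id
select distinct on (id) *
from {{ source('raw', 'payments') }}
order by id, paid_at desc nulls last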
Aggregate payments by order and compute totals; this model feeds the fact table. Create `models/intermediate/int_order_payments.sql`:
{{ config(materialized='view') }}
with payments as (
select
order_id,
sum(amount) as total_paid,
count(*) as payment_count
from {{ ref('stg_payments') }}
group by 1
)
select
o.order_id,
o.customer_id,
o.order_timestamp,
o.status,
o.total_amount,
coalesce(p.total_paid, 0) as total_paid,
coalesce(p.payment_count, 0) as payment_count,
(o.total_amount - coalesce(p.total_paid, 0)) as outstanding_amount
from {{ ref('stg_orders') }} o
left join payments p on o.order_id = p.order_id;
Create a dimension table for customers and a fact table for orders.
Create `models/marts/dim_customers.sql`:
{{ config(materialized='table') }}
select
c.customer_id,
initcap(c.first_name) as first_name,
initcap(c.last_name) as last_name,
c.email,
c.created_at
from {{ ref('stg_customers') }} c
Create `models/marts/fct_orders.sql`:
{{ config(materialized='table') }}
select
i.order_id,
i.customer_id,
i.order_timestamp,
i.status,
i.total_amount,
i.total_paid,
i.payment_count,
i.outstanding_amount,
case
when i.outstanding_amount = 0 and i.total_amount > 0 then true
else false
end as is_fully_paid
from {{ ref('int_order_payments') }} i
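If `raw.orders` grew large, `fct_orders` could be rebuilt incrementally instead of as a full table. A minimal sketch, assuming `order_timestamp` only moves forward (late-arriving rows would need a different strategy):

{{ config(materialized='incremental', unique_key='order_id') }}

select i.*
from {{ ref('int_order_payments') }} i
{% if is_incremental() %}
-- only process orders newer than what's already in the table
where i.order_timestamp > (select max(order_timestamp) from {{ this }})
{% endif %}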
Create `models/schema.yml` with basic tests for uniqueness and not-null constraints, plus descriptions for docs:
version: 2
models:
- name: stg_customers
description: "Cleaned customers from raw.customers"
columns:
- name: customer_id
tests: [not_null, unique]
- name: email
tests: [not_null]
- name: stg_orders
description: "Cleaned orders from raw.orders"
columns:
- name: order_id
tests: [not_null, unique]
- name: customer_id
tests: [not_null]
- name: stg_payments
description: "Cleaned payments from raw.payments"
columns:
- name: payment_id
tests: [not_null, unique]
- name: order_id
tests: [not_null]
- name: int_order_payments
description: "Aggregated payments per order"
columns:
- name: order_id
tests: [not_null, unique]
- name: dim_customers
description: "Customer dimension"
columns:
- name: customer_id
tests: [not_null, unique]
- name: fct_orders
description: "Order fact table with payments info"
columns:
- name: order_id
tests: [not_null, unique]
- name: customer_id
tests: [not_null]
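Beyond these generic tests, dbt supports singular tests: a SQL file under `tests/` fails the build if it returns any rows. For example, a file such as `tests/assert_payments_within_order_total.sql` (a name chosen here for illustration) could guard against overpayments:

-- Fails if any order has been paid more than its order total
select order_id, total_amount, total_paid
from {{ ref('int_order_payments') }}
where total_paid > total_amount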
From the project root (`dbt_pg_tutorial`):
# Validate profile/connection
dbt debug
# Build models by layer
dbt run --select staging
dbt run --select intermediate
dbt run --select marts
# Or build everything with tests
dbt build  # equivalent to seed + run + test (+ snapshot, if any)
Run tests and generate docs:
dbt test
dbt docs generate
dbt docs serve # open the web UI with model DAG + docs
In `psql`, inspect the built models. With the folder-level `+schema` configs above and dbt’s default schema naming, the marts land in `analytics_marts`:
-- Top customers by total fully-paid order amount
select d.customer_id, d.first_name, d.last_name, sum(f.total_amount) as total
from analytics_marts.dim_customers d
join analytics_marts.fct_orders f on d.customer_id = f.customer_id
where f.is_fully_paid
group by 1, 2, 3
order by total desc;

-- Orders with outstanding amounts
select * from analytics_marts.fct_orders where outstanding_amount > 0;
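The staging layer is queryable the same way; for example, the payment-method mix (the schema name follows from the `+schema: staging` config above):

-- Payment method mix from the staging layer
select payment_method, count(*) as payments, sum(amount) as total_amount
from analytics_staging.stg_payments
group by payment_method
order by total_amount desc;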
Note: dbt composes final schema names from the target schema and the folder-level `+schema` settings in `dbt_project.yml`; with `+schema: marts`, the default naming macro produces `analytics_marts.fct_orders`. Some teams prefer a single `analytics` schema instead; to get that, drop the `+schema` settings or override the `generate_schema_name` macro (an example appears in the seeds section below). Adjust to your conventions.
If you prefer CSV seeds managed by dbt instead of manual DDL, you can:
- Place CSV files under `seeds/` (e.g., `seeds/customers.csv`, `seeds/orders.csv`, `seeds/payments.csv`).
- Configure a `seeds:` block in `dbt_project.yml` if needed.
- Run `dbt seed` to load data into your target schema (commonly not `raw`, but some teams use a `raw` schema and set `seeds: +schema: raw`).
Example `dbt_project.yml` snippet:
seeds:
dbt_pg_tutorial:
+schema: raw
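Note that with dbt’s default naming this lands seeds in `analytics_raw`, not `raw`. If you want the literal `raw` schema, the usual approach is to override the `generate_schema_name` macro, e.g. in `macros/generate_schema_name.sql`:

-- Use the custom schema name verbatim instead of prefixing the target schema
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}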
Then run:
dbt seed
Afterwards, update `sources.yml` to reference the seeded tables if you load them into `raw`.
- `FATAL: password authentication failed` → Check `profiles.yml` credentials and whether the role has access to the database.
- `permission denied for schema analytics` → Ensure `dbt_user` has USAGE and CREATE on the target schema (see the query after this list).
- `database "dbt_tutorial" does not exist` → Make sure you created it and are connecting to the correct host.
- Model not found errors → Verify file names and that you’re using `ref('model_name')` and `source('source', 'table')` correctly.
- Docs not opening → If `dbt docs serve` is blocked (firewall), run `dbt docs generate` and open `target/index.html` in a browser manually.
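For the permission errors above, Postgres can report the role’s privileges directly:

-- Check dbt_user's schema privileges when debugging "permission denied"
select nspname as schema,
       has_schema_privilege('dbt_user', nspname, 'USAGE')  as has_usage,
       has_schema_privilege('dbt_user', nspname, 'CREATE') as has_create
from pg_namespace
where nspname in ('raw', 'analytics');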
You now have an end-to-end dbt + Postgres setup: raw tables, dbt sources, staging, intermediate, and mart layers, with tests and docs. Customize models, add more sources, and expand marts as your analytics needs grow.