
Dagster ETL Platform

Production-grade Dagster deployment on Docker Compose with isolated run execution, designed for future migration to OpenShift.

Architecture

                    ┌─────────────┐
                    │   Browser   │
                    └──────┬──────┘
                           │ :3000
                    ┌──────▼──────┐
                    │  Webserver  │──────┐
                    └──────┬──────┘      │
                           │             │ docker.sock
                    ┌──────▼──────┐      │
         ┌──────────│   Daemon    │──────┤
         │          └─────────────┘      │
         │ gRPC :4000                    │
  ┌──────▼────────┐              ┌───────▼───────┐
  │ Code Location │              │ Run Container │ (ephemeral)
  └──────┬────────┘              └───────┬───────┘
         │                               │
         └───────────────┬───────────────┘
                  ┌──────▼──────┐
                  │ PostgreSQL  │
                  └─────────────┘
Service                 Role
postgresql              Dagster storage (runs, events, schedules)
dagster-webserver       Web UI on port 3000
dagster-daemon          Schedules, sensors, run queue coordinator (singleton)
dagster-code-location   gRPC server hosting pipeline definitions
run containers          Ephemeral, one per pipeline run (DockerRunLauncher)
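
The table above maps onto a docker-compose.yml shaped roughly like this (a sketch reconstructed from the service descriptions, not the actual file; build contexts, network wiring, and options are assumptions):

services:
  postgresql:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: ${DAGSTER_PG_PASSWORD}
  dagster-webserver:
    build: system
    ports: ["3000:3000"]
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock   # needed to launch run containers
    depends_on: [postgresql]
  dagster-daemon:
    build: system
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on: [postgresql]
  dagster-code-location:
    build: user-code
    expose: ["4000"]   # gRPC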

Quick Start

# 1. Create environment file
cp .env.example .env
# Edit .env and set DAGSTER_PG_PASSWORD, DOCKER_GID

# 2. Build and start
make up

# 3. Open the UI
open http://localhost:3000

Make Commands

Command         Description
make up         Build images and start all services
make down       Stop all services
make restart    Full restart (down + up)
make logs       Tail all logs (make logs SVC=dagster-daemon for one service)
make ps         Show container health status
make sync       Install Python deps locally with uv
make test       Run tests with pytest
make lint       Check code with ruff
make format     Auto-format code with ruff
make typecheck  Run basedpyright against etl_pipelines
make dev        Run the local Dagster dev server with .env loaded
make lock       Regenerate uv.lock after changing dependencies
make shell      Shell into the code-location container
make clean      Stop and delete all volumes (fresh start)

Project Structure

├── docker-compose.yml         # All services, networks, volumes
├── .env.example               # Environment variable template
├── Makefile                   # Common commands
├── system/
│   ├── Dockerfile             # Webserver + daemon image
│   ├── dagster.yaml           # Instance config (storage, run launcher, coordinator)
│   └── workspace.yaml         # Code location pointer
└── user-code/
    ├── Dockerfile             # Code location + run container image
    ├── container_entrypoint.py # Repairs volume permissions before dropping privileges
    ├── pyproject.toml         # Python dependencies (managed with uv)
    ├── tests/                 # Integration and unit tests
    ├── uv.lock                # Deterministic lockfile
    └── etl_pipelines/
        ├── definitions.py     # Dagster Definitions entry point
        ├── assets/
        │   ├── ingestion.py   # raw_orders, raw_customers (daily partitioned)
        │   └── transformation.py  # cleaned_orders, customer_summary
        ├── resources/
        │   └── database.py    # ConfigurableResource for DB connections
        └── schedules/
            └── daily.py       # Daily ETL schedule (06:00 UTC)
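
container_entrypoint.py is described above as repairing volume permissions before dropping privileges. A hypothetical stdlib sketch of that pattern (the real script's details are not shown here; function and parameter names are illustrative):

```python
import os
import stat


def repair_permissions(path: str, gid: int) -> None:
    """Give `path` to group `gid` and make it group-writable,
    so an arbitrary runtime UID in that group can write to it
    (the OpenShift-friendly chmod 775 pattern)."""
    os.chown(path, -1, gid)              # keep the owner, change only the group
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IRWXG)  # add group rwx bits
```

In the container this would run as root on the mounted volume paths, then the process would setgid/setuid down to the dagster user.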

Key Design Decisions

  • Two Docker images: dagster-system (webserver/daemon) and dagster-user-code (code-location/runs), so pipeline code can be deployed independently of the infrastructure.
  • DockerRunLauncher: Each run gets its own isolated container. No resource contention between runs.
  • PolarsParquetIOManager: Assets return Polars DataFrames, automatically serialized to Parquet on a shared volume in Docker and to user-code/.dagster/io during local development.
  • Daily partitions: All assets are partitioned by date. Backfill specific dates from the UI.
  • uv: Fast dependency management with a deterministic lockfile. The --frozen flag in the Docker build fails the build if uv.lock is out of sync with pyproject.toml.
  • Non-root containers: All Dockerfiles use a dagster user with group-writable dirs (chmod 775) for OpenShift compatibility.
  • 12-factor config: All settings via environment variables.
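
The DockerRunLauncher decision is the kind of thing configured in system/dagster.yaml along these lines (a sketch; the image and network names are assumptions, not the repo's actual values):

run_launcher:
  module: dagster_docker
  class: DockerRunLauncher
  config:
    image: dagster-user-code     # assumed image for ephemeral run containers
    network: dagster_network     # assumed compose network
    env_vars:
      - DAGSTER_PG_PASSWORD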

Adding a New Asset

# user-code/etl_pipelines/assets/my_asset.py
import polars as pl
import dagster as dg

from etl_pipelines.assets.ingestion import daily_partitions

@dg.asset(
    group_name="my_group",
    partitions_def=daily_partitions,
)
def my_asset(context: dg.AssetExecutionContext) -> pl.DataFrame:
    partition_date = context.partition_key
    # your logic here
    return pl.DataFrame({"col": [1, 2, 3]})

Then register it in definitions.py:

from etl_pipelines.assets.my_asset import my_asset

defs = dg.Definitions(
    assets=[..., my_asset],
    ...
)

Rebuild: make up
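
The asset above reads context.partition_key, which for daily partitions is a YYYY-MM-DD string. A small hypothetical helper (not part of the repo) for turning that key into a half-open query window:

```python
from datetime import date, timedelta


def partition_window(partition_key: str) -> tuple[date, date]:
    """Return the [start, end) date range for a daily partition key
    like '2024-01-01', e.g. for bounding a SQL extraction query."""
    start = date.fromisoformat(partition_key)
    return start, start + timedelta(days=1)
```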

Adding a Dependency

cd user-code
uv add pandas        # adds to pyproject.toml + updates uv.lock
make up              # rebuild with new dep

Local Development

make sync                              # install deps locally
make dev                               # load .env, then run Dagster dev locally
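
make dev loads .env before starting the dev server. A minimal stdlib sketch of that kind of loader (hypothetical — the Makefile may instead use something like `set -a; . .env`):

```python
import os


def load_env(path: str = ".env") -> None:
    """Export KEY=VALUE lines from a dotenv-style file into os.environ.
    Skips blanks and comments; existing variables win. Does not handle
    quoting or variable expansion."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```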

Tech Stack

Component    Version
Python       3.14
Dagster      1.12.x
Polars       1.x
PostgreSQL   16 (Alpine)
uv           latest
