Tech Story: Docker Compose production configuration for Station #108

@GitAddRemote

Tech Story

As a platform engineer, I want a production-ready Docker Compose configuration for Station so that all services (NestJS backend, React frontend, PostgreSQL, Redis) start in the correct order, recover from crashes automatically, and can be updated with near-zero downtime.

ELI5 Context

What is Docker Compose? Think of it as a master control panel for all your app's processes. Instead of manually starting NestJS, then Postgres, then Redis (in the right order, with the right environment variables), one command (docker compose up -d) does it all. With a restart policy such as restart: unless-stopped, Docker also restarts services that crash, and Compose manages a network on which services reach each other by name (e.g. the backend connects to postgres:5432 instead of localhost:5432).

What is a health check? Before marking a service as "ready," Docker checks that it's actually working — not just that the process started. For Postgres, it runs pg_isready. For the backend, it hits GET /health. This matters because you don't want the backend to start before Postgres is accepting connections.

What is graceful shutdown? When you deploy a new version, Docker sends a "please stop" signal (SIGTERM) to the old container. A gracefully configured NestJS app hears this, finishes any in-progress HTTP requests, then exits cleanly. Without this, in-flight requests get cut off mid-response. stop_grace_period: 30s gives NestJS up to 30 seconds to drain before Docker force-kills it with SIGKILL; the default is 10 seconds.

Why not put .env in the Docker image? The image is published to a public registry (GitHub Container Registry). If .env were baked in, your database passwords and JWT secrets would be visible to anyone who pulls the image. Instead, environment variables are passed at runtime from a .env.production file that lives only on the VPS and is never committed to git.

Technical Elaboration

New file: docker-compose.prod.yml (repo root)

services:
  backend:
    image: ghcr.io/gitaddremote/station-backend:${STATION_VERSION:-latest}
    restart: unless-stopped
    env_file: .env.production
    ports:
      - "127.0.0.1:3001:3001"   # only accessible from localhost (Nginx proxies)
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3001/health"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 30s
    stop_grace_period: 30s

  frontend:
    image: ghcr.io/gitaddremote/station-frontend:${STATION_VERSION:-latest}
    restart: unless-stopped
    ports:
      - "127.0.0.1:3000:80"     # Nginx inside container serves React on port 80
    healthcheck:
      # 127.0.0.1 avoids wget resolving localhost to ::1 when Nginx listens on IPv4 only
      test: ["CMD", "wget", "-qO-", "http://127.0.0.1:80"]
      interval: 15s
      timeout: 5s
      retries: 3

  postgres:
    image: postgres:16-alpine
    restart: unless-stopped
    env_file: .env.production
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DATABASE_USER} -d ${DATABASE_NAME}"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    restart: unless-stopped
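    # ${REDIS_PASSWORD} is substituted by Compose itself, so it must come from
    # --env-file .env.production (or the shell), not from an env_file: entry.
    command: redis-server --requirepass ${REDIS_PASSWORD}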
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3

volumes:
  postgres_data:
  redis_data:

New file: .env.production.example (committed to repo)

Template of all required environment variables with placeholder values. The real .env.production lives only on the VPS at /opt/station/.env.production and is never committed.

Add new variables required by this issue:

REDIS_PASSWORD=changeme
STATION_VERSION=latest
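
One way to catch a missing variable before deploying is to compare the real file against the committed template. The sketch below assumes both files use plain KEY=value lines; the function name missing_env_keys is hypothetical and not part of this issue.

```shell
#!/bin/bash
# Hypothetical helper: print every key that the template defines but the
# real env file does not. Prints nothing when the real file is complete.
missing_env_keys() {
  local template="$1"
  local actual="$2"
  local key
  # Keep only KEY=value lines (skips comments and blanks), then take the KEY part.
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$template" | cut -d= -f1 | while read -r key; do
    grep -qE "^${key}=" "$actual" || echo "$key"
  done
}

# Example call on the VPS (paths taken from this issue):
#   missing_env_keys .env.production.example /opt/station/.env.production
```

Empty output means every key in the template is defined in the real file; any printed key still needs a value on the VPS.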

Backend change: backend/src/main.ts

Add app.enableShutdownHooks(). This makes NestJS listen for SIGTERM and run its shutdown lifecycle hooks, which close the HTTP server and let in-flight requests finish before the process exits. It is a one-line change.

Backend change: add GET /health endpoint

If one does not already exist, add a HealthController in backend/src/health/ whose GET /health handler returns { status: 'ok' } with HTTP 200. Docker health checks and Nginx upstreams use this endpoint.
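
Once the endpoint exists, a quick smoke check from the VPS can confirm that it answers with 200. This is a minimal sketch, assuming curl is installed on the host and the backend is published on 127.0.0.1:3001 as in docker-compose.prod.yml; fetch_status and health_ok are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helper: print only the HTTP status code returned for a URL.
# Kept separate so it can be replaced by a stub when testing.
fetch_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Succeeds (exit status 0) only when the URL answers with HTTP 200.
health_ok() {
  [ "$(fetch_status "$1")" = "200" ]
}

# Example call on the VPS:
#   health_ok http://127.0.0.1:3001/health && echo "backend healthy"
```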

Update Nginx config from #107

Update infra/nginx/station.drdnt.org.conf to proxy to localhost:3000 (frontend container).

New file: infra/scripts/deploy.sh

The script GitHub Actions runs on the VPS via SSH:

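#!/bin/bash
set -euo pipefail
cd /opt/station

# --env-file feeds ${STATION_VERSION}, ${REDIS_PASSWORD}, etc. into Compose's
# variable substitution; env_file: in the YAML only sets container environment.
COMPOSE=(docker compose --env-file .env.production -f docker-compose.prod.yml)

"${COMPOSE[@]}" pull
"${COMPOSE[@]}" up -d --no-deps backend frontend
"${COMPOSE[@]}" ps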

--no-deps recreates only the named services (backend and frontend) without starting or recreating the services they depend on, so Postgres and Redis stay untouched during routine deploys.
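
Because up -d returns once the containers have been started, a deploy script may also want to wait until the health check actually passes. The following is a minimal sketch, assuming the container has a healthcheck defined; health_status and wait_healthy are hypothetical helpers, and the attempt count and delay are illustrative defaults rather than values from this issue.

```shell
#!/bin/bash
# Hypothetical helper: print the Docker health status of a container
# ("starting", "healthy" or "unhealthy"). Kept separate so it can be stubbed.
health_status() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Poll until the container reports "healthy", or give up after N attempts.
wait_healthy() {
  local container="$1"
  local attempts="${2:-20}"
  local delay="${3:-3}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(health_status "$container" 2>/dev/null)" = "healthy" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "$container did not become healthy" >&2
  return 1
}

# Example call after `up -d` (ps -q prints the container ID of the service):
#   wait_healthy "$(docker compose --env-file .env.production -f docker-compose.prod.yml ps -q backend)"
```

Returning a non-zero status lets a script running under set -e fail the deploy when the new container never reports healthy.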

Definition of Done

  • docker-compose.prod.yml committed with all four services (backend, frontend, postgres, redis)
  • All ports bound to 127.0.0.1 (not 0.0.0.0) — services not directly internet-accessible
  • Health checks configured for all services; depends_on with condition: service_healthy for backend
  • stop_grace_period: 30s on backend
  • app.enableShutdownHooks() added to backend/src/main.ts
  • GET /health endpoint returns 200 (backend)
  • .env.production.example committed; real .env.production in .gitignore
  • infra/scripts/deploy.sh written and executable
  • docker compose --env-file .env.production -f docker-compose.prod.yml up -d starts all services successfully on the VPS
  • All services show healthy in docker compose ps

Dependencies

Labels: backend, config, database, frontend, tech-story
