Tech Story: Docker Compose production configuration for Station #108
Tech Story
As a platform engineer, I want a production-ready Docker Compose configuration for Station so that all services (NestJS backend, React frontend, PostgreSQL, Redis) start in the correct order, recover from crashes automatically, and can be updated with near-zero downtime.
ELI5 Context
What is Docker Compose? Think of it as a master control panel for all your app's processes. Instead of manually starting Postgres, then Redis, then NestJS in the right order and with the right environment variables, one command (docker compose up -d) does it all. Docker Compose also restarts services if they crash (restart: unless-stopped), and manages a private network so services can reach each other by service name (e.g. the backend connects to postgres:5432 instead of localhost:5432).
What is a health check? Before marking a service as "ready," Docker checks that it's actually working — not just that the process started. For Postgres, it runs pg_isready. For the backend, it hits GET /health. This matters because you don't want the backend to start before Postgres is accepting connections.
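To make retries and start_period concrete, here is a toy model in bash (not Docker's actual code) of how a container's health status evolves from a sequence of probe results. It follows the rules in Docker's healthcheck documentation: any passing probe makes the container `healthy`, `retries` consecutive failures make it `unhealthy`, and failures inside `start_period` are not counted until the first success. The function name `health_status`, and passing "how many probes fall inside start_period" as an argument, are assumptions for illustration; real probe timing depends on `interval`.

```shell
#!/usr/bin/env bash
# Toy model of Docker health-check state (hypothetical helper, not Docker's code).
# Usage: health_status RETRIES START_PERIOD_PROBES RESULT...
#   RETRIES             consecutive failures needed to become "unhealthy"
#   START_PERIOD_PROBES how many of the first probes fall inside start_period
#   RESULT              "pass" or "fail" for each probe, in order
health_status() {
  local retries=$1 grace_probes=$2
  shift 2
  local status=starting failures=0 probe=0 started=0 result
  for result in "$@"; do
    probe=$((probe + 1))
    if [[ $result == pass ]]; then
      # Any success marks the container healthy and resets the failure streak.
      status=healthy failures=0 started=1
    elif (( started == 0 && probe <= grace_probes )); then
      # Failure inside start_period before the first success: not counted.
      :
    else
      failures=$((failures + 1))
      if (( failures >= retries )); then
        status=unhealthy
      fi
    fi
  done
  echo "$status"
}
```

With the backend's settings (retries: 3) and assuming the first two probes fall inside its 30s start_period, `health_status 3 2 fail fail fail fail fail` prints `unhealthy`, while `health_status 3 2 fail fail pass` prints `healthy`.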
What is graceful shutdown? When you deploy a new version, Docker sends a "please stop" signal (SIGTERM) to the old container. A gracefully configured NestJS app hears this, finishes any in-progress HTTP requests, then exits cleanly. Without this, in-flight requests get cut off mid-response. stop_grace_period: 30s gives NestJS up to 30 seconds to drain before Docker force-kills it.
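The drain-then-exit behaviour can be reproduced in plain bash. This sketch uses a hypothetical `graceful_worker` function as a stand-in for NestJS: it traps SIGTERM, lets the job that is already running finish, and only then exits with status 0. It relies on a documented bash rule: when bash is waiting for a foreground command and receives a signal that has a trap set, the trap runs only after that command completes. In the container, stop_grace_period bounds how long that final job may take before Docker sends SIGKILL.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a server that drains in-flight work on SIGTERM.
# Usage: graceful_worker [JOB_SECONDS]
graceful_worker() {
  local job_seconds=${1:-1}
  local stop=0 job=0
  # On SIGTERM, only set a flag; the job already in progress is allowed to finish.
  trap 'stop=1' TERM
  while (( stop == 0 )); do
    job=$((job + 1))
    echo "start job $job"
    # Foreground command: bash defers the TERM trap until it returns.
    sleep "$job_seconds"
    echo "finish job $job"
  done
  echo "drained after $job job(s), exiting cleanly"
}
```

Running `graceful_worker 2 &` and sending `kill -TERM $!` half a second later prints `start job 1`, then `finish job 1` once the two-second job completes, then the drained message; the background job exits with status 0.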
Why not put .env in the Docker image? The image is published to a public registry (GitHub Container Registry). If .env were baked in, your database passwords and JWT secrets would be visible to anyone who pulls the image. Instead, environment variables are passed at runtime from a .env.production file that lives only on the VPS and is never committed to git.
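The "never committed to git" rule can be checked mechanically, for example in CI. The sketch below is a hypothetical helper (`env_file_is_safe` is not part of this issue) built on two real git commands: `git ls-files --error-unmatch` exits non-zero unless the path is tracked, and `git check-ignore -q` exits 0 only if the path matches an ignore rule.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the secrets file is untracked AND ignored.
# Usage: env_file_is_safe REPO_DIR [FILE]   (FILE defaults to .env.production)
env_file_is_safe() {
  local repo=$1 file=${2:-.env.production}
  # --error-unmatch makes ls-files exit non-zero when the path is NOT tracked.
  if git -C "$repo" ls-files --error-unmatch -- "$file" >/dev/null 2>&1; then
    echo "$file is tracked by git; untrack it with: git rm --cached $file" >&2
    return 1
  fi
  # check-ignore -q exits 0 only when some ignore rule matches the path.
  if ! git -C "$repo" check-ignore -q "$file"; then
    echo "$file is not matched by any ignore rule (add it to .gitignore)" >&2
    return 1
  fi
  echo "$file is untracked and ignored"
}
```

Running `env_file_is_safe .` from the repo root fails until .env.production appears in .gitignore, and fails again if someone force-adds the file with `git add -f`.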
Technical Elaboration
New file: docker-compose.prod.yml (repo root)
```yaml
services:
  backend:
    image: ghcr.io/gitaddremote/station-backend:${STATION_VERSION:-latest}
    restart: unless-stopped
    env_file: .env.production
    ports:
      - "127.0.0.1:3001:3001" # only accessible from localhost (Nginx proxies)
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3001/health"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 30s
    stop_grace_period: 30s

  frontend:
    image: ghcr.io/gitaddremote/station-frontend:${STATION_VERSION:-latest}
    restart: unless-stopped
    ports:
      - "127.0.0.1:3000:80" # Nginx inside container serves React on port 80
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:80"]
      interval: 15s
      timeout: 5s
      retries: 3

  postgres:
    image: postgres:16-alpine
    restart: unless-stopped
    env_file: .env.production
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DATABASE_USER} -d ${DATABASE_NAME}"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3

volumes:
  postgres_data:
  redis_data:
```

New file: .env.production.example (committed to repo)
Template of all required environment variables with placeholder values. The real .env.production lives only on the VPS at /opt/station/.env.production and is never committed.
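Because the real file is maintained by hand on the VPS, it can silently fall behind the template when a story like this one adds variables. A minimal sketch of a guard, assuming the template uses plain KEY=value lines and that placeholders are spelled `changeme` (as in the REDIS_PASSWORD entry): the hypothetical `check_env_complete` reports every template key that is missing, empty, or still a placeholder in the real file.

```shell
#!/usr/bin/env bash
# Hypothetical guard: compare a real env file against its committed template.
# Usage: check_env_complete EXAMPLE_FILE ACTUAL_FILE
# Prints one line per problem key and returns 1 if any key is missing, empty,
# or still set to the assumed placeholder value "changeme".
check_env_complete() {
  local example=$1 actual=$2 key problems=0
  while IFS= read -r key; do
    if ! grep -Eq "^${key}=.+" "$actual"; then
      echo "missing or empty: $key"
      problems=1
    elif grep -Eq "^${key}=changeme$" "$actual"; then
      echo "still a placeholder: $key"
      problems=1
    fi
  # Extract KEY names from KEY=value lines; comments and blank lines are skipped.
  done < <(grep -Eo '^[A-Za-z_][A-Za-z0-9_]*=' "$example" | tr -d '=')
  return "$problems"
}
```

On the VPS this would be run as `check_env_complete .env.production.example .env.production` before a deploy.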
Add new variables required by this issue:
```
REDIS_PASSWORD=changeme
STATION_VERSION=latest
```

Backend change: backend/src/main.ts
Add app.enableShutdownHooks() — this is what allows NestJS to listen for SIGTERM and drain in-flight requests before exiting. One line change.
Backend change: add GET /health endpoint
If it doesn't exist: a new HealthController in backend/src/health/ that returns { status: 'ok' } with HTTP 200. Used by Docker health checks and Nginx upstreams.
Update Nginx config from #107
Update infra/nginx/station.drdnt.org.conf to proxy to localhost:3000 (frontend container).
New file: infra/scripts/deploy.sh
The script GitHub Actions runs on the VPS via SSH:
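```bash
#!/bin/bash
set -euo pipefail

cd /opt/station

# --env-file makes Compose read .env.production when interpolating ${...} in the
# compose file (STATION_VERSION, DATABASE_USER, DATABASE_NAME, REDIS_PASSWORD);
# the env_file: key only sets variables inside the containers.
compose=(docker compose --env-file .env.production -f docker-compose.prod.yml)

"${compose[@]}" pull
"${compose[@]}" up -d --no-deps backend frontend
"${compose[@]}" ps
```

--no-deps tells Compose not to start or recreate the services that backend and frontend depend on, so only those two containers are recreated (and only if their image or configuration changed); Postgres and Redis are left untouched during routine deploys.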
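`docker compose up -d` returns once the new containers have been started, not once their health checks pass, so a deploy can report success while the backend is still failing. A minimal post-deploy gate, assuming it runs on the VPS right after deploy.sh and that curl is installed there, polls the two loopback ports from docker-compose.prod.yml until each answers successfully. The function name `wait_for_urls` and the retry numbers are illustrative; Compose v2's `up --wait` flag, which waits for services to be running or healthy, is a built-in alternative.

```shell
#!/usr/bin/env bash
# Hypothetical post-deploy gate: poll each URL until it answers or attempts run out.
# Usage: wait_for_urls ATTEMPTS DELAY_SECONDS URL...
wait_for_urls() {
  local attempts=$1 delay=$2 url i
  shift 2
  for url in "$@"; do
    for (( i = 1; ; i++ )); do
      # -f: treat HTTP errors (4xx/5xx) as failure; --max-time bounds a hung request.
      if curl -fsS --max-time 5 -o /dev/null "$url"; then
        echo "ok: $url"
        break
      fi
      if (( i >= attempts )); then
        echo "gave up on $url after $attempts attempts" >&2
        return 1
      fi
      sleep "$delay"
    done
  done
}
```

Called as `wait_for_urls 10 3 http://127.0.0.1:3001/health http://127.0.0.1:3000/` at the end of a script running under `set -e`, a backend that never becomes reachable makes the whole deploy step fail instead of passing silently.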
Definition of Done
- docker-compose.prod.yml committed with all four services (backend, frontend, postgres, redis)
- All published ports bound to 127.0.0.1 (not 0.0.0.0), so no service is directly internet-accessible
- Health checks configured for all services; depends_on with condition: service_healthy for backend
- stop_grace_period: 30s on backend
- app.enableShutdownHooks() added to backend/src/main.ts
- GET /health endpoint returns 200 (backend)
- .env.production.example committed; real .env.production in .gitignore
- infra/scripts/deploy.sh written and executable
- docker compose --env-file .env.production -f docker-compose.prod.yml up -d starts all services successfully on the VPS
- All services show healthy in docker compose ps
Dependencies
- Depends on: Tech Story: Harden Dockerfiles for production #102 (Dockerfiles must be hardened before images are built for prod)
- Depends on: Tech Story: VPS baseline provisioning (Nginx, Certbot, deploy user) #107 (VPS must exist and Nginx must be configured before testing)
- Blocks: Tech Story: GitHub Actions CI/CD — release-tag SSH deploy with graceful restart #90 (CI/CD pipeline uses deploy.sh)