Tech Story
As a platform engineer, I want Redis to persist data across restarts and the deploy pipeline to capture a database snapshot before running migrations, so that I never lose cached session data due to a Redis crash and always have a safe rollback point before any schema change.
ELI5 Context
What is RDB vs AOF persistence in Redis?
By default Redis persists data with RDB: it periodically writes a snapshot of the whole dataset to disk. If Redis crashes between snapshots, you lose everything written since the last one. How long that window is depends on the configured save thresholds and on write volume; this story assumes a typical 1–5 minutes, and it can be longer on a quiet instance. AOF (Append-Only File) persistence logs every write command as it happens. With appendfsync everysec, the worst-case data loss is about 1 second of writes. Think of RDB as saving a Word document every 5 minutes vs AOF as Google Docs auto-saving every keystroke.
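To make the difference concrete, here is a small arithmetic sketch. The write rate (50 writes per second) and the 5-minute snapshot interval are assumed values for illustration, not measurements from this system.

```shell
# Illustrative arithmetic only: rate and window are assumed values.

# max_lost_writes RATE_PER_SEC WINDOW_SECONDS
# Prints an upper bound on the writes lost if Redis crashes at the very
# end of the persistence window.
max_lost_writes() {
  local rate="$1" window="$2"
  echo $(( rate * window ))
}

max_lost_writes 50 300   # RDB with a snapshot every 5 minutes -> 15000
max_lost_writes 50 1     # AOF with appendfsync everysec       -> 50
```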
Why does a migration need a backup before it runs?
A database migration changes the schema — adding columns, dropping tables, renaming fields. If a migration runs halfway and then fails, the database can be left in an inconsistent state that prevents the app from starting. With a backup taken 30 seconds before the migration, the worst-case recovery is: restore backup, redeploy previous image, investigate. Without it, recovery means manually reverse-engineering what the half-run migration did.
What is appendfsync everysec?
This Redis config option tells Redis to fsync the AOF log to disk once per second. It's the recommended balance between performance (not syncing on every single write) and safety (losing at most about 1 second of data). The alternatives are always (fsync after every write), which is the safest setting but noticeably slower, and no (leave flushing to the OS), which is the fastest but gives no guarantee about how much data is lost in a crash.
Technical Elaboration
File: docker-compose.prod.yml — Redis service changes
services:
  redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --requirepass ${REDIS_PASSWORD} --appendonly yes --appendfsync everysec
    volumes:
      - redis_aof:/data  # named volume: persists the AOF files across container restarts
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
volumes:
  postgres_data:
  redis_aof:  # renamed from redis_data to reflect AOF usage
Apply the same change to docker-compose.staging.yml — staging should mirror production configuration.
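Important: if you previously had a volume named redis_data, rename it to redis_aof in the compose file. Docker has no volume rename command, so on the VPS either recreate the Redis container and let Compose create the new station_redis_aof volume (Redis data is ephemeral cache, so loss is acceptable during a planned rename), or copy the contents of station_redis_data into the new volume before starting Redis. Once the new volume is in use, remove the old one with docker volume rm station_redis_data.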
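If the cached data is worth carrying over, one common approach is to copy the old volume into the new one through a throwaway container. The sketch below is an assumption about how that could look, not part of the existing tooling: the copy_volume helper and the DOCKER override are hypothetical names, and the volume names are the ones used in this issue.

```shell
# copy_volume OLD NEW
# Copies the contents of one named volume into another via a temporary
# Alpine container. Stop the Redis container first so the files are not
# changing during the copy. Set DOCKER=echo for a dry run that only
# prints the arguments instead of running the container.
copy_volume() {
  local old="$1" new="$2"
  "${DOCKER:-docker}" run --rm \
    -v "${old}:/from:ro" \
    -v "${new}:/to" \
    alpine sh -c 'cp -a /from/. /to/'
}

# Example on the VPS:
#   copy_volume station_redis_data station_redis_aof
```

The old volume is mounted read-only, so a mistake in the copy step cannot damage the original data.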
Verifying AOF is active
After restarting the Redis container:
docker exec station-redis-1 redis-cli -a "${REDIS_PASSWORD}" INFO persistence | grep aof_enabled
# Expected output: aof_enabled:1
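For scripted checks (for example in a smoke test after deploy), the same verification can be wrapped in a function that returns an exit code instead of relying on someone reading the output. This is a minimal sketch; aof_is_enabled is a hypothetical helper name, and it assumes REDIS_PASSWORD is exported in the calling shell.

```shell
# aof_is_enabled: reads `INFO persistence` output on stdin and succeeds
# only when it contains the line aof_enabled:1. Redis ends INFO lines
# with CRLF, so carriage returns are stripped before the exact match.
aof_is_enabled() {
  tr -d '\r' | grep -qx 'aof_enabled:1'
}

# Usage on the VPS:
#   docker exec station-redis-1 redis-cli -a "$REDIS_PASSWORD" INFO persistence \
#     | aof_is_enabled && echo "AOF on" || echo "AOF off"
```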
File: .github/workflows/release.yml — pre-migration backup step
In the deploy-production job, add this step before the migration step:
- name: Pre-migration database backup
  run: |
    ssh ${{ secrets.VPS_USER }}@${{ secrets.VPS_HOST }} \
      "cd /opt/station && \
       LABEL=pre-deploy-${{ github.sha }} \
       bash infra/scripts/backup-db.sh"
  # If this step fails, the workflow stops; migrations never run against an un-backed-up database
The backup-db.sh script already exists from issue #125. This step just calls it with a label so the backup filename includes the git SHA for traceability.
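The ordering guarantee the workflow relies on (no migration unless the backup exited successfully) can be reproduced locally with a small wrapper. This is a sketch for reasoning about the behaviour, not the pipeline's actual implementation; backup_then_migrate and both command arguments are hypothetical placeholders.

```shell
# backup_then_migrate BACKUP_CMD MIGRATE_CMD
# Runs the backup command first and runs the migration command only if
# the backup exited with status 0.
backup_then_migrate() {
  local backup_cmd="$1" migrate_cmd="$2"
  if ! sh -c "$backup_cmd"; then
    echo "backup failed; skipping migration" >&2
    return 1
  fi
  sh -c "$migrate_cmd"
}

# Example (the migration command is a placeholder):
#   backup_then_migrate "LABEL=pre-deploy-test bash infra/scripts/backup-db.sh" "echo run migrations here"
```

This only works if backup-db.sh exits non-zero on failure, which is also what the workflow step depends on.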
File: infra/scripts/backup-db.sh — label support (small change)
Read an optional LABEL environment variable (defaulting to nightly) and include it in the backup filename:
LABEL="${LABEL:-nightly}"
BACKUP_FILE="/tmp/station_backup_${TIMESTAMP}_${LABEL}.sql.gz"
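So pre-deploy backups are named like station_backup_20260510_030000_pre-deploy-abc1234.sql.gz and are distinguishable from nightly backups in B2. (The SHA is shortened here for readability; github.sha expands to the full 40-character commit hash.)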
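The naming rule can be exercised in isolation with a small function. This is a sketch under assumptions: backup_filename is a hypothetical helper, and the character filter on LABEL is an added safety measure that is not part of the existing script.

```shell
# backup_filename TIMESTAMP [LABEL]
# Builds the backup path; LABEL falls back to "nightly" when empty or unset.
backup_filename() {
  local timestamp="$1" label="${2:-nightly}"
  # Assumption: replace anything outside A-Z a-z 0-9 . _ - with "_" so the
  # label cannot introduce path separators or spaces into the filename.
  label="$(printf '%s' "$label" | tr -c 'A-Za-z0-9._-' '_')"
  printf '/tmp/station_backup_%s_%s.sql.gz\n' "$timestamp" "$label"
}

backup_filename 20260510_030000 pre-deploy-abc1234
# -> /tmp/station_backup_20260510_030000_pre-deploy-abc1234.sql.gz
backup_filename 20260510_030000
# -> /tmp/station_backup_20260510_030000_nightly.sql.gz
```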
New file: infra/docs/redis.md
Document:
- Why AOF is enabled (data safety, 1-second loss window)
- How to verify AOF is active (INFO persistence)
- What to do if Redis data is lost (it's a cache — the app degrades gracefully, no restore needed; sessions are re-created on next login)
- How to inspect the AOF file size:
docker exec station-redis-1 redis-cli -a "${REDIS_PASSWORD}" INFO persistence | grep aof_current_size
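For the redis.md write-up, the raw byte count can be turned into something easier to read. The helper below is a hypothetical sketch (aof_size_mib is not an existing script) that parses the aof_current_size field from INFO persistence output.

```shell
# aof_size_mib: reads `INFO persistence` output on stdin and prints the
# aof_current_size value converted from bytes to whole MiB (rounded down).
# Returns 1 if the field is absent, for example when AOF is disabled.
aof_size_mib() {
  local bytes
  bytes="$(tr -d '\r' | sed -n 's/^aof_current_size:\([0-9][0-9]*\)$/\1/p')"
  [ -n "$bytes" ] || return 1
  echo $(( bytes / 1048576 ))
}

# Usage on the VPS (assumes REDIS_PASSWORD is exported):
#   docker exec station-redis-1 redis-cli -a "$REDIS_PASSWORD" INFO persistence | aof_size_mib
```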
Definition of Done
- docker-compose.prod.yml Redis service uses --appendonly yes --appendfsync everysec
- docker-compose.staging.yml Redis service matches production configuration
- redis_aof volume in both compose files
- docker exec ... redis-cli INFO persistence | grep aof_enabled returns aof_enabled:1 on the VPS
- .github/workflows/release.yml has a pre-migration backup step that runs before migration:run
- infra/scripts/backup-db.sh accepts optional LABEL env var and includes it in the backup filename
- infra/docs/redis.md written covering AOF, verification, and recovery
Dependencies