R2 Backup System #284

@jrf0110

Description

Part of #271 — Gastown Cloud Proposal A (Sandbox-per-Town)

Goal

Implement disaster-recovery backup of gastown state (Dolt databases, git repos, config files, runtime checkpoints) to Cloudflare R2. R2 is a backup layer only: the Fly persistent volume remains the primary storage, and R2 covers catastrophic volume loss and cross-region migration.

Context

Fly.io persistent volumes survive machine restarts and stops but not volume destruction or region migration. R2 backups ensure recoverability with a worst-case data loss window of about 5 minutes (one sync interval, plus however long the sync itself takes).

Requirements

R2 Key Structure

gastown/{town_id}/
  ├── latest.json                    # Pointer: { "timestamp": "20260217T120000Z" }
  ├── snapshots/{timestamp}/
  │   ├── manifest.json              # Files included, checksums, gt version
  │   ├── dolt/{rig_name}.backup     # `dolt backup` output per rig
  │   ├── git/{rig_name}.bundle      # `git bundle create --all` per rig
  │   ├── config.tar                 # Town + rig config files
  │   └── runtime.tar               # .runtime/ checkpoint files
  └── incremental/                   # Future: incremental deltas
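
The layout above maps onto a few string helpers. A minimal sketch in shell follows; the function names are hypothetical and not part of the proposal, and only the key shapes and the latest.json body are taken from the tree above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not from the proposal.

# Timestamp in the compact UTC form shown in latest.json, e.g. 20260217T120000Z.
snapshot_timestamp() {
  date -u +%Y%m%dT%H%M%SZ
}

# Prefix for one snapshot: gastown/{town_id}/snapshots/{timestamp}/
snapshot_prefix() {
  local town_id=$1 timestamp=$2
  printf 'gastown/%s/snapshots/%s/' "$town_id" "$timestamp"
}

# Key of the pointer object: gastown/{town_id}/latest.json
latest_key() {
  printf 'gastown/%s/latest.json' "$1"
}

# Body of latest.json for a given timestamp.
latest_json() {
  printf '{ "timestamp": "%s" }\n' "$1"
}
```

Because the timestamp format is fixed-width and most-significant-first, a plain text sort of snapshot prefixes is also a chronological sort, which the cleanup step can rely on.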

Sync Daemon (r2-sync-daemon.sh)

Runs as a background process inside the sandbox and syncs on a 5-minute timer. Each sync:

  1. Acquire flock (/tmp/r2-sync.lock) to prevent concurrent syncs
  2. Create snapshot directory
  3. For each rig: dolt backup → snapshot dir
  4. For each rig: git bundle create --all (bare repo only, not worktrees) → snapshot dir
  5. Tar config files (settings/, */settings/)
  6. Tar runtime state (.runtime/)
  7. Write manifest.json with checksums (sha256)
  8. Upload to R2 staging prefix
  9. Update latest.json pointer (atomic swap)
  10. Cleanup: keep last 3 snapshots, delete older
  11. Report sync time to cloud API: POST /api/gastown/heartbeat
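
Steps 7 and 10 are pure file and text manipulation, so they can be sketched without touching R2. The following is a rough sketch under several assumptions: the function names and the JSON layout of the manifest are invented for illustration (the proposal only says the manifest holds files, checksums, and the gt version), the gt version is passed in by the caller, and `sha256sum` from coreutils is assumed to be present in the sandbox image.

```shell
#!/usr/bin/env bash
# Sketch of steps 7 and 10. Function names and manifest layout are assumptions.

# write_manifest DIR GT_VERSION
# Prints a manifest listing every file under DIR (except manifest.json itself)
# with its sha256. GT_VERSION is supplied by the caller. File names are assumed
# to contain no quotes, backslashes, or newlines: no JSON escaping is done.
write_manifest() {
  local dir=$1 gt_version=$2 path sum sep
  printf '{\n  "gt_version": "%s",\n  "files": {' "$gt_version"
  (cd "$dir" && find . -type f ! -name manifest.json | LC_ALL=C sort) | {
    sep=''
    while IFS= read -r path; do
      path=${path#./}
      sum=$(sha256sum "$dir/$path" | cut -d' ' -f1)
      printf '%s\n    "%s": "%s"' "$sep" "$path" "$sum"
      sep=','
    done
  }
  printf '\n  }\n}\n'
}

# snapshots_to_delete [KEEP]
# Reads snapshot timestamps on stdin, one per line, and prints the ones that
# fall outside the newest KEEP (default 3). The YYYYMMDDTHHMMSSZ form sorts
# chronologically as plain text.
snapshots_to_delete() {
  local keep=${1:-3}
  LC_ALL=C sort -r | tail -n +"$((keep + 1))"
}
```

A caller would redirect `write_manifest "$snapshot_dir" "$gt_version"` into `manifest.json` before uploading, and feed the list of existing snapshot timestamps into `snapshots_to_delete` only after the pointer update has succeeded, so a failed sync never prunes the last good snapshot.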

Support an --immediate flag for the SIGTERM flush: skip the timer, run one sync, and exit.
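
The lock and the two run modes could be wired together roughly as follows. This is a sketch, not the daemon itself: `run_sync_once` is a placeholder for steps 2 through 11, `SYNC_INTERVAL` and `LOCK_WAIT` are hypothetical knobs, and `flock` is assumed to be the util-linux command. One design choice here is also an assumption: a timer tick skips if a sync is already running, while --immediate waits for the lock so the shutdown flush is not silently dropped.

```shell
#!/usr/bin/env bash
# Sketch of the lock and timer; run_sync_once stands in for steps 2-11.

LOCK_FILE=${LOCK_FILE:-/tmp/r2-sync.lock}
SYNC_INTERVAL=${SYNC_INTERVAL:-300}   # seconds between syncs (5 minutes)
LOCK_WAIT=${LOCK_WAIT:-60}            # assumed: how long --immediate waits

run_sync_once() {
  echo "sync would run here"
}

# sync_with_lock MODE
# MODE "skip": return at once if another sync holds the lock (timer ticks).
# MODE "wait": wait up to LOCK_WAIT seconds for the lock (shutdown flush).
sync_with_lock() {
  local mode=$1
  (
    if [ "$mode" = "wait" ]; then
      flock -w "$LOCK_WAIT" 9 || { echo "lock still held, giving up" >&2; exit 1; }
    else
      flock -n 9 || { echo "sync already running, skipping" >&2; exit 0; }
    fi
    run_sync_once
  ) 9>"$LOCK_FILE"
}

main() {
  if [ "${1:-}" = "--immediate" ]; then
    sync_with_lock wait
    return
  fi
  while true; do
    sync_with_lock skip || echo "sync failed, retrying next interval" >&2
    sleep "$SYNC_INTERVAL"
  done
}
```

The real script would end with `main "$@"`. The lock is released automatically when the subshell exits and file descriptor 9 closes, including when the sync is killed.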

Restore Script (r2-restore.sh)

Runs on container startup (called by startup.sh from PR 1).

  1. Check if volume already has data → if yes, skip restore (volume-persisted state takes priority)
  2. Fetch latest.json from R2
  3. Download snapshot files
  4. For each rig: dolt backup restore
  5. For each rig: git clone --bare <bundle> → recreate worktrees from branches
  6. Extract config + runtime tarballs
  7. Verify Dolt integrity: dolt verify-constraints
  8. Report restore status to cloud API
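
Steps 1 and 2 decide whether a restore happens at all, so they are worth pinning down. A minimal sketch follows; both function names are hypothetical, the volume mount path is left to the caller because the proposal does not name it, and the `lost+found` handling assumes the volume is ext4-formatted (a fresh ext4 filesystem contains that directory, which would otherwise make an empty volume look populated).

```shell
#!/usr/bin/env bash
# Sketch of restore steps 1 and 2. Function names are hypothetical.

# volume_has_data DIR
# Succeeds if DIR contains anything other than lost+found.
volume_has_data() {
  local dir=$1
  [ -d "$dir" ] || return 1
  ls -A "$dir" | grep -Fvx 'lost+found' >/dev/null
}

# latest_timestamp
# Reads latest.json on stdin and prints the value of "timestamp".
# Prints nothing if the field is missing.
latest_timestamp() {
  sed -n 's/.*"timestamp"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

r2-restore.sh would call `volume_has_data` on the mount point first and exit successfully when it returns true. An empty result from `latest_timestamp` would mean there is no usable backup, which for a brand-new town is the expected case and not an error.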

SIGTERM Integration

Update startup.sh (from PR 1) to add:

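cleanup() {
  # Flush a final snapshot to R2; a failed flush must not block shutdown
  /usr/local/bin/r2-sync-daemon.sh --immediate || echo "r2 flush failed" >&2
  # Stop gastown services, then exit cleanly
  gt down
  exit 0
}
# Run cleanup when the machine is stopped (SIGTERM) or interrupted (SIGINT)
trap cleanup SIGTERM SIGINT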

R2 Client Configuration

  • Use existing R2 client infrastructure (cloud/src/lib/r2/client.ts)
  • New bucket or prefix: gastown-backups
  • Sandbox needs R2 credentials as env vars: R2_ENDPOINT, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_BUCKET
  • Inside the sandbox, use aws CLI (S3-compatible) or a small upload script
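
If the aws CLI route is chosen, the R2_* variables have to be mapped onto the AWS_* names the CLI reads, and every call needs `--endpoint-url`. The sketch below shows one way to do that, plus an upload order in which latest.json is written last. The function names are hypothetical, and treating `snapshots/{timestamp}/` as the staging area that goes live when the pointer is overwritten is an interpretation of the "staging prefix → swap" wording, not something the proposal spells out.

```shell
#!/usr/bin/env bash
# Sketch: drive the aws CLI against R2 using the env vars listed above.

# r2_aws ARGS...
# Runs `aws` with the R2_* credentials mapped onto the AWS_* names the CLI
# reads. The subshell keeps the AWS_* exports out of the caller's environment.
r2_aws() {
  (
    : "${R2_ENDPOINT:?R2_ENDPOINT is required}"
    : "${R2_ACCESS_KEY_ID:?R2_ACCESS_KEY_ID is required}"
    : "${R2_SECRET_ACCESS_KEY:?R2_SECRET_ACCESS_KEY is required}"
    export AWS_ACCESS_KEY_ID="$R2_ACCESS_KEY_ID"
    export AWS_SECRET_ACCESS_KEY="$R2_SECRET_ACCESS_KEY"
    export AWS_DEFAULT_REGION=auto   # R2 uses the region name "auto"
    aws --endpoint-url "$R2_ENDPOINT" "$@"
  )
}

# publish_snapshot TOWN_ID TIMESTAMP SNAPSHOT_DIR
# Uploads the snapshot files first and overwrites latest.json only after
# that upload succeeded, so the pointer never names a partial snapshot.
publish_snapshot() {
  local town_id=$1 timestamp=$2 snapshot_dir=$3 base pointer status
  [ -n "${R2_BUCKET:-}" ] || { echo "R2_BUCKET is required" >&2; return 1; }
  base="s3://$R2_BUCKET/gastown/$town_id"
  r2_aws s3 cp --recursive "$snapshot_dir" "$base/snapshots/$timestamp/" || return 1
  pointer=$(mktemp) || return 1
  printf '{ "timestamp": "%s" }\n' "$timestamp" > "$pointer"
  status=0
  r2_aws s3 cp "$pointer" "$base/latest.json" || status=$?
  rm -f "$pointer"
  return "$status"
}
```

A single-object overwrite is the only atomic step here; R2 has no rename, so the swap is the PUT of latest.json itself.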

Files

  • cloud/infra/gastown-sandbox/r2-sync-daemon.sh
  • cloud/infra/gastown-sandbox/r2-restore.sh
  • Updates to cloud/infra/gastown-sandbox/startup.sh (SIGTERM handler)

Acceptance Criteria

  • Sync daemon runs on a 5-minute timer
  • dolt backup produces restorable snapshots for each rig
  • git bundle create --all produces valid bundles for each rig
  • Config and runtime files are tarred and uploaded
  • manifest.json includes sha256 checksums for all files
  • latest.json pointer is updated atomically (staging prefix → swap)
  • Old snapshots are cleaned up (keep last 3)
  • Restore script skips if volume already has data
  • Restore script successfully restores Dolt + git + config from R2
  • dolt verify-constraints passes after restore
  • Git worktrees are recreated from restored bare repo
  • --immediate flag runs sync once and exits
  • SIGTERM handler flushes to R2 before shutdown
  • Heartbeat reported to cloud API after each sync
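
The two git criteria above (valid bundles, worktrees recreated from the restored bare repo) can be exercised as a round trip. The sketch below is one way to do it; the function names and the argument layout are hypothetical, and which branches get a worktree is left to the caller since the proposal does not say.

```shell
#!/usr/bin/env bash
# Sketch of the git half of backup and restore. Function names are hypothetical.

# bundle_rig REPO BUNDLE_PATH
# Writes every ref of the repository into one bundle file, then verifies it.
# BUNDLE_PATH must be absolute because git runs with -C REPO.
bundle_rig() {
  local repo=$1 bundle=$2
  git -C "$repo" bundle create "$bundle" --all &&
    git -C "$repo" bundle verify "$bundle"
}

# restore_rig BUNDLE_PATH BARE_REPO WORKTREE_DIR BRANCH...
# Clones the bundle into a new bare repo and adds one worktree per branch.
restore_rig() {
  local bundle=$1 repo=$2 worktrees=$3 branch
  shift 3
  git clone --bare "$bundle" "$repo" || return 1
  for branch in "$@"; do
    git -C "$repo" worktree add "$worktrees/$branch" "$branch" || return 1
  done
}
```

For example, `bundle_rig /data/rigs/foo.git /tmp/snap/git/foo.bundle` followed by `restore_rig /tmp/snap/git/foo.bundle /data/rigs/foo.git /data/worktrees/foo main` (paths illustrative only). Note that `git bundle create` refuses to write an empty bundle, so a rig with no commits needs to be skipped or handled separately.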

Dependencies

  • PR 1 (Sandbox Docker Image) — startup.sh integration
  • PR 2 (Provisioning API) — R2 credentials provisioned as env vars
