ci(deploy): pull+restart container before register; route deploy through publish-edge only by petrpan26 · Pull Request #5 · beava-dev/beava

petrpan26 · 2026-05-07T19:43:59Z

Every deploy since PR #1 has been hitting 409 `registration_conflict`. PR #3/4 fixed the server-side bug — but the box never picked up the new image because deploy-hetzner.yml never restarts the container. Compose's `pull_policy: always` only matters when you call `docker compose up`; the workflow only ran rsync + curl POST.

There was also a race between workflows: when a single PR touched both server-code and website paths, publish-edge-image and deploy fired simultaneously. Deploy ran first against the still-old image, so register returned 409 before the new image was even built.

Changes

deploy-hetzner.yml

New first step: `docker compose pull beava && docker compose up -d --force-recreate --no-deps beava` over SSH, plus a 20-iteration health probe (`http://beava:8090/ready`). Hard-fail with logs if the new image doesn't come up ready in 20s.
Drop the `push` trigger entirely. Every deploy chains off `workflow_run` (publish-edge succeeded) or manual `workflow_dispatch`. One trigger path.
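
A minimal sketch of the pull, restart, and probe sequence described above, not the actual workflow step. The function names `wait_ready` and `pull_and_restart`, the 1-second interval, the `curl` flags, and the log tail length are assumptions for illustration. The PR does not say where the probe runs, so reaching `http://beava:8090/ready` directly with `curl` is also an assumption.

```shell
#!/bin/sh
# Sketch only: names, intervals, and the probe command are illustrative assumptions.

# wait_ready ATTEMPTS INTERVAL_SECONDS CMD [ARGS...]
# Runs CMD until it exits 0, at most ATTEMPTS times, sleeping INTERVAL_SECONDS
# between tries. Returns 0 on the first success, 1 if every attempt fails.
# Note: plain POSIX sh has no local variables, so these names are global.
wait_ready() {
  wr_attempts="$1"
  wr_interval="$2"
  shift 2
  wr_i=1
  while [ "$wr_i" -le "$wr_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # No need to sleep after the final failed attempt.
    if [ "$wr_i" -lt "$wr_attempts" ]; then
      sleep "$wr_interval"
    fi
    wr_i=$((wr_i + 1))
  done
  return 1
}

# pull_and_restart: fetch the latest image, recreate only the beava service,
# then probe readiness. On failure, dump recent container logs and return 1.
pull_and_restart() {
  docker compose pull beava || return 1
  docker compose up -d --force-recreate --no-deps beava || return 1
  if ! wait_ready 20 1 curl -fsS -o /dev/null http://beava:8090/ready; then
    docker compose logs --tail=100 beava >&2
    return 1
  fi
}
```

On the box this would be invoked as `pull_and_restart || exit 1` before the register step, so a container that never becomes ready stops the deploy instead of letting register run against a stale or broken server.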

publish-edge-image.yml

Drop the path filter — fires on every push to main. Buildx + cargo caching make non-server commits finish in 2–3 min (cache-hit, manifest re-tag).

Trade-off

Website-only commits no longer auto-deploy in 30s — they wait ~2–3 min for publish-edge's cache-hit cycle. For a single-maintainer project with one prod box, ordering > deploy latency.

Verified

Manual repro of the fix steps unblocked prod:

SSH'd to box, `docker compose pull && docker compose up -d --force-recreate beava` → new :edge running
`gh workflow run deploy-hetzner.yml --ref main` → run 25518077565 → success
Pipeline registered cleanly against the new server.

Related

PR #1 "feat: SDK-driven pipeline + plain-HTTP frontend, with PR-gating CI hardening" introduced the new pipeline shape that exposed the missing pull step.
PR #2 "ci(deploy): chain deploy after publish-edge to kill version-mismatch race" added the `workflow_run` chain: necessary but not sufficient (didn't drop the push trigger; didn't add pull).
PR #3 "fix(register): force=true must replace additive-against-existing descriptors (prod 409 hotfix)" and PR #4 "fix+refactor(register): consolidate dual diff systems + force=true honors additive-against-existing" fixed the server's diff handling: necessary but not sufficient (box never got the fix).
This PR closes the loop.

…y via publish-edge

Two bugs combined to break every prod deploy after PR #1:

1. deploy-hetzner.yml never ran 'docker compose pull && up -d'. It rsynced the website and POSTed the register payload but left the running beava container on whatever image was last manually pulled. Even after publish-edge-image built a new :edge digest, the box stayed on the old binary, and the new pipeline shape (PageView with session_id) hit the old server's diff path, returning 409.
2. The push trigger on deploy-hetzner.yml fired in parallel with publish-edge-image when a single PR touched both server-code paths AND website paths (PR #4 did exactly this). Even fixing #1 wouldn't help if deploy fired BEFORE publish-edge finished: the pull would pick up the previous :edge digest.

Fix:

* deploy-hetzner.yml gains a 'Pull latest beava image + restart container' step that runs FIRST: docker compose pull beava + up -d --force-recreate --no-deps beava + 20s health probe loop. Hard-fail with logs if the new image doesn't come up ready.
* deploy-hetzner.yml drops the push trigger entirely. Every deploy chains off publish-edge-image's completion (workflow_run trigger) or workflow_dispatch. One trigger path, no parallel race.
* publish-edge-image.yml drops its path filter and fires on every push to main. Buildx cache makes non-server commits finish in 2-3 min (cache-hit on cargo + image layers, just a manifest re-tag). The cost is small CI burn for trivial doc commits; the benefit is deploy is always behind a fresh publish, no races.

Trade-off explicit in comments: website-only commits no longer auto-deploy in 30 seconds; they wait for publish-edge's 2-3 min cache-hit cycle. Acceptable for a single-maintainer project where ordering > deploy latency.

Verified manually: SSH'd to box, pulled new :edge (consolidated server), restarted container, then triggered deploy via workflow_dispatch. Run 25518077565 succeeded.

test_apply_drains_more_than_1024_items_per_iteration in phase12_08_drain_until_empty_test.rs is genuinely flaky on CI under shared runner load — the assertion is 'drained MORE than DRAIN_CAP=1024 items in one event-loop iteration', and runner contention sometimes keeps that watermark below 1024 even though the test logic is sound. Hit it twice on PR #6 (a docs-only PR with zero Rust changes), and on PR #5's CI history. Retry 3x via a per-test nextest override. Cite the symptom + reason in the config so future maintainers don't think this is masking a real regression.

petrpan26 merged commit 82b1b10 into main May 7, 2026
8 checks passed

petrpan26 deleted the fix/deploy-pull-and-restart branch May 7, 2026 20:00

This was referenced May 21, 2026

[SEV-1] HTTP wedges 60-90s during snapshot writes — snapshot holds state_tables lock + deep-clones state; wal active segment never rotates #151

Open

fix(snapshot): reclaim covered WAL bytes #152

Merged

petrpan26 merged 1 commit into main from fix/deploy-pull-and-restart
