ci(deploy): chain deploy after publish-edge to kill version-mismatch race by petrpan26 · Pull Request #2 · beava-dev/beava

petrpan26 · 2026-05-07T17:32:44Z

Summary

Today's PR #1 merge hit a workflow race: deploy-hetzner and publish-edge-image both fired on the same commit, ran in parallel, and deploy gave up before the new image was published. The pipeline re-register hit HTTP 409 because the old running image didn't honor force=true for destructive diffs.

This chains them: deploy now waits for publish-edge-image to succeed before firing.

What changed

deploy-hetzner.yml on: adds workflow_run trigger on publish-edge completion (branches: main).
Job-level if: skips the deploy if the upstream failed — a broken image never reaches prod.
The original push: trigger stays for website-only changes (path-filtered, no server rebuild needed).

Behavior matrix

Change shape	publish-edge fires?	deploy fires
Server code (`crates/**`)	yes	after publish succeeds
Website only (`beava-website/project/**`)	no	immediately on push
Both	yes	after publish succeeds (single deploy run)
`register_pipeline.py` only	no	immediately
`workflow_dispatch`	—	immediately
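
Read as a decision rule, the matrix above reduces to a small function. The sketch below is illustrative only: `plan_deploy`, the glob tuples, and the returned strings are hypothetical names chosen for this example, and the `*register_pipeline.py` pattern is an assumption because the PR does not give that file's full path. It models the behavior the matrix states, not the workflow YAML itself.

```python
from fnmatch import fnmatchcase

# Globs taken from the matrix above. Anything beyond these two rows is an
# assumption; the real path filters live in the workflow files.
SERVER_GLOBS = ("crates/**",)
PUSH_DEPLOY_GLOBS = ("beava-website/project/**", "*register_pipeline.py")


def _touches(changed_files, globs):
    """True if any changed path matches any glob (fnmatch '*' also spans '/')."""
    return any(fnmatchcase(path, g) for path in changed_files for g in globs)


def plan_deploy(changed_files, manual=False):
    """Return (publish_edge_fires, deploy_timing) for one event on main.

    publish_edge_fires is None for a manual workflow_dispatch, matching the
    blank cell in the matrix. deploy_timing is None when nothing deploys,
    which is an assumption for paths the matrix does not list.
    """
    if manual:
        return (None, "immediately")
    if _touches(changed_files, SERVER_GLOBS):
        # Server code changed: deploy is chained behind a successful publish,
        # even when website paths changed in the same commit.
        return (True, "after publish")
    if _touches(changed_files, PUSH_DEPLOY_GLOBS):
        return (False, "immediately")
    return (False, None)
```

For example, `plan_deploy(["crates/core/src/lib.rs", "beava-website/project/x.css"])` falls into the "Both" row and returns `(True, "after publish")`.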

Companion change

docker-compose.prod.yml was already updated on main as f0a02354 (beava:next → beavadev/beava:edge + pull_policy: always). Without that, even a post-publish deploy wouldn't pull the new image. Both are needed.

Test plan

Merging PR #1 (feat: SDK-driven pipeline + plain-HTTP frontend, with PR-gating CI hardening) reproduced the race today. Verify that on the next server-side merge, deploy runs only AFTER publish completes
Website-only change still deploys immediately (does not wait for publish)
gh workflow run deploy-hetzner.yml still works as the manual escape hatch

…-mismatch race

Today's PR-merge incident: deploy-hetzner and publish-edge-image both fired on the same merge commit and ran in parallel. deploy finished in ~4 min and tried to /register the new pipeline shape (PageView gained session_id, three feature names changed) against the old running image that didn't honor force=true for destructive diffs → HTTP 409 conflict. publish-edge was still building the new image (~20 min release compile) when deploy gave up.

Fix: deploy now triggers via `workflow_run` on publish-edge-image completion. The job-level `if:` guard ensures it only runs when the upstream succeeded, so a broken image never reaches the box.

Behavior matrix:

Change shape                  publish-edge?  deploy fires
────────────────────────────  ─────────────  ───────────────────
Server code (crates/**)       yes            after publish ok
Website only (project/**)     no             immediately on push
Both                          yes            after publish ok
register_pipeline.py only     no             immediately
Manual workflow_dispatch      n/a            immediately

Path filter on the push trigger is unchanged: it stays narrow so website/SDK changes deploy without round-tripping through publish-edge.

Companion change to docker-compose.prod.yml (track :edge + pull_policy: always) already landed on main as f0a0235. Without that, even a post-publish deploy wouldn't pick up the new image.

…ugh publish-edge only (#5)

Every deploy since PR #1 has been hitting 409 `registration_conflict`. PR #3/4 fixed the server-side bug, but the box never picked up the new image because **deploy-hetzner.yml never restarts the container**. Compose's `pull_policy: always` only matters when you call `docker compose up`; the workflow only ran rsync + curl POST.

Plus a parallel race: when a single PR touches both server-code and website paths, both publish-edge-image and deploy fired simultaneously, deploy ran first against the still-old image, and register 409'd before the new image was even built.

## Changes

**deploy-hetzner.yml**
- New first step: `docker compose pull beava && up -d --force-recreate --no-deps beava` over SSH, plus a 20-iter health probe (`http://beava:8090/ready`). Hard-fail with logs if the new image doesn't come up ready in 20s.
- Drop the `push` trigger entirely. Every deploy chains off `workflow_run` (publish-edge succeeded) or manual `workflow_dispatch`. One trigger path.

**publish-edge-image.yml**
- Drop the path filter: it fires on every push to main. Buildx + cargo caching make non-server commits finish in 2–3 min (cache-hit, manifest re-tag).

## Trade-off

Website-only commits no longer auto-deploy in 30s; they wait ~2–3 min for publish-edge's cache-hit cycle. For a single-maintainer project with one prod box, ordering > deploy latency.

## Verified

Manual repro of the fix steps unblocked prod:
1. SSH'd to box, `docker compose pull && up -d --force-recreate beava` → new :edge running
2. `gh workflow run deploy-hetzner.yml --ref main` → run 25518077565 → success
3. Pipeline registered cleanly against the new server.

## Related

- PR #1 introduced the new pipeline shape that exposed the missing pull step.
- PR #2 added the `workflow_run` chain: necessary but not sufficient (didn't drop the push trigger; didn't add pull).
- PR #3/#4 fixed the server's diff handling: necessary but not sufficient (box never got the fix).
- This PR closes the loop.

Co-authored-by: Hoang Phan <hoang.phan@viggle.ai>
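
The 20-iteration health probe described for deploy-hetzner.yml is a bounded retry loop: poll the ready endpoint, stop at the first success, and fail hard once the attempts run out. The workflow itself runs this over SSH in shell; the Python sketch below only illustrates the shape of that loop. `wait_until_ready`, `_http_ok`, the one-second delay, and the treatment of HTTP 200 as "ready" are assumptions made for this example.

```python
import time
import urllib.error
import urllib.request


def _http_ok(url):
    """Return True if `url` answers with HTTP 200 (assumed meaning of 'ready')."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or non-2xx: not ready yet.
        return False


def wait_until_ready(url, attempts=20, delay_s=1.0, probe=None, sleep=time.sleep):
    """Poll `url` up to `attempts` times; return the attempt that succeeded.

    Raises TimeoutError if no attempt succeeds, which is where a deploy
    script would dump container logs and exit non-zero.
    """
    probe = probe or _http_ok
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay_s)  # no sleep after the final failed attempt
    raise TimeoutError(f"{url} not ready after {attempts} attempts")
```

The `probe` and `sleep` parameters exist only so the loop can be exercised without a network or real waiting, for example `wait_until_ready("http://beava:8090/ready", probe=lambda u: True)`.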

…of returning the error envelope (#130)

## Summary

`TcpTransport.send_push` was blindly JSON-decoding the response frame and returning the dict to user code. When the server emitted `OP_ERROR_RESPONSE` (e.g. `invalid_event` on a type mismatch), the error envelope was returned as a regular dict, with **no exception**. Embed mode defaults to TCP, so fire-and-forget pushes silently `/dev/null` on validation failure.

PR #120 documented this and locked the buggy behaviour with `test_type_error_at_push.py`. This PR fixes the bug and flips those tests to assert the correct contract.

## Fix in `python/beava/_transport.py`

`send_push` (lines 443-490) now mirrors `send_get`:
- After `read_frame`, check `frame.op != OP_PUSH` (the success-echo opcode; the server reuses `OP_PUSH`, not a separate `OP_PUSH_RESPONSE`).
- If `OP_ERROR_RESPONSE`: parse the JSON body with try/except guards and `raise RegistrationError(code=err_body["error"]["code"], message=...)`.
- Fallback: `"unparseable_error"` for bad bytes / `"unexpected_frame"` for missing code.

Docstring expanded with success/error wire shapes and `Raises:` section.

## Test flips in `python/tests/test_type_error_at_push.py`

- 5 type-mismatch tests flipped from `_assert_push_error(...)` (which asserted the buggy return-dict shape) to `with pytest.raises(RegistrationError) as exc_info: ...; assert exc_info.value.code == "<code>"`.
- Test #2 (float→int silent accept) unchanged: server still legitimately accepts that case via numeric I64↔F64 compat, returns ack_lsn.
- Added 2 new tests with in-process TCP mock server:
  - `test_push_response_unexpected_opcode_raises`: server replies with bogus opcode → `RegistrationError(code="unexpected_frame")`.
  - `test_push_error_response_with_unparseable_body_raises`: non-JSON error body still raises cleanly (no crash).

## Test plan

- [x] Before fix: 6/6 passed by asserting the buggy return shape.
- [x] After fix: 8/8 passed (`pytest python/tests/test_type_error_at_push.py`, 21.26s).
- [x] Re-run against current `main` (`b20d2b83`) — clean.
- [x] `ruff check` clean.
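
The response handling described in the fix can be sketched as a single decode step that either returns the success body or raises. This is not the code in `python/beava/_transport.py`: the opcode values, the `Frame` dataclass, the `decode_push_response` helper, the `RegistrationError` constructor, and the `{"ack_lsn": ...}` success body are all stand-ins assumed for illustration. Only the three outcomes (server error code, `"unparseable_error"`, `"unexpected_frame"`) come from the commit message.

```python
import json
from dataclasses import dataclass

# Hypothetical opcode values; the real constants are defined by beava.
OP_PUSH = 0x01
OP_ERROR_RESPONSE = 0x7F


@dataclass
class Frame:
    op: int
    body: bytes


class RegistrationError(Exception):
    """Stand-in for the SDK's exception; carries a machine-readable code."""

    def __init__(self, code, message=""):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def decode_push_response(frame):
    """Return the decoded success body, or raise RegistrationError."""
    if frame.op == OP_PUSH:
        # Success echo: hand the decoded body back to the caller.
        return json.loads(frame.body)
    if frame.op != OP_ERROR_RESPONSE:
        raise RegistrationError("unexpected_frame", f"opcode {frame.op:#x}")
    try:
        body = json.loads(frame.body)
    except ValueError:
        # Covers invalid JSON and invalid UTF-8 (both are ValueError subclasses).
        raise RegistrationError("unparseable_error", repr(frame.body[:64])) from None
    error = body.get("error") if isinstance(body, dict) else None
    if not isinstance(error, dict) or "code" not in error:
        raise RegistrationError("unexpected_frame", "error frame without a code")
    raise RegistrationError(error["code"], error.get("message", ""))
```

The point of the design is that a caller can no longer mistake an error envelope for an acknowledgement: every non-success frame surfaces as an exception with a `code` attribute to branch on.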

petrpan26 merged commit 071ead7 into main May 7, 2026
1 check passed

petrpan26 deleted the ci/sequential-publish-then-deploy branch May 7, 2026 17:33

petrpan26 mentioned this pull request May 7, 2026

ci(deploy): pull+restart container before register; route deploy through publish-edge only #5

Merged

This was referenced May 21, 2026

[SEV-1] HTTP wedges 60-90s during snapshot writes — snapshot holds state_tables lock + deep-clones state; wal active segment never rotates #151

Open

fix(snapshot): reclaim covered WAL bytes #152

Merged

petrpan26 merged 1 commit into main from ci/sequential-publish-then-deploy
