fix(register): force=true must replace additive-against-existing descriptors (prod 409 hotfix)#3
Closed
petrpan26 wants to merge 6 commits into
Closed
Conversation
added 6 commits
May 7, 2026 13:49
…ource (RED) Reproduces the prod 409 from the deploy-hetzner failure on PR #1 merge: the pipeline added `session_id` to PageView, expected force=true to replace the descriptor, but the legacy compute_diff in execute_register_inner unconditionally returned `registration_conflict` because apply_shard's pre-removal block only ran for destructive entries. NewField on an existing event source classifies as `additive` in classify_register_diff but `changed` in the legacy compute_diff. The two diff systems disagree, force=true is silently dropped, the legacy path wins, 409. Test currently fails with the legacy 409 envelope (status=409, error.code=registration_conflict, diff.changed contains the Tx schema mismatch). Will go green once apply_shard pre-removes additive-changed descriptors when force=true.
…xisting descriptors (GREEN)
Two diff systems on the server disagreed on schema additions:
- classify_register_diff (apply_shard pre-flight, register_validate.rs):
new field on existing event source → additive[NewField], no destructive.
register_check_force_required passes since destructive is empty.
- compute_diff (legacy, registry_diff.rs, called by execute_register_inner):
same shape → changed[ConflictDetail{reason: schema_mismatch}].
register.rs:257 returns RegisterOutcome::Conflict unconditionally —
no force check on this path.
apply_shard's force-handling block at line 269 was gated on
`force && !diff.destructive.is_empty()`, so additive-only payloads
slipped through unchanged. The legacy path then 409'd, and the wire
`force=true` was effectively dropped. Production deploy of
PageView{path,dwell_ms} → PageView{session_id,path,dwell_ms} +
SiteMetrics has been failing on every retry since PR #1 merged.
Fix: when force=true, also walk diff.additive and pre-remove descriptors
named in NewField{event} and NewAgg{table} entries. NewDescriptor is
genuinely-new (not present in the registry) and stays untouched. The
existing destructive loop is unchanged. Pre-removal goes through the
same registry.force_remove_descriptors call as the destructive path,
so state-loss semantics for force=true are unchanged: callers who want
to add a field without losing accumulated state should send the request
without `force` (additive is allowed without force per
register_check_force_required design).
Targeted: phase13_4_force_register passes 10/10 including the new test.
The pipeline showcase showed: $ curl https://beava.dev/api/features/site That route doesn't exist on the live Caddy proxy and never has — beava's HTTP surface is `POST /api/get` with `{table, key}` body. The LiveMetrics card a few lines up already hits the real endpoint, but the prose example drifted at some point. Anyone copy-pasting hit a 404. Replaced with the real shape: $ curl https://beava.dev/api/get -d '{"table":"SiteMetrics","key":""}'
…w (1280x640)
These two surfaces aren't in the rendered website — they're uploaded
manually via GitHub's Settings UI. Committing the canonical renders
to the design-system assets dir so future updates don't have to
redraw them from scratch.
- org-avatar-280.png: 280x280 square, transparent geometric mascot
centered on #000. Drop-in for Settings → Profile → upload picture
on github.com/beava-dev (or petrpan26).
- repo-social-preview-1280x640.png: 1280x640, same mascot on #000,
centered. Drop-in for Settings → General → Social preview on
github.com/beava-dev/beava.
Both follow the brand-github-banner.html spec from the design bundle
(geometric-transparent on near-black canvas).
Per request, both logo surfaces use the cream-on-mascot variant from the brand-github-banner.html spec instead of the black variant. Same canvas dimensions (280x280, 1280x640), same centered geometric mascot, just --beava-cream (#fdfaf4) bg matching the README banner cutover.
Contributor
Author
|
Superseded by #4 — folding the apply_shard fix + integration test + bundled docs/design changes into the broader consolidation PR. Single review, single merge for the full prod fix + cleanup story. |
petrpan26
added a commit
that referenced
this pull request
May 7, 2026
…nors additive-against-existing (#4) **Folds in #3** (closed as superseded). Single PR for the prod 409 fix + the underlying dual-diff cleanup. ## Root cause Two diff systems disagreed on schema additions: - `classify_register_diff` (apply_shard pre-flight): NewField → `additive`, no force needed. - `compute_diff` (legacy, in `execute_register_inner`): same shape → `changed`, returned Conflict regardless of force. `apply_shard`'s pre-removal block was gated on `force && !destructive.is_empty()` — additive-only payloads slipped through, hit the legacy path, 409'd. Wire `force=true` was effectively dropped. ## Fix in 5 commits | # | Step | What | |---|---|---| | `92af5b87` | hotfix RED | integration test reproducing prod 409 | | `ff8dd73f` | hotfix GREEN | apply_shard pre-removes additive-against-existing when force=true | | `29bbe4d7` | refactor 1/3 | `RegisterDiff` gains `already_present` field + 6 unit tests | | `e7d66e76` | refactor 2/3 | `execute_register_inner` switches to `classify_register_diff` | | `6d801f40` | refactor 3/3 | Delete legacy `compute_diff` + `ConflictDetail` + `DiffReason` + 25 tests + `RegisterOutcome::Conflict` | Plus `97794ce3` fmt cleanup, `f221dc54` homepage curl example fix, `4505a60b` + `e17cebcc` brand asset renders. **Net diff: +74 / −1295 lines.** ## Verification ``` cargo clippy --all-targets --features testing -- -D warnings → clean beava-core unit tests 599 passed (incl. 6 new classify tests) phase13_4_force_register 10 passed (incl. force-additive) phase13_4_dry_run_register 4 passed phase2_smoke 6 passed (already_present preserved) ``` Full workspace gate runs in CI on this PR. ## Out of scope Allowing additive-without-force to apply in-place. Today the UX is "send `force=true` to land any change against an existing descriptor." Deeper engine work (in-place schema mutation without state loss) is a separate effort. --------- Co-authored-by: Hoang Phan <hoang.phan@viggle.ai>
petrpan26
added a commit
that referenced
this pull request
May 7, 2026
…ugh publish-edge only (#5) Every deploy since PR #1 has been hitting 409 \`registration_conflict\`. PR #3/4 fixed the server-side bug — but the box never picked up the new image because **deploy-hetzner.yml never restarts the container**. Compose's \`pull_policy: always\` only matters when you call \`docker compose up\`; the workflow only ran rsync + curl POST. Plus a parallel race: when a single PR touches both server-code and website paths, both publish-edge-image and deploy fired simultaneously, deploy ran first against the still-old image, register 409'd before the new image was even built. ## Changes **deploy-hetzner.yml** - New first step: \`docker compose pull beava && up -d --force-recreate --no-deps beava\` over SSH, plus a 20-iter health probe (\`http://beava:8090/ready\`). Hard-fail with logs if the new image doesn't come up ready in 20s. - Drop the \`push\` trigger entirely. Every deploy chains off \`workflow_run\` (publish-edge succeeded) or manual \`workflow_dispatch\`. One trigger path. **publish-edge-image.yml** - Drop the path filter — fires on every push to main. Buildx + cargo caching make non-server commits finish in 2–3 min (cache-hit, manifest re-tag). ## Trade-off Website-only commits no longer auto-deploy in 30s — they wait ~2–3 min for publish-edge's cache-hit cycle. For a single-maintainer project with one prod box, ordering > deploy latency. ## Verified Manual repro of the fix steps unblocked prod: 1. SSH'd to box, \`docker compose pull && up -d --force-recreate beava\` → new :edge running 2. \`gh workflow run deploy-hetzner.yml --ref main\` → run 25518077565 → success 3. Pipeline registered cleanly against the new server. ## Related - PR #1 introduced the new pipeline shape that exposed the missing pull step. - PR #2 added the \`workflow_run\` chain — necessary but not sufficient (didn't drop the push trigger; didn't add pull). - PR #3/#4 fixed the server's diff handling — necessary but not sufficient (box never got the fix). - This PR closes the loop. Co-authored-by: Hoang Phan <hoang.phan@viggle.ai>
petrpan26
pushed a commit
that referenced
this pull request
May 7, 2026
…riter Three review-driven changes on top of the orphan-recovery fix: 1. (test reviewer #3) Wrap `BadMagic` and `UnsupportedVersion` errors from `segment::read_header` inside `reuse_orphan` into the structured `AlreadyExists` Io error. Pre-fix, a 24-byte file with garbage magic or a future format_version would propagate the raw error variant from `read_header`, breaking the boot-time error contract uniformity (operator gets `BadMagic` here, `AlreadyExists` everywhere else). The wrap includes the original error in the message for forensics. Makes `open_refuses_orphan_with_bad_magic` and `open_refuses_orphan_with_unsupported_format_version` go green. 2. (impl reviewer #1) Document the single-writer invariant on `WalWriter::open`. The reuse path opens with `read+write` and takes no `flock`/`fcntl(F_SETLK)`. In normal beava deploys (single container, restart: unless-stopped, --force-recreate serializing on container lifecycle) this never bites, but two concurrent processes pointed at the same WAL dir can both pass the size==HEADER_SIZE check and start interleaving record bytes. Documented as an explicit prerequisite so a future multi-process scenario doesn't silently produce CRC corruption. 3. (impl reviewer #7) Update the now-stale comment in `rotation::rotate` that said `WalWriter::open` errors loudly on existing files. After the orphan-recovery fix it can also REUSE a header-only segment; reflect the new contract. Plus a clippy fix in the new test (`OpenOptions::write(true).append(true)` collapses to `.append(true)` — the `append` flag implies `write`). Verified: cargo test -p beava-persistence → all 9 + pre-existing 7 + others green cargo clippy -p beava-persistence --all-targets -- -D warnings → clean cargo fmt --check -p beava-persistence → clean
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Root cause
Two diff systems on the server disagree on schema additions:
classify_register_diff(apply_shard pre-flight): NewField on existing event →additive. force-required gate passes.compute_diff(legacy, inexecute_register_inner): same shape →changed: [ConflictDetail], returnsRegisterOutcome::Conflictunconditionally — never readsforce.apply_shard's pre-removal at line 269 was gatedforce && !diff.destructive.is_empty(). Pure-additive payloads slipped through, hit the legacy path, 409'd. Wireforce=truesilently dropped.This was the prod 409 on every
deploy-hetznerretry after PR #1 (PageView gainedsession_id).Fix
When
force=true, also pre-remove descriptors named inNewField{event}andNewAgg{table}additive entries viaforce_remove_descriptors.NewDescriptorskipped (genuinely-new). Destructive loop unchanged.Verification
Bundled
c54d4499homepage curl example:/api/features/site(404) → realPOST /api/get.3a8843b8+a400b70adesign assets:org-avatar-280.pngandrepo-social-preview-1280x640.pngin cream variant. No GitHub API for either — upload via Settings UI.Followup (out of scope)
Additive without
forceshould also succeed —register_check_force_requiredsays it's allowed, but the legacycompute_diff409s anyway. Real fix is to aligncompute_diffwithclassify_register_diffor thread the new diff into execute_register. Tracked separately.