feat(git): move repo-name registry to Postgres + relax RWM chart gate (HA relay) #1432
Merged
The relay's git ref/object state is already fully object-store-backed: every read and write hydrates an ephemeral bare repo from S3 per request, and writer serialization is the object-store pointer CAS (docs/git-on-object-storage.md, Inv_NoFork). The one remaining piece of persistent local-disk state was the `.names/<community>/<repo_id>` repo-name reservation index, which forced multi-replica deployments onto a shared ReadWriteMany (EFS) volume just to agree on name ownership.

Move that registry into Postgres (already a hard relay dependency):

- New tenant-scoped table `git_repo_names` in the consolidated 0001 schema, keyed `(community_id, repo_id)` so the migration-lint invariant holds and name uniqueness is per-community and DB-enforced.
- New `buzz-db::git_repo` module mirroring `relay_members`: `reserve_repo_name` (INSERT … ON CONFLICT DO NOTHING → Reserved / AlreadyOwned / TakenByOther), `repo_name_owner`, `count_repos_for_owner` (quota), `release_repo_name` (owner-scoped rollback).
- `handle_git_repo_announcement` reworked onto those calls, preserving the three semantics (atomic uniqueness, idempotent same-owner re-announce, per-pubkey quota) and the all-or-nothing seed-failure rollback.

With no persistent git state on disk, drop the chart's ReadWriteMany hard-fail: `_validate.tpl` no longer requires persistence.git.accessMode=ReadWriteMany at replicaCount>1. Redis remains required for buzz-pubsub; that is the real multi-pod dependency. The now-unused `git_repo_path` config field / git PVC mount is left in place (doc-noted) for a separate cleanup PR, to keep this diff minimal.

Verified: full buzz-relay suite (429) green; buzz-db lib (79) green, including migration-lint; new git_repo DB tests pass against live Postgres; clippy clean; helm template confirms replicaCount=2 + ReadWriteOnce renders while the Redis gate still blocks when no Redis source is configured.
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
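The `reserve_repo_name` outcome mapping can be modeled with a plain in-memory map. This is a sketch, not the `buzz-db` code: `RepoNames` and its methods are hypothetical stand-ins, a `HashMap` keyed by `(community_id, repo_id)` plays the role of the table's primary key, and `Entry::Vacant` plays the role of `INSERT … ON CONFLICT DO NOTHING` actually inserting a row. It only shows how the three outcomes are classified; in the real table it is the primary-key constraint, not `&mut self`, that serializes concurrent announces. Scoping the quota count to one community is also an assumption.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Outcome names as listed in the PR; everything else here is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReserveOutcome {
    Reserved,
    AlreadyOwned,
    TakenByOther,
}

/// In-memory stand-in for `git_repo_names`: (community_id, repo_id) -> owner pubkey.
#[derive(Default)]
pub struct RepoNames {
    rows: HashMap<(String, String), String>,
}

impl RepoNames {
    /// Insert the row only if the key is free (like ON CONFLICT DO NOTHING),
    /// then classify by comparing the stored owner with the caller.
    pub fn reserve_repo_name(&mut self, community: &str, repo: &str, owner: &str) -> ReserveOutcome {
        match self.rows.entry((community.to_string(), repo.to_string())) {
            Entry::Vacant(slot) => {
                slot.insert(owner.to_string());
                ReserveOutcome::Reserved
            }
            Entry::Occupied(row) if row.get() == owner => ReserveOutcome::AlreadyOwned,
            Entry::Occupied(_) => ReserveOutcome::TakenByOther,
        }
    }

    pub fn repo_name_owner(&self, community: &str, repo: &str) -> Option<&str> {
        self.rows
            .get(&(community.to_string(), repo.to_string()))
            .map(String::as_str)
    }

    /// Quota input: how many names `owner` holds in `community` (scoping assumed).
    pub fn count_repos_for_owner(&self, community: &str, owner: &str) -> usize {
        self.rows
            .iter()
            .filter(|(key, holder)| key.0 == community && holder.as_str() == owner)
            .count()
    }

    /// Owner-scoped release: deletes the row only while `owner` still holds it.
    pub fn release_repo_name(&mut self, community: &str, repo: &str, owner: &str) -> bool {
        let key = (community.to_string(), repo.to_string());
        if self.repo_name_owner(community, repo) == Some(owner) {
            self.rows.remove(&key);
            true
        } else {
            false
        }
    }
}
```

Because uniqueness is per `(community_id, repo_id)`, the same repo name can be held by different pubkeys in different communities.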
Addresses two review blockers on the .names->Postgres change.

Blocker 1 (name-registry rollback/re-announce race):

- The seed-failure rollback was owner-scoped but not attempt-scoped: an AlreadyOwned attempt could DELETE the reservation row a concurrent same-owner attempt had inserted and successfully seeded. Now only a genuinely fresh Reserved outcome by THIS attempt may release the row; AlreadyOwned releases nothing.
- Same-owner re-announce no longer returns Ok() purely on the existing row (which proves ownership, not that the manifest pointer was seeded). It falls through to seed_manifest_pointer, which is idempotent under concurrency (create-only put_pointer(IfNoneMatchStar); LostRace on the same empty digest is success, a different non-empty pointer is a hard error). Handler success now means row AND pointer are both present, so a repo can never be 'accepted' while uncloneable.

Blocker 2 (stale RWM assertions across the chart): Relaxing the gate in _validate.tpl left the rest of the chart claiming ReadWriteMany is required. Made the gate-relax chart-wide:

- values.yaml, values.schema.json, README.md: replicaCount>1 requires Redis only; git is object-store-backed, RWO per replica is fine.
- NOTES.txt: dropped the now-false 'RWO will fail at template time' warning.
- examples/argocd-app.yaml, examples/flux-helmrelease.yaml: RWO + no efs-sc.
- tests: validation_test drops the obsolete 'RWO fails' case (RWO no longer fails; the Redis-absent gate test is retained); render_test + HA/production fixtures now assert replicaCount=3 renders with ReadWriteOnce.

Verified: buzz-relay suite 429/0; buzz-db git_repo live-PG tests 4/4; clippy clean; helm unittest 30/30 (7 suites); helm template confirms replicaCount=2 + RWO + Redis renders and replicaCount=2 + RWO + no-Redis is still blocked.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
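The attempt-scoped rollback rule from Blocker 1 fits in a few lines. In this sketch, which assumes a hypothetical `announce` helper and an in-memory map in place of the registry table, the pointer work is passed in as a closure; on failure the row is deleted only when this call produced `Reserved`, so a same-owner `AlreadyOwned` attempt can no longer remove a row that a concurrent attempt inserted and seeded.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReserveOutcome {
    Reserved,
    AlreadyOwned,
    TakenByOther,
}

/// Hypothetical in-memory registry: (community_id, repo_id) -> owner pubkey.
pub type Registry = HashMap<(String, String), String>;

fn reserve(reg: &mut Registry, key: &(String, String), owner: &str) -> ReserveOutcome {
    match reg.entry(key.clone()) {
        Entry::Vacant(slot) => {
            slot.insert(owner.to_string());
            ReserveOutcome::Reserved
        }
        Entry::Occupied(row) if row.get() == owner => ReserveOutcome::AlreadyOwned,
        Entry::Occupied(_) => ReserveOutcome::TakenByOther,
    }
}

/// Hypothetical announce skeleton; `pointer_step` stands in for the object-store work.
pub fn announce(
    reg: &mut Registry,
    community: &str,
    repo: &str,
    owner: &str,
    pointer_step: impl FnOnce() -> Result<(), String>,
) -> Result<ReserveOutcome, String> {
    let key = (community.to_string(), repo.to_string());
    let outcome = reserve(reg, &key, owner);
    if outcome == ReserveOutcome::TakenByOther {
        return Err(format!("{repo} is owned by another pubkey"));
    }
    // Only a row that THIS call inserted may be rolled back.
    let reserved_by_this_attempt = outcome == ReserveOutcome::Reserved;
    if let Err(err) = pointer_step() {
        if reserved_by_this_attempt {
            reg.remove(&key);
        }
        return Err(err);
    }
    Ok(outcome)
}
```

A failed `AlreadyOwned` attempt still reports an error, but the row and whatever pointer the other attempt seeded stay intact.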
…nsure)

The prior fix routed the same-owner AlreadyOwned reannounce path through seed_manifest_pointer, which is intentionally strict for repo *creation*: its LostRace branch only accepts an existing pointer if it names the same empty manifest digest, and errors otherwise. That regressed the normal lifecycle: once an owner pushed, the pointer named a non-empty manifest, so the next kind:30617 reannounce failed with 'already has a non-empty pointer; refusing to overwrite via announce'.

Split the pointer step by outcome:

- Fresh Reserved claim (this attempt inserted the row) -> seed_manifest_pointer, kept strict: a non-empty pointer under a just-reserved name is suspicious (stale prior lifecycle) and correctly fails + rolls back the fresh row.
- Same-owner AlreadyOwned (reannounce) -> new ensure_manifest_pointer, tolerant: any existing pointer (empty OR non-empty) is valid and left untouched; only an absent pointer is repaired by seeding the empty pointer.

This restores the old filesystem behavior (reannounce is an idempotent update regardless of pushed state) and keeps the 'announced <=> pointer exists' invariant, while never overwriting real ref state. The read-then-conditional-seed in ensure_manifest_pointer is race-safe: the repair uses the same create-only put_pointer(IfNoneMatchStar), so if a concurrent seeder or pusher populates the pointer between the read and the seed, the repair loses the create race and treats the pointer as already present instead of overwriting it.

Rollback semantics unchanged: only reserved_by_this_attempt releases the row.

Verified: buzz-relay suite 429/0; git_repo live-PG 4/4; clippy clean; helm unittest 30/30 (chart unchanged this pass).

Note: no cheap handler-level test exists (the announce path needs a full AppState with a live git_store); coverage is the full suite + the isolated, reasoned ensure_manifest_pointer branching. Flagged for reviewer.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
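The two pointer policies differ only in how they treat a pointer that already exists. The sketch below assumes a hypothetical in-memory `PointerStore` whose `put_if_none_match` models the create-only `put_pointer(IfNoneMatchStar)` call and reports `LostRace` with the winner's digest; `EMPTY_MANIFEST` is a placeholder, not the real empty-manifest digest.

```rust
use std::collections::HashMap;

/// Placeholder for the digest of the empty manifest.
pub const EMPTY_MANIFEST: &str = "empty-manifest";

#[derive(Debug, PartialEq, Eq)]
pub enum PutResult {
    Created,
    LostRace(String), // digest that was already stored
}

/// Hypothetical in-memory pointer store: repo -> manifest digest.
#[derive(Default)]
pub struct PointerStore {
    pointers: HashMap<String, String>,
}

impl PointerStore {
    pub fn get(&self, repo: &str) -> Option<&str> {
        self.pointers.get(repo).map(String::as_str)
    }

    /// Create-only write: never overwrites an existing pointer.
    pub fn put_if_none_match(&mut self, repo: &str, digest: &str) -> PutResult {
        if let Some(existing) = self.pointers.get(repo) {
            return PutResult::LostRace(existing.clone());
        }
        self.pointers.insert(repo.to_string(), digest.to_string());
        PutResult::Created
    }
}

/// Fresh `Reserved` claim: strict. Losing the race is fine only if the
/// winner also wrote the empty manifest.
pub fn seed_manifest_pointer(store: &mut PointerStore, repo: &str) -> Result<(), String> {
    match store.put_if_none_match(repo, EMPTY_MANIFEST) {
        PutResult::Created => Ok(()),
        PutResult::LostRace(digest) if digest == EMPTY_MANIFEST => Ok(()),
        PutResult::LostRace(_) => Err(format!("{repo} already has a non-empty pointer")),
    }
}

/// Same-owner `AlreadyOwned` re-announce: tolerant. Any existing pointer is
/// accepted untouched; an absent one is repaired with the same create-only
/// put, and losing that race also counts as "already present".
pub fn ensure_manifest_pointer(store: &mut PointerStore, repo: &str) -> Result<(), String> {
    if store.get(repo).is_some() {
        return Ok(());
    }
    match store.put_if_none_match(repo, EMPTY_MANIFEST) {
        PutResult::Created | PutResult::LostRace(_) => Ok(()),
    }
}
```

In the real handler, race safety comes from the object store's conditional write; here a single `&mut` owner makes the read and the put trivially sequential, so the sketch shows the branching rather than the concurrency.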
…eserve

Two review blockers on the name-registry work:

1. Brownfield migration. git_repo_names was added by editing the consolidated 0001 in place, changing its checksum. Any database that already applied the pre-PR 0001 would abort startup under AUTO_MIGRATE with sqlx VersionMismatch(1). Revert 0001 to be byte-identical to origin/main and add the table + owner index as an additive 0002_git_repo_names.sql. To keep the tenant-isolation lint honest as migrations accrue, migration_sql() now concatenates every embedded migration in version order (git_repo_names is community_id-led, so it passes). Tests updated: embedded_migrator expects 2 migrations and asserts git_repo_names lives in v2 (absent from v1); the live run_migrations test expects applied_versions [1, 2].

2. Stale ref-state on re-announce. handle_git_repo_announcement called emit_initial_ref_state unconditionally, so a same-owner re-announce-after-push published a newer empty kind:30618 that, under NIP-16 latest-replaceable ordering, shadowed the real pushed refs. Gate the emission behind reserved_by_this_attempt: emit the initial empty ref-state only on a fresh Reserved claim. AlreadyOwned re-announce still ensures/repairs the manifest pointer but emits no empty 30618.

Verified: cargo check -p buzz-relay, cargo test -p buzz-db --lib (79 pass), clippy clean on both crates, helm unittest (30 pass). Brownfield proven live against a throwaway Postgres seeded with origin/main's 0001: the reviewed SHA errored VersionMismatch(1); this change applied [1, 2] and created git_repo_names cleanly.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
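The version-ordered concatenation behind `migration_sql()` can be sketched as below. The `Migration` struct and the `pks_are_community_led` check are hypothetical simplifications: the PR does not show how migrations are embedded or exactly what the tenant-isolation lint inspects, so this toy check only requires that each `PRIMARY KEY (` column list starts with `community_id`.

```rust
/// Hypothetical stand-in for one embedded migration.
pub struct Migration {
    pub version: i64,
    pub sql: &'static str,
}

/// Concatenate every embedded migration in ascending version order, so a
/// lint over the result sees 0001, then 0002, and so on.
pub fn migration_sql(migrations: &[Migration]) -> String {
    let mut ordered: Vec<&Migration> = migrations.iter().collect();
    ordered.sort_by_key(|m| m.version);
    ordered
        .iter()
        .map(|m| m.sql)
        .collect::<Vec<_>>()
        .join("\n")
}

/// Toy tenant-isolation check: every PRIMARY KEY column list leads with community_id.
pub fn pks_are_community_led(sql: &str) -> bool {
    sql.match_indices("PRIMARY KEY (")
        .all(|(start, pat)| sql[start + pat.len()..].trim_start().starts_with("community_id"))
}
```

Linting the concatenation, rather than 0001 alone, is what lets an additive 0002 be checked without editing 0001 and changing its checksum.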
This was referenced Jul 1, 2026
Summary
Makes the buzz relay stateless enough to run multiple pods without a ReadWriteMany (RWM) volume, unblocking HA deployment. The last local-disk state, the `.names/` repo-name registry, moves into Postgres, and the Helm chart's RWM hard-gate is relaxed (Redis stays required at `replicaCount > 1`). This lets the bb-block EFS/CSI workaround (bb-block #129, which created an EFS filesystem purely to get RWM for the git PVC) be closed: no shared git storage is needed. Git ref/object state was already object-store-backed; only name allocation lived on local disk.
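The before/after chart gate can be stated as a small predicate. It is written in Rust only to match the other examples; the real check is a Helm template in `_validate.tpl`, and the `Values` struct, its field names, the order of the checks, and the error text are hypothetical simplifications of the chart values (`replicaCount`, `persistence.git.accessMode`, and whether any Redis source is configured).

```rust
/// Hypothetical, flattened view of the chart values that drive the gate.
pub struct Values {
    pub replica_count: u32,
    pub git_access_mode: &'static str, // persistence.git.accessMode
    pub redis_configured: bool,
}

/// After this PR: more than one replica needs Redis (buzz-pubsub); the git
/// volume's access mode is no longer inspected.
pub fn validate(v: &Values) -> Result<(), String> {
    if v.replica_count > 1 && !v.redis_configured {
        return Err("replicaCount > 1 requires Redis for buzz-pubsub".to_string());
    }
    Ok(())
}

/// Before this PR: additionally hard-failed unless the git volume was ReadWriteMany.
pub fn validate_before(v: &Values) -> Result<(), String> {
    if v.replica_count > 1 && v.git_access_mode != "ReadWriteMany" {
        return Err("replicaCount > 1 requires persistence.git.accessMode=ReadWriteMany".to_string());
    }
    validate(v)
}
```

With `replicaCount=2` and ReadWriteOnce, the old rule rejects the release even when Redis is configured, while the new rule accepts it and still rejects the same values when Redis is absent.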
What changed
Registry → Postgres (`buzz-db::git_repo`, migration)

- `git_repo_names` table, PK `(community_id, repo_id)`: the DB primary key is the race guard, so concurrent same-name announces are TOCTOU-free.
- `reserve_repo_name` → `Reserved` / `AlreadyOwned` / `TakenByOther`; plus `repo_name_owner`, `count_repos_for_owner` (per-pubkey quota), `release_repo_name`.

Announce handler (`handle_git_repo_announcement`, `side_effects.rs`)

- Reworked from `create_dir`/`read_dir` onto the DB registry + object-store pointer.
- Only a row `Reserved` by this attempt is released on a pointer failure; `AlreadyOwned` never deletes another attempt's row.
- `Reserved` → strict `seed_manifest_pointer` (creates the empty pointer; a pre-existing non-empty pointer under a just-reserved name is suspicious → fail + rollback).
- `AlreadyOwned` → tolerant `ensure_manifest_pointer` (any existing pointer, empty or non-empty, is left untouched; only an absent pointer is repaired). This keeps normal re-announce-after-push accepted while never overwriting real ref state.

Helm chart RWM cleanup (~13 sites)
- `values.yaml`, `values.schema.json`, `README.md`, `NOTES.txt`, `examples/*`, and test fixtures now reflect: `replicaCount > 1` requires Redis only; git is object-store-backed, so `ReadWriteOnce` per replica is fine.

Validation
- `buzz-relay` suite 429/0; `buzz-db git_repo` live-PG 4/4; clippy clean.
- `helm unittest`: 30/30 across 7 suites, including `render_test` asserting the git PVC renders `ReadWriteOnce` at `replicaCount=3`.
- `helm template` A/B: `replicaCount=2` + RWO + Redis renders; `replicaCount=2` + RWO + no Redis still blocks ("requires Redis for buzz-pubsub").

Live clean-room git validation
Stood up a fresh relay (empty DB verified 0→37 tables, fresh object-store bucket, `AUTO_MIGRATE=true`, A3 git object-store conformance probe passed) and drove the git path end-to-end at this branch's tip:

Reviewed independently before opening.
Follow-ups (non-blocking)
- Remove the now-unused `git_repo_path` / PVC mount (kept in this PR to minimize the diff).

Related