fix(agent/sync): cover sub-graph SWM in catchup + close approve-time race #885
Merged
branarakic pushed a commit that referenced this pull request Jun 1, 2026
…boundary, graph-aware shared-memory verify, deferred-gossip e2e

Three Codex findings on PR #885:

1. sync-verify-worker (graph-blind validity sets): the pre-fix worker used flat `opsWithType` / `opsWithPublishedAt` Sets, so an op subject that appeared in two `_shared_memory_meta` graphs had its rootEntity admitted as universally valid. Refactor to per-meta-graph maps and derive allowed roots into a `dataGraph → Set<rootEntity>` index; the data quad filter then consults the same per-graph scope.

2. sync-responder boundary (URI shape was not exact): the prior `STRSTARTS(?g, "<cgPrefix>/") && STRENDS(?g, "/_shared_memory[_meta]")` filter (even with a `!CONTAINS(STRBEFORE(STRAFTER(...), suffix), "/")` segment-count tightening) could not distinguish a sub-graph "child" of cg-A from the root SWM of a different CG `cg-A/child`, because `validateContextGraphId` permits `/`. Switch to registration-based admission: admit the root SWM unconditionally and admit a sub-graph SWM only if a matching SubGraph registration exists in `<cgPrefix>/_meta` (the same triples `DKGPublisher.isSubGraphRegistered` looks for). Nested CGs put their registration in their own `_meta`, so they are naturally excluded. Pin the contract with a new `CG boundary tightening — nested CGs sharing a prefix do not leak` test.

3. e2e for the deferred SWM gossip flow: `join-approved` calls `subscribeToContextGraph(..., { deferSharedMemoryGossipSubscribe: true })` to skip the immediate SWM subscribe (the curator's allowlist hasn't synced into local `_meta` yet, so a pre-meta subscribe would emit a misleading `SWM gossip subscription denied` WARN). Add `swm-late-joiner-deferred-gossip.test.ts` with three pinning cases:
   - defer=true keeps publish/app/update/finalization wired but leaves the SWM workspace topic unsubscribed,
   - `refreshMetaSyncedFlags(cgIds)` re-queues the SWM subscribe once `_meta` lands locally,
   - default (no defer option) subscribes to SWM immediately when the ACL allows, guarding against an accidental always-defer regression.

Co-authored-by: Cursor <cursoragent@cursor.com>
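Finding 2 (the nested-CG boundary) can be illustrated with a minimal sketch. It assumes an in-memory `Map` standing in for the SubGraph registrations that live in each CG's `_meta` graph. The names `shapeAdmits`, `registrationAdmits`, `Registrations`, and the `urn:example:` URIs are hypothetical; the actual responder expresses this check in SPARQL.

```typescript
// Illustrative sketch only: the real responder does this in SPARQL.
// `Registrations` is a hypothetical stand-in for the SubGraph registration
// triples stored in each CG's own `<cgPrefix>/_meta` graph.
type Registrations = Map<string, Set<string>>; // cgPrefix -> registered sub-graph names

const SWM_SUFFIX = "/_shared_memory";

// The superseded shape-based check: prefix and suffix match only.
function shapeAdmits(graph: string, cgPrefix: string): boolean {
  return graph.startsWith(`${cgPrefix}/`) && graph.endsWith(SWM_SUFFIX);
}

// Registration-based check: the root SWM is always admitted; anything else
// must name a sub-graph registered under this exact CG.
function registrationAdmits(
  graph: string,
  cgPrefix: string,
  regs: Registrations,
): boolean {
  if (graph === `${cgPrefix}${SWM_SUFFIX}`) return true;
  if (!shapeAdmits(graph, cgPrefix)) return false;
  // Whatever sits between the CG prefix and the SWM suffix is the candidate name.
  const sub = graph.slice(cgPrefix.length + 1, graph.length - SWM_SUFFIX.length);
  const registered = regs.get(cgPrefix);
  return registered !== undefined && registered.has(sub);
}

const cgA = "urn:example:cg-A";
const nestedCg = "urn:example:cg-A/child"; // a different CG whose id contains "/"
const regs: Registrations = new Map<string, Set<string>>([
  [cgA, new Set<string>(["ai-tools"])],
  [nestedCg, new Set<string>()],
]);
const nestedRootSwm = `${nestedCg}${SWM_SUFFIX}`;
```

With these inputs, `shapeAdmits(nestedRootSwm, cgA)` is true (the leak), while `registrationAdmits(nestedRootSwm, cgA, regs)` is false because `child` is not registered as a sub-graph of `cg-A`.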
…race

A late joiner who joins a CG via the requestJoin/approveJoin flow could not see any SWM history that had been published into a sub-graph (e.g. `<cg>/ai-tools/_shared_memory`), and the very first auto-subscribe attempt always emitted a misleading `SWM gossip subscription denied: local node is not authorized` warning.

Two surgical fixes, both in the agent package:

1. `sync/responder/sync-handler.ts`: the workspace branch hardcoded its SPARQL to `contextGraphWorkspaceGraphUri(cgId)` / `...MetaGraphUri(cgId)`, both of which alias the CG-root SWM URI. Any publish that supplied a `subGraphName` lands at `<cgPrefix>/<sub>/_shared_memory[_meta]` instead, so the responder served zero bytes of any sub-graph SWM. Replaced with a shape-based FILTER (`STRSTARTS(?g, "<cgPrefix>/") && STRENDS(?g, "/_shared_memory[_meta]")`) that emits `?g` per binding; the TTL data branch binds the ops graph to the entity graph via `STR(?gMeta) = CONCAT(STR(?g), "_meta")` so per-sub-graph ops only pair with their own entities.

2. `dkg-agent.ts`: added an opt-in `deferSharedMemoryGossipSubscribe` to `subscribeToContextGraph` and threaded it through the join-approved gossip handler. Other gossip topics (publish, app, update, finalization) still wire immediately for UI feedback; only the SWM gossip subscribe is deferred. `runImmediatePostApprovalSync` pulls `_meta` (now including the curator's allowlist), then `runCatchupOverPeers` calls `refreshMetaSyncedFlags` which re-queues the SWM subscribe (clean self-heal). No spurious WARN, no missed future gossip: sync covers the historical window and gossip the live one.

Test coverage in `test/sync-responder-swm-subgraphs.test.ts` (8 new cases, 350 lines): root + every sub-graph SWM fan-out across data and meta phases, durable-graph isolation, TTL filter scoped per sub-graph, correct graph URI emitted per nquad.

Existing `sync-responder-per-cgid-meta` regression suite + `request-authorize` + `swm-snapshot-sync` + `swm-first-writer-wins-extra` + `sync-fresh-per-attempt` all still pass (38/38).

Co-authored-by: Cursor <cursoragent@cursor.com>
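The shape-based FILTER and the `_meta` pairing from fix 1 can be restated in TypeScript as a rough sketch. The function names and the `urn:example:cg` prefix are hypothetical; only the `STRSTARTS` / `STRENDS` / `CONCAT` logic is taken from the commit message, and the responder itself evaluates it in SPARQL.

```typescript
// Illustrative sketch only; function names and URIs are hypothetical.

// Mirrors STRSTARTS(?g, "<cgPrefix>/") && STRENDS(?g, "/_shared_memory").
function isSwmDataGraph(g: string, cgPrefix: string): boolean {
  return g.startsWith(`${cgPrefix}/`) && g.endsWith("/_shared_memory");
}

// Mirrors the same filter with the "/_shared_memory_meta" suffix.
function isSwmMetaGraph(g: string, cgPrefix: string): boolean {
  return g.startsWith(`${cgPrefix}/`) && g.endsWith("/_shared_memory_meta");
}

// Mirrors STR(?gMeta) = CONCAT(STR(?g), "_meta"): an ops graph only pairs
// with the entity graph it was derived from.
function opsGraphPairsWith(gMeta: string, g: string): boolean {
  return gMeta === `${g}_meta`;
}

const cgPrefix = "urn:example:cg";
const graphs: string[] = [
  `${cgPrefix}/_shared_memory`,
  `${cgPrefix}/_shared_memory_meta`,
  `${cgPrefix}/ai-tools/_shared_memory`,
  `${cgPrefix}/ai-tools/_shared_memory_meta`,
  `${cgPrefix}/_meta`, // durable top-level meta, matched by neither filter
];

const dataGraphs = graphs.filter((g) => isSwmDataGraph(g, cgPrefix));
const metaGraphs = graphs.filter((g) => isSwmMetaGraph(g, cgPrefix));
```

Here `dataGraphs` holds the root and `ai-tools` SWM graphs, `metaGraphs` holds their two `_meta` counterparts, and `opsGraphPairsWith` rejects pairing the root ops graph with the `ai-tools` entity graph.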
Standalone devnet harness that reproduces the gap that motivated the
parent fix and asserts BOTH closures end-to-end against a real local
6-node devnet:
• Curator (N5) creates curated CG with allowlist=[curator] only.
• Curator creates sub-graph "ai-tools".
• Curator publishes 5 SWM triples into the sub-graph + 3 into the
CG-root SWM (control set distinguishes "neither flowed" from
"only one flowed" in failure logs).
• Late joiner (N1, NOT pre-allowlisted) signs a join delegation,
sends `request-join`, curator approves.
• After post-approval sync settles:
▸ Late joiner SPARQLs sub-graph "ai-tools" SWM → MUST be 5.
(Pre-fix this is reliably 0 — exact gap-2 signature.)
▸ Late joiner SPARQLs CG-root SWM → MUST be 3 (baseline).
▸ Late joiner daemon log MUST NOT contain a
`SWM gossip subscription denied for "<cgId>"` line for this
specific CG. (Pre-fix this is reliably ≥1 — exact gap-1
signature.)
Script runs against the standard devnet harness (`./scripts/devnet.sh
start`); no extra setup. Verified PASS on a freshly built local devnet
this commit was developed against:
Catch-up sync ... data=144 sharedMemory=106 denied=0
late joiner sub-graph "ai-tools" SWM count: 5 (expect 5)
late joiner root SWM count: 3 (expect 3)
zero spurious 'SWM gossip subscription denied' lines
Same harness format as `devnet-test-rfc38-late-joiner.sh` so the
existing operator muscle memory carries over (timestamp-suffixed CG
ids → re-runnable; HTTP-only → no libp2p mocking; CG_ID-anchored
log greps → no false-positives across the run's other CGs).
Co-authored-by: Cursor <cursoragent@cursor.com>
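The CG_ID-anchored log check can be sketched as follows. The harness itself is a shell script, so this TypeScript version is only an illustration; `countSwmDeniedLines` and the sample log lines are made up, and only the `SWM gossip subscription denied for "<cgId>"` line shape is taken from the commit message.

```typescript
// Illustrative sketch only; the real harness greps the daemon log from shell.
function countSwmDeniedLines(daemonLog: string, cgId: string): number {
  // The closing quote is part of the needle, so "cg-17" cannot match a line
  // written for "cg-1700000000".
  const needle = `SWM gossip subscription denied for "${cgId}"`;
  return daemonLog.split("\n").filter((line) => line.includes(needle)).length;
}

// Synthetic sample: one denial for an unrelated CG from the same run.
const sampleLog = [
  'WARN SWM gossip subscription denied for "other-cg-1700000000"',
  "INFO unrelated line",
].join("\n");

const hitsForTestCg = countSwmDeniedLines(sampleLog, "late-joiner-1700000001");
const hitsForOtherCg = countSwmDeniedLines(sampleLog, "other-cg-1700000000");
```

Anchoring on the quoted, timestamp-suffixed id is what keeps a denial logged for another CG in the same run from failing the assertion for the CG under test.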
Codex review produced 2 comment(s) but all targeted lines outside the diff and were dropped. Check the workflow logs for details.
Summary
Fixes two related gaps that prevent a freshly approved late joiner from seeing a CG's SWM history when it lives in sub-graphs, and that emit a misleading `SWM gossip subscription denied: local node is not authorized` warning on every fresh approve.

Reproducer (real network, devnet):
1. Curator creates a curated CG at `<cgPrefix>` and publishes SWM into a sub-graph (e.g. `<cgPrefix>/ai-tools/_shared_memory`).
2. Late joiner sends `requestJoin`; curator `approveJoin`s it.
3. Late joiner sees a `denied` WARN in the daemon log.
4. `runImmediatePostApprovalSync` returns `data=N sharedMemory=0` even though the curator has thousands of SWM triples in `<cgPrefix>/ai-tools/_shared_memory`.

This PR ships a standalone devnet regression harness (`scripts/devnet-test-swm-late-joiner-subgraph.sh`) that mechanises the above and asserts both fixes end-to-end. The existing `devnet-test-rfc38-late-joiner.sh` scenarios pre-allowlist members and only exercise root-level SWM, so neither path was covered until now.

What changed
Gap 2: sub-graph blind spot in the sync responder
`packages/agent/src/sync/responder/sync-handler.ts`

The workspace branch hardcoded its SPARQL to `contextGraphWorkspaceGraphUri(cgId)` / `...MetaGraphUri(cgId)`, both of which alias the CG-root SWM URI:
- `<cgPrefix>/_shared_memory`
- `<cgPrefix>/_shared_memory_meta`

Any publish that supplied `subGraphName` lands at the per-sub-graph variants (`<cgPrefix>/<sub>/_shared_memory[_meta]`), so the responder's static graph filter never saw them. After this PR the responder filters by URI shape under the CG prefix and emits `?g` per binding so the requester reconstructs the correct graph at write time. The TTL data branch binds the ops graph to the entity graph via `STR(?gMeta) = CONCAT(STR(?g), "_meta")` so per-sub-graph ops only pair with their own entities (no bleeding across sub-graphs or into root).

URI shape filter is exact: `validateSubGraphName` (`dkg-core/constants.ts`) prohibits `/` and leading `_`, so the only graphs matching `<cgPrefix>/[<sub>/]_shared_memory[_meta]` are the SWM graphs we want. `_shared_memory_meta` does not match `STRENDS(?g, "/_shared_memory")` (the URI ends in `_meta`, not in `/_shared_memory`), so the data and meta phases stay cleanly partitioned.

This same fix also closes Gap 3, the perceived "pre-join SWM is undecryptable" issue. Forward secrecy protects the gossip ciphertext from non-members; it does not prevent the curator (who holds plaintext in its local SWM store) from shipping that plaintext over the authenticated `/dkg/10.0.1/sync` link to a peer who is now allowlisted. That's exactly what bilateral sync was designed for. The pre-existing `authorizePrivateSyncRequest` gate (envelope freshness + replay protection + signer recovery + identity verification + participant/agent-gate/peer/delegation allowlist + `refreshMetaFromCurator` on first miss) ensures only authorized peers receive plaintext.

Gap 1: approve-time SWM gossip auth race
`packages/agent/src/dkg-agent.ts`

On a fresh `join-approved` notification, the curator has just written the allowlist into its own `_meta`, but the requesting node hasn't synced that allowlist yet. The pre-fix handler called `subscribeToContextGraph(cgId)` synchronously, which queued an SWM gossip subscribe whose `canReadContextGraph` check ran against an empty local `_meta`. The check denied the subscribe with the misleading `local node is not authorized` warning before `refreshMetaSyncedFlags` self-healed it seconds later.

Added opt-in `deferSharedMemoryGossipSubscribe: boolean` to `subscribeToContextGraph` and wired it through the join-approved handler. Other gossip topics (publish, app, update, finalization) wire up immediately as before, so UI feedback is unchanged. Only the SWM gossip subscribe is deferred until `_meta` is locally visible.

The self-heal path already exists: `runImmediatePostApprovalSync` calls `runCatchupOverPeers`, which awaits `refreshMetaSyncedFlags([cgId])` (line 3580), which calls `queueSharedMemoryGossipSubscription` once `hasConfirmedMetaState` is true (line 3738). It is robust against post-approval sync failure too: `refreshMetaSyncedFlags` is also called from `sync-on-connect.ts:85` on every new peer connection and from the periodic catchup reconciler.

Test coverage
Unit (`packages/agent/test/sync-responder-swm-subgraphs.test.ts`): new, 350 lines, 8 cases, modelled on the existing `sync-responder-per-cgid-meta.test.ts` regression style. Covers:
- `phase=data`: returns root + every sub-graph SWM in one response, excludes SWM meta graphs, excludes durable-tier graphs.
- `phase=meta`: returns root + every sub-graph SWM meta, excludes SWM data graphs, excludes durable top-level `_meta`.
- `phase=data` with TTL: keeps fresh root + sub entities, drops stale ones, scoped per sub-graph (verifies the `?gMeta = ?g + "_meta"` binding).

Devnet (`scripts/devnet-test-swm-late-joiner-subgraph.sh`): new, 315 lines. Exercises the full `requestJoin → approveJoin → catchup → assert both root + sub-graph SWM visible` flow against `./scripts/devnet.sh start`. Curator (N5) creates CG with allowlist=[curator] only, creates sub-graph `ai-tools`, publishes 5 SWM triples into the sub-graph + 3 into root. Late joiner (N1, NOT pre-allowlisted) signs a delegation, request-joins, gets approved, then SPARQLs both layers. Two anchored assertions:
- Sub-graph `ai-tools` SWM count == 5 (gap 2: pre-fix this is reliably 0).
- Grep for `SWM gossip subscription denied for "<cgId>"` returns 0 hits in the late joiner's daemon log (gap 1: pre-fix this is reliably ≥1).

Test plan
- `pnpm -F @origintrail-official/dkg-agent exec vitest run --no-coverage test/sync-responder-swm-subgraphs.test.ts test/sync-responder-per-cgid-meta.test.ts test/sync-fresh-per-attempt.test.ts test/request-authorize.test.ts test/swm-snapshot-sync.test.ts test/swm-first-writer-wins-extra.test.ts`: 38/38 pass.
- `./scripts/devnet.sh start && ./scripts/devnet-test-swm-late-joiner-subgraph.sh`: PASS on local 6-node devnet (curator catchup line: `Catch-up sync ... data=144 sharedMemory=106 denied=0`; SPARQL `sub-graph=5, root=3`; zero gap-1 WARNs for the test CG).
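
For reviewers who want a mental model of the Gap 1 race and its deferral, here is a toy sketch. The method names `subscribeToContextGraph` and `refreshMetaSyncedFlags` and the option `deferSharedMemoryGossipSubscribe` come from this PR, but the class, the `markMetaSynced` helper, the string-keyed topic set, and every method body are hypothetical simplifications, not the agent's implementation.

```typescript
// Toy model only: bodies and data structures are hypothetical simplifications.
interface SubscribeOptions {
  deferSharedMemoryGossipSubscribe?: boolean;
}

class GossipModel {
  readonly subscribed = new Set<string>();
  readonly warnings: string[] = [];
  private readonly pendingSwm = new Set<string>();
  private readonly metaSynced = new Set<string>();

  subscribeToContextGraph(cgId: string, opts: SubscribeOptions = {}): void {
    // Non-SWM topics always wire immediately.
    for (const topic of ["publish", "app", "update", "finalization"]) {
      this.subscribed.add(`${cgId}/${topic}`);
    }
    this.pendingSwm.add(cgId);
    // Without the defer option the SWM subscribe is attempted right away,
    // which is where the pre-fix WARN came from.
    if (!opts.deferSharedMemoryGossipSubscribe) this.trySubscribeSwm(cgId, true);
  }

  // Hypothetical hook standing in for catch-up sync landing `_meta` locally.
  markMetaSynced(cgId: string): void {
    this.metaSynced.add(cgId);
  }

  // Re-attempts any pending SWM subscribe; silent if `_meta` is still missing.
  refreshMetaSyncedFlags(cgIds: string[]): void {
    for (const cgId of cgIds) {
      if (this.pendingSwm.has(cgId)) this.trySubscribeSwm(cgId, false);
    }
  }

  private trySubscribeSwm(cgId: string, warnOnDeny: boolean): void {
    if (!this.metaSynced.has(cgId)) {
      if (warnOnDeny) {
        this.warnings.push(`SWM gossip subscription denied for "${cgId}"`);
      }
      return;
    }
    this.subscribed.add(`${cgId}/swm`);
    this.pendingSwm.delete(cgId);
  }
}
```

In this model, calling `subscribeToContextGraph("cg-1")` before `_meta` has synced records one warning, whereas passing `{ deferSharedMemoryGossipSubscribe: true }` records none and the SWM topic is picked up by the next `refreshMetaSyncedFlags(["cg-1"])` after `markMetaSynced("cg-1")`.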