fix(agent/sync): cover sub-graph SWM in catchup + close approve-time race #885

Merged: branarakic merged 4 commits into main from fix/swm-subgraph-late-joiner-backfill on Jun 1, 2026

@branarakic branarakic commented Jun 1, 2026

Summary

Fixes two related gaps: one prevents a freshly approved late joiner from seeing a CG's SWM history when that history lives in sub-graphs, and the other emits a misleading "SWM gossip subscription denied: local node is not authorized" warning on every fresh approve.

Reproducer (real network, devnet):

  1. Curator creates CG <cgPrefix> and publishes SWM into a sub-graph (e.g. <cgPrefix>/ai-tools/_shared_memory).
  2. New node sends a signed requestJoin; curator approveJoins it.
  3. Late joiner observes the denied WARN in the daemon log.
  4. runImmediatePostApprovalSync returns data=N sharedMemory=0 even though the curator has thousands of SWM triples in <cgPrefix>/ai-tools/_shared_memory.

This PR ships a standalone devnet regression harness (scripts/devnet-test-swm-late-joiner-subgraph.sh) that mechanises the above and asserts both fixes end-to-end. The existing devnet-test-rfc38-late-joiner.sh scenarios pre-allowlist members and only exercise root-level SWM, so neither path was covered until now.

What changed

Gap 2 — sub-graph blind spot in the sync responder

packages/agent/src/sync/responder/sync-handler.ts

The workspace branch hardcoded its SPARQL to contextGraphWorkspaceGraphUri(cgId) / ...MetaGraphUri(cgId), both of which alias the CG-root SWM URI:

  • <cgPrefix>/_shared_memory
  • <cgPrefix>/_shared_memory_meta

Any publish that supplied subGraphName lands at the per-sub-graph variants (<cgPrefix>/<sub>/_shared_memory[_meta]), so the responder's static graph filter never saw them. After this PR the responder filters by URI shape under the CG prefix and emits ?g per binding so the requester reconstructs the correct graph at write time. The TTL data branch binds the ops graph to the entity graph via STR(?gMeta) = CONCAT(STR(?g), "_meta") so per-sub-graph ops only pair with their own entities (no bleeding across sub-graphs or into root).

URI shape filter is exact: validateSubGraphName (dkg-core/constants.ts) prohibits / and leading _, so the only graphs matching <cgPrefix>/[<sub>/]_shared_memory[_meta] are the SWM graphs we want. _shared_memory_meta does not match STRENDS(?g, "/_shared_memory") (length mismatch), so the data and meta phases stay cleanly partitioned.
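
The shape rule described above can be sketched in TypeScript as follows. This is a minimal illustration, not the responder's code: `classifySwmGraph` and `isValidSubGraphName` are hypothetical names, and only the URI shapes and the two sub-graph name rules (no `/`, no leading `_`) are taken from the description. It also assumes the CG prefix itself contains no nested CG, a case the follow-up commit on this PR handles separately.

```typescript
// Hypothetical helpers; only the URI shapes and name rules come from the PR text.
type SwmGraphKind = "data" | "meta" | null;

const DATA_SUFFIX = "/_shared_memory";
const META_SUFFIX = "/_shared_memory_meta";

// Mirrors the two rules the description attributes to validateSubGraphName.
function isValidSubGraphName(name: string): boolean {
  return name.length > 0 && !name.includes("/") && !name.startsWith("_");
}

// Classifies a graph URI as SWM data, SWM meta, or neither, relative to one CG prefix.
function classifySwmGraph(cgPrefix: string, graphUri: string): SwmGraphKind {
  if (!graphUri.startsWith(cgPrefix + "/")) return null;
  const rest = graphUri.slice(cgPrefix.length); // e.g. "/ai-tools/_shared_memory"
  const candidates = [[META_SUFFIX, "meta"], [DATA_SUFFIX, "data"]] as const;
  for (const [suffix, kind] of candidates) {
    if (!rest.endsWith(suffix)) continue;
    const middle = rest.slice(0, rest.length - suffix.length); // "" or "/<sub>"
    if (middle === "") return kind; // CG-root SWM graph
    if (middle.startsWith("/") && isValidSubGraphName(middle.slice(1))) return kind;
    return null;
  }
  return null;
}
```

Because a URI ending in `/_shared_memory_meta` never ends in `/_shared_memory`, the two suffix checks cannot both match, which is the partitioning property the paragraph above relies on.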

This same fix also closes Gap 3 — the perceived "pre-join SWM is undecryptable" issue. Forward secrecy protects the gossip ciphertext from non-members; it does not prevent the curator (who holds plaintext in its local SWM store) from shipping that plaintext over the authenticated /dkg/10.0.1/sync link to a peer who is now allowlisted. That's exactly what bilateral sync was designed for. The pre-existing authorizePrivateSyncRequest gate (envelope freshness + replay protection + signer recovery + identity verification + participant/agent-gate/peer/delegation allowlist + refreshMetaFromCurator on first miss) ensures only authorized peers receive plaintext.

Gap 1 — approve-time SWM gossip auth race

packages/agent/src/dkg-agent.ts

On a fresh join-approved notification, the curator has just written the allowlist into ITS _meta, but the requesting node hasn't synced that allowlist yet. The pre-fix handler called subscribeToContextGraph(cgId) synchronously, which queued an SWM gossip subscribe whose canReadContextGraph check ran against an empty local _meta, so the subscribe was denied with the misleading "local node is not authorized" warning before refreshMetaSyncedFlags self-healed it seconds later.

Added opt-in deferSharedMemoryGossipSubscribe: boolean to subscribeToContextGraph and wired it through the join-approved handler. Other gossip topics (publish, app, update, finalization) wire up immediately as before — UI feedback unchanged. Only SWM gossip subscribe is deferred until _meta is locally visible.

The self-heal path already exists: runImmediatePostApprovalSync calls runCatchupOverPeers which awaits refreshMetaSyncedFlags([cgId]) (line 3580), which calls queueSharedMemoryGossipSubscription once hasConfirmedMetaState is true (line 3738). Robust against post-approval sync failure too: refreshMetaSyncedFlags is also called from sync-on-connect.ts:85 on every new peer connection and from the periodic catchup reconciler.
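
The defer-then-self-heal sequence can be modelled with a toy class. This is a sketch under stated assumptions: `ToyAgent`, `receiveMeta`, and the internal sets are hypothetical, the warning text is abbreviated, and only `subscribeToContextGraph`, `deferSharedMemoryGossipSubscribe`, `refreshMetaSyncedFlags`, and `queueSharedMemoryGossipSubscription` are names quoted from the PR. The real methods have different signatures and are asynchronous.

```typescript
// Toy model of the approve-time race; not the dkg-agent implementation.
interface SubscribeOptions {
  deferSharedMemoryGossipSubscribe?: boolean;
}

class ToyAgent {
  readonly subscribed = new Set<string>(); // entries look like "<cgId>:<topic>"
  readonly warnings: string[] = [];
  // CGs whose allowlist is already visible in the local _meta (hypothetical field).
  private readonly metaVisible = new Set<string>();

  subscribeToContextGraph(cgId: string, opts: SubscribeOptions = {}): void {
    // Non-SWM topics are always wired immediately.
    for (const topic of ["publish", "app", "update", "finalization"] as const) {
      this.subscribed.add(`${cgId}:${topic}`);
    }
    if (!opts.deferSharedMemoryGossipSubscribe) {
      this.queueSharedMemoryGossipSubscription(cgId);
    }
  }

  // Stand-in for the curator's _meta arriving through sync.
  receiveMeta(cgId: string): void {
    this.metaVisible.add(cgId);
  }

  // Re-queues the SWM subscribe only for CGs whose _meta is locally visible.
  refreshMetaSyncedFlags(cgIds: string[]): void {
    for (const cgId of cgIds) {
      if (this.metaVisible.has(cgId)) this.queueSharedMemoryGossipSubscription(cgId);
    }
  }

  private queueSharedMemoryGossipSubscription(cgId: string): void {
    if (!this.metaVisible.has(cgId)) {
      this.warnings.push(`SWM gossip subscription denied for "${cgId}"`);
      return;
    }
    this.subscribed.add(`${cgId}:swm`);
  }
}

// Pre-fix behaviour: the eager subscribe runs before _meta is visible and warns.
const eager = new ToyAgent();
eager.subscribeToContextGraph("cg-1");

// Post-fix behaviour: defer, let sync deliver _meta, then self-heal.
const deferred = new ToyAgent();
deferred.subscribeToContextGraph("cg-1", { deferSharedMemoryGossipSubscribe: true });
deferred.receiveMeta("cg-1");
deferred.refreshMetaSyncedFlags(["cg-1"]);
```

In this model the eager agent records one warning and never joins the SWM topic until a later refresh, while the deferred agent ends with the SWM topic subscribed and no warning.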

Test coverage

Unit (packages/agent/test/sync-responder-swm-subgraphs.test.ts) — new, 350 lines, 8 cases — modelled on the existing sync-responder-per-cgid-meta.test.ts regression style. Covers:

  • phase=data: returns root + every sub-graph SWM in one response, excludes SWM meta graphs, excludes durable-tier graphs.
  • phase=meta: returns root + every sub-graph SWM meta, excludes SWM data graphs, excludes durable top-level _meta.
  • phase=data with TTL: keeps fresh root + sub entities, drops stale ones, scoped per sub-graph (verifies the ?gMeta = ?g + "_meta" binding).
  • Per-line nquad serialization correctly emits the source graph URI (no longer the static root alias).

Devnet (scripts/devnet-test-swm-late-joiner-subgraph.sh) — new, 315 lines — exercises the full requestJoin → approveJoin → catchup → assert both root + sub-graph SWM visible flow against ./scripts/devnet.sh start. Curator (N5) creates CG with allowlist=[curator] only, creates sub-graph ai-tools, publishes 5 SWM triples into the sub-graph + 3 into root. Late joiner (N1, NOT pre-allowlisted) signs a delegation, request-joins, gets approved, then SPARQLs both layers. Two anchored assertions:

  • Sub-graph ai-tools SWM count == 5 (gap 2 — pre-fix this is reliably 0).
  • CG_ID-anchored grep for SWM gossip subscription denied for "<cgId>" returns 0 hits in the late joiner's daemon log (gap 1 — pre-fix this is reliably ≥1).
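
The CG_ID-anchored log assertion can be expressed as a small function. The harness itself is a shell script; this TypeScript sketch only illustrates the anchoring idea. `countDeniedLines` is a hypothetical helper, and the sample log lines are made up for illustration rather than captured from a daemon.

```typescript
// Counts log lines that report a denied SWM gossip subscribe for one specific CG.
// Including the closing quote in the needle keeps "cg" from matching "cg-1".
function countDeniedLines(log: string, cgId: string): number {
  const needle = `SWM gossip subscription denied for "${cgId}"`;
  return log.split("\n").filter((line) => line.includes(needle)).length;
}

// Illustrative log content only (not real daemon output).
const sampleLog = [
  'WARN SWM gossip subscription denied for "other-cg-1"',
  "INFO Catch-up sync ... data=144 sharedMemory=106 denied=0",
].join("\n");

const hitsForTestCg = countDeniedLines(sampleLog, "test-cg-1");
const hitsForOtherCg = countDeniedLines(sampleLog, "other-cg-1");
```

Anchoring on the quoted CG id is what lets the check ignore denied lines that belong to other CGs created during the same devnet run.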

Test plan

  • pnpm -F @origintrail-official/dkg-agent exec vitest run --no-coverage test/sync-responder-swm-subgraphs.test.ts test/sync-responder-per-cgid-meta.test.ts test/sync-fresh-per-attempt.test.ts test/request-authorize.test.ts test/swm-snapshot-sync.test.ts test/swm-first-writer-wins-extra.test.ts — 38/38 pass
  • ./scripts/devnet.sh start && ./scripts/devnet-test-swm-late-joiner-subgraph.sh — PASS on local 6-node devnet (curator catchup line: Catch-up sync ... data=144 sharedMemory=106 denied=0; SPARQL sub-graph=5, root=3; zero gap-1 WARNs for the test CG).
  • Hardhat-backed e2e suite runs green in CI

Comment thread packages/agent/src/sync/responder/sync-handler.ts
Comment thread packages/agent/src/dkg-agent.ts
Comment thread packages/agent/src/sync/responder/sync-handler.ts Outdated
branarakic pushed a commit that referenced this pull request Jun 1, 2026
…boundary, graph-aware shared-memory verify, deferred-gossip e2e

Three Codex findings on PR #885:

1. sync-verify-worker (graph-blind validity sets): the pre-fix worker
   used flat `opsWithType` / `opsWithPublishedAt` Sets, so an op
   subject that appeared in two `_shared_memory_meta` graphs had its
   rootEntity admitted as universally valid. Refactor to per-meta-graph
   maps and derive allowed roots into a `dataGraph → Set<rootEntity>`
   index; the data quad filter then consults the same per-graph scope.

2. sync-responder boundary (URI shape was not exact): the prior
   `STRSTARTS(?g, "<cgPrefix>/") && STRENDS(?g, "/_shared_memory[_meta]")`
   filter — even with a `!CONTAINS(STRBEFORE(STRAFTER(...), suffix),
   "/")` segment-count tightening — could not distinguish a sub-graph
   "child" of cg-A from the root SWM of a different CG `cg-A/child`,
   because `validateContextGraphId` permits `/`. Switch to
   registration-based admission: admit the root SWM unconditionally
   and admit a sub-graph SWM only if a matching SubGraph registration
   exists in `<cgPrefix>/_meta` (the same triples
   `DKGPublisher.isSubGraphRegistered` looks for). Nested CGs put their
   registration in their own `_meta`, so they are naturally excluded.
   Pin the contract with a new `CG boundary tightening — nested CGs
   sharing a prefix do not leak` test.

3. e2e for the deferred SWM gossip flow: `join-approved` calls
   `subscribeToContextGraph(..., { deferSharedMemoryGossipSubscribe:
   true })` to skip the immediate SWM subscribe (the curator's
   allowlist hasn't synced into local `_meta` yet, so a pre-meta
   subscribe would emit a misleading `SWM gossip subscription denied`
   WARN). Add `swm-late-joiner-deferred-gossip.test.ts` with three
   pinning cases:
     - defer=true keeps publish/app/update/finalization wired but
       leaves the SWM workspace topic unsubscribed,
     - `refreshMetaSyncedFlags(cgIds)` re-queues the SWM subscribe
       once `_meta` lands locally,
     - default (no defer option) subscribes to SWM immediately when
       the ACL allows, guarding against an accidental
       always-defer regression.

Co-authored-by: Cursor <cursoragent@cursor.com>
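
Finding 1 in the commit message above (graph-blind validity sets) can be illustrated with a small sketch. It assumes a simplified fact model: `MetaFact`, `GraphScope`, `buildAllowedRoots`, and `isQuadAllowed` are hypothetical, and only the set names `opsWithType` / `opsWithPublishedAt` and the `dataGraph → Set<rootEntity>` index shape come from the commit message.

```typescript
// Simplified stand-in for quads found in `_shared_memory_meta` graphs.
type MetaFact =
  | { metaGraph: string; op: string; kind: "type" }
  | { metaGraph: string; op: string; kind: "publishedAt" }
  | { metaGraph: string; op: string; kind: "rootEntity"; root: string };

interface GraphScope {
  opsWithType: Set<string>;
  opsWithPublishedAt: Set<string>;
  rootsByOp: Map<string, string[]>;
}

// Builds the dataGraph -> allowed rootEntity index, scoping every check per meta graph.
function buildAllowedRoots(facts: MetaFact[]): Map<string, Set<string>> {
  const scopes = new Map<string, GraphScope>();
  for (const f of facts) {
    let s = scopes.get(f.metaGraph);
    if (!s) {
      s = { opsWithType: new Set(), opsWithPublishedAt: new Set(), rootsByOp: new Map() };
      scopes.set(f.metaGraph, s);
    }
    if (f.kind === "type") s.opsWithType.add(f.op);
    else if (f.kind === "publishedAt") s.opsWithPublishedAt.add(f.op);
    else s.rootsByOp.set(f.op, (s.rootsByOp.get(f.op) ?? []).concat(f.root));
  }
  const allowed = new Map<string, Set<string>>();
  scopes.forEach((s, metaGraph) => {
    const dataGraph = metaGraph.replace(/_meta$/, ""); // pair each meta graph with its own data graph
    const roots = new Set<string>();
    s.rootsByOp.forEach((opRoots, op) => {
      // An op counts only if both facts exist in the SAME meta graph.
      if (s.opsWithType.has(op) && s.opsWithPublishedAt.has(op)) {
        opRoots.forEach((r) => roots.add(r));
      }
    });
    allowed.set(dataGraph, roots);
  });
  return allowed;
}

function isQuadAllowed(index: Map<string, Set<string>>, dataGraph: string, root: string): boolean {
  return index.get(dataGraph)?.has(root) ?? false;
}

// op1 is split across two meta graphs; op2 is complete in the root meta graph.
const facts: MetaFact[] = [
  { metaGraph: "cg/_shared_memory_meta", op: "op1", kind: "type" },
  { metaGraph: "cg/ai-tools/_shared_memory_meta", op: "op1", kind: "publishedAt" },
  { metaGraph: "cg/_shared_memory_meta", op: "op1", kind: "rootEntity", root: "urn:e1" },
  { metaGraph: "cg/_shared_memory_meta", op: "op2", kind: "type" },
  { metaGraph: "cg/_shared_memory_meta", op: "op2", kind: "publishedAt" },
  { metaGraph: "cg/_shared_memory_meta", op: "op2", kind: "rootEntity", root: "urn:e2" },
];
const allowedRoots = buildAllowedRoots(facts);
```

With flat sets, `op1` would look complete because its type and timestamp both exist somewhere; with per-graph scopes it is rejected, and `urn:e2` is valid only in the root data graph.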
Comment thread packages/agent/src/sync/responder/sync-handler.ts
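
The registration-based admission from finding 2 of the commit message above can be sketched as follows. This is an in-memory illustration only: `Registrations` and `admitsSwmGraph` are hypothetical, the map stands in for the SubGraph registration triples in each CG's `_meta`, and only the data suffix is handled (the meta suffix would follow the same pattern).

```typescript
// Key: CG prefix. Value: sub-graph names registered in that CG's own _meta.
type Registrations = Map<string, Set<string>>;

const SWM_SUFFIX = "/_shared_memory";

// Admits the root SWM graph unconditionally and a sub-graph SWM graph only
// when the sub-graph name is registered under this CG.
function admitsSwmGraph(cgPrefix: string, graphUri: string, regs: Registrations): boolean {
  if (graphUri === cgPrefix + SWM_SUFFIX) return true;
  if (!graphUri.startsWith(cgPrefix + "/") || !graphUri.endsWith(SWM_SUFFIX)) return false;
  const sub = graphUri.slice(cgPrefix.length + 1, graphUri.length - SWM_SUFFIX.length);
  return regs.get(cgPrefix)?.has(sub) ?? false;
}

// "cg-A" registers sub-graph "tools"; "cg-A/child" is a separate CG with no sub-graphs.
const regs: Registrations = new Map<string, Set<string>>([
  ["cg-A", new Set<string>(["tools"])],
  ["cg-A/child", new Set<string>()],
]);

const rootAdmitted = admitsSwmGraph("cg-A", "cg-A/_shared_memory", regs);
const toolsAdmitted = admitsSwmGraph("cg-A", "cg-A/tools/_shared_memory", regs);
const nestedCgLeaks = admitsSwmGraph("cg-A", "cg-A/child/_shared_memory", regs);
const nestedCgOwnRoot = admitsSwmGraph("cg-A/child", "cg-A/child/_shared_memory", regs);
```

The graph `cg-A/child/_shared_memory` has the same shape as a sub-graph of `cg-A`, so a pure shape filter cannot reject it; the registration lookup does, because `child` was never registered under `cg-A`.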
Comment thread packages/agent/src/sync/responder/sync-handler.ts
Branimir Rakic and others added 3 commits June 1, 2026 22:20
…race

A late joiner who joins a CG via the requestJoin/approveJoin flow could
not see any SWM history that had been published into a sub-graph (e.g.
`<cg>/ai-tools/_shared_memory`), and the very first auto-subscribe
attempt always emitted a misleading `SWM gossip subscription denied:
local node is not authorized` warning.

Two surgical fixes, both in the agent package:

1. `sync/responder/sync-handler.ts` — the workspace branch hardcoded its
   SPARQL to `contextGraphWorkspaceGraphUri(cgId)` /
   `...MetaGraphUri(cgId)`, both of which alias the CG-root SWM URI.
   Any publish that supplied a `subGraphName` lands at
   `<cgPrefix>/<sub>/_shared_memory[_meta]` instead, so the responder
   served zero bytes of any sub-graph SWM. Replaced with a shape-based
   FILTER (`STRSTARTS(?g, "<cgPrefix>/") && STRENDS(?g, "/_shared_memory[_meta]")`)
   that emits `?g` per binding; the TTL data branch binds the ops
   graph to the entity graph via `STR(?gMeta) = CONCAT(STR(?g),
   "_meta")` so per-sub-graph ops only pair with their own entities.

2. `dkg-agent.ts` — added an opt-in `deferSharedMemoryGossipSubscribe`
   to `subscribeToContextGraph` and threaded it through the
   join-approved gossip handler. Other gossip topics (publish, app,
   update, finalization) still wire immediately for UI feedback; only
   the SWM gossip subscribe is deferred. `runImmediatePostApprovalSync`
   pulls `_meta` (now including the curator's allowlist), then
   `runCatchupOverPeers` calls `refreshMetaSyncedFlags` which re-queues
   the SWM subscribe (clean self-heal). No spurious WARN, no missed
   future gossip — sync covers the historical window and gossip the
   live one.

Test coverage in `test/sync-responder-swm-subgraphs.test.ts` (8 new
cases, 350 lines): root + every sub-graph SWM fan-out across data and
meta phases, durable-graph isolation, TTL filter scoped per sub-graph,
correct graph URI emitted per nquad. Existing
`sync-responder-per-cgid-meta` regression suite + `request-authorize`
+ `swm-snapshot-sync` + `swm-first-writer-wins-extra` +
`sync-fresh-per-attempt` all still pass (38/38).

Co-authored-by: Cursor <cursoragent@cursor.com>
Standalone devnet harness that reproduces the gap that motivated the
parent fix and asserts BOTH closures end-to-end against a real local
6-node devnet:

  • Curator (N5) creates curated CG with allowlist=[curator] only.
  • Curator creates sub-graph "ai-tools".
  • Curator publishes 5 SWM triples into the sub-graph + 3 into the
    CG-root SWM (control set distinguishes "neither flowed" from
    "only one flowed" in failure logs).
  • Late joiner (N1, NOT pre-allowlisted) signs a join delegation,
    sends `request-join`, curator approves.
  • After post-approval sync settles:
      ▸ Late joiner SPARQLs sub-graph "ai-tools" SWM → MUST be 5.
        (Pre-fix this is reliably 0 — exact gap-2 signature.)
      ▸ Late joiner SPARQLs CG-root SWM → MUST be 3 (baseline).
      ▸ Late joiner daemon log MUST NOT contain a
        `SWM gossip subscription denied for "<cgId>"` line for this
        specific CG. (Pre-fix this is reliably ≥1 — exact gap-1
        signature.)

Script runs against the standard devnet harness (`./scripts/devnet.sh
start`); no extra setup. Verified PASS on the freshly built local devnet
that this commit was developed against:

    Catch-up sync ... data=144 sharedMemory=106 denied=0
    late joiner sub-graph "ai-tools" SWM count: 5 (expect 5)
    late joiner root SWM count: 3 (expect 3)
    zero spurious 'SWM gossip subscription denied' lines

Same harness format as `devnet-test-rfc38-late-joiner.sh` so the
existing operator muscle memory carries over (timestamp-suffixed CG
ids → re-runnable; HTTP-only → no libp2p mocking; CG_ID-anchored
log greps → no false-positives across the run's other CGs).

Co-authored-by: Cursor <cursoragent@cursor.com>
@branarakic branarakic force-pushed the fix/swm-subgraph-late-joiner-backfill branch from 2706ef5 to 07462ce Compare June 1, 2026 20:21
Comment thread packages/agent/src/sync/responder/sync-handler.ts

@github-actions github-actions Bot left a comment

Codex review produced 2 comment(s) but all targeted lines outside the diff and were dropped. Check the workflow logs for details.

@branarakic branarakic merged commit 694d74f into main Jun 1, 2026
40 checks passed
@branarakic branarakic mentioned this pull request Jun 2, 2026
3 tasks