OT-RFC-38 LU-6 B1 — signed swm-host-catchup requests#618
Closes the metadata-leak vector Codex flagged on PR #610 round-2 #6: the previous host-catchup wire let any peer that knew or guessed a `contextGraphId` pull stored envelopes from a host-only core. Local `allowedPeers` only mitigated the member-side case; host-only cores have no local allowlist and fell through to unauthenticated serving.

Wire change (v1 → v2, hard cutover; no on-the-wire legacy to preserve since LU-6 hasn't shipped):

* `SwmHostCatchupRequest` now carries `requesterEoa`, `issuedAtMs`, `nonce`, and `sig` (EIP-191 personal-sign over a 228-byte packed digest binding version + keccak256(cgId) + sinceSeqno + caps + requester + timestamp + nonce).
* new `swm/host-catchup-sign.ts`: digest layout, mint + verify helpers, and a per-responder `CatchupReplayGuard` (sliding LRU, nonces age out with the freshness window).
* `host-catchup-wire.ts` upgraded: encoder/decoder reject any wire that omits or malforms the new auth fields, including v1 bodies.

Responder authorization (`authorizeSwmHostCatchupRequest`, layered):

1. signature + freshness (5 min, matches SWM envelope window)
2. replay-nonce uniqueness
3. chain-anchored: requesterEoa ∈ participantAgents (definitive)
4. pre-registration fallback: requesterEoa == beacon-pinned curator
5. member-side allowedPeers (transport-layer ACL, defence in depth)
6. otherwise DENY (closes the previous fail-open behaviour)

Requester (`catchupSwmFromHost`) mints a signed request per round via `chain.signMessage`; fails closed when no chain signer is wired.

Tests (added to `vitest.unit.config.ts` allowlist):

* 14 in `host-catchup-sign.test.ts`: mint/verify roundtrip, tampered fields, freshness boundary, digest binding, replay-guard LRU.
* 6 new cases in `host-catchup-wire.test.ts`: signed wire encode/decode, malformed sig/nonce, hard v1 rejection, digest determinism.

Co-authored-by: Cursor <cursoragent@cursor.com>
// 3. chain-anchored authority
let chainParticipants: string[] | null = null;
try {
  chainParticipants = await this.resolveOnChainParticipantAgents(req.contextGraphId);
🔴 Bug: this auth path only checks the on-chain participant set / beacon curator / peer allowlist, but curated sync in this repo also allows DKG_ALLOWED_AGENT entries and delegatee op keys (inviteAgentToContextGraph writes them into _meta). A node that joins via delegation can produce a valid signed catchup request and still be denied here because the recovered signer is not a participantAgent. Reuse the existing sync-auth resolution here, or add allowed-agent + delegatee checks before denying.
// OT-RFC-38 LU-6 B1: every catchup request is signed by the
// requesting agent's chain EOA so the host can authenticate via
// on-chain participant set without trusting the libp2p peer-id.
const requesterEoa = await this.getRegistrationTxSignerAddress();
🔴 Bug: requesterEoa is claimed as the registration tx signer, but the signature is produced by chain.signMessage(). The helper comment on getRegistrationTxSignerAddress() explicitly says those identities can differ; when they do, verifySignedCatchupRequest() will reject every request with a signer mismatch and host catchup stops working. Populate requesterEoa from the actual message-signing principal (or sign with the same key you advertise here).
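One way to act on the second suggestion (advertise whichever address actually signed) is to recover the signer from a throwaway signature before building the final digest. The sketch below is illustrative only: `SigningBackend`, `MintInput`, `mintBoundRequest`, and `ZERO_ADDRESS` are hypothetical names, the backend is synchronous for brevity (the real `chain.signMessage()` is asynchronous), and signature recovery is assumed to be supplied by the caller rather than by any specific library.

```typescript
// Hypothetical shapes for illustration; the real helper's types are not shown in this thread.
type Hex = string;

interface SigningBackend {
  // Stand-in for chain.signMessage(): signs a digest, returns an opaque signature.
  sign(digest: Hex): Hex;
  // Stand-in for EIP-191 recovery: returns the address that produced `sig` over `digest`.
  recover(digest: Hex, sig: Hex): Hex;
}

interface MintInput {
  // Builds the request digest bound to a given requester address.
  buildDigest(requesterEoa: Hex): Hex;
  // Optional claimed signer; omit to let the helper discover it.
  requesterEoa?: Hex;
}

// Placeholder address used only for the local probe signature (an assumption of this sketch).
const ZERO_ADDRESS: Hex = '0x' + '00'.repeat(20);

function mintBoundRequest(
  backend: SigningBackend,
  input: MintInput,
): { requesterEoa: Hex; sig: Hex } {
  if (input.requesterEoa !== undefined) {
    // Claimed mode: sign, then verify that recovery matches the claim. Fail closed on mismatch.
    const digest = input.buildDigest(input.requesterEoa);
    const sig = backend.sign(digest);
    const recovered = backend.recover(digest, sig);
    if (recovered.toLowerCase() !== input.requesterEoa.toLowerCase()) {
      throw new Error(`signer mismatch: claimed ${input.requesterEoa}, recovered ${recovered}`);
    }
    return { requesterEoa: input.requesterEoa, sig };
  }

  // Discovery mode: sign a placeholder digest to learn which key the backend uses...
  const probeDigest = input.buildDigest(ZERO_ADDRESS);
  const discovered = backend.recover(probeDigest, backend.sign(probeDigest));

  // ...then sign the final digest bound to that address. Only this signature leaves the node.
  const finalDigest = input.buildDigest(discovered);
  return { requesterEoa: discovered, sig: backend.sign(finalDigest) };
}
```

The cost of discovery mode in this sketch is one extra local signature per request; nothing extra goes over the wire.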
…+ UNION authority

Addresses two Codex bugs flagged on PR #618:

1. requesterEoa <-> signer mismatch (dkg-agent.ts:9772, host-catchup-sign.ts)

   `catchupSwmFromHost` advertised `requesterEoa = getRegistrationTxSignerAddress()` while signing the digest via `chain.signMessage()`. The helper's own comment says those identities CAN differ; when they do, every catchup request gets rejected with "signer mismatch" and host catchup is fully broken.

   Fix: `mintSignedCatchupRequest` now recovers the signer from the signature itself and binds the digest to it. Two modes:

   - claimed mode: caller passes requesterEoa, helper verifies recovery matches (throws on mismatch, i.e. fails closed)
   - discovery mode: caller omits requesterEoa, helper signs a placeholder digest, recovers the address, then signs the FINAL digest bound to that address. Two sigs locally, one over the wire.

   `catchupSwmFromHost` switches to discovery mode, so the pre-call EOA lookup is no longer needed.

2. Layered authority too strict (dkg-agent.ts:9648 authorizeSwmHostCatchupRequest)

   The prior implementation hard-denied as soon as `resolveOnChainParticipantAgents` returned a non-null set that didn't include the requester EOA. That's a fail-closed bug for two cases Codex called out:

   - members using a delegated message-signing key whose recovered signer isn't a `participantAgent` on chain
   - allowed agents whose chain mirror hasn't caught up yet

   Fix: UNION semantics across four sources. Accept on the first match; deny only if none accept:

   a. on-chain participant agents (chain truth)
   b. beacon-pinned curator (pre-registration fallback)
   c. local agent-gate set, `getContextGraphAgentGateAddresses()`, which unions `dkg:allowedAgent` + `dkg:participantAgent` from local `_meta` + subscription cache (NEW)
   d. transport-layer allowedPeers (libp2p peer-id allowlist)

   Distinguishes "no source available" from "all sources rejected" in the deny reason so operators can debug missing-CG vs. wrong-identity.

Tests: 2 new B1 unit tests cover discovery mode + claim-mismatch throw. Existing 14 tests continue to pass against the claim mode (semantically unchanged when the caller pins a requesterEoa).

Co-authored-by: Cursor <cursoragent@cursor.com>
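The UNION semantics described in this commit can be sketched as a small pure function. This is a simplified illustration, not the code in `authorizeSwmHostCatchupRequest`: `AuthoritySource`, `Decision`, and `authorizeByUnion` are hypothetical names, sources are synchronous here, every source is assumed to return lowercased identities, and a single identity string stands in for both the recovered EOA and the libp2p peer-id that the real checks compare against.

```typescript
// A source yields the identities it accepts, or null when it has no data
// for this context graph (for example, the chain lookup is unavailable).
type AuthoritySource = {
  name: string;
  resolve: () => ReadonlySet<string> | null;
};

type Decision =
  | { allowed: true; via: string }
  | { allowed: false; reason: 'no-source-available' | 'all-sources-rejected' };

function authorizeByUnion(identity: string, sources: AuthoritySource[]): Decision {
  const id = identity.toLowerCase();
  let consulted = 0;

  for (const source of sources) {
    const accepted = source.resolve();
    if (accepted === null) continue; // Source has nothing to say; it neither accepts nor rejects.
    consulted += 1;
    if (accepted.has(id)) {
      return { allowed: true, via: source.name }; // First match wins.
    }
  }

  // Deny by default, but tell operators whether anything was actually consulted.
  return {
    allowed: false,
    reason: consulted === 0 ? 'no-source-available' : 'all-sources-rejected',
  };
}
```

A non-null chain set that omits the requester no longer short-circuits to a denial in this shape; it only counts as one source that did not accept.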
let lastDenied: string | undefined;
while (rounds < maxRounds) {
  rounds += 1;
  const signedReq = await mintSignedCatchupRequest({
🔴 Bug: This new path now depends on chain.signMessage() producing a real recoverable EIP-191 signature, but MockAdapter.signMessage() still returns zeroed r/vs. In mock/test environments every catchup request minted here will fail verifySignedCatchupRequest() on the responder, so host catchup regresses completely. Either update the mock adapter in the same PR or add an explicit fallback/bypass for adapters that cannot produce recoverable signatures.
if (issuedAtMs < threshold) {
  this.seen.delete(key);
} else {
  // Insertion order = ascending issuedAtMs (approximately),
🔴 Bug: evictStale() assumes insertion order roughly matches issuedAtMs, but this class explicitly accepts future-skewed timestamps within the freshness window. A future-dated nonce inserted first will make this break early and leave later stale entries in the map, causing false "replayed catchup nonce" rejections after those entries should have expired. Evict all stale entries, or track expiry by insertion time / a structure ordered by expiry instead of relying on map order.
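The first remedy suggested here (evict all stale entries rather than stopping at the first fresh one) might look like the sketch below. It is a minimal stand-in, not the actual `CatchupReplayGuard`: the class name `ReplayGuardSketch`, the method `checkAndRecord`, the key format, and the capacity default are all assumptions made for illustration.

```typescript
class ReplayGuardSketch {
  // key -> issuedAtMs of the request that introduced the nonce
  private readonly seen = new Map<string, number>();
  private readonly windowMs: number;
  private readonly maxEntries: number;

  constructor(windowMs: number, maxEntries: number = 10000) {
    this.windowMs = windowMs;
    this.maxEntries = maxEntries;
  }

  /** Returns true when the (requester, nonce) pair is new, false when it is a replay. */
  checkAndRecord(requesterEoa: string, nonce: string, issuedAtMs: number, nowMs: number): boolean {
    this.evictStale(nowMs);
    const key = `${requesterEoa.toLowerCase()}:${nonce}`;
    if (this.seen.has(key)) return false;
    this.seen.set(key, issuedAtMs);

    // Bound memory by dropping the oldest-inserted entries. Trade-off: a nonce dropped
    // here could be replayed while still fresh, so the cap should be sized generously.
    while (this.seen.size > this.maxEntries) {
      const oldest = this.seen.keys().next().value;
      if (oldest === undefined) break;
      this.seen.delete(oldest);
    }
    return true;
  }

  get size(): number {
    return this.seen.size;
  }

  // Full sweep with no early break: a future-skewed entry inserted first can no longer
  // shield stale entries that were inserted after it.
  private evictStale(nowMs: number): void {
    const threshold = nowMs - this.windowMs;
    for (const [key, issuedAtMs] of this.seen) {
      if (issuedAtMs < threshold) this.seen.delete(key); // Deleting during Map iteration is safe in JS.
    }
  }
}
```

The sweep is O(n) per call. If that matters, the other option named in the comment (a structure ordered by expiry) avoids the full scan at the cost of more bookkeeping.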
…apter compat + mock signer

Three more production bugs from the closed-PR codex review (#618, #620, #637): the remaining hard-coded fail-paths that escaped the rc.10 integration merge. Companion to f85d2a3, which fixed the host-mode-store data-loss / lock-race set.

T1b #1: MockChainAdapter.signMessage (PR #618 c3). The mock returned zero-byte `{r, vs}`, so every test that exercised `mintSignedCatchupRequest` recovered a garbage signer on the responder side and host catchup was effectively dead under MockChainAdapter. Existing test files worked around this by monkey-patching `signMessage` on the adapter instance (see cg-discovery-integration.test.ts:114). The mock now delegates to `mockACKSigner` when a wallet has been registered via `setMockACKSigner`, so any test that already wires the ACK signer gets a real EIP-191 signature on `signMessage` too. Tests that don't configure the ACK signer keep getting zeros, so there is no behavioural change for unit tests that don't exercise the signed-catchup path.

T1b #2: gossip-teardown persistence leak (PR #620 c4). When `reconcileSharedMemoryGossipSubscription` discovers the local node lost membership for a curated CG, it `gossip.unsubscribe(swmTopic)`s the whole topic and clears the in-memory `swmHostModeSubscribed` / `swmHostModeHandlers` maps. The persisted `hostModeSubscribed=true` flag in `_meta` was NOT cleared, so the B3 startup-restore loop in `initializeSwmHostModeStore` would happily re-subscribe to the same CG on next boot, exactly the CG this branch had just torn down for authorization reasons. Added an `enqueueHostModePersistence(cgId, false)` call alongside the in-memory deletes. If the immediate `reconcileSwmHostModeSubscription` re-engages, it does so through `wireSwmHostModeHandler`, which enqueues `true` again; the per-CG queue's serialisation guarantees the final state always reflects the final intent.

T1b #3: backwards-compatible access-policy probe (PR #637 c1+c3). `_resolveEncryptInlinePayload`'s probe path was returning `null` (UNKNOWN) when `chain.getContextGraphAccessPolicy` was undefined, and `null` made the throw at the bottom of the helper hard-fail publish for any numeric CG. Optional adapter method → mandatory in practice. External / custom adapters that support V10 publish but haven't adopted the access-policy getter would 500 on every publish they routed through here. Fix distinguishes "method not implemented" (falls back to PUBLIC, the v9-era behaviour, which restores compat) from "method threw" (still returns null, still fails closed: that's an actual RPC failure, and refusing to pick plaintext-vs-encrypted without a verified policy is the right call there). Local-meta probe above runs unchanged and would still return `true` for any curated CG the local node created or joined with policy metadata.

No behaviour change for clients that:

- use the EVM adapter (always implements the getter)
- run tests that already configure mockACKSigner
- hit a curated CG with local policy metadata

Builds clean (chain + agent).

Co-authored-by: Cursor <cursoragent@cursor.com>
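The "method not implemented" versus "method threw" distinction in T1b #3 can be illustrated with a short sketch. The shapes below are assumptions: `AccessPolicy`, `ChainAdapterLike`, and `probeEncryptInline` are hypothetical names, and the getter's real return type is not shown in this PR. Only the three-way outcome (`false` for PUBLIC, `true` for encrypt, `null` for unknown) follows the commit message.

```typescript
// Hypothetical policy values; the real adapter's return type may differ.
type AccessPolicy = 'public' | 'curated';

interface ChainAdapterLike {
  // Optional on purpose: older or custom adapters may not implement it.
  getContextGraphAccessPolicy?: (cgId: string) => Promise<AccessPolicy>;
}

// Returns false (publish plaintext), true (encrypt inline), or null (unknown; caller fails closed).
async function probeEncryptInline(chain: ChainAdapterLike, cgId: string): Promise<boolean | null> {
  const getter = chain.getContextGraphAccessPolicy;
  if (getter === undefined) {
    // Method not implemented: fall back to PUBLIC, matching pre-getter behaviour.
    return false;
  }
  try {
    const policy = await getter.call(chain, cgId);
    return policy !== 'public';
  } catch {
    // Method exists but the call failed: do not guess between plaintext and encrypted.
    return null;
  }
}
```

Keeping the "threw" branch as `null` preserves fail-closed behaviour for genuine RPC failures, while an adapter that never offered the getter keeps publishing as it did before.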
Summary
Closes the metadata-leak vector Codex flagged on PR #610 round-2 #6: the previous host-catchup wire let any peer that knew or guessed a `contextGraphId` pull stored envelopes from a host-only core. The local `allowedPeers` mitigation only covered member-side nodes; host-only cores have no local allowlist and fell through to unauthenticated serving.

Stacked on #610 (which is itself stacked on #609 → #608 → #595).
Wire change (v1 → v2, hard cutover)
* `SwmHostCatchupRequest` now carries `requesterEoa`, `issuedAtMs`, `nonce`, and `sig`. `sig` is EIP-191 personal-sign over a 228-byte packed digest binding `version + keccak256(cgId) + sinceSeqno + maxEntries + maxBytes + requesterEoa + issuedAtMs + nonce`.
* `swm/host-catchup-sign.ts`: digest layout, mint + verify, and a per-responder `CatchupReplayGuard`.

Responder authorization (layered, deny-by-default)
`authorizeSwmHostCatchupRequest` runs:
Requester
`catchupSwmFromHost` mints a signed request per round via `chain.signMessage`. Fails closed when no chain signer is wired (returns `{ denied: 'no chain signer...' }` rather than sending unsigned).
Test plan
Out of scope (follow-ups)
Made with Cursor