OT-RFC-38 LU-6 B1 — signed swm-host-catchup requests #618

Closed
branarakic wants to merge 2 commits into feat/ot-rfc-38-lu6-host-mode from feat/lu6-followup-b1-signed-host-catchup

Conversation

@branarakic
Contributor

Summary

Closes the metadata-leak vector Codex flagged on PR #610 round-2 #6: the previous host-catchup wire let any peer that knew or guessed a contextGraphId pull stored envelopes from a host-only core. The local allowedPeers mitigation only covered member-side nodes — host-only cores have no local allowlist and fell through to unauthenticated serving.

Stacked on #610 (which is itself stacked on #609, #608, and #595).

Wire change (v1 → v2, hard cutover)

  • SwmHostCatchupRequest now carries requesterEoa, issuedAtMs, nonce, and sig.
  • sig is EIP-191 personal-sign over a 228-byte packed digest binding version + keccak256(cgId) + sinceSeqno + maxEntries + maxBytes + requesterEoa + issuedAtMs + nonce.
  • New swm/host-catchup-sign.ts — digest layout, mint + verify, and a per-responder CatchupReplayGuard.
  • No back-compat: LU-6 hasn't shipped yet so there is no on-the-wire legacy to preserve.

Responder authorization (layered, deny-by-default)

`authorizeSwmHostCatchupRequest` runs:

  1. Signature recovery + freshness (5 min, matches the SWM envelope window).
  2. Replay-nonce uniqueness.
  3. Chain-anchored: `requesterEoa ∈ participantAgents` (definitive when chain context is available).
  4. Pre-registration fallback: `requesterEoa == beacon-pinned curator` (curator can always catch up themselves before paying gas).
  5. Member-side: `allowedPeers` peer-id check (defence in depth when local meta is present).
  6. Otherwise DENY — replaces the previous "serve openly when no authority source available" behaviour.
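For illustration, the freshness half of step 1 could be as small as the check below. This is a sketch, not the code in `host-catchup-sign.ts`: `FRESHNESS_WINDOW_MS` and `isFreshCatchupRequest` are hypothetical names, and treating the 5-minute window as symmetric (past and future skew alike) with an inclusive boundary is an assumption.

```typescript
// Hypothetical constant: the PR states a 5-minute window, not this identifier.
const FRESHNESS_WINDOW_MS = 5 * 60 * 1000;

/**
 * Returns true when issuedAtMs lies within windowMs of nowMs on either side.
 * Assumption: the boundary itself counts as fresh.
 */
function isFreshCatchupRequest(
  issuedAtMs: number,
  nowMs: number,
  windowMs: number = FRESHNESS_WINDOW_MS,
): boolean {
  // Reject NaN, fractions, and negatives before doing any arithmetic.
  if (!Number.isSafeInteger(issuedAtMs) || issuedAtMs < 0) return false;
  return Math.abs(nowMs - issuedAtMs) <= windowMs;
}
```

Running this check before the replay-nonce lookup keeps clearly stale requests out of the replay guard entirely.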

Requester

`catchupSwmFromHost` mints a signed request per round via `chain.signMessage`. Fails closed when no chain signer is wired (returns `{ denied: 'no chain signer...' }` rather than sending unsigned).

Test plan

  • 14 `host-catchup-sign.test.ts` cases: mint/verify roundtrip, tampered fields, freshness boundary, digest binding, replay-guard LRU
  • 6 new `host-catchup-wire.test.ts` cases: signed wire encode/decode, malformed sig/nonce, hard v1 rejection, digest determinism
  • All cases added to `vitest.unit.config.ts` allowlist (44 tests passing)
  • Devnet validation: covered once #610 (OT-RFC-38 LU-6, opaque SWM hosting on cores + member host-catchup fallback) and this branch merge

Out of scope (follow-ups)

  • B2 (PR follow-up): orphaned `.log` reconcile on host-mode-store startup
  • B3 (PR follow-up): host-only designation persistence across restart
  • C-series: end-to-end devnet scenarios (member revocation, curator-offline mid-batch, unclean restart, pre-reg stress)

Made with Cursor

Closes the metadata-leak vector Codex flagged on PR #610 round-2 #6:
the previous host-catchup wire let any peer that knew or guessed a
`contextGraphId` pull stored envelopes from a host-only core. Local
`allowedPeers` only mitigated the member-side case; host-only cores
have no local allowlist and fell through to unauthenticated serving.

Wire change (v1 → v2, hard cutover — no on-the-wire legacy to
preserve since LU-6 hasn't shipped):

  * `SwmHostCatchupRequest` now carries `requesterEoa`, `issuedAtMs`,
    `nonce`, and `sig` (EIP-191 personal-sign over a 228-byte packed
    digest binding version + keccak256(cgId) + sinceSeqno + caps +
    requester + timestamp + nonce).
  * new `swm/host-catchup-sign.ts`: digest layout, mint + verify
    helpers, and a per-responder `CatchupReplayGuard` (sliding LRU,
    nonces age out with the freshness window).
  * `host-catchup-wire.ts` upgraded: encoder/decoder reject any wire
    that omits or malforms the new auth fields, including v1 bodies.

Responder authorization (`authorizeSwmHostCatchupRequest`, layered):
  1. signature + freshness (5 min, matches SWM envelope window)
  2. replay-nonce uniqueness
  3. chain-anchored: requesterEoa ∈ participantAgents (definitive)
  4. pre-registration fallback: requesterEoa == beacon-pinned curator
  5. member-side allowedPeers (transport-layer ACL, defence in depth)
  6. otherwise DENY — closes the previous fail-open behaviour

Requester (`catchupSwmFromHost`) mints a signed request per round
via `chain.signMessage`; fails closed when no chain signer is wired.

Tests (added to `vitest.unit.config.ts` allowlist):
  * 14 `host-catchup-sign.test.ts` — mint/verify roundtrip, tampered
    fields, freshness boundary, digest binding, replay-guard LRU.
  * 6 new cases in `host-catchup-wire.test.ts` — signed wire encode/
    decode, malformed sig/nonce, hard v1 rejection, digest determinism.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread packages/agent/src/dkg-agent.ts Outdated
// 3. chain-anchored authority
let chainParticipants: string[] | null = null;
try {
chainParticipants = await this.resolveOnChainParticipantAgents(req.contextGraphId);

🔴 Bug: this auth path only checks the on-chain participant set / beacon curator / peer allowlist, but curated sync in this repo also allows DKG_ALLOWED_AGENT entries and delegatee op keys (inviteAgentToContextGraph writes them into _meta). A node that joins via delegation can produce a valid signed catchup request and still be denied here because the recovered signer is not a participantAgent. Reuse the existing sync-auth resolution here, or add allowed-agent + delegatee checks before denying.

Comment thread packages/agent/src/dkg-agent.ts Outdated
// OT-RFC-38 LU-6 B1 — every catchup request is signed by the
// requesting agent's chain EOA so the host can authenticate via
// on-chain participant set without trusting the libp2p peer-id.
const requesterEoa = await this.getRegistrationTxSignerAddress();

🔴 Bug: requesterEoa is claimed as the registration tx signer, but the signature is produced by chain.signMessage(). The helper comment on getRegistrationTxSignerAddress() explicitly says those identities can differ; when they do, verifySignedCatchupRequest() will reject every request with a signer mismatch and host catchup stops working. Populate requesterEoa from the actual message-signing principal (or sign with the same key you advertise here).

…+ UNION authority

Addresses two Codex bugs flagged on PR #618:

1. requesterEoa <-> signer mismatch (dkg-agent.ts:9772, host-catchup-sign.ts)

   `catchupSwmFromHost` advertised `requesterEoa = getRegistrationTxSignerAddress()`
   while signing the digest via `chain.signMessage()`. The helper's own comment
   says those identities CAN differ; when they do, every catchup request gets
   rejected with "signer mismatch" and host catchup is fully broken.

   Fix: `mintSignedCatchupRequest` now recovers the signer from the signature
   itself and binds the digest to it. Two modes:
     - claimed mode: caller passes requesterEoa, helper verifies recovery
       matches (throws on mismatch — fail closed)
     - discovery mode: caller omits requesterEoa, helper signs a placeholder
       digest, recovers the address, then signs the FINAL digest bound to
       that address. Two sigs locally, one over the wire.

   `catchupSwmFromHost` switches to discovery mode — no more pre-call EOA
   lookup is needed.

2. Layered authority too strict (dkg-agent.ts:9648 authorizeSwmHostCatchupRequest)

   The prior implementation hard-denied as soon as `resolveOnChainParticipantAgents`
   returned a non-null set that didn't include the requester EOA. That's a
   fail-closed bug for two cases Codex called out:
     - members using a delegated message-signing key whose recovered signer
       isn't a `participantAgent` on chain
     - allowed agents whose chain mirror hasn't caught up yet

   Fix: UNION semantics across four sources. Accept on the first match;
   deny only if none accept:
     a. on-chain participant agents (chain truth)
     b. beacon-pinned curator (pre-registration fallback)
     c. local agent-gate set — `getContextGraphAgentGateAddresses()` —
        which unions `dkg:allowedAgent` + `dkg:participantAgent` from
        local `_meta` + subscription cache (NEW)
     d. transport-layer allowedPeers (libp2p peer-id allowlist)

   Distinguishes "no source available" from "all sources rejected" in
   the deny reason so operators can debug missing-CG vs. wrong-identity.
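The UNION semantics above can be sketched as a pure decision function. Everything here is illustrative: `AuthorityInputs`, `decideCatchupAuthority`, and the reason strings are hypothetical names, and modelling an unavailable source as `null` is an assumption. Only the four sources and the "no source available" vs. "all sources rejected" distinction come from the commit message.

```typescript
type AuthoritySource =
  | "chain-participants"
  | "beacon-curator"
  | "local-agent-gate"
  | "allowed-peers";

interface AuthorityInputs {
  requesterEoa: string;                // signer recovered from the request
  requesterPeerId: string;             // libp2p peer id of the remote
  chainParticipants: string[] | null;  // null = source unavailable
  beaconCurator: string | null;
  localAgentGate: string[] | null;
  allowedPeers: string[] | null;
}

type AuthorityDecision =
  | { allowed: true; via: AuthoritySource }
  | { allowed: false; reason: "no-source-available" | "all-sources-rejected" };

function decideCatchupAuthority(input: AuthorityInputs): AuthorityDecision {
  // Addresses are compared case-insensitively; peer ids are compared exactly.
  const eoa = input.requesterEoa.toLowerCase();
  const listsEoa = (list: string[]): boolean =>
    list.some((addr) => addr.toLowerCase() === eoa);
  let consulted = 0;

  if (input.chainParticipants !== null) {
    consulted += 1;
    if (listsEoa(input.chainParticipants)) return { allowed: true, via: "chain-participants" };
  }
  if (input.beaconCurator !== null) {
    consulted += 1;
    if (input.beaconCurator.toLowerCase() === eoa) return { allowed: true, via: "beacon-curator" };
  }
  if (input.localAgentGate !== null) {
    consulted += 1;
    if (listsEoa(input.localAgentGate)) return { allowed: true, via: "local-agent-gate" };
  }
  if (input.allowedPeers !== null) {
    consulted += 1;
    const peerId = input.requesterPeerId;
    if (input.allowedPeers.some((p) => p === peerId)) return { allowed: true, via: "allowed-peers" };
  }
  // A non-matching source no longer short-circuits; deny only after all were tried.
  return {
    allowed: false,
    reason: consulted === 0 ? "no-source-available" : "all-sources-rejected",
  };
}
```

A chain participant set that omits the requester no longer denies on its own: a later source such as the local agent gate can still accept.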

Tests: 2 new B1 unit tests cover discovery mode + claim-mismatch throw.
Existing 14 tests continue to pass against the claimed mode (semantically
unchanged when the caller pins a requesterEoa).
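The two minting modes can be sketched with the signing, recovery, and digest steps injected as functions. This is not the real `mintSignedCatchupRequest` signature: `SignFn`, `RecoverFn`, `DigestFn`, `mintSignedRequestSketch`, and the all-zero `PLACEHOLDER_EOA` are hypothetical, and the actual placeholder value is not stated in this PR.

```typescript
type SignFn = (digest: Uint8Array) => Promise<string>;        // e.g. wraps chain.signMessage
type RecoverFn = (digest: Uint8Array, sig: string) => string; // returns the signer address
type DigestFn = (requesterEoa: string) => Uint8Array;         // digest bound to an address

// Assumption: the placeholder is the zero address.
const PLACEHOLDER_EOA = "0x" + "00".repeat(20);

async function mintSignedRequestSketch(
  buildDigest: DigestFn,
  sign: SignFn,
  recover: RecoverFn,
  claimedEoa?: string,
): Promise<{ requesterEoa: string; sig: string }> {
  if (claimedEoa !== undefined) {
    // Claimed mode: one signature, then verify the claim (fail closed).
    const digest = buildDigest(claimedEoa);
    const sig = await sign(digest);
    if (recover(digest, sig).toLowerCase() !== claimedEoa.toLowerCase()) {
      throw new Error("signer mismatch");
    }
    return { requesterEoa: claimedEoa, sig };
  }
  // Discovery mode: sign a placeholder digest only to learn who signs.
  const probeDigest = buildDigest(PLACEHOLDER_EOA);
  const discovered = recover(probeDigest, await sign(probeDigest));
  // Sign the final digest bound to the discovered address; only this goes on the wire.
  const finalDigest = buildDigest(discovered);
  const sig = await sign(finalDigest);
  if (recover(finalDigest, sig).toLowerCase() !== discovered.toLowerCase()) {
    throw new Error("signer mismatch");
  }
  return { requesterEoa: discovered, sig };
}
```

Discovery mode costs one extra local signature per request, in exchange for never advertising an address the signing key cannot back.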

Co-authored-by: Cursor <cursoragent@cursor.com>
let lastDenied: string | undefined;
while (rounds < maxRounds) {
rounds += 1;
const signedReq = await mintSignedCatchupRequest({

🔴 Bug: This new path now depends on chain.signMessage() producing a real recoverable EIP-191 signature, but MockAdapter.signMessage() still returns zeroed r/vs. In mock/test environments every catchup request minted here will fail verifySignedCatchupRequest() on the responder, so host catchup regresses completely. Either update the mock adapter in the same PR or add an explicit fallback/bypass for adapters that cannot produce recoverable signatures.

if (issuedAtMs < threshold) {
this.seen.delete(key);
} else {
// Insertion order = ascending issuedAtMs (approximately),

🔴 Bug: evictStale() assumes insertion order roughly matches issuedAtMs, but this class explicitly accepts future-skewed timestamps within the freshness window. A future-dated nonce inserted first will make this break early and leave later stale entries in the map, causing false "replayed catchup nonce" rejections after those entries should have expired. Evict all stale entries, or track expiry by insertion time / a structure ordered by expiry instead of relying on map order.
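One way to act on this suggestion is to drop the early break and scan every entry, as in the sketch below. It is not the PR's `CatchupReplayGuard`: the class name, the `eoa:nonce` key format, and the capacity policy are hypothetical. A full scan is O(n) per call, which is only acceptable because the map is capped.

```typescript
class CatchupReplayGuardSketch {
  private readonly seen = new Map<string, number>(); // key -> issuedAtMs
  private readonly windowMs: number;
  private readonly maxEntries: number;

  constructor(windowMs: number, maxEntries: number) {
    this.windowMs = windowMs;
    this.maxEntries = maxEntries;
  }

  /** Returns true on first sighting of (requesterEoa, nonce), false on a replay. */
  checkAndRecord(requesterEoa: string, nonce: string, issuedAtMs: number, nowMs: number): boolean {
    this.evictStale(nowMs);
    const key = requesterEoa.toLowerCase() + ":" + nonce;
    if (this.seen.has(key)) return false;
    if (this.seen.size >= this.maxEntries) {
      // Capacity eviction drops the oldest insertion. Note this can forget a
      // nonce that is still inside the window, so size the cap generously.
      const oldest = this.seen.keys().next();
      if (!oldest.done) this.seen.delete(oldest.value);
    }
    this.seen.set(key, issuedAtMs);
    return true;
  }

  get size(): number {
    return this.seen.size;
  }

  private evictStale(nowMs: number): void {
    const threshold = nowMs - this.windowMs;
    // No early exit: a future-dated entry inserted first must not shield
    // later entries whose issuedAtMs has already expired.
    this.seen.forEach((issuedAtMs, key) => {
      if (issuedAtMs < threshold) this.seen.delete(key);
    });
  }
}
```

The guard assumes the caller has already rejected requests outside the freshness window, so an evicted nonce cannot be presented again successfully.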

@branarakic
Contributor Author

Superseded by PR #649 (release: OT-RFC-38 LU-6 + RFC-39 Phase A.5 — testnet-ready cut rc.10). All commits from this PR are now in the integration/rfc38-mainnet-ready branch and being merged to main via #649. Unaddressed Codex review feedback is being tracked + fixed in a dedicated followup PR.

@branarakic branarakic closed this May 25, 2026
branarakic pushed a commit that referenced this pull request May 25, 2026
…apter compat + mock signer

Three more production bugs from the closed-PR codex review (#618, #620,
#637) — the remaining hard-coded fail-paths that escaped the rc.10
integration merge. Companion to f85d2a3 which fixed the host-mode-store
data-loss / lock-race set.

T1b #1 — MockChainAdapter.signMessage (PR #618 c3) — the mock returned
zero-byte `{r, vs}`, so every test that exercised
`mintSignedCatchupRequest` recovered a garbage signer on the responder
side and host catchup was effectively dead under MockChainAdapter.
Existing test files worked around this by monkey-patching `signMessage`
on the adapter instance (see cg-discovery-integration.test.ts:114). The
mock now delegates to `mockACKSigner` when a wallet has been registered
via `setMockACKSigner`, so any test that already wires the ACK signer
gets a real EIP-191 signature on `signMessage` too. Tests that don't
configure the ACK signer keep getting zeros — no behavioural change
for unit tests that don't exercise the signed-catchup path.
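The delegation described above might look roughly like this. It is a sketch only: `MockChainAdapterSketch`, `CompactSig`, and `AckSigner` are hypothetical, and the real `setMockACKSigner` registers a wallet rather than the bare signing function used here. The `{r, vs}` shape and the zero-byte default are taken from the description.

```typescript
interface CompactSig {
  r: Uint8Array;
  vs: Uint8Array;
}

// Hypothetical stand-in for the registered wallet's signing capability.
type AckSigner = (message: Uint8Array) => Promise<CompactSig>;

class MockChainAdapterSketch {
  private ackSigner: AckSigner | undefined;

  setMockACKSigner(signer: AckSigner): void {
    this.ackSigner = signer;
  }

  async signMessage(message: Uint8Array): Promise<CompactSig> {
    // Delegate when a signer is registered, so the result is recoverable.
    if (this.ackSigner !== undefined) return this.ackSigner(message);
    // Otherwise keep the old behaviour: zero-filled, non-recoverable bytes.
    return { r: new Uint8Array(32), vs: new Uint8Array(32) };
  }
}
```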

T1b #2 — gossip-teardown persistence leak (PR #620 c4) — when
`reconcileSharedMemoryGossipSubscription` discovers the local node
lost membership for a curated CG, it calls `gossip.unsubscribe(swmTopic)`
on the whole topic and clears the in-memory `swmHostModeSubscribed` /
`swmHostModeHandlers` maps. The persisted `hostModeSubscribed=true`
flag in `_meta` was NOT cleared, so the B3 startup-restore loop in
`initializeSwmHostModeStore` would happily re-subscribe to the same
CG on next boot — exactly the CG this branch had just torn down for
authorization reasons. Added an `enqueueHostModePersistence(cgId,
false)` call alongside the in-memory deletes. If the immediate
`reconcileSwmHostModeSubscription` re-engages, it does so through
`wireSwmHostModeHandler` which enqueues `true` again; the per-CG
queue's serialisation guarantees the final state always reflects
the final intent.
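The "final state reflects final intent" property depends on writes for one CG running strictly in enqueue order. A minimal per-key promise chain gives that guarantee. `PerKeyWriteQueueSketch` is a hypothetical stand-in, not the queue behind `enqueueHostModePersistence`.

```typescript
class PerKeyWriteQueueSketch {
  // Last scheduled write per key; stored tails never reject.
  private readonly tails = new Map<string, Promise<void>>();

  /** Runs task after every earlier task for the same key has settled. */
  enqueue(key: string, task: () => Promise<void>): Promise<void> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    const run = prev.then(task);
    // Swallow failures on the stored tail so one failed write does not
    // block later writes; the caller still sees the failure via `run`.
    this.tails.set(key, run.catch(() => undefined));
    return run;
  }
}
```

With this shape, enqueueing `false` during teardown and `true` during an immediate re-wire always persists `true`, however long the first write takes.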

T1b #3 — backwards-compatible access-policy probe (PR #637 c1+c3)
— `_resolveEncryptInlinePayload`'s probe path was returning `null`
(UNKNOWN) when `chain.getContextGraphAccessPolicy` was undefined,
and `null` made the throw at the bottom of the helper hard-fail
publish for any numeric CG. Optional adapter method → mandatory in
practice. External / custom adapters that support V10 publish but
haven't adopted the access-policy getter would 500 on every publish
they routed through here. Fix distinguishes "method not implemented"
(falls back to PUBLIC — v9-era behaviour, restores compat) from
"method threw" (still returns null, still fails closed — that's an
actual RPC failure, refusing to pick plaintext-vs-encrypted without
a verified policy is the right call there). Local-meta probe above
runs unchanged and would still return `true` for any curated CG the
local node created or joined with policy metadata.
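The distinction between "method not implemented" and "method threw" can be sketched as below. `ChainAdapterLike`, `AccessPolicy`, `probeAccessPolicySketch`, and the string-typed `cgId` are hypothetical; only the optional `getContextGraphAccessPolicy` method and the PUBLIC / `null` outcomes come from the description above.

```typescript
type AccessPolicy = "public" | "curated";

interface ChainAdapterLike {
  // Optional: external or custom adapters may not implement it.
  getContextGraphAccessPolicy?: (cgId: string) => Promise<AccessPolicy>;
}

/** Returns the policy, or null when it could not be verified. */
async function probeAccessPolicySketch(
  chain: ChainAdapterLike,
  cgId: string,
): Promise<AccessPolicy | null> {
  // Not implemented: restore the older behaviour and treat the CG as public.
  if (typeof chain.getContextGraphAccessPolicy !== "function") return "public";
  try {
    return await chain.getContextGraphAccessPolicy(cgId);
  } catch {
    // The getter exists but failed (e.g. an RPC error): stay UNKNOWN so the
    // caller fails closed instead of guessing plaintext vs. encrypted.
    return null;
  }
}
```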

No behaviour change for clients that:
  - use the EVM adapter (always implements the getter)
  - run tests that already configure mockACKSigner
  - hit a curated CG with local policy metadata

Builds clean (chain + agent).

Co-authored-by: Cursor <cursoragent@cursor.com>