feat(rc.11) PR-B: typed ACK errors + RPC caching + LU-6 runbook + provenance telemetry#672
Conversation
…ce telemetry
PR-B in the 5-PR rc.11 train (T3+T4+T5 in the train commit 543c2c14).
Stacks on top of feat/rc11-A-honest-ack-tentative-vm-cleanup which
ships T1+T2.
T3 — Honest publish error reporting + RPC caching
- packages/publisher/src/ack-errors.ts (NEW): typed `ACKProviderError`
hierarchy with `RpcPreconditionError` (RPC pre-flight died on
chainId / KAV10 / minRequiredSignatures) vs `QuorumUnmetError`
(dialled N, got M, needed K — with `PeerOutcome[]` carrying
`{peerId, dialOk, protocolSupported, swmHostModeAdvertised,
reason}`). Includes `wrapAsRpcPreconditionIfApplicable` helper
for the agent's V10 provider wrapper to upgrade legacy throws.
- packages/chain/src/evm-adapter.ts: 1h-TTL cache for chainId, KAV10
address, and minimumRequiredSignatures. Steady-state cost: zero
RPC per publish (previously one round-trip per attempt). Cache
invalidator (`invalidatePublishPreflightCache`) exposed on the
EVM adapter only — mock parity test docs this as EVM-only.
- packages/cli/src/daemon/lifecycle.ts + publisher-runner.ts:
`chainRpcUrl` flows through from config. Daemon emits startup
WARN when an operator runs in publishing role against a known
public RPC ("node is configured to publish but using public RPC X
— ACK pre-flight will be rate-limited, recommend setting
chainRpcUrl to a private endpoint").
- packages/cli/test/lifecycle-public-rpc-warn.test.ts (NEW): pins
the WARN for the dzudza-style failure mode.
T4 — LU-6 Phase B verification + CHANGELOG + runbook + MCP tool
- CHANGELOG.md: "Deferred (Phase A sub-task)" stub rewritten into
"Added — OT-RFC-38 Phase B (cores auto-host curated CGs)"
documenting the four discovery paths actually wired in rc.10:
chain-event auto-subscribe (dkg-agent.ts:1737-1803), discovery
beacon (1810-1838), periodic reconciler (9347-9376), and
operator-driven enrolment (memory.ts:960-977). Genuine remaining
gaps (sharding-table-aware selection once shards>1; cross-core
ciphertext replay) stay in Deferred.
- docs/runbooks/RUNBOOK_HOST_MODE_MANUAL_SUBSCRIBE.md (NEW): operator
playbook for the manual fourth-path `POST /api/shared-memory/
host-mode/subscribe`. When / why / how, response-shape reference,
anti-patterns, undo procedure.
- packages/mcp-dkg/src/{client.ts,tools/setup.ts}: new
`dkg_request_hosting` MCP tool wrapping the manual subscribe
endpoint. Operator UX sugar; refuses cleanly when local is
already a CG member (memberMode handler takes precedence) or
on non-core daemons.
- packages/mcp-dkg/test/drop-sweep.test.ts: locked count 25 → 26.
T5 — ACK-provenance telemetry
- packages/core/src/proto/storage-ack.ts: `StorageACKMsg.subscription
Source` (wire field 8, strictly additive); closed `SUBSCRIPTION_
SOURCES` enum (chain-event | beacon | reconciler | manual | member).
- packages/agent/src/dkg-agent.ts: `swmHostModeSubscribed` converted
from `Set<string>` to `Map<string, SubscriptionSource>`. The four
Phase B paths call `reconcileSwmHostModeSubscription(cgId, source)`
with their respective sources; `wireSwmHostModeHandler` records it.
New public `getSwmSubscriptionSource(...candidateIds)` resolver
handles the cleartext / wire-hash key-shape difference per path.
Bound to `StorageACKHandlerConfig.getSubscriptionSourceForCg` so
every signed ACK carries provenance.
- packages/publisher/src/storage-ack-handler.ts: populates new
`subscriptionSource` field on both signed-ACK paths
(encrypted + plaintext).
- packages/publisher/src/ack-collector.ts: `CollectedACK.subscription
Source?` plumbed through; `snapshotPeerOutcomes` populates
`swmHostModeAdvertised` from the ACK source so `QuorumUnmetError.
peers[].swmHostModeAdvertised` is meaningful for diagnosis.
- packages/publisher/src/dkg-publisher.ts: per-publish ACK-provenance
summary line ("V10: Collected N core node ACKs [peer:source, ...]")
emitted after successful ACK collection. Lets an operator answer
"*why* did this CG get hosted by these cores?" from the daemon log
alone instead of cross-referencing chain poller + beacon + reconciler.
Evidence:
- Full repo `pnpm -r build` clean on this branch.
- Integration branch (`release/v10.0.0-rc.11`, includes this PR
+ PR-A + all 15 branarakic rc.11 PRs) sweep:
* RFC-38 round-2 cold-start: 11/11 PASS.
* `pnpm -r test:unit`: 317/317 PASS.
* Two new targeted devnet scenarios (async-promote crash +
restart, SIGTERM mid-publish under load): both GREEN.
Co-authored-by: Cursor <cursoragent@cursor.com>
| `nodeIdentityId ${this.config.nodeIdentityId} exceeds uint64 wire format`, | ||
| ); | ||
| } | ||
| const encryptedSubscriptionSource = this.config.getSubscriptionSourceForCg?.( |
There was a problem hiding this comment.
🔴 Bug: This resolver only sees the numeric cgId plus the optional cleartext swmGraphId, but DKGAgent.swmHostModeSubscribed stores chain-event/beacon subscriptions under the wire hash. On those common host-only paths getSwmSubscriptionSource() will miss and emit no subscriptionSource, so the new ACK-provenance logs/reporting degrade to ?. Pass the wire hash/gossip topic here, or canonicalize candidates through gossipWireIdFor() before lookup.
|
|
||
| export function isLikelyPublicRpc(url: string): boolean { | ||
| const lower = url.toLowerCase(); | ||
| return KNOWN_PUBLIC_RPC_HOSTS.some((host) => lower.includes(host)); |
There was a problem hiding this comment.
🟡 Issue: includes() is checking the entire URL string, not just the hostname. That means a private proxy URL like https://rpc.my-company.example/https://sepolia.base.org will be warned as "public" even though the host is private. Parse the URL and match against hostname only, which is what the comment above says this helper is doing.
|
|
||
| 1. The daemon log on the core should include | ||
| `Host-mode subscribed CG=<id>` (or the persistence-queue equivalent). | ||
| 2. `GET /api/shared-memory/host-mode/list` reports the CG (along with |
There was a problem hiding this comment.
🟡 Issue: This verification step points operators at GET /api/shared-memory/host-mode/list, but the daemon route added in this PR is /api/shared-memory/host-mode/stats. As written, the runbook's "verify it worked" step will 404. Update the runbook to use the existing stats endpoint (or add the list route if that was the intended API).
3 issues from `github-actions[bot]` Codex review on PR #672. Strictly additive — no new behaviour, just closes the misses Codex flagged on the PR5 ACK-provenance resolver, the PR3 public-RPC heuristic, and the LU-6 manual-subscribe runbook. ## 1. dkg-agent.ts:1627 — canonicalize ACK resolver candidates through `gossipWireIdFor()` (🔴) The PR5 `getSubscriptionSourceForCg` closure passed only the numeric on-chain `cgId` and the cleartext `swmGraphId` to `getSwmSubscriptionSource(...candidateIds)`. But chain-event AND discovery-beacon paths key `swmHostModeSubscribed` by the WIRE form (the curator-committed `nameHash`, normalized through `gossipWireIdFor` — see `dkg-agent.ts:9317-9325`). The resolver therefore missed every chain-event / beacon host-only subscription and the publisher's ACK-provenance log line degraded to `?` on the common host-only paths. Fix: in the same closure, also compute the wire form of each non-empty candidate via `this.gossipWireIdFor(...)` and pass it as an extra candidate. `getSwmSubscriptionSource` is already variadic and dedupes via its internal `seen` Set, so over-passing is cheap, order-independent, and a no-op when one of the four shapes already hits. `gossipWireIdFor` is a pure local computation — no extra network or chain reads. ## 2. lifecycle.ts:408 — `isLikelyPublicRpc` parses the URL and matches against `hostname` only (🟡) Pre-fix: `lower.includes(host)` against the WHOLE URL string. A private proxy URL that embeds a known-public hostname in its PATH — e.g. `https://rpc.my-company.example/upstream/sepolia.base.org` or `https://proxy.my-company.example/?target=https://rpc.ankr.com` — would be misclassified as public and the daemon would emit a spurious WARN. Worse: an operator who suppressed the WARN by setting `chain.rpcUrl` to that same URL would still trip the same check. Fix: parse `new URL(url).hostname.toLowerCase()` and match with endpoint semantics (`===` OR `endsWith('.' + host)`). Subdomains of known-public hosts (e.g. `eu.rpc.ankr.com`) still flag correctly. Adjacent-suffix lookalikes (e.g. `notsepolia.base.org`) do NOT. Falls back to the old substring scan for unparseable inputs (bare hostnames without a scheme) so a malformed `chain.rpcUrl` an operator still wants flagged keeps tripping the WARN. Added 4 new test cases to `lifecycle-public-rpc-warn.test.ts`: two private-proxy-with-embedded-public-path regression cases, and two subdomain-handling cases (`eu.rpc.ankr.com` → true, `notsepolia.base.org` → false). All 17 cases pass. ## 3. RUNBOOK_HOST_MODE_MANUAL_SUBSCRIBE.md — point operators at the endpoint that actually exists (🟡) The "How to verify the subscribe took effect" section referenced `GET /api/shared-memory/host-mode/list`, but the daemon only exposes `GET /api/shared-memory/host-mode/stats` (served by `packages/cli/src/daemon/routes/memory.ts:947-957`, backed by `DKGAgent.getSwmHostModeStats()`). As written, an operator following the runbook would hit a 404. Fix: point at `host-mode/stats`, document its actual response shape (`{ enabled, cgCount, totalBytes, totalEntries, subscribedCgIds }`), and explain why `totalBytes` / `totalEntries` stay at zero until the first gossiped envelope ingests (so an operator following step 3 isn't surprised by zero counters on a fresh subscribe). Co-authored-by: Cursor <cursoragent@cursor.com>
|
Codex review addressed in
Builds clean on |
| // call returns a clean `hostingEnabled: false` rather than a | ||
| // throw — the tool surfaces that as a helpful no-op message | ||
| // instead of an error. | ||
| server.registerTool( |
There was a problem hiding this comment.
🔴 Bug: This registers the manual host-mode subscribe path as a normal MCP tool, but the runbook/agent instructions treat /api/shared-memory/host-mode/subscribe as operator-only because it changes who hosts a curated CG. Exposing it without any explicit human confirmation lets agents invoke a hosting change autonomously and bypass that trust boundary. Please gate this behind an operator-confirmed flow (or keep it out of the default tool surface).
There was a problem hiding this comment.
Fixed in e913ec77. Removed the dkg_request_hosting MCP tool registration entirely (packages/mcp-dkg/src/tools/setup.ts) — the route stays in place for operator use via curl / dkg CLI, but it's no longer agent-invokable through MCP. Added the tool name to DROPPED_TOOLS in packages/mcp-dkg/test/drop-sweep.test.ts so a future re-registration trips the regression guard (locked tool count goes 26 → 25). Updated docs/runbooks/RUNBOOK_HOST_MODE_MANUAL_SUBSCRIBE.md with an operator-only callout up top and strengthened the existing anti-pattern entry. DkgClient.requestHostMode() stays so operator-side scripts that wrap the daemon API still work; only the agent-facing MCP surface drops.
| const wireCgId = this.gossipWireIdFor(contextGraphId); | ||
| const swmTopic = contextGraphWorkspaceTopic(wireCgId); | ||
| this.swmHostModeSubscribed.add(contextGraphId); | ||
| this.swmHostModeSubscribed.set(contextGraphId, source); |
There was a problem hiding this comment.
🔴 Bug: swmHostModeSubscribed is now keyed by the raw contextGraphId, but this PR also documents that the same CG can arrive as a cleartext id, numeric id, or wire hash. After a chain-event/beacon subscribe (hash key), a later manual subscribe with the cleartext id will miss has() and wire a second handler on the same topic, causing duplicate host-mode ingest/persistence and ambiguous provenance. Canonicalize the bookkeeping key (e.g. wireCgId) or check both local+wire forms before wiring.
There was a problem hiding this comment.
Fixed in e913ec77. Introduced DKGAgent.canonicalSwmHostModeKey() (thin alias around the existing gossipWireIdFor helper that resolves cleartext / numeric / pre-canonical / hash shapes all to the curator-committed wire hash) and routed every swmHostModeSubscribed / swmHostModeHandlers insert/lookup/delete site through it. Touch sites:
wireSwmHostModeHandler— canonicalise first, key both maps offwireCgIdso re-entry under a different shape short-circuits at thehas()check.unwireSwmHostModeHandler— canonical-key lookup so a cleartext subscribe + hash unwire (or vice versa) releases the wired handler.reconcileSharedMemoryGossipSubscription— canonical-key delete when member authorisation is lost so the immediate host-mode reconcile call sees the entry cleared.reconcileSwmHostModeSubscription+enableSwmHostModeFor— idempotent re-entryhas()checks go through the canonical key.getSwmSubscriptionSource— canonicalise each candidate before host-mode lookup;seenSet dedups raw + canonical forms; updated thelifecycle.tsgetSubscriptionSourceForCgcallback comment to reflect that internal canonicalisation.
Regression coverage: new packages/agent/test/swm/host-mode-key-canonicalization.test.ts (4 tests, added to the unit config) verifies cleartext-then-hash AND hash-then-cleartext subscribe orderings each leave exactly one canonical-keyed entry in both bookkeeping maps, with no duplicate gossip handler wired on the same topic, and that a mixed-shape unwire releases the wire path. All 115 agent unit tests + 218 mcp-dkg tests pass.
A small side effect: getSwmHostModeStats().subscribedCgIds now always returns the canonical wire-hash form (regardless of subscribe path / shape), so operators reading the diagnostics endpoint will see the hash even after a cleartext POST. Documented in the runbook with an updated example response and a pointer to canonicalSwmHostModeKey.
| quads: [q(`${ENTITY}/D6revert`, 'http://schema.org/name', '"FakeId"')], | ||
| publisherNodeIdentityIdOverride: fakeId, | ||
| }), | ||
| ).rejects.toThrow(); |
There was a problem hiding this comment.
🟡 Issue: rejects.toThrow() is too broad for the behavior this test is trying to pin. With the new PR1/PR3 failure modes, this will also pass on unrelated rejections (ACK quorum failure, RPC preflight, missing attestation, etc.), so the fake-identity contract revert can silently stop being covered. Assert on the expected revert shape/message here to keep the regression targeted.
| // reject a publish that used the stale quorum so a single | ||
| // mis-routed retry past the boundary is the worst-case symptom. | ||
| const now = Date.now(); | ||
| if (EVMChainAdapter.preflightCacheFresh(this.cachedMinRequiredSignatures, now)) { |
There was a problem hiding this comment.
🔴 Bug: caching minimumRequiredSignatures() for 1 hour can reject otherwise valid publishes after governance lowers the quorum (for example 3 -> 2). The contract only protects the stale-low case; in the stale-high case the publisher bails out locally before submit. This value needs a much shorter TTL or an explicit invalidation path on parameter changes.
| `nodeIdentityId ${this.config.nodeIdentityId} exceeds uint64 wire format`, | ||
| ); | ||
| } | ||
| const encryptedSubscriptionSource = this.config.getSubscriptionSourceForCg?.( |
There was a problem hiding this comment.
🟡 Issue: the new getSubscriptionSourceForCg contract says resolvers can inspect the gossip-topic/wire-form candidate, but both ACK paths only pass cgId and swmGraphId. On direct publishes where swmGraphId is omitted because it equals the numeric contextGraphId, the resolver never sees a key that matches member/host subscriptions, so subscriptionSource falls back to undefined for a normal ACK. Pass the third candidate here (and in the non-encrypted path) or narrow the API/docs.
| // without a source field are pre-PR5; we leave | ||
| // `swmHostModeAdvertised` undefined rather than asserting | ||
| // either way. | ||
| const advertised = ack.subscriptionSource !== undefined ? true : undefined; |
There was a problem hiding this comment.
🟡 Issue: protobufjs decodes an unset string field as "", not undefined (you already rely on that for declineCode). With this check, legacy/pre-PR5 ACKs get reported as swmHostModeAdvertised: true even though they did not send a provenance source. Gate this on a validated non-empty enum value (isSubscriptionSource) before setting the telemetry flag.
| contextGraphId: string, | ||
| source: SubscriptionSource = SUBSCRIPTION_SOURCES.RECONCILER, | ||
| ): void { | ||
| if (this.swmHostModeHandlers.has(contextGraphId)) { |
There was a problem hiding this comment.
🔴 Bug: This idempotency check keys host-mode subscriptions by the raw contextGraphId, but the same CG can reach this method as a wire hash (chain-event/beacon) or as a cleartext id (reconciler/manual). After a hash-keyed subscribe, a later cleartext pass misses this guard, registers a second handler on the same gossip topic, and ingests each envelope twice under two IDs. Canonicalize the key (for example the wire id) or dedupe across equivalent ids before wiring.
| } | ||
|
|
||
| async getMinimumRequiredSignatures(): Promise<number> { | ||
| // PR3 / RC11: TTL-cached. Governance vote that changes |
There was a problem hiding this comment.
🔴 Bug: Caching minimumRequiredSignatures for a full hour creates a correctness window after any governance change. Once the on-chain quorum changes, the publisher will keep using the stale threshold until TTL expiry, so it can either wait for ACKs that are no longer needed or submit with too few ACKs and get reverted for up to an hour. This value needs a much shorter invalidation path (or no long-lived cache), even if chainId and KAV10 stay cached.
| curl -X POST \ | ||
| -H "Authorization: Bearer $AUTH" \ | ||
| -d '{ "contextGraphId": "<id>" }' \ | ||
| http://127.0.0.1:9201/api/shared-memory/host-mode/unsubscribe |
There was a problem hiding this comment.
🔴 Bug: This runbook tells operators to call /api/shared-memory/host-mode/unsubscribe, but this PR does not add that route and there is no matching handler in packages/cli/src/daemon/routes/memory.ts. Following these steps will 404 while the doc claims unsubscribe is supported and idempotent. Either implement the endpoint or remove/update the undo section.
…/github.com/OriginTrail/dkg into feat/rc11-B-typed-errors-runbook-provenance
…/github.com/OriginTrail/dkg into feat/rc11-B-typed-errors-runbook-provenance
… MCP tool PR #672 Codex review pass — two trust/correctness fixes flagged on PR-B (`feat/rc11-B-typed-errors-runbook-provenance`). ### 1. `swmHostModeSubscribed` key canonicalisation (id=3302086589) The four LU-6 Phase B discovery paths deliver the same CG to host-mode wiring in different shapes: chain-event/beacon carry the curator- committed wire hash, while reconciler/manual carry the cleartext local id (or whatever string the operator POSTed). The previous code keyed `swmHostModeSubscribed` / `swmHostModeHandlers` by the raw caller- supplied id, so a chain-event-driven hash subscribe followed by a manual cleartext subscribe for the SAME CG would miss `has()` and wire a second handler on the same gossip topic — doubling host-mode ingest, duplicating persistence, and producing ambiguous ACK provenance. Fix: introduce `DKGAgent.canonicalSwmHostModeKey()` (thin alias around the existing `gossipWireIdFor` helper) and route every host-mode bookkeeping insert/lookup/delete through it so the maps are always keyed by the curator-committed wire hash. Touch sites updated: * `wireSwmHostModeHandler` — canonicalise first, key both maps off `wireCgId` so re-entry under a different shape short-circuits the handler set. * `unwireSwmHostModeHandler` — canonical-key lookup on the unwire path so a cleartext subscribe + hash unwire (or vice versa) releases the wired handler. * `reconcileSharedMemoryGossipSubscription` — delete-by-canonical when member authorisation is lost so the immediate host-mode reconcile call sees the entry cleared. * `reconcileSwmHostModeSubscription` + `enableSwmHostModeFor` — the idempotent re-entry `has()` checks go through the canonical key. * `getSwmSubscriptionSource` — canonicalise each candidate before the host-mode lookup; the `seen` Set now dedups raw + canonical forms. * `lifecycle.ts` `getSubscriptionSourceForCg` callback — docs updated to reflect that `getSwmSubscriptionSource` now does the canonicalisation internally on the host-mode path. Regression coverage: new `packages/agent/test/swm/host-mode-key-canonicalization.test.ts` (added to the unit config) exercises cleartext-then-hash and hash-then-cleartext subscribe orderings plus mixed-shape unwire, asserting both bookkeeping maps end with exactly one canonical-keyed entry per CG and zero residue after unwire. ### 2. Remove `dkg_request_hosting` MCP tool (id=3302086584) The PR-4 honest-ACK cleanup added an MCP tool that let any connected agent flip its core into a host for any curated CG by name. That changes a trust-boundary decision the curator owns — the operator-only `POST /api/shared-memory/host-mode/subscribe` route is for direct `curl` / `dkg` CLI use, not for autonomous agent invocation. Fix: removed the `server.registerTool('dkg_request_hosting', …)` call from `packages/mcp-dkg/src/tools/setup.ts` and added it to the `DROPPED_TOOLS` allowlist in `drop-sweep.test.ts` so a future revival trips the regression guard. Locked tool count goes 26 → 25. `DkgClient.requestHostMode()` stays for operator scripts that wrap the daemon API for their own internal tooling — only the MCP- discoverable tool surface drops. `docs/runbooks/RUNBOOK_HOST_MODE_MANUAL_SUBSCRIBE.md` gains an operator-only callout up top and a corrected `subscribedCgIds` example showing the canonical wire-hash form returned post-fix #1. ### Tests `pnpm --filter @origintrail-official/dkg-mcp test`: 218 passed. `pnpm --filter @origintrail-official/dkg-agent test:unit`: 115 passed (includes the 4 new canonicalisation tests + existing 25 host-mode- store, 16 host-catchup-sign, 13 host-catchup-wire suites). Co-authored-by: Cursor <cursoragent@cursor.com>
| * second adapter pointed at a different chain has its own cache | ||
| * with no cross-talk. | ||
| */ | ||
| private static readonly PREFLIGHT_TTL_MS = 60 * 60 * 1000; |
There was a problem hiding this comment.
🔴 Bug: a single 1h TTL for all pre-flight reads makes publishes fail after legitimate chain changes. minimumRequiredSignatures can change via governance and KnowledgeAssetsV10 can rotate via Hub; until this cache expires, the publisher keeps using stale quorum/address data, so every publish is built against the wrong pre-flight values and is rejected on-chain. Consider only caching immutable data like chainId, or add explicit invalidation for the mutable reads.
…aleatoric/dev/dkg-pr-rc11-a into feat/rc11-B-typed-errors-runbook-provenance
|
Folded into #680 (release/v10.0.0-rc.11 integration branch), now on main as commit 6c09049. All commits on this branch are ancestors of Closing as superseded. Any unaddressed Codex findings on this PR will be tracked in #676 (rc.12 follow-ups) and addressed in a follow-up PR. |
… into release/rc.11 Co-authored-by: Cursor <cursoragent@cursor.com> # Conflicts: # packages/agent/src/dkg-agent.ts
Bump root + 17 workspace packages from 10.0.0-rc.10 to 10.0.0-rc.11. Promote the CHANGELOG "Unreleased" block to the dated rc.11 section. Release contents (PR OriginTrail#680 — release/rc.11 integration branch): Core-stability hardening (rc.10 deadlock workstream): OriginTrail#655 hard shutdown timeout OriginTrail#657 async-promote queue library OriginTrail#659 auto-update install-source override OriginTrail#669 AbortSignal plumbing through DKGNode.stop() OriginTrail#670 chain provider filter log-spam silencer OriginTrail#666 dkg migrate-to-npm CLI subcommand OriginTrail#668 AutoNAT boot self-probe OriginTrail#661 core relay capability sanity check OriginTrail#662 relay metrics in /api/status OriginTrail#664 supervisor positive-liveness probe ERC-721 mint ordering: OriginTrail#681 CEI mint-last at every mint site (supersedes OriginTrail#663, which proposed _safeMint and was rejected as a public-API break for older Gnosis Safes / DAO timelocks / strategy wrappers). Keeps _mint; reorders so _mint is the last state-changing call. relock moves _burn before _mint. Async-promote queue stack: OriginTrail#660 /promote-async route wiring with worker-readiness gate OriginTrail#665 async-promote worker supervisor OriginTrail#667 async-promote queue config + e2e tests Honest ACK + tentative VM cleanup: OriginTrail#671 delete self-signed ACK fallback + tentative-VM concept OriginTrail#672 typed errors + LU-6 runbook + provenance telemetry Test infra: OriginTrail#673 rc.11 test infrastructure fixes Verification on the integration branch (release/rc.11): pnpm -r build clean pnpm --filter @origintrail-official/dkg test:unit 403/403 PASS evm-module 278/278 PASS (NFT + CG contract tests) devnet-test-rc11-promote-crash-recovery.sh GREEN devnet-test-rc11-shutdown-mid-publish.sh GREEN (549ms shutdown, 0 [shutdown-timeout] lines) devnet-test-rfc38-all.sh 10/11 PASS (lj is the pre-existing documented LU-6 cores-only gap) devnet-test.sh 343/347 PASS — 4 fails tracked in OriginTrail#676 as stale test expectations against OriginTrail#671's seal contract + V10 auto-registration. Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
PR-B of the rc.11 two-PR train. Stacks on top of #671
(
feat/rc11-A-honest-ack-tentative-vm-cleanup); change the base tomainafter #671 lands.This PR covers T3 (typed ACK errors + RPC caching), T4 (LU-6
Phase B verification + runbook + MCP tool), and T5 (ACK-provenance
telemetry).
T3 — Honest publish error reporting + RPC caching
packages/publisher/src/ack-errors.ts(NEW): typedACKProviderErrorhierarchy withRpcPreconditionError— RPC pre-flight died on chainId /KAV10 / minRequiredSignatures.
QuorumUnmetError— dialled N, got M, needed K, withPeerOutcome[]carrying{peerId, dialOk, protocolSupported, swmHostModeAdvertised, reason}so an operator can see which core declined and why.
wrapAsRpcPreconditionIfApplicablehelper for the agent's V10provider wrapper to upgrade legacy throws into the typed shape.
packages/chain/src/evm-adapter.ts: 1h-TTL cache forchainId,KAV10address, andminimumRequiredSignatures. Steady-state cost:zero RPC per publish (previously one round-trip per attempt).
invalidatePublishPreflightCacheexposed on the EVM adapter only;the mock-parity test documents this as EVM-only.
packages/cli/src/daemon/lifecycle.ts+publisher-runner.ts:chainRpcUrlflows through from config. Daemon emits a startupWARN when an operator runs in publishing role against a known
public RPC.
packages/cli/test/lifecycle-public-rpc-warn.test.ts(NEW):pins the WARN for the dzudza-style failure mode.
T4 — LU-6 Phase B verification + CHANGELOG + runbook + MCP tool
CHANGELOG.md: "Deferred (Phase A sub-task)" stub rewritten into"Added — OT-RFC-38 Phase B (cores auto-host curated CGs)"
documenting the four discovery paths actually wired in rc.10:
chain-event auto-subscribe (
dkg-agent.ts:1737-1803), discoverybeacon (
1810-1838), periodic reconciler (9347-9376), andoperator-driven enrolment (
memory.ts:960-977). Genuine remaininggaps (sharding-table-aware selection once
shards > 1; cross-coreciphertext replay) stay in Deferred.
docs/runbooks/RUNBOOK_HOST_MODE_MANUAL_SUBSCRIBE.md(NEW):operator playbook for the manual fourth-path
POST /api/shared-memory/host-mode/subscribe. When / why / how,response-shape reference, anti-patterns, undo procedure.
packages/mcp-dkg/src/{client.ts,tools/setup.ts}: newdkg_request_hostingMCP tool wrapping the manual subscribeendpoint. Operator UX sugar; refuses cleanly when local is already
a CG member (memberMode handler takes precedence) or on non-core
daemons.
packages/mcp-dkg/test/drop-sweep.test.ts: locked count 25 → 26.T5 — ACK-provenance telemetry
packages/core/src/proto/storage-ack.ts:StorageACKMsg.subscriptionSource(wire field 8, strictlyadditive); closed
SUBSCRIPTION_SOURCESenum(
chain-event | beacon | reconciler | manual | member).packages/agent/src/dkg-agent.ts:swmHostModeSubscribedconverted from
Set<string>toMap<string, SubscriptionSource>.The four Phase B paths call
reconcileSwmHostModeSubscription(cgId, source)with theirrespective sources;
wireSwmHostModeHandlerrecords it. New publicgetSwmSubscriptionSource(...candidateIds)resolver handles thecleartext / wire-hash key-shape difference per path. Bound to
StorageACKHandlerConfig.getSubscriptionSourceForCgso everysigned ACK carries provenance.
packages/publisher/src/storage-ack-handler.ts: populates newsubscriptionSourcefield on both signed-ACK paths(encrypted + plaintext).
packages/publisher/src/ack-collector.ts:CollectedACK.subscriptionSource?plumbed through;snapshotPeerOutcomespopulatesswmHostModeAdvertisedfrom theACK source so
QuorumUnmetError.peers[].swmHostModeAdvertisedismeaningful for diagnosis.
packages/publisher/src/dkg-publisher.ts: per-publishACK-provenance summary line
(
V10: Collected N core node ACKs [peer:source, ...]) emittedafter successful ACK collection. Lets an operator answer "why did
this CG get hosted by these cores?" from the daemon log alone
instead of cross-referencing chain poller + beacon + reconciler.
Test plan
pnpm -r buildclean on this branch.pnpm -r buildclean on the integration branch(
release/v10.0.0-rc.11).branch: 11/11 PASS.
pnpm -r test:uniton the integration branch: 317/317 PASS.devnet-test-rc11-promote-crash-recovery.shGREEN +devnet-test-rc11-shutdown-mid-publish.shGREEN.Notes for reviewers
QuorumUnmetErrorshape is strictly additive to thewire format (proto field 8). Pre-rc.11 cores send no
subscriptionSource; their ACKs appear in the publisher'sper-publish provenance line as
?(the honest "we don't know yet"shape until they upgrade).
Adding it to the MockAdapter would create false test confidence;
the mock-parity test pins this asymmetry.
main, please retarget this PR's base fromfeat/rc11-A-honest-ack-tentative-vm-cleanupback tomain.Made with Cursor