fix(rfc39): address Codex review on PR #715 (3 bugs)#727
Merged
Conversation
Bug 1 — `ack-collector.ts`: replace misleading "fall back to V1 against
legacy peers" doc with an honest cluster-wide V2 requirement. Chunked
publishes dispatch unconditionally over `PROTOCOL_STORAGE_ACK_V2` —
there is no per-peer downgrade today, and a rc.11 core can't decode
LU-11 chunked gossip envelopes anyway, so a V1 fallback wouldn't help.
Document the operational reality + file a TODO(rc.12.1) for a real
capability probe + downgrade if mixed-cluster windows become a thing.
Bug 2 — `ciphertext-chunk-store.ts` / persist + lookup sites:
`(batchId, chunkIndex)` is NOT globally unique. Two CGs publishing
identical V10 KCs share a batchId (it's a plaintext-derived merkleRoot);
the previous wildcard `GRAPH ?g` lookups would have returned the wrong
CG's ciphertext under collision. Fix: canonicalize the CG id used in
the per-CG named graph (`ciphertextChunkStoreGraph(canonical)`) at every
persist AND every lookup site, then pin SPARQL `GRAPH` clauses to the
specific per-CG graph. `gossipWireIdFor` (cleartext → curator nameHash)
gives both sides a uniform key without changing the subject URI shape.
Threaded through:
- persist: `dkg-agent.ingestSwmCiphertextChunkEnvelope`,
`dkg-agent.fetchCiphertextChunkFromPeer` (persist branch)
- lookup: `dkg-agent.handleGetCiphertextChunk` (V10 sync responder),
`storage-ack-handler.loadChunk` (V2 ACK verify),
`random-sampling.ciphertext-chunks-extractor` (prover path)
The prover now takes a `canonicalCgIdForChunkStore(cgId): string | null`
dep wired through `random-sampling-bind` to
`resolveLocalCgIdByOnChainId` + `gossipWireIdFor` in the agent — null
returns degrade to wildcard scanning (pre-fix behaviour) for the
catching-up case so the fix is strictly additive.
Bug 3 — `dkg-agent.fetchCiphertextChunkFromPeer`: remove dead
`signWithChainAdapter:false` option. The old error message told callers
to pass it but the closure still unconditionally called `chain.signMessage`,
so anyone following the suggestion would have crashed at runtime. No
production caller has ever set the flag — drop the option and the lie
in favour of a clean "operator key required" contract.
Validated end-to-end with `devnet-test-rfc39-comprehensive.sh` on a
fresh 6-node devnet: all four scenarios PASS in ~155s, including
Scenario D (late-join auto-backfill via the LU-11 sync verb), which
exercises both the canonical-CG-id scoping AND the V2 protocol path.
Co-authored-by: Cursor <cursoragent@cursor.com>
12 tasks
3 tasks
branarakic
added a commit
that referenced
this pull request
May 27, 2026
…-round2 fix(rfc39): address Codex review on PR #727 (3 follow-up bugs)
matic031
pushed a commit
to KilianTrunk/dkg
that referenced
this pull request
Jun 2, 2026
…ugs) PR OriginTrail#727 fixed three Codex bugs from PR OriginTrail#715 but introduced three new ones the bot flagged on the next round. All three are real, all three shipped on `release/rc.12`, all three are fixed here. Validation: full 4-scenario `devnet-test-rfc39-comprehensive.sh` PASS on a fresh 6-node devnet (~165s wall clock). Bug 4 — `storage-ack-handler.ts` `loadChunk`: the previous shape assumed the normalizer could always derive a canonical wire-form CG id from `swmGraphId`, but `PublishIntent.swmGraphId` is optional on the V2 ACK wire (handler falls back to numeric `cgId` when absent). `gossipWireIdFor('42')` then keccak'd the decimal string instead of resolving the curator nameHash → ACK lookup missed every persisted chunk → declined a valid publish. Our publisher always sets `swmGraphId`, but the wire-schema contract permits omission, so this matters for protocol robustness against third-party publishers. Bug 5 — `dkg-agent.ts` `handleGetCiphertextChunk`: same keccak-decimal-string trap on the LU-11 sync verb responder. A requester legitimately addressing the CG by its numeric on-chain id would get `chunk not found` even when the chunk was present under the real name-hash graph. Narrowed the request contract in a way the public API didn't advertise. Bug 6 — `dkg-agent.ts` `fetchCiphertextChunkFromPeer`: removing `signWithChainAdapter` from the public method signature was a breaking TypeScript API change. Restore as a deprecated no-op documented to be removed in a future intentional major-version break. Fix design: - Added `canonicalChunkStoreCgIdOrNull(rawId: string): string | null` helper in `dkg-agent.ts` that routes through (in order): 1. 0x-prefixed 64-hex → already wire form 2. Tracked in `subscribedContextGraphs` → `gossipWireIdFor` 3. Pure decimal → `resolveLocalCgIdByOnChainId` then wire form, OR null if not locally registered 4. Other cleartext → `gossipWireIdFor` (keccak of cleartext bytes) Rule 3 NEVER falls through to `keccak(decimal-string)` — that's the exact trap. - Changed `StorageACKHandlerConfig.normalizeContextGraphIdForChunkStore` signature from `(string) => string` to `(string) => string | null`. Handler now degrades to wildcard `GRAPH ?g` when normalizer returns null, preserving the pre-fix legacy contract for cases where we can't safely canonicalize. - `handleGetCiphertextChunk` and both persist sites in `dkg-agent.ts` now route through `canonicalChunkStoreCgIdOrNull`, with consistent null-fallback semantics: • Lookups → wildcard `GRAPH ?g` scan • Persists → raw `storageCgId` as the graph key (preserves the original cleartext namespace; lookup-side wildcard will still find it) - `fetchCiphertextChunkFromPeer` `options` keeps `signWithChainAdapter` as a documented no-op, doc-blocked with `@deprecated`. TS callers that still pass it compile cleanly; runtime always uses the chain signer. Multi-CG identical-KC isolation (the PR OriginTrail#715 / OriginTrail#727 goal) holds for every path where canonicalization succeeds; the wildcard fallback only kicks in for the legitimate "unknown / catching-up / unresolvable id" case the Codex bot called out as a missing contract. Co-authored-by: Cursor <cursoragent@cursor.com>
matic031
pushed a commit
to KilianTrunk/dkg
that referenced
this pull request
Jun 2, 2026
…iginTrail#722 (6 follow-ups) Closes the remaining open Codex findings on the rc.12 integration branch (OriginTrail#716) for the operational scripts and importer-skill docs landed by OriginTrail#721 and OriginTrail#722. None of these are production-code regressions — they are testnet stress-harness bugs and one documentation example — but they each made the tooling silently lossy or non-idempotent in ways that would bite the next operator to run them. All five concerns the bot flagged on OriginTrail#722 plus one on OriginTrail#721 are addressed in this single follow-up so the source PRs can be cleanly marked resolved before OriginTrail#716 merges to main. Bug 1 — `scripts/testnet-publish-stress/preflight.mjs` — wrong API path The daemon-side list endpoint is `/api/context-graph/list` (confirmed in `packages/cli/src/api-client.ts:1242` and the daemon-http behavior tests). The preflight helper was probing `/api/context-graph` instead, which returns nothing, so the "is the CG already present?" check always failed through to the create() path and exited on the duplicate-id 409 instead of being idempotent as the script's docstring promises. One-line path fix, plus the same correction to the error-message string so it reflects what the script actually probed. Bug 2 — `scripts/testnet-publish-stress/rs-scan.mjs` — default checkpoint filename mismatch `publish-loop.mjs` (the producer) derives its checkpoint filename from `STRESS_RUN_ID` (default `26may`). `rs-scan.mjs` (the consumer) was hard-coded to `${homedir()}/.dkg-publish-stress/checkpoints/26may2.json`, so out-of-the-box a fresh pair would load no KC ids and rs-scan would report "none of ours sampled" even when sampling was working. Switched to read `STRESS_RUN_ID` from the same env var the producer uses. Bug 3 — `scripts/testnet-publish-stress/fetch-wikidata-music.mjs` — SPARQL pagination correctness Five paginated CONSTRUCT subqueries (humans, musical-groups, albums, songs, music-genres) were using `LIMIT … OFFSET …` without an `ORDER BY` clause. SPARQL result order is not guaranteed across pages, so two requests with the same offset can return different (or overlapping) subjects on different runs. Added `ORDER BY ?s` to every paginated subquery so the offset windowing is stable. Bug 4 — `scripts/testnet-publish-stress/fetch-wikidata-music.mjs` — resume only tracked the partition counter The script's resume support counted output-file lines (`alreadyWritten`) and used that as the next `partitionIdx`, but `classCursor` and per-class `offsets` always reset to 0 on a fresh run. After an interruption the script would re-query the same SPARQL pages and append duplicate early data under new partition ids, instead of continuing where it left off. Persists `{ offsets, classCursor }` to a sidecar `<OUT_PATH>.state.json` after every fetch and reloads it at startup. The in-flight buffer is intentionally NOT persisted (cheap to rebuild on the next fetch). Bug 5 — `scripts/testnet-publish-stress/publish-loop.mjs` — checkpoint frequency vs. determinism Each partition's anchor URI is deterministic (`stress:partition/<idx>`). The previous code only persisted `lastPublishedIdx` every `checkpointEvery=50` partitions (alongside the expensive wallet-snapshot update). A crash between snapshots replayed already-published partitions on resume, which deterministically hit Rule 4 / root-entity conflicts on the second attempt. Split the cheap partition-bookkeeping save from the expensive wallet-snapshot save: `saveCheckpoint(checkpoint)` now runs after EVERY publish (with `lastPublishedIdx` already updated), and the wallet snapshot stays on the original cadence. Bug 6 — `packages/cli/skills/dkg-importer/SKILL.md` — IRI scheme under-detection in the blank-node rewrite example The reference example only treated subjects starting with `http` or `urn:` as IRIs, which silently misses valid RDF IRIs that use other schemes — `did:`, `ipfs:`, `tag:`, `file:`, plain-IRI imports etc. An importer that copied the example verbatim would leave colliding root entities un-rewritten and keep hitting Rule 4 on subsequent partitions. Replaced the scheme-list check with an RFC 3986 absolute-IRI regex and documented that callers whose parser exposes `term.termType` should prefer that. Companion to: 60e16e2 Merge pull request OriginTrail#727 from OriginTrail/fix/rfc39-codex-review-fixes 5ca94aa Merge pull request OriginTrail#729 from OriginTrail/fix/rfc39-codex-review-fixes-round2 fb3a528 fix(rfc39): address Codex review on PR OriginTrail#727 (3 follow-up bugs) This batch closes the rest of the open Codex findings (scripts + docs) that don't fit cleanly under either of the rfc39-codex-review-fixes rounds, finishing the rc.12 integration's review-resolution sweep. Validation: scripts are runtime-only (no build artefacts). Linted clean. Behavior changes spot-checked against the rc.12 daemon code (`api-client.ts:1242` confirms `/api/context-graph/list`). Made with [Cursor](https://cursor.com) Co-authored-by: Cursor <cursoragent@cursor.com>
matic031
pushed a commit
to KilianTrunk/dkg
that referenced
this pull request
Jun 2, 2026
PR OriginTrail#716 audit cluster B.3 — `fetchCiphertextChunkFromPeer` (initiator) and `ingestSwmCiphertextChunkEnvelope` (gossip ingester) are the two halves of the late-joining-host backfill contract that PR OriginTrail#715 / OriginTrail#717 / OriginTrail#727 / OriginTrail#729 shipped. The audit flagged both as having ZERO direct adapter-level coverage despite being wired into the random-sampling prover backfill (hot path) and the workspace chunked-publish subscription (security path). New file `packages/agent/test/lu11-chunk-catchup-wiring.test.ts` adds 12 tests across both functions: `fetchCiphertextChunkFromPeer` (7 tests): - Happy path: mints a signed request via chain.signMessage, sends to `/dkg/10.0.2/get-ciphertext-chunk`, decodes the response, persists ciphertext under the CANONICAL CG graph. Asserts the on-wire request decodes (sign step actually ran) and the persist site matches the responder lookup site (write/read address consistency — OriginTrail#729 Bug 5 regression). - persist=false: response returns bytes but local store stays untouched. - Responder denied: ACK is RETURNED (not thrown) so the backfill loop's `if (resp.denied) continue` can fall through. - Transport failure (delivered=false): throws with the messenger error verbatim — backfill loop records as failure. - Missing `chain.signMessage`: throws an honest precondition error, never sends an unsigned request. - Non-32-byte batchId / negative chunkIndex: precise API-boundary errors catch caller bugs. `ingestSwmCiphertextChunkEnvelope` (5 tests): - Happy path: chunked envelope persists ciphertext under the canonical CG graph (same shape the initiator + responder use). - Truncated payload (<= 32 bytes — no room for ciphertext after batchId): drops silently. - WireId mismatch (envelope.contextGraphId vs subscription cgId in wire form differ): drops silently — defense against a peer-sending-chunk-for-different-CG attack. - LU-6 authority declines: drops silently, nothing persists. Without this, any topic-reachable peer could plant arbitrary ciphertext under a victim's `(cgId, batchId)` keys. - Wrong envelope type (V1 `share-write` instead of `share-write-chunked`): drops silently — chunked ingester MUST NOT pick up legacy V1 (would corrupt the chunk store with meaningless chunkIndex=0 entries). Both functions are correct today (verified by inspection); these pins ensure a refactor can't silently break either side of the contract. Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow-up to PRs #715 (LU-11 substrate, merged) and #717 (RFC-39 curated prover + auto-backfill, merged). Addresses all three Codex review bugs filed on #715:
ack-collector.ts:212-217— V2 transport claim was a lie. Code unconditionally picksPROTOCOL_STORAGE_ACK_V2for chunked publishes; the doc claimed a V1 fallback that doesn't exist. Replaced with an honest cluster-wide-V2 requirement + a TODO(rc.12.1) marker for a real per-peer capability probe + downgrade. (A rc.11 core can't decode LU-11 chunked gossip envelopes anyway, so a V1 fallback wouldn't help the publisher reach quorum; documented that explicitly.)ciphertext-chunk-store.ts:73—(batchId, chunkIndex)collision risk. Two CGs publishing identical V10 KCs share abatchId(it's a plaintext-derived merkleRoot); the wildcardGRAPH ?glookup we'd been using would return whichever CG's ciphertext happened to be persisted under the same subject URI, corrupting either the V2 ACK verify or the curated random-sampling proof. Fix: canonicalize the CG id used in the per-CG named graph (gossipWireIdFor→ curator nameHash) at every persist AND lookup site, then pin SPARQLGRAPHclauses to that specific per-CG graph. Subject URI shape is unchanged; the per-CG named graph now provides the discriminator the subject URI alone cannot. Threaded through 5 sites:dkg-agent.ingestSwmCiphertextChunkEnvelope,dkg-agent.fetchCiphertextChunkFromPeer(persist branch)dkg-agent.handleGetCiphertextChunk,storage-ack-handler.loadChunk,random-sampling/ciphertext-chunks-extractorThe prover takes a new
canonicalCgIdForChunkStore(cgId): string | nulldep wired indkg-agentviaresolveLocalCgIdByOnChainId+gossipWireIdFor. Returns null → extractor falls back to wildcard scanning, preserving pre-fix behaviour for the catching-up case (strictly additive).dkg-agent.fetchCiphertextChunkFromPeer— deadsignWithChainAdapter:falseoption. The old error message told callers to pass it but the closure still unconditionally calledchain.signMessage, so anyone acting on the suggestion would crash. No production caller has ever set the flag — dropped the option and the lie.Validation
Re-ran
scripts/devnet-test-rfc39-comprehensive.shon a fresh 6-node devnet — all four scenarios PASS in ~155s:Scenario D specifically exercises both the new canonical-CG-id scoping AND the V2 protocol path — node 4 was offline during the publish, auto-backfilled 4/4 chunks from sibling cores via
PROTOCOL_GET_CIPHERTEXT_CHUNKafter restart, and landed a fresh on-chain proof.Test plan
pnpm run build:runtime:packages— cleanbash scripts/devnet-test-rfc39-comprehensive.sh— all 4 scenarios PASSScope guard
No behaviour change for any path that doesn't hit a multi-CG identical-KC collision or a rc.11/rc.12 mixed-cluster window. All three fixes are strictly additive (Bug 1 is a doc + TODO change; Bug 2 degrades to legacy wildcard scanning when the canonical resolver returns null; Bug 3 drops a code path no caller ever used).
Made with Cursor