fix(rfc39): address Codex review on PR #715 (3 bugs) by branarakic · Pull Request #727 · OriginTrail/dkg

branarakic · 2026-05-27T09:04:05Z

Summary

Follow-up to PRs #715 (LU-11 substrate, merged) and #717 (RFC-39 curated prover + auto-backfill, merged). Addresses all three Codex review bugs filed on #715:

ack-collector.ts:212-217 — V2 transport claim was a lie. Code unconditionally picks PROTOCOL_STORAGE_ACK_V2 for chunked publishes; the doc claimed a V1 fallback that doesn't exist. Replaced with an honest cluster-wide-V2 requirement + a TODO(rc.12.1) marker for a real per-peer capability probe + downgrade. (A rc.11 core can't decode LU-11 chunked gossip envelopes anyway, so a V1 fallback wouldn't help the publisher reach quorum; documented that explicitly.)
ciphertext-chunk-store.ts:73 — (batchId, chunkIndex) collision risk. Two CGs publishing identical V10 KCs share a batchId (it's a plaintext-derived merkleRoot); the wildcard GRAPH ?g lookup we'd been using would return whichever CG's ciphertext happened to be persisted under the same subject URI, corrupting either the V2 ACK verify or the curated random-sampling proof. Fix: canonicalize the CG id used in the per-CG named graph (gossipWireIdFor → curator nameHash) at every persist AND lookup site, then pin SPARQL GRAPH clauses to that specific per-CG graph. Subject URI shape is unchanged; the per-CG named graph now provides the discriminator the subject URI alone cannot. Threaded through 5 sites:
- persist: dkg-agent.ingestSwmCiphertextChunkEnvelope, dkg-agent.fetchCiphertextChunkFromPeer (persist branch)
- lookup: dkg-agent.handleGetCiphertextChunk, storage-ack-handler.loadChunk, random-sampling/ciphertext-chunks-extractor
The prover takes a new canonicalCgIdForChunkStore(cgId): string | null dep wired in dkg-agent via resolveLocalCgIdByOnChainId + gossipWireIdFor. Returns null → extractor falls back to wildcard scanning, preserving pre-fix behaviour for the catching-up case (strictly additive).
dkg-agent.fetchCiphertextChunkFromPeer — dead signWithChainAdapter:false option. The old error message told callers to pass it but the closure still unconditionally called chain.signMessage, so anyone acting on the suggestion would crash. No production caller has ever set the flag — dropped the option and the lie.

Validation

Re-ran scripts/devnet-test-rfc39-comprehensive.sh on a fresh 6-node devnet — all four scenarios PASS in ~155s:

	Scenario	KC	Chunks	Proof submitter	tx
A	Public CG	1	0	node 2	0x957c28b8…
B	Curated, single-chunk	2	1	node 3	0xffcc76a8…
C	Curated, multi-chunk	3	4	node 1	0xb739b5f1…
D	Late-join auto-backfill (LU-11 sync verb)	4	4	node 4 (was offline)	0xf8385ec6…

Scenario D specifically exercises both the new canonical-CG-id scoping AND the V2 protocol path — node 4 was offline during the publish, auto-backfilled 4/4 chunks from sibling cores via PROTOCOL_GET_CIPHERTEXT_CHUNK after restart, and landed a fresh on-chain proof.

Test plan

pnpm run build:runtime:packages — clean
ReadLints across all 7 touched files — no errors
bash scripts/devnet-test-rfc39-comprehensive.sh — all 4 scenarios PASS
Reviewer can re-run the comprehensive script on their own machine; the recipe is in PR release: v10.0.0-rc.12 integration branch #716's reproducibility comment

Scope guard

No behaviour change for any path that doesn't hit a multi-CG identical-KC collision or a rc.11/rc.12 mixed-cluster window. All three fixes are strictly additive (Bug 1 is a doc + TODO change; Bug 2 degrades to legacy wildcard scanning when the canonical resolver returns null; Bug 3 drops a code path no caller ever used).

Made with Cursor

Bug 1 — `ack-collector.ts`: replace misleading "fall back to V1 against legacy peers" doc with an honest cluster-wide V2 requirement. Chunked publishes dispatch unconditionally over `PROTOCOL_STORAGE_ACK_V2` — there is no per-peer downgrade today, and a rc.11 core can't decode LU-11 chunked gossip envelopes anyway, so a V1 fallback wouldn't help. Document the operational reality + file a TODO(rc.12.1) for a real capability probe + downgrade if mixed-cluster windows become a thing. Bug 2 — `ciphertext-chunk-store.ts` / persist + lookup sites: `(batchId, chunkIndex)` is NOT globally unique. Two CGs publishing identical V10 KCs share a batchId (it's a plaintext-derived merkleRoot); the previous wildcard `GRAPH ?g` lookups would have returned the wrong CG's ciphertext under collision. Fix: canonicalize the CG id used in the per-CG named graph (`ciphertextChunkStoreGraph(canonical)`) at every persist AND every lookup site, then pin SPARQL `GRAPH` clauses to the specific per-CG graph. `gossipWireIdFor` (cleartext → curator nameHash) gives both sides a uniform key without changing the subject URI shape. Threaded through: - persist: `dkg-agent.ingestSwmCiphertextChunkEnvelope`, `dkg-agent.fetchCiphertextChunkFromPeer` (persist branch) - lookup: `dkg-agent.handleGetCiphertextChunk` (V10 sync responder), `storage-ack-handler.loadChunk` (V2 ACK verify), `random-sampling.ciphertext-chunks-extractor` (prover path) The prover now takes a `canonicalCgIdForChunkStore(cgId): string | null` dep wired through `random-sampling-bind` to `resolveLocalCgIdByOnChainId` + `gossipWireIdFor` in the agent — null returns degrade to wildcard scanning (pre-fix behaviour) for the catching-up case so the fix is strictly additive. Bug 3 — `dkg-agent.fetchCiphertextChunkFromPeer`: remove dead `signWithChainAdapter:false` option. The old error message told callers to pass it but the closure still unconditionally called `chain.signMessage`, so anyone following the suggestion would have crashed at runtime. No production caller has ever set the flag — drop the option and the lie in favour of a clean "operator key required" contract. Validated end-to-end with `devnet-test-rfc39-comprehensive.sh` on a fresh 6-node devnet: all four scenarios PASS in ~155s, including Scenario D (late-join auto-backfill via the LU-11 sync verb), which exercises both the canonical-CG-id scoping AND the V2 protocol path. Co-authored-by: Cursor <cursoragent@cursor.com>

…-round2 fix(rfc39): address Codex review on PR #727 (3 follow-up bugs)

…ugs) PR OriginTrail#727 fixed three Codex bugs from PR OriginTrail#715 but introduced three new ones the bot flagged on the next round. All three are real, all three shipped on `release/rc.12`, all three are fixed here. Validation: full 4-scenario `devnet-test-rfc39-comprehensive.sh` PASS on a fresh 6-node devnet (~165s wall clock). Bug 4 — `storage-ack-handler.ts` `loadChunk`: the previous shape assumed the normalizer could always derive a canonical wire-form CG id from `swmGraphId`, but `PublishIntent.swmGraphId` is optional on the V2 ACK wire (handler falls back to numeric `cgId` when absent). `gossipWireIdFor('42')` then keccak'd the decimal string instead of resolving the curator nameHash → ACK lookup missed every persisted chunk → declined a valid publish. Our publisher always sets `swmGraphId`, but the wire-schema contract permits omission, so this matters for protocol robustness against third-party publishers. Bug 5 — `dkg-agent.ts` `handleGetCiphertextChunk`: same keccak-decimal-string trap on the LU-11 sync verb responder. A requester legitimately addressing the CG by its numeric on-chain id would get `chunk not found` even when the chunk was present under the real name-hash graph. Narrowed the request contract in a way the public API didn't advertise. Bug 6 — `dkg-agent.ts` `fetchCiphertextChunkFromPeer`: removing `signWithChainAdapter` from the public method signature was a breaking TypeScript API change. Restore as a deprecated no-op documented to be removed in a future intentional major-version break. Fix design: - Added `canonicalChunkStoreCgIdOrNull(rawId: string): string | null` helper in `dkg-agent.ts` that routes through (in order): 1. 0x-prefixed 64-hex → already wire form 2. Tracked in `subscribedContextGraphs` → `gossipWireIdFor` 3. Pure decimal → `resolveLocalCgIdByOnChainId` then wire form, OR null if not locally registered 4. Other cleartext → `gossipWireIdFor` (keccak of cleartext bytes) Rule 3 NEVER falls through to `keccak(decimal-string)` — that's the exact trap. - Changed `StorageACKHandlerConfig.normalizeContextGraphIdForChunkStore` signature from `(string) => string` to `(string) => string | null`. Handler now degrades to wildcard `GRAPH ?g` when normalizer returns null, preserving the pre-fix legacy contract for cases where we can't safely canonicalize. - `handleGetCiphertextChunk` and both persist sites in `dkg-agent.ts` now route through `canonicalChunkStoreCgIdOrNull`, with consistent null-fallback semantics: • Lookups → wildcard `GRAPH ?g` scan • Persists → raw `storageCgId` as the graph key (preserves the original cleartext namespace; lookup-side wildcard will still find it) - `fetchCiphertextChunkFromPeer` `options` keeps `signWithChainAdapter` as a documented no-op, doc-blocked with `@deprecated`. TS callers that still pass it compile cleanly; runtime always uses the chain signer. Multi-CG identical-KC isolation (the PR OriginTrail#715 / OriginTrail#727 goal) holds for every path where canonicalization succeeds; the wildcard fallback only kicks in for the legitimate "unknown / catching-up / unresolvable id" case the Codex bot called out as a missing contract. Co-authored-by: Cursor <cursoragent@cursor.com>

…iginTrail#722 (6 follow-ups) Closes the remaining open Codex findings on the rc.12 integration branch (OriginTrail#716) for the operational scripts and importer-skill docs landed by OriginTrail#721 and OriginTrail#722. None of these are production-code regressions — they are testnet stress-harness bugs and one documentation example — but they each made the tooling silently lossy or non-idempotent in ways that would bite the next operator to run them. All five concerns the bot flagged on OriginTrail#722 plus one on OriginTrail#721 are addressed in this single follow-up so the source PRs can be cleanly marked resolved before OriginTrail#716 merges to main. Bug 1 — `scripts/testnet-publish-stress/preflight.mjs` — wrong API path The daemon-side list endpoint is `/api/context-graph/list` (confirmed in `packages/cli/src/api-client.ts:1242` and the daemon-http behavior tests). The preflight helper was probing `/api/context-graph` instead, which returns nothing, so the "is the CG already present?" check always failed through to the create() path and exited on the duplicate-id 409 instead of being idempotent as the script's docstring promises. One-line path fix, plus the same correction to the error-message string so it reflects what the script actually probed. Bug 2 — `scripts/testnet-publish-stress/rs-scan.mjs` — default checkpoint filename mismatch `publish-loop.mjs` (the producer) derives its checkpoint filename from `STRESS_RUN_ID` (default `26may`). `rs-scan.mjs` (the consumer) was hard-coded to `${homedir()}/.dkg-publish-stress/checkpoints/26may2.json`, so out-of-the-box a fresh pair would load no KC ids and rs-scan would report "none of ours sampled" even when sampling was working. Switched to read `STRESS_RUN_ID` from the same env var the producer uses. Bug 3 — `scripts/testnet-publish-stress/fetch-wikidata-music.mjs` — SPARQL pagination correctness Five paginated CONSTRUCT subqueries (humans, musical-groups, albums, songs, music-genres) were using `LIMIT … OFFSET …` without an `ORDER BY` clause. SPARQL result order is not guaranteed across pages, so two requests with the same offset can return different (or overlapping) subjects on different runs. Added `ORDER BY ?s` to every paginated subquery so the offset windowing is stable. Bug 4 — `scripts/testnet-publish-stress/fetch-wikidata-music.mjs` — resume only tracked the partition counter The script's resume support counted output-file lines (`alreadyWritten`) and used that as the next `partitionIdx`, but `classCursor` and per-class `offsets` always reset to 0 on a fresh run. After an interruption the script would re-query the same SPARQL pages and append duplicate early data under new partition ids, instead of continuing where it left off. Persists `{ offsets, classCursor }` to a sidecar `<OUT_PATH>.state.json` after every fetch and reloads it at startup. The in-flight buffer is intentionally NOT persisted (cheap to rebuild on the next fetch). Bug 5 — `scripts/testnet-publish-stress/publish-loop.mjs` — checkpoint frequency vs. determinism Each partition's anchor URI is deterministic (`stress:partition/<idx>`). The previous code only persisted `lastPublishedIdx` every `checkpointEvery=50` partitions (alongside the expensive wallet-snapshot update). A crash between snapshots replayed already-published partitions on resume, which deterministically hit Rule 4 / root-entity conflicts on the second attempt. Split the cheap partition-bookkeeping save from the expensive wallet-snapshot save: `saveCheckpoint(checkpoint)` now runs after EVERY publish (with `lastPublishedIdx` already updated), and the wallet snapshot stays on the original cadence. Bug 6 — `packages/cli/skills/dkg-importer/SKILL.md` — IRI scheme under-detection in the blank-node rewrite example The reference example only treated subjects starting with `http` or `urn:` as IRIs, which silently misses valid RDF IRIs that use other schemes — `did:`, `ipfs:`, `tag:`, `file:`, plain-IRI imports etc. An importer that copied the example verbatim would leave colliding root entities un-rewritten and keep hitting Rule 4 on subsequent partitions. Replaced the scheme-list check with an RFC 3986 absolute-IRI regex and documented that callers whose parser exposes `term.termType` should prefer that. Companion to: 60e16e2 Merge pull request OriginTrail#727 from OriginTrail/fix/rfc39-codex-review-fixes 5ca94aa Merge pull request OriginTrail#729 from OriginTrail/fix/rfc39-codex-review-fixes-round2 fb3a528 fix(rfc39): address Codex review on PR OriginTrail#727 (3 follow-up bugs) This batch closes the rest of the open Codex findings (scripts + docs) that don't fit cleanly under either of the rfc39-codex-review-fixes rounds, finishing the rc.12 integration's review-resolution sweep. Validation: scripts are runtime-only (no build artefacts). Linted clean. Behavior changes spot-checked against the rc.12 daemon code (`api-client.ts:1242` confirms `/api/context-graph/list`). Made with [Cursor](https://cursor.com) Co-authored-by: Cursor <cursoragent@cursor.com>

PR OriginTrail#716 audit cluster B.3 — `fetchCiphertextChunkFromPeer` (initiator) and `ingestSwmCiphertextChunkEnvelope` (gossip ingester) are the two halves of the late-joining-host backfill contract that PR OriginTrail#715 / OriginTrail#717 / OriginTrail#727 / OriginTrail#729 shipped. The audit flagged both as having ZERO direct adapter-level coverage despite being wired into the random-sampling prover backfill (hot path) and the workspace chunked-publish subscription (security path). New file `packages/agent/test/lu11-chunk-catchup-wiring.test.ts` adds 12 tests across both functions: `fetchCiphertextChunkFromPeer` (7 tests): - Happy path: mints a signed request via chain.signMessage, sends to `/dkg/10.0.2/get-ciphertext-chunk`, decodes the response, persists ciphertext under the CANONICAL CG graph. Asserts the on-wire request decodes (sign step actually ran) and the persist site matches the responder lookup site (write/read address consistency — OriginTrail#729 Bug 5 regression). - persist=false: response returns bytes but local store stays untouched. - Responder denied: ACK is RETURNED (not thrown) so the backfill loop's `if (resp.denied) continue` can fall through. - Transport failure (delivered=false): throws with the messenger error verbatim — backfill loop records as failure. - Missing `chain.signMessage`: throws an honest precondition error, never sends an unsigned request. - Non-32-byte batchId / negative chunkIndex: precise API-boundary errors catch caller bugs. `ingestSwmCiphertextChunkEnvelope` (5 tests): - Happy path: chunked envelope persists ciphertext under the canonical CG graph (same shape the initiator + responder use). - Truncated payload (<= 32 bytes — no room for ciphertext after batchId): drops silently. - WireId mismatch (envelope.contextGraphId vs subscription cgId in wire form differ): drops silently — defense against a peer-sending-chunk-for-different-CG attack. - LU-6 authority declines: drops silently, nothing persists. Without this, any topic-reachable peer could plant arbitrary ciphertext under a victim's `(cgId, batchId)` keys. - Wrong envelope type (V1 `share-write` instead of `share-write-chunked`): drops silently — chunked ingester MUST NOT pick up legacy V1 (would corrupt the chunk store with meaningless chunkIndex=0 entries). Both functions are correct today (verified by inspection); these pins ensure a refactor can't silently break either side of the contract. Co-authored-by: Cursor <cursoragent@cursor.com>

branarakic merged commit 60e16e2 into release/rc.12 May 27, 2026
3 checks passed

branarakic mentioned this pull request May 27, 2026

release: v10.0.0-rc.12 integration branch #716

Merged

12 tasks

github-actions Bot reviewed May 27, 2026

View reviewed changes

Comment thread packages/publisher/src/storage-ack-handler.ts

Comment thread packages/agent/src/dkg-agent.ts

Comment thread packages/agent/src/dkg-agent.ts

branarakic mentioned this pull request May 27, 2026

fix(rfc39): address Codex review on PR #727 (3 follow-up bugs) #729

Merged

3 tasks

branarakic added a commit that referenced this pull request May 27, 2026

Merge pull request #729 from OriginTrail/fix/rfc39-codex-review-fixes…

5ca94aa

…-round2 fix(rfc39): address Codex review on PR #727 (3 follow-up bugs)

This was referenced May 27, 2026

fix(scripts/docs): address Codex review on PR #721 + PR #722 (6 follow-ups) #733

Merged

test(agent): pin LU-11 chunk-catchup initiator + gossip ingester wiring #742

Merged

test(agent): pin LU-11 random-sampling prover backfill orchestration #743

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(rfc39): address Codex review on PR #715 (3 bugs)#727

fix(rfc39): address Codex review on PR #715 (3 bugs)#727
branarakic merged 1 commit into
release/rc.12from
fix/rfc39-codex-review-fixes

branarakic commented May 27, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

branarakic commented May 27, 2026

Summary

Validation

Test plan

Scope guard

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant