Skip to content

fix(rfc39): address Codex review on PR #715 (3 bugs)#727

Merged
branarakic merged 1 commit into
release/rc.12from
fix/rfc39-codex-review-fixes
May 27, 2026
Merged

fix(rfc39): address Codex review on PR #715 (3 bugs)#727
branarakic merged 1 commit into
release/rc.12from
fix/rfc39-codex-review-fixes

Conversation

@branarakic
Copy link
Copy Markdown
Contributor

Summary

Follow-up to PRs #715 (LU-11 substrate, merged) and #717 (RFC-39 curated prover + auto-backfill, merged). Addresses all three Codex review bugs filed on #715:

  1. ack-collector.ts:212-217 — V2 transport claim was a lie. Code unconditionally picks PROTOCOL_STORAGE_ACK_V2 for chunked publishes; the doc claimed a V1 fallback that doesn't exist. Replaced with an honest cluster-wide-V2 requirement + a TODO(rc.12.1) marker for a real per-peer capability probe + downgrade. (A rc.11 core can't decode LU-11 chunked gossip envelopes anyway, so a V1 fallback wouldn't help the publisher reach quorum; documented that explicitly.)

  2. ciphertext-chunk-store.ts:73(batchId, chunkIndex) collision risk. Two CGs publishing identical V10 KCs share a batchId (it's a plaintext-derived merkleRoot); the wildcard GRAPH ?g lookup we'd been using would return whichever CG's ciphertext happened to be persisted under the same subject URI, corrupting either the V2 ACK verify or the curated random-sampling proof. Fix: canonicalize the CG id used in the per-CG named graph (gossipWireIdFor → curator nameHash) at every persist AND lookup site, then pin SPARQL GRAPH clauses to that specific per-CG graph. Subject URI shape is unchanged; the per-CG named graph now provides the discriminator the subject URI alone cannot. Threaded through 5 sites:

    • persist: dkg-agent.ingestSwmCiphertextChunkEnvelope, dkg-agent.fetchCiphertextChunkFromPeer (persist branch)
    • lookup: dkg-agent.handleGetCiphertextChunk, storage-ack-handler.loadChunk, random-sampling/ciphertext-chunks-extractor

    The prover takes a new canonicalCgIdForChunkStore(cgId): string | null dep wired in dkg-agent via resolveLocalCgIdByOnChainId + gossipWireIdFor. Returns null → extractor falls back to wildcard scanning, preserving pre-fix behaviour for the catching-up case (strictly additive).

  3. dkg-agent.fetchCiphertextChunkFromPeer — dead signWithChainAdapter:false option. The old error message told callers to pass it but the closure still unconditionally called chain.signMessage, so anyone acting on the suggestion would crash. No production caller has ever set the flag — dropped the option and the lie.

Validation

Re-ran scripts/devnet-test-rfc39-comprehensive.sh on a fresh 6-node devnet — all four scenarios PASS in ~155s:

Scenario KC Chunks Proof submitter tx
A Public CG 1 0 node 2 0x957c28b8…
B Curated, single-chunk 2 1 node 3 0xffcc76a8…
C Curated, multi-chunk 3 4 node 1 0xb739b5f1…
D Late-join auto-backfill (LU-11 sync verb) 4 4 node 4 (was offline) 0xf8385ec6…

Scenario D specifically exercises both the new canonical-CG-id scoping AND the V2 protocol path — node 4 was offline during the publish, auto-backfilled 4/4 chunks from sibling cores via PROTOCOL_GET_CIPHERTEXT_CHUNK after restart, and landed a fresh on-chain proof.

Test plan

  • pnpm run build:runtime:packages — clean
  • ReadLints across all 7 touched files — no errors
  • bash scripts/devnet-test-rfc39-comprehensive.sh — all 4 scenarios PASS
  • Reviewer can re-run the comprehensive script on their own machine; the recipe is in PR release: v10.0.0-rc.12 integration branch #716's reproducibility comment

Scope guard

No behaviour change for any path that doesn't hit a multi-CG identical-KC collision or a rc.11/rc.12 mixed-cluster window. All three fixes are strictly additive (Bug 1 is a doc + TODO change; Bug 2 degrades to legacy wildcard scanning when the canonical resolver returns null; Bug 3 drops a code path no caller ever used).

Made with Cursor

Bug 1 — `ack-collector.ts`: replace misleading "fall back to V1 against
legacy peers" doc with an honest cluster-wide V2 requirement. Chunked
publishes dispatch unconditionally over `PROTOCOL_STORAGE_ACK_V2` —
there is no per-peer downgrade today, and a rc.11 core can't decode
LU-11 chunked gossip envelopes anyway, so a V1 fallback wouldn't help.
Document the operational reality + file a TODO(rc.12.1) for a real
capability probe + downgrade if mixed-cluster windows become a thing.

Bug 2 — `ciphertext-chunk-store.ts` / persist + lookup sites:
`(batchId, chunkIndex)` is NOT globally unique. Two CGs publishing
identical V10 KCs share a batchId (it's a plaintext-derived merkleRoot);
the previous wildcard `GRAPH ?g` lookups would have returned the wrong
CG's ciphertext under collision. Fix: canonicalize the CG id used in
the per-CG named graph (`ciphertextChunkStoreGraph(canonical)`) at every
persist AND every lookup site, then pin SPARQL `GRAPH` clauses to the
specific per-CG graph. `gossipWireIdFor` (cleartext → curator nameHash)
gives both sides a uniform key without changing the subject URI shape.
Threaded through:
  - persist:  `dkg-agent.ingestSwmCiphertextChunkEnvelope`,
              `dkg-agent.fetchCiphertextChunkFromPeer` (persist branch)
  - lookup:   `dkg-agent.handleGetCiphertextChunk` (V10 sync responder),
              `storage-ack-handler.loadChunk` (V2 ACK verify),
              `random-sampling.ciphertext-chunks-extractor` (prover path)
The prover now takes a `canonicalCgIdForChunkStore(cgId): string | null`
dep wired through `random-sampling-bind` to
`resolveLocalCgIdByOnChainId` + `gossipWireIdFor` in the agent — null
returns degrade to wildcard scanning (pre-fix behaviour) for the
catching-up case so the fix is strictly additive.

Bug 3 — `dkg-agent.fetchCiphertextChunkFromPeer`: remove dead
`signWithChainAdapter:false` option. The old error message told callers
to pass it but the closure still unconditionally called `chain.signMessage`,
so anyone following the suggestion would have crashed at runtime. No
production caller has ever set the flag — drop the option and the lie
in favour of a clean "operator key required" contract.

Validated end-to-end with `devnet-test-rfc39-comprehensive.sh` on a
fresh 6-node devnet: all four scenarios PASS in ~155s, including
Scenario D (late-join auto-backfill via the LU-11 sync verb), which
exercises both the canonical-CG-id scoping AND the V2 protocol path.

Co-authored-by: Cursor <cursoragent@cursor.com>
@branarakic branarakic merged commit 60e16e2 into release/rc.12 May 27, 2026
3 checks passed
Comment thread packages/publisher/src/storage-ack-handler.ts
Comment thread packages/agent/src/dkg-agent.ts
Comment thread packages/agent/src/dkg-agent.ts
branarakic added a commit that referenced this pull request May 27, 2026
…-round2

fix(rfc39): address Codex review on PR #727 (3 follow-up bugs)
matic031 pushed a commit to KilianTrunk/dkg that referenced this pull request Jun 2, 2026
…ugs)

PR OriginTrail#727 fixed three Codex bugs from PR OriginTrail#715 but introduced three new
ones the bot flagged on the next round. All three are real, all three
shipped on `release/rc.12`, all three are fixed here. Validation: full
4-scenario `devnet-test-rfc39-comprehensive.sh` PASS on a fresh 6-node
devnet (~165s wall clock).

Bug 4 — `storage-ack-handler.ts` `loadChunk`: the previous shape
assumed the normalizer could always derive a canonical wire-form CG
id from `swmGraphId`, but `PublishIntent.swmGraphId` is optional on
the V2 ACK wire (handler falls back to numeric `cgId` when absent).
`gossipWireIdFor('42')` then keccak'd the decimal string instead of
resolving the curator nameHash → ACK lookup missed every persisted
chunk → declined a valid publish. Our publisher always sets
`swmGraphId`, but the wire-schema contract permits omission, so this
matters for protocol robustness against third-party publishers.

Bug 5 — `dkg-agent.ts` `handleGetCiphertextChunk`: same
keccak-decimal-string trap on the LU-11 sync verb responder. A
requester legitimately addressing the CG by its numeric on-chain id
would get `chunk not found` even when the chunk was present under the
real name-hash graph. Narrowed the request contract in a way the
public API didn't advertise.

Bug 6 — `dkg-agent.ts` `fetchCiphertextChunkFromPeer`: removing
`signWithChainAdapter` from the public method signature was a
breaking TypeScript API change. Restore as a deprecated no-op
documented to be removed in a future intentional major-version break.

Fix design:

  - Added `canonicalChunkStoreCgIdOrNull(rawId: string): string | null`
    helper in `dkg-agent.ts` that routes through (in order):
      1. 0x-prefixed 64-hex → already wire form
      2. Tracked in `subscribedContextGraphs` → `gossipWireIdFor`
      3. Pure decimal → `resolveLocalCgIdByOnChainId` then wire form,
         OR null if not locally registered
      4. Other cleartext → `gossipWireIdFor` (keccak of cleartext bytes)
    Rule 3 NEVER falls through to `keccak(decimal-string)` — that's
    the exact trap.

  - Changed `StorageACKHandlerConfig.normalizeContextGraphIdForChunkStore`
    signature from `(string) => string` to `(string) => string | null`.
    Handler now degrades to wildcard `GRAPH ?g` when normalizer
    returns null, preserving the pre-fix legacy contract for cases
    where we can't safely canonicalize.

  - `handleGetCiphertextChunk` and both persist sites in `dkg-agent.ts`
    now route through `canonicalChunkStoreCgIdOrNull`, with consistent
    null-fallback semantics:
      • Lookups → wildcard `GRAPH ?g` scan
      • Persists → raw `storageCgId` as the graph key (preserves the
        original cleartext namespace; lookup-side wildcard will still
        find it)

  - `fetchCiphertextChunkFromPeer` `options` keeps `signWithChainAdapter`
    as a documented no-op, doc-blocked with `@deprecated`. TS callers
    that still pass it compile cleanly; runtime always uses the chain
    signer.

Multi-CG identical-KC isolation (the PR OriginTrail#715 / OriginTrail#727 goal) holds for
every path where canonicalization succeeds; the wildcard fallback only
kicks in for the legitimate "unknown / catching-up / unresolvable id"
case the Codex bot called out as a missing contract.

Co-authored-by: Cursor <cursoragent@cursor.com>
matic031 pushed a commit to KilianTrunk/dkg that referenced this pull request Jun 2, 2026
…iginTrail#722 (6 follow-ups)

Closes the remaining open Codex findings on the rc.12 integration branch
(OriginTrail#716) for the operational scripts and importer-skill docs landed by OriginTrail#721
and OriginTrail#722. None of these are production-code regressions — they are
testnet stress-harness bugs and one documentation example — but they each
made the tooling silently lossy or non-idempotent in ways that would bite
the next operator to run them. All five concerns the bot flagged on OriginTrail#722
plus one on OriginTrail#721 are addressed in this single follow-up so the source PRs
can be cleanly marked resolved before OriginTrail#716 merges to main.

Bug 1 — `scripts/testnet-publish-stress/preflight.mjs` — wrong API path
The daemon-side list endpoint is `/api/context-graph/list` (confirmed in
`packages/cli/src/api-client.ts:1242` and the daemon-http behavior tests).
The preflight helper was probing `/api/context-graph` instead, which
returns nothing, so the "is the CG already present?" check always failed
through to the create() path and exited on the duplicate-id 409 instead
of being idempotent as the script's docstring promises. One-line path
fix, plus the same correction to the error-message string so it reflects
what the script actually probed.

Bug 2 — `scripts/testnet-publish-stress/rs-scan.mjs` — default checkpoint
filename mismatch
`publish-loop.mjs` (the producer) derives its checkpoint filename from
`STRESS_RUN_ID` (default `26may`). `rs-scan.mjs` (the consumer) was
hard-coded to `${homedir()}/.dkg-publish-stress/checkpoints/26may2.json`,
so out-of-the-box a fresh pair would load no KC ids and rs-scan would
report "none of ours sampled" even when sampling was working. Switched
to read `STRESS_RUN_ID` from the same env var the producer uses.

Bug 3 — `scripts/testnet-publish-stress/fetch-wikidata-music.mjs` —
SPARQL pagination correctness
Five paginated CONSTRUCT subqueries (humans, musical-groups, albums,
songs, music-genres) were using `LIMIT … OFFSET …` without an
`ORDER BY` clause. SPARQL result order is not guaranteed across pages,
so two requests with the same offset can return different (or
overlapping) subjects on different runs. Added `ORDER BY ?s` to every
paginated subquery so the offset windowing is stable.

Bug 4 — `scripts/testnet-publish-stress/fetch-wikidata-music.mjs` —
resume only tracked the partition counter
The script's resume support counted output-file lines (`alreadyWritten`)
and used that as the next `partitionIdx`, but `classCursor` and per-class
`offsets` always reset to 0 on a fresh run. After an interruption the
script would re-query the same SPARQL pages and append duplicate early
data under new partition ids, instead of continuing where it left off.
Persists `{ offsets, classCursor }` to a sidecar `<OUT_PATH>.state.json`
after every fetch and reloads it at startup. The in-flight buffer is
intentionally NOT persisted (cheap to rebuild on the next fetch).

Bug 5 — `scripts/testnet-publish-stress/publish-loop.mjs` — checkpoint
frequency vs. determinism
Each partition's anchor URI is deterministic (`stress:partition/<idx>`).
The previous code only persisted `lastPublishedIdx` every
`checkpointEvery=50` partitions (alongside the expensive wallet-snapshot
update). A crash between snapshots replayed already-published partitions
on resume, which deterministically hit Rule 4 / root-entity conflicts
on the second attempt. Split the cheap partition-bookkeeping save from
the expensive wallet-snapshot save: `saveCheckpoint(checkpoint)` now
runs after EVERY publish (with `lastPublishedIdx` already updated), and
the wallet snapshot stays on the original cadence.

Bug 6 — `packages/cli/skills/dkg-importer/SKILL.md` — IRI scheme
under-detection in the blank-node rewrite example
The reference example only treated subjects starting with `http` or
`urn:` as IRIs, which silently misses valid RDF IRIs that use other
schemes — `did:`, `ipfs:`, `tag:`, `file:`, plain-IRI imports etc. An
importer that copied the example verbatim would leave colliding root
entities un-rewritten and keep hitting Rule 4 on subsequent partitions.
Replaced the scheme-list check with an RFC 3986 absolute-IRI regex and
documented that callers whose parser exposes `term.termType` should
prefer that.

Companion to:
  60e16e2 Merge pull request OriginTrail#727 from OriginTrail/fix/rfc39-codex-review-fixes
  5ca94aa Merge pull request OriginTrail#729 from OriginTrail/fix/rfc39-codex-review-fixes-round2
  fb3a528 fix(rfc39): address Codex review on PR OriginTrail#727 (3 follow-up bugs)

This batch closes the rest of the open Codex findings (scripts + docs)
that don't fit cleanly under either of the rfc39-codex-review-fixes
rounds, finishing the rc.12 integration's review-resolution sweep.

Validation: scripts are runtime-only (no build artefacts). Linted clean.
Behavior changes spot-checked against the rc.12 daemon code (`api-client.ts:1242` confirms `/api/context-graph/list`).

Made with [Cursor](https://cursor.com)

Co-authored-by: Cursor <cursoragent@cursor.com>
matic031 pushed a commit to KilianTrunk/dkg that referenced this pull request Jun 2, 2026
PR OriginTrail#716 audit cluster B.3 — `fetchCiphertextChunkFromPeer`
(initiator) and `ingestSwmCiphertextChunkEnvelope` (gossip
ingester) are the two halves of the late-joining-host backfill
contract that PR OriginTrail#715 / OriginTrail#717 / OriginTrail#727 / OriginTrail#729 shipped. The audit
flagged both as having ZERO direct adapter-level coverage despite
being wired into the random-sampling prover backfill (hot path)
and the workspace chunked-publish subscription (security path).

New file `packages/agent/test/lu11-chunk-catchup-wiring.test.ts`
adds 12 tests across both functions:

`fetchCiphertextChunkFromPeer` (7 tests):
  - Happy path: mints a signed request via chain.signMessage,
    sends to `/dkg/10.0.2/get-ciphertext-chunk`, decodes the
    response, persists ciphertext under the CANONICAL CG graph.
    Asserts the on-wire request decodes (sign step actually ran)
    and the persist site matches the responder lookup site
    (write/read address consistency — OriginTrail#729 Bug 5 regression).
  - persist=false: response returns bytes but local store stays
    untouched.
  - Responder denied: ACK is RETURNED (not thrown) so the
    backfill loop's `if (resp.denied) continue` can fall through.
  - Transport failure (delivered=false): throws with the
    messenger error verbatim — backfill loop records as failure.
  - Missing `chain.signMessage`: throws an honest precondition
    error, never sends an unsigned request.
  - Non-32-byte batchId / negative chunkIndex: precise API-boundary
    errors catch caller bugs.

`ingestSwmCiphertextChunkEnvelope` (5 tests):
  - Happy path: chunked envelope persists ciphertext under the
    canonical CG graph (same shape the initiator + responder use).
  - Truncated payload (<= 32 bytes — no room for ciphertext after
    batchId): drops silently.
  - WireId mismatch (envelope.contextGraphId vs subscription cgId
    in wire form differ): drops silently — defense against a
    peer-sending-chunk-for-different-CG attack.
  - LU-6 authority declines: drops silently, nothing persists.
    Without this, any topic-reachable peer could plant arbitrary
    ciphertext under a victim's `(cgId, batchId)` keys.
  - Wrong envelope type (V1 `share-write` instead of
    `share-write-chunked`): drops silently — chunked ingester MUST
    NOT pick up legacy V1 (would corrupt the chunk store with
    meaningless chunkIndex=0 entries).

Both functions are correct today (verified by inspection); these
pins ensure a refactor can't silently break either side of the
contract.

Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant