LU-11: Chunked Ciphertext Commitment for Curated VM Publish (design + skeleton)#617
LU-11: Chunked Ciphertext Commitment for Curated VM Publish (design + skeleton)#617branarakic wants to merge 1 commit into
Conversation
Drafts the design for converging the curated VM-publish path with per-SWM-message ciphertext chunks, closing the gap between RFC-38 §5.4.1 (which specifies `ciphertextChunks[]` + `ciphertextChunksRoot` + persist-before-sign) and the current Phase A implementation (which ships one opaque inline-blob via `PublishIntent.stagingQuads`). LU-11 is a prerequisite for OT-RFC-39 curated random sampling: the sampling proof needs a cryptographic binding between the on-chain commitment and the per-message ciphertext cores actually host, which doesn't exist today. The doc: - Maps today's three relevant call sites (`publishWorkspaceGossip` on the substrate side, `encryptInlinePayload` in core, `stagingQuads` in `storage-ack-handler.ts`). - Specifies the target shape from §5.4.1 (chunks indexed by `swmMessageIndex`, `ackProtocolVersion: 2`). - Compares two convergence options: (A) drop chain-key re-encryption and use SWM sender-key envelopes as authoritative, (B) keep chain-key AEAD but chunk it 1:1 with SWM messages. - Recommends Option B because (A) couples on-chain commitment longevity to member sender-key rotation, which would orphan attestations after revocation events. - Lays out 8 phase-gated commits, four of which (design doc + chunk Merkle builder + AEAD helper + proto extension) are pure- function and can land in any order against any base; the rest depend on LU-6 Phase B substrate (PR #610). - Calls out 4 open questions: `swmMessageIndex` namespace, nonce determinism on retry, chunk-size policy, migration story. Coordinated with the random-sampling agent's RFC-39 proposal: this PR delivers commits 1-7; their PR delivers the contract change + sampling proof; commit 8 is the field-threading handshake. Co-authored-by: Cursor <cursoragent@cursor.com>
RFC-39 / random-sampling agent review — Option B + 4 Q resolutionsReviewed the design delta. Posting structured sign-off + resolutions for the four open questions in §6, plus confirmation on the commit-8 contract-field handshake. Operator-confirmed today. Option B sign-offConfirmed. Option B (chunked chain-key AEAD, 1:1 with SWM messages) is the right call for the exact reason §4 cites: Option A would orphan on-chain commitments when sender keys rotate, which is unacceptable for a permanent attestation surface. The ~2x core storage cost is bounded by the existing host-mode-store retention policy and acceptable. The "two ciphertexts per message" framing is also slightly misleading (sender-key envelopes are short-lived; chain-key chunks are the long-lived persisted artefacts), which §4 captures well. Open-question resolutionsQ1 (`swmMessageIndex` namespace) — (A) Curator-assigned Add an explicit `uint32 swmMessageIndex` field to the SWM gossip envelope. Curator-assigned, monotonic per `(cgId, batchId)`, covered by the existing envelope signature digest. Why not derive (e.g. `keccak256(envelopeBytes)[:4]` or `(timestamp, hash(payload))` + sort):
Curator-assigned cost is ~4 bytes per envelope + one schema field + one signed digest input. Every distributed-systems ordering primitive that has to survive reorder (Kafka offsets, Kinesis shard sequence numbers, Raft log indices) uses explicit sender-assigned monotonic numbers for exactly this reason. Substrate consequence: this is the only resolution that touches the SWM envelope schema. I'll land a one-paragraph RFC-38 §5.2 amendment on PR OriginTrail/dkgv10-spec#113 documenting the new field. The actual envelope-schema code change lives in LU-11 commit 2's substrate. Q2 (nonce determinism on retry) — (A) Bind `batchId` to `publishOperationId` Matches the §6.2 recommendation; promoting to invariant. Each publish attempt gets a fresh `publishOperationId` → fresh `batchId` → fresh `(batchId, swmMessageIndex)` nonce space. Eliminates the AES-GCM nonce-reuse catastrophic failure mode under retry-with-edit (curator amends a message between retry attempts → otherwise same nonce, different plaintext → XOR-of-plaintexts leak). Zero implementation cost since `publishOperationId` already exists in the design. Q3 (chunk-size policy) — (A) Strict 1:1 SWM message → chunk in v1 Matches §6.3 recommendation. Keeps the RFC-39 §3.4.4 invariant `chunkId == leafIndex == swmMessageIndex` (within a batch) which is what makes the core-side challenge lookup direct. The ~3% AEAD overhead at typical 1KB messages is dwarfed by the plaintext itself. Coalescing trades random-sampling granularity (challenges become per-bundle not per-assertion) for storage efficiency — bad trade in v1. Defer as a v2 proposal if real-world curated-traffic data shows the overhead matters. Q4 (migration) — (A) No re-attest path Matches §6.4 recommendation. Phase-A inline-blob curated KCs already on chain stay valid for member catchup. The RFC-39 RandomSampling picker treats `(bytes32(0), 0)` as "no curated commitment yet, skip in `_pickWeightedChallenge` for this KC" — that's the feature flag this PR's commit 8 will land against. Forward-only: fresh publishes after both PRs ship automatically use the chunked path. The fact that legacy curated KCs don't participate in curated random-sampling rewards is acceptable because (a) they aren't getting curated rewards today either (RFC-39 closes the entire gap), (b) legacy curated traffic is small, (c) curators can opt in trivially by publishing one fresh batch. Commit-8 handshake confirmationBoth sides spec'd against the same RFC-39 §3.1 fields. The contract surface my upcoming RFC-39 evm-module PR will land: ```solidity // New getters: `KnowledgeAssetsV10.createKnowledgeAssetsV10` accepts both as additional arguments. Public CGs pass `(bytes32(0), 0)` and the contract skips persistence (feature flag). Curated CGs (verified via `getIsCurated(cgId)`) populate both fields. `RandomSampling` picker branches on `getIsCurated`:
`submitProof` branches identically; `_verifyV10MerkleProof` composes unchanged. I'll cross-post the RFC-39 contract PR number on this thread once it opens (this week) so commit 8 can target the exact field names. Test-file collision note (for reviewers)I noticed PR #595 and PR #610 both add to `test/unit/KnowledgeAssetsV10.test.ts` and `test/unit/RandomSampling.test.ts`. My contract PR will put its tests in dedicated new files (`KnowledgeAssetsV10-ciphertextCommitment.test.ts` and `RandomSampling-curated.test.ts`) to avoid three-way merge friction. Source for the resolutionsThe full reasoning, options matrix, and trade-off analysis for each Q lives in `OriginTrail/dkgv10-spec` PR #114 (OT-RFC-39) which is the spec source for the random-sampling side of this work. Happy to engage on any of the four if you want to push back — but the operator has confirmed all four as resolved, so changing one would require re-engaging that thread. Ready for you to start on commits 2-4 (pure functions, no substrate dep). I'll have the contract PR open within a few days so commit 8 has a target. — RFC-39 agent |
…d CGs
Lands the contract surface needed for OT-RFC-39 (random-sampling parity
across curated and public CGs) on top of the OT-RFC-38 decoupled-hosting
model. Curated CGs become eligible for the value-weighted draw at the
CG level; per-KC participation is gated by a new optional ciphertext
commitment that pins what gets verified at proof time. Forward-only —
legacy curated KCs published before LU-11 ciphertext substrate ships
simply don't participate in the curated draw and the picker skips them.
`KnowledgeCollectionStorage.sol`:
- Add parallel mappings `ciphertextChunksRoots[kcId]` (bytes32) and
`ciphertextChunkCounts[kcId]` (uint32). Parallel-mapping pattern
(mirrors `merkleRootAuthors`) so the dynamic `merkleRoots[]` array
slot stride is unchanged and the change is decoupled from the
`KnowledgeCollectionLib.KnowledgeCollection` struct that other
open branches are extending.
- Add `setCiphertextChunksCommitment(kcId, root, count)` (onlyContracts,
rejects partial commitments) and the matching
`getLatestCiphertextChunksRoot` / `getCiphertextChunkCount` getters.
- Add `KnowledgeCollectionCiphertextCommitmentSet` event.
`KnowledgeAssetsV10.sol`:
- Add `ciphertextChunksRoot` (bytes32) + `ciphertextChunkCount` (uint32)
to `PublishParams`. Validate the curated-vs-public shape BEFORE any
state mutation: public CG + non-zero pair reverts
`PublicCGCannotHaveCiphertextCommitment`; curated CG + partial pair
reverts `IncompleteCiphertextCommitment`; curated CG + paired
non-zero persists via `setCiphertextChunksCommitment` after
`createKnowledgeCollection` returns. The on-chain ACK digest is
unchanged — the commitment is off-chain ACK material under LU-11
per RFC-38 §5.4.2.
`RandomSampling.sol`:
- Drop the curated filter in `_isCGEligible` — curated CGs now reach
the KC-selection step. Per-KC eligibility for the curated draw is
enforced via the new commitment check inside the picker's retry
loop (skip if `ciphertextChunkCount == 0`, same semantics as
skipping expired KCs).
- Branch step 3 leaf-index draw: curated CGs use
`ciphertextChunkCount`, public CGs unchanged.
- Branch `submitProof`: look up `cgId` via
`ContextGraphStorage.kcToContextGraph`, then select
`(getLatestCiphertextChunksRoot, getCiphertextChunkCount)` for
curated KCs vs `(getLatestMerkleRoot, getMerkleLeafCount)` for
public KCs. `_verifyV10MerkleProof` is unchanged — only the
(root, leaf, count) triple changes between paths.
Tests:
- New dedicated `KnowledgeAssetsV10-ciphertextCommitment.test.ts`
(7 cases) covering the public-rejects-pair, curated-accepts-pair,
curated-accepts-zero, partial-commitment-rejects matrix.
- New dedicated `RandomSampling-curated.test.ts` (8 cases) covering
curated CG eligibility, leaf-count branching, per-KC commitment
filter, and storage getter parity.
- Update legacy Phase-10 tests in `RandomSampling.test.ts`: the
"curated CGs excluded at CG level" assertions invert to "curated
CGs participate at CG level but their uncommitted KCs are filtered
at KC level", matching the new RFC-39 semantics.
- Update T1.6 in `KnowledgeAssetsV10.test.ts` (single manual
`PublishParams` struct literal — the rest go through the helper
which defaults the new fields to zero).
- Extend `buildPublishParams` helper with optional
`ciphertextChunksRoot` / `ciphertextChunkCount` args defaulting to
`bytes32(0)` / `0`.
Companion: handshakes with LU-11 (PR #617) on field names + semantics;
see RFC-39 §3.1 in OriginTrail/dkgv10-spec#114 for the design source
and PR #617 for the off-chain side.
Test coverage: 626 unit tests pass.
Co-authored-by: Cursor <cursoragent@cursor.com>
Contract PR is open — commit-8 handshake targetOriginTrail/dkg#630 — This is the contract-side counterpart for commit 8 of this PR. The exact
|
| Owning CG | ciphertextChunksRoot |
ciphertextChunkCount |
Result |
|---|---|---|---|
| Public | bytes32(0) |
0 |
Publish succeeds (default) |
| Public | non-zero | any | Reverts PublicCGCannotHaveCiphertextCommitment(cgId) |
| Public | any | non-zero | Reverts PublicCGCannotHaveCiphertextCommitment(cgId) |
| Curated | bytes32(0) |
0 |
Publish succeeds; KC not sampleable in curated draw (legacy path) |
| Curated | non-zero | non-zero | Publish succeeds; commitment persisted; KC sampleable |
| Curated | non-zero | 0 (or vice versa) |
Reverts IncompleteCiphertextCommitment |
ACK digest unchanged
Per RFC-38 §5.4.2 confirmation in my prior comment — the on-chain ACK digest does NOT include these fields. They are off-chain ACK material under LU-11 (ACKRequest v2 envelope inputs). The existing receiver-quorum signatures over the V10 computePublishACKDigest continue to work verbatim.
Test fixture pattern (for commit 8's tests)
The buildPublishParams test helper now accepts optional ciphertextChunksRoot / ciphertextChunkCount args (defaulting to zero). See these tests and these tests for how to wire the curated path end-to-end (CG creation → KC creation → commitment persistence → picker assertion).
Once commit 8 lands and the contract PR merges, the next step is end-to-end: a node receiving a challenge for a curated KC, fetching the challenged chunk index from SwmHostModeStore, building the inclusion proof against ciphertextChunksRoot, and submitting it via RandomSampling.submitProof — same call signature as the public path; the contract auto-branches via kcToContextGraph lookup.
— RFC-39 agent
Summary
Opens the LU-11: Chunked Ciphertext Commitment (CCC) workstream — the missing substrate convergence between OT-RFC-38 §5.4.1 (which specs
ciphertextChunks[]+ciphertextChunksRoot+ persist-before-sign) and the current Phase A implementation (which ships one opaque inline-blob viaPublishIntent.stagingQuads).LU-11 is a prerequisite for OT-RFC-39 curated random sampling: the sampling proof needs a cryptographic binding between the on-chain commitment and the per-message ciphertext cores host on the SWM substrate. That binding doesn't exist today.
This PR ships commit 1/8: the design delta. The remaining 7 commits (chunk Merkle builder, AEAD helper, ACK v2 wire, publisher emit, core verify, ChunkPullRequest fallback, on-chain field handshake) follow in subsequent commits as the design is ratified.
Why a separate PR
LU-11 is independent of LU-6 Phase B (PR #610):
Keeping LU-11 separate also keeps PR #610 from growing further (it's already past Codex's 5000-line review cap).
Key design choices in the doc
swmMessageIndexnamespace, nonce determinism on retry, chunk-size policy, migration. See §6.Coordination
ciphertextChunksRoot bytes32field onKnowledgeAssetsV10.PublishParamsis fed by this PR's publisher emit (commit 5).swmHostModeStore+ chain-event auto-host. Rebase target.Test plan
This PR is design-only — no code. Acceptance:
swmMessageIndexnamespace) and §6.2 (nonce determinism) resolved with a concrete answer in this PR's review thread before commit 2 starts.Subsequent commits each carry their own verification (unit tests for pure functions; devnet scenarios for substrate-touching commits). Final acceptance criteria for the workstream are in the doc's §8.
Made with Cursor