fix(agent): plumb on-chain CG id into finalization gossip (unblocks RS sampling for fresh publishes) #758
Merged
`publishFromSWM` emitted `targetContextGraphId: undefined` whenever the caller didn't explicitly set `options.subContextGraphId` / `options.contextGraphId` (i.e. every non-REMAP publish). Receiving cores then promoted the SWM snapshot into the legacy `<cgName>/_meta` graph instead of the per-cgId `<cgName>/context/<cgId>/_meta` graph that the RS prover's `extractV10KCFromStore` reads from, so every freshly published KC failed sampling with `KCNotFoundError` even though SWM had been replicated correctly.

Reproduced end-to-end via `scripts/devnet-test-rfc39-comprehensive.sh` Scenario A; with the fix, all four scenarios (public, curated 1-chunk, curated multi-chunk, late-join auto-backfill) land on-chain `submitChallengeProof`.

The publisher already resolves `onChainId` (explicit REMAP target OR `getContextGraphOnChainId` lookup). Thread that into the gossip's `targetContextGraphId` while keeping `ctxGraphIdStr` REMAP-only so we don't trip the publisher's REMAP-delete branch for regular publishes.

Also expose `DEVNET_CORE_ASK_TRAC` so the devnet bootstrap can use a realistic ask (e.g. 0.5 TRAC/KiB) without editing the script in place.

Co-authored-by: Cursor <cursoragent@cursor.com>
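The publisher-side change described in this commit message can be pictured with a minimal sketch. It is not the code from `dkg-publisher.ts`: the `PublishOptions` and `FinalizationGossipIds` shapes, the `resolveGossipIds` helper, the string-typed ids, and the precedence of `subContextGraphId` over `contextGraphId` are all assumptions made for illustration. Only the field names `targetContextGraphId`, `ctxGraphIdStr`, `subContextGraphId`, and `contextGraphId` come from the text above.

```typescript
// Hypothetical shapes for illustration only; not the real publisher types.
interface PublishOptions {
  contextGraphId?: string;
  subContextGraphId?: string;
}

interface FinalizationGossipIds {
  // Stays REMAP-only, so regular publishes never reach the REMAP-delete branch.
  ctxGraphIdStr: string | undefined;
  // What receivers use to pick the per-cgId _meta graph.
  targetContextGraphId: string | undefined;
}

// lookupOnChainId stands in for the getContextGraphOnChainId lookup (assumed signature).
function resolveGossipIds(
  options: PublishOptions,
  lookupOnChainId: () => string | undefined,
): FinalizationGossipIds {
  // Explicit REMAP target, if the caller set one (precedence is an assumption).
  const explicit = options.subContextGraphId ?? options.contextGraphId;
  // On-chain id: explicit REMAP target OR the lookup result.
  const onChainId = explicit ?? lookupOnChainId();
  return { ctxGraphIdStr: explicit, targetContextGraphId: onChainId };
}

// Regular publish: the gossip now carries the looked-up id, ctxGraphIdStr stays unset.
const regularPublish = resolveGossipIds({}, () => "3");
// REMAP publish: both fields carry the explicit target.
const remapPublish = resolveGossipIds({ contextGraphId: "7" }, () => "3");
```

Before the fix, the equivalent of `targetContextGraphId` would have been derived from `explicit` alone, which is why it was `undefined` for every regular publish.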
matic031 pushed a commit to KilianTrunk/dkg that referenced this pull request on Jun 2, 2026
…-cd68fa689 KCs

PR OriginTrail#758's `fix(agent): always plumb on-chain CG id into finalization gossip` (cd68fa6) closes the publisher side of the RS `kc-not-synced` bug: every freshly published KC's `targetContextGraphId` now lands in the gossip envelope, so receiving cores promote the SWM snapshot into the per-cgId `<cgName>/context/<cgId>/_meta` graph that the prover's `extractV10KCFromStore` reads from. That fix shipped without unit-test coverage (the regression guard is `scripts/devnet-test-rfc39-comprehensive.sh` Scenario A, which doesn't run in CI) and leaves two operational gaps:

1) Pre-fix publishers still floating in a mesh during rolling upgrades gossip `targetContextGraphId: undefined`. An upgraded receiver that reads the wire literally still downgrades to legacy `<cgName>/_meta` promotion, and the RS prover stays stuck on `kc-not-synced`.
2) Every KC published to a receiver BEFORE its daemon got upgraded has its `_meta` parked at the legacy URI. Restarting the upgraded daemon doesn't retroactively promote those; they remain un-provable forever unless someone copies the meta into place.

Confirmed both on testnet during diagnosis: beacon-01 reported `kc-not-synced` for every recent challenge against CG #4 (`miles-publish-stress-26may`); a SPARQL probe found 16,409 triples in `<cg>/_meta` (628 KCs worth of `dkg:batchId` + KA `partOf` + publication URIs) but 0 in `<cg>/context/4/_meta`.

This PR adds:

* FinalizationHandler defensive lookup. A new optional `ResolveContextGraphOnChainId` constructor callback resolves the on-chain id locally when the gossip envelope's `targetContextGraphId` is empty. DKGAgent now wires `getContextGraphOnChainId(cgName)` (which reads the subscribed-CG cache + ontology graph; same lookup the publisher uses on the outbound side) so an upgraded core stays useful in a mixed-version mesh. Resolver failures and not-on-chain returns fall back cleanly to legacy `<cgName>/_meta` promotion: pure belt-and-braces, no regression for already-correct gossip.
* finalization-handler-defensive-cg-id.test.ts (6 cases) pins the three resolution branches: (a) gossip-set takes precedence, (b) resolver-fallback fires when wire is empty, (c) legacy URI is the ultimate fallback. Tests use the existing dedup-guard mechanism as a probe for the resolved `ctxGraphId`: no chain mock, no merkle setup, ~150 lines total.
* POST /api/random-sampling/backfill-percgid-meta admin endpoint that copies the per-KC subset of `<cgName>/_meta` into `<cgName>/context/<cgId>/_meta` for every subscribed CG with an on-chain id. The filter mirrors the publisher's promotion (`dkg-publisher.ts:1407-1422`): subjects with `dkg:batchId`, KA UALs reached via `dkg:partOf`, and publication URIs reached via `dkg:authoredBy`. CG-lifecycle subjects (createdAt, accessPolicy on the cgEntity) are correctly excluded. Idempotent (`already-populated` short-circuit on a non-empty target). Supports `dryRun: true` for probing and `contextGraphIds: [...]` for targeting specific CGs.
* backfill-rs-percgid-meta-route.test.ts (6 cases) covers happy-path copy, idempotence, dry-run, the lifecycle-subject filter, the not-on-chain skip, and the CG-restriction args.
* scripts/backfill-rs-percgid-meta.mjs, an operator-facing one-shot driver. Resolves the daemon URL + bearer token via the existing `scripts/lib/dkg-daemon.mjs` helper, prints a per-CG report, and exits non-zero only on hard failure (not on "nothing to do").

Operator workflow on testnet beacons after this PR lands:

1. Upgrade daemon binary to rc.12 (picks up cd68fa6 + this PR's receiver fallback).
2. `node scripts/backfill-rs-percgid-meta.mjs --dry-run` against each beacon to preview.
3. Re-run without `--dry-run` to copy the orphan meta into place.
4. Watch `/api/random-sampling/status`: `submittedCount` should start climbing within one sampling period.
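
The three resolution branches of the receiver-side fallback, (a) through (c), can be sketched as a small pure function. This is an illustration under stated assumptions, not the FinalizationHandler code: the commit message names the `ResolveContextGraphOnChainId` callback but not its signature, so the synchronous string-returning shape below is assumed, and `resolveMetaGraphUri` is a hypothetical helper name. The two graph URI patterns are the ones quoted in the text.

```typescript
// Assumed callback shape; the real one may well be asynchronous.
type ResolveContextGraphOnChainId = (cgName: string) => string | null | undefined;

// Hypothetical helper: picks the _meta graph a receiver would promote into.
function resolveMetaGraphUri(
  cgName: string,
  wireTargetId: string | undefined,
  resolveOnChainId?: ResolveContextGraphOnChainId,
): string {
  // (a) A non-empty id on the wire takes precedence.
  let cgId: string | undefined = wireTargetId ? wireTargetId : undefined;

  // (b) Wire is empty: ask the local resolver, if one was wired in.
  if (cgId === undefined && resolveOnChainId !== undefined) {
    try {
      cgId = resolveOnChainId(cgName) || undefined;
    } catch {
      // Resolver failure is treated the same as "not on chain".
      cgId = undefined;
    }
  }

  // (c) Still nothing: fall back to the legacy graph.
  return cgId === undefined
    ? `${cgName}/_meta`
    : `${cgName}/context/${cgId}/_meta`;
}

// Gossip from a pre-fix publisher: empty wire id, resolver knows the CG.
const fromOldPublisher = resolveMetaGraphUri("my-cg", undefined, () => "4");
// fromOldPublisher === "my-cg/context/4/_meta"
```

Because branch (a) short-circuits before the resolver is consulted, already-correct gossip is handled exactly as it would be without the callback.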
Co-authored-by: Cursor <cursoragent@cursor.com>
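The per-CG decision made by the backfill endpoint in the commit message above (skip CGs without an on-chain id, short-circuit on a non-empty target, honour `dryRun`) could look roughly like the following. Everything here is a sketch: `CgState`, `BackfillReport`, `decideBackfill`, and the outcome labels other than `already-populated` are hypothetical, the triple counts are made-up illustrative numbers, and the actual copy between graphs is left out.

```typescript
// Only "already-populated" appears in the source; the other labels are assumed.
type BackfillOutcome = "not-on-chain" | "already-populated" | "would-copy" | "copied";

// Hypothetical snapshot of one subscribed CG.
interface CgState {
  cgName: string;
  onChainId: string | undefined;
  targetTripleCount: number;    // triples already in <cgName>/context/<cgId>/_meta
  candidateTripleCount: number; // per-KC triples selected from <cgName>/_meta
}

interface BackfillReport {
  cgName: string;
  outcome: BackfillOutcome;
  triples: number;
}

// Decides what to do for one CG; performing the copy itself is out of scope.
function decideBackfill(cg: CgState, dryRun: boolean): BackfillReport {
  if (cg.onChainId === undefined) {
    // No per-cgId graph exists for a CG that is not on chain.
    return { cgName: cg.cgName, outcome: "not-on-chain", triples: 0 };
  }
  if (cg.targetTripleCount > 0) {
    // Idempotence: a non-empty target is never touched again.
    return { cgName: cg.cgName, outcome: "already-populated", triples: 0 };
  }
  return {
    cgName: cg.cgName,
    outcome: dryRun ? "would-copy" : "copied",
    triples: cg.candidateTripleCount,
  };
}

// Illustrative numbers only.
const orphaned: CgState = { cgName: "my-cg", onChainId: "4", targetTripleCount: 0, candidateTripleCount: 120 };
const preview = decideBackfill(orphaned, true);
// preview.outcome === "would-copy", preview.triples === 120
```

Checking the target graph before anything else is what makes a second run a no-op, which matches the "exits non-zero only on hard failure" behaviour described for the driver script.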
Summary
`publishFromSWM` emitted `targetContextGraphId: undefined` on every non-REMAP publish, so receiving cores promoted the SWM snapshot into the legacy `<cgName>/_meta` graph instead of the per-cgId `<cgName>/context/<cgId>/_meta` graph that the RS prover's `extractV10KCFromStore` reads from → every freshly published KC failed random sampling with `KCNotFoundError` even though SWM had been replicated correctly.

Found while running `scripts/devnet-test-rfc39-comprehensive.sh` against `release/rc.12` HEAD: Scenario A (public-CG regression) reproed it on the very first publish, with the cores' `_meta` graph stuck at `<cgName>/_meta` while the publisher's local copy was correctly at `<cgName>/context/3/_meta`.

The publisher already resolves `onChainId` (explicit REMAP target OR `getContextGraphOnChainId` fallback). We thread that into the gossip's `targetContextGraphId` while keeping `ctxGraphIdStr` REMAP-only, so we don't accidentally trip the publisher's REMAP-delete branch for regular publishes.

Test plan
Manually verified end-to-end on a fresh 6-node devnet (4 core + 2 edge, ask = 0.5 TRAC/KiB, stake = 50k TRAC):

0xa5d29be4… 0xac19bd3a… 0x79395f63…, ct_count=1 0x6e2e1aaa… 0x7f891d99… LU-11 backfill done … fetched=4 failures=0, 3/3 sibling cores served

All four landed `submitChallengeProof` on chain. Before the fix Scenario A timed out at 180s on `kc-not-synced` / `KCNotFoundError` for every core, blocking the whole suite (`set -euo pipefail`).

Also exposes `DEVNET_CORE_ASK_TRAC` env var so the devnet bootstrap can use a realistic ask (default unchanged at 1 TRAC).

Related
Made with Cursor