
fix(agent): plumb on-chain CG id into finalization gossip (unblocks RS sampling for fresh publishes) #758

Merged: branarakic merged 1 commit into release/rc.12 from fix/finalization-gossip-target-cg-id on May 27, 2026

@branarakic
Contributor

Summary

`publishFromSWM` emitted `targetContextGraphId: undefined` on every non-REMAP publish. Receiving cores therefore promoted the SWM snapshot into the legacy `<cgName>/_meta` graph instead of the per-cgId `<cgName>/context/<cgId>/_meta` graph that the RS prover's `extractV10KCFromStore` reads from. As a result, every freshly published KC failed random sampling with `KCNotFoundError` even though SWM had been replicated correctly.

Found while running `scripts/devnet-test-rfc39-comprehensive.sh` against release/rc.12 HEAD. Scenario A (public-CG regression) reproduced it on the very first publish: the cores' `_meta` graph was stuck at `<cgName>/_meta` while the publisher's local copy was correctly at `<cgName>/context/3/_meta`.

The publisher already resolves onChainId (explicit REMAP target OR getContextGraphOnChainId fallback). We thread that into the gossip's targetContextGraphId while keeping ctxGraphIdStr REMAP-only — so we don't accidentally trip the publisher's REMAP-delete branch for regular publishes.
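
The wiring described above can be sketched roughly as follows. This is a minimal illustration, not the code in `dkg-publisher.ts`: the `PublishOptions` and `GossipFields` shapes, the `resolveGossipFields` helper, the synchronous lookup signature, and the precedence between the two option fields are all assumptions. Only the names `targetContextGraphId`, `ctxGraphIdStr`, `onChainId`, and `getContextGraphOnChainId` come from this PR.

```typescript
// Hypothetical shapes for illustration; not the real publisher types.
interface PublishOptions {
  contextGraphId?: string;    // explicit REMAP target, if any
  subContextGraphId?: string; // explicit REMAP target, if any
}

interface GossipFields {
  targetContextGraphId?: string; // receivers use this to pick the _meta graph
  ctxGraphIdStr?: string;        // REMAP-only; feeds the REMAP-delete branch
}

// Assumed synchronous for brevity; the real lookup may be async.
type OnChainIdLookup = (cgName: string) => string | undefined;

function resolveGossipFields(
  cgName: string,
  options: PublishOptions,
  getContextGraphOnChainId: OnChainIdLookup,
): GossipFields {
  // Stays undefined unless the caller passed an explicit target, so a
  // regular publish never looks like a REMAP. (Field precedence is assumed.)
  const ctxGraphIdStr = options.subContextGraphId ?? options.contextGraphId;

  // Explicit REMAP target wins; otherwise fall back to the on-chain lookup.
  const onChainId = ctxGraphIdStr ?? getContextGraphOnChainId(cgName);

  return { targetContextGraphId: onChainId, ctxGraphIdStr };
}
```

Under these assumptions, a regular publish to a CG with on-chain id 3 yields `targetContextGraphId: "3"` with `ctxGraphIdStr` left unset, which is the combination the fix is after.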

Test plan

  • Manually verified end-to-end on a fresh 6-node devnet (4 core + 2 edge, ask = 0.5 TRAC/KiB, stake = 50k TRAC):

    | Scenario | kcId | proof tx | Notes |
    | --- | --- | --- | --- |
    | A: public CG | 1 | 0xa5d29be4… | flat-KC prover, regression check |
    | B: curated 1-chunk LU-11 | 2 | 0xac19bd3a… | ct_root=0x79395f63…, ct_count=1 |
    | C: curated multi-chunk | 3 | 0x6e2e1aaa… | ct_count=4, multi-leaf Merkle |
    | D: late-join backfill | 4 | 0x7f891d99… | LU-11 backfill done … fetched=4 failures=0, 3/3 sibling cores served |

    All four landed submitChallengeProof on chain. Before the fix Scenario A timed out at 180s on kc-not-synced / KCNotFoundError for every core, blocking the whole suite (set -euo pipefail).

  • Also exposes DEVNET_CORE_ASK_TRAC env var so the devnet bootstrap can use a realistic ask (default unchanged at 1 TRAC).

Related

Made with Cursor

`publishFromSWM` emitted `targetContextGraphId: undefined` whenever the
caller didn't explicitly set `options.subContextGraphId` /
`options.contextGraphId` (i.e. every non-REMAP publish). Receiving cores
then promoted the SWM snapshot into the legacy `<cgName>/_meta` graph
instead of the per-cgId `<cgName>/context/<cgId>/_meta` graph that the
RS prover's `extractV10KCFromStore` reads from — so every freshly
published KC failed sampling with `KCNotFoundError` even though SWM had
been replicated correctly. Reproduced end-to-end via
`scripts/devnet-test-rfc39-comprehensive.sh` Scenario A; with the fix
all four scenarios (public, curated 1-chunk, curated multi-chunk,
late-join auto-backfill) land on-chain `submitChallengeProof`.

The publisher already resolves `onChainId` (explicit REMAP target OR
`getContextGraphOnChainId` lookup) — thread that into the gossip's
`targetContextGraphId` while keeping `ctxGraphIdStr` REMAP-only so we
don't trip the publisher's REMAP-delete branch for regular publishes.

Also expose `DEVNET_CORE_ASK_TRAC` so the devnet bootstrap can use a
realistic ask (e.g. 0.5 TRAC/KiB) without editing the script in place.

Co-authored-by: Cursor <cursoragent@cursor.com>
@branarakic branarakic merged commit 24f1167 into release/rc.12 May 27, 2026
1 check failed
@branarakic branarakic deleted the fix/finalization-gossip-target-cg-id branch May 27, 2026 14:36
matic031 pushed a commit to KilianTrunk/dkg that referenced this pull request Jun 2, 2026
…-cd68fa689 KCs

PR OriginTrail#758's `fix(agent): always plumb on-chain CG id into finalization
gossip` (cd68fa6) closes the publisher side of the RS `kc-not-synced`
bug — every freshly-published KC's `targetContextGraphId` now lands in
the gossip envelope so receiving cores promote the SWM snapshot into
the per-cgId `<cgName>/context/<cgId>/_meta` graph that the prover's
`extractV10KCFromStore` reads from. That fix shipped without unit-test
coverage (the regression guard is
`scripts/devnet-test-rfc39-comprehensive.sh` Scenario A, which doesn't
run in CI) and leaves two
operational gaps:

1) Pre-fix publishers still floating in a mesh during rolling upgrades
   gossip `targetContextGraphId: undefined`. An upgraded receiver that
   reads the wire literally still downgrades to legacy `<cgName>/_meta`
   promotion and the RS prover stays stuck on `kc-not-synced`.

2) Every KC published to a receiver BEFORE its daemon got upgraded
   has its `_meta` parked at the legacy URI. Restarting the upgraded
   daemon doesn't retroactively promote those — they remain
   un-provable forever unless someone copies the meta into place.

Confirmed both on testnet during diagnosis: beacon-01 reported
`kc-not-synced` for every recent challenge against CG #4
(`miles-publish-stress-26may`); a SPARQL probe found 16,409 triples
in `<cg>/_meta` (628 KCs worth of `dkg:batchId` + KA `partOf` +
publication URIs) but 0 in `<cg>/context/4/_meta`.

This PR adds:

* FinalizationHandler defensive lookup. New optional
  `ResolveContextGraphOnChainId` constructor callback resolves the
  on-chain id locally when the gossip envelope's
  `targetContextGraphId` is empty. DKGAgent now wires
  `getContextGraphOnChainId(cgName)` (which reads the subscribed-CG
  cache + ontology graph; same lookup the publisher uses on the
  outbound side) so an upgraded core stays useful in a
  mixed-version mesh. Resolver failure / not-on-chain returns
  cleanly fall back to legacy `<cgName>/_meta` promotion — pure
  belt-and-braces, no regression for already-correct gossip.

* finalization-handler-defensive-cg-id.test.ts (6 cases) pins the
  three resolution branches: (a) gossip-set takes precedence, (b)
  resolver-fallback fires when wire is empty, (c) legacy URI is the
  ultimate fallback. Tests use the existing dedup-guard mechanism
  as a probe for the resolved `ctxGraphId` — no chain mock, no
  merkle setup, ~150 lines total.

* POST /api/random-sampling/backfill-percgid-meta admin endpoint
  that copies the per-KC subset of `<cgName>/_meta` into
  `<cgName>/context/<cgId>/_meta` for every subscribed CG with an
  on-chain id. Filter mirrors the publisher's promotion
  (`dkg-publisher.ts:1407-1422`): subjects with `dkg:batchId`, KA
  UALs reached via `dkg:partOf`, and publication URIs reached via
  `dkg:authoredBy` — CG-lifecycle subjects (createdAt, accessPolicy
  on the cgEntity) are correctly excluded. Idempotent
  (`already-populated` short-circuit on a non-empty target).
  Supports `dryRun: true` for probing and
  `contextGraphIds: [...]` for targeting specific CGs.

* backfill-rs-percgid-meta-route.test.ts (6 cases) covers
  happy-path copy, idempotence, dry-run, the lifecycle-subject
  filter, the not-on-chain skip, and the CG-restriction args.

* scripts/backfill-rs-percgid-meta.mjs — operator-facing one-shot
  driver. Resolves the daemon URL + bearer token via the existing
  `scripts/lib/dkg-daemon.mjs` helper, prints a per-CG report,
  exits non-zero only on hard failure (not on "nothing to do").
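
The three resolution branches pinned by the receiver-side tests can be sketched as below. This is an illustration under stated assumptions rather than the actual `FinalizationHandler` code: the helper name `resolveMetaGraphUri` is hypothetical, and the callback is modelled as synchronous for brevity (the PR names the `ResolveContextGraphOnChainId` type but does not show its signature).

```typescript
// Assumed shape; the real callback may be async and return a different type.
type ResolveContextGraphOnChainId = (cgName: string) => string | undefined;

// Hypothetical helper: picks the _meta graph a receiver would promote into.
function resolveMetaGraphUri(
  cgName: string,
  targetContextGraphId: string | undefined,
  resolveOnChainId?: ResolveContextGraphOnChainId,
): string {
  // (a) A non-empty id on the wire takes precedence.
  let cgId: string | undefined = targetContextGraphId || undefined;

  // (b) Wire is empty: try the local resolver, treating a throw or an
  //     empty result as "not on chain".
  if (cgId === undefined && resolveOnChainId) {
    try {
      cgId = resolveOnChainId(cgName) || undefined;
    } catch {
      cgId = undefined;
    }
  }

  // (c) Nothing resolved: fall back to the legacy graph.
  return cgId !== undefined
    ? `${cgName}/context/${cgId}/_meta`
    : `${cgName}/_meta`;
}
```

Swallowing the resolver error here mirrors the "fall back cleanly" behaviour described above; whether the real handler also logs the failure is not stated in this PR.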

Operator workflow on testnet beacons after this PR lands:
  1. Upgrade daemon binary to rc.12 (picks up cd68fa6 +
     this PR's receiver fallback).
  2. `node scripts/backfill-rs-percgid-meta.mjs --dry-run` against
     each beacon to preview.
  3. Re-run without `--dry-run` to copy the orphan meta into place.
  4. Watch `/api/random-sampling/status` — `submittedCount` should
     start climbing within one sampling period.
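
For operators who prefer to call the endpoint directly, steps 2 and 3 amount to one POST per beacon. The sketch below only builds the request; it is not taken from `scripts/backfill-rs-percgid-meta.mjs`. The endpoint path and the `dryRun` / `contextGraphIds` body fields come from this PR, while the `buildBackfillRequest` helper, the string element type of `contextGraphIds`, and the `Authorization: Bearer` header format are assumptions.

```typescript
// Body fields named in the PR; the string[] element type is assumed.
interface BackfillRequestBody {
  dryRun?: boolean;
  contextGraphIds?: string[];
}

// Plain description of an HTTP request, so the sketch needs no HTTP client.
interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: assembles the backfill request for one daemon.
function buildBackfillRequest(
  daemonUrl: string,
  bearerToken: string,
  opts: BackfillRequestBody = {},
): HttpRequestSpec {
  const body: BackfillRequestBody = {};
  if (opts.dryRun) body.dryRun = true;
  if (opts.contextGraphIds && opts.contextGraphIds.length > 0) {
    body.contextGraphIds = [...opts.contextGraphIds];
  }
  return {
    // Trim trailing slashes so the path is not doubled up.
    url: `${daemonUrl.replace(/\/+$/, "")}/api/random-sampling/backfill-percgid-meta`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${bearerToken}`, // assumed header format
    },
    body: JSON.stringify(body),
  };
}
```

A dry run restricted to one CG would then be `buildBackfillRequest(url, token, { dryRun: true, contextGraphIds: ["4"] })`, with the resulting spec handed to whatever HTTP client is in use.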

Co-authored-by: Cursor <cursoragent@cursor.com>