OT-RFC-38 LU-6 C3 — pre-registration byte-cap stress harness by branarakic · Pull Request #623 · OriginTrail/dkg

branarakic · 2026-05-24T23:47:31Z

Summary

Validates that the per-CG byte cap on `SwmHostModeStore` and the sliding-window rate limit on `DiscoveryRateLimit` actually enforce when a curator floods cores with ciphertext for an unregistered (freemium-tier) CG — the abuse scenario spelled out in RFC §1.2.4.

Stacked on #610. Pure devnet harness; no production-code changes.

Test phases

1. Curator (N5) creates a curated CG locally WITHOUT `register=true`.
2. A core (N1) explicitly host-mode subscribes (short-circuits the beacon-driven auto-subscribe).
3. Curator pushes a burst of fat triples (default: 16 × 80 KiB ≈ 1.3 MiB, exceeding the 1 MiB default cap).
4. Asserts core's `perCg[CG_ID].bytes ≤ 2 MiB` ceiling (cap enforced, not just absorbed).
5. Greps core `daemon.log` for "Host-mode rejected pre-reg envelope" lines (advisory; passes even when only the size clamp absorbs traffic, since both controls are complementary).
6. Confirms the core process is still alive (no crash regression).

Re-runnable: timestamp-suffixed CG id. Operators can tune the burst via `WRITES_COUNT` and `WRITE_PAYLOAD_BYTES` env vars.

Test plan

`bash -n` syntax check passes
Devnet run — invoked manually post-merge as part of the LU-6 mainnet validation sweep

Made with Cursor

…arness

Validates that the per-CG byte cap on `SwmHostModeStore` and the sliding-window rate limit on `DiscoveryRateLimit` actually enforce when a curator floods cores with ciphertext for an unregistered (freemium-tier) CG, the abuse scenario spelled out in RFC §1.2.4.

Test phases:

1. Curator (N5) creates a curated CG locally WITHOUT `register=true`.
2. A core (N1) explicitly host-mode subscribes (short-circuits the beacon-driven auto-subscribe).
3. Curator pushes a burst of fat triples (default: 16 × 80 KiB ≈ 1.3 MiB total, exceeding the 1 MiB default cap).
4. Asserts the core's `perCg[CG_ID].bytes ≤ 2 MiB` ceiling (cap enforced, not just absorbed without complaint).
5. Greps core `daemon.log` for "Host-mode rejected pre-reg envelope" since the burst start (advisory; passes even when only the size clamp absorbs traffic).
6. Confirms the core process is still alive (no crash regression).

Re-runnable: timestamp-suffixed CG id. Operators can tune the burst via `WRITES_COUNT` and `WRITE_PAYLOAD_BYTES` env vars.

Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions · 2026-05-24T23:49:15Z

+  "allowedAgents": ["$CURATOR_AGENT"] }
+EOF
+)")
+CREATED_ID=$(parse_json "$CREATE" '.id')


🔴 Bug: `POST /api/context-graph/create` returns `{ created, uri }` for the non-register path, not `id`. `CREATED_ID` will stay empty here, so the script aborts before it ever runs the stress case. Read `.created` instead, or just reuse `$CG_ID` for the success check/logging.

github-actions · 2026-05-24T23:49:15Z

+    }];
+    console.log(JSON.stringify({ contextGraphId: cgId, quads }));
+  ')
+  RESP=$(api_call "$CURATOR_NODE" POST /api/shared-memory/write "$PAYLOAD" || true)


🔴 Bug: `|| true` hides every `/api/shared-memory/write` failure, so the loop can complete even if no envelope was ever emitted. Combined with the later warning-only checks, this can produce a false PASS on a completely broken write path. Fail fast on error responses, or at least count successful writes and require >0 before continuing.
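One way to count successes instead of masking failures is sketched below. `write_once` is a hypothetical stand-in for the harness's real write call (stubbed here so that every third write fails), and `run_burst` is an illustrative name; neither is taken from the script.

```shell
#!/usr/bin/env bash
# Sketch: count successful writes instead of masking failures with `|| true`.
# write_once is a hypothetical stub; swap in the harness's own request helper.
write_once() {
  # Pretend every third write fails so both branches are exercised.
  [ $(( $1 % 3 )) -ne 0 ]
}

run_burst() {
  local count=$1 ok=0 i
  for (( i = 1; i <= count; i++ )); do
    if write_once "$i"; then
      ok=$(( ok + 1 ))
    else
      echo "write $i failed" >&2      # surface the failure, do not swallow it
    fi
  done
  echo "$ok"
  # Non-zero exit when nothing was written, so callers cannot pass vacuously.
  [ "$ok" -gt 0 ]
}

SUCCESSFUL_WRITES=$(run_burst 6) || { echo "no writes succeeded" >&2; exit 1; }
echo "successful writes: $SUCCESSFUL_WRITES"
```

Returning the count on stdout and the verdict in the exit status keeps the caller's check to a single line.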

github-actions · 2026-05-24T23:49:15Z

+
+# Default unregistered cap = 1 MiB = 1048576. We accept anything ≤ 2 MiB
+# as "cap enforced" because the cap is a soft hint, not a hard wall.
+CAP_CEILING=2097152


🔴 Bug: the implementation prunes unregistered host-mode storage down to `perCgByteCap` after each append, so the expected post-burst ceiling is the configured cap (1 MiB by default), not 2 MiB. Accepting anything up to 2 MiB will let a doubled-cap regression pass. Assert against the actual configured unregistered cap instead of this loose constant.
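A tighter assertion could look like the sketch below. `EXPECTED_CAP_BYTES` and `CAP_OVERHEAD_BYTES` mirror the names and defaults mentioned in the follow-up commit on this PR; `check_cap` and `STORED_BYTES` are hypothetical names, and the stored value would come from the core's reported `perCg[CG_ID].bytes`.

```shell
#!/usr/bin/env bash
# Sketch: assert stored bytes against the configured cap plus a small
# allowance, rather than a loose 2 MiB constant.
EXPECTED_CAP_BYTES="${EXPECTED_CAP_BYTES:-1048576}"   # 1 MiB default cap
CAP_OVERHEAD_BYTES="${CAP_OVERHEAD_BYTES:-65536}"     # 64 KiB framing allowance

check_cap() {
  local stored=$1
  local ceiling=$(( EXPECTED_CAP_BYTES + CAP_OVERHEAD_BYTES ))
  if [ "$stored" -le 0 ]; then
    echo "FAIL: core stored 0 bytes (burst never reached it)"
    return 1
  fi
  if [ "$stored" -gt "$ceiling" ]; then
    echo "FAIL: stored=$stored exceeds ceiling=$ceiling"
    return 1
  fi
  echo "OK: stored=$stored within ceiling=$ceiling"
}

# STORED_BYTES is a placeholder value for illustration only.
STORED_BYTES="${STORED_BYTES:-1000000}"
check_cap "$STORED_BYTES" || exit 1
```

With these defaults the ceiling is 1114112 bytes, so a store that settles at 2 MiB is rejected rather than accepted.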

github-actions · 2026-05-24T23:49:15Z

+if [ -f "$CORE_LOG" ]; then
+  REJ_COUNT=$(tail -c +"$((LOG_OFFSET + 1))" "$CORE_LOG" 2>/dev/null | grep -c "Host-mode rejected pre-reg envelope" || true)
+  log "Rejection lines since burst: $REJ_COUNT"
+  if [ "$REJ_COUNT" = "0" ]; then


🔴 Bug: this downgrades the missing `Host-mode rejected pre-reg envelope` signal to a warning, but the script claims to validate `DiscoveryRateLimit` as well as the byte cap. If curator-to-beacon binding or the rate limiter is broken, the byte clamp alone can still make the script report PASS. Once the burst exceeds the per-minute budget, make missing rejection logs a hard failure, or add an explicit precondition that the beacon binding is present before starting the burst.
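For reference, the offset-based count that the quoted lines perform can be reproduced in isolation as below. The temporary log file and its contents are made up for the demonstration, and `count_since_offset` is a hypothetical helper name; whether a zero count is a warning or a hard failure remains a policy choice for the harness.

```shell
#!/usr/bin/env bash
# Sketch: count log lines matching a pattern that appeared after a byte offset.
count_since_offset() {
  local log=$1 offset=$2 pattern=$3
  # tail -c +N starts output at byte N (1-based), so offset+1 skips old bytes.
  # grep -c prints 0 but exits 1 on zero matches, hence the `|| true` here.
  tail -c +"$(( offset + 1 ))" "$log" | grep -c -- "$pattern" || true
}

LOG=$(mktemp)
echo "Host-mode rejected pre-reg envelope (previous run)" > "$LOG"
LOG_OFFSET=$(wc -c < "$LOG" | tr -d ' ')   # record where the burst starts
echo "Host-mode rejected pre-reg envelope cg=demo" >> "$LOG"
echo "unrelated line" >> "$LOG"
echo "Host-mode rejected pre-reg envelope cg=demo" >> "$LOG"

REJ_COUNT=$(count_since_offset "$LOG" "$LOG_OFFSET" "Host-mode rejected pre-reg envelope")
echo "Rejection lines since burst: $REJ_COUNT"
rm -f "$LOG"
```

The line written before the offset was recorded is excluded, which is what keeps the harness re-runnable against a long-lived log.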

…, configured cap, honest scope

Addresses four Codex bugs flagged on PR #623:

1. Read response field that doesn't exist (prereg-bytecap-stress.sh:105)
   `/api/context-graph/create` (no `register: true`) returns `{ created, uri }`, not `{ id }`. Reading `.id` produced an empty string and the next [-n] check aborted the script before the burst ever ran, silently turning every devnet run into a no-op that reported success based on absent state.
   Fix: read `.created`.
2. `|| true` hid every write failure (prereg-bytecap-stress.sh:152)
   The burst loop swallowed every error from `/api/shared-memory/write`, so the script could complete a "burst" with zero envelopes ever emitted, hit phase 4 with the core empty, and warn-but-pass.
   Fix: drop `|| true`, count successful writes (triplesWritten=1), require ≥1 before proceeding to phase 4.
3. CAP_CEILING was 2 MiB instead of configured cap (prereg-bytecap-stress.sh:178)
   `enforceLimitsAfterAppend` guarantees `survivorBytes ≤ perCgByteCap` after every oversized append. The configured default for pre-reg CGs is 1 MiB. Accepting anything ≤ 2 MiB would let a doubled-cap regression pass undetected.
   Fix: assert against `EXPECTED_CAP_BYTES + CAP_OVERHEAD_BYTES` (1 MiB + 64 KiB framing slop by default). Both override-able via env for operators running non-default configs.
4. Over-claimed DiscoveryRateLimit coverage (prereg-bytecap-stress.sh:192)
   The script header claimed to validate BOTH the byte cap AND the rate limiter. The byte cap is the authoritative control; explicit rejection log lines are observability, and demanding them was a false-negative risk if the cap absorbed the burst before the rate limiter kicked in.
   Fix: header re-scoped to "byte cap enforced + observable rejections" (the rate limiter is complementary, not authoritative). The grep step is now logging-only, the byte cap assertion is hard.

Bonus: phase 4 hard-fails when the core stored 0 bytes (previously warn-only), preventing silent passes when the gossip path didn't engage at all.

Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions · 2026-05-25T00:12:32Z

+    SUCCESSFUL_WRITES=$((SUCCESSFUL_WRITES + 1))
+  fi
+done
+[ "$SUCCESSFUL_WRITES" -gt 0 ] || fail "no SWM writes succeeded — can't validate cap enforcement (precondition broken)"


🔴 Bug: Requiring only `>0` successful writes makes the cap assertion vacuous. With the default payload size, anything up to 12 successful writes stays under the 1 MiB limit, so phase 4 can pass even if pruning never ran. Gate this test on the successfully submitted bytes actually exceeding `EXPECTED_CAP_BYTES` (or track admitted bytes directly) before treating a pass as proof that the byte cap enforced.
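The arithmetic behind the "up to 12 writes" figure, and a gate built on it, can be sketched as follows. `min_writes_to_exceed` and `burst_exceeds_cap` are hypothetical helper names; the 80 KiB payload and 1 MiB cap are the defaults stated in the PR description.

```shell
#!/usr/bin/env bash
# Sketch: only treat the cap assertion as meaningful when the bytes actually
# submitted exceed the configured cap.
EXPECTED_CAP_BYTES="${EXPECTED_CAP_BYTES:-1048576}"   # 1 MiB default cap
WRITE_PAYLOAD_BYTES="${WRITE_PAYLOAD_BYTES:-81920}"   # 80 KiB default payload

min_writes_to_exceed() {
  # Smallest n such that n * payload > cap (integer division rounds down).
  local payload=$1 cap=$2
  echo $(( cap / payload + 1 ))
}

burst_exceeds_cap() {
  local successful_writes=$1 payload=$2
  [ $(( successful_writes * payload )) -gt "$EXPECTED_CAP_BYTES" ]
}

NEEDED=$(min_writes_to_exceed "$WRITE_PAYLOAD_BYTES" "$EXPECTED_CAP_BYTES")
echo "need at least $NEEDED successful writes of $WRITE_PAYLOAD_BYTES bytes"
```

With the defaults, 12 × 81920 = 983040 bytes stays under 1048576, while 13 × 81920 = 1064960 exceeds it, so the gate would require at least 13 successful writes.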

github-actions · 2026-05-25T00:12:32Z

+act "6. Confirm the core process is still alive"
+# ===========================================================================
+CORE_PIDFILE=$(node_pidfile "$CORE_NODE")
+if [ -f "$CORE_PIDFILE" ]; then


🔴 Bug: If `daemon.pid` is missing, this branch is skipped and the script still prints PASS plus `Core process: still alive after burst`. That turns a missing/stale pidfile into a false positive for the crash check. Fail when the pidfile is absent/empty, or fall back to an HTTP liveness probe such as `/api/status` before reporting success.
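A liveness check that never silently skips might be structured like the sketch below. `core_is_alive` and `http_probe` are hypothetical names, the status URL is whatever the devnet exposes for `/api/status`, and treating a pidfile that names a dead process as a failure is an assumption of this sketch rather than confirmed script behaviour.

```shell
#!/usr/bin/env bash
# Sketch: pidfile check with an HTTP fallback, so a missing pidfile can never
# be reported as "still alive".
http_probe() {
  # Succeeds only on a successful HTTP response within 5 seconds.
  curl -fsS --max-time 5 "$1" >/dev/null 2>&1
}

core_is_alive() {
  local pidfile=$1 status_url=$2 pid=""
  if [ -s "$pidfile" ]; then
    pid=$(tr -d '[:space:]' < "$pidfile")
    # kill -0 delivers no signal; it only tests that the process exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
      return 0
    fi
    return 1   # pidfile names a process that is gone: treat as crashed
  fi
  # Pidfile missing or empty (e.g. containerised devnet): probe over HTTP.
  http_probe "$status_url"
}
```

A caller would then report success only when the function returns 0, for example `core_is_alive "$CORE_PIDFILE" "$STATUS_URL" || fail "core not alive"`, where `STATUS_URL` is an assumed variable.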

branarakic · 2026-05-25T15:29:33Z

Superseded by PR #649 (release: rc.10 testnet-ready cut). All commits from this PR are now on main via #649. Unaddressed Codex review feedback (C3 prereg-bytecap stress harness reliability) is being tracked + fixed in a dedicated post-rc.10 followup PR.

…38 scripts

The four LU-6 devnet scripts (#621/#622/#623/#624) shipped with several control-flow gaps that let regressions silently sneak through: errors swallowed with `|| true`, fixed sleeps where bounded retries were needed, and assertions that passed even when the scenario under test never happened. Codex's review on the closed PRs flagged 24 specific items; the rc.10 integration merge fixed most of them. This commit closes the remaining ones, namely the ones that materially affect whether a PASS is meaningful.

devnet-test-rfc38-revocation.sh (#621):

- Member pre-creates (M1, M2) no longer swallow EVERY error with `|| true`. The new `member_pre_create` helper captures the response and tolerates ONLY the idempotent "already exists" signal; any other failure (wrong auth, malformed body, daemon down) now fails the script immediately with the actual error visible, instead of surfacing later as an opaque catchup timeout.
- Phase 5's single "sleep 3 + one-shot read" replaced with a 30s bounded-retry loop (`wait_for_count_or_steady`); gossip / catchup latency between the post-revocation write and a member's final triple count is variable, and the one-shot snapshot was reporting M1's partial state as a regression.
- Added the forward-only-rotation lower bound: `M2_FINAL >= M2_PRE`. The pre-existing `<= 3` check alone would pass if revocation also wiped M2's previously-decryptable triples, violating the contract in the script header that the kicked member RETAINS what they could already decrypt; they just stop learning anything new.

devnet-test-rfc38-curator-offline-midbatch.sh (#622):

- Phase 1's `sleep 3` after member pre-create replaced with an explicit `wait_for_m1_onchain_id` poll. Phase 5's non-curator publish requires M1 to have observed the CG's `onChainId`, otherwise it bounces with "Context graph ... is not registered on-chain"; gossip lag for the `ContextGraphCreated` event can easily exceed 3s under devnet load.
- EXIT trap now waits up to 60s for `/api/status` to respond before declaring the curator healthy again. `devnet.sh restart-node` returns after spawning the daemon, not after it's actually ready to serve, so CI runners that chain another scenario immediately were inheriting a half-started devnet.

devnet-test-rfc38-prereg-bytecap-stress.sh (#623):

- Added a precondition that the burst's SUBMITTED_BYTES actually exceeds the configured cap. With the default 80 KiB payload size anything up to 12 writes stayed under the 1 MiB cap, so the downstream clamp assertion was vacuously satisfied if anyone misconfigured WRITES_COUNT/WRITE_PAYLOAD_BYTES, a TEST bug passing as a daemon PASS.
- Phase 6's liveness check no longer silently skips when the pidfile is missing/empty. Falls back to a `/api/status` HTTP probe so containerised devnets (where pidfiles aren't written) still get a meaningful liveness check; hard-fails when both pidfile AND status are unreachable.

devnet-test-rfc38-unclean-restart.sh (#624):

- Captures the killed core's `peerId` from `/api/status` BEFORE the SIGKILL. The post-restart catchup calls in phase 6 now pin to that peerId so we're explicitly exercising recovery from the restarted node; previously the catchup fanned out to any connected peer and could pass by pulling data from the curator (still online), validating nothing about the unclean-restart contract.
- Phase 3 now waits for STRICT mid-batch state (`0 < M1_PARTIAL < WRITES_COUNT`) before the SIGKILL. The previous one-shot snapshot accepted any partial value including 0 or already-complete; both meant the kill below never actually exercised the `lastHostCatchupSeqno` resume path this test claims to cover.
- Catchup responses are captured (not discarded to `/dev/null`) and asserted free of `error` / `swmError` / `durableError` via the new `assert_catchup_clean` helper. HTTP 500s, auth denials, host-catchup failures, etc. were previously invisible; the final triple-count check would still go green if data arrived via background gossip.

Bash-syntax-checked (`bash -n` on all four). No behaviour change when the underlying scenarios are healthy; the changes only convert silent false-positives into loud failures.

Co-authored-by: Cursor <cursoragent@cursor.com>
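The "bounded retry instead of fixed sleep" pattern that recurs in this commit can be sketched generically as below. `wait_until` and `check_ready` are hypothetical names and do not reproduce the actual `wait_for_count_or_steady` or `wait_for_m1_onchain_id` helpers, whose bodies are not shown in this thread.

```shell
#!/usr/bin/env bash
# Sketch: re-run a check until it succeeds or a timeout (seconds) elapses.
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( SECONDS + timeout ))   # SECONDS is a bash built-in timer
  while true; do
    if "$@"; then
      return 0                              # condition met within the budget
    fi
    if [ "$SECONDS" -ge "$deadline" ]; then
      return 1                              # give up loudly, not silently
    fi
    sleep "$interval"
  done
}

# Demo probe: becomes ready on the third attempt. A real probe would query the
# node, for example by checking a triple count or an HTTP status endpoint.
ATTEMPTS=0
check_ready() {
  ATTEMPTS=$(( ATTEMPTS + 1 ))
  [ "$ATTEMPTS" -ge 3 ]
}

if wait_until 30 1 check_ready; then
  echo "ready after $ATTEMPTS attempts"
else
  echo "timed out after $ATTEMPTS attempts" >&2
  exit 1
fi
```

Unlike a fixed `sleep 3`, the loop returns as soon as the condition holds and fails explicitly when the budget is exhausted.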

branarakic requested review from Jurij89 and zsculac as code owners May 24, 2026 23:47

branarakic closed this May 25, 2026

branarakic mentioned this pull request May 25, 2026

fix: post-rc.10 codex follow-up sweep + libp2p nodeVersion broadcast #653

Merged

11 tasks

branarakic wants to merge 2 commits into feat/ot-rfc-38-lu6-host-mode from feat/lu6-followup-c3-prereg-bytecap-stress
