Skip to content

fix(simulator): bump pre-init buffer 1.05x -> 2.0x to fix CI#192

Closed
meyer9 wants to merge 1 commit into
mainfrom
meyer9/fix-simulator-preinit-buffer
Closed

fix(simulator): bump pre-init buffer 1.05x -> 2.0x to fix CI#192
meyer9 wants to merge 1 commit into
mainfrom
meyer9/fix-simulator-preinit-buffer

Conversation

@meyer9
Copy link
Copy Markdown
Collaborator

@meyer9 meyer9 commented Jun 1, 2026

Problem

The basic-benchmarks job in .github/workflows/public-benchmarks.yaml has been failing on every push to main since 2026-05-15 (PR #184). Symptom:

execution reverted: Not enough accounts to load/update

Reproduces deterministically on base-mainnet-simulation @ 25M gas (the first matrix cell, run against both reth and geth), about 7-8 benchmark blocks into a 20-block run. CI history:

Run Date Commit Result
26758739814 2026-06-01 b5290a8 (#191) ❌ failure
26655072445 2026-05-29 4f98307 (#186) ❌ failure
26525697996 2026-05-27 (#188) ❌ failure
...
25935745254 2026-05-15 cf47296 (#184) first failure
25870299556 2026-05-14 cb5b9d6 ✅ last green

The contract revert is at Simulator.sol:82 — a benchmark-fidelity check that current_address_index + load + update <= num_address_initialized.

Mechanism

PR #184 (cf47296) changed targetCalls in the simulator worker:

-targetCalls := uint64(math.Ceil(float64(t.numCallsPerBlock) * t.scaleFactor))
+targetCalls := t.numCallsPerBlock

This was correct for scaleFactor > 1 (the 200M-gas case the PR was targeting). But for scaleFactor < 1 (the 25M case from base-mainnet-simulation, where scaleFactor = 25M / 35M = 0.714), the change INCREASES per-block consumption from ceil(46 × 0.714) = 33 calls/block to 46 calls/block.

Pre-init is sized as:

configForAllBlocks = base × (numCallsPerBlock × NumBlocks × scaleFactor × 1.05)

This gives a uniform 5% buffer post-PR-#184. The buffer is mathematically sufficient for an ideal run, but in practice the boundary is too tight: rounding accumulation across multiple Stats fields (per-tx blockCounts.Round(), big.Int.Int64() truncation, etc.) plus the deterministic per-tx layout pushes consumption past the safety margin well before block 20.

Fix

Bump the multiplier from 1.05 to 2.0. Pre-init now allocates ~2× the predicted consumption. Pre-init runs during setup as a one-time cost (dominated by mineAndConfirm RPC batching), so the wall-clock impact is a few seconds per cell — negligible against the ~5 min/cell, ~1 h/job total runtime.

-configForAllBlocks, err := t.payloadParams.Mul(float64(t.numCallsPerBlock) * float64(t.params.NumBlocks) * t.scaleFactor * 1.05).ToConfig()
+// 2.0x safety multiplier (was 1.05). ...
+configForAllBlocks, err := t.payloadParams.Mul(float64(t.numCallsPerBlock) * float64(t.params.NumBlocks) * t.scaleFactor * 2.0).ToConfig()

Verification

  • go build ./runner/... clean
  • go vet ./runner/payload/simulator/... clean
  • go test ./runner/payload/simulator/... -timeout 60s clean (existing mineAndConfirmBatchSize tests pass; no test covers testForBlocks pre-init sizing directly)
  • Mechanism is symmetric: only the SIZE of the pre-init batch changes, not what it does. Setup still runs InitializeAddressChunk / InitializeStorageChunk exactly the same way.

Out of scope (follow-up)

While digging I noticed two real bugs worth separate PRs:

  1. OpcodeStats.Sub/Add iterate other rather than s ∪ other (types.go:48,38) → per-tx blockCounts.Precompiles is always empty in sendTxs → per-tx actual gas is much lower than the gas value calcNumCalls uses → block utilization is far below the configured gas limit even when the require passes.
  2. Self-recalibrating numCallsPerBlock based on observed on-chain consumption would be more principled than the static safety multiplier and would also fix the base-mainnet-simulation @ 250M under-fill I documented in the gas-limit investigation. Likely a one-method addition to simulatorPayloadWorker.

Both are outside the scope of "make CI green again", which this PR does.

Draft because end-to-end validation requires one CI run.

The basic-benchmarks job in public-benchmarks.yaml has been failing on every
push to main since cf47296 (PR #184, May 15) with:
  execution reverted: Not enough accounts to load/update

Reproduces consistently for base-mainnet-simulation @ 25M gas (the first cell
of the matrix, run on both reth and geth). Fails roughly 7-8 benchmark blocks
into a 20-block run.

Root mechanism: PR #184 changed targetCalls from
  ceil(numCallsPerBlock * scaleFactor)  // pre-fix
to
  numCallsPerBlock                       // post-fix
to address an overshoot for scaleFactor > 1 (e.g. 200M gas case). For
scaleFactor < 1 (small gas limits like 25M), this raised on-chain consumption
within the existing 5% pre-init buffer. CI's base-mainnet-simulation @ 25M
sits right at the boundary where rounding accumulation + per-field interaction
overruns the 5% margin and trips the contract's
  current_address_index + load + update <= num_address_initialized
require well before block 20.

Bumping the multiplier to 2.0x removes the boundary entirely. Pre-init runs
during setup (one-time cost, dominated by mineAndConfirm batching), so this
adds maybe a few seconds of setup wall time per test cell — negligible vs the
~5 minutes per cell + 1h total job time.

Out of scope for this PR but noted for follow-up:
- OpcodeStats.Sub/Add iterate 'other' rather than the union of s and other,
  which causes per-tx blockCounts.Precompiles to always be empty in sendTxs.
  Result: per-tx actual gas is much lower than gas_per_call estimate, so the
  benchmark under-utilizes the gas budget. Not the cause of THIS CI failure,
  but should be fixed.
- The contract require is a benchmark-fidelity check, not a protocol
  constraint; making the worker recalibrate numCallsPerBlock from observed
  on-chain consumption would be a more principled fix than relying on a
  static safety multiplier.
@cb-heimdall
Copy link
Copy Markdown
Collaborator

🟡 Heimdall Review Status

Requirement Status More Info
Reviews 🟡 0/1
Denominator calculation
Show calculation
1 if user is bot 0
1 if user is external 0
2 if repo is sensitive 0
From .codeflow.yml 1
Additional review requirements
Show calculation
Max 0
0
From CODEOWNERS 0
Global minimum 0
Max 1
1
1 if commit is unverified 0
Sum 1

@meyer9
Copy link
Copy Markdown
Collaborator Author

meyer9 commented Jun 1, 2026

Superseded by #194 which combines this buffer bump with the OpcodeStats Add/Sub fix and observed-gas recalibration into one coordinated change.

@meyer9 meyer9 closed this Jun 1, 2026
@meyer9 meyer9 deleted the meyer9/fix-simulator-preinit-buffer branch June 1, 2026 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants