prefetcher: builder-phase prefetch + streaming worker pool #2192
Replace per-call Prefetch() invocations with a long-running PrefetchStream that spans the block lifetime. A fixed worker pool pulls transactions from a channel, eliminating pool-startup overhead across the ~500 builder-mode and ~20 idle-mode calls previously issued per block. Idle and builder phases share the same stream, acting as swappable tx providers. Phase handoff uses a two-signal interrupt: hardKill for permanent exit and evmAbort for soft, per-phase aborts that discard in-flight idle work before the builder provider takes over. Prefetch(block, ...) keeps its original signature as a thin wrapper so blockchain.go behavior stays byte-identical.
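The two-signal handoff described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `streamPool`, `runBlock`, the `skipped` counter, and the integer "transactions" are hypothetical stand-ins, and the wait on `drained+skipped` is one assumed way to make sure no stale idle item survives the handoff.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// streamPool is a hypothetical stand-in for a long-lived prefetch pool.
type streamPool struct {
	hardKill atomic.Bool  // permanent exit: workers return
	evmAbort atomic.Bool  // soft abort: workers discard items but stay alive
	executed atomic.Int64 // items that were actually run
	skipped  atomic.Int64 // items discarded by workers during an abort
	wg       sync.WaitGroup
}

// start launches a fixed number of workers that live until txs is closed.
func (p *streamPool) start(workers int, txs <-chan int) {
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for range txs {
				if p.hardKill.Load() {
					return
				}
				if p.evmAbort.Load() {
					p.skipped.Add(1)
					continue
				}
				p.executed.Add(1) // stand-in for executing the tx on a state copy
			}
		}()
	}
}

// runBlock queues staleIdle items during the abort window, performs the
// handoff, then feeds builder items to the same workers.
func runBlock(staleIdle, builder int) (executed, discarded int64) {
	txs := make(chan int, staleIdle+builder)
	p := &streamPool{}
	p.start(4, txs)

	p.evmAbort.Store(true)
	for i := 0; i < staleIdle; i++ {
		txs <- i
	}
	// Drain without blocking until every stale item is accounted for,
	// either drained here or skipped by a worker.
	drained := int64(0)
	for drained+p.skipped.Load() < int64(staleIdle) {
		select {
		case <-txs:
			drained++
		default:
			runtime.Gosched()
		}
	}
	p.evmAbort.Store(false) // workers resume, now fed by the builder provider

	for i := 0; i < builder; i++ {
		txs <- i
	}
	close(txs)
	p.wg.Wait()
	return p.executed.Load(), drained + p.skipped.Load()
}

func main() {
	executed, discarded := runBlock(50, 20)
	fmt.Println("executed:", executed, "discarded:", discarded)
}
```

The workers are started once and never restarted; only the producer changes between phases, which is the property the description relies on to avoid pool-startup overhead.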
…mers, builder contribution metric
Correlate tail latency with prefetch miss rate so we can tell whether the
slowest block txs are the ones we failed to prefetch or whether prefetch
quality is not the lever.
Three additions:
* slow_tx_tracker 10-min log: each of the top-K slowest txs is now
annotated inline with MGas/s and a prefetched/not-prefetched flag.
MGas/s uses integer math on nanoseconds — per-tx durations are in
the tens of microseconds and float seconds lose precision.
* worker/txApplyDuration/{prefetched,notPrefetched}: side-by-side p50/
p95/p99 of the two populations. The aggregate worker/txApplyDuration
stays for dashboard compatibility.
* worker/prefetch/builder_added_percent: share of block txs that the
builder phase prefetched, isolating the payoff of the new builder-
phase prefetch beyond the existing overall miss-rate metric.
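As an illustration of the integer MGas/s computation mentioned in the first bullet, here is a minimal sketch. The helper name `mgasPerSecond` and its signature are assumptions made for this example; the identity used is gas / 1e6 / (ns / 1e9) = gas * 1000 / ns.

```go
package main

import (
	"fmt"
	"time"
)

// mgasPerSecond converts gas used over an elapsed time in nanoseconds
// into MGas/s using integer arithmetic only.
// Hypothetical helper, not the PR's function.
func mgasPerSecond(gasUsed uint64, elapsedNs int64) uint64 {
	if elapsedNs <= 0 {
		return 0 // avoid division by zero on unmeasurably short durations
	}
	return gasUsed * 1000 / uint64(elapsedNs)
}

func main() {
	d := 30 * time.Microsecond
	// A 21000-gas transfer applied in 30µs.
	fmt.Println(mgasPerSecond(21000, d.Nanoseconds()))
}
```

Multiplying before dividing keeps the result exact to the integer for durations in the tens of microseconds, without converting to seconds first.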
Sizes the buffer to the idle provider's theoretical per-loop burst (IdleGasBudget / 21k ≈ 4762 at default config) so the scan never blocks on a full channel. Workers drain at ~200k tx/s so the channel sits mostly empty in practice; the buffer exists to absorb the brief scan-phase burst where idle dumps a heap's worth of txs. Memory cost: 4096 * 8 B + channel header ≈ 33 KB, negligible. Handoff drain is O(actual-occupancy), not capacity, so a larger buffer doesn't slow phase transitions.
@claude review once
Claude finished @lucca30's task in 3m 52s
Code Review: prefetcher builder-phase prefetch + streaming worker pool
Overall Assessment
This is a well-structured PR that introduces a builder-phase prefetch path alongside the existing idle-phase prefetcher. The architecture is clean: a single long-lived worker pool shared across phases, with a soft-interrupt (
No critical or high-severity issues found. One medium finding on metric attribution accuracy, and several low/nit observations below.
Medium
1. Race window in builder-phase metric attribution (
…mports fixes Review feedback: the old onSuccess closure checked genParams.builderStarted.Load() to decide whether to attribute a successful prefetch to the builder phase. That check races with buildAndCommitBlock's builderStarted.Store(true), which fires before runPrefetcher reaches the handoff — an idle-phase tx whose EVM execution finishes in the gap between those two moments would be miscounted as builder. Impact was metric-only but the old comment overstated the guarantee. Route attribution through a dedicated inBuilderPhase *atomic.Bool that the coordinator flips to true only after the handoff completes (evmAbort drain + reset). Any onSuccess firing after that point is known to come from post-handoff work, so builder_added_percent now reflects genuine builder-phase contribution. Also fix goimports formatting in core/state_prefetcher.go and miner/worker.go flagged by CI lint.
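The attribution change can be pictured with a small sketch. The type `phaseTracker` and the counters are hypothetical; only the idea of gating on a flag that the coordinator flips after the handoff, rather than on `builderStarted`, comes from the commit message.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// phaseTracker is a hypothetical model of the two flags involved.
type phaseTracker struct {
	builderStarted atomic.Bool // set early by the block builder
	inBuilderPhase atomic.Bool // set by the coordinator once the handoff is done
	idleCount      atomic.Int64
	builderCount   atomic.Int64
}

// onSuccess attributes a completed prefetch to the phase it ran under.
// It deliberately ignores builderStarted, which can flip too early.
func (p *phaseTracker) onSuccess() {
	if p.inBuilderPhase.Load() {
		p.builderCount.Add(1)
		return
	}
	p.idleCount.Add(1)
}

// completeHandoff is called only after the abort, drain and reset steps.
func (p *phaseTracker) completeHandoff() { p.inBuilderPhase.Store(true) }

// simulate replays the gap described in the review feedback.
func simulate() (idle, builder int64) {
	p := &phaseTracker{}
	p.onSuccess()                // plain idle-phase completion
	p.builderStarted.Store(true) // builder announces itself
	p.onSuccess()                // idle tx finishing in the gap: still idle
	p.completeHandoff()
	p.onSuccess() // genuine builder-phase completion
	return p.idleCount.Load(), p.builderCount.Load()
}

func main() {
	idle, builder := simulate()
	fmt.Println("idle:", idle, "builder:", builder)
}
```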
Code review
Found 1 issue. Checked for bugs and CLAUDE.md compliance.
Bug: Spin loop in
File:
When
The comment at line 2412 says Dropped sends (buffer full) are not retried -- but the current behavior is worse than not retrying: it burns through every remaining transaction doing useless work.
Suggested fix: Add
Review feedback: when the stream channel fills mid-batch, the default branch in streamIdleBatch was dropping the tx but still calling txs.Shift() and continuing to walk the heap. Since dropped sends don't subtract from the gas budget or populate localPrefetched, the outer loop's viability check (nextViableIdleTx) kept returning valid txs, so every remaining entry burned Peek + Shift cycles only to drop. Return on the default branch instead. The outer runIdleTxProvider loop re-snapshots the pool every ~100ms, by which time workers have drained the channel and there's capacity for new sends. Dropping one tx per batch and re-entering the outer loop is strictly cheaper than walking the rest of the heap dropping everything.
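A minimal sketch of the return-on-full behaviour, assuming a buffered channel and a slice in place of the real tx heap. The name `streamBatch` and the integer items are illustrative only.

```go
package main

import "fmt"

// streamBatch forwards pending items with non-blocking sends and stops
// at the first full-buffer drop instead of walking the rest of the batch.
// Hypothetical simplification of the idle batch loop.
func streamBatch(out chan<- int, pending []int) int {
	sent := 0
	for _, tx := range pending {
		select {
		case out <- tx:
			sent++
		default:
			// Buffer full: give up on this batch; the caller retries later
			// once consumers have drained the channel.
			return sent
		}
	}
	return sent
}

func main() {
	out := make(chan int, 3)
	pending := []int{1, 2, 3, 4, 5}

	n := streamBatch(out, pending)
	fmt.Println("first pass sent:", n)

	// Simulate workers draining the channel before the next outer loop.
	for i := 0; i < n; i++ {
		<-out
	}
	fmt.Println("second pass sent:", streamBatch(out, pending[n:]))
}
```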
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
… resolves
Review feedback on buildTxPlan / scanOverflow / collectPlanBatch:
* buildTxPlan decremented remaining before calling ltx.Resolve(), so when a LazyTransaction failed to resolve (tx evicted between heap listing and resolution) the budget was consumed for a tx that never entered the plan. Reorder: charge gas only after Resolve succeeds, matching scanOverflow. The prefetched-skip path keeps consuming gas intentionally, since those txs are still bound for the block.
* scanOverflow and collectPlanBatch called prefetchedHashes.Load() without a nil guard, while the sibling buildTxPlan explicitly guarded. Production is safe because commitWork always initializes the map, but the API inconsistency is a real footgun for unit tests or future callers. Add the missing guards so all three functions accept a nil *sync.Map as "no hashes known, don't skip".
No functional change in the current call paths; tests unchanged.
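Both fixes can be illustrated together in a short sketch. Everything here is hypothetical (`lazyTx`, `known`, `hashSet`, `buildPlan`); in particular, stopping at the first tx that exceeds the remaining budget is an assumption of this example, not something the commit message states.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyTx is a hypothetical stand-in for a lazily resolved pool entry.
type lazyTx struct {
	hash    string
	gas     uint64
	evicted bool
}

// resolve fails when the tx was evicted after the heap was listed.
func (l lazyTx) resolve() (string, bool) {
	if l.evicted {
		return "", false
	}
	return l.hash, true
}

// known treats a nil map as "no hashes known, don't skip".
func known(m *sync.Map, hash string) bool {
	if m == nil {
		return false
	}
	_, ok := m.Load(hash)
	return ok
}

// hashSet builds a *sync.Map from hashes (test convenience).
func hashSet(hashes ...string) *sync.Map {
	m := &sync.Map{}
	for _, h := range hashes {
		m.Store(h, struct{}{})
	}
	return m
}

// buildPlan charges gas only after a successful resolve; already
// prefetched txs still consume budget because they remain block-bound.
func buildPlan(txs []lazyTx, budget uint64, prefetched *sync.Map) []string {
	var plan []string
	remaining := budget
	for _, ltx := range txs {
		if ltx.gas > remaining {
			break // assumption: stop at the first tx that does not fit
		}
		if known(prefetched, ltx.hash) {
			remaining -= ltx.gas
			continue
		}
		hash, ok := ltx.resolve()
		if !ok {
			continue // evicted: no gas charged
		}
		remaining -= ltx.gas
		plan = append(plan, hash)
	}
	return plan
}

func main() {
	txs := []lazyTx{{"a", 40, false}, {"b", 40, true}, {"c", 40, false}}
	fmt.Println(buildPlan(txs, 100, nil))
	fmt.Println(buildPlan(txs, 100, hashSet("a")))
}
```

With a budget of 100, charging the evicted `b` before resolving it would leave only 20 gas and push `c` out of the plan; charging after resolve keeps `c` in.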
@claude Review it again. Do a careful analysis of the block-building lifecycle and of how prefetch interacts with block building. Check whether any code introduced here may lead to a panic, dead code, or a performance regression. This PR aims to widen the prefetching window by also prefetching while the block is being built. Some mechanisms were introduced, such as buildTxPlan, to better predict the txs most likely to be included so we can improve our prefetch precision.
Claude finished @lucca30's task in 9m 23s
Deep Review: Builder-Phase Prefetch + Block Building Lifecycle
Note: The three findings from the prior reviews (buildTxPlan gas-before-resolve, nil guards, idle batch spin loop) have all been properly fixed in commits
Block Building ↔ Prefetch Lifecycle Trace
Traced the full flow to verify correctness of the two-phase design:
Lifecycle verdict: The sequencing is sound. Channel creation precedes
Panic Analysis
Findings
1. 🔴 Worker goroutines in
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #2192 +/- ##
===========================================
+ Coverage 52.58% 52.68% +0.10%
===========================================
Files 885 885
Lines 156286 156673 +387
===========================================
+ Hits 82179 82544 +365
- Misses 68845 68863 +18
- Partials 5262 5266 +4
... and 22 files with indirect coverage changes
Review feedback: worker goroutines in PrefetchStream ran without their own recover. Go's recover only catches panics in its own goroutine — the parent runPrefetcher's recover does not protect the children. Any panic in statedb.Copy / ApplyMessage / IntermediateRoot / precompile edge cases would therefore kill the node even though prefetching runs on a throwaway state copy and is best-effort by definition. Add defer/recover per worker (new chain/prefetch/worker/panic meter) and match it on the sendPlan goroutine in fillTransactions (reuses the existing worker/prefetch/panic meter). Also documents the residual attribution edge where a worker stuck in IntermediateRoot (not interruptible by evmAbort) can still reach onSuccess after inBuilderPhase flips. Metric-only impact, bounded by at most one tx per block; added a comment instead of a runtime fix since the window is tiny in practice and closing it would require a more invasive barrier.
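The point about recover being goroutine-local can be sketched as below. The names `runOne` and `runPool` are hypothetical, and wrapping each job (rather than the whole worker loop) is an assumption of this example so that a worker survives a panicking item; the commit message does not say where exactly the PR places its defer.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runOne executes a single job and converts a panic into a failure count.
// The recover must live in the same goroutine that runs the job.
func runOne(job func(), fails *atomic.Int64) {
	defer func() {
		if r := recover(); r != nil {
			fails.Add(1)
		}
	}()
	job()
}

// runPool drains jobs with a fixed number of workers and reports how many
// jobs were processed and how many of those panicked.
func runPool(workers int, jobs []func()) (processed, fails int64) {
	ch := make(chan func(), len(jobs))
	for _, j := range jobs {
		ch <- j
	}
	close(ch)

	var p, f atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range ch {
				p.Add(1)
				runOne(job, &f)
			}
		}()
	}
	wg.Wait()
	return p.Load(), f.Load()
}

func main() {
	jobs := []func(){
		func() {},
		func() { panic("simulated precompile edge case") },
		func() {},
	}
	processed, fails := runPool(2, jobs)
	fmt.Println("valid:", processed-fails, "invalid:", fails)
}
```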
The block building lifecycle looks correct: channel setup happens before , prevents sends-after-close, and the defer on covers all exit paths. No panics, dead code, or correctness regressions found. The two new nit-level issues found by this review (gas limit overestimation in the second call, and a metrics overcount for interrupted txs) are attached as inline comments — they affect only prefetch efficiency and metric accuracy, not block building correctness. The spin loop and nil-guard issues from prior reviews remain unaddressed and worth fixing before merge.
Extended reasoning...
Overview
This PR introduces a builder-phase prefetch path over ~600 lines across , , , , and tests. The review focused on the block building lifecycle, potential panics, dead code, and performance regressions as requested.
Block Building Lifecycle
The sequencing is correct: creates and before calling , ensuring the prefetcher goroutine always finds a valid channel. before channel close prevents any send-to-closed-channel race from goroutines. The covers all exit paths from . No deadlock or goroutine leak paths were found.
New Issues Found (Nit-Level)
Two new nit-level bugs are attached as inline comments: (1) the second call uses the full block gas limit rather than the remaining gas after priority txs, causing to over-plan normal-tier txs that will be rejected — wasting prefetch worker capacity; (2) is incremented in before checks the interrupt flag, so early-abort txs inflate . Both are metrics/efficiency issues only; block building correctness is unaffected.
Security Risks
No security risks. The prefetch path operates on throwaway state copies and cannot affect committed block state. All channels are properly bounded (4096 buffer) with non-blocking sends, preventing DoS from a stuck consumer.
Level of Scrutiny
This is a production-critical path in the Bor block builder. The streaming worker pool, two-signal interrupt scheme, and idle→builder handoff are novel concurrency patterns that warrant human review before merge, especially given several issues identified across multiple review passes that remain unaddressed (spin loop in , nil guards in /, gas ordering in ).
…cution
Round 4 of review fixes, four independent issues:
* sendPlan gasLimit stale for the second call. The closure captured env.header.GasLimit once, but the second invocation (normal-tier txs) runs after commitTransactions has already consumed gas from env.gasPool. Plan was over-sized, wasting prefetch capacity on txs that the builder would reject. Thread gasLimit as an explicit argument and pass env.gasPool.Gas() on the second call (with a nil guard for the first, where env.gasPool is still nil).
* prefetchOneTx early-interrupt return skipped fails.Add(1). txIndex is incremented unconditionally in processTx, so every interrupt-aborted tx was counted as a successful prefetch in blockPrefetchTxsValidMeter. The function docstring already promises fails is bumped on every (0,false) return; the interrupt path was the one branch that didn't.
* runPrefetcher shutdown (evmAbort + close(txsCh) + <-streamDone) was sequential, so a panic in runIdleTxProvider or runBuilderTxProvider unwound past it. The PrefetchStream goroutine and its N workers would then block forever on `range txsCh` (hardKill is only checked after a dequeue). Wrap the close in a sync.Once-guarded closure and defer it so the channel is released on every exit path, panics included.
* Builder phase could double-execute an in-flight plan tx. scanOverflow checked prefetchedHashes, which is only written by onSuccess after EVM completion, so a tx forwarded in one batch but still mid-EVM when the next batch's overflow scan ran could be re-emitted and picked up by a second worker. Track locally-sent hashes in runBuilderTxProvider (single-threaded, plain map is fine) and have scanOverflow consult it alongside prefetchedHashes.
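The deferred, sync.Once-guarded shutdown from the third bullet might look roughly like this. `runProvider` and the channel names are illustrative assumptions; the recover in the sketch exists only so the example can report the panic instead of crashing.

```go
package main

import (
	"fmt"
	"sync"
)

// runProvider runs a producer against a consumer goroutine and guarantees
// the channel is closed on every exit path, including a producer panic.
func runProvider(provider func(chan<- int)) (consumed int, panicked bool) {
	txs := make(chan int, 8)
	streamDone := make(chan int)

	// Consumer: would block forever on range if txs were never closed.
	go func() {
		n := 0
		for range txs {
			n++
		}
		streamDone <- n
	}()

	var once sync.Once
	shutdown := func() { once.Do(func() { close(txs) }) }

	func() {
		defer func() {
			if r := recover(); r != nil {
				panicked = true
			}
		}()
		defer shutdown() // runs first during unwinding, so the close always happens
		provider(txs)
		shutdown() // normal path; the deferred call then becomes a no-op
	}()

	return <-streamDone, panicked
}

func main() {
	n, p := runProvider(func(out chan<- int) {
		out <- 1
		out <- 2
		panic("provider failed mid-phase")
	})
	fmt.Println("consumed:", n, "panicked:", p)
}
```

Because the close sits behind sync.Once, calling shutdown from both the normal path and the defer cannot double-close the channel.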
- scanOverflow: break instead of Pop on gas-too-large so accumulated
extendedBudget can cover price-leading accounts on later iterations;
popping permanently evicted them from the long-lived overflow heap.
- runBuilderTxProvider: populate sentThisPhase with the current plan
batch before scanOverflow so bonus collection can't re-emit a tx
already returned by collectPlanBatch in the same window.
- PrefetchStream worker recover: increment ctx.fails so the
{valid,invalid} meters stay consistent with txIndex when a panic
aborts prefetchOneTx mid-flight.
Covers the helpers that prior review rounds kept surfacing bugs in:
- scanOverflow: zero budget, budget growth preserves popped accounts (regression test for the h.Pop() → break fix), in-flight skip without budget cost, prefetched skip without budget cost.
- forwardTxs: sent-hash recording, dropped-on-full channel must not record, nil sentThisPhase is safe.
- collectPlanBatch: closed planCh, timer-fires-on-empty, freed-gas accumulation, prefetched filter, closed gas channel nilled out.
- runBuilderTxProvider: no-duplicate-forward invariant across interleaved planCh + freed-gas signals; regression test for the within-iteration dedup fix (sentThisPhase pre-population).
@claude ultra-review this PR. Context:
Please focus only on angles the above does not already cover:
1. Concurrency correctness. Trace every interaction between
2. Resource-accounting invariants, proven not just asserted:
3. Perf regression on the hot import path. Compare the new
4. Failure modes:
Scope exclusions:
Claude finished @lucca30's task in 5m 26s
Ultra-Review: Streaming Prefetcher Concurrency & Correctness
All prior review findings (88c4e44..827c017) verified as addressed. This review covers only the four scoped areas. One actionable finding; remainder is clean.
1. Concurrency Correctness
Traced all interactions between
Lost-wakeup between
Double-close of
Goroutine stall ordering
Panic during handoff that skips
Concurrent
No concurrency issues found.
2. Resource-Accounting Invariants
(a)
| Path | txIndex incremented? | fails incremented? | Counted? |
|---|---|---|---|
| processTx → prefetchOneTx succeeds | Yes (line 196) | No | valid |
| processTx → prefetchOneTx interrupt | Yes (line 196) | Yes (line 233) | invalid |
| processTx → prefetchOneTx error (any) | Yes (line 196) | Yes (lines 239, 247, 258) | invalid |
| processTx → worker panic in prefetchOneTx | Yes (line 196) | Yes (recover at line 137) | invalid |
| runWorker → evmAbort=true → continue | No | No | not counted (correct) |
| runWorker → hardKill=true → return | No | No | not counted (correct) |
At stream exit (line 147-149):
processed := ctx.txIndex.Load()
blockPrefetchTxsValidMeter.Mark(processed - ctx.fails.Load())
blockPrefetchTxsInvalidMeter.Mark(ctx.fails.Load())
valid + invalid = (processed - fails) + fails = processed = txIndex. Invariant holds on all exit paths. ✅
(b) No tx hash reaches txsCh twice in a single runBuilderTxProvider invocation
Three dedup layers:
- collectPlanBatch (line 2665-2666): checks prefetchedHashes.Load(tx.Hash()) and skips txs already completed by idle/prior builder work.
- sentThisPhase pre-populated (lines 2600-2602) with current plan-batch hashes before scanOverflow runs, which prevents within-iteration plan-vs-overflow duplicates (fixed in 62a72c0).
- forwardTxs (line 2624) records each successfully-sent tx in sentThisPhase, which prevents cross-iteration re-emission.
The one gap: forwardTxs drops on a full txsCh buffer (default branch, line 2626-2627), but the pre-populate at lines 2600-2602 already added the hash to sentThisPhase unconditionally. A dropped plan tx still has a sentThisPhase entry, so it won't be re-emitted by scanOverflow in future iterations. However, a dropped bonus tx from scanOverflow was h.Shift()-ed out of the overflow heap (line 1950) and its hash was not pre-populated in sentThisPhase (only plan txs are pre-populated). If it was added to sentThisPhase by forwardTxs on the successful-send branch, it's tracked. If it was dropped, it's gone from the heap and not in sentThisPhase — it can't reappear. So no hash can reach txsCh twice. ✅
(c) builderPrefetchedTxHashes is only written while inBuilderPhase==true
onSuccess at line 2363: if inBuilderPhase.Load() && .... inBuilderPhase is set to true at line 2406, strictly after evmAbort.Store(false) at line 2403. The residual edge case (worker completing IntermediateRoot during the handoff, acknowledged in comments at lines 2351-2356) could misattribute at most 1 tx per block. Documented and acceptable. ✅
3. Perf Regression on Hot Import Path
The import path (blockchain.go:782) calls Prefetch(block, throwaway, vmCfg, false, followupInterrupt), which now wraps PrefetchStream via a pre-filled channel.
Strictly new work per tx on the import path vs the old errgroup code:
- 1 channel send (non-blocking, buffer == len(txs)) + 1 channel receive per tx
- 1 txIndex.Add(1) atomic increment (old code used loop variable i directly)
- 2 nil-checks (evmAbort, onSuccess) that short-circuit immediately since both are nil
Total overhead: ~2-5 nanoseconds per tx. Negligible against statedb.Copy() + ApplyMessage + IntermediateRoot (~1-10ms each).
Structural improvement: Old code called statedb.Copy() in the dispatching goroutine before spawning work. New code calls statedb.Copy() inside the worker goroutine (prefetchOneTx line 244), allowing copies to happen concurrently — slightly lower latency under contention.
Worker pool size: Identical — max(1, 4*runtime.NumCPU()/5) in both old and new.
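For reference, the pool-size expression quoted above evaluates as follows. `poolSize` is a hypothetical helper written for this example; it spells out the integer division and the lower bound of one worker.

```go
package main

import (
	"fmt"
	"runtime"
)

// poolSize returns max(1, 4*numCPU/5) using integer division,
// i.e. roughly 80% of the cores with at least one worker.
func poolSize(numCPU int) int {
	n := 4 * numCPU / 5
	if n < 1 {
		return 1
	}
	return n
}

func main() {
	fmt.Println("workers on this machine:", poolSize(runtime.NumCPU()))
	fmt.Println("workers on 8 cores:", poolSize(8))
}
```

On an 8-core machine this gives 6 workers, which is consistent with the "~6 workers" figure used later in the review.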
No performance regression on the import path. ✅
4. Failure Modes
builderPlanCh closed before runBuilderTxProvider starts
builderPlanCh is created at line 2223 and closed at line 2274, both in buildAndCommitBlock (main goroutine). runBuilderTxProvider runs in the prefetcher goroutine. The ordering: builderStarted.Store(true) (line 2225) → runIdleTxProvider exits (sees builderStarted=true) → handoff → runBuilderTxProvider starts.
Can close(builderPlanCh) at line 2274 fire before runBuilderTxProvider starts? Yes — if fillTransactions + planWg.Wait() completes before the prefetcher goroutine reaches line 2408. In that case, collectPlanBatch reads from a closed planCh and gets builderDone=true immediately (line 2660). runBuilderTxProvider returns immediately. Any plan txs were buffered in the 4096-deep channel and consumed; any freed-gas signals were buffered in the 256-deep builderGasFreedCh. These are lost. This is acceptable: the entire block was already built; prefetching for an already-committed block is pointless. ✅
Empty txPool at buildOverflowHeap time
buildOverflowHeap (line 2635-2641) calls Pending() which returns an empty map. newTransactionsByPriceAndNonce with empty txs returns an empty heap. scanOverflow immediately breaks at line 2917 (ltx == nil). No bonus txs emitted. No crash, no wasted work. ✅
Stale pool snapshot (overflow heap built at T, consumed at T+100ms after pool churned)
buildOverflowHeap is called once at the start of runBuilderTxProvider (line 2576). The overflow heap is then scanned repeatedly over the 2ms batch windows. Pool churn means:
- Txs in the overflow heap may have been evicted from the pool → ltx.Resolve() returns nil → h.Pop() at line 1944, skipped gracefully.
- New txs arriving after the snapshot are invisible to the overflow scan → missed opportunity, but prefetch is best-effort. The plan channel covers the primary path (builder's own view).
No crash, no incorrect behavior. Stale snapshot is a deliberate design tradeoff acknowledged in the PR. ✅
Concurrent hardKill + evmAbort set from different goroutines mid-phase
Already covered in Analysis 1. Both are atomic.Bool; no ordering dependency. Workers prioritize hardKill. The shutdown() function sets evmAbort=true then close(txsCh), both inside sync.Once. If hardKill=true races with shutdown(), workers exit on whichever signal they see first. ✅
🟡 One Remaining Unfixed Finding
prepareWork early-return bypasses defer interruptPrefetch.Store(true) — zombie prefetcher on prepareWork failure
File: miner/worker.go lines 2209-2228
func (w *worker) buildAndCommitBlock(..., interruptPrefetch *atomic.Bool) {
work, err := w.prepareWork(genParams, w.makeWitness)
if err != nil {
return // ← line 2212: exits BEFORE the defer at line 2228 is registered
}
...
defer interruptPrefetch.Store(true) // ← line 2228: too late
The prefetcher goroutine (spawned unconditionally at line 2191) is never signaled to stop. It runs runIdleTxProvider which loops until gas exhaustion (totalGasPool.Gas() == 0), burning through the full header gas budget (~100M gas at default PrefetchGasLimitPercent=100) across 4*NumCPU/5 workers doing throwaway EVM executions.
This was flagged in the previous review round but not fixed.
Why it's bounded, not a leak: runPrefetcher's deferred shutdown() (via sync.Once) ensures txsCh is closed once runIdleTxProvider returns, so the PrefetchStream goroutine and workers self-terminate. No goroutine leak — just wasted CPU.
Production trigger: prepareWork → engine.Prepare can fail via UnauthorizedSignerError, Heimdall snapshot/span lookup failures, or makeEnv → stateless.NewWitness() errors. On a Giugliano primary producer, Prepare includes the succession-delay time.After — but the runPrefetcher call to makeHeader(..., false) skips this wait and starts streaming immediately, so the zombie is active for the full duration of the wasted work.
Impact: ~100M gas of throwaway EVM work across ~6 workers per prepareWork failure. No correctness/consensus impact, but real CPU waste on an uncommon-but-realistic error path. If prepareWork fails repeatedly (e.g., prolonged Heimdall issues), zombie prefetchers from successive commitWork calls compete for CPU.
Fix: Move the defer before the err check:
func (w *worker) buildAndCommitBlock(..., interruptPrefetch *atomic.Bool) {
defer interruptPrefetch.Store(true) // ← moved up: covers ALL exit paths
work, err := w.prepareWork(genParams, w.makeWitness)
if err != nil {
return
}
Summary
| Area | Verdict |
|---|---|
| Lost-wakeup in handoff | Clean |
| Double-close of txsCh | Protected by sync.Once |
| Goroutine stall ordering | Clean: channel close unblocks all paths |
| Panic during handoff | sync.Once defer covers it; PrefetchStream goroutine self-terminates |
| txIndex == valid + invalid | Holds on all 6 exit paths (verified per-path) |
| No-dup in runBuilderTxProvider | Three dedup layers cover all reachable cases |
| builderPrefetchedTxHashes attribution | Correctly gated; 1-tx residual acknowledged |
| Import path perf | ~2-5ns overhead per tx; statedb.Copy() moved to worker (slight improvement) |
| builderPlanCh closed early | Graceful: collectPlanBatch sees closed channel, exits |
| Empty pool | Graceful: empty heap, no work |
| Stale snapshot | By design: Resolve() nil-check handles evicted txs |
| Concurrent kill+abort | Atomic stores, no ordering dependency, workers prioritize hardKill |
| prepareWork early-return | Unfixed: zombie prefetcher burns CPU until gas exhaustion |
The concurrency design is sound. The sync.Once shutdown, two-signal interrupt model, and channel lifecycle are correctly composed. The one actionable item is the defer placement in buildAndCommitBlock — a one-line move.
Branch: lmartins/lmartins/builder-prefetch-mode-switch
Move `defer interruptPrefetch.Store(true)` to the top of buildAndCommitBlock so the prefetcher goroutine is signaled to stop on every return path — including the early return when prepareWork fails (UnauthorizedSignerError, Heimdall snapshot/span failures, stateless.NewWitness errors). Previously the defer sat after the err check, so on prepareWork failure runIdleTxProvider would loop until gas exhaustion, burning ~100M gas of throwaway EVM work across the worker pool before self-terminating. No correctness/consensus impact; fixes a real CPU-waste path on an uncommon-but-realistic error branch. Bounded by PrefetchStream's sync.Once shutdown (no goroutine leak), but the wasted work was material under prolonged Heimdall issues when consecutive commitWork calls could pile up zombie prefetchers.
Pass sentThisPhase into collectPlanBatch and skip txs already present, mirroring the existing scanOverflow dedup pattern. Closes the third edge of the builder-phase dedup matrix: 1. within-iter plan↔overflow — covered by pre-populate sentThisPhase before scanOverflow (commit 62a72c0) 2. plan→overflow cross-iter — covered by scanOverflow's sentThisPhase read (commit 88c4e44) 3. overflow→plan cross-iter — this commit Scenario: scanOverflow emits tx T in iteration N (h.Shift() past T, sentThisPhase[T]=true). Worker W1 begins multi-ms EVM on T. In iteration N+1, collectPlanBatch reads a buffered copy of T from planCh; prefetchedHashes is still empty because onSuccess hasn't fired, so T slips through and gets forwarded a second time. Impact is wasted worker capacity only (throwaway state, no consensus/correctness effect), but the race opens wider exactly on contract-heavy workloads where builder-phase prefetch matters most. Kurtosis run #2 scenario E didn't catch this because value-transfer EVM finishes in ~10µs, well inside the 2ms batch window — so prefetchedHashes raced fast enough to plug the gap. Added TestCollectPlanBatch_SkipsInflight as the regression test.
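The overflow→plan scenario can be replayed with a small sketch. The function `forward`, the plain maps, and the string hashes are hypothetical simplifications: the real code splits this logic across collectPlanBatch, scanOverflow and forwardTxs, and pre-populates plan hashes, which this example does not model.

```go
package main

import "fmt"

// forward sends each hash that is neither already prefetched nor still
// in flight, and records a hash as in flight only when the send succeeds.
func forward(batch []string, prefetched, inFlight map[string]bool, out chan<- string) int {
	sent := 0
	for _, h := range batch {
		if prefetched[h] || inFlight[h] {
			continue // completed earlier, or a worker is still executing it
		}
		select {
		case out <- h:
			inFlight[h] = true
			sent++
		default:
			// Buffer full: dropped and intentionally not recorded.
		}
	}
	return sent
}

// simulate replays: iteration N emits T from the overflow scan, then
// iteration N+1 reads T again from the plan channel along with U.
func simulate() []string {
	out := make(chan string, 8)
	prefetched := map[string]bool{} // still empty: onSuccess has not fired for T
	inFlight := map[string]bool{}

	forward([]string{"T"}, prefetched, inFlight, out)      // iteration N (overflow)
	forward([]string{"T", "U"}, prefetched, inFlight, out) // iteration N+1 (plan)
	close(out)

	var got []string
	for h := range out {
		got = append(got, h)
	}
	return got
}

func main() {
	fmt.Println(simulate())
}
```

Without the in-flight check, T would be forwarded twice because the completed-hash set is only written after EVM execution finishes.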
- hoist prefetch tunables (chan buf, idle loop interval, gas-pct default/cap) into the file's top const block
- rename sentThisPhase to inFlightHashes: the variable tracks txs forwarded on txsCh whose onSuccess hasn't fired yet
- promote sendPlan from a closure inside fillTransactions to a free function alongside buildTxPlan
The streaming prefetcher's per-tx IntermediateRoot call (introduced via the
intermediateRootPrefetch flag, hardcoded true) was found to add 80–130%
prefetch wall time for at most ~10% commit speedup (≈0.1 ms in absolute
terms). With snapshots active, EVM reads bypass the trie entirely, so the
only path to value is warming pebble's block cache for the subsequent
Commit. Under realistic clean-cache sizes that working set is already
resident, leaving the warming work redundant in nearly every regime.
Upstream go-ethereum's prefetcher likewise does not compute intermediate
roots — its Prefetch is (block, statedb, cfg, interrupt) only.
Flips the call site to false (parameter retained for API stability) and
checks in three Go benchmarks documenting the methodology and findings:
- TestIntermediateRootPrefetch_AccuracyVsCost: in-memory baseline,
three access patterns (hot 3-slot, unique-spread, per-sender counter)
- TestIntermediateRootPrefetch_PebbleAccuracyVsCost: real pebble disk
with prewarm + Commit, hashdb meter deltas
- TestIntermediateRootPrefetch_HeavyColdContract: 100k pre-populated
storage slots with a constrained clean cache, modelling fat contracts
that haven't been touched recently
Across all three, flag=true produced zero cache-hit benefit measurable in
StorageHitFromPrefetch / clean cache deltas, with consistent prefetch CPU
overhead. The residual ~10% commit speedup observed in the heavy-cold
regime is attributable to pebble block-cache warming and amounts to
≈0.13 ms per affected block.
…lict # Conflicts: # miner/worker_test.go
…lict # Conflicts: # miner/worker.go
Extract resolveEvmInterrupt helper so the evmAbort/hardKill fallback wiring is unit-testable. Add focused unit tests for preloadReaderForTx (bad-signature, contract-To code preload, EOA/nil-To) and streamIdleBatch (per-loop gas budget). Brings T1 logic and T2 semantic mutation kill rate to 100% on the diff.
Code review
Found 1 issue. Checked for bugs and CLAUDE.md compliance.
Latent nil-dereference in block production path (miner/worker.go:1963-1975)
Line 1970 nil-checks
Currently unreachable: both call sites in
Per security-common.md: No panics in consensus, sync, or block production paths.
Suggested fix -- add
if builderPlanCh == nil || genParams == nil || plainTxs == nil {
Drops a redundant uint64() conversion in the streamIdleBatch test and trims the trailing blank line goimports flagged.
The function nil-checked genParams before reading prefetchedTxHashes but then unconditionally dereferenced genParams.planWg.Add(1). Currently unreachable — callers in fillTransactions derive builderPlanCh from genParams.builderPlanCh so genParams is always non-nil — but the inconsistent guard is misleading and would panic on the block-production path if a future caller drops that invariant.
Purpose
Today's prefetcher only runs in a speculative idle phase: before block-building starts, it scans the tx pool and warms state for what it guesses the block might contain. Whatever it misses — late-arriving p2p txs, txs unlocked by freed gas, anything the pool view didn't reflect at guess-time — pays the full cache-miss cost at commit time. On contract-heavy blocks this is the tail of the miss-rate distribution.
This PR adds three more prefetch phases, synchronized to the live block build, so the prefetcher stops guessing and starts following what the builder is actually about to do. All three target exactly the gap the idle phase leaves: near-certain txs the builder will commit, warmed with near-zero speculation.
Prefetch lifecycle (after this PR)
One long-lived worker pool per block; four sequential tx providers feed it. The pool is never torn down between phases — the mode switch is a provider swap, not a prefetcher restart.
buildAndCommitBlock is still assembling its environment
commitTransactions pass (priority + normal)
sendPlan clones the price-nonce heap; buildTxPlan walks it with remainingGas() as budget, emits every non-prefetched tx that fits
commitTransactions, right before each apply
builderPlanCh as the builder reaches it
ltx.Gas − actualUsed delivered via builderGasFreedCh; overflow heap scanned when budget accumulates
Pop() as "too large"
All three builder phases dedup against prefetchedTxHashes and a sentThisPhase local set, so no tx is re-executed. The three-edge dedup matrix (plan↔overflow within iter, plan→overflow cross-iter, overflow→plan cross-iter) is fully closed.
Handoff sequence (idle → builder)
Coordinated by a two-signal interrupt to avoid pool teardown:
1. builderStarted → coordinator sets evmAbort.Store(true). In-flight idle EVM execution aborts via the EVM interrupt; workers entering the loop see the flag and skip.
2. txsCh non-blockingly.
3. evmAbort.Store(false). Workers resume, now fed by the builder provider (upfront-plan + per-tx + freed-gas overflow).
No duplicate prefetches, no lost builder txs, no worker pool churn.
Precision impact
New headline metric:
worker/prefetch/builder_added_percent: fraction of a block's txs the builder phase (upfront plan + per-tx + overflow combined) warmed on its own, beyond what idle had already done. Attributes each prefetch completion to the phase it fired under, so operators can see at a glance how much of a block's cache warming idle alone could not have reached.
Also split:
worker/txApplyDuration/{prefetched,notPrefetched}: quantifies the cache-miss penalty this lifecycle is closing.
Together these two metrics answer: how many txs needed builder-phase help, and how much apply-time that help saved.
Implementation notes
- range txsCh (buffer 4096, ≈33 KB); phase switches change producer only, not pool.
- Prefetch(block, ...) keeps its original signature as a thin wrapper around PrefetchStream. Same topology, same parallelism, same PrefetchResult shape.
- statedb.Copy(): no shared state, no consensus-path coupling.
Validation
- go build ./... + golangci-lint run clean
- go test -race ./core/... ./miner/...: 121s, all prefetch + slow-tx tests pass. Includes 13 new unit tests on the pure primitives (scanOverflow, forwardTxs, collectPlanBatch, buildTxPlan, no-duplicate-forward invariant across 10k iterations) and 3 stream-lifecycle integration tests.
- diffguard --base origin/develop: no new complexity violations.
- processed == fails + successful in every PrefetchStream exit (2180+ closures).
- chain_prefetch_worker_panic meter = 1050, harness log count = 1050 (exact match). Node kept producing blocks throughout.
- prepareWork early-return bypassing defer interruptPrefetch.Store(true)): fixed.