Blockstm redesign #5

Open
cffls wants to merge 10 commits into develop from blockstm_redesign

@cffls cffls commented Apr 28, 2026

Description

Please provide a detailed description of what was done in this PR

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Breaking changes

Please complete this section if any breaking changes have been made, otherwise delete it

Nodes audience

In case this PR includes changes that must be applied only to a subset of nodes, please specify how you handled it (e.g. by adding a flag with a default value...)

Checklist

  • I have added at least 2 reviewers or the whole pos-v1 team
  • I have added sufficient documentation in code
  • I will be resolving comments - if any - by pushing each fix in a separate commit and linking the commit hash in the comment reply
  • Created a task in Jira and informed the team for implementation in Erigon client (if applicable)
  • Includes RPC methods changes, and the Notion documentation has been updated

Cross repository changes

  • This PR requires changes to heimdall
    • If so, link the PR here:
  • This PR requires changes to matic-cli
    • If so, link the PR here:

Testing

  • I have added unit tests
  • I have added tests to CI
  • I have tested this code manually on local environment
  • I have tested this code manually on remote devnet using express-cli
  • I have tested this code manually on amoy
  • I have created new e2e tests into express-cli

Manual tests

Please complete this section with the steps you performed if you ran manual tests for this functionality, otherwise delete it

Additional comments

Please post additional comments in this section if you have them, otherwise delete it

…g-loop

(feat): disable pending block creation loop via flag
cffls added a commit that referenced this pull request May 1, 2026
…rnal review

An external reviewer found six issues in V2's correctness/operability
surface. Fixes for each, plus targeted regression tests.

#1 (CRITICAL) — V2 swallowed ApplyMessage errors. applyMessage at
core/parallel_state_processor.go:735 ignored execErr when result==nil,
so a tx with a consensus-level error (bad nonce, intrinsic gas under-
flow, insufficient upfront gas, blob fork-gating violation, etc.)
settled as a zero-gas successful no-op. Serial returns the error and
aborts the block (state_processor.go:222). V2 now records execErr on
the PDB, the settle path skips the tx, and Process surfaces the error
to BlockChain so it can fall back to serial — same behaviour as the
panicked-PDB path. Test: TestV2StateProcessor_ApplyMessageErrorFailsBlock.
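
A minimal sketch of the idea behind fix #1, assuming a simplified per-tx
result record. The names txResult, settle and errNonceTooLow are
hypothetical stand-ins, not the identifiers used in
core/parallel_state_processor.go. It shows a settle loop that refuses to
treat a recorded execution error as a zero-gas success and instead
surfaces it to the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// txResult is a hypothetical stand-in for what a per-tx PDB records.
type txResult struct {
	gasUsed uint64
	execErr error // consensus-level error from message application, if any
}

// errNonceTooLow is an illustrative consensus-level error.
var errNonceTooLow = errors.New("nonce too low")

// settle sums gas over the results in tx order. A recorded execErr aborts
// the whole block instead of being settled as a successful no-op.
func settle(results []txResult) (uint64, error) {
	var total uint64
	for i, r := range results {
		if r.execErr != nil {
			return 0, fmt.Errorf("tx %d: %w", i, r.execErr)
		}
		total += r.gasUsed
	}
	return total, nil
}

func main() {
	ok := []txResult{{gasUsed: 21000}, {gasUsed: 50000}}
	bad := []txResult{{gasUsed: 21000}, {execErr: errNonceTooLow}}

	gas, err := settle(ok)
	fmt.Println(gas, err)

	_, err = settle(bad)
	fmt.Println(errors.Is(err, errNonceTooLow))
}
```

In this sketch the caller decides what to do with the error; the commit
describes BlockChain using it to fall back to serial.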

#2 (CRITICAL) — SelfDestruct not published to MVStore. FlushToMVStore
wrote nonces, storage, code, created, balance deltas, but never the
destructed set. Cross-tx readers saw destroyed accounts as still alive
with stale code/storage/nonce. Pre-EIP-6780 chains: tx B reading a
just-destroyed account got base-state values; SetStorageDirectWithOrigins
at settle time would resurrect the account. Fix: publish destructions
under SuicidePath (the same flag V1 already uses on its MVHashMap), and
gate Exist/GetCode/GetCodeHash/GetState/GetCommittedState/GetNonce on
priorDestructed so cross-tx reads return defaults. priorDestructed is
cached per-tx so the gated getters share one MVStore lookup per address.
Test: TestPDB_CrossTxSelfDestructVisibility.
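
A rough sketch of the gating pattern in fix #2, under simplifying
assumptions: string addresses, only two getters, and a hypothetical
destructStore in place of the SuicidePath entries in MVStore. None of
these types exist in the PR under these names. It shows getters
returning defaults for an account destroyed by an earlier tx, with the
lookup cached once per tx view.

```go
package main

import "fmt"

// destructStore is a hypothetical stand-in for the destructed set
// published to the multi-version store.
type destructStore struct {
	destructedBy map[string]int // addr -> index of the tx that destroyed it
	lookups      int            // counts store lookups, for the demo only
}

// destructedBefore reports whether a tx with a lower index destroyed addr.
func (s *destructStore) destructedBefore(addr string, txIdx int) bool {
	s.lookups++
	writer, ok := s.destructedBy[addr]
	return ok && writer < txIdx
}

// txView is a hypothetical per-tx view over base state.
type txView struct {
	txIdx     int
	store     *destructStore
	baseNonce map[string]uint64
	baseCode  map[string][]byte
	cache     map[string]bool // per-tx cache of priorDestructed answers
}

func newTxView(txIdx int, st *destructStore, nonces map[string]uint64, code map[string][]byte) *txView {
	return &txView{txIdx: txIdx, store: st, baseNonce: nonces, baseCode: code, cache: make(map[string]bool)}
}

// priorDestructed consults the store at most once per address per tx.
func (v *txView) priorDestructed(addr string) bool {
	if hit, ok := v.cache[addr]; ok {
		return hit
	}
	hit := v.store.destructedBefore(addr, v.txIdx)
	v.cache[addr] = hit
	return hit
}

// GetNonce returns the default (0) for an account destroyed by a prior tx.
func (v *txView) GetNonce(addr string) uint64 {
	if v.priorDestructed(addr) {
		return 0
	}
	return v.baseNonce[addr]
}

// GetCode returns the default (nil) for an account destroyed by a prior tx.
func (v *txView) GetCode(addr string) []byte {
	if v.priorDestructed(addr) {
		return nil
	}
	return v.baseCode[addr]
}

func main() {
	st := &destructStore{destructedBy: map[string]int{"0xdead": 1}}
	v := newTxView(3, st, map[string]uint64{"0xdead": 7}, map[string][]byte{"0xdead": {0x60}})
	fmt.Println(v.GetNonce("0xdead"), len(v.GetCode("0xdead")), st.lookups)
}
```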

#3 (HIGH) — V2 receipts had zero BlockHash. buildV2Receipt didn't set
BlockHash and passed common.Hash{} to GetLogs. Receipt-trie consensus
was unaffected (BlockHash is not in the consensus encoding) but RPC
consumers got 0x000…0 for blockHash on V2-processed blocks. Thread
block.Hash() through ExecuteV2BlockSTM → newV2SettleFn → buildV2Receipt
and into GetLogs. Test: TestV2StateProcessor_ReceiptHasBlockHash.

#4 (HIGH) — V2 executor ignored cancellation. core/blockstm/v2_executor.go
had no context plumbing, so when serial won the parallel-vs-serial
race and BlockChain called cancel(), V2 ran to completion (~50–200ms)
before the import could continue; if V2 hung, the import couldn't
return. Add ctx.Context to ExecuteV2BlockSTM, plumb it through to the
dispatcher and validation loop, check at task-boundary and validation
boundaries. Updated the misleading "<1ms" comment in blockchain.go.
Test: TestExecuteV2BlockSTM_HonoursCancellation.
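
The task-boundary check in fix #4 can be illustrated with a small
sketch. runTasks, makeTask and demoCancelAfter are hypothetical helpers,
and the loop is single-threaded for clarity; the real executor checks
cancellation in its dispatcher and validation loop. Only the standard
context package is used.

```go
package main

import (
	"context"
	"fmt"
)

// runTasks runs tasks in order and checks ctx before each one, so a
// cancelled run stops at the next task boundary rather than at the end.
func runTasks(ctx context.Context, tasks []func()) (int, error) {
	done := 0
	for _, task := range tasks {
		select {
		case <-ctx.Done():
			return done, ctx.Err()
		default:
		}
		task()
		done++
	}
	return done, nil
}

// makeTask returns a task that optionally cancels the run when executed.
func makeTask(shouldCancel bool, cancel context.CancelFunc) func() {
	return func() {
		if shouldCancel {
			cancel()
		}
	}
}

// demoCancelAfter builds total tasks where task number cancelAfter
// (1-based) triggers cancellation, then runs them.
func demoCancelAfter(total, cancelAfter int) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	tasks := make([]func(), total)
	for i := range tasks {
		tasks[i] = makeTask(i+1 == cancelAfter, cancel)
	}
	return runTasks(ctx, tasks)
}

func main() {
	n, err := demoCancelAfter(10, 3)
	fmt.Println(n, err) // prints "3 context canceled"
}
```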

#5 (MEDIUM) — numWorkers <= 0 deadlocked the executor. The dispatcher
window collapsed to 0 and the very first task waited forever on an
execDone channel no worker would close (v2_executor.go:355). Clamp
to runtime.NumCPU() in NewV2StateProcessor with a comment explaining
the failure mode. Test: TestNewV2StateProcessor_ClampsNumWorkers.
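
A minimal sketch of the clamp in fix #5. clampNumWorkers and
dispatchWindow are hypothetical helper names, and the multiplier value
in main is only an example; the commit states only that the window is
numWorkers * InFlightTaskMultiplier and that the clamp target is
runtime.NumCPU().

```go
package main

import (
	"fmt"
	"runtime"
)

// clampNumWorkers replaces a non-positive worker count with the CPU
// count. With zero workers nothing would ever execute a task, so the
// first dispatched task would wait forever.
func clampNumWorkers(n int) int {
	if n <= 0 {
		return runtime.NumCPU()
	}
	return n
}

// dispatchWindow is the number of in-flight tasks allowed; it collapses
// to 0 when numWorkers is 0, which is the failure mode being guarded.
func dispatchWindow(numWorkers, multiplier int) int {
	return numWorkers * multiplier
}

func main() {
	for _, n := range []int{-1, 0, 4} {
		w := clampNumWorkers(n)
		fmt.Printf("requested=%d workers=%d window=%d\n", n, w, dispatchWindow(w, 2))
	}
}
```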

#6 (LOW, comment-only) — Biased pathdb cache lock removal. The Has →
Set race exists but is benign because reader.Node hash-checks every
cache hit (Verkle-only noHashCheck doesn't apply to Bor). The previous
comment claimed "self-corrects on the next disk read" — actually it
self-corrects via the hash check in reader.go:72. Tightened the
comment.

Verified: ./core/, ./core/state/, ./core/blockstm/ tests pass; the V2
backbone TestV2BlockSTMAllBlocks passes (165s).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cffls cffls force-pushed the blockstm_redesign branch from 3f4731e to ed6bcf2 Compare May 5, 2026 05:06
Introduces BlockSTM v2 — a from-scratch redesign of Bor's parallel
transaction execution engine. V2 speculatively executes block
transactions concurrently, validates each tx's reads against a
multi-version store, and re-executes any whose reads turned stale.
On the 241-block mainnet witness benchmark V2/4w delivers ~1.6×
throughput over serial (570 mgas/s vs 350 mgas/s, AMD Ryzen 7 5800H,
all-in-memory).

V2 runs three coordinated goroutine groups around a per-tx PDB:

  V2StateProcessor.Process    (core/parallel_state_processor.go)
        │
  ExecuteV2BlockSTM           (core/blockstm/v2_executor.go)
        │
  ┌─────┼──────────────┬──────────────┐
  │     │              │              │
  Workers (N)     Validator (1)   Settlement (1)
  ParallelStateDB StoreReads      finalDB
                  BalReads        IntermediateRoot

  Backed by:
    SafeBase       Thread-safe base reads (sync.Map caches over a
                   bounded pool of StateDB.Copy() with concurrent-
                   reads mode on the trieReader)
    MVStore        Sharded multi-version per-key store with a
                   lock-free bloom filter for cold-key reads
    MVBalanceStore Sharded commutative balance delta store
                   (per-tx Add/Sub; reads sum prior entries)
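
To make the "commutative balance delta" idea concrete, here is a small
unsharded sketch. balanceDeltas and its methods are hypothetical, and
int64 stands in for the 256-bit balances a real store would use. Each tx
records signed deltas, and a read for tx N sums the base balance with
the deltas of all txs below N, so the order in which deltas are recorded
does not matter.

```go
package main

import "fmt"

// balanceDeltas is a hypothetical, unsharded stand-in for a commutative
// balance delta store.
type balanceDeltas struct {
	base   map[string]int64         // balances before the block
	deltas map[string]map[int]int64 // addr -> txIdx -> net delta
}

func newBalanceDeltas(base map[string]int64) *balanceDeltas {
	return &balanceDeltas{base: base, deltas: make(map[string]map[int]int64)}
}

// add records a signed delta for addr on behalf of txIdx.
func (b *balanceDeltas) add(addr string, txIdx int, delta int64) {
	if b.deltas[addr] == nil {
		b.deltas[addr] = make(map[int]int64)
	}
	b.deltas[addr][txIdx] += delta
}

// readAt returns the base balance plus the deltas of every tx whose
// index is below reader.
func (b *balanceDeltas) readAt(addr string, reader int) int64 {
	total := b.base[addr]
	for txIdx, d := range b.deltas[addr] {
		if txIdx < reader {
			total += d
		}
	}
	return total
}

func main() {
	b := newBalanceDeltas(map[string]int64{"alice": 100})
	b.add("alice", 2, -30) // recorded first, although tx 2 is later
	b.add("alice", 0, 50)
	fmt.Println(b.readAt("alice", 1), b.readAt("alice", 3)) // prints "150 120"
}
```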

1. Task building. Block transactions become V2Tasks. Same-sender
   chains get pre-computed nonces (SenderNonces) so nonce reads
   on a chain are skipped during validation.

2. Parallel execution. N worker goroutines pull tasks from a
   buffered dispatcher (window numWorkers * InFlightTaskMultiplier).
   Each tx runs in its own ParallelStateDB; reads come from
   SafeBase + MVStore + MVBalanceStore and are recorded in
   StoreReads / BalReads. Writes accumulate locally (DeferMVWrites)
   and flush to MVStore at end-of-tx so concurrent readers only
   ever see FINAL values — never mid-tx reentrancy-guard writes.

3. Sequential validation. A single goroutine validates txs in
   tx-index order. Each recorded read is re-checked against MVStore;
   match by writer/incarnation OR by value-equal fallback (handles
   idempotent writes such as reentrancy-guard SSTOREs that flip
   back). Mismatch → MarkEstimate the failed tx's writes and
   dispatch a re-execution goroutine. Per-key pipelining: readers
   that hit an ESTIMATE entry under Incarnation > 0 block on
   WaitForFinal until the upstream writer is finalized.

4. Pipelined settlement. As txs finalize, a settlement goroutine
   drains chSettle in tx-index order and applies each tx's writes
   to finalDB (the real, single-threaded *state.StateDB) through a
   *Direct setter family that bypasses the journal, then asks
   finalDB for the IntermediateRoot.
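
The validation rule in step 3 (match by writer/incarnation, or fall back
to value equality) can be sketched as follows. This is a toy model under
stated assumptions: string keys and values, no sharding, no ESTIMATE
marking, and hypothetical names (mvStore, recordedRead, readStillValid)
that are not the PR's types.

```go
package main

import "fmt"

// version is one tx's write to a key.
type version struct {
	txIdx, incarnation int
	value              string
}

// mvStore is a hypothetical, unsharded multi-version store.
type mvStore struct {
	versions map[string][]version
}

func newMVStore() *mvStore { return &mvStore{versions: make(map[string][]version)} }

// write records (or replaces) txIdx's entry for key.
func (s *mvStore) write(key string, txIdx, inc int, value string) {
	vs := s.versions[key]
	for i := range vs {
		if vs[i].txIdx == txIdx {
			vs[i] = version{txIdx, inc, value}
			return
		}
	}
	s.versions[key] = append(vs, version{txIdx, inc, value})
}

// readAt returns the entry written by the highest tx index below reader.
func (s *mvStore) readAt(key string, reader int) (version, bool) {
	best, found := version{txIdx: -1}, false
	for _, v := range s.versions[key] {
		if v.txIdx < reader && v.txIdx > best.txIdx {
			best, found = v, true
		}
	}
	return best, found
}

// recordedRead is what a tx remembers about a read; writerIdx -1 means
// the value came from base state.
type recordedRead struct {
	key                    string
	writerIdx, incarnation int
	value                  string
}

// readStillValid re-checks a recorded read for the given reader tx.
func (s *mvStore) readStillValid(r recordedRead, reader int) bool {
	cur, ok := s.readAt(r.key, reader)
	if !ok {
		return r.writerIdx == -1 // still no writer: base read holds
	}
	if cur.txIdx == r.writerIdx && cur.incarnation == r.incarnation {
		return true // same writer, same incarnation
	}
	return cur.value == r.value // value-equal fallback
}

func main() {
	s := newMVStore()
	s.write("slot", 0, 0, "1")
	read := recordedRead{key: "slot", writerIdx: 0, incarnation: 0, value: "1"}
	fmt.Println(s.readStillValid(read, 2)) // true: same writer

	s.write("slot", 1, 0, "2")
	fmt.Println(s.readStillValid(read, 2)) // false: tx 1 now shadows tx 0

	s.write("slot", 1, 1, "1")
	fmt.Println(s.readStillValid(read, 2)) // true: value flipped back
}
```

The last case mirrors the reentrancy-guard example in the text: a
different writer is visible, but the value the reader observed is
unchanged, so no re-execution is needed.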

V2 is gated on a layered test surface. The layers below run from
cheapest to most expensive, each with what it is meant to catch:

1. Compile-time conformance + drift detection
   The PDB shadows StateDB's interface and behaviour, so any
   upstream go-ethereum merge that adds or changes a StateDB
   method would silently bypass V2. A handful of `go test`-time
   checks fail CI before any logic runs:
     - core/vm/statedb_impl_test.go     (PDB satisfies vm.StateDB
                                         via a static assertion)
     - TestPDBMethodParity              (every StateDB method has
                                         a PDB mapping or is in
                                         pdbExemptMethods)
     - TestV2DependencyCompileCheck     (every StateDB method V2
                                         settle calls remains present)
     - TestV2JournalEntryCoverage       (every journal entry kind has
                                         a parallelJournalEntry mapping)
     - TestV2TracingHookParity          (every tracing.Hooks field is
                                         classified as fired-or-skipped)
     - TestV2ForkParity                 (every params.ChainConfig.IsX
                                         fork rule is classified V1/V2)

2. Per-method unit tests (~210 tests across ~25 files)
   Cover individual PDB getters/setters, MVStore / MVBalanceStore
   primitives, V2 executor channel mesh, and SettleTo helpers.
   Highlights:
     - core/state/parallel_statedb_test.go         (76 tests; PDB
                                                    behaviour + the
                                                    Tier-1 mutation
                                                    kill suite — see
                                                    layer 5 below)
     - core/state/parallel_statedb_coverage_test.go (42 tests;
                                                    branch coverage)
     - core/state/parallel_statedb_getter_table_test.go (every PDB
                                                    getter records
                                                    its read with
                                                    the right WriterIdx
                                                    across Committed /
                                                    ESTIMATE / NoEntry /
                                                    AtTxZero)
     - core/state/safe_base_test.go                (sync.Map cache +
                                                    pool semantics)
     - core/blockstm/mvstore_test.go,
       core/blockstm/mvbalance_store_test.go       (versioned store
                                                    primitives)
     - core/blockstm/v2_executor_wait_test.go      (waitForTx /
                                                    waitForFinal +
                                                    cancellation)

3. Direct-setter parity tests
   The *Direct setter family bypasses StateDB's journal at settle
   time. core/state/v2_direct_setter_parity_test.go (7 tests) pins
   that SetXDirect produces a byte-identical state root to journaled
   SetX + Finalise. Catches divergence the moment a future change
   to either path breaks the parity.

4. Differential tests vs serial StateDB
   Hand-written + table-driven scenarios that exercise the PDB
   against a parallel-mirror serial StateDB and assert byte-identical
   output. Catches behaviour drift the parity-table tests can't
   express:
     - core/state/v2_differential_test.go          (PDB-only diff)
     - core/state/v2_executor_differential_test.go (synthetic-env
                                                    executor diff)
     - core/v1_differential_test.go                (V1 vs serial
                                                    parity for the
                                                    legacy in-tree path)

5. Mutation testing (Tier-1 kill tests)
   diffguard runs mutation testing against V2's critical paths.
   Every survivor flagged by a sample run has a corresponding
   targeted test inline in core/state/parallel_statedb_test.go
   under the "Tier-1 mutation kill tests" divider — boundary,
   negation, and return-value mutations on storeReadMatches,
   journal revert, settleTo helpers, applyFeeData, Reset, etc.
   Tier-1 logic kill-rate ≥ 99% on the latest run.

6. Fuzz targets
   Randomized inputs against either a serial mirror or a hand-built
   reference:
     - core/state/v2_fuzz_test.go                (random PDB op
                                                  sequences vs StateDB)
     - core/state/v2_executor_fuzz_test.go       (executor-level fuzz
                                                  on synthetic env)
     - core/v2_serial_parity_fuzz_test.go        (FuzzV2ExecutorVsSerial:
                                                  random tx batches
                                                  through ExecuteV2BlockSTM
                                                  vs an ApplyMessage loop)
   Running the fuzz targets under `-race` caught the shared-trie-reader
   race that the non-race fuzz missed; worth keeping on the nightly.

7. End-to-end consistency + benchmark on real mainnet blocks
   core/mainnet_witness_benchmark_test.go bundles 241 real Polygon
   mainnet blocks (under core/blockstm/testdata/) with their pre-
   block witnesses. Two harnesses share the corpus:
     - TestV2BlockSTMAllBlocks (gated on BOR_BLOCKSTM_TEST=1)
       replays each block through both serial and V2 and asserts
       byte-identical state roots and receipt roots.
     - BenchmarkV2AllBlocks runs serial + V2 across worker counts
       (4 / 8 / 16) and witness-on/off variants on the same corpus.
       Backs the throughput numbers referenced at the top of this
       commit.

8. Runtime invariants under -tags=invariants
   Build-tag-gated runtime assertions inside the executor and the
   PDB. Off in production builds (zero-cost), on in CI:
     - assertSettleOrder              (validation walk's induction)
     - assertReexecVisitedExactlyOnce (drain loop doesn't lose a tx)
     - assertSettleNotPanicked        (panicked PDBs never settle)
   A tiny set of "panic if invariant breaks" tests under
   //go:build invariants verifies the assertions actually fire on
   crafted violations (core/blockstm/v2_executor_invariants_panic_test.go,
   core/state/parallel_statedb_invariants_panic_test.go).

9. Race detector
   All of layers 2-8 are runnable under `go test -race`. CI runs
   the full state + blockstm packages in race mode; the
   TestV2BlockSTMAllBlocks gated test is also race-clean on the
   241-block corpus.

10. Production soak — >1 million Polygon mainnet blocks
    Beyond the unit / parity / fuzz layers above, this branch has
    been used to sync more than 1,000,000 mainnet blocks end-to-end
    on a real node with V2 as the primary processor (with serial
    disabled). Zero state-root divergences, zero panics
    requiring fallback, no consensus-affecting issues observed.
    This is the most stringent layer: real on-chain workload,
    real database backend, real prefetcher contention.

  - intermediateRootTimer metric (chain/intermediateroot) — measures
    the post-execution trie computation in block_validator.go.

The code surface is ~5.1k lines across 39 production .go files,
plus ~11.7k lines across 37 test files. The remaining 484 file
entries in the diff are block + witness fixtures under
core/blockstm/testdata used by TestV2BlockSTMAllBlocks and the
benchmark harness — read-only data, no review needed.

Shapes of change a reviewer should expect:

  - New per-tx state. ParallelStateDB shadows *state.StateDB but
    reads from SafeBase + MVStore + MVBalanceStore and tracks reads
    for validation. Implements vm.StateDB. Has its own journal
    layer (parallelJournalEntry) parallel to StateDB's journal.go.

  - New concurrent stores. MVStore (sharded multi-version per-key
    store with bloom filter) and MVBalanceStore (sharded
    commutative balance deltas) — both new, both load-bearing.

  - New executor. ExecuteV2BlockSTM owns the worker pool +
    in-order validator + pipelined settle goroutine and the
    chSettle / completionCh / execDone channel mesh between them.

  - Concurrent-safe base reads. SafeBase is a thread-safe wrapper
    around a *state.StateDB with sync.Map caches + a bounded pool
    of db.Copy() instances; the pool copies share the underlying
    reader, so the V2 entry point flips trieReader into its
    concurrent-reads mode (sync.Map node-resolve cache instead of
    in-place mutation) — this required surgery in state/database.go,
    state/reader.go, state/trie_prefetcher.go, trie/trie.go,
    trie/secure_trie.go, triedb/pathdb/reader.go, and
    triedb/pathdb/biased_fastcache.go.

  - *Direct setter family on StateDB. Bypass the journal at
    settle time so V2 can replay per-tx PDB writes onto finalDB
    deterministically. Pinned byte-equal to journaled SetX +
    Finalise by TestDirectSetterParity_*.

  - Production fallback. BlockChain wires V2 as the primary
    processor and falls back to serial on panics, ApplyMessage
    consensus errors, ctx cancellation, and witness requests.

Tier 1 — load-bearing executor + per-tx state:

  core/blockstm/v2_executor.go              (+631 new)
  core/parallel_state_processor.go          (+925 V2StateProcessor,
                                             settle-fn closure, env)
  core/state/parallel_statedb.go            (+1147 new)
  core/state/parallel_statedb_validate.go   (+223 new)
  core/state/parallel_statedb_settle.go     (+195 new)
  core/state/parallel_statedb_journal.go    (+127 new)
  core/state/safe_base.go                   (+207 new)

Tier 2 — concurrent stores:

  core/blockstm/mvstore.go                  (+186 new)
  core/blockstm/mvbalance_store.go          (+175 new)

Tier 3 — modified upstream files (highest merge-conflict risk):

  core/state/statedb.go                     (Direct setters,
                                             skipTimers, concurrent
                                             reads enabler)
  core/state/state_object.go                (concurrent-safe getters)
  core/state/database.go                    (concurrent reader)
  core/state/reader.go                      (cache attribution)
  core/state/trie_prefetcher.go             (concurrent prefetch)
  trie/trie.go, trie/secure_trie.go         (concurrent-reads mode)
  triedb/pathdb/reader.go                   (sync.Map node-resolve
                                             cache for concurrent
                                             reads; small lock changes)
  triedb/pathdb/biased_fastcache.go         (lock semantics)
  core/vm/evm.go, jumpdests.go,             (jumpdest cache sharing,
       instructions.go, interface.go,        precompile-cache,
       interpreter.go                        StateDB iface adds)
  core/blockchain.go                        (V2 wiring + fallback)
  core/state_transition.go                  (interrupt plumbing)

Tier 4 — drift-detection tests (read these to understand the
parity contract V2 must hold against StateDB):

  core/state/v2_method_parity_test.go       (every StateDB method
                                             has a PDB mapping)
  core/state/v2_journal_entry_coverage_test.go
                                            (every journal kind has
                                             a parallel mapping)
  core/state/v2_direct_setter_parity_test.go
                                            (SetXDirect ↔ journaled)
  core/state/parallel_statedb_getter_table_test.go
                                            (every PDB getter
                                             records its read)
  core/parallel_state_processor_hooks_parity_test.go
                                            (tracing.Hooks fire-or-
                                             skip classification)
  core/parallel_state_processor_fork_parity_test.go
                                            (params.IsX classification)
  core/v2_serial_parity_fuzz_test.go        (real-tx executor fuzz
                                             vs serial)
  core/mainnet_witness_benchmark_test.go    (gated 241-block end-
                                             to-end consistency +
                                             benchmark harness)

See docs/blockstm-v2.md for full architectural detail, the list of
correctness bug classes V2 prevents, and ongoing-improvement notes.
@cffls cffls force-pushed the blockstm_redesign branch from ed6bcf2 to 268e976 Compare May 5, 2026 05:15
cffls and others added 8 commits May 5, 2026 11:14
Cosmetic / micro-perf fixes from the claude[bot] review of 0xPolygon#2210.
No behaviour change.

* state/statedb.go SubBalanceDirect — add a comment noting the
  uint256.Sub wrap matches the journaled SubBalance path
  (statedb.go:922) and that TestDirectSetterParity_SubBalance pins
  byte-equality between the two. A defensive panic was suggested
  but would diverge from the journaled path and break the parity
  test, so we keep the change documentation-only.

* state_transition.go SenderInitBalance — drop the inline IIFE nil
  check; input1 is GetBalance(...) which returns a value type,
  never nil. Straight input1.ToBig() matches the idiom used
  elsewhere in the function.

* vm/evm.go runEcrecoverWithCache — drop the redundant
  RightPadBytes(input, 128) allocation. The [128]byte key is
  zero-initialised, so copy(key[:], input) achieves the same
  result without the extra heap allocation. Caller already
  guarantees len(input) <= 128.

* vm/instructions.go opKeccak256 — replace size.SetBytes(cached.
  (common.Hash).Bytes()) with size.SetBytes32(h[:]) to skip the
  per-cache-hit Bytes() allocation on the SHA3 fast path.
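
A quick sketch of why the runEcrecoverWithCache change is
behaviour-preserving, assuming len(input) <= 128 as the commit states.
rightPad below is a local stand-in written for this example (it is not
go-ethereum's common.RightPadBytes), and keyViaPad / keyViaCopy are
hypothetical names for the old and new ways of building the cache key.

```go
package main

import "fmt"

// rightPad is a local stand-in for a right-padding helper: it returns a
// new n-byte slice with input at the front and zeros after it.
func rightPad(input []byte, n int) []byte {
	if len(input) >= n {
		return input
	}
	padded := make([]byte, n)
	copy(padded, input)
	return padded
}

// keyViaPad builds the key through an intermediate padded slice.
func keyViaPad(input []byte) [128]byte {
	var key [128]byte
	copy(key[:], rightPad(input, 128))
	return key
}

// keyViaCopy relies on the array being zero-initialised, so copying the
// raw input already leaves the tail padded with zeros.
func keyViaCopy(input []byte) [128]byte {
	var key [128]byte
	copy(key[:], input)
	return key
}

func main() {
	input := []byte{0xde, 0xad, 0xbe, 0xef}
	fmt.Println(keyViaPad(input) == keyViaCopy(input)) // prints "true"
}
```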

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 241-block witness fixtures under core/blockstm/testdata/ are
managed via Git LFS (~1.6 GB total). On a fresh clone or a CI
runner that hasn't run `git lfs pull`, the .block / .witness.gz
files are LFS pointer text stubs rather than the real data, and
gzip.NewReader fails with "gzip: invalid header" — exactly what
the unit-tests CI workflow has been hitting.

Detect LFS pointers in readFileMaybeGz via the canonical
"version https://git-lfs.github.com/spec/" prefix and surface a
sentinel errLFSPointer error. loadEmbeddedBlocks and
loadBlocksFromDir then call t.Skipf instead of t.Fatalf when the
fixtures aren't materialized — the harness skips cleanly with a
helpful message ("run `git lfs pull` to materialize testdata")
instead of producing confusing gzip errors.
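
A minimal sketch of the pointer check, assuming the detection is a
simple prefix match on the file contents as described above.
checkNotLFSPointer is a hypothetical helper name; the prefix string and
the errLFSPointer sentinel name come from this commit message, while the
error text is illustrative.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// lfsPointerPrefix is the first line prefix of a Git LFS pointer file.
const lfsPointerPrefix = "version https://git-lfs.github.com/spec/"

// errLFSPointer signals that the fixture has not been materialized.
var errLFSPointer = errors.New("file is a Git LFS pointer; run `git lfs pull` to materialize testdata")

// checkNotLFSPointer returns errLFSPointer when data looks like an LFS
// pointer stub rather than real file contents.
func checkNotLFSPointer(data []byte) error {
	if bytes.HasPrefix(data, []byte(lfsPointerPrefix)) {
		return errLFSPointer
	}
	return nil
}

func main() {
	stub := []byte("version https://git-lfs.github.com/spec/v1\noid sha256:abc\nsize 12\n")
	gz := []byte{0x1f, 0x8b, 0x08, 0x00}
	fmt.Println(checkNotLFSPointer(stub) == errLFSPointer, checkNotLFSPointer(gz) == nil)
}
```

A test helper can then branch on errors.Is(err, errLFSPointer) and call
t.Skipf instead of t.Fatalf.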

This is the same prerequisite called out in
docs/blockstm-v2.md → "Test data (Git LFS)".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The lint workflow on 0xPolygon#2210 flagged 15+ issues across V2 files. Fix
each so `make lint` is clean. No behaviour change in production
paths.

* goimports — formatting on ~12 files (idiomatic import grouping
  + blank-line alignment that the in-tree gofmt missed).
* unused — drop dead code:
    - executeWithParallelStateDBV2 + ValidatingParallelStateDB.checkBalance
      in core/mainnet_witness_benchmark_test.go (debug shims, never
      called).
    - timedMockV2State.execDelay + timedMockV2Env.fails in
      core/blockstm/v2_executor_test.go (vestigial fields).
    - ParallelStateDB.priorDestructed convenience wrapper (callers
      use priorDestructedAt).
    - opSubRefund / opWarmAddress diff ops in
      core/state/v2_differential_test.go (no scenario references them).
* copyloopvar — drop the redundant `x := x` loop-variable copies
  across 8 test files (Go 1.22+ no longer needs them).
* unconvert — drop the `time.Duration(result.Phase1)` cast (Phase1
  is already time.Duration) and the `JumpDestCache(newMapJumpDests())`
  cast (already satisfies the interface).
* durationcheck — fix `timeAfter(seconds time.Duration)` in
  core/blockstm/mvbalance_store_test.go: callers passed an int and
  the multiplication `seconds * time.Second` is a duration*duration
  bug. Make the parameter `int` and cast inside.
* copylocks — `*statedb = *backupStateDB` in V1's
  maybeRerunWithoutFeeDelay copies a struct holding atomic.Int64.
  This is single-threaded V1 rerun-from-snapshot; tag with
  `//nolint:govet` and a comment.
* whitespace — drop a leading blank line in v2Env.Execute that
  golangci-lint flagged.
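
For readers unfamiliar with the durationcheck finding above, a small
sketch of the pattern. timeoutBuggy and timeoutFixed are hypothetical
names, not the test helper's. Multiplying two time.Duration values
happens to work when the caller passes an untyped constant such as 2,
but gives a wildly wrong result as soon as a real Duration is passed.

```go
package main

import (
	"fmt"
	"time"
)

// timeoutBuggy multiplies Duration by Duration. A caller passing the
// untyped constant 2 gets 2ns * 1e9 = 2s by accident.
func timeoutBuggy(seconds time.Duration) time.Duration {
	return seconds * time.Second
}

// timeoutFixed takes a plain count and converts exactly once.
func timeoutFixed(seconds int) time.Duration {
	return time.Duration(seconds) * time.Second
}

func main() {
	fmt.Println(timeoutBuggy(2) == timeoutFixed(2))             // true, by accident
	fmt.Println(timeoutBuggy(2*time.Second) == timeoutFixed(2)) // false: result is 1e9 times too large
}
```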

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three issues flagged in the inline PR review of 0xPolygon#2210.

* core/blockchain.go: V2-failure fallback recovery was broken.
  cancel() ran BEFORE the `if result.parallel && result.err != nil`
  block, so when V2 finished first with an error (panic, ApplyMessage
  consensus error) the still-running serial processor was interrupted
  at its next tx boundary and the fallback `result = <-resultChan`
  received context.Canceled instead of a usable serial result.
  This fallback is documented as the recovery contract in the PR description.

  Move cancel() and followupInterrupt.Store(true) to AFTER the
  fallback block. The fallback (when V2 errors with V1 also running)
  gets a real serial result. Once we have the result we plan to
  return, cancel the loser so it stops at its next tx boundary
  before commit advances the pathdb layer (the original intent).

  See review comment r3187036031.

* core/blockchain.go: drop unused AblationSkip* fields from
  BlockChain. Four exported boolean fields (AblationSkipFlush /
  AblationSkipSettle / AblationSkipFinalise / AblationSkipMVRead)
  were declared but never read or written anywhere — repo-wide grep
  confirms zero references outside the declaration site. The
  intended bridge from these BlockChain-level toggles to the per-
  block MVHashMap.Skip* fields (which ARE wired) was never threaded
  through, so flipping the BlockChain field was a silent no-op.
  Exported fields enter the API surface, so keeping them locks us
  into either a SemVer-breaking removal or maintainer confusion;
  drop them now and re-introduce as wired knobs in a separate
  change if/when the ablation experiments need a runtime entry
  point.

  See review comment r3187036037.

* core/blockstm/mvhashmap.go: bloom h2 dimension was constant zero
  for the hottest key class. h2 read bytes [20:24] of Key, which are
  populated only for state keys; NewAddressKey leaves [20:52] zero
  and NewSubpathKey leaves [20:51] zero. h3 also half-degraded for
  those classes ([28:32] zero). Result: address-only and subpath
  reads collapsed the bloom from 3-of-3 to ~2-of-3, which doubles
  the false-positive rate at typical block sizes (~0.07% → ~0.35%
  at 1k unique keys). No correctness impact, just hot-path
  selectivity.

  Re-derive h2 from address bytes [16:20] (always populated) and
  fold the subpath/type bytes [52][53] into h3 so all three hashes
  draw from non-constant ranges for every key class. Updated the
  comment to reflect the new layout.

  See review comment r3187036040.
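
The cancel-ordering fix in the first item above can be modelled with a
toy race. Everything here is hypothetical scaffolding (result, race, the
gate channel); it only illustrates why cancelling before the fallback
read hands the caller context.Canceled instead of a usable serial
result.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// result is a hypothetical processor outcome.
type result struct {
	parallel bool
	root     string
	err      error
}

// race models V2 finishing first with an error while serial is still
// running. cancelEarly selects the old (broken) ordering.
func race(cancelEarly bool) result {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	resultChan := make(chan result, 2)
	gate := make(chan struct{}) // holds serial at a "tx boundary"

	// V2 has already finished, with an error.
	resultChan <- result{parallel: true, err: errors.New("v2 failed")}

	// Serial checks ctx at its next tx boundary, then reports.
	go func() {
		<-gate
		if err := ctx.Err(); err != nil {
			resultChan <- result{err: err}
			return
		}
		resultChan <- result{root: "0xserial"}
	}()

	res := <-resultChan
	if cancelEarly {
		cancel() // old ordering: interrupts the processor we still need
	}
	close(gate)
	if res.parallel && res.err != nil {
		res = <-resultChan // fall back to the serial result
	}
	cancel() // new ordering: cancel the loser only after the fallback
	return res
}

func main() {
	fmt.Println(race(true).err)   // context canceled
	fmt.Println(race(false).root) // 0xserial
}
```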

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three more issues flagged in the inline PR review of 0xPolygon#2210.

* core/parallel_state_processor.go: V2 was silently dropping the
  stateless-witness pointer. ProcessBlock wires the witness via
  parallelStatedb.StartPrefetcher("chain", witness, nil), but inside
  V2.Process the prefetcher is restarted with a hard-coded nil for
  the witness slot — and StateDB.StartPrefetcher unconditionally
  overwrites s.witness, so every s.witness != nil-gated collection
  point (CollectStateWitness, CollectCodeWitness, settle-phase trie
  walks) became a no-op for the rest of execution. On
  StatelessSelfValidation and single-block makeWitness paths the
  produced witness landed empty with no error.

  Fix: stash finalDB.Witness() before StopPrefetcher and pass it
  through to the v2-settle prefetcher restart, so the wired pointer
  survives the swap.

  See review comment r3191282978.

* core/state/parallel_statedb.go: SelfDestruct skipped recordWrite
  for the SuicidePath key. FlushToMVStore writes
  (SuicidePath_addr, txIdx, inc, true) for every entry in
  s.destructed, but the key was never appended to s.WriteKeys, so
  MVStore.MarkEstimate / CleanupEstimate could not reach it on
  re-execution. Cross-incarnation invalidation was broken: a stale
  SuicidePath entry from incarnation N survived into N+1's view, and
  a downstream tx that observed it via priorDestructedAt could pass
  validation against state that no longer exists — a state-root
  divergence path. Every other MVStore-targeting writer (SetNonce,
  SetCode, SetState, CreateAccount) already calls recordWrite for
  the same reason; only the destruct path was missed.

  Fix: call s.recordWrite(NewSubpathKey(addr, SuicidePath)) inside
  the !s.destructed[addr] guard, matching the journal-entry guard so
  repeated SelfDestruct in the same tx doesn't append a duplicate.
  Pinned by TestPDB_SelfDestruct_RecordsSuicidePathWrite.

  See review comment r3191282996.

* core/blockchain.go: V2 reader cache hit/miss stats were silently
  dropped. setupBlockReaders called ReadersWithCacheStatsTriple to
  create three independent ReaderWithStats wrappers (prefetch /
  process / parallel) and wired the parallel one into
  parallelStatedb, but reportReaderStats only consumed prefetch and
  process. V2's reads accumulated in the parallel wrapper's atomic
  counters and were discarded each block — and since V2 is the
  primary processor in production, the chain/state/account /
  storage/cache/{hit,miss} meters were essentially empty on the hot
  path.

  Fix: thread parallel through setupBlockReaders' return signature
  and into reportReaderStats. process and parallel both carry the
  roleProcess label and share the same underlying cache, so merge
  their counters into the same meter set rather than introducing a
  new "process_parallel" series.

  See review comment r3191283003.
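
The SelfDestruct fix in the second item above comes down to recording
the written key inside the same first-time guard that marks the account
destructed. A minimal sketch, with a hypothetical pdb type and string
keys standing in for NewSubpathKey(addr, SuicidePath):

```go
package main

import "fmt"

// pdb is a hypothetical, stripped-down per-tx state.
type pdb struct {
	destructed map[string]bool
	writeKeys  []string // keys that estimate marking / cleanup must reach
}

func newPDB() *pdb { return &pdb{destructed: make(map[string]bool)} }

// selfDestruct marks addr destructed and records the write key once,
// so repeated calls in the same tx do not append duplicates.
func (s *pdb) selfDestruct(addr string) {
	if !s.destructed[addr] {
		s.destructed[addr] = true
		s.writeKeys = append(s.writeKeys, "suicide/"+addr)
	}
}

func main() {
	s := newPDB()
	s.selfDestruct("0xabc")
	s.selfDestruct("0xabc") // repeated in the same tx
	s.selfDestruct("0xdef")
	fmt.Println(s.writeKeys) // prints "[suicide/0xabc suicide/0xdef]"
}
```

Without the append, a re-execution would have no record of the key and
could not invalidate the stale entry from the previous incarnation.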

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The diffguard run on commit 4c688e4 flagged 35 surviving mutations
(score 77.7%, T1 logic 84.5%). Triage classified seven as
HIGH-severity — branch / boolean / conditional mutations on V2's
per-tx correctness paths. Add five targeted Tier-1 kill tests for
the five that are deterministically killable:

* TestPDB_EnableReadTracking_InitializesBalAddrs
  Pins the `s.BalAddrs == nil` guard at parallel_statedb.go:335.
  Flipping == to != silently skips the make() on a fresh PDB,
  leaving cap=0 instead of the documented 8 — every recordBalWrite
  reallocates. Test asserts cap >= 8 after EnableReadTracking.

* TestPDB_PriorDestructedAt_RecordsAbsenceRead
  Pins the else-if branch at parallel_statedb.go:531. Removing the
  body drops the absence read recordStoreRead(suicideKey, -1, 0,
  nil); without it, validation can't catch a concurrent prior tx
  destructing addr. Test asserts the absence read appears in
  StoreReads AND that subsequent MVStore writes flip validation
  to invalid.

* TestPDB_Exist_DestructedInBaseReturnsFalse
  Pins the `if suicideIdx >= 0 { ... }` branch at
  parallel_statedb.go:576. Removing the body lets a destructed
  addr fall through to base.Exist and incorrectly return true
  when the account was set up in base. The test seeds the base
  StateDB with code on addr (so the fallthrough path is
  observable) and asserts Exist returns false after a SuicidePath
  write.

* TestPDB_CreateAccount_WritesTrueValue
  Pins the literal `true` at parallel_statedb.go:1014
  (CreateAccount → store.WriteInc). Flipping it to false would
  publish (CreatePath_addr, txIdx, inc, false), defeating the
  value-based fallback in storeReadMatches. Test reads the MVStore
  entry and asserts the value is true.

* TestPDB_DiagnoseBalanceRead_MatchReturnsFalse
  Pins the `false` literal at parallel_statedb_validate.go:215.
  Flipping to true would have a matching balance read produce a
  phantom diagnostic with zero-valued fields; DiagnoseValidation
  would aggregate these as diags with an empty ("") category. Test
  asserts len(diags) == 0 on a matching read.

The other two HIGH survivors are documented as unkillable in
their current form:

* parallel_statedb.go:751 GetCodeHash `if len(code) == 0` —
  removing the early-return falls through to
  crypto.Keccak256Hash(empty), which equals types.EmptyCodeHash by
  spec. Behaviourally equivalent; can't be killed without locking
  in an internal performance signal.

* v2_executor.go:586 runValidationLoop `cancelled = true` after
  ctx-cancel — the mutation's observable effect (drain runs on
  cancel) is timing-dependent because reexec goroutines exit
  promptly via ctx.Done() in waitForTx/waitForFinal regardless,
  so the post-loop drain completes either way. A deterministic
  kill needs a redesign of the cancel handling.

Each new test was verified to:
  - PASS on unmutated code,
  - FAIL on the corresponding mutated code (sed/python
    in-place mutation, run, restore).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- core/blockstm/mvhashmap.go: drop unused MVHashMap.SkipFlush field. The
  three sibling ablation toggles are wired through the V2 path; SkipFlush
  alone has no consumer, and as an exported field on a public type it
  would be SemVer-impactful to remove later. Same dead-scaffolding pattern
  removed from BlockChain.AblationSkip* in 4c688e4.

- core/parallel_state_processor.go: stop discarding the prefetcher-warmed
  shared JUMPDEST cache. v2Env.Execute previously called
  evm.SetJumpDestCache(e.jumpDests) unconditionally, overriding the
  shared cache that vm.NewEVM had just wired from vmConfig. Now the
  override (and the per-v2Env allocation) are gated on the absence of
  vmConfig.SharedJumpDestCache: production paths via ProcessBlock keep
  the prefetcher's analysis; benchmarks and single-block witness paths
  that bypass that wiring still get the per-v2Env fallback.

- core/blockstm/v2_executor.go: wrap the two predecessor receives in
  v2ExecCtx.execute (completionCh[k-1] and execDone[prev]) with ctx
  selects, mirroring waitForTx/waitForFinal. Without these, a worker
  that entered execute past the vfailed[k-1] predicate could hang
  forever on cancellation — runValidationLoop deliberately skips the
  finishReexec drain on cancel, so completionCh[k-1] is never closed,
  wg.Wait blocks indefinitely, and the V2 driver, worker pool,
  validator/settlement, MVStore/MVBalanceStore/SafeBase, and parallel
  StateDB all leak. Early return from execute is safe — the worker
  loop unconditionally closes execDone[taskIdx], unblocking cascading
  downstream waiters.

Benchmarks (V2Embedded, n=10): -2.63% on mean (t=-3.90, statistically
significant); V2AllBlocks (241 blocks) is neutral within noise. Production
prefetcher-cache reuse is not measurable in the current benchmark harnesses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…spec-tests sweep

Drove V2 BlockSTM through the full Ethereum execution-spec-tests v5.1.0
blockchain corpus (67,032 subtests) and chased every V2-only divergence.
Mixed in are the production nil-guards required for V2 (and serial) to
run at all on non-Bor chain configs — without them the spec-test fixtures
panic in state_transition.execute on the burnt-contract dereference.

Final result: V2 matches serial pass-for-pass on all 67,032 subtests, up
from 1,676 passes / 253 V2-only divergences at the start of the sweep.
@cffls cffls force-pushed the blockstm_redesign branch from a38aa86 to e7aaf2d Compare May 9, 2026 01:27