perf(evm): delegate inline arithmetic/stack opcodes to doExecute handlers#483
Conversation
59cf42e to
6c8a4c2
Compare
⚡ Performance Regression Check Results✅ Performance Check Passed (interpreter)Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 3 regressions ✅ Performance Check Passed (multipass)Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions |
There was a problem hiding this comment.
Pull request overview
Refactors the computed-goto interpreter fast path to delegate previously inlined arithmetic/logic and stack-manipulation opcodes to centralized handler/helper implementations, aiming to simplify the hot dispatch loop and align behavior with existing opcode handler code.
Changes:
- Replaced inline implementations of binary/ternary/unary arithmetic & logic opcodes with
HANDLER_CALL(...::doExecute())delegation. - Replaced inline stack ops (
POP,PUSH0,PUSHn,DUPn,SWAPn) withexecute*OpcodeNoGas(...)helpers in the computed-goto path. - Added an explicit revision gate in
CLZtarget before delegating to the handler.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| TARGET_ADD: | ||
| HANDLER_CALL(AddHandler::doExecute()); | ||
| TARGET_MUL: |
There was a problem hiding this comment.
HANDLER_CALL advances Pc unconditionally (via ++Pc inside the macro) before checking Context.getStatus(). With the new delegation, doExecute() can set error statuses (e.g., via EVM_STACK_CHECK), which means the interpreter may record the program counter after the faulting opcode in the computed-goto fast path. This differs from the other interpreter paths in this file (which break before incrementing Frame->Pc on error) and from the previous inline implementation here. Consider changing the HANDLER_CALL flow so Pc is only advanced when Context is still EVMC_SUCCESS (or otherwise keep PC reporting consistent on failure).
| Frame->Sp = sp; | ||
| Frame->Pc = Pc; | ||
| executePopOpcodeNoGas(Frame, Context); | ||
| sp = Frame->Sp; | ||
| ++Pc; | ||
| if (INTX_UNLIKELY(Context.getStatus() != EVMC_SUCCESS)) | ||
| goto cgoto_error; |
There was a problem hiding this comment.
In the new stack-op delegation, Pc is incremented before checking Context.getStatus(). If executePopOpcodeNoGas() reports an error (e.g., stack underflow), the computed-goto fast path will record Frame->Pc as the next instruction when jumping to cgoto_error, which is inconsistent with the non-computed-goto loops in this file (they check status before Frame->Pc++). Consider only incrementing Pc after confirming EVMC_SUCCESS, to keep error PC/state consistent across interpreter paths.
| TARGET_CLZ : { | ||
| if (INTX_UNLIKELY(sp < 1)) { | ||
| Context.setStatus(EVMC_STACK_UNDERFLOW); | ||
| if (INTX_UNLIKELY(Revision < EVMC_OSAKA)) { | ||
| Context.setStatus(EVMC_UNDEFINED_INSTRUCTION); | ||
| goto cgoto_error; | ||
| } | ||
| auto &A = Frame->Stack[sp - 1]; | ||
| A = intx::clz(A); | ||
| ++Pc; | ||
| DISPATCH_NEXT; | ||
| HANDLER_CALL(ClzHandler::doExecute()); |
There was a problem hiding this comment.
TARGET_CLZ is already gated by the per-revision computed-goto dispatch table (cgoto_table is selected by Revision and opcodes missing from evmc_get_instruction_names_table() map to TARGET_UNDEFINED). The extra runtime Revision < EVMC_OSAKA check here is therefore redundant on this fast path and adds an always-false branch to a hot opcode. Consider removing this check (or relying on the dispatch table) to avoid unnecessary overhead.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // ---- Inline stack ops ---- | ||
| // ---- Binary arithmetic/logic ops (delegate to doExecute) ---- | ||
| TARGET_ADD: | ||
| HANDLER_CALL(AddHandler::doExecute()); |
There was a problem hiding this comment.
Why were inline blocks used here originally? If we switch back to a handler function, are there any performance costs? It seems this was previously refactored from a handler to inline code. We should verify that the macro + function approach is effectively as efficient as the inlined logic
There was a problem hiding this comment.
We benchmarked this on ethereum blockchain cases; no measurable overhead from the handler call form vs. inlined version.
After rebasing onto current upstream/main (which now includes DTVMStack#458 / DTVMStack#460 / DTVMStack#482 / DTVMStack#483 perf work) and running a 10-rep evmone-bench on the 27 paper benches, the cumulative PR delta has collapsed to noise (raw geomean +1.15%, +0.46% after correcting a single-iteration outlier on main/blake2b_shifts/8415nulls via a focused 20-rep re-measurement). 0 benches above the +/-25% CI gate. The A-vs-PR-base -2.73% from this commit's own optimization is unchanged; the framing shift is that the absolute runtime delta of the whole PR vs unmodified main has been absorbed by the intervening upstream perf optimizations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…perf evidence - README.md: status Proposed → Implemented; tick all checklist items (Phase 1–4 done, gates green, perf measured); update Phase 3 header to reflect both non-lifted and lifted-block factory wiring; correct Findings "Future work" note now that lifted-block plumbing (2ebfd29) has landed; bump white-box test count 39 → 40. - Append "What shipped (2026-05-12)" section: 12-commit grouping, four-way perf comparison table (pre-rebase / post-rebase / D1 / lifted-plumbing), summary of decision path. - Track perf-run evidence directories from the rebase decision cycle: bench-2026-05-07/ (original PR data), perf-2026-05-11/ (pre-rebase re-run on current upstream, −1.97% geomean), perf-2026-05-11-rebased/ (post-rebase, −0.57%), perf-2026-05-11-no-483/ (D1 DTVMStack#483-revert hypothesis test, +0.95%), perf-2026-05-11-lifted/ (final lifted-block plumbing, +0.34%). Each directory has summary.md plus raw JSON + analyze.py + run_aba.sh + console captures for full reproducibility. - Track pr493-body-final.md: source-of-truth for the PR description pushed to GitHub via `gh pr edit`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lers
1. Does this PR affect any open issues?(Y/N) and add issue references (e.g. "fix #123", "re #123".):
2. What is the scope of this PR (e.g. component or file name):
3. Provide a description of the PR(e.g. more details, effects, motivations or doc link):
4. Are there any breaking changes?(Y/N) and describe the breaking changes(e.g. more details, motivations or doc link):
5. Are there test cases for these changes?(Y/N) select and add more details, references or doc links:
6. Release note