
perf: Stage 6 sumcheck micro-optimizations #85

Merged: MatteoMer merged 1 commit into main from perf/stage6-sumcheck-microopts on Apr 17, 2026

@MatteoMer (Owner) commented Apr 17, 2026

Summary

  • Lift RaPolynomial tag dispatch out of inner loops: the mapFn bodies of both RamRaVirtualProver and LookupsRaVirtualProver now switch once on the active tag of the first ra_poly via inline else, which generates a specialized loop body per variant. The inner loops call @field(..., @tagName(tag)).getBoundCoeff() instead of going through the tagged union's switch-based dispatch, which eliminates the per-access tag check in release builds.
  • Add a has_nulls branchless fast path: RaPolynomial's compressed rounds (Round1/2/3) now track a has_nulls: bool flag, computed in initRound1 and propagated through each bind transition. When the flag is false (the common case for instruction lookups), getBoundCoeff skips the optional-index checks entirely.
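Both changes can be sketched in a few lines of Zig. This is a minimal, self-contained illustration, not code from this PR. `Dense`, `Compressed`, `RaPoly`, `productSumPerAccess`, and `productSumHoisted` are hypothetical stand-ins for the real `RaPolynomial` rounds and `mapFn` bodies. Coefficients are plain `u64` values rather than field elements, and the propagation of `has_nulls` through bind is not modeled. The hoisted loop assumes that every polynomial in the slice shares the first one's variant. The `std.debug.assert` checks this assumption in safe builds.

```zig
const std = @import("std");

// Hypothetical dense variant: coefficients are stored directly.
const Dense = struct {
    coeffs: []const u64,

    inline fn getBoundCoeff(self: Dense, i: usize) u64 {
        return self.coeffs[i];
    }
};

// Hypothetical compressed variant: per-element optional indices into a
// small table. `has_nulls` is computed once, when the round is built.
const Compressed = struct {
    table: []const u64,
    indices: []const ?u32,
    has_nulls: bool,

    fn init(table: []const u64, indices: []const ?u32) Compressed {
        var any_null = false;
        for (indices) |idx| {
            if (idx == null) {
                any_null = true;
                break;
            }
        }
        return .{ .table = table, .indices = indices, .has_nulls = any_null };
    }

    inline fn getBoundCoeff(self: Compressed, i: usize) u64 {
        if (!self.has_nulls) {
            // Fast path: no index is null, so unwrap directly. `.?` is only
            // safety-checked in Debug/ReleaseSafe; ReleaseFast emits no test.
            return self.table[self.indices[i].?];
        }
        // Slow path: a null index contributes zero.
        return if (self.indices[i]) |k| self.table[k] else 0;
    }
};

const RaPoly = union(enum) {
    dense: Dense,
    compressed: Compressed,

    // Per-access dispatch: the tag is switched on at every call.
    fn getBoundCoeff(self: RaPoly, i: usize) u64 {
        return switch (self) {
            inline else => |p| p.getBoundCoeff(i),
        };
    }
};

// Baseline: re-examines the tag for every (poly, i) pair.
fn productSumPerAccess(polys: []const RaPoly, n: usize) u64 {
    var total: u64 = 0;
    for (0..n) |i| {
        var prod: u64 = 1;
        for (polys) |p| prod *%= p.getBoundCoeff(i);
        total +%= prod;
    }
    return total;
}

// Hoisted: switch once on the first poly's tag. `inline else` instantiates
// the loop nest once per variant with `tag` comptime-known, so `@field`
// names a concrete payload type. Assumes every poly shares that variant.
fn productSumHoisted(polys: []const RaPoly, n: usize) u64 {
    switch (polys[0]) {
        inline else => |_, tag| {
            const name = @tagName(tag);
            var total: u64 = 0;
            for (0..n) |i| {
                var prod: u64 = 1;
                for (polys) |p| {
                    // Safe builds verify the shared-variant assumption.
                    std.debug.assert(std.meta.activeTag(p) == tag);
                    prod *%= @field(p, name).getBoundCoeff(i);
                }
                total +%= prod;
            }
            return total;
        },
    }
}
```

Inside the `inline else` prong, `tag` is comptime-known. As a result, `@field(p, name)` resolves to a concrete payload type, and its `getBoundCoeff` can be inlined directly. The per-access version, by contrast, has to branch on the tag for every element. The `has_nulls` test stays in `getBoundCoeff` in this sketch. However, it is loop-invariant and perfectly predictable, whereas the slow path tests each optional index individually.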

Context: flamegraph analysis of sha256_2048 showed Stage 6 accounting for 7.2% of total prover time. These are diminishing-returns micro-optimizations, because the implementation already closely matches Jolt's approach.

Benchmark (sha256_2048, 10 runs, ReleaseFast, Apple Silicon)

| Metric | Before (ms) | After (ms) | Delta (ms) | Change |
|---|---|---|---|---|
| Mean | 2814 | 2672 | -142 | -5.1% |
| Median | 2759 | 2651 | -108 | -3.9% |
| P25 | 2690 | 2607 | -83 | -3.1% |
| Min | 2582 | 2540 | -43 | -1.6% |
| Stdev | 188 | 105 | | |

The mean improvement (5.1%) is larger than the conservative 0.5–1% estimate. A likely explanation is that inline else dispatch lets LLVM optimize the inner loops further (inlining, register allocation) once the variant is statically known. The lower stdev (188 → 105 ms) suggests fewer branch mispredictions.
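Amdahl's law gives a rough sense of scale for this result. The calculation assumes that the 7.2% flamegraph share applies to the Before mean, although the flamegraph and benchmark runs may not be identical. It also assumes that all of the saving comes from Stage 6:

$$
T_6 \approx 0.072 \times 2814\ \text{ms} \approx 203\ \text{ms},
\qquad
\frac{\Delta T}{T_6} \approx \frac{142}{203} \approx 0.70,
\qquad
s_6 = \frac{T_6}{T_6 - \Delta T} \approx \frac{203}{61} \approx 3.3
$$

Under these assumptions, even removing Stage 6 entirely would cap the total gain at about 7.2%. The measured mean delta therefore corresponds to Stage 6 running roughly 3.3× faster on its own.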

Test plan

  • zig build test — all 539 tests pass
  • zig build -Doptimize=ReleaseFast — clean build
  • Before/after benchmark on sha256_2048 (10 runs each)

🤖 Generated with Claude Code

Eliminate per-access tagged union dispatch in the innermost loops of
RamRaVirtualProver and LookupsRaVirtualProver by switching once on the
first ra_poly's active tag and generating specialized loop bodies via
inline else. Also add a has_nulls flag to RaPolynomial compressed rounds
so getBoundCoeff can skip optional index checks when all indices are
non-null (common for instruction lookups).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MatteoMer merged commit 11ed864 into main on Apr 17, 2026
17 checks passed