test: fix InterpreterBenchmark so it produces trustworthy numbers by He-Pin · Pull Request #2985 · apache/pekko

He-Pin · 2026-05-21T06:41:45Z

Motivation

InterpreterBenchmark had two independent bugs that made its results unreliable, which becomes a problem the moment anyone wants to evaluate a GraphInterpreter-touching change against it.

The benchmark body was new GraphInterpreterSpecKit { new TestSetup { ... } }. Because that ran inside @Benchmark, every invocation built (and never tore down) a fresh ActorSystem. Long iterations exhausted native threads and JMH ended up reporting empty results once the JVM ran out of resources.
Inside the assembly we used GraphStages.identity[Int] once per slot, but GraphStages.identity is a singleton whose Inlet/Outlet shape is shared across every reference. Chaining N copies (numberOfIds = 5/10) collapses to a single shape and mis-wires the connections; the run logged a flood of Cannot pull port twice errors and ended up reporting nonsense throughput (5/10-stage configs faster than the 1-stage one).

Modification

Make InterpreterBenchmark itself extend GraphInterpreterSpecKit so JMH's @State(Scope.Benchmark) lifecycle reuses one ActorSystem across invocations, and add @TearDown(Level.Trial) to terminate it cleanly.
Define a local IdentityStage[T] with its own Inlet/Outlet per instance and use Vector.fill(numberOfIds)(new IdentityStage[Int]) so each slot in the chain is a distinct stage with a distinct shape.

No changes to production code — this PR only fixes the benchmark.

Result

The benchmark now runs to completion without leaking actor systems and produces stable, monotonic numbers (throughput decreases as numberOfIds grows, as expected). This restores it as a usable baseline for subsequent GraphInterpreter work.

JMH on this branch (JDK 25, G1, single thread, -i 5 -wi 3 -f 1 -t 1):

Benchmark                                             (numberOfIds)   Mode  Cnt      Score      Error   Units
InterpreterBenchmark.graph_interpreter_100k_elements              1  thrpt    5  45238.227 ± 3143.242  ops/ms
InterpreterBenchmark.graph_interpreter_100k_elements              5  thrpt    5  10526.376 ±  151.239  ops/ms
InterpreterBenchmark.graph_interpreter_100k_elements             10  thrpt    5   5350.558 ±  192.965  ops/ms

Pre-fix the 5/10-stage rows were both higher than the 1-stage row (i.e. wrong direction) because the singleton-shape bug meant the chain wasn't actually N stages long.

This is a benchmark-correctness fix, not a performance improvement. There is no production-code change here.

Tests

sbt 'bench-jmh/compile'
sbt 'bench-jmh/headerCheck; bench-jmh/scalafmtCheck'
sbt 'bench-jmh/Jmh/run -i 5 -wi 3 -f 1 -t 1 -rf json -rff /tmp/jmh.json .*InterpreterBenchmark.*' — completes cleanly, scores above.

References

None - benchmark-only fix surfaced while preparing to evaluate GraphInterpreter micro-optimizations against a trustworthy baseline.

@benchmark

Motivation: The previous shape `new GraphInterpreterSpecKit { new TestSetup { ... } }` ran inside @benchmark, so each invocation built (and never tore down) a fresh ActorSystem. Long iterations exhausted native threads and JMH reported empty results once the JVM ran out of resources. Modification: Make the benchmark class itself extend GraphInterpreterSpecKit so JMH's @State(Scope.Benchmark) lifecycle reuses one ActorSystem across all invocations. Add @teardown(Level.Trial) to terminate it cleanly. Result: The benchmark now runs to completion and produces stable numbers, which is a prerequisite for measuring follow-up GraphInterpreter optimizations. Tests: sbt 'bench-jmh/compile'

Motivation: GraphStages.identity is a singleton whose Inlet/Outlet shape is shared across every reference. Chaining N copies into the assembly (numberOfIds = 5/10) collapses to a single shape and mis-wires the connections, which surfaced as a runtime "Cannot pull port twice" error spam during the benchmark and produced nonsense throughput numbers (5/10 stages reported as faster than 1). Modification: Define a local IdentityStage class with its own Inlet/Outlet per instance and use Vector.fill(numberOfIds)(new IdentityStage[Int]). Result: The benchmark wires N distinct stages and produces stable, monotonic numbers (throughput decreases as numberOfIds grows, as expected). Tests: sbt 'bench-jmh/compile'

pjfanning

lgtm

@benchmark

…ter chase hot path (#2986) * test: stop leaking ActorSystem in InterpreterBenchmark per invocation Motivation: The previous shape `new GraphInterpreterSpecKit { new TestSetup { ... } }` ran inside @benchmark, so each invocation built (and never tore down) a fresh ActorSystem. Long iterations exhausted native threads and JMH reported empty results once the JVM ran out of resources. Modification: Make the benchmark class itself extend GraphInterpreterSpecKit so JMH's @State(Scope.Benchmark) lifecycle reuses one ActorSystem across all invocations. Add @teardown(Level.Trial) to terminate it cleanly. Result: The benchmark now runs to completion and produces stable numbers, which is a prerequisite for measuring follow-up GraphInterpreter optimizations. Tests: sbt 'bench-jmh/compile' * test: use per-instance IdentityStage in InterpreterBenchmark Motivation: GraphStages.identity is a singleton whose Inlet/Outlet shape is shared across every reference. Chaining N copies into the assembly (numberOfIds = 5/10) collapses to a single shape and mis-wires the connections, which surfaced as a runtime "Cannot pull port twice" error spam during the benchmark and produced nonsense throughput numbers (5/10 stages reported as faster than 1). Modification: Define a local IdentityStage class with its own Inlet/Outlet per instance and use Vector.fill(numberOfIds)(new IdentityStage[Int]). Result: The benchmark wires N distinct stages and produces stable, monotonic numbers (throughput decreases as numberOfIds grows, as expected). Tests: sbt 'bench-jmh/compile' * optimize: skip afterStageHasRun no-op finalize check in chase hot path Motivation: GraphInterpreter's chase loops dominate hot-path CPU in steady state — JMH stack profiling on InterpreterBenchmark attributes ~50% of stream-related samples to the two while loops at execute:449 / execute:460. Every chase iteration calls afterStageHasRun(activeStage), which in steady state always reads shutdownCounter(activeStage.stageId) and the per-stage finalized flag only to discover the stage has not just completed and skip the body. That is a per-event array load + null check + branch on the hottest path with no work to do, which the JIT cannot fold away because the array is mutable shared state. Modification: Track pendingFinalization: Boolean on the interpreter, set when a stage's shutdownCounter decrements to 0 in completeConnection or transitions to 0 when KeepGoing is cleared in setKeepGoing. Gate the three hot-path afterStageHasRun calls in execute() (post normal-dispatch and the two chase loops) on the flag, resetting it before the call so cascaded completions during finalization re-arm the flag correctly. The slow-frequency callers (init, runAsyncInput) are left untouched. Result: JMH on InterpreterBenchmark (JDK 25, G1, single thread, -i 5 -wi 3 -f 1 -t 1): numberOfIds baseline (ops/ms) with patch (ops/ms) delta 1 45238 ± 3143 50952 ± 4784 +12.6% 5 10526 ± 151 11242 ± 288 +6.8% (CIs disjoint) 10 5350 ± 193 5927 ± 173 +10.8% (CIs disjoint) Allocation rate stays at ~0.6 B/op — no GC impact. All stream-tests pass. Tests: - sbt 'stream/compile' - sbt 'stream/mimaReportBinaryIssues' - clean - sbt 'stream-tests/testOnly *fusing*' - 159 tests, all passed - sbt 'stream-tests/testOnly *Flow*Spec' - 1208 tests, all passed - sbt 'bench-jmh/Jmh/run -i 5 -wi 3 -f 1 -t 1 .*InterpreterBenchmark.*' - numbers above References: Refs #2985 - benchmark fix used to obtain trustworthy JMH numbers above.

He-Pin added 2 commits May 21, 2026 12:51

pjfanning approved these changes May 21, 2026

View reviewed changes

He-Pin merged commit 956d478 into apache:main May 21, 2026
9 checks passed

He-Pin deleted the perf-graph-interpreter-jit-friendly branch May 21, 2026 09:14

He-Pin mentioned this pull request May 21, 2026

optimize: skip afterStageHasRun no-op finalize check in GraphInterpreter chase hot path #2986

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: fix InterpreterBenchmark so it produces trustworthy numbers#2985

test: fix InterpreterBenchmark so it produces trustworthy numbers#2985
He-Pin merged 2 commits into
apache:mainfrom
He-Pin:perf-graph-interpreter-jit-friendly

He-Pin commented May 21, 2026

Uh oh!

pjfanning left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

He-Pin commented May 21, 2026

Motivation

Modification

Result

Tests

References

Uh oh!

pjfanning left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants