test: fix InterpreterBenchmark so it produces trustworthy numbers#2985
Merged
Motivation:
The previous shape `new GraphInterpreterSpecKit { new TestSetup { ... } }` ran
inside @Benchmark, so each invocation built (and never tore down) a fresh
ActorSystem. Long iterations exhausted native threads and JMH reported empty
results once the JVM ran out of resources.
Modification:
Make the benchmark class itself extend GraphInterpreterSpecKit so JMH's
@State(Scope.Benchmark) lifecycle reuses one ActorSystem across all
invocations. Add @TearDown(Level.Trial) to terminate it cleanly.
Result:
The benchmark now runs to completion and produces stable numbers, which is a
prerequisite for measuring follow-up GraphInterpreter optimizations.
Tests:
sbt 'bench-jmh/compile'
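To make the lifecycle change above concrete, here is a minimal, dependency-free sketch. It is not the benchmark's code: `FakeSystem`, `LeakyBench`, `SharedBench` and `LifecycleSketch` are hypothetical stand-ins, and the comments only indicate which JMH annotation each member would roughly correspond to. It counts how many "systems" are still alive after a run under the old per-invocation shape versus the new shared-state shape.

```scala
// Hypothetical stand-in for an expensive resource such as an ActorSystem.
final class FakeSystem {
  FakeSystem.live += 1
  def terminate(): Unit = FakeSystem.live -= 1
}

object FakeSystem {
  var live: Int = 0 // systems created and not yet terminated
}

// Old shape: the resource is built inside the benchmark method and never released.
object LeakyBench {
  def invoke(): Int = {
    val system = new FakeSystem // leaked: nothing ever calls terminate()
    42
  }
}

// New shape: the resource belongs to the benchmark state object itself.
final class SharedBench {
  private val system = new FakeSystem       // built once, when the state object is created
  def invoke(): Int = 42                    // role of the @Benchmark method; reuses `system`
  def tearDown(): Unit = system.terminate() // role of @TearDown(Level.Trial)
}

object LifecycleSketch {
  // Systems still alive after `invocations` calls with the old shape.
  def leakedAfter(invocations: Int): Int = {
    FakeSystem.live = 0
    (1 to invocations).foreach(_ => LeakyBench.invoke())
    FakeSystem.live
  }

  // Systems still alive after `invocations` calls plus teardown with the new shape.
  def sharedAfter(invocations: Int): Int = {
    FakeSystem.live = 0
    val bench = new SharedBench
    (1 to invocations).foreach(_ => bench.invoke())
    bench.tearDown()
    FakeSystem.live
  }
}
```

With the old shape the number of live systems grows linearly with the invocation count; with the shared shape it returns to zero once the trial-level teardown has run.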
Motivation:
GraphStages.identity is a singleton whose Inlet/Outlet shape is shared across
every reference. Chaining N copies into the assembly (numberOfIds = 5/10)
collapses to a single shape and mis-wires the connections, which surfaced as a
flood of runtime "Cannot pull port twice" errors during the benchmark and
produced nonsense throughput numbers (5/10 stages reported as faster than 1).
Modification:
Define a local IdentityStage class with its own Inlet/Outlet per instance and
use Vector.fill(numberOfIds)(new IdentityStage[Int]).
Result:
The benchmark wires N distinct stages and produces stable, monotonic numbers
(throughput decreases as numberOfIds grows, as expected).
Tests:
sbt 'bench-jmh/compile'
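The shared-shape problem described above can be illustrated with a small, dependency-free model. This is only a sketch: `Port`, `Stage`, `SingletonIdentity` and `WiringSketch` are hypothetical miniatures, not the real `Inlet`/`Outlet`/`GraphStage` API, and the `IdentityStage` here is a toy with the same name as the class in the commit. It shows that filling a vector with a singleton yields the same two ports N times, while filling it with `new` instances yields 2N distinct ports.

```scala
// Hypothetical miniature of a port; equality is reference identity (no equals override).
final class Port(val name: String)

trait Stage {
  def in: Port
  def out: Port
}

// Singleton: every reference is the same object, hence the same two ports.
object SingletonIdentity extends Stage {
  val in: Port = new Port("identity.in")
  val out: Port = new Port("identity.out")
}

// Per-instance: each `new IdentityStage` allocates its own pair of ports.
final class IdentityStage extends Stage {
  val in: Port = new Port("identity.in")
  val out: Port = new Port("identity.out")
}

object WiringSketch {
  // Number of distinct ports a wiring table keyed on port identity would see.
  def distinctPorts(stages: Seq[Stage]): Int =
    stages.flatMap(s => Seq(s.in, s.out)).distinct.size
}
```

Because `Vector.fill` takes its element by name, `Vector.fill(n)(new IdentityStage)` constructs a fresh stage per slot, whereas `Vector.fill(n)(SingletonIdentity)` repeats one object.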
He-Pin added a commit that referenced this pull request on May 21, 2026
…ter chase hot path (#2986)
* test: stop leaking ActorSystem in InterpreterBenchmark per invocation
Motivation:
The previous shape `new GraphInterpreterSpecKit { new TestSetup { ... } }` ran
inside @Benchmark, so each invocation built (and never tore down) a fresh
ActorSystem. Long iterations exhausted native threads and JMH reported empty
results once the JVM ran out of resources.
Modification:
Make the benchmark class itself extend GraphInterpreterSpecKit so JMH's
@State(Scope.Benchmark) lifecycle reuses one ActorSystem across all
invocations. Add @TearDown(Level.Trial) to terminate it cleanly.
Result:
The benchmark now runs to completion and produces stable numbers, which is a
prerequisite for measuring follow-up GraphInterpreter optimizations.
Tests:
sbt 'bench-jmh/compile'
* test: use per-instance IdentityStage in InterpreterBenchmark
Motivation:
GraphStages.identity is a singleton whose Inlet/Outlet shape is shared across
every reference. Chaining N copies into the assembly (numberOfIds = 5/10)
collapses to a single shape and mis-wires the connections, which surfaced as a
flood of runtime "Cannot pull port twice" errors during the benchmark and
produced nonsense throughput numbers (5/10 stages reported as faster than 1).
Modification:
Define a local IdentityStage class with its own Inlet/Outlet per instance and
use Vector.fill(numberOfIds)(new IdentityStage[Int]).
Result:
The benchmark wires N distinct stages and produces stable, monotonic numbers
(throughput decreases as numberOfIds grows, as expected).
Tests:
sbt 'bench-jmh/compile'
* optimize: skip afterStageHasRun no-op finalize check in chase hot path
Motivation:
GraphInterpreter's chase loops dominate hot-path CPU in steady state: JMH stack
profiling on InterpreterBenchmark attributes ~50% of stream-related samples to
the two while loops at execute:449 / execute:460. Every chase iteration calls
afterStageHasRun(activeStage), which in steady state always reads
shutdownCounter(activeStage.stageId) and the per-stage finalized flag only to
discover the stage has not just completed and skip the body. That is a
per-event array load + null check + branch on the hottest path with no work to
do, which the JIT cannot fold away because the array is mutable shared state.
Modification:
Track pendingFinalization: Boolean on the interpreter, set when a stage's
shutdownCounter decrements to 0 in completeConnection or transitions to 0 when
KeepGoing is cleared in setKeepGoing. Gate the three hot-path afterStageHasRun
calls in execute() (post normal-dispatch and the two chase loops) on the flag,
resetting it before the call so cascaded completions during finalization
re-arm the flag correctly. The slow-frequency callers (init, runAsyncInput)
are left untouched.
Result:
JMH on InterpreterBenchmark (JDK 25, G1, single thread, -i 5 -wi 3 -f 1 -t 1):
numberOfIds   baseline (ops/ms)   with patch (ops/ms)   delta
1             45238 ± 3143        50952 ± 4784          +12.6%
5             10526 ± 151         11242 ± 288           +6.8% (CIs disjoint)
10            5350 ± 193          5927 ± 173            +10.8% (CIs disjoint)
Allocation rate stays at ~0.6 B/op, so there is no GC impact. All stream-tests
pass.
Tests:
- sbt 'stream/compile'
- sbt 'stream/mimaReportBinaryIssues' - clean
- sbt 'stream-tests/testOnly *fusing*' - 159 tests, all passed
- sbt 'stream-tests/testOnly *Flow*Spec' - 1208 tests, all passed
- sbt 'bench-jmh/Jmh/run -i 5 -wi 3 -f 1 -t 1 .*InterpreterBenchmark.*' - numbers above
References:
Refs #2985 - benchmark fix used to obtain trustworthy JMH numbers above.
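The gating idea in the referenced commit can be sketched in isolation. The names `shutdownCounter`, `pendingFinalization`, `completeConnection` and `afterStageHasRun` follow the commit message, but the bodies below are a simplified single-stage model written for illustration; `MiniInterpreter`, `processEvent`, `GateSketch` and the two counters are hypothetical and do not reflect the real `GraphInterpreter` code.

```scala
// Simplified model: the finalize check runs only when the flag has been armed.
final class MiniInterpreter(stageCount: Int, portsPerStage: Int) {
  private val shutdownCounter = Array.fill(stageCount)(portsPerStage)
  private val finalized = Array.fill(stageCount)(false)
  private var pendingFinalization = false

  var finalizeChecks = 0  // how many times afterStageHasRun was entered
  var finalizedStages = 0 // how many stages were actually finalized

  // Called when one of a stage's connections completes.
  def completeConnection(stageId: Int): Unit = {
    shutdownCounter(stageId) -= 1
    if (shutdownCounter(stageId) == 0) pendingFinalization = true // arm the gate
  }

  private def afterStageHasRun(stageId: Int): Unit = {
    finalizeChecks += 1
    if (shutdownCounter(stageId) == 0 && !finalized(stageId)) {
      finalized(stageId) = true
      finalizedStages += 1
    }
  }

  // One event dispatched to `stageId`.
  def processEvent(stageId: Int): Unit = {
    // ... the stage's handler would run here ...
    if (pendingFinalization) {
      pendingFinalization = false // reset first so a cascaded completion can re-arm it
      afterStageHasRun(stageId)
    }
  }
}

object GateSketch {
  // Returns (finalizeChecks, finalizedStages) after `events` steady-state events
  // followed by the stage completing both of its connections.
  def run(events: Int): (Int, Int) = {
    val interp = new MiniInterpreter(stageCount = 1, portsPerStage = 2)
    (1 to events).foreach(_ => interp.processEvent(0)) // steady state: gate stays closed
    interp.completeConnection(0)
    interp.completeConnection(0) // counter reaches 0, gate is armed
    interp.processEvent(0)       // the only event that pays for the check
    (interp.finalizeChecks, interp.finalizedStages)
  }
}
```

In this model the steady-state events pay for a single boolean test, and the array reads happen only on the one event that follows a completion.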
Motivation
`InterpreterBenchmark` had two independent bugs that made its results unreliable, which becomes a problem the moment anyone wants to evaluate a `GraphInterpreter`-touching change against it.
1. The previous shape was `new GraphInterpreterSpecKit { new TestSetup { ... } }`. Because that ran inside `@Benchmark`, every invocation built (and never tore down) a fresh `ActorSystem`. Long iterations exhausted native threads and JMH ended up reporting empty results once the JVM ran out of resources.
2. The chain used `GraphStages.identity[Int]` once per slot, but `GraphStages.identity` is a singleton whose `Inlet`/`Outlet` shape is shared across every reference. Chaining N copies (`numberOfIds = 5/10`) collapses to a single shape and mis-wires the connections; the run logged a flood of `Cannot pull port twice` errors and ended up reporting nonsense throughput (5/10-stage configs faster than the 1-stage one).
Modification
1. Make `InterpreterBenchmark` itself extend `GraphInterpreterSpecKit` so JMH's `@State(Scope.Benchmark)` lifecycle reuses one `ActorSystem` across invocations, and add `@TearDown(Level.Trial)` to terminate it cleanly.
2. Define a local `IdentityStage[T]` with its own `Inlet`/`Outlet` per instance and use `Vector.fill(numberOfIds)(new IdentityStage[Int])` so each slot in the chain is a distinct stage with a distinct shape.
No changes to production code; this PR only fixes the benchmark.
Result
The benchmark now runs to completion without leaking actor systems and produces stable, monotonic numbers (throughput decreases as `numberOfIds` grows, as expected). This restores it as a usable baseline for subsequent `GraphInterpreter` work.
JMH on this branch (JDK 25, G1, single thread, `-i 5 -wi 3 -f 1 -t 1`):
Pre-fix, the 5/10-stage rows were both higher than the 1-stage row (i.e. the wrong direction) because the singleton-shape bug meant the chain wasn't actually N stages long.
This is a benchmark-correctness fix, not a performance improvement. There is no production-code change here.
Tests
- `sbt 'bench-jmh/compile'`
- `sbt 'bench-jmh/headerCheck; bench-jmh/scalafmtCheck'`
- `sbt 'bench-jmh/Jmh/run -i 5 -wi 3 -f 1 -t 1 -rf json -rff /tmp/jmh.json .*InterpreterBenchmark.*'` - completes cleanly, scores above.
References
None - benchmark-only fix surfaced while preparing to evaluate `GraphInterpreter` micro-optimizations against a trustworthy baseline.