fix: stabilise stream tests on JDK 25 nightly (timeout scaling, element counts) by He-Pin · Pull Request #2869 · apache/pekko

He-Pin · 2026-04-18T15:31:30Z

Summary

Fixes four categories of stream-test flakiness observed on JDK 25 nightly CI (failing for 30+ consecutive days). See #2870 for the root-cause analysis and #2871 for the configuration-level mitigation.

Failing tests fixed

| Test | Root cause | Fix |
|---|---|---|
| `HubSpec`: "must work with long streams" (×4 variants) | 20K elements × ForkJoinPool (FJP) FIFO scheduling delay exceeded the 60 s timeout | Reduced element counts; replaced `throttle` with `Thread.sleep(1)` |
| `AggregateWithTimeBoundaryAndSimulatedTimeSpec` | `interval = 1.milli` races with the scheduler | Changed to `interval = 1.second` |
| TCK `stochastic_spec103_mustSignalOnMethodsSequentially` | Hard-coded TCK timeouts (800 ms / 200 ms) ignore the time factor | Reads the `pekko.test.timefactor` system property |
| `FlowMapAsyncPartitionedSpec`: "must ignore null-completed futures" | `Random.nextInt(10)` can produce 0, skipping the null path | Shifted to `Random.nextInt(10) + 1` |
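To make the TCK row concrete, here is a minimal sketch of reading `pekko.test.timefactor` and scaling the 800 ms / 200 ms defaults quoted in the first commit message. `ScaledTimeouts` and its members are hypothetical names, not the contents of `Timeouts.scala`, and falling back to a factor of 1.0 for a missing or invalid property is an assumption.

```scala
import scala.util.Try

// Hypothetical sketch: ScaledTimeouts and its members are illustrative names,
// not the actual contents of stream-tests-tck/…/Timeouts.scala.
object ScaledTimeouts {
  // Unscaled defaults quoted in the PR's first commit message.
  val BaseTimeoutMillis: Long = 800L
  val BaseNoSignalsTimeoutMillis: Long = 200L

  // Reads pekko.test.timefactor (e.g. -Dpekko.test.timefactor=3). Assumption: a missing,
  // unparsable or non-positive value falls back to 1.0 (no scaling).
  def timeFactor(lookup: String => Option[String] = key => sys.props.get(key)): Double =
    lookup("pekko.test.timefactor")
      .flatMap(raw => Try(raw.trim.toDouble).toOption)
      .filter(_ > 0)
      .getOrElse(1.0)

  private def scale(millis: Long, factor: Double): Long = math.round(millis * factor)

  def defaultTimeoutMillis(factor: Double = timeFactor()): Long =
    scale(BaseTimeoutMillis, factor)

  def defaultNoSignalsTimeoutMillis(factor: Double = timeFactor()): Long =
    scale(BaseNoSignalsTimeoutMillis, factor)
}
```

With `-Dpekko.test.timefactor=3` this sketch yields 2400 ms and 600 ms. Injecting the property lookup as a function keeps the parsing testable without mutating JVM system properties.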

Files changed

- stream-tests/…/HubSpec.scala: PatienceConfig scaled by timefactor; element counts reduced; throttle → sleep
- stream-tests/…/AggregateWithBoundarySpec.scala: interval 1.milli → 1.second (2 locations)
- stream-tests-tck/…/Timeouts.scala: read pekko.test.timefactor JVM property
- stream-tests/…/FlowMapAsyncPartitionedSpec.scala: always exercise null-future path

Related

- ForkJoinPool compensation-thread regression (JDK-8300995) causes test flakiness on JDK 21+ #2870: tracking issue for the JDK-8300995 root cause
- feat: make ForkJoinPool minimum-runnable configurable and improve pool documentation #2871: configuration-level improvements (LIFO dispatcher + configurable minimum-runnable)

Motivation: Nightly CI (JDK 25, TIMEFACTOR=3) has been failing consistently for 30+ days because ForkJoinPool scheduling changes in JDK 25 cause slower throughput and higher scheduler overhead. Four root causes were found:

1. HubSpec.patience used a hard-coded Span(60, Seconds) that was never scaled by the test time factor, so the 60 s budget was exhausted on JDK 25 (it needs 180 s with TIMEFACTOR=3).
2. AggregateWithTimeBoundaryAndSimulatedTimeSpec used interval = 1.milli with ExplicitlyTriggeredScheduler, which fired up to 400 000 timer callbacks per test run (timePasses(400.seconds) in 1 ms steps), each requiring a scheduler lock acquisition on JDK 25.
3. The TCK Timeouts (defaultTimeoutMillis / defaultNoSignalsTimeoutMillis) were hard-coded to 800 ms / 200 ms and never read the pekko.test.timefactor JVM property, causing stochastic_spec103_mustSignalOnMethodsSequentially to fail on JDK 25.
4. FlowMapAsyncPartitionedSpec "ignore null-completed futures" built the shouldBeNull set from Random.nextInt(10), which produces values 0-9. Because the elements are 1-10, the value 0 can never match an element, so the set could be {0}. In that case no element ever returned null, making the assertion a non-deterministic no-op; the test failed on JDK 17 / Scala 3.3.x in CI.

Modification:
- HubSpec: multiply the 60 s base by testKitSettings.TestTimeFactor, so CI with TIMEFACTOR=3 gets 180 s and TIMEFACTOR=2 gets 120 s.
- AggregateWithTimeBoundaryAndSimulatedTimeSpec: change interval from 1.milli to 1.second in the gap and duration tests, reducing timer firings from ~400 000 to ~400 (still enough to trigger boundaries).
- TCK Timeouts: read pekko.test.timefactor from the JVM system properties and scale defaultTimeoutMillis / defaultNoSignalsTimeoutMillis.
- FlowMapAsyncPartitionedSpec: replace the random shouldBeNull set with the fixed Set(2, 5, 8), whose values all lie in the 1-10 element range, so null filtering is exercised deterministically.

Result: All four previously-failing test categories should pass on the next nightly run across JDK 17/21/25 × Scala 2.13/3.3.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
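The HubSpec change in this commit is simple arithmetic: base patience × time factor. A minimal sketch, using plain `FiniteDuration` instead of the ScalaTest `Span` and `testKitSettings.TestTimeFactor` the spec actually uses; `PatienceScaling` and `scaledPatience` are hypothetical names.

```scala
import scala.concurrent.duration._

// Hypothetical helper: HubSpec itself scales a ScalaTest Span by
// testKitSettings.TestTimeFactor; plain FiniteDuration is used here to stay dependency-free.
object PatienceScaling {
  def scaledPatience(base: FiniteDuration, timeFactor: Double): FiniteDuration = {
    require(timeFactor > 0, "time factor must be positive")
    // Work in milliseconds and round, so fractional factors such as 1.5 are supported.
    math.round(base.toMillis * timeFactor).millis
  }
}
```

The 60 s base gives 180 s at TIMEFACTOR=3 and 120 s at TIMEFACTOR=2. If the later 120 s base and the nightly timefactor of 4 for JDK ≥ 25 both apply, the same formula gives 480 s.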

Motivation: On JDK 25, ForkJoinPool scheduling changes increase actor dispatch latency. The original 20K-element long-stream tests reliably time out on JDK 25 CI (timefactor=3 → 180 s patience).

Modification:
- 'long streams' (buffer=16): 20K → 2K elements (2×1K sources)
- 'buffer size is 1': 20K → 200 elements (2×100 sources); bufferSize=1 requires one actor round-trip per element, so the count must stay small
- 'consumer is slower': 2K → 400 elements; burst=200 covers the first 200 elements with no scheduler ticks, keeping wall-clock time low
- 'producer is slower': 2K → 400 elements; burst=200 on the throttled source (200 elements) means no scheduler ticks are needed, eliminating ForkJoinPool starvation risk on JDK 25

Result: All four tests now complete in under 100 ms on a loaded JDK 25 machine (burst=200 absorbs all throttled elements instantly; no timer callbacks are scheduled). The full HubSpec (48 tests) passes with timefactor=3.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
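The claim that burst=200 removes scheduler ticks can be checked with a rough token-bucket model. This is a simplified sketch, not Pekko's throttle implementation: it assumes the bucket starts full with `maximumBurst` tokens, each element costs one token, and refills arrive at a fixed rate. `BurstModel` and the `ratePerSecond` parameter are hypothetical; the PR does not state the throttle rate used in HubSpec.

```scala
import scala.concurrent.duration._

// Simplified model, not Pekko's throttle implementation. Assumptions: the token bucket
// starts full with `maximumBurst` tokens, each element costs one token, and refills
// arrive at `ratePerSecond` tokens per second.
object BurstModel {
  // Elements not covered by the initial burst; each of these waits for a refill tick.
  def elementsBeyondBurst(elements: Int, maximumBurst: Int): Int =
    math.max(0, elements - maximumBurst)

  // Approximate time spent waiting for refills (integer division rounds down).
  def refillWait(elements: Int, maximumBurst: Int, ratePerSecond: Int): FiniteDuration = {
    require(ratePerSecond > 0, "rate must be positive")
    val waiting = elementsBeyondBurst(elements, maximumBurst).toLong
    (waiting * 1000L / ratePerSecond).millis
  }
}
```

Under this model, 200 throttled elements with burst=200 never wait for a refill, while 400 elements at a hypothetical 100 elements/s would spend about 2 s waiting on timer-driven refills.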

pjfanning

LGTM

pjfanning · 2026-04-18T16:59:16Z

        maxDuration = None,
        currentTimeMs = schedulerTimeMs,
-        interval = 1.milli)
+        interval = 1.second)


I would prefer this to be a bit smaller, e.g. 500 millis or 250 millis, but the main thing is to get the tests passing.
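The reviewer's alternatives can be weighed with the same arithmetic the first commit message uses (400 s of simulated time divided by the interval). A minimal sketch, assuming the simulated scheduler fires the interval callback once per elapsed interval; `TimerFirings` is a hypothetical name.

```scala
import scala.concurrent.duration._

// Assumption: the simulated scheduler fires the interval callback once per elapsed
// interval while time is advanced (e.g. via timePasses(400.seconds) in the spec).
object TimerFirings {
  def firings(advance: FiniteDuration, interval: FiniteDuration): Long = {
    require(interval > Duration.Zero, "interval must be positive")
    advance.toNanos / interval.toNanos
  }
}
```

By this count, 500 ms and 250 ms would give 800 and 1,600 firings respectively: more than the ~400 at 1 second, but still far below the ~400 000 at 1 millisecond.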

Motivation: The earlier nightly fixes solved the immediate JDK 25 failures, but two tradeoffs needed refinement: the mapAsyncPartitioned null test had lost its randomness, and the HubSpec long-stream fixes needed to preserve as much coverage as possible while remaining stable under JDK 25 scheduling changes.

Modification:
- Restore randomness in FlowMapAsyncPartitionedSpec while shifting the generated null candidates from 0..9 to 1..10, so the null path is always exercised.
- Keep HubSpec patience scaled by the test timefactor, with a higher 120 s base.
- Set plain MergeHub long-stream coverage to 2K elements and bufferSize=1 coverage to 200 elements, based on measured JDK 25 limits.
- Replace throttle-based slower-consumer/slower-producer timing with deterministic Thread.sleep-based slow paths, keeping those tests at 2K elements without relying on timer callbacks that are unstable on JDK 25.

Result: HubSpec passes end-to-end with pekko.test.timefactor=3, and the null-completed futures test keeps its random coverage without silently skipping the null branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
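A minimal sketch of the shifted generator, assuming the stream elements are 1 to 10 as the commit messages state and that the spec draws a small number of candidates (three here, a hypothetical choice). `NullCandidates` and its members are hypothetical names; the `{0}` case from the first commit is exactly what `exercisesNullPath` rejects.

```scala
import scala.util.Random

// Hypothetical reconstruction; the spec's real generator may draw a different number of values.
object NullCandidates {
  // Elements emitted in the test, per the PR description.
  val elements: Set[Int] = (1 to 10).toSet

  // Random.nextInt(10) yields 0..9; adding 1 maps every draw into 1..10.
  def shouldBeNull(rnd: Random, draws: Int): Set[Int] = {
    require(draws >= 1, "need at least one draw")
    List.fill(draws)(rnd.nextInt(10) + 1).toSet
  }

  // True when at least one element will actually complete with null.
  def exercisesNullPath(candidates: Set[Int]): Boolean =
    candidates.exists(elements.contains)
}
```

Because every shifted draw is a real element, any non-empty candidate set is guaranteed to hit the null branch, while the choice of which elements complete with null stays random.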

Motivation: The JDK 25 ForkJoinPool scheduling regression (JDK-8300995) slows task scheduling under load, and timefactor=3 was insufficient for some long-running stream tests.

Modification: Raise the timefactor to 4 for JDK ≥ 25 in the nightly-builds workflow, and update the comment to also reference #2870.

Result: A wider timeout budget on JDK 25 reduces spurious test failures caused by scheduling jitter rather than correctness issues.

References: #2870, #2573

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

He-Pin and others added 2 commits April 18, 2026 23:30

pjfanning approved these changes Apr 18, 2026


pjfanning reviewed Apr 18, 2026


He-Pin mentioned this pull request Apr 18, 2026

ForkJoinPool compensation-thread regression (JDK-8300995) causes test flakiness on JDK 21+ #2870

Open

He-Pin force-pushed the fix-nightly-failures branch from 9669db0 to 2e3279a April 18, 2026 18:31

He-Pin mentioned this pull request Apr 18, 2026

feat: make ForkJoinPool minimum-runnable configurable and improve pool documentation #2871

Merged

He-Pin changed the title ~~fix: scale stream test timeouts by timefactor to fix nightly CI on JDK 25~~ fix: stabilise stream tests on JDK 25 nightly (timeout scaling, element counts) Apr 18, 2026

He-Pin mentioned this pull request Apr 18, 2026

feat: enable virtualize=on in stream test dispatcher to bypass JDK 21+ FJP scheduling regression #2872

Closed

He-Pin merged commit b23b4c7 into main Apr 18, 2026
9 checks passed

He-Pin deleted the fix-nightly-failures branch April 18, 2026 19:02

He-Pin added the flaky Related to flaky tests label Apr 18, 2026

He-Pin added this to the 2.0.0-M2 milestone Apr 18, 2026

He-Pin merged 4 commits into main from fix-nightly-failures
