
Scale MapAsyncPartitionedSpec patience with test timefactor #2884

Merged
He-Pin merged 2 commits into apache:main from He-Pin:hepin/mapasyncpartitioned-jdk25-timefactor
Apr 22, 2026

He-Pin commented Apr 22, 2026

Summary

  • scale MapAsyncPartitionedSpec ScalaFutures patience using the configured classic test timefactor
  • keep the typed-stream property suite aligned with nightly JDK 25 timeout dilation
  • remove the fixed 60 second patience assumption that can time out on slower CI runners

Testing

  • `sbt scalafmtAll`
  • `sbt -Dpekko.test.timefactor=4 -Dpekko.actor.testkit.typed.timefactor=4 'stream-typed-tests/testOnly org.apache.pekko.stream.MapAsyncPartitionedSpec'`

JDK 25 nightly runs can take longer to complete the property-based ordered mapAsyncPartitioned checks. Use the configured test timefactor when deriving ScalaFutures patience so the suite tracks CI dilation instead of timing out at a fixed 60 seconds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
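A minimal sketch of timefactor-based patience, assuming the factor is read from the `pekko.test.timefactor` JVM system property (as passed with `-D` in the Testing command) and defaults to 1.0 when unset. `PatienceSketch`, `timeFactor` and `dilate` are hypothetical names; the actual spec presumably derives the factor from the Pekko testkit settings and feeds the result into ScalaTest's `PatienceConfig` rather than using a helper like this.

```scala
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical helper illustrating timefactor-scaled patience; not Pekko's API.
object PatienceSketch {
  // Assumption: the factor arrives as a JVM system property. Falls back to 1.0
  // when the property is missing, unparsable, or not positive.
  def timeFactor: Double =
    sys.props
      .get("pekko.test.timefactor")
      .flatMap(s => Try(s.toDouble).toOption)
      .filter(_ > 0)
      .getOrElse(1.0)

  // Scale a base timeout by the factor, rounded to the nearest millisecond.
  def dilate(base: FiniteDuration, factor: Double = timeFactor): FiniteDuration =
    math.round(base.toMillis * factor).millis

  def main(args: Array[String]): Unit = {
    val fixed = 60.seconds
    println(s"timefactor=$timeFactor fixed=$fixed dilated=${dilate(fixed)}")
  }
}
```

With `-Dpekko.test.timefactor=4`, the former fixed 60 seconds becomes 240 seconds, so the patience grows with the same dilation the CI runner is configured for.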
@He-Pin He-Pin requested a review from pjfanning April 22, 2026 07:14
@He-Pin He-Pin added this to the 2.0.0-M2 milestone Apr 22, 2026

pjfanning left a comment

lgtm

Motivation:
JDK 25 nightly reproductions showed the RequestNext demand test could observe a legitimate redelivery before the buffered message was re-sent, making the assertion order-sensitive.

Modification:
Update ReliableDeliveryShardingSpec to fish for the expected buffered message and tolerate duplicate earlier deliveries during the renewed-demand race window.

Result:
The sharding delivery spec remains strict about eventually observing the buffered message while no longer flaking on JDK 25 scheduling differences.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
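The fishing strategy can be sketched as a pure scan over the messages a probe receives, assuming each delivery carries a sequence number. `FishSketch`, `Delivery` and `fishFor` are hypothetical stand-ins; the real spec presumably uses the typed testkit's `TestProbe.fishForMessage` with `FishingOutcomes`, which this sketch does not reproduce. The point it illustrates is that earlier sequence numbers are skipped as redeliveries while anything newer than the expected message still fails the check.

```scala
// Hypothetical model of the fish-for-buffered-message check; not the spec's code.
object FishSketch {
  // Hypothetical message shape; the real spec's messages differ.
  final case class Delivery(seqNr: Long, payload: String)

  sealed trait Outcome
  final case class Found(msg: Delivery, skipped: Int) extends Outcome
  final case class Unexpected(msg: Delivery) extends Outcome
  case object NotFound extends Outcome

  /** Scan `received` in order until `expected` appears.
    * Messages with an earlier seqNr are treated as legitimate redeliveries
    * and skipped; any other message fails fast, keeping the check strict. */
  def fishFor(expected: Delivery, received: Seq[Delivery]): Outcome = {
    @annotation.tailrec
    def loop(rest: List[Delivery], skipped: Int): Outcome = rest match {
      case Nil                                    => NotFound
      case m :: _ if m == expected                => Found(m, skipped)
      case m :: tail if m.seqNr < expected.seqNr  => loop(tail, skipped + 1)
      case m :: _                                 => Unexpected(m)
    }
    loop(received.toList, 0)
  }
}
```

For example, receiving seqNr 2 twice before the buffered seqNr 3 yields `Found` with two skipped redeliveries, whereas an unexpected seqNr 4 is reported immediately.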
He-Pin merged commit a36d126 into apache:main Apr 22, 2026
9 checks passed
He-Pin deleted the hepin/mapasyncpartitioned-jdk25-timefactor branch Apr 22, 2026
He-Pin added a commit to He-Pin/incubator-pekko that referenced this pull request Apr 23, 2026
Motivation:
The "resume after multiple failures if resume supervision is in place"
case times out after the default 6-second ScalaFutures patience on the
JDK 25 nightly matrix. The sibling suite
stream-typed-tests MapAsyncPartitionedSpec received the same dilation
treatment in apache#2884; this spec was missed.

Modification:
Override the spec's patienceConfig to use a dilated 30-second timeout
so the whole suite tracks pekko.test.timefactor instead of a fixed 6s.

Result:
Supervised-resume and bulk-throughput cases no longer flake on JDK 25
CI when the environment is under contention.
He-Pin added a commit that referenced this pull request Apr 23, 2026
* fix: stabilise JDK 21+ / JDK 25 nightly test runs

Motivation:
The JDK 21+ ForkJoinPool compensation-thread regression (JDK-8300995 /
JDK-8321335) starves Pekko's actor and remote dispatchers during heavy
ask/await workloads in tests, producing intermittent timeouts on the
nightly matrix (see #2573, #2870).  Several individual tests also rely
on hardcoded timeouts that never scale with `pekko.test.timefactor`,
so they flake even when the dispatcher itself is healthy.

Modification:
- nightly-builds.yml: when the JDK is 21 or newer, raise
  `fork-join-executor.minimum-runnable` to 4 for both
  `pekko.actor.default-dispatcher` and
  `pekko.remote.default-remote-dispatcher`.  This pushes the pool to
  spawn compensation threads earlier and avoids long stalls under the
  new compensation policy without changing scheduling semantics
  (FIFO unchanged, no fairness regressions).
- TlsSpec: dilate the previously hardcoded 15s/17s timeouts in the
  ServerInitiatesViaTcp / CancellingRHSIgnoresBoth scenario so the
  timefactor actually applies on slower CI workers.
- EventSourcedStashOverflowSpec: pass an explicit 30s budget to
  `receiveMessages(stashCapacity)` (the default 12s is not enough to
  drain 20k messages on slow JDK 25 runners).
- SteppingInmemJournal.step: bump the per-step ask timeout from 3s
  (dilated) to 10s (dilated) so PersistentActorRecoveryTimeoutSpec and
  similar tests retain headroom on JDK 17 with timefactor=2.

Result:
JDK 21+ nightly runs stop hitting compensation-thread starvation in the
artery / remoting suites, and the four targeted timing fixes remove
specific flakes observed in the latest nightly run on JDK 17 / 25.
Production defaults are unchanged: the FJP override is applied only
in the nightly workflow.
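For context on the `minimum-runnable` change: JDK 9+ exposes a `minimumRunnable` argument on the extended `java.util.concurrent.ForkJoinPool` constructor, and the pool tries to keep at least that many core threads that are not blocked by a join or `ManagedBlocker`, creating compensation threads when it falls below. The sketch builds such a pool directly from Scala. That Pekko's `fork-join-executor.minimum-runnable` setting is forwarded to this argument is an assumption, as are the parallelism and pool-size values; `asyncMode = true` selects FIFO ordering for forked tasks that are never joined, which is assumed here to correspond to the "FIFO unchanged" remark.

```scala
import java.util.concurrent.{Callable, ForkJoinPool, TimeUnit}

// Illustrative only: values are assumptions, not Pekko's dispatcher defaults.
object MinimumRunnableSketch {
  // Builds a ForkJoinPool through the JDK 9+ extended constructor.
  def newPool(parallelism: Int, minimumRunnable: Int): ForkJoinPool =
    new ForkJoinPool(
      parallelism,
      ForkJoinPool.defaultForkJoinWorkerThreadFactory,
      null,              // no custom UncaughtExceptionHandler
      true,              // asyncMode: FIFO order for forked, never-joined tasks
      parallelism,       // corePoolSize
      parallelism + 256, // maximumPoolSize: upper bound including compensation threads
      minimumRunnable,   // keep at least this many core threads unblocked
      null,              // saturate: null means RejectedExecutionException at the cap
      60L,
      TimeUnit.SECONDS   // keepAliveTime before idle extra threads are retired
    )

  def main(args: Array[String]): Unit = {
    val pool = newPool(parallelism = 8, minimumRunnable = 4)
    try {
      val answer = pool.submit(new Callable[Int] { def call(): Int = 21 * 2 }).get()
      println(s"parallelism=${pool.getParallelism} asyncMode=${pool.getAsyncMode} answer=$answer")
    } finally pool.shutdown()
  }
}
```

Raising `minimumRunnable` only changes when compensation threads are spawned; task ordering is still governed by `asyncMode`, which is why the override is expected not to affect scheduling semantics.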

* test: scale FlowMapAsyncPartitionedSpec patience with test timefactor

Motivation:
The "resume after multiple failures if resume supervision is in place"
case times out after the default 6-second ScalaFutures patience on the
JDK 25 nightly matrix. The sibling suite
stream-typed-tests MapAsyncPartitionedSpec received the same dilation
treatment in #2884; this spec was missed.

Modification:
Override the spec's patienceConfig to use a dilated 30-second timeout
so the whole suite tracks pekko.test.timefactor instead of a fixed 6s.

Result:
Supervised-resume and bulk-throughput cases no longer flake on JDK 25
CI when the environment is under contention.

2 participants