ForkJoinPool compensation-thread regression (JDK-8300995) causes test flakiness on JDK 21+ #2870

@He-Pin

Summary

The Pekko nightly CI has observed intermittent test failures on JDK 21 and JDK 25 that trace to a ForkJoinPool behavioral regression in JDK 21+ (JDK-8300995, JDK-8321335).

This issue documents the root cause and proposed mitigations for Pekko's stream tests. Related: #2573.


Root Cause

Pekko's default fork-join-executor uses asyncMode=true (FIFO, via task-peeking-mode = "FIFO"). In JDK 21+, this mode interacts poorly with actor round-trip workloads:

  1. When actor A sends a message to actor B and waits for a reply, the reply task is placed at the back of the FIFO work queue.
  2. With a fixed-size pool (e.g. 8 threads) under load, the reply waits behind all pending tasks before being executed.
  3. The compensation-thread mechanism, which should unblock this, regressed in JDK 21 for asyncMode=true (FIFO) pools: compensation threads are not reliably created when needed.
  4. This cascades for workloads with many actor hand-offs (e.g. MergeHub buffer=1: one full round-trip per element).
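
Steps 1 and 2 can be illustrated with a toy model. The sketch below is not ForkJoinPool itself: it assumes a single worker with one deque and no work stealing, and the names QueueOrderSketch and tasksBeforeReply are hypothetical. It only counts how many already-queued tasks run before a newly pushed reply task, depending on which end of the deque the worker takes from.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: one worker, one deque, no stealing, no compensation threads.
class QueueOrderSketch {

    /** Counts the tasks that run before the reply when `pending` tasks are already queued. */
    static int tasksBeforeReply(int pending, boolean fifo) {
        Deque<String> deque = new ArrayDeque<>();
        for (int i = 0; i < pending; i++) {
            deque.addLast("task-" + i);
        }
        // The reply is pushed at the same end in both modes;
        // the mode only decides which end the worker takes from.
        deque.addLast("reply");

        int ran = 0;
        while (!deque.isEmpty()) {
            String next = fifo ? deque.pollFirst() : deque.pollLast();
            if (next.equals("reply")) {
                return ran;
            }
            ran++;
        }
        throw new IllegalStateException("reply was never queued");
    }

    public static void main(String[] args) {
        System.out.println("FIFO, tasks run before reply: " + tasksBeforeReply(100, true));
        System.out.println("LIFO, tasks run before reply: " + tasksBeforeReply(100, false));
    }
}
```

In this model the FIFO worker drains every pending task before it reaches the reply, while the LIFO worker runs the reply first. A real pool adds stealing and multiple workers, so treat the numbers as an intuition aid rather than a measurement.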

JDK 25 exacerbates the issue because work-stealing heuristics under high CI load diverge further from expected throughput.


Evidence


Mitigations Applied (PR #2869)

  1. Stream test dispatcher: Changed task-peeking-mode from FIFO to LIFO in stream-testkit/src/test/resources/reference.conf. With LIFO, response tasks go to the front of each worker's deque and are processed immediately. Actor mailbox ordering is unaffected (UnboundedMailbox is always FIFO).

  2. minimum-runnable now configurable: PekkoForkJoinPool hardcoded minimumRunnable=1 in the JDK 9+ ForkJoinPool constructor. This is now exposed as minimum-runnable in the fork-join-executor config block (default: 1, preserving current behaviour). Users experiencing actor starvation on JDK 21+ can increase this to e.g. parallelism/2.

  3. HubSpec / AggregateWithBoundarySpec / TCK timeouts / FlowMapAsyncPartitionedSpec: test-level fixes are documented in PR #2869 ("fix: stabilise stream tests on JDK 25 nightly (timeout scaling, element counts)").
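
For reference, the two knobs in mitigations 1 and 2 correspond to the asyncMode and minimumRunnable arguments of the JDK 9+ ForkJoinPool constructor. The following is a minimal sketch using the plain JDK API, not Pekko's PekkoForkJoinPool. The class name PoolSketch, the build helper, and the corePoolSize, maximumPoolSize and keep-alive values are assumptions chosen for illustration and are not taken from Pekko.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

// Illustrative only: shows where asyncMode and minimumRunnable enter the JDK API.
class PoolSketch {

    static ForkJoinPool build(int parallelism, boolean asyncMode, int minimumRunnable) {
        return new ForkJoinPool(
            parallelism,
            ForkJoinPool.defaultForkJoinWorkerThreadFactory,
            null,              // no custom UncaughtExceptionHandler
            asyncMode,         // true = local FIFO for forked tasks, false = LIFO
            parallelism,       // corePoolSize (assumed equal to parallelism)
            parallelism + 32,  // maximumPoolSize (assumed headroom for spare threads)
            minimumRunnable,   // minimum number of unblocked threads the pool tries to maintain
            null,              // default saturation behaviour
            60L, TimeUnit.SECONDS); // keep-alive for idle threads (assumed)
    }

    public static void main(String[] args) {
        int parallelism = 8;
        // LIFO mode with minimumRunnable = parallelism / 2, as suggested in the text.
        ForkJoinPool pool = build(parallelism, false, parallelism / 2);
        System.out.println("asyncMode = " + pool.getAsyncMode());
        System.out.println("parallelism = " + pool.getParallelism());
        pool.shutdown();
    }
}
```

Whether a larger minimumRunnable actually restores compensation-thread creation on JDK 21+ is something to verify against the affected tests; the sketch only shows how the value is passed through.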


Future Work

  • Investigate whether virtualize = on (JDK 21+, Pekko v1.2.0) should be the recommended mode for actor dispatchers on JDK 21+. Virtual threads are expected to sidestep the FJP compensation-thread regression because a virtual thread that blocks is normally unmounted from its carrier thread, so progress does not depend on compensation threads being created.
  • Consider increasing the default minimum-runnable on JDK 21+ from 1 to something like parallelism/2 to proactively create compensation threads before complete starvation.
