Summary
10 distinct tests failed against committed code (Timer and Post Merge Action builds) in the 24-hour window ending 2026-05-18T10:00 UTC. Of the 9 that could be run locally, none reproduced with its original seed on a local dev machine, which is consistent with timing-dependent flakes rather than deterministic failures.
Reproduction Attempts
Each test was run locally on the current main branch with the exact seed from the failing CI build. 9 of the 10 were attempted and none reproduced. The remaining one (MixedClusterClientYamlTestSuiteIT) could not be attempted locally because it requires a multi-version cluster.
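Why a non-reproducing seed points toward timing can be illustrated with a small sketch. It is written in Python rather than the project's Java test framework, and `randomized_inputs` is a hypothetical helper: a fixed seed pins the randomized inputs, but it does not control thread scheduling, so a failure that depends on interleaving can vanish even when the inputs are identical.

```python
import random


def randomized_inputs(seed_hex, count=5):
    """Derive a reproducible sequence of pseudo-random values from a hex seed.

    Hypothetical stand-in for seed-driven test data; not the project's API.
    """
    rng = random.Random(int(seed_hex, 16))
    return [rng.randint(0, 1000) for _ in range(count)]


# The same seed always yields the same inputs, on CI or on a dev machine.
ci_run = randomized_inputs("A76DB14919A7F493")
local_run = randomized_inputs("A76DB14919A7F493")
print(ci_run == local_run)
```

Identical inputs with a different outcome leave scheduling, CPU speed, and resource contention as the remaining variables.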
Failing Tests
Summary Table (sorted by total unique builds affected)
| # | Test | Builds Affected | First Seen | Pattern | Recent Build |
|---|------|-----------------|------------|---------|--------------|
| 1 | FullRollingRestartIT.testFullRollingRestart | 252 | 2024-10-11 | Worsening (spike Jul 2025, resurgence Feb 2026+) | 77274 |
| 2 | SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase | 199 | 2024-04-04 | Worsening (chronic, major spike Nov 2025, Apr-May 2026) | 77265 |
| 3 | MixedClusterClientYamlTestSuiteIT.test {cluster.health/10_basic/...} | 182 | 2024-03-25 | Stable/chronic (massive spike Sep 2024, steady low rate since, uptick Apr 2026) | 77225 |
| 4 | ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability | 116 | 2024-10-03 | Worsening (major spike Apr 2026: 35 builds, likely CPU-speed amplification from m7a.8xlarge migration) | 77306 |
| 5 | FlightMetricsTests.testComprehensiveMetrics | 74 | 2025-07-25 | Stable/chronic (~5-11 builds/month consistently since introduction) | 77275 |
| 6 | ClusterShardLimitIT.testOpenIndexOverLimit | 46 | 2025-10-15 | Stable (~5-9 builds/month since first appearance) | 77278 |
| 7 | WarmIndexSegmentReplicationIT.testShardPathDeletionWhenWarmIndexRelocate | 18 | 2025-06-23 | Worsening (spike in Apr 2026: 7 builds, up from 0-2/month) | 77235 |
| 8 | WarmIndexSegmentReplicationIT.testIndexReopenClose | 15 | 2025-03-11 | Intermittent (clusters in Mar 2025, Aug 2025, Feb 2026) | 77216 |
| 9 | FlightClientChannelTests.testSetMessageListenerTwice | 12 | 2025-08-21 | Stable/low (~1-3 builds/month) | 77276 |
| 10 | ShardIndexingPressureSettingsIT.testShardIndexingPressureNodeLimitUpdateSetting | 5 | 2024-06-03 | Rare (long dormant period, reappeared Feb 2026) | 77232 |
Detailed Findings
1. FullRollingRestartIT.testFullRollingRestart
- Build: 77274 (Timer, main)
- Error: java.lang.AssertionError: replica shards haven't caught up with primary expected:<18> but was:<13>
- Seed: A76DB14919A7F493
- Reproduced locally: No
- First seen: 2024-10-11
- Total builds affected: 252
- Pattern: Worsening. Was dormant Nov 2024–Jun 2025, then exploded in Jul 2025 (105 builds). Quieted Sep 2025–Jan 2026, then resurgent Feb 2026 onward (19 builds in May 2026 alone). The segment replication variant is particularly sensitive to timing.
2. SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase
- Build: 77265 (Timer, main)
- Error: java.lang.AssertionError at SearchRestCancellationIT.lambda$ensureSearchTaskIsCancelled$0
- Seed: 398A6DBD80D20844
- Reproduced locally: No
- First seen: 2024-04-04
- Total builds affected: 199
- Pattern: Chronic and worsening. Has been failing since Apr 2024. Major spike in Nov 2025 (41 builds). Currently at 27 builds in May 2026 (18 days in). The cancellation race is highly sensitive to thread scheduling.
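The "worsening" reading for May can be checked with a simple run-rate extrapolation. This is a minimal sketch, assuming failures arrive at a constant daily rate for the rest of the month (an assumption the data does not confirm); `projected_monthly_builds` is a hypothetical helper.

```python
def projected_monthly_builds(builds_so_far, days_elapsed, days_in_month):
    """Linearly extrapolate a partial-month count to the full month."""
    return builds_so_far / days_elapsed * days_in_month


# Figures from the report: 27 affected builds after 18 days of May (31 days).
may_projection = projected_monthly_builds(27, 18, 31)
nov_2025_spike = 41

print(may_projection)                   # 46.5
print(may_projection > nov_2025_spike)  # True
```

Under that assumption, May 2026 would end near 46 affected builds, above the Nov 2025 spike of 41.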
3. MixedClusterClientYamlTestSuiteIT.test {cluster.health/10_basic/cluster health with closed index}
- Build: 77225 (Timer, main)
- Error: expected [2xx] status code but api [cluster.health] returned [408 Request Timeout] (cluster was red with 51 unassigned shards)
- Seed: D13505BF7CE3BF5B
- Reproduced locally: Not attempted (requires multi-version cluster with v3.6.1 nodes)
- First seen: 2024-03-25
- Total builds affected: 182
- Pattern: Chronic. Massive spike in Sep 2024 (96 builds), then settled to low single digits per month. Uptick in Apr 2026 (11 builds). The failure is a cluster health timeout during mixed-version rolling upgrade.
4. ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability
- Build: 77306 (Timer, 2.19 branch)
- Error: java.lang.AssertionError: Must be linearizable
- Seed: 5B36C3D32F3299E4
- Reproduced locally: No
- First seen: 2024-10-03
- Total builds affected: 116
- Pattern: Worsening. Steady at 1-6 builds/month through early 2026, then jumped to 35 builds in Apr 2026. This timing aligns with the CI runner migration to m7a.8xlarge (~Apr 15, 2026), strongly suggesting CPU-speed amplification of a latent race condition.
5. FlightMetricsTests.testComprehensiveMetrics
- Build: 77275 (Post Merge Action)
- Error: BindTransportException: Failed to bind to [/0:0:0:0:0:0:0:1%lo, /127.0.0.1]:PortsRange{portRange='17201'}
- Seed: EC0F910FC35C666A
- Reproduced locally: No
- First seen: 2025-07-25
- Total builds affected: 74
- Pattern: Stable/chronic. Consistently 4-11 builds/month since introduction. The port binding failure suggests resource contention on CI runners (port not released from a prior test in time).
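The error above names a single port (17201) rather than a range, so one lingering holder is enough to make the bind fail. The sketch below reproduces that failure mode with plain sockets. It is an illustration in Python, not the Flight transport code, and `second_bind_fails` is a hypothetical helper.

```python
import socket


def second_bind_fails(host="127.0.0.1"):
    """Return True if binding a second socket to an already-bound port raises OSError."""
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    contender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        holder.bind((host, 0))  # port 0: let the OS pick a free port
        holder.listen(1)
        port = holder.getsockname()[1]
        try:
            # Stands in for a test that needs the same fixed port
            # while a prior holder has not yet released it.
            contender.bind((host, port))
            return False
        except OSError:
            return True
    finally:
        contender.close()
        holder.close()


print(second_bind_fails())
```

If the tests can tolerate it, a wider port range or an OS-assigned port would remove the dependency on a prior holder releasing the port in time. Whether the Flight tests can be configured that way is not established here.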
6. ClusterShardLimitIT.testOpenIndexOverLimit
- Build: 77278 (Timer, main)
- Error: IllegalStateException: Some shards are still open after the threadpool terminated. Something is leaking index readers or store references.
- Seed: 972067C756961F1C
- Reproduced locally: No
- First seen: 2025-10-15
- Total builds affected: 46
- Pattern: Stable. Consistent 5-9 builds/month since first appearance. The shard leak during teardown suggests an async close path that doesn't complete within the shutdown timeout.
7. WarmIndexSegmentReplicationIT.testShardPathDeletionWhenWarmIndexRelocate
- Build: 77235 (Timer, main)
- Error: IOException: failed to read (path in temp directory)
- Seed: 41BCBBC845070506
- Reproduced locally: No
- First seen: 2025-06-23
- Total builds affected: 18
- Pattern: Worsening. Was 0-2 builds/month, then jumped to 7 in Apr 2026. Likely CPU-speed amplification causing a race between shard relocation and path deletion.
8. WarmIndexSegmentReplicationIT.testIndexReopenClose
- Build: 77216 (Post Merge Action)
- Error: AssertionError: Expected: a value equal to or greater than <4L> but: <0L> was less than <4L>
- Seed: 27483C47C34980DA
- Reproduced locally: No
- First seen: 2025-03-11
- Total builds affected: 15
- Pattern: Intermittent. Appears in clusters (Mar 2025, Aug 2025, Feb 2026, May 2026) then goes quiet. Likely a race in segment replication state after index close/reopen.
9. FlightClientChannelTests.testSetMessageListenerTwice
- Build: 77276 (Timer, main)
- Error: BindTransportException: Failed to bind to [/0:0:0:0:0:0:0:1%lo, /127.0.0.1]:PortsRange{portRange='27501'}
- Seed: 49CAD98AABBE8C7A
- Reproduced locally: No
- First seen: 2025-08-21
- Total builds affected: 12
- Pattern: Stable/low. 1-3 builds/month. Same port binding issue as FlightMetricsTests — likely the same root cause (port contention on CI).
10. ShardIndexingPressureSettingsIT.testShardIndexingPressureNodeLimitUpdateSetting
- Build: 77232 (Timer, main)
- Error: AssertionError: expected:<23576> but was:<47152> (value is exactly 2x expected)
- Seed: 8DFF60F5474F9992
- Reproduced locally: No
- First seen: 2024-06-03
- Total builds affected: 5
- Pattern: Rare. Only 5 builds total over 2 years. Was dormant Jun 2024–Jan 2026, reappeared Feb 2026. The 2x value suggests a double-application of a setting update, likely a race in cluster state application.
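The double-application hypothesis can be made concrete with a small sketch. Everything here is hypothetical (`HypotheticalLimitTracker` and its handlers are not the project's settings code); only the two numbers come from the CI error. It shows how a non-idempotent handler that receives the same update twice lands on exactly 2x the expected value, while an idempotent one does not.

```python
EXPECTED = 23576  # value the assertion expected (from the CI error)
OBSERVED = 47152  # value the assertion saw (from the CI error)


class HypotheticalLimitTracker:
    """Illustrative stand-in for a component that reacts to a setting update."""

    def __init__(self):
        self.limit = 0

    def on_update_additive(self, value):
        # Non-idempotent: every delivery of the same update adds to the limit.
        self.limit += value

    def on_update_idempotent(self, value):
        # Idempotent: repeated delivery of the same update has no extra effect.
        self.limit = value


def limit_after_duplicate_delivery(handler_name):
    """Deliver the same update twice through the named handler."""
    tracker = HypotheticalLimitTracker()
    handler = getattr(tracker, handler_name)
    handler(EXPECTED)
    handler(EXPECTED)
    return tracker.limit


print(limit_after_duplicate_delivery("on_update_additive") == OBSERVED)
print(limit_after_duplicate_delivery("on_update_idempotent") == EXPECTED)
```

The exact 2x ratio fits a duplicated additive step, but it does not by itself show where in cluster state application such a duplication would occur.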
Environmental Notes