Summary
10 test failures were observed against committed code (Timer and Post Merge Action builds on main) in the 24 hours ending 2026-05-16T10:00Z. All failures come from known-flaky tests with long histories in the metrics cluster. None of the locally runnable failures reproduced with the original CI seed, which suggests they are timing- or concurrency-dependent.
Failures Observed
| # | Test | Build | Error | Seed Reproduces Locally? |
| --- | --- | --- | --- | --- |
| 1 | SegmentReplicationStatsIT.testSegmentReplicationNodeAndIndexStats | 77113 | AssertionError (assertTrue at line 415) | No |
| 2 | EhCacheDiskCacheTests.testComputeIfAbsentConcurrently | 77074 | expected:<1> but was:<2> | No |
| 3 | SearchRestCancellationIT.testAutomaticCancellationMultiSearchDuringQueryPhase | 77071 | AssertionError (assertTrue) | No |
| 4 | RemotePrimaryLocalRecoveryIT.testLocalRecoveryRollingRestartAndNodeFailure | 77069 | AssertionError: unexpected during rollingRestart | No |
| 5 | RecoveryWhileUnderLoadIT.testRecoverWithRelocationAndDerivedSource | 77057 | shard has pending operations | No |
| 6 | FullRollingRestartIT.testFullRollingRestart_withNoRecoveryPayloadAndSource | 77056 | replica shards haven't caught up: expected:<20> but was:<17> | No |
| 7 | RemoteStoreKafkaIT.testPeriodicFlush | 77039 | ConditionTimeoutException: not fulfilled within 1 minute | No |
| 8 | IndexingIT.testIndexingWithSegRep | 77039 | expected:<0> but was:<1> | N/A (BWC test) |
| 9 | RemoteStoreReplicationSourceTests.testGetMergedSegmentFilesDownloadTimeout | 77033 | Suite timeout reached | No |
| 10 | RemoteStoreReplicationSourceTests.classMethod | 77033 | Suite-level timeout (same as #9) | N/A |
Historical Flake Rates (sorted by total builds affected)
Data from gradle-check-* indices across all build types (including PR builds). Unique build counts represent distinct CI runs where the test failed.
| Test | Total Builds | First Seen | Monthly Trend (recent 6 months) | Pattern |
| --- | --- | --- | --- | --- |
| IndexingIT | 711 | 2024-03-25 | 5, 12, 19, 19, 5, 12 | Stable high |
| SearchRestCancellationIT | 440 | 2024-03-26 | 14, 5, 7, 6, 23, 38 | Worsening |
| FullRollingRestartIT | 277 | 2024-10-11 | 1, 0, 35, 25, 24, 16 | Improving |
| RecoveryWhileUnderLoadIT | 272 | 2024-04-03 | 1, 1, 13, 29, 28, 27 | Stable high (since Feb 2026) |
| RemotePrimaryLocalRecoveryIT | 163 | 2024-03-26 | 1, 0, 2, 6, 7, 3 | Stable low |
| RemoteStoreReplicationSourceTests | 161 | 2024-04-17 | 3, 3, 1, 2, 3, 4 | Stable low |
| RemoteStoreKafkaIT | 119 | 2025-03-18 | 5, 1, 8, 3, 7, 5 | Stable |
| EhCacheDiskCacheTests | 79 | 2024-03-28 | 1, 3, 0, 2, 14, 14 | Worsening |
| SegmentReplicationStatsIT | 7 | 2024-12-06 | 0, 0, 0, 0, 0, 1 | Rare |
Key Observations
- SearchRestCancellationIT is the most concerning: the number of affected builds jumped from ~6/month to 38 in May 2026 (a partial month). The April 2026 CI runner migration from m5.8xlarge to m7a.8xlarge correlates with the uptick, which suggests that faster CPUs are amplifying a latent race.
- EhCacheDiskCacheTests shows a similar worsening pattern (0 to 3 per month, then 14 in each of the last two months), also correlating with the runner migration. The concurrent test (testComputeIfAbsentConcurrently) is likely sensitive to thread-scheduling differences on faster hardware.
- RecoveryWhileUnderLoadIT has stabilized at a high rate (~28/month) since February 2026, which predates the runner change. This may be a separate issue.
- FullRollingRestartIT is improving (35→16 over 4 months), suggesting that prior fixes are taking effect.
- No seeds reproduced locally: all 9 locally-runnable tests passed with their CI seeds. This is consistent with the failures depending on factors outside seed control (thread scheduling, GC timing, network simulation timing).
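The timing argument in these observations can be checked against the table with a few lines of arithmetic. The sketch below is illustrative only: it copies the monthly counts from the Historical Flake Rates table and reports where the largest month-over-month increase falls. The helper name `largest_jump` is hypothetical, and the month labels are an assumption (the table does not label its columns; the last column is taken to be May 2026 because the text cites 38 for that month).

```python
# Monthly unique-build failure counts copied from the Historical Flake Rates
# table, oldest month first. The final month is partial.
MONTHLY = {
    "SearchRestCancellationIT": [14, 5, 7, 6, 23, 38],
    "EhCacheDiskCacheTests": [1, 3, 0, 2, 14, 14],
    "RecoveryWhileUnderLoadIT": [1, 1, 13, 29, 28, 27],
    "FullRollingRestartIT": [1, 0, 35, 25, 24, 16],
}

# ASSUMPTION: the six columns are consecutive calendar months ending May 2026.
MONTHS = ["2025-12", "2026-01", "2026-02", "2026-03", "2026-04", "2026-05"]


def largest_jump(counts):
    """Return (index, increase) for the biggest month-over-month increase.

    The index refers to the month in which the higher count was observed.
    """
    diffs = [later - earlier for earlier, later in zip(counts, counts[1:])]
    biggest = max(diffs)
    return diffs.index(biggest) + 1, biggest


if __name__ == "__main__":
    for name, counts in MONTHLY.items():
        index, increase = largest_jump(counts)
        print(f"{name}: largest jump of +{increase} builds in {MONTHS[index]}")
```

Under that column assumption, the largest jumps for SearchRestCancellationIT and EhCacheDiskCacheTests fall in the April 2026 column, while RecoveryWhileUnderLoadIT and FullRollingRestartIT jump earlier. A single largest-jump figure is a crude signal and does not establish causation.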
Methodology
- Committed-code failures identified via metrics cluster query filtering on `invoke_type: Timer` with `git_reference: main`, and `invoke_type: Post Merge Action`
- Seeds extracted from Jenkins test report API (`errorStackTrace` SeedInfo or JVM args in stdout)
- Local reproduction attempted with `./gradlew <module>:<task> --tests "<class>.<method>" -Dtests.seed=<SEED>`
- Historical data aggregated across all `gradle-check-*` indices using monthly date histograms with cardinality aggregation on `build_number`
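For readers who want to rerun the historical aggregation, the last step above might look roughly like the following search body. This is a minimal sketch, not the query actually used: only `build_number` and the `gradle-check-*` index pattern come from this report, while the field names `test_class`, `test_status`, and `build_start_time`, the `"FAILED"` value, and the function name are assumptions that should be checked against the real index mapping.

```python
import json


def monthly_flake_query(test_class: str, months_back: int = 6) -> dict:
    """Build a search body counting distinct failing builds per calendar month."""
    return {
        "size": 0,  # only aggregations are needed, not individual hits
        "query": {
            "bool": {
                "filter": [
                    # ASSUMED field names and value; verify against the mapping.
                    {"term": {"test_class": test_class}},
                    {"term": {"test_status": "FAILED"}},
                    {"range": {"build_start_time": {"gte": f"now-{months_back}M/M"}}},
                ]
            }
        },
        "aggs": {
            "per_month": {
                "date_histogram": {
                    "field": "build_start_time",  # ASSUMED timestamp field
                    "calendar_interval": "month",
                },
                "aggs": {
                    # Distinct CI runs in which the test failed at least once.
                    "unique_builds": {"cardinality": {"field": "build_number"}}
                },
            }
        },
    }


if __name__ == "__main__":
    # The body would be sent to the _search endpoint of the gradle-check-* indices.
    print(json.dumps(monthly_flake_query("SearchRestCancellationIT"), indent=2))
```

The cardinality aggregation returns an approximate distinct count, so very large buckets may differ slightly from an exact tally. At the monthly volumes reported here the difference should be negligible.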