Flaky test report: committed-code failures on 2026-05-16

## Summary

Ten test failures were observed against committed code (Timer and Post Merge Action builds on `main`) in the 24 hours ending 2026-05-16T10:00Z. All of them come from known-flaky tests with prior failure history in the metrics cluster. None of the eight failures that could be rerun locally reproduced with the original CI seed, which is consistent with timing- or concurrency-dependent failures.

## Failures Observed

| # | Test | Build | Error | Seed Reproduces Locally? |
|---|---|---|---|---|
| 1 | `SegmentReplicationStatsIT.testSegmentReplicationNodeAndIndexStats` | [77113](https://build.ci.opensearch.org/job/gradle-check/77113/) | `AssertionError` (assertTrue at line 415) | No |
| 2 | `EhCacheDiskCacheTests.testComputeIfAbsentConcurrently` | [77074](https://build.ci.opensearch.org/job/gradle-check/77074/) | `expected:<1> but was:<2>` | No |
| 3 | `SearchRestCancellationIT.testAutomaticCancellationMultiSearchDuringQueryPhase` | [77071](https://build.ci.opensearch.org/job/gradle-check/77071/) | `AssertionError` (assertTrue) | No |
| 4 | `RemotePrimaryLocalRecoveryIT.testLocalRecoveryRollingRestartAndNodeFailure` | [77069](https://build.ci.opensearch.org/job/gradle-check/77069/) | `AssertionError: unexpected` during rollingRestart | No |
| 5 | `RecoveryWhileUnderLoadIT.testRecoverWithRelocationAndDerivedSource` | [77057](https://build.ci.opensearch.org/job/gradle-check/77057/) | `shard has pending operations` | No |
| 6 | `FullRollingRestartIT.testFullRollingRestart_withNoRecoveryPayloadAndSource` | [77056](https://build.ci.opensearch.org/job/gradle-check/77056/) | `replica shards haven't caught up: expected:<20> but was:<17>` | No |
| 7 | `RemoteStoreKafkaIT.testPeriodicFlush` | [77039](https://build.ci.opensearch.org/job/gradle-check/77039/) | `ConditionTimeoutException: not fulfilled within 1 minute` | No |
| 8 | `IndexingIT.testIndexingWithSegRep` | [77039](https://build.ci.opensearch.org/job/gradle-check/77039/) | `expected:<0> but was:<1>` | N/A (BWC test) |
| 9 | `RemoteStoreReplicationSourceTests.testGetMergedSegmentFilesDownloadTimeout` | [77033](https://build.ci.opensearch.org/job/gradle-check/77033/) | Suite timeout reached | No |
| 10 | `RemoteStoreReplicationSourceTests.classMethod` | [77033](https://build.ci.opensearch.org/job/gradle-check/77033/) | Suite-level timeout (same as #9) | N/A |

## Historical Flake Rates (sorted by total builds affected)

Data from `gradle-check-*` indices across all build types (including PR builds). Unique build counts represent distinct CI runs where the test failed.

| Test | Total Builds | First Seen | Monthly Trend (recent 6 months) | Pattern |
|---|---|---|---|---|
| `IndexingIT` | 711 | 2024-03-25 | 5, 12, 19, 19, 5, 12 | Stable high |
| `SearchRestCancellationIT` | 440 | 2024-03-26 | 14, 5, 7, 6, 23, 38 | **Worsening** |
| `FullRollingRestartIT` | 277 | 2024-10-11 | 1, 0, 35, 25, 24, 16 | Improving |
| `RecoveryWhileUnderLoadIT` | 272 | 2024-04-03 | 1, 1, 13, 29, 28, 27 | Stable high (since Feb 2026) |
| `RemotePrimaryLocalRecoveryIT` | 163 | 2024-03-26 | 1, 0, 2, 6, 7, 3 | Stable low |
| `RemoteStoreReplicationSourceTests` | 161 | 2024-04-17 | 3, 3, 1, 2, 3, 4 | Stable low |
| `RemoteStoreKafkaIT` | 119 | 2025-03-18 | 5, 1, 8, 3, 7, 5 | Stable |
| `EhCacheDiskCacheTests` | 79 | 2024-03-28 | 1, 3, 0, 2, 14, 14 | **Worsening** |
| `SegmentReplicationStatsIT` | 7 | 2024-12-06 | 0, 0, 0, 0, 0, 1 | Rare |

## Key Observations

1. **SearchRestCancellationIT** is the most concerning: failing builds rose from roughly 6 per month to 23 and then 38 in May 2026 (a partial month). The April 2026 CI runner migration from `m5.8xlarge` to `m7a.8xlarge` coincides with the uptick, which may point to faster CPUs amplifying a latent race.

2. **EhCacheDiskCacheTests** shows a similar worsening pattern (2→14→14), also coinciding with the runner migration. The concurrent test (`testComputeIfAbsentConcurrently`) is likely sensitive to thread scheduling differences on faster hardware.

3. **RecoveryWhileUnderLoadIT** has held at roughly 28 failing builds per month for the last three months (29, 28, 27) after rising from near zero. The rise began in February 2026, before the runner change, so this is likely a separate issue.

4. **FullRollingRestartIT** is improving (35→25→24→16 over four months), suggesting prior fixes are taking effect. The latest figure covers a partial month, so the size of the drop may be overstated.

5. **No seeds reproduced locally.** All 8 locally runnable tests passed with their CI seeds (the BWC test and the suite-level `classMethod` entry were not rerun). This is consistent with failures that depend on factors outside seed control (thread scheduling, GC timing, network simulation timing).
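
The trend readings above can be cross-checked with simple arithmetic on the monthly counts from the Historical Flake Rates table. The sketch below compares the mean of the two most recent months with the mean of the four months before them. The helper names and the four/two split are illustrative assumptions, not the method used to fill in the Pattern column.

```python
from statistics import mean

# Monthly unique-build counts copied from the Historical Flake Rates table
# (oldest month first, most recent month last).
MONTHLY = {
    "IndexingIT": [5, 12, 19, 19, 5, 12],
    "SearchRestCancellationIT": [14, 5, 7, 6, 23, 38],
    "FullRollingRestartIT": [1, 0, 35, 25, 24, 16],
    "RecoveryWhileUnderLoadIT": [1, 1, 13, 29, 28, 27],
    "RemotePrimaryLocalRecoveryIT": [1, 0, 2, 6, 7, 3],
    "RemoteStoreReplicationSourceTests": [3, 3, 1, 2, 3, 4],
    "RemoteStoreKafkaIT": [5, 1, 8, 3, 7, 5],
    "EhCacheDiskCacheTests": [1, 3, 0, 2, 14, 14],
    "SegmentReplicationStatsIT": [0, 0, 0, 0, 0, 1],
}


def recent_vs_baseline(counts):
    """Return (baseline mean, recent mean): first four months vs last two."""
    return mean(counts[:-2]), mean(counts[-2:])


def recent_ratio(counts):
    """Return recent mean / baseline mean, or None when the baseline is zero."""
    baseline, recent = recent_vs_baseline(counts)
    return None if baseline == 0 else recent / baseline


for name, counts in MONTHLY.items():
    baseline, recent = recent_vs_baseline(counts)
    ratio = recent_ratio(counts)
    shown = "n/a" if ratio is None else f"{ratio:.2f}x"
    print(f"{name}: baseline={baseline:.2f} recent={recent:.2f} ratio={shown}")
```

With this split, a test whose increase began in the middle of the window (such as `RecoveryWhileUnderLoadIT`) still shows a high ratio, and tests with very small counts produce noisy ratios, so the numbers are a cross-check on the table rather than a replacement for reading the full series.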

## Methodology

- Committed-code failures identified via metrics cluster query filtering on `invoke_type: Timer` with `git_reference: main`, and `invoke_type: Post Merge Action`
- Seeds extracted from Jenkins test report API (`errorStackTrace` SeedInfo or JVM args in stdout)
- Local reproduction attempted with `./gradlew <module>:<task> --tests "<class>.<method>" -Dtests.seed=<SEED>`
- Historical data aggregated across all `gradle-check-*` indices using monthly date histograms with cardinality aggregation on `build_number`
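
The two queries described above could look roughly like the request bodies below, written here as Python dictionaries. This is a sketch under stated assumptions: `invoke_type`, `git_reference`, and `build_number` come from this report, while `test_class`, `test_status`, `build_start_time`, and the `FAILED` value are assumed field names and values that may not match the actual index mapping. The `term` clauses also assume keyword-mapped fields.

```python
import json


def committed_code_filter():
    """Match Timer builds on main OR any Post Merge Action build."""
    return {
        "bool": {
            "should": [
                {
                    "bool": {
                        "filter": [
                            {"term": {"invoke_type": "Timer"}},
                            {"term": {"git_reference": "main"}},
                        ]
                    }
                },
                {"term": {"invoke_type": "Post Merge Action"}},
            ],
            "minimum_should_match": 1,
        }
    }


def monthly_failed_builds_query(test_class):
    """Count distinct failing builds per calendar month for one test class.

    Field names other than build_number are assumptions (see lead-in).
    """
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"test_class": test_class}},    # assumed field
                    {"term": {"test_status": "FAILED"}},     # assumed field/value
                ]
            }
        },
        "aggs": {
            "per_month": {
                "date_histogram": {
                    "field": "build_start_time",             # assumed field
                    "calendar_interval": "month",
                },
                "aggs": {
                    "unique_builds": {"cardinality": {"field": "build_number"}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(committed_code_filter(), indent=2))
    print(json.dumps(monthly_failed_builds_query("SearchRestCancellationIT"), indent=2))
```

The `cardinality` aggregation is approximate by design, so the unique-build counts in the table should be read as close estimates rather than exact totals.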

