Flaky test report: committed-code failures on 2026-05-26
Summary
18 test failures were recorded against committed code (Timer/Post Merge Action builds on main) in the past 24 hours across 8 distinct builds. These failures map to 10 distinct test methods (capping at 10 as requested).
None of the failures reproduced locally with the original seed, which is consistent with timing-dependent flakiness (seeds control Random streams but not thread scheduling, GC pauses, or network timing).
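The distinction between seeded randomness and scheduling can be illustrated with a minimal sketch. The class `SeedVsScheduling` and its methods are hypothetical and are not part of the test suite discussed here. They only show that a fixed seed reproduces the same random draws, while the order in which two threads finish is decided by the scheduler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; not part of the test suite discussed in this report.
class SeedVsScheduling {

    // Draws n values from a Random created with the given seed.
    // The same seed always yields the same sequence.
    static List<Integer> draws(long seed, int n) {
        Random random = new Random(seed);
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(random.nextInt(1000));
        }
        return values;
    }

    // Starts two threads that each record their name. The resulting order is
    // decided by the thread scheduler; no seed is involved or consulted.
    static List<String> arrivalOrder() {
        List<String> order = new CopyOnWriteArrayList<>();
        Thread a = new Thread(() -> order.add("A"));
        Thread b = new Thread(() -> order.add("B"));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return order;
    }

    public static void main(String[] args) {
        long seed = 42L; // arbitrary illustrative seed
        System.out.println("same draws: " + draws(seed, 5).equals(draws(seed, 5)));
        System.out.println("arrival order this run: " + arrivalOrder());
    }
}
```

Rerunning with the same seed repeats the draws on every run, but the arrival order may differ between runs or machines. This is why a seed alone may not reproduce a timing-dependent failure.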
Summary Table (sorted by total unique builds affected)
MixedClusterClientYamlTestSuiteIT (310_match_bool_prefix)
Reproduction: Skipped; it requires multi-version BWC cluster infrastructure that is not available locally.
History: 482 unique builds affected since March 2024. This is a chronically flaky BWC test with periodic spikes (185 builds in Sep 2024, 26 in May 2026). Outside those spikes the failure rate is stable, and the flakiness has persisted for over 2 years.
Pattern: Chronic, stable. Multiple test parameterizations fail together (both "complete term" and "partial term" variants).
ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability
History: 124 unique builds affected since Oct 2024. Notable spike: 35 builds in Apr 2026 and 24 in May 2026 (up from 3-6 per month previously).
Pattern: Chronic, worsening significantly since April 2026. The timing aligns with the m7a.8xlarge runner migration; faster CPUs likely tighten the race window in this CAS linearizability test. The recorded error was a cluster health timeout.
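DataFormatAwareEngineTests.testApplyMergeChangesUpdatesCatalogAndNotifiesListeners
History: 15 unique builds affected, all in May 2026 (first failure May 8).
Pattern: Brand new and worsening rapidly. Error: "afterRefresh must fire exactly once for the merge — Expected: <1> but: was <0>". The count of 0 suggests a race between merge completion and refresh listener notification: the assertion appears to run before the listener has been called.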
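If the cause is indeed a listener notified asynchronously after the merge returns, the failure mode and one common mitigation can be sketched as follows. All names (`CountingRefreshListener`, `MergeNotificationSketch`) are hypothetical stand-ins and do not reflect the actual engine or test code. The sketch assumes the notification runs on a separate thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a refresh listener; not the actual engine API.
class CountingRefreshListener {
    private final AtomicInteger afterRefreshCalls = new AtomicInteger();
    private final CountDownLatch firstCall = new CountDownLatch(1);

    void afterRefresh() {
        afterRefreshCalls.incrementAndGet();
        firstCall.countDown();
    }

    int calls() {
        return afterRefreshCalls.get();
    }

    // Blocks until the first notification arrives or the timeout elapses.
    boolean awaitFirstCall(long timeout, TimeUnit unit) {
        try {
            return firstCall.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class MergeNotificationSketch {
    // Simulates a merge whose listener notification runs on another thread,
    // so the caller can continue before the listener has been invoked.
    static int callsAfterWaiting() {
        CountingRefreshListener listener = new CountingRefreshListener();
        ExecutorService notifier = Executors.newSingleThreadExecutor();
        try {
            notifier.execute(listener::afterRefresh);
            // Checking listener.calls() == 1 at this point would be racy:
            // the notifier thread may not have run yet, giving 0.
            if (!listener.awaitFirstCall(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("listener was not notified in time");
            }
            return listener.calls();
        } finally {
            notifier.shutdown();
        }
    }
}
```

Waiting on the latch removes the "was <0>" outcome in this sketch, but it does not by itself detect a late second notification. An exactly-once check would also need to confirm that no further calls arrive after the work has finished.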
Detailed Findings
1. MixedClusterClientYamlTestSuiteIT (310_match_bool_prefix)
Seed: 5F29C2578D2BCC55
2. ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability
Seed: 3B342D4DE6E6AEC5
3. ClusterShardLimitIT.testOpenIndexOverLimit
Seed: 1692B8C36D0111B8
writable_warm_index.enabled=true is the one failing.
4. RareClusterStateIT.testDisassociateNodesWhileShardInit
Seed: CF25BD78D2BF7182
5. DataFormatAwareEngineTests.testApplyMergeChangesUpdatesCatalogAndNotifiesListeners
Seeds: 8757F4613DE85FFF (build 78265), DFE8C39BE1E377C (build 78249)
6. WarmIndexBasicIT.testLocalDirectoryFilesAfterRefresh
Seed: DE8A4CBCF51653EE
7. WarmIndexSegmentReplicationIT.testReplicationAfterForceMergeOnPrimaryShardsOnly
Seed: 3B342D4DE6E6AEC5
8. FlightOutboundHandlerContextPropagationTests.testThreadContextPropagatedThroughStreamResponseBatch
Seed: 40CB66C49A747F9F
9. MergedSegmentWarmerIT.testCleanupRedundantPendingMergeFile
Seed: DEE521945FF2C973
10. OsProbeTests.testGetProcessNativeMemoryBytes_returnsDifferenceWhenRssAnonExceedsCommitted
Seed: E4AF6E84948102C1
Observations
Runner migration impact: Tests 2, 4, 6, and 8 all show significant worsening starting in April 2026, coinciding with the m5.8xlarge → m7a.8xlarge CI runner migration. Faster CPUs tighten race windows in concurrent tests.
New tests with high flake rates: Tests 5 (DataFormatAwareEngineTests) and 6 (WarmIndexBasicIT) are both new (first failures in late April/May 2026) and already accumulating failures rapidly. These likely need immediate attention.
No seed-based reproduction: None of the 9 tests that could be run locally reproduced with their original seeds. This is expected for timing-dependent races — the seeds control randomized parameters but not thread interleaving.
Chronic offenders: MixedClusterClientYamlTestSuiteIT has been flaky for over 2 years with 482 affected builds. ConcurrentSeqNoVersioningIT has been flaky since Oct 2024 with 124 affected builds.