
Flaky test report: committed-code failures on 2026-05-12 #263

@andrross

Description

Summary

One committed-code test failure was detected in the past 24 hours (Timer builds on main). An additional failure from 33 hours ago is included for completeness. Both are chronic flaky tests with long histories.

Failing Tests

1. SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase

| Field | Value |
|---|---|
| Build | 76533 |
| Module | qa/smoke-test-http |
| Seed | 9433723F17A2BCD6 |
| Error | `java.lang.AssertionError` at SearchRestCancellationIT.java:222 (assertBusy timeout waiting for search task cancellation) |
| Reproduced locally | No — passed with seed, passed with 5 iterations |
| First failure | 2024-04-04 |
| Total unique builds affected | 190 |

Monthly failure pattern (unique builds):

| Month | Builds |
|---|---|
| 2024-04 | 7 |
| 2024-05 | 1 |
| 2024-06 | 5 |
| 2024-07 | 9 |
| 2024-08 | 8 |
| 2024-09 | 10 |
| 2024-10 | 4 |
| 2024-11 | 3 |
| 2024-12 | 6 |
| 2025-01 | 2 |
| 2025-02 | 2 |
| 2025-03 | 2 |
| 2025-04 | 2 |
| 2025-05 | 10 |
| 2025-06 | 2 |
| 2025-07 | 5 |
| 2025-08 | 0 |
| 2025-09 | 7 |
| 2025-10 | 9 |
| 2025-11 | 41 |
| 2025-12 | 9 |
| 2026-01 | 4 |
| 2026-02 | 4 |
| 2026-03 | 4 |
| 2026-04 | 16 |
| 2026-05 | 18 (partial month) |

Pattern: Chronic flake since April 2024, with a major spike in November 2025 (41 builds). Currently worsening: April and May 2026 show elevated rates (16 and 18 builds, respectively). The April 2026 uptick correlates with the CI runner migration to m7a.8xlarge instances, where faster CPUs can amplify timing-sensitive races.


2. RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithReducedAllowedNodes

| Field | Value |
|---|---|
| Build | 76481 |
| Module | server (internalClusterTest) |
| Seed | C8CCF036B428F9A5 |
| Error | `java.lang.AssertionError`: replica shards haven't caught up with primary; expected:<25> but was:<22> |
| Reproduced locally | No — passed with seed |
| First failure | 2024-04-03 |
| Total unique builds affected | 112 |

Monthly failure pattern (unique builds):

| Month | Builds |
|---|---|
| 2024-04 | 1 |
| 2024-05 | 1 |
| 2025-02 | 1 |
| 2025-04 | 1 |
| 2025-06 | 29 |
| 2025-07 | 21 |
| 2025-08 | 9 |
| 2025-09 | 1 |
| 2026-02 | 4 |
| 2026-03 | 17 |
| 2026-04 | 13 |
| 2026-05 | 14 (partial month) |

Pattern: Episodic flake. First major spike in June-July 2025 (29 and 21 builds), then subsided. Re-emerged in February 2026 and is currently worsening (13-17 builds/month from March through May 2026). The March 2026 re-emergence predates the CI runner change, suggesting a code-level regression or an environmental sensitivity unrelated to CPU speed alone.


Summary Table

| Test | Builds Affected | First Seen | Trend | Reproduced |
|---|---|---|---|---|
| SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase | 190 | 2024-04-04 | Worsening | No |
| RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithReducedAllowedNodes | 112 | 2024-04-03 | Worsening | No |

Notes

  • Neither test is reproducible with the original seed. Both failures depend on thread scheduling/timing rather than deterministic random state.
  • The SearchRestCancellationIT failure is an assertBusy timeout — the search task cancellation doesn't complete within the polling window.
  • The RecoveryWhileUnderLoadIT failure is a replication lag assertion — replicas don't catch up to the primary's doc count within the timeout.
  • Both patterns are consistent with environmental sensitivity (CPU speed, thread scheduling) rather than deterministic code bugs.
  • Data source: gradle-check-* indices on metrics.opensearch.org, queried across all build types (Timer, Pull Request, Post Merge Action).
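Both failure modes reduce to an assertBusy-style wait that never sees its condition hold before the deadline. As a rough illustration (a generic sketch of the poll-with-backoff semantics, not the OpenSearch test framework's actual implementation), such a helper looks like:

```java
import java.util.function.BooleanSupplier;

// Generic sketch of an assertBusy-style helper: poll a condition with
// exponential backoff until a deadline, then fail with an AssertionError.
// The names and numbers below are illustrative, not taken from the real tests.
public class AssertBusySketch {

    static void assertBusy(BooleanSupplier condition, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long pollMillis = 1;
        while (true) {
            if (condition.getAsBoolean()) {
                return; // the condition eventually held
            }
            if (System.currentTimeMillis() >= deadline) {
                // This is the failure mode in both reports: the condition
                // (task cancelled / replicas caught up) never held in time.
                throw new AssertionError("assertBusy timeout waiting for condition");
            }
            Thread.sleep(pollMillis);
            pollMillis = Math.min(pollMillis * 2, 500); // back off between polls
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for "replicas caught up with primary": a doc count that
        // advances on each poll. On a loaded or slow machine, the same
        // condition may simply not hold before the deadline, with no bug
        // in the code under test.
        int[] replicaDocs = { 22 };
        assertBusy(() -> ++replicaDocs[0] >= 25, 5_000);
        System.out.println("condition met before timeout");
    }
}
```

Because the pass/fail outcome depends on wall-clock progress rather than the random seed, re-running with the recorded seed cannot reproduce the failure, which matches the "passed with seed" results above.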
