Flaky test report: committed-code failures on 2026-05-03 #254

@andrross

Summary

Analysis of committed-code (Timer and Post Merge Action) test failures from gradle-check builds in the 24 hours ending 2026-05-03 10:00 UTC. 7 builds contained failures: 75616, 75626, 75637, 75638, 75639, 75651, 75655. 8 distinct test failures were identified: 7 test methods across 6 test classes, plus one shared failure in build 75638 (13 failures across 8 org.opensearch.http.* classes, all with the identical error, counted as one issue). They are grouped into 7 numbered findings because FullRollingRestartIT contributes two methods.

Summary Table

| # | Test | Build(s) | Seed Repro? | First Seen | Total Builds Affected | Trend |
|---|------|----------|-------------|------------|-----------------------|-------|
| 1 | SearchRestCancellationIT (+ 7 other http.* classes) | 75638 | No | 2024-03 | 407 | Chronic, stable ~10-20/month with spikes |
| 2 | FullRollingRestartIT (2 methods) | 75637, 75655 | No | 2024-10 | 267 | Worsening since mid-2025, ~25-105/month |
| 3 | RecoveryWhileUnderLoadIT | 75639 | No | 2024-04 | 249 | Worsening since mid-2025, ~13-77/month |
| 4 | RemoteStoreReplicationSourceTests | 75616 | No | 2024-04 | 158 | Spiked Aug-Oct 2025 (~21-42/month), now lower |
| 5 | RemotePrimaryLocalRecoveryIT | 75651 | No | 2024-04 | 61 | Chronic low-level, ~1-11/month |
| 6 | IngestFromKafkaIT | 75626 | No | 2025-10 | 26 | Worsening: 2→8→13 builds/month (Oct 2025, Mar 2026, Apr 2026) |
| 7 | InternalEngineTests.testForceMerge... | 75626 | Yes | 2024-04 | 15 | Chronic low-level, ~1-3/month |

Detailed Findings

1. org.opensearch.http.* tests (build 75638, seed F3E5EDC48B3341D2)

Affected classes: DetailedErrorsDisabledIT, DetailedErrorsEnabledIT, IndexingPressureRestIT, NoHandlerIT, ResponseHeaderPluginIT, SearchRestCancellationIT, ShardIndexingPressureRestIT, SystemIndexRestIT

Error (identical in all 13 failures): 1 channels still being tracked in RestCancellableNodeClient while there should be none expected:<0> but was:<1>

Module: qa:smoke-test-http:integTest

Reproduction: Not reproducible locally with seed F3E5EDC48B3341D2. This is a teardown-time race condition where a channel is still tracked after the test completes. The seed does not control the timing that triggers this.

History: 407 unique builds affected since March 2024. This is the single most prolific flaky test family in the repository. Failure rate is chronic at ~10-20 builds/month with occasional spikes (47 in Nov 2025, 23 in Apr 2026). The RestCancellableNodeClient channel tracking assertion fires during test teardown when an HTTP channel outlives the test.

Pattern: Chronic, stable. No significant worsening or improvement trend.
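
The teardown race described above can be illustrated with a minimal sketch. This is not the OpenSearch code: ChannelTrackerSketch, awaitZero, and the 100 ms delay are hypothetical names and values chosen only to show how a check made immediately after the test body can observe a channel whose close callback has not yet run, while a bounded wait tolerates the late close. Whether a bounded wait is an appropriate fix for the real assertion is not established by this report.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

/** Hypothetical stand-in for a client that tracks open HTTP channels. */
class ChannelTrackerSketch {
    private final Set<String> tracked = ConcurrentHashMap.newKeySet();

    void open(String channelId) { tracked.add(channelId); }

    void close(String channelId) { tracked.remove(channelId); }

    int trackedCount() { return tracked.size(); }

    /** Polls until the supplier returns 0 or the timeout elapses; returns the last observed value. */
    static int awaitZero(IntSupplier count, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        int last = count.getAsInt();
        while (last != 0 && System.nanoTime() < deadline) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            last = count.getAsInt();
        }
        return last;
    }

    public static void main(String[] args) throws InterruptedException {
        ChannelTrackerSketch tracker = new ChannelTrackerSketch();
        tracker.open("channel-1");

        // The close callback runs on another thread, shortly after the "test body" ends.
        Thread closer = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            tracker.close("channel-1");
        });
        closer.start();

        // An immediate check may still see the channel that has not closed yet.
        System.out.println("immediate: " + tracker.trackedCount());
        // A bounded wait gives the late close a chance to land.
        System.out.println("after wait: " + awaitZero(tracker::trackedCount, 5_000));
        closer.join();
    }
}
```

The random seed has no influence on when the closer thread runs, which is consistent with the seed not reproducing this failure.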


2. org.opensearch.recovery.FullRollingRestartIT (builds 75637, 75655)

Methods:

  • testFullRollingRestart {SEGMENT} — seed 85DC310E3B9633AB, error: replica shards haven't caught up with primary expected:<22> but was:<17>
  • testFullRollingRestart_withNoRecoveryPayloadAndSource {SEGMENT} — seed 76BACE4695D25E71, error: replica shards haven't caught up with primary expected:<18> but was:<15>

Reproduction: Neither seed reproduced locally.

History: 267 unique builds since Oct 2024. Major spike in Jun-Aug 2025 (24→105→43 builds/month), then dropped, then resumed at ~25-36/month in Feb-Apr 2026. The failures are exclusively in the SEGMENT replication strategy parameterization.

Pattern: Worsening. The mid-2025 spike correlates with the April 2025 CI runner migration to m7a.8xlarge — faster CPUs may be exposing a timing-dependent replica catch-up race.


3. org.opensearch.recovery.RecoveryWhileUnderLoadIT (build 75639, seed F33EBA4EF05FD7A8)

Method: testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest {SEGMENT}

Error: java.lang.RuntimeException: java.util.NoSuchElementException: No value present

Reproduction: Not reproducible locally with seed.

History: 249 unique builds since Apr 2024. Similar pattern to FullRollingRestartIT — major spike in Jun-Aug 2025 (77→43→14), then sustained at ~13-29/month through 2026. Also exclusively SEGMENT replication strategy.

Pattern: Worsening since mid-2025. Same environmental correlation as FullRollingRestartIT.
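
As a reading aid for the error string in this finding: "No value present" is the message that java.util.Optional.get() produces when called on an empty Optional, and wrapping that exception in a RuntimeException yields exactly the text reported. The sketch below shows only that mechanism. The unwrap helper is hypothetical, and it is an assumption that the failing code path involves an Optional at all; the stack trace from build 75639 would be needed to confirm it.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

class EmptyOptionalSketch {
    /** Hypothetical helper: unwraps an Optional and rethrows the failure as a RuntimeException. */
    static String unwrap(Optional<String> value) {
        try {
            // Optional.get() throws NoSuchElementException("No value present") when empty.
            return value.get();
        } catch (NoSuchElementException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        try {
            unwrap(Optional.empty());
        } catch (RuntimeException e) {
            // prints "java.lang.RuntimeException: java.util.NoSuchElementException: No value present"
            System.out.println(e);
        }
    }
}
```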


4. org.opensearch.indices.replication.RemoteStoreReplicationSourceTests (build 75616, seed D7E4D57832EDE21C)

Error: Test abandoned because suite timeout was reached (>= 1200000 msec).

Reproduction: Not reproducible locally (test passed in 12s). The original failure was a suite-level timeout (1200000 msec is 20 minutes), which points to an environmental cause rather than a test-logic bug.

History: 158 unique builds since Apr 2024. Spiked dramatically in Aug-Oct 2025 (42→32→21 builds/month), then subsided to ~1-3/month. The spike timing suggests an environmental or infrastructure change during that period.

Pattern: Improving. The Aug-Oct 2025 spike has resolved and the test is now failing at a low background rate.


5. org.opensearch.remotemigration.RemotePrimaryLocalRecoveryIT.testLocalRecoveryFlowWithReplicas (build 75651, seed 5FDBED829EE0A54)

Error: java.lang.AssertionError: unexpected

Reproduction: Not reproducible locally with seed.

History: 61 unique builds since Apr 2024. Chronic low-level flake at ~1-11 builds/month. No clear trend — the rate fluctuates but doesn't show sustained worsening or improvement.

Pattern: Chronic, stable at low level.


6. org.opensearch.plugin.kafka.IngestFromKafkaIT.testAllActiveOffsetBasedLag (build 75626, seed F3BCC2462D663D99)

Error: java.lang.AssertionError (no detail message)

Reproduction: Not reproducible locally with seed.

History: 26 unique builds since Oct 2025. This is a relatively new test. Failure rate is worsening: 2 builds in Oct 2025, then 8 in Mar 2026, 13 in Apr 2026, 3 so far in May 2026.

Pattern: Worsening. New test with increasing failure rate.
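
A quick consistency check on the numbers in this finding, sketched below: the four monthly counts listed in the history add up to the reported total of 26. Assuming the total and the monthly counts cover the same window, that implies no affected builds between Nov 2025 and Feb 2026. The class and method names are hypothetical, and the month keys are just labels.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MonthlyCountSketch {
    /** Sums per-month affected-build counts. */
    static int total(Map<String, Integer> buildsPerMonth) {
        int sum = 0;
        for (int count : buildsPerMonth.values()) {
            sum += count;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Counts as listed in the history for IngestFromKafkaIT.
        Map<String, Integer> kafka = new LinkedHashMap<>();
        kafka.put("2025-10", 2);
        kafka.put("2026-03", 8);
        kafka.put("2026-04", 13);
        kafka.put("2026-05", 3); // partial month
        System.out.println(total(kafka)); // prints 26
    }
}
```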


7. org.opensearch.index.engine.InternalEngineTests.testForceMergeWithSoftDeletesRetentionAndRecoverySource (build 75626, seed F3BCC2462D663D99)

Error: Expected: a collection with size <0> but: collection size was <1>

Reproduction: Reproduced deterministically with seed F3BCC2462D663D99. The failure is at InternalEngineTests.java:2098.

History: 15 unique builds since Apr 2024. Very low-level chronic flake at ~1-3 builds/month. The deterministic reproduction with seed suggests this is a test-logic or code bug rather than a timing issue — the specific random parameters chosen by this seed expose a real edge case.

Pattern: Chronic, stable at very low level. Deterministic with seed.
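
Why a seed can reproduce this failure but not the timing-dependent ones can be sketched as follows. A seeded generator yields the same sequence of "random" test parameters on every run, so a bug triggered by a particular parameter combination recurs; thread scheduling is outside the generator's control. The sketch uses java.util.Random as a stand-in, which is an assumption: the test framework's own generator and the parameters it draws are not shown in this report, and drawParameters is hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

class SeedDeterminismSketch {
    /** Parses a 16-hex-digit seed, as printed in the report, into a long. */
    static long parseSeed(String hex) {
        return Long.parseUnsignedLong(hex, 16);
    }

    /** Draws a few hypothetical test parameters from a generator seeded with the given value. */
    static int[] drawParameters(long seed) {
        Random random = new Random(seed);
        return new int[] { random.nextInt(100), random.nextInt(10), random.nextInt(2) };
    }

    public static void main(String[] args) {
        long seed = parseSeed("F3BCC2462D663D99");
        int[] first = drawParameters(seed);
        int[] second = drawParameters(seed);
        // Same seed, same parameters on every run.
        System.out.println(Arrays.equals(first, second)); // prints "true"
    }
}
```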


Reproduction Environment

  • Host: dev-dsk, Linux 5.10.252 amd64
  • JDK: Eclipse Adoptium 25.0.2 (64-bit), 16 CPUs
  • Repository: opensearch-project/OpenSearch at HEAD of main

Notes

  • Build 75638 is a single event where the RestCancellableNodeClient channel tracking assertion failed during teardown, causing all 8 test classes in qa:smoke-test-http:integTest to fail. This is one root cause, not 13 independent failures.
  • The SEGMENT replication strategy parameterization appears disproportionately in failures (FullRollingRestartIT, RecoveryWhileUnderLoadIT). The mid-2025 spike in these tests correlates with the CI runner migration to faster m7a.8xlarge instances around April 2025.
  • Only 1 of 8 tests reproduced deterministically with its seed (InternalEngineTests). The other 7 depend on timing, thread scheduling, or environmental factors that the seed does not control.
