Flaky test report: committed-code failures on 2026-04-28 #249

@andrross

Summary

8 distinct test failures were detected against committed code (Timer/main and Post Merge Action builds) in the 24 hours ending 2026-04-28T10:00 UTC, across 6 builds. None of the failures reproduced locally with the original CI seed, which is consistent with environment-sensitive flaky tests rather than deterministic regressions.

Summary Table

Numbered to match the Detailed Findings and ordered approximately by total unique builds affected historically (all build types, including PR builds):

| # | Test | Builds Affected | First Seen | Recent Build | Pattern |
|---|------|-----------------|------------|--------------|---------|
| 1 | ClientYamlTestSuiteIT (date_histogram profiler) | 217 | 2024-03-26 | 75199 | Stable (chronic) |
| 2 | IndexActionIT.testAutoGenerateIdNoDuplicates | 234 | 2024-03-26 | 75231 | Stable (chronic) |
| 3 | ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability | 89 | 2024-10-03 | 75218 | Worsening |
| 4 | EhCacheDiskCacheTests.testComputeIfAbsentConcurrently | 52 | 2024-03-28 | 75190 | Worsening |
| 5 | ReindexIT.testReindexTask | 22 | 2024-06-18 | 75226 | Stable (low-rate) |
| 6 | IngestFromKafkaIT.testAllActiveOffsetBasedLag | 20 | 2025-10-15 | 75248 | Worsening |
| 7 | InternalSnapshotsInfoServiceTests.testErroneousSnapshotShardSizes | 15 | 2024-04-08 | 75223 | Stable (low-rate) |
| 8 | QueueResizableOpenSearchThreadPoolExecutorTests.testResizeQueueSameSize | 9 | 2024-06-10 | 75243 | Stable (low-rate) |
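
The report does not state how the Pattern labels were assigned. As a rough illustration only, the sketch below shows one heuristic that could produce labels of this kind from a chronological list of monthly unique-build counts. The function name `classify_pattern` and the thresholds `worsening_factor` and `chronic_floor` are assumptions made for this example, not the rule used to build the table.

```python
from statistics import median


def classify_pattern(monthly_builds, worsening_factor=2.0, chronic_floor=5):
    """Label a flake from monthly unique-build counts (oldest first, latest last).

    Hypothetical heuristic: thresholds are illustrative assumptions.
    """
    if len(monthly_builds) < 2:
        return "Insufficient history"
    *history, latest = monthly_builds
    baseline = median(history)
    # Worsening: the latest month is a new maximum and well above the median.
    if latest > max(history) and latest >= worsening_factor * max(baseline, 1):
        return "Worsening"
    # Otherwise stable; split by how high the typical month is.
    if baseline >= chronic_floor:
        return "Stable (chronic)"
    return "Stable (low-rate)"


# Monthly counts reported for IngestFromKafkaIT, Oct 2025 through Apr 2026.
print(classify_pattern([2, 0, 0, 0, 0, 8, 10]))  # Worsening
```

Because the current month is incomplete, a heuristic like this can only understate a worsening trend, never overstate it.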

Detailed Findings

1. ClientYamlTestSuiteIT.test {p0=search.aggregation/10_histogram/date_histogram profiler}

  • Recent build: 75199 (Timer/main)
  • Seed: A35AD3C0D03A3C20
  • Reproduced locally: No
  • First seen: 2024-03-26
  • Total unique builds affected: 217
  • Monthly pattern: Chronic flake present in nearly every month since March 2024. Peak of 65 builds in Sep 2024. Most recently, 8 builds in Apr 2026. This is the second most prolific flaky test in this report by build count (217, behind IndexActionIT.testAutoGenerateIdNoDuplicates at 234).
  • Assessment: Stable chronic flake. Has been failing consistently for over 2 years with no sign of improvement.

2. IndexActionIT.testAutoGenerateIdNoDuplicates {p0={"cluster.indices.replication.strategy":"SEGMENT"}}

  • Recent build: 75231 (Post Merge Action)
  • Seed: E72A825249AC1D94
  • Reproduced locally: No
  • First seen: 2024-03-26
  • Total unique builds affected: 234
  • Monthly pattern: Persistent flake with a spike to 46 builds in Apr 2025. Consistently 7-20 builds/month in recent months. 15 builds in Apr 2026.
  • Assessment: Stable chronic flake. The highest absolute build count of any test in this report.

3. ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability

  • Recent build: 75218 (Timer/main)
  • Seed: 220320CBD3478DB4
  • Reproduced locally: No
  • First seen: 2024-10-03
  • Total unique builds affected: 89
  • Monthly pattern: Steady 1-6 builds/month through most of its history, but spiked to 10 in Feb 2026 and 24 in Apr 2026 (month not yet complete).
  • Assessment: Worsening. The April 2026 spike is significantly above the historical baseline and suggests the flake rate is increasing.

4. EhCacheDiskCacheTests.testComputeIfAbsentConcurrently

  • Recent build: 75190 (Timer/main)
  • Seed: 12F821F3353AE6DB
  • Reproduced locally: No
  • First seen: 2024-03-28
  • Total unique builds affected: 52
  • Monthly pattern: Was 0-5 builds/month historically. Jumped to 13 builds in Apr 2026 (month not yet complete), the highest month ever.
  • Assessment: Worsening. The April 2026 spike is a clear escalation from the historical range of 0-5 builds/month.

5. ReindexIT.testReindexTask

  • Recent build: 75226 (Timer/main)
  • Seed: 9BE37F4BC2D937EC
  • Reproduced locally: No
  • First seen: 2024-06-18
  • Total unique builds affected: 22
  • Monthly pattern: Low-rate flake, typically 0-2 builds/month. Appears sporadically across the full history.
  • Assessment: Stable low-rate flake. Not worsening but persistent.

6. IngestFromKafkaIT.testAllActiveOffsetBasedLag

  • Recent build: 75248 (Timer/main)
  • Seed: D91EB0F3C801A290
  • Reproduced locally: No
  • First seen: 2025-10-15
  • Total unique builds affected: 20
  • Monthly pattern: First appeared Oct 2025 (2 builds), went quiet Nov 2025 - Feb 2026, then 8 builds in Mar 2026 and 10 in Apr 2026.
  • Assessment: Worsening. A relatively new flake (first seen Oct 2025) whose frequency has increased rapidly since Mar 2026.

7. InternalSnapshotsInfoServiceTests.testErroneousSnapshotShardSizes

  • Recent build: 75223 (Timer/main)
  • Seed: DE377A74EBA95E56
  • Reproduced locally: No
  • First seen: 2024-04-08
  • Total unique builds affected: 15
  • Monthly pattern: Low-rate flake, typically 0-2 builds/month. Had a small cluster of 4 builds in Apr 2024 when first introduced, then sporadic single failures.
  • Assessment: Stable low-rate flake. Long-lived but infrequent.

8. QueueResizableOpenSearchThreadPoolExecutorTests.testResizeQueueSameSize

  • Recent build: 75243 (Post Merge Action)
  • Seed: 8559E51248A3EA1C
  • Reproduced locally: No
  • First seen: 2024-06-10
  • Total unique builds affected: 9
  • Monthly pattern: Very low-rate flake. 9 affected builds spread across nearly 2 years. Typically 0-2 per month, with many months having zero failures.
  • Assessment: Stable low-rate flake. Rare but persistent.
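
The per-test figures above (first seen, total unique builds affected, monthly pattern) are all aggregations over individual failure records. A minimal sketch of that aggregation follows, assuming each record has already been reduced to a `(test_name, build_number, date)` tuple with an ISO `YYYY-MM-DD` date. That record shape and the name `summarize_failures` are hypothetical and do not reflect the actual index schema.

```python
from collections import defaultdict


def summarize_failures(records):
    """Aggregate (test_name, build_number, 'YYYY-MM-DD') tuples per test.

    The tuple layout is an assumption for illustration only.
    """
    builds = defaultdict(set)                         # test -> unique builds
    monthly = defaultdict(lambda: defaultdict(set))   # test -> month -> builds
    first_seen = {}
    for test, build, date in records:
        builds[test].add(build)
        monthly[test][date[:7]].add(build)            # 'YYYY-MM' bucket
        # ISO dates sort lexicographically, so string comparison is enough.
        if test not in first_seen or date < first_seen[test]:
            first_seen[test] = date
    return {
        test: {
            "first_seen": first_seen[test],
            "total_unique_builds": len(builds[test]),
            "monthly": {m: len(b) for m, b in sorted(monthly[test].items())},
        }
        for test in builds
    }
```

Counting builds through sets means a test that fails several times in one build (for example, on retry) still counts as one affected build.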

Methodology

  • Data source: gradle-check-* indices in the OpenSearch metrics cluster at metrics.opensearch.org
  • Committed-code filter: (invoke_type=Timer AND git_reference=main) OR (invoke_type=Post Merge Action AND pull_request_title starts with "push trigger main")
  • Historical data: All build types (including PR builds) queried across all available monthly indices
  • Local reproduction: Each test was run with the exact seed from the failing CI build using the REPRODUCE WITH command from the Jenkins console log. Environment: OpenJDK 25.0.2+10-LTS (Temurin), Linux.
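
The committed-code filter above can be sketched as an OpenSearch query body. This is an illustration under stated assumptions, not the query actually used: it assumes `invoke_type`, `git_reference`, and `pull_request_title` are keyword-mapped fields (so `term` and `prefix` match exact values), and the time field name `build_start_time` is hypothetical.

```python
def committed_code_query(start_iso, end_iso, time_field="build_start_time"):
    """Build a query body selecting committed-code builds in a time window.

    `time_field` is a hypothetical field name; adjust to the real mapping.
    """
    # Branch 1: scheduled (Timer) builds of main.
    timer_main = {"bool": {"filter": [
        {"term": {"invoke_type": "Timer"}},
        {"term": {"git_reference": "main"}},
    ]}}
    # Branch 2: post-merge builds triggered by a push to main.
    post_merge = {"bool": {"filter": [
        {"term": {"invoke_type": "Post Merge Action"}},
        {"prefix": {"pull_request_title": "push trigger main"}},
    ]}}
    return {"query": {"bool": {
        "filter": [{"range": {time_field: {"gte": start_iso, "lt": end_iso}}}],
        "should": [timer_main, post_merge],
        # Without this, "should" is optional when "filter" is present.
        "minimum_should_match": 1,
    }}}


# The 24 hours ending 2026-04-28T10:00 UTC.
body = committed_code_query("2026-04-27T10:00:00Z", "2026-04-28T10:00:00Z")
```

The resulting dictionary could be passed as the search body against the `gradle-check-*` indices. Nesting each AND group in its own `bool` makes the operator precedence explicit.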
