
Flaky test: TestCompactor_DeleteLocalSyncFiles (arm64) recurs after #7567 — 30s CompactionRunsCompleted>=2 poll times out #7608

@sandy2008

Describe the bug

TestCompactor_DeleteLocalSyncFiles (pkg/compactor) is flaky again on arm64, despite the fix in #7567 (which closed #7565). It now times out on the post-fix poll that #7567 added:

--- FAIL: TestCompactor_DeleteLocalSyncFiles (32.80s)
    compactor_test.go:1855: expected true, got false
FAIL	github.com/cortexproject/cortex/pkg/compactor

The poll added by #7567:

// pkg/compactor/compactor_test.go:1855
// Wait for at least two completed cycles so we sample after a steady-state
// ownership cycle ...
cortex_testutil.Poll(t, 30*time.Second, true, func() any {
	return prom_testutil.ToFloat64(c2.CompactionRunsCompleted) >= 2 &&
		len(c2.listTenantsWithMetaSyncDirectories()) > 0
})

On arm64 the second compactor does not reliably reach 2 completed cycles with at least one owned tenant within 30s, so the poll added to fix the original flake itself times out. The original symptom (#7565: "Should not be zero, but was 0") is gone, but the test is still non-deterministic on slow/loaded arm64 runners.

To Reproduce

Steps to reproduce the behavior:

  1. Check out Cortex master at 40a27ad or later, i.e. with #7567 ("fix(compactor): fix flaky TestCompactor_DeleteLocalSyncFiles on arm64") merged.
  2. Run repeatedly on arm64 (flaky):
    go test -count=20 -run TestCompactor_DeleteLocalSyncFiles ./pkg/compactor/
    

Expected behavior

The test passes deterministically on arm64; the second compactor reaches the required steady-state ownership cycles within the poll window (or the wait condition is made robust to arm64 CI timing).

Environment:

  • Infrastructure: GitHub Actions CI, ubuntu-24.04-arm (arm64), test-no-race job
  • Deployment tool: N/A (Go unit test)

Additional Context

Observed on master CI run (2026-06-08), i.e. after #7567 merged: https://github.com/cortexproject/cortex/actions/runs/27123579544 (job test-no-race (arm64)).

Related: #7565 (original report, closed), #7567 (fix that proved insufficient).

Filed from CI failure-log analysis with AI assistance; the run link and compactor_test.go:1855 were reviewed and verified against master before submitting.
