Flaky test: TestPartitionCompactor_ShouldCompactOnlyUsersOwnedByTheInstanceOnShardingEnabledAndMultipleInstancesRunning — 60s CompactionRunsCompleted poll times out #7607

@sandy2008

Describe the bug

TestPartitionCompactor_ShouldCompactOnlyUsersOwnedByTheInstanceOnShardingEnabledAndMultipleInstancesRunning (pkg/compactor) intermittently fails when a compactor does not complete a compaction run within the 60s poll window:

--- FAIL: TestPartitionCompactor_ShouldCompactOnlyUsersOwnedByTheInstanceOnShardingEnabledAndMultipleInstancesRunning (77.67s)
    compactor_paritioning_test.go:1217: expected true, got false

The assertion:

// pkg/compactor/compactor_paritioning_test.go:1217
cortex_testutil.Poll(t, 60*time.Second, true, func() any {
	return prom_testutil.ToFloat64(c.CompactionRunsCompleted) >= 1
})

With multiple compactors running under shuffle-sharding, ring convergence at startup is sometimes slow enough on loaded CI runners that a compactor does not complete its first compaction run within 60s. The non-partition sibling test, TestCompactor_ShouldCompactOnlyUsersOwnedByTheInstanceOnShardingEnabledAndMultipleInstancesRunning, uses a 120s poll for the same check. This is the same class of sharding/ring-convergence startup-timing flake as the repeatedly adjusted compactor ownership tests (e.g. #7486, #7503), but the partition variant has no dedicated issue.
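
To illustrate why the assertion above is timing-sensitive, here is a minimal, self-contained sketch of a poll-with-deadline helper. It is not the cortex_testutil.Poll source: pollUntil, firstRunCompletes, and the millisecond delays are hypothetical names and values chosen only for illustration, and the "compactor" is simulated with a timer.

```go
package main

import (
	"fmt"
	"reflect"
	"sync/atomic"
	"time"
)

// pollUntil is a hypothetical stand-in for a Poll-style test helper (it is not
// the cortex_testutil.Poll source). It calls have() every interval until the
// result equals want or the timeout elapses, and returns the last observed
// value together with whether it matched.
func pollUntil(timeout, interval time.Duration, want any, have func() any) (any, bool) {
	deadline := time.Now().Add(timeout)
	for {
		got := have()
		if reflect.DeepEqual(want, got) {
			return got, true
		}
		if !time.Now().Before(deadline) {
			return got, false
		}
		time.Sleep(interval)
	}
}

// firstRunCompletes simulates one compactor whose first run finishes after
// delayMs milliseconds, then polls a ">= 1 completed runs" condition for up to
// timeoutMs milliseconds. Both durations are illustrative, not measured.
func firstRunCompletes(timeoutMs, delayMs int) bool {
	var runs atomic.Int64
	timer := time.AfterFunc(time.Duration(delayMs)*time.Millisecond, func() { runs.Add(1) })
	defer timer.Stop()

	_, ok := pollUntil(time.Duration(timeoutMs)*time.Millisecond, 5*time.Millisecond, true, func() any {
		return runs.Load() >= 1
	})
	return ok
}

func main() {
	// Same simulated startup delay, two different poll windows.
	fmt.Println("short window:", firstRunCompletes(50, 150))
	fmt.Println("long window: ", firstRunCompletes(500, 150))
}
```

With the same simulated startup delay, only the poll window decides whether the check passes, which is the shape of the failure reported here: the condition would eventually become true, but not before the deadline.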

To Reproduce

Steps to reproduce the behavior:

  1. Check out Cortex (recent master)
  2. Run repeatedly (flaky):
    go test -count=20 -run TestPartitionCompactor_ShouldCompactOnlyUsersOwnedByTheInstanceOnShardingEnabledAndMultipleInstancesRunning ./pkg/compactor/
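
Because the failure is intermittent, -count=20 may not be enough to hit it. The sketch below estimates how many repetitions are needed, assuming runs are independent and using a hypothetical per-run failure rate p (the issue does not report a measured rate); probAtLeastOneFailure and runsNeeded are illustrative names.

```go
package main

import (
	"fmt"
	"math"
)

// probAtLeastOneFailure returns the chance of seeing at least one failure in n
// independent runs when each run fails with probability p.
func probAtLeastOneFailure(p float64, n int) float64 {
	return 1 - math.Pow(1-p, float64(n))
}

// runsNeeded returns the smallest n for which the chance of at least one
// failure reaches the given confidence (0 < p < 1, 0 < confidence < 1).
func runsNeeded(p, confidence float64) int {
	return int(math.Ceil(math.Log(1-confidence) / math.Log(1-p)))
}

func main() {
	const p = 0.01 // hypothetical per-run failure rate; the issue does not report one
	fmt.Printf("P(>=1 failure in 20 runs) = %.3f\n", probAtLeastOneFailure(p, 20))
	fmt.Printf("runs for 95%% chance of a failure = %d\n", runsNeeded(p, 0.95))
}
```

Under the assumed 1% rate, 20 runs would surface the failure only about 18% of the time, so a larger -count (or running under CPU load) is likely needed for a reliable reproduction.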
    

Expected behavior

Each compactor reliably completes at least one compaction run within the poll window and the test passes deterministically (or the poll timeout accounts for realistic ring-convergence time, matching the non-partition variant's 120s).

Environment:

  • Infrastructure: GitHub Actions CI, ubuntu-24.04 (amd64), test job
  • Deployment tool: N/A (Go unit test)

Additional Context

Observed on CI (2026-05-30): https://github.com/cortexproject/cortex/actions/runs/26632776611 (job test (amd64)). The failure is intermittent — the test passes on the large majority of runs.

Filed from CI failure-log analysis with AI assistance; the run link and compactor_paritioning_test.go:1217 were reviewed and verified against master before submitting.
