Problem
TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary is flaky, particularly on slow ARM CI runners.
Example failure: https://github.com/cortexproject/cortex/actions/runs/24256703215/job/70829934477
--- FAIL: TestDistributorQuerier_QueryIngestersWithinBoundary (0.00s)
--- FAIL: TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary (0.00s)
distributor_queryable_test.go:638:
Error Trace: distributor_queryable_test.go:638
Error: "[]" should have 1 item(s), but has 0
Test: TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary
Messages: should manipulate when maxT is well after boundary
Root Cause
The test captures time.Now() at setup and uses it to compute query boundaries relative to a 1-hour lookback window. However, distributorQuerier.Select() calls time.Now() again internally to compute the ingester query boundary (pkg/querier/distributor_queryable.go:120).
The failing subtest "maxT well after lookback boundary" sets queryMaxT = testNow - lookback + 10s, that is, only 10 seconds inside the 1-hour lookback window. Inside Select, the boundary is computed as realNow - 1h and minT is raised to it. If realNow is more than 10 seconds past testNow (due to slow test execution on ARM runners), then minT > maxT, the query short-circuits with an empty result, and no distributor call is made.
The 10-second margin in the test case is too tight for slow CI environments.
This test was introduced in #7323.
Possible Solutions
- Inject a clock: Have distributorQuerier accept a now function (defaulting to time.Now) so tests can control time.
- Increase the margin: Change -lookback + 10*time.Second to a larger value like -lookback + 5*time.Minute to tolerate clock drift.
Option 1 is the more robust fix. Option 2 is a quick mitigation.
Affected Files
pkg/querier/distributor_queryable_test.go (test setup at lines 606-612)
pkg/querier/distributor_queryable.go (wall-clock call at line 120)