You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
TestProtobufCodecFuzz (integration/query_fuzz_test.go around the TestProtobufCodecFuzz function) compares two single-binary Cortex instances loaded with the same generated time series: cortex-1 uses the default JSON query codec, while cortex-2 sets -api.querier-default-codec=protobuf. The test then runs identical promqlsmith queries through both clients and expects identical results, since the only intended difference is the response encoding.
A small but recurring number of TestProtobufCodecFuzz failures have been observed in PR CI (≈2 over the last 18 days per the prior integration_query_fuzz flake census). However, no actual failure log was captured in this investigation pass — the sampled failing integration_query_fuzz runs all had TestProtobufCodecFuzz passing while other tests in the same job failed. This issue is therefore filed as a tracking issue to capture the next failure with enough detail to classify the root cause; it does not yet assert a concrete bug.
Most recent occurrence
Unknown — no concrete failing run/job URL for TestProtobufCodecFuzz was captured during this investigation. The prior census found 2 occurrences in the 18-day window but did not preserve their specific run links separately from the other tests' samples.
Failure excerpt
no log captured during this investigation; will update once a failure is observed
Empirical flake rate (rough, from prior census)
Test
PR-CI occurrences (last 18 days)
Master-CI occurrences
TestProtobufCodecFuzz
~2 (unverified link)
0
integration_query_fuzz job (any test)
~15
0
Approximate TestProtobufCodecFuzz PR-CI rate <1%. Count is unconfirmed.
Sample prior failures
No TestProtobufCodecFuzz-confirmed failure links recovered in this pass. Sampled integration_query_fuzz failing runs are linked below for reference, but each was caused by a different test:
HYPOTHESIS — unverified. The test setup pushes identical series into two Cortex single-binary instances and only varies the query response codec; both ends invoke the same upstream Prometheus engine. There is no expected semantic divergence, so any failure must come from one of:
Comparator sensitivity to encoded/decoded result ordering. If JSON and protobuf codecs produce series in different orders, but the test comparator handles ordering correctly (comparer uses cmp.Equal on model.Value, which canonicalises), this would not surface — but float precision could.
Readiness/freshness race between the two independent Cortex instances despite waitUntilReady. Less plausible since both share the same input data — both should be at the same state.
A real protobuf codec edge case (e.g. NaN handling, special float marshalling) for a specific PromQL result shape.
Map-iteration ordering inside the response codec producing a series-list-order difference that the comparer doesn't fully canonicalise.
Without a captured failing query + err1/err2 or res1/res2, none of the above can be confirmed.
Proposed fix
Improve observability first, then fix once classified:
Preserve and emit the promqlsmith seed on failure. The runner already uses rand.New(rand.NewSource(now.Unix())); log that seed in the test failure output so a failing case can be reproduced locally.
Ensure the failing query plus err1/err2 or res1/res2 is visible in CI logs — current t.Logf calls already do this, but verify the output doesn't get truncated by GitHub Actions log limits when 2000 cases run. Possibly: print the failing case to a dedicated artifact file uploaded on failure.
Re-run targeted CI/local loops for TestProtobufCodecFuzz on both amd64 and arm64 until at least one concrete failure is captured.
Only then choose the fix: codec normalization, comparator adjustment, or readiness synchronization.
Why not other approaches
Disable the test pre-emptively. Loses coverage of a real user-facing API codec path with no confirmed defect.
Relax the comparator generically. Without knowing what category of mismatch we're seeing, this risks masking a real protobuf codec bug.
Capture at least one full TestProtobufCodecFuzz CI failure with failing query and either err1/err2 or res1/res2 reproduced in the issue.
Classify the failure as: codec correctness, comparator/order-only, freshness/readiness, float precision, or another concrete cause.
Add a targeted fix or narrow follow-up issue based on the captured evidence.
Confirm no TestProtobufCodecFuzz failures across a representative sample after the fix.
This issue is intentionally light on specifics because the failure mode is unverified at the time of filing. Future occurrences should be linked here; reclassifying or splitting the issue is expected once evidence accumulates.
Summary
TestProtobufCodecFuzz(integration/query_fuzz_test.goaround theTestProtobufCodecFuzzfunction) compares two single-binary Cortex instances loaded with the same generated time series:cortex-1uses the default JSON query codec, whilecortex-2sets-api.querier-default-codec=protobuf. The test then runs identicalpromqlsmithqueries through both clients and expects identical results, since the only intended difference is the response encoding.A small but recurring number of
TestProtobufCodecFuzzfailures have been observed in PR CI (≈2 over the last 18 days per the priorintegration_query_fuzzflake census). However, no actual failure log was captured in this investigation pass — the sampled failingintegration_query_fuzzruns all hadTestProtobufCodecFuzzpassing while other tests in the same job failed. This issue is therefore filed as a tracking issue to capture the next failure with enough detail to classify the root cause; it does not yet assert a concrete bug.Most recent occurrence
Unknown — no concrete failing run/job URL for
TestProtobufCodecFuzzwas captured during this investigation. The prior census found 2 occurrences in the 18-day window but did not preserve their specific run links separately from the other tests' samples.Failure excerpt
Empirical flake rate (rough, from prior census)
TestProtobufCodecFuzzintegration_query_fuzzjob (any test)Approximate
TestProtobufCodecFuzzPR-CI rate <1%. Count is unconfirmed.Sample prior failures
No
TestProtobufCodecFuzz-confirmed failure links recovered in this pass. Sampledintegration_query_fuzzfailing runs are linked below for reference, but each was caused by a different test:TestParquetFuzz)TestExpandedPostingsCacheFuzz)TestVerticalShardingFuzz)TestPrometheusCompatibilityQueryFuzz)Root cause
HYPOTHESIS — unverified. The test setup pushes identical series into two Cortex single-binary instances and only varies the query response codec; both ends invoke the same upstream Prometheus engine. There is no expected semantic divergence, so any failure must come from one of:
comparerusescmp.Equalonmodel.Value, which canonicalises), this would not surface — but float precision could.waitUntilReady. Less plausible since both share the same input data — both should be at the same state.Without a captured failing query +
err1/err2orres1/res2, none of the above can be confirmed.Proposed fix
Improve observability first, then fix once classified:
promqlsmithseed on failure. The runner already usesrand.New(rand.NewSource(now.Unix())); log that seed in the test failure output so a failing case can be reproduced locally.err1/err2orres1/res2is visible in CI logs — currentt.Logfcalls already do this, but verify the output doesn't get truncated by GitHub Actions log limits when 2000 cases run. Possibly: print the failing case to a dedicated artifact file uploaded on failure.TestProtobufCodecFuzzon both amd64 and arm64 until at least one concrete failure is captured.Why not other approaches
TestParquetFuzz(Flaky test: TestParquetFuzz — topk/bottomk tie non-determinism not fully filtered #7543) and apply the sameWithEnabledAggrsfix.TestProtobufCodecFuzzalready passesWithEnabledAggrs(enabledAggrs)and compares Cortex-to-Cortex (not Cortex-to-Prometheus), so the topk-tie hole doesn't apply.Acceptance criteria
TestProtobufCodecFuzzCI failure with failing query and eithererr1/err2orres1/res2reproduced in the issue.TestProtobufCodecFuzzfailures across a representative sample after the fix.This issue is intentionally light on specifics because the failure mode is unverified at the time of filing. Future occurrences should be linked here; reclassifying or splitting the issue is expected once evidence accumulates.