Skip to content

Flaky test: TestProtobufCodecFuzz — unconfirmed root cause (tracking issue) #7548

@sandy2008

Description

@sandy2008

Summary

TestProtobufCodecFuzz (integration/query_fuzz_test.go around the TestProtobufCodecFuzz function) compares two single-binary Cortex instances loaded with the same generated time series: cortex-1 uses the default JSON query codec, while cortex-2 sets -api.querier-default-codec=protobuf. The test then runs identical promqlsmith queries through both clients and expects identical results, since the only intended difference is the response encoding.

A small but recurring number of TestProtobufCodecFuzz failures have been observed in PR CI (≈2 over the last 18 days per the prior integration_query_fuzz flake census). However, no actual failure log was captured in this investigation pass — the sampled failing integration_query_fuzz runs all had TestProtobufCodecFuzz passing while other tests in the same job failed. This issue is therefore filed as a tracking issue to capture the next failure with enough detail to classify the root cause; it does not yet assert a concrete bug.

Most recent occurrence

Unknown — no concrete failing run/job URL for TestProtobufCodecFuzz was captured during this investigation. The prior census found 2 occurrences in the 18-day window but did not preserve their specific run links separately from the other tests' samples.

Failure excerpt

no log captured during this investigation; will update once a failure is observed

Empirical flake rate (rough, from prior census)

Test PR-CI occurrences (last 18 days) Master-CI occurrences
TestProtobufCodecFuzz ~2 (unverified link) 0
integration_query_fuzz job (any test) ~15 0

Approximate TestProtobufCodecFuzz PR-CI rate <1%. Count is unconfirmed.

Sample prior failures

No TestProtobufCodecFuzz-confirmed failure links recovered in this pass. Sampled integration_query_fuzz failing runs are linked below for reference, but each was caused by a different test:

Root cause

HYPOTHESIS — unverified. The test setup pushes identical series into two Cortex single-binary instances and only varies the query response codec; both ends invoke the same upstream Prometheus engine. There is no expected semantic divergence, so any failure must come from one of:

  1. Comparator sensitivity to encoded/decoded result ordering. If JSON and protobuf codecs produce series in different orders, but the test comparator handles ordering correctly (comparer uses cmp.Equal on model.Value, which canonicalises), this would not surface — but float precision could.
  2. Readiness/freshness race between the two independent Cortex instances despite waitUntilReady. Less plausible since both share the same input data — both should be at the same state.
  3. A real protobuf codec edge case (e.g. NaN handling, special float marshalling) for a specific PromQL result shape.
  4. Map-iteration ordering inside the response codec producing a series-list-order difference that the comparer doesn't fully canonicalise.

Without a captured failing query + err1/err2 or res1/res2, none of the above can be confirmed.

Proposed fix

Improve observability first, then fix once classified:

  1. Preserve and emit the promqlsmith seed on failure. The runner already uses rand.New(rand.NewSource(now.Unix())); log that seed in the test failure output so a failing case can be reproduced locally.
  2. Ensure the failing query plus err1/err2 or res1/res2 is visible in CI logs — current t.Logf calls already do this, but verify the output doesn't get truncated by GitHub Actions log limits when 2000 cases run. Possibly: print the failing case to a dedicated artifact file uploaded on failure.
  3. Re-run targeted CI/local loops for TestProtobufCodecFuzz on both amd64 and arm64 until at least one concrete failure is captured.
  4. Only then choose the fix: codec normalization, comparator adjustment, or readiness synchronization.

Why not other approaches

  • Disable the test pre-emptively. Loses coverage of a real user-facing API codec path with no confirmed defect.
  • Relax the comparator generically. Without knowing what category of mismatch we're seeing, this risks masking a real protobuf codec bug.
  • Assume the same root cause as TestParquetFuzz (Flaky test: TestParquetFuzz — topk/bottomk tie non-determinism not fully filtered #7543) and apply the same WithEnabledAggrs fix. TestProtobufCodecFuzz already passes WithEnabledAggrs(enabledAggrs) and compares Cortex-to-Cortex (not Cortex-to-Prometheus), so the topk-tie hole doesn't apply.

Acceptance criteria

  • Capture at least one full TestProtobufCodecFuzz CI failure with failing query and either err1/err2 or res1/res2 reproduced in the issue.
  • Classify the failure as: codec correctness, comparator/order-only, freshness/readiness, float precision, or another concrete cause.
  • Add a targeted fix or narrow follow-up issue based on the captured evidence.
  • Confirm no TestProtobufCodecFuzz failures across a representative sample after the fix.

This issue is intentionally light on specifics because the failure mode is unverified at the time of filing. Future occurrences should be linked here; reclassifying or splitting the issue is expected once evidence accumulates.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions