Flaky test: TestProtobufCodecFuzz — unconfirmed root cause (tracking issue)

## Summary

`TestProtobufCodecFuzz` (`integration/query_fuzz_test.go` around the `TestProtobufCodecFuzz` function) compares two single-binary Cortex instances loaded with the same generated time series: `cortex-1` uses the default JSON query codec, while `cortex-2` sets `-api.querier-default-codec=protobuf`. The test then runs identical `promqlsmith` queries through both clients and expects identical results, since the only intended difference is the response encoding.

A small but recurring number of `TestProtobufCodecFuzz` failures have been observed in PR CI (≈2 over the last 18 days per the prior `integration_query_fuzz` flake census). However, **no actual failure log was captured** in this investigation pass — the sampled failing `integration_query_fuzz` runs all had `TestProtobufCodecFuzz` passing while other tests in the same job failed. This issue is therefore filed as a **tracking issue** to capture the next failure with enough detail to classify the root cause; it does not yet assert a concrete bug.

## Most recent occurrence

Unknown — no concrete failing run/job URL for `TestProtobufCodecFuzz` was captured during this investigation. The prior census found 2 occurrences in the 18-day window but did not preserve their specific run links separately from the other tests' samples.

## Failure excerpt

```text
no log captured during this investigation; will update once a failure is observed
```

## Empirical flake rate (rough, from prior census)

| Test | PR-CI occurrences (last 18 days) | Master-CI occurrences |
|------|-----|----|
| `TestProtobufCodecFuzz` | ~2 (unverified link) | 0 |
| `integration_query_fuzz` job (any test) | ~15 | 0 |

Approximate `TestProtobufCodecFuzz` PR-CI rate <1%. Count is unconfirmed.

## Sample prior failures

No `TestProtobufCodecFuzz`-confirmed failure links recovered in this pass. Sampled `integration_query_fuzz` failing runs are linked below for reference, but each was caused by a different test:

- https://github.com/cortexproject/cortex/actions/runs/26265920573/job/77309556574 (`TestParquetFuzz`)
- https://github.com/cortexproject/cortex/actions/runs/26188714336/job/77052093282 (`TestExpandedPostingsCacheFuzz`)
- https://github.com/cortexproject/cortex/actions/runs/26184234190/job/77036311567 (`TestVerticalShardingFuzz`)
- https://github.com/cortexproject/cortex/actions/runs/26005855514/job/76437467814 (`TestPrometheusCompatibilityQueryFuzz`)

## Root cause

**HYPOTHESIS — unverified.** The test setup pushes identical series into two Cortex single-binary instances and only varies the query response codec; both ends invoke the same upstream Prometheus engine. There is no expected semantic divergence, so any failure must come from one of:

1. **Comparator sensitivity to encoded/decoded result ordering.** If JSON and protobuf codecs produce series in different orders, but the test comparator handles ordering correctly (`comparer` uses `cmp.Equal` on `model.Value`, which canonicalises), this would not surface — but float precision could.
2. **Readiness/freshness race between the two independent Cortex instances** despite `waitUntilReady`. Less plausible since both share the same input data — both should be at the same state.
3. **A real protobuf codec edge case** (e.g. NaN handling, special float marshalling) for a specific PromQL result shape.
4. **Map-iteration ordering inside the response codec** producing a series-list-order difference that the comparer doesn't fully canonicalise.

Without a captured failing query + `err1`/`err2` or `res1`/`res2`, none of the above can be confirmed.

## Proposed fix

Improve observability first, then fix once classified:

1. **Preserve and emit the `promqlsmith` seed on failure.** The runner already uses `rand.New(rand.NewSource(now.Unix()))`; log that seed in the test failure output so a failing case can be reproduced locally.
2. **Ensure the failing query plus `err1`/`err2` or `res1`/`res2` is visible in CI logs** — current `t.Logf` calls already do this, but verify the output doesn't get truncated by GitHub Actions log limits when 2000 cases run. Possibly: print the failing case to a dedicated artifact file uploaded on failure.
3. **Re-run targeted CI/local loops** for `TestProtobufCodecFuzz` on both amd64 and arm64 until at least one concrete failure is captured.
4. **Only then** choose the fix: codec normalization, comparator adjustment, or readiness synchronization.

## Why not other approaches

- **Disable the test pre-emptively.** Loses coverage of a real user-facing API codec path with no confirmed defect.
- **Relax the comparator generically.** Without knowing what category of mismatch we're seeing, this risks masking a real protobuf codec bug.
- **Assume the same root cause as `TestParquetFuzz` (#7543) and apply the same `WithEnabledAggrs` fix.** `TestProtobufCodecFuzz` already passes `WithEnabledAggrs(enabledAggrs)` and compares Cortex-to-Cortex (not Cortex-to-Prometheus), so the topk-tie hole doesn't apply.

## Acceptance criteria

- [ ] Capture at least one full `TestProtobufCodecFuzz` CI failure with failing query and either `err1`/`err2` or `res1`/`res2` reproduced in the issue.
- [ ] Classify the failure as: codec correctness, comparator/order-only, freshness/readiness, float precision, or another concrete cause.
- [ ] Add a targeted fix or narrow follow-up issue based on the captured evidence.
- [ ] Confirm no `TestProtobufCodecFuzz` failures across a representative sample after the fix.

---

*This issue is intentionally light on specifics because the failure mode is unverified at the time of filing. Future occurrences should be linked here; reclassifying or splitting the issue is expected once evidence accumulates.*


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flaky test: TestProtobufCodecFuzz — unconfirmed root cause (tracking issue) #7548

Summary

Most recent occurrence

Failure excerpt

Empirical flake rate (rough, from prior census)

Sample prior failures

Root cause

Proposed fix

Why not other approaches

Acceptance criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Test	PR-CI occurrences (last 18 days)	Master-CI occurrences
`TestProtobufCodecFuzz`	~2 (unverified link)	0
`integration_query_fuzz` job (any test)	~15	0

Flaky test: TestProtobufCodecFuzz — unconfirmed root cause (tracking issue) #7548

Description

Summary

Most recent occurrence

Failure excerpt

Empirical flake rate (rough, from prior census)

Sample prior failures

Root cause

Proposed fix

Why not other approaches

Acceptance criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions