Skip to content

[pull] main from grafana:main#190

Merged
pull[bot] merged 2 commits intoMu-L:mainfrom
grafana:main
Jul 29, 2025
Merged

[pull] main from grafana:main#190
pull[bot] merged 2 commits intoMu-L:mainfrom
grafana:main

Conversation

@pull
Copy link
Copy Markdown

@pull pull bot commented Jul 29, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.3)

Can you help keep this open source service alive? 💖 Please sponsor : )

charleskorn and others added 2 commits July 29, 2025 19:28
…aluator` and `Materializer` (#12122)

#### What this PR does

This PR refactors the responsibilities of some of the types in MQE, and
introduces some new types. The goal is to make implementing remote
execution in queriers of query plans created by query-frontends easier.

Main changes:
* The `Materializer` type has been introduced: it owns the
responsibility of converting a query plan into operators (which was
previously the split betwen `Engine` and `Query`)
* The `Evaluator` type has been introduced: it owns the responsibility
of evaluating an operator and passing the data to the evaluation
observer. In the future, it'll support evaluating multiple related
operators from the same query plan.
* The `Query` type now exists primarily to provide an implementation of
the `promql.Query` interface, bridging the interface that Prometheus and
its querying API expect with how MQE operates

There should be no user-visible behaviour changes.

Once we have remote execution of query plans, I expect the interactions
between types to look something like this:
* query-frontends create a query plan by calling
`Engine.NewInstantQuery` or `Engine.NewRangeQuery`, and pass some or all
of the query plan to queriers
* queriers materialize the appropriate part of the query plan with
`Materializer`, pass this to an `Evaluator`, and stream results back to
query-frontends as they are produced by the `Evaluator`

#### Which issue(s) this PR fixes or relates to

(none)

#### Checklist

- [x] Tests updated.
- [n/a] Documentation added.
- [n/a] `CHANGELOG.md` updated - the order of entries should be
`[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry
is not needed, please add the `changelog-not-needed` label to the PR.
- [n/a]
[`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md)
updated with experimental features.
#12048)

#### What this PR does

We've recently started seeing intermittent panics from our
store-gateways in our production environment, and I believe this may be
the cause. In total transparency I used Gemini to find this as I'm not
very familiar with this part of the codebase, but this certainly seems
like a valid bug, if potentially not also the root cause. I've spent a
bit of time trying to come up with a test case that can reproduce it,
but that's been a challenge and I wanted to get eyes on this.

First thousand lines of the crash, the entire logs are >100k lines since
it's every goroutine:
[Logs-2025-07-08
18_10_49.txt](https://github.com/user-attachments/files/21168977/Logs-2025-07-08.18_10_49.txt)

AI Slop Explanation:
The ensureIndexHeaderLoaded call was using the original context instead
of the errgroup context (gctx), which could cause goroutines to get
stuck in select operations when other goroutines in the same errgroup
failed or were canceled. This led to goroutine leaks as seen in the
stack trace where goroutines were stuck waiting on gate.Start() calls.

The fix ensures that ensureIndexHeaderLoaded respects the errgroup's
cancellation, allowing proper cleanup when any goroutine in the group
fails.

#### Which issue(s) this PR fixes or relates to

#### Checklist

- [ ] Tests updated.
- [ ] Documentation added. (N/A)
- [x] `CHANGELOG.md` updated - the order of entries should be
`[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry
is not needed, please add the `changelog-not-needed` label to the PR.
- [ ]
[`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md)
updated with experimental features.

Co-authored-by: Vladimir Varankin <vladimir.varankin@grafana.com>
@pull pull bot locked and limited conversation to collaborators Jul 29, 2025
@pull pull bot added the ⤵️ pull label Jul 29, 2025
@pull pull bot merged commit 7b9c5f2 into Mu-L:main Jul 29, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants