Avoid sending all term filters to the tag filter #53
Merged
Pull request overview
This PR reduces metastore load caused by very large tag-filter ASTs (notably from large TermSet queries) by restricting extracted tag filters to the tag fields actually present in the targeted indexes, instead of extracting all term filters indiscriminately.
Changes:
- Extend `extract_tags_from_query` to accept an optional set of tag field names and prune irrelevant tag filters early in the tag-filter AST.
- In root search planning, compute tag fields from the resolved index metadata and extract only relevant tag filters before listing splits.
- Update janitor and search tests to use the new `extract_tags_from_query` signature and cover cases with/without tag fields.
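The pruning described in the first bullet can be sketched as follows. This is a minimal illustration, not the code from this PR: the `TagFilter` enum and the `prune` function are hypothetical stand-ins for the real tag-filter AST and for the new optional argument of `extract_tags_from_query`. The sketch assumes that `None` for `tag_fields` means "keep everything" (the behavior the janitor preserves), and that a dropped branch must be treated as "no constraint" so that pruning never excludes a split that could match.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified tag-filter AST (not Quickwit's actual type).
#[derive(Debug, Clone, PartialEq)]
enum TagFilter {
    /// The split must carry the tag `field:value`.
    Tag { field: String, value: String },
    And(Vec<TagFilter>),
    Or(Vec<TagFilter>),
}

/// Drops filters on fields that are not tag fields.
/// `tag_fields == None` keeps every filter (assumed "extract all" mode).
/// Returns `None` when the remaining filter cannot constrain any split.
fn prune(filter: TagFilter, tag_fields: Option<&BTreeSet<String>>) -> Option<TagFilter> {
    match filter {
        TagFilter::Tag { field, value } => {
            let keep = tag_fields.map_or(true, |fields| fields.contains(&field));
            keep.then(|| TagFilter::Tag { field, value })
        }
        TagFilter::And(children) => {
            // An unconstrained child of an AND can simply be dropped.
            let mut kept: Vec<TagFilter> = children
                .into_iter()
                .filter_map(|child| prune(child, tag_fields))
                .collect();
            match kept.len() {
                0 => None,
                1 => kept.pop(),
                _ => Some(TagFilter::And(kept)),
            }
        }
        TagFilter::Or(children) => {
            // One unconstrained child makes the whole OR unconstrained.
            let mut kept = Vec::with_capacity(children.len());
            for child in children {
                kept.push(prune(child, tag_fields)?);
            }
            match kept.len() {
                0 => None, // conservative choice: send no filter at all
                1 => kept.pop(),
                _ => Some(TagFilter::Or(kept)),
            }
        }
    }
}
```

Under these assumptions, a large OR over a non-tag field (the shape a TermSet query would produce in this sketch) collapses to `None` as soon as its first child is inspected, so it never reaches the metastore.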
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| quickwit/quickwit-search/src/tests.rs | Updates split-pruning tests to validate behavior when tag fields exist vs. when tag extraction is disabled via an empty tag-field set. |
| quickwit/quickwit-search/src/root.rs | Restricts tag-filter extraction to tag fields present in the targeted indexes before querying the metastore for relevant splits. |
| quickwit/quickwit-janitor/src/actors/delete_task_planner.rs | Adapts to the new extract_tags_from_query API while preserving prior behavior (extract all). |
| quickwit/quickwit-doc-mapper/src/tag_pruning.rs | Adds tag-field-aware pruning of the unsimplified tag-filter AST and expands unit test coverage for filtered extraction. |
09c9c68 to 44fc93b
44fc93b to 6b55609
Darkheir approved these changes on Jun 4, 2026
Description
We have observed serious metastore performance degradation (even OOMs) caused by large TermSet queries (10k+ terms) that send huge tag filter ASTs to Postgres. This PR changes the behavior to send term filters only when the target index actually has tag fields, and in that case to send only the relevant filters.
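To make the size argument concrete, here is a rough sketch of the planning step, assuming a flat list of `(field, value)` terms instead of a real query AST. `IndexInfo`, `targeted_tag_fields`, and `relevant_terms` are hypothetical names introduced for illustration and do not come from the Quickwit codebase.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the metadata of one targeted index.
struct IndexInfo {
    tag_fields: BTreeSet<String>,
}

/// Union of the tag fields declared by all targeted indexes.
fn targeted_tag_fields(indexes: &[IndexInfo]) -> BTreeSet<String> {
    indexes
        .iter()
        .flat_map(|index| index.tag_fields.iter().cloned())
        .collect()
}

/// Keeps only the terms whose field is a tag field of some targeted index.
/// With an empty `tag_fields` set, nothing is kept and no filter is sent.
fn relevant_terms(
    terms: Vec<(String, String)>,
    tag_fields: &BTreeSet<String>,
) -> Vec<(String, String)> {
    terms
        .into_iter()
        .filter(|(field, _)| tag_fields.contains(field))
        .collect()
}
```

In this sketch, a 10k-term set on a field such as `user_id` contributes nothing to the filter when the only tag field is `tenant`, which is the case the description identifies as the source of the oversized ASTs.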
How was this PR tested?
Added unit tests but mostly trust the CI.