[fix](search) inject MATCH_ALL_DOCS for multi-MUST_NOT queries in lucene mode #60891
Open
airborne12 wants to merge 1 commit into apache:master from
…ene mode
When all terms in a boolean query are MUST_NOT (e.g., "NOT a AND NOT b"), Lucene's BooleanQuery.rewrite() produces a pure-negation query that matches nothing. ES handles this by injecting a MatchAllDocsQuery with SHOULD occur.
This fix detects the all-MUST_NOT case after applyLuceneBooleanLogic() and injects MATCH_ALL_DOCS(SHOULD) with minimum_should_match=1, matching ES query_string semantics for pure negation queries.
Previously only single-term MUST_NOT was handled (the existing single-term rewrite). Multi-term all-MUST_NOT queries like "NOT a AND NOT b" or "NOT a NOT b" (with op=and) were not covered.
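The detection step described in the commit message can be sketched as follows. This is a minimal illustration in Python, not the Doris implementation: the `Occur`, `Clause`, and `BoolQuery` types, the `MATCH_ALL_DOCS` sentinel value, and the function name `inject_match_all_if_pure_negation` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Occur(Enum):
    MUST = "MUST"
    SHOULD = "SHOULD"
    MUST_NOT = "MUST_NOT"


# Hypothetical sentinel standing in for a match-all-documents query.
MATCH_ALL_DOCS = "*:*"


@dataclass
class Clause:
    occur: Occur
    query: str  # a term, or the MATCH_ALL_DOCS sentinel


@dataclass
class BoolQuery:
    clauses: List[Clause] = field(default_factory=list)
    minimum_should_match: int = 0


def inject_match_all_if_pure_negation(q: BoolQuery) -> BoolQuery:
    """If every clause is MUST_NOT, add MATCH_ALL_DOCS as a SHOULD clause
    and require at least one SHOULD clause to match."""
    if q.clauses and all(c.occur is Occur.MUST_NOT for c in q.clauses):
        return BoolQuery(
            clauses=[Clause(Occur.SHOULD, MATCH_ALL_DOCS)] + q.clauses,
            minimum_should_match=1,
        )
    # Queries with at least one positive clause are left untouched.
    return q


# Usage: "NOT a AND NOT b" becomes SHOULD(match-all) + MUST_NOT(a) + MUST_NOT(b).
pure_negation = BoolQuery([Clause(Occur.MUST_NOT, "a"), Clause(Occur.MUST_NOT, "b")])
rewritten = inject_match_all_if_pure_negation(pure_negation)
```

The sketch assumes the check runs once on the already-normalized clause list, which is how the commit message describes it (after applyLuceneBooleanLogic()); queries that mix MUST or SHOULD with MUST_NOT pass through unchanged.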
63212e0 to e61006d
run buildall
TPC-H: Total hot run time: 28727 ms
TPC-DS: Total hot run time: 184085 ms
What problem does this PR solve?
Related PR: #60814
Problem Summary:
In search() lucene mode, when all terms in a boolean query are MUST_NOT
(e.g., "NOT a AND NOT b", or "NOT a NOT b" with default_operator=AND),
the query incorrectly returns no documents instead of returning all
documents EXCEPT those matching the negated terms.
Root cause: Lucene's BooleanQuery with only MUST_NOT clauses matches
nothing (by design). ES handles this by injecting a MatchAllDocsQuery
with SHOULD occur. Doris only handled the single-term MUST_NOT case
but not multi-term all-MUST_NOT queries.
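The root cause can be illustrated with a toy model of boolean clause evaluation. The sketch below is a simplified Python approximation of the semantics described above, not Lucene or Doris code: the `DOCS` corpus, the `MATCH_ALL` marker, and the `evaluate` function are assumptions made for the example, and FILTER clauses and scoring are ignored.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical corpus: document id -> set of terms it contains.
DOCS: Dict[int, Set[str]] = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a", "c"}}

# Hypothetical marker for a match-all-documents clause.
MATCH_ALL = "*"


def term_matches(term: str, doc_terms: Set[str]) -> bool:
    return term == MATCH_ALL or term in doc_terms


def evaluate(clauses: List[Tuple[str, str]], minimum_should_match: int = 0) -> Set[int]:
    """Evaluate (occur, term) clauses against DOCS with simplified boolean semantics."""
    must = [t for occur, t in clauses if occur == "MUST"]
    should = [t for occur, t in clauses if occur == "SHOULD"]
    must_not = [t for occur, t in clauses if occur == "MUST_NOT"]

    # MUST_NOT only excludes; with no positive clause there is nothing to exclude from.
    if not must and not should:
        return set()

    # Without MUST clauses, at least one SHOULD clause has to match.
    required = minimum_should_match if must else max(minimum_should_match, 1)

    hits: Set[int] = set()
    for doc_id, terms in DOCS.items():
        if any(term_matches(t, terms) for t in must_not):
            continue
        if not all(term_matches(t, terms) for t in must):
            continue
        if sum(term_matches(t, terms) for t in should) < required:
            continue
        hits.add(doc_id)
    return hits


# "NOT a AND NOT b" as written: only MUST_NOT clauses.
before = evaluate([("MUST_NOT", "a"), ("MUST_NOT", "b")])

# Same query with a match-all SHOULD clause and minimum_should_match=1.
after = evaluate([("SHOULD", MATCH_ALL), ("MUST_NOT", "a"), ("MUST_NOT", "b")], 1)
```

In this model `before` is empty, while `after` contains only document 3, the one document that has neither "a" nor "b".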
Fix: After applyLuceneBooleanLogic(), detect if ALL terms are MUST_NOT
and inject MATCH_ALL_DOCS(SHOULD) with minimum_should_match=1.
Release note
Fix search() lucene mode returning incorrect results for multi-MUST_NOT queries like "NOT a AND NOT b".
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)