branch-4.0: [refactor](search) Refactor SearchDslParser to single-phase ANTLR parsing and fix ES compatibility issues #60654 #61013
Closed
airborne12 wants to merge 2 commits into apache:branch-4.0
…pache#59747)

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: apache#59394

Problem Summary:

The search DSL should only recognize uppercase `AND`, `OR`, `NOT` as boolean operators in search lucene boolean mode. Previously, lowercase `and`, `or`, `not` were also treated as operators, which does not conform to the specification.

This PR makes the boolean operators case-sensitive:

- Only uppercase `AND`, `OR`, `NOT` are recognized as operators
- Lowercase `and`, `or`, `not` are now treated as regular search terms
- Using lowercase operators in DSL will result in a parse error

### Release note

Make search DSL boolean operators (AND/OR/NOT) case-sensitive in lucene boolean mode.
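The case-sensitivity rule described in this commit can be illustrated with a minimal sketch. This is not the ANTLR lexer from the PR: `BOOLEAN_OPERATORS` and `classify_tokens` are hypothetical names, and the sketch only models how a token is classified, not the parse-error behavior or the rest of the grammar.

```python
from typing import List, Tuple

# Hypothetical illustration only; the real rules live in the ANTLR grammar.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def classify_tokens(dsl: str) -> List[Tuple[str, str]]:
    """Label each whitespace-separated token as OPERATOR or TERM."""
    tokens = []
    for word in dsl.split():
        # Exact, case-sensitive membership test: "and" is not "AND".
        kind = "OPERATOR" if word in BOOLEAN_OPERATORS else "TERM"
        tokens.append((kind, word))
    return tokens
```

Under this sketch, `classify_tokens("apple AND pie")` labels `AND` as an operator, while `classify_tokens("apple and pie")` labels `and` as an ordinary term.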
…sing and fix ES compatibility issues (apache#60654)

Problem Summary:

The `search()` function's DSL parser had multiple ES compatibility issues and used a two-phase parsing approach (manual pre-parse + ANTLR) that was error-prone. This PR refactors the parser and fixes several bugs:

1. **SearchDslParser refactoring**: Consolidated from two-phase (manual pre-parse + ANTLR) to single-phase ANTLR parsing. The ANTLR grammar now handles all DSL syntax directly, eliminating the fragile manual pre-parse layer. This fixes issues with operator precedence, grouping, and edge cases.
2. **ANTLR grammar improvements**: Updated `SearchLexer.g4` and `SearchParser.g4` to properly handle quoted phrases, field-qualified expressions, prefix/wildcard/regexp patterns, range queries, and boolean operators with correct precedence.
3. **minimum_should_match pipeline**: Added `default_operator` and `minimum_should_match` fields to `TSearchParam` thrift, passing them from FE `SearchPredicate` through to BE `function_search`. When `minimum_should_match > 0`, uses `OccurBooleanQuery` for proper Lucene-style boolean query semantics.
4. **Wildcard/Prefix/Regexp case-sensitivity**: Wildcard and PREFIX patterns are now lowercased when the index has `parser + lower_case=true` (matching ES query_string normalizer behavior). REGEXP patterns are NOT lowercased (matching ES regex behavior where patterns bypass analysis).
5. **MATCH_ALL_DOCS support**: Added `MATCH_ALL_DOCS` clause type for standalone `*` queries and pure NOT query rewrites. Enhanced `AllQuery` with deferred `max_doc` from `context.segment_num_rows` and nullable field support via `NullableScorer`.
6. **BE fixes**:
   - `regexp_weight._max_expansions`: Changed from 50 to 0 (unlimited) to prevent PREFIX queries from missing documents
   - `occur_boolean_weight`: Fixed swap→append bug when all SHOULD clauses must match, preserving existing MUST scorers
   - Variant subcolumn `index_properties` propagation for proper analyzer selection
   - `lower_case` default handling: inverted index `lower_case` defaults to `"true"` when a parser is configured
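The pattern normalization in item 4, combined with the `lower_case` default from the BE fixes, can be sketched roughly as follows. This is an illustrative model, not the BE implementation: `effective_lower_case` and `normalize_pattern` are hypothetical names, a missing parser is represented as `None`, and the result for an index with no parser and no explicit `lower_case` is an assumption the PR text does not specify.

```python
from typing import Optional


def effective_lower_case(parser: Optional[str], lower_case: Optional[str]) -> bool:
    """Resolve the lower_case index property (hypothetical helper)."""
    if lower_case is not None:
        # An explicit property value always wins.
        return lower_case.lower() == "true"
    # Unset: defaults to "true" when a parser is configured.
    # Returning False with no parser is an assumption of this sketch.
    return parser is not None


def normalize_pattern(kind: str, pattern: str,
                      parser: Optional[str], lower_case: Optional[str]) -> str:
    """Lowercase WILDCARD/PREFIX patterns only; REGEXP bypasses analysis."""
    if (kind in ("WILDCARD", "PREFIX")
            and parser is not None
            and effective_lower_case(parser, lower_case)):
        return pattern.lower()
    # REGEXP patterns, and any pattern on an index without
    # parser + lower_case=true, are passed through unchanged.
    return pattern
```

For example, `normalize_pattern("WILDCARD", "Foo*", "english", None)` yields `"foo*"` under these assumptions, while `normalize_pattern("REGEXP", "Fo+", "english", "true")` leaves the pattern as `"Fo+"`.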
This was referenced Mar 3, 2026
Superseded by squashed backport PR #61028.
Summary
- `SearchPredicate` constructor adapted to branch-4.0's nullable handling (`setNullableFromNereids()` pattern instead of a constructor parameter)

Merge Order
This is PR 2/12 in the search() function pick chain. Depends on #61012 (#59747).