[branch-4.0][fix](search) reject Lucene-syntax search on columns without inverted index#63857
Open
airborne12 wants to merge 3 commits into
… index
Issue Number: close #N/A (Jira CIR-20006)
Problem Summary:
SEARCH (Lucene syntax) predicates against columns that have no inverted
index silently fall back to an empty bitmap on BE (vsearch.cpp and
function_search.cpp only log a WARNING then return Status::OK() with an
empty result), making the query look like "no rows matched". That is
indistinguishable from a successful query that simply found nothing and
misleads users.
Validate at planning time in RewriteSearchToSlots, matching the existing
"column does not exist" behavior:
- Normal columns: require OlapTable.getInvertedIndex(column, null) != null.
- Variant subcolumns (parent.path): require any INVERTED index whose
first column equals the parent variant column; the concrete subcolumn
binding is still resolved per-segment in BE, consistent with the
is_variant_sub branch in function_search.cpp.
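The two rules above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code in `RewriteSearchToSlots`: `IndexMeta`, `PlanningException`, and `hasInvertedIndexOn` are hypothetical stand-ins, the error text is invented for the example, and normal columns are modelled with the same "first column of an INVERTED index" test instead of the real `OlapTable.getInvertedIndex` lookup.

```java
import java.util.List;

/** Planning-time check sketch; every type here is a hypothetical stand-in, not a Doris class. */
final class SearchIndexCheckSketch {

    enum IndexType { INVERTED, BITMAP }

    /** Minimal index description: its type plus the ordered columns it covers. */
    static final class IndexMeta {
        final IndexType type;
        final List<String> columns;

        IndexMeta(IndexType type, List<String> columns) {
            this.type = type;
            this.columns = columns;
        }
    }

    /** Stand-in for the AnalysisException raised during planning. */
    static final class PlanningException extends RuntimeException {
        PlanningException(String message) {
            super(message);
        }
    }

    /** True if some INVERTED index lists the given column first. */
    static boolean hasInvertedIndexOn(List<IndexMeta> indexes, String column) {
        for (IndexMeta index : indexes) {
            if (index.type == IndexType.INVERTED
                    && !index.columns.isEmpty()
                    && index.columns.get(0).equals(column)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Rejects a SEARCH field whose column has no inverted index.
     * For a variant subcolumn ("parent.path") only the parent is checked here;
     * binding the concrete subcolumn is left to per-segment resolution.
     */
    static void checkInvertedIndexExists(List<IndexMeta> indexes, String column, boolean variantParent) {
        if (!hasInvertedIndexOn(indexes, column)) {
            String kind = variantParent ? "Variant column" : "Column";
            // Illustrative wording only; the real message text is not quoted in the source.
            throw new PlanningException(kind + " '" + column + "' has no inverted index; "
                    + "add one with ALTER TABLE ... ADD INDEX ... USING INVERTED");
        }
    }

    public static void main(String[] args) {
        List<IndexMeta> indexes = List.of(new IndexMeta(IndexType.INVERTED, List.of("name")));
        checkInvertedIndexExists(indexes, "name", false); // indexed column: no exception
        try {
            checkInvertedIndexExists(indexes, "title", false);
        } catch (PlanningException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at planning time means the user sees an error before any segment is scanned, instead of an empty result that looks like a legitimate "no match".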
Also harden OlapTable.getInvertedIndex against NPE when the table has
no TableIndexes set (returns null instead of dereferencing).
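The null guard described above might look roughly like this. It is only a sketch: `Table` and `invertedIndexColumns` are hypothetical simplifications, and the real `OlapTable.getInvertedIndex` takes different arguments and returns an index object, not a column name.

```java
import java.util.List;

/** Null-safe lookup sketch; Table and its fields are hypothetical stand-ins. */
final class NullSafeIndexLookupSketch {

    static final class Table {
        // Columns that carry an inverted index; null models a table with no index set at all.
        private final List<String> invertedIndexColumns;

        Table(List<String> invertedIndexColumns) {
            this.invertedIndexColumns = invertedIndexColumns;
        }

        /** Returns the indexed column name, or null when there is no match or no index set. */
        String getInvertedIndex(String column) {
            if (invertedIndexColumns == null) {
                return null; // guard: nothing to dereference
            }
            return invertedIndexColumns.contains(column) ? column : null;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Table(null).getInvertedIndex("name"));
        System.out.println(new Table(List.of("name")).getInvertedIndex("name"));
    }
}
```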
SEARCH() with Lucene syntax now throws AnalysisException at planning
time when the referenced column has no inverted index, with guidance to
add one via ALTER TABLE ... ADD INDEX ... USING INVERTED. Previously
such queries silently returned zero rows.
- Test:
- Unit Test: RewriteSearchToSlotsTest updated and extended
(testRewriteSearchThrowsWhenColumnHasNoInvertedIndex,
testRewriteSearchSucceedsWhenColumnHasInvertedIndex,
testRewriteSearchHandlesCaseInsensitiveField switched to a table
with an inverted index on name).
- Behavior changed: Yes - previously silent FALSE now becomes a clear
AnalysisException at planning time.
- Does this need documentation: No
(cherry picked from commit a4a9cf8)
…idation

### What problem does this PR solve?

Issue Number: close #N/A (follow-up to Jira CIR-20006 / PR apache#63637)

Problem Summary:
Test 22 of `regression-test/suites/search/test_search_function.groovy` covered "SEARCH on a column without inverted index" and asserted that the error message contained the old BE-side text `"SearchExpr should not be executed without inverted index"`. After the fix for CIR-20006, that scenario is rejected at FE planning time in `RewriteSearchToSlots.checkInvertedIndexExists`, with an `AnalysisException` whose message contains `"inverted index"` and names the offending column. Update the assertion so the test passes on a build that includes the FE-side check, and explicitly verify that the catch block actually fired instead of silently passing if the SQL unexpectedly succeeds.

### Release note

None (test-only change).

### Check List (For Author)

- Test:
    - Regression-test only: regression-test/suites/search/test_search_function.groovy
- Behavior changed: No
- Does this need documentation: No

(cherry picked from commit 1102975)
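The "verify the catch block actually fired" point in the commit above is a general testing pattern. The regression suite itself is Groovy; the Java sketch below only illustrates the idea, and `failsWithMessage` is a hypothetical helper, not part of the Doris test framework.

```java
/** Sketch of a negative test that cannot pass by accident; all names are hypothetical. */
final class ExpectedFailureSketch {

    /** True only if the action throws and the message contains the expected fragment. */
    static boolean failsWithMessage(Runnable action, String expectedFragment) {
        try {
            action.run();
        } catch (RuntimeException e) {
            String message = e.getMessage();
            return message != null && message.contains(expectedFragment);
        }
        return false; // the statement unexpectedly succeeded, so the test must fail
    }

    public static void main(String[] args) {
        Runnable rejected = () -> {
            throw new IllegalStateException("column 'title' has no inverted index");
        };
        Runnable succeeded = () -> { };
        System.out.println(failsWithMessage(rejected, "inverted index"));
        System.out.println(failsWithMessage(succeeded, "inverted index"));
    }
}
```

Asserting only inside the `catch` block would let the test pass whenever the statement succeeds; returning `false` on the success path closes that gap.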
### What problem does this PR solve?

Issue Number: close #N/A

Related PR: apache#63637

Problem Summary:
Variant subcolumn SEARCH rewrites resolved the parent slot case-insensitively, but the inverted-index validation still used the parent name exactly as written in the DSL. A valid predicate such as `SEARCH('V.foo:bar')` on a table with variant column `v` could therefore fail validation with `Column 'V' not found`. Use the resolved parent slot name for validation and normalize the field binding to the canonical parent path.

### Release note

SEARCH() on variant subcolumns now resolves the parent column name case-insensitively during inverted-index validation.

### Check List (For Author)

- Test: Unit Test
    - ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.RewriteSearchToSlotsTest
- Behavior changed: Yes - valid variant SEARCH predicates with differently-cased parent column names are no longer rejected.
- Does this need documentation: No

(cherry picked from commit d4d6f38)
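The normalization described in the commit above can be illustrated with a small Java sketch. It assumes the table's columns are available as a plain list of names; `resolveColumn` and `canonicalFieldPath` are hypothetical helpers for illustration, not methods of `RewriteSearchToSlots`.

```java
import java.util.List;

/** Sketch of canonicalizing the parent of a variant subcolumn path; all names are hypothetical. */
final class VariantParentResolutionSketch {

    /** Returns the table's own spelling of a column name, matched ignoring case, or null. */
    static String resolveColumn(List<String> tableColumns, String written) {
        for (String column : tableColumns) {
            if (column.equalsIgnoreCase(written)) {
                return column;
            }
        }
        return null;
    }

    /** Replaces the parent segment with its canonical spelling; the subcolumn path is left untouched. */
    static String canonicalFieldPath(List<String> tableColumns, String field) {
        int dot = field.indexOf('.');
        String parent = dot < 0 ? field : field.substring(0, dot);
        String path = dot < 0 ? "" : field.substring(dot);
        String resolved = resolveColumn(tableColumns, parent);
        if (resolved == null) {
            throw new IllegalArgumentException("Column '" + parent + "' not found");
        }
        return resolved + path;
    }

    public static void main(String[] args) {
        System.out.println(canonicalFieldPath(List.of("id", "v"), "V.foo"));
    }
}
```

Validating against the resolved name instead of the written one keeps the index check consistent with the slot that the rewrite actually binds.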
run buildall
Proposed changes
Backport #63637 to branch-4.0.
This rejects Lucene-syntax `SEARCH` on columns without an inverted index and aligns the regression/unit coverage with the new FE-side validation. Conflict resolution kept the branch-4.0 `SearchDslParser` package and adapted `IndexType` to `org.apache.doris.analysis.IndexDef.IndexType`.

Testing

`./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.RewriteSearchToSlotsTest`

`RewriteSearchToSlotsTest`: 16 tests passed