[branch-4.0][fix](search) reject Lucene-syntax search on columns without inverted index #63857

Open: airborne12 wants to merge 3 commits into apache:branch-4.0 from airborne12:sel-42-pr-63637-branch-4.0

@airborne12
Member

Proposed changes

Backport #63637 to branch-4.0.

This rejects Lucene-syntax SEARCH on columns without an inverted index and aligns the regression/unit coverage with the new FE-side validation. Conflict resolution kept the branch-4.0 SearchDslParser package and adapted IndexType to org.apache.doris.analysis.IndexDef.IndexType.

Testing

  • ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.RewriteSearchToSlotsTest
    • RewriteSearchToSlotsTest: 16 tests passed

… index

Issue Number: close #N/A (Jira CIR-20006)

Problem Summary:

SEARCH (Lucene syntax) predicates against columns that have no inverted
index silently fall back to an empty bitmap on BE (vsearch.cpp and
function_search.cpp only log a WARNING then return Status::OK() with an
empty result), making the query look like "no rows matched". That is
indistinguishable from a successful query that simply found nothing and
misleads users.

Validate at planning time in RewriteSearchToSlots, matching the existing
"column does not exist" behavior:

- Normal columns: require OlapTable.getInvertedIndex(column, null) != null.
- Variant subcolumns (parent.path): require any INVERTED index whose
  first column equals the parent variant column; the concrete subcolumn
  binding is still resolved per-segment in BE, consistent with the
  is_variant_sub branch in function_search.cpp.

Also harden OlapTable.getInvertedIndex against NPE when the table has
no TableIndexes set (returns null instead of dereferencing).

SEARCH() with Lucene syntax now throws AnalysisException at planning
time when the referenced column has no inverted index, with guidance to
add one via ALTER TABLE ... ADD INDEX ... USING INVERTED. Previously
such queries silently returned zero rows.
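
The rule above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual FE code: `InvertedIndexCheckSketch`, `Index`, `hasInvertedIndex`, and `checkField` are hypothetical names, `IllegalArgumentException` stands in for `AnalysisException`, and the error text is invented. As a simplification, both the normal-column and the variant-subcolumn case are reduced to "some INVERTED index has the (parent) column as its first column".

```java
import java.util.Arrays;
import java.util.List;

final class InvertedIndexCheckSketch {
    /** Hypothetical stand-in for the FE index type enum. */
    enum IndexType { INVERTED, BITMAP }

    /** Hypothetical stand-in for one table index: its type and ordered column list. */
    static final class Index {
        final IndexType type;
        final List<String> columns;

        Index(IndexType type, String... columns) {
            this.type = type;
            this.columns = Arrays.asList(columns);
        }
    }

    /** True if an INVERTED index has `column` as its first column; a null list means "no indexes". */
    static boolean hasInvertedIndex(List<Index> indexes, String column) {
        if (indexes == null) {
            return false; // null-safe: missing index metadata is treated as "no match"
        }
        for (Index idx : indexes) {
            if (idx.type == IndexType.INVERTED && !idx.columns.isEmpty()
                    && idx.columns.get(0).equals(column)) {
                return true;
            }
        }
        return false;
    }

    /** Validates one SEARCH field; "parent.path" is checked against the parent column only. */
    static void checkField(List<Index> indexes, String field) {
        int dot = field.indexOf('.');
        String column = dot < 0 ? field : field.substring(0, dot);
        if (!hasInvertedIndex(indexes, column)) {
            // Assumed message wording, for illustration only.
            throw new IllegalArgumentException("Column '" + column
                    + "' has no inverted index; add one with ALTER TABLE ... ADD INDEX ... USING INVERTED");
        }
    }

    public static void main(String[] args) {
        List<Index> indexes = Arrays.asList(new Index(IndexType.INVERTED, "title"));
        checkField(indexes, "title"); // accepted: "title" has an inverted index
        try {
            checkField(indexes, "body"); // rejected at "planning time" in this sketch
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of failing in the check, rather than returning an empty result later, is that the caller can tell "no index" apart from "no rows matched".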

- Test:
    - Unit Test: RewriteSearchToSlotsTest updated and extended
      (testRewriteSearchThrowsWhenColumnHasNoInvertedIndex,
      testRewriteSearchSucceedsWhenColumnHasInvertedIndex,
      testRewriteSearchHandlesCaseInsensitiveField switched to a table
      with an inverted index on name).
- Behavior changed: Yes - a predicate that previously evaluated silently to
  FALSE now raises a clear AnalysisException at planning time.
- Does this need documentation: No

(cherry picked from commit a4a9cf8)
…idation

### What problem does this PR solve?

Issue Number: close #N/A (follow-up to Jira CIR-20006 / PR apache#63637)

Problem Summary:

Test 22 of `regression-test/suites/search/test_search_function.groovy`
covered "SEARCH on a column without inverted index" and asserted the
error message contained the old BE-side text
`"SearchExpr should not be executed without inverted index"`.

After the fix for CIR-20006, that scenario is now rejected at FE planning
time in `RewriteSearchToSlots.checkInvertedIndexExists`, with an
`AnalysisException` whose message contains `"inverted index"` and names
the offending column. Update the assertion so the test passes on a build
that includes the FE-side check (and also explicitly verifies the
catch block actually fired, instead of silently passing if the SQL
unexpectedly succeeds).
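
The "verify the catch block actually fired" idea can be illustrated with a small sketch. The regression suite itself is Groovy; the Java below only shows the pattern, and `ExpectFailureSketch`, `failedWith`, and the exception message are hypothetical stand-ins rather than code from the test.

```java
final class ExpectFailureSketch {
    /** Runs `action` and returns true only if it threw an exception whose message contains `fragment`. */
    static boolean failedWith(Runnable action, String fragment) {
        try {
            action.run();
        } catch (RuntimeException e) {
            return e.getMessage() != null && e.getMessage().contains(fragment);
        }
        return false; // the action succeeded, so the expected failure never happened
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for running SEARCH on a column without an inverted index.
        Runnable rejected = () -> {
            throw new IllegalStateException("Column 'title' has no inverted index");
        };
        // Hypothetical stand-in for a query that unexpectedly succeeds.
        Runnable unexpectedSuccess = () -> { };

        if (!failedWith(rejected, "inverted index")) {
            throw new AssertionError("expected the query to be rejected");
        }
        if (failedWith(unexpectedSuccess, "inverted index")) {
            throw new AssertionError("a successful query must not count as a pass");
        }
        System.out.println("ok");
    }
}
```

A test that only asserts inside the `catch` block passes vacuously when nothing is thrown; returning an explicit flag and asserting on it closes that gap.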

### Release note

None (test-only change).

### Check List (For Author)

- Test:
    - Regression-test only: regression-test/suites/search/test_search_function.groovy
- Behavior changed: No
- Does this need documentation: No

(cherry picked from commit 1102975)
### What problem does this PR solve?

Issue Number: close #N/A

Related PR: apache#63637

Problem Summary: Variant subcolumn SEARCH rewrites resolved the parent slot case-insensitively, but the inverted-index validation still used the parent name exactly as written in the DSL. A valid predicate such as `SEARCH('V.foo:bar')` on a table with variant column `v` could therefore fail validation with `Column 'V' not found`. The fix uses the resolved parent slot name for validation and normalizes the field binding to the canonical parent path.
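
A minimal sketch of the normalization, assuming the table's columns are available as a plain list of canonical names. `VariantParentResolveSketch`, `resolveColumn`, and `normalizeField` are hypothetical names, and `IllegalArgumentException` stands in for the real exception type; the actual rewrite rule works on slots rather than strings.

```java
import java.util.Arrays;
import java.util.List;

final class VariantParentResolveSketch {
    /** Returns the canonical column name that matches `written` ignoring case, or null if none does. */
    static String resolveColumn(List<String> tableColumns, String written) {
        for (String column : tableColumns) {
            if (column.equalsIgnoreCase(written)) {
                return column;
            }
        }
        return null;
    }

    /** Rewrites a field such as "V.foo" so its parent part uses the canonical column name. */
    static String normalizeField(List<String> tableColumns, String field) {
        int dot = field.indexOf('.');
        String parent = dot < 0 ? field : field.substring(0, dot);
        String subPath = dot < 0 ? "" : field.substring(dot); // keeps the leading '.'
        String canonical = resolveColumn(tableColumns, parent);
        if (canonical == null) {
            throw new IllegalArgumentException("Column '" + parent + "' not found");
        }
        return canonical + subPath;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "v");
        // Validation and binding should both use the canonical form of the field.
        System.out.println(normalizeField(columns, "V.foo"));
    }
}
```

Resolving once and reusing the canonical name for both validation and binding keeps the two steps from disagreeing about which column the DSL refers to.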

### Release note

SEARCH() on variant subcolumns now resolves the parent column name case-insensitively during inverted-index validation.

### Check List (For Author)

- Test: Unit Test
    - ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.RewriteSearchToSlotsTest
- Behavior changed: Yes - valid variant SEARCH predicates with differently-cased parent column names are no longer rejected.
- Does this need documentation: No

(cherry picked from commit d4d6f38)
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

airborne12 marked this pull request as ready for review on May 29, 2026, 13:36
@airborne12
Member Author

run buildall
