branch-4.0: [fix](inverted index) Fix empty string MATCH on keyword index returning wrong results #60500 #60516

Merged
yiguolei merged 1 commit into branch-4.0 from auto-pick-60500-branch-4.0
Feb 5, 2026
github-actions bot (Contributor) commented Feb 5, 2026

Cherry-picked from #60500

…ng wrong results (#60500)

## Proposed changes

Fix empty string MATCH on keyword index returning wrong results.

The multi-analyzer feature commit (2c950e1) incorrectly added an
empty string check that prevented `MATCH ''` from finding rows with
empty string values in keyword indexes.

For a keyword index (no tokenization), the empty string is a valid
exact-match value and must be matchable. The previous code skipped empty
strings with the comment "empty query should match nothing", which is
wrong for keyword indexes.
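
The distinction can be illustrated with a simplified Python model. This is only a sketch of the semantics described above, not the Doris C++ implementation: `query_terms` and `match_rows` are hypothetical names, and the regex tokenizer is an assumed stand-in for a real analyzer.

```python
from __future__ import annotations

import re


def query_terms(text: str, tokenized: bool) -> list[str]:
    """Hypothetical helper: turn a string into index terms."""
    if tokenized:
        # Assumed stand-in for a tokenizing analyzer: an empty string
        # produces no tokens at all.
        return re.findall(r"\w+", text.lower())
    # Keyword index: the whole value is a single term, including ''.
    return [text]


def match_rows(rows: dict[int, str], query: str, tokenized: bool) -> list[int]:
    """Return ids of rows sharing at least one term with the query."""
    terms = query_terms(query, tokenized)
    if not terms:
        # Only a query that yields no terms matches nothing.
        return []
    hits = []
    for row_id, value in rows.items():
        row_terms = set(query_terms(value, tokenized))
        if any(term in row_terms for term in terms):
            hits.append(row_id)
    return sorted(hits)


rows = {1: "", 2: "data"}
keyword_hits = match_rows(rows, "", tokenized=False)
tokenized_hits = match_rows(rows, "", tokenized=True)
```

In this model the "match nothing" rule applies to the term list produced by the analyzer rather than to the raw query string, so the keyword case keeps `''` as a real term while the tokenized case yields no terms.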

## Problem

```sql
-- Table with keyword index (no parser)
CREATE TABLE test (id INT, col TEXT, INDEX idx(col) USING INVERTED);
INSERT INTO test VALUES (1, ''), (2, 'data');

-- Before fix: returns 0 (WRONG!)
-- After fix: returns 1 (CORRECT!)
SELECT count() FROM test WHERE col MATCH '';
```

## Changes

This fix removes the empty string check for keyword index paths in:
- `be/src/vec/functions/match.cpp` (slow path)
- `be/src/olap/rowset/segment_v2/inverted_index_reader.cpp` (index path)
- `be/src/olap/rowset/segment_v2/inverted_index/analyzer/analyzer.cpp`
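
Why both the index path and the slow path need the same change can be sketched in Python. This is a minimal model under stated assumptions, not Doris code: it assumes the index path is a term lookup in the inverted index and the slow path compares each row's value directly, and all function names are hypothetical.

```python
from collections import defaultdict


def build_keyword_index(rows):
    """Map each whole column value (one term per row) to its row ids."""
    index = defaultdict(set)
    for row_id, value in rows.items():
        # '' is stored as a term like any other value.
        index[value].add(row_id)
    return index


def match_index_path(index, query):
    # Index path (assumed): a single term lookup, no special case for ''.
    return sorted(index.get(query, set()))


def match_slow_path(rows, query):
    # Slow path (assumed): compare each row's value without the index.
    return sorted(row_id for row_id, value in rows.items() if value == query)


rows = {1: "", 2: "data"}
index = build_keyword_index(rows)
index_hits = match_index_path(index, "")
slow_hits = match_slow_path(rows, "")
```

If only one path dropped its empty string check, the same query could return different results depending on whether the index was used, which is presumably why the regression test covers both paths.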

Added regression test `test_empty_string_match.groovy` to cover:
- Empty string match on keyword index (both index and slow paths)
- Empty string match on tokenized index (should return 0)
- match_any and match_all with empty string

## Check List (For Author)

- Test
    - [x] Regression test
    - [x] Unit Test
    - [ ] Manual test
    - [ ] No need to test

- Behavior changed:
    - [x] Yes. `MATCH ''` on keyword index now correctly matches rows with
      empty string values.

- Does this need documentation?
    - [ ] No.
@github-actions github-actions bot requested a review from yiguolei as a code owner February 5, 2026 02:44
Thearas (Contributor) commented Feb 5, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring closed this Feb 5, 2026
@dataroaring dataroaring reopened this Feb 5, 2026
Thearas (Contributor) commented Feb 5, 2026

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 25.00% (1/4) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 53.05% (19112/36024) |
| Line Coverage | 36.20% (178051/491900) |
| Region Coverage | 32.77% (137899/420831) |
| Branch Coverage | 33.68% (59783/177498) |

@yiguolei yiguolei merged commit 5a75558 into branch-4.0 Feb 5, 2026
26 of 29 checks passed
@github-actions github-actions bot deleted the auto-pick-60500-branch-4.0 branch February 5, 2026 06:57