[BugFix] Fix NOT TEXT_MATCH fence to exclude all docs when Lucene searcher sees zero docs #18006
…rcher has zero visible docs

In `TextMatchFilterOperator.getSearchableDocCount()`, the condition `searchableDocCount > 0` caused a fallback to `numDocs` when the Lucene searcher had refreshed to 0 visible documents. This can occur on a newly created consuming segment before the first Lucene SearcherManager refresh has completed. This made NOT TEXT_MATCH invert over the full `[0, numDocs)` range, producing false positives for all tail documents.

Fix by changing the guard to `searchableDocCount >= 0`, so a zero searchable count is respected and NOT results are correctly empty. Adds a regression test covering this corner case.
Pull request overview
Fixes a correctness issue in NOT TEXT_MATCH evaluation when Lucene’s searcher temporarily reports 0 visible documents on newly created consuming segments (before the first refresh), ensuring the NOT inversion universe does not incorrectly expand to the full [0, numDocs) range.
Changes:
- Update `TextMatchFilterOperator` searchable-doc “fence” logic to treat `searchableDocCount == 0` as a valid value (rather than falling back to `numDocs`).
- Add a unit test covering the `searchableDocCount = 0` case to ensure NOT results are empty.
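The guard change can be illustrated with a minimal sketch. This is not the actual Pinot code: the class and method names (`SearchableDocFence`, `fenceBefore`, `fenceAfter`) are hypothetical, and it assumes that a negative count is the "unknown" signal that triggers the fallback to `numDocs`.

```java
// Hypothetical sketch of the fence guard; names are illustrative and
// do not come from TextMatchFilterOperator.
final class SearchableDocFence {
  private SearchableDocFence() {
  }

  // Guard before the fix: a count of 0 is treated like "unknown",
  // so the fence falls back to numDocs.
  static int fenceBefore(int searchableDocCount, int numDocs) {
    return searchableDocCount > 0 ? searchableDocCount : numDocs;
  }

  // Guard after the fix: 0 is a valid count. Only a negative value
  // (assumed here to mean "unknown") falls back to numDocs.
  static int fenceAfter(int searchableDocCount, int numDocs) {
    return searchableDocCount >= 0 ? searchableDocCount : numDocs;
  }
}
```

Under these assumptions, a segment with `numDocs = 100` and 0 visible Lucene docs gets a fence of 100 from `fenceBefore` and a fence of 0 from `fenceAfter`; positive counts behave the same in both.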
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pinot-core/src/main/java/org/apache/pinot/core/operator/filter/TextMatchFilterOperator.java | Adjusts fence guard to respect a zero searchable doc count, preventing NOT inversion over non-searchable docs. |
| pinot-core/src/test/java/org/apache/pinot/core/operator/filter/FilterOperatorUtilsTest.java | Adds regression test validating NOT TEXT_MATCH returns empty when Lucene reports 0 visible docs. |
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #18006 +/- ##
=========================================
Coverage 63.30% 63.30%
Complexity 1543 1543
=========================================
Files 3200 3200
Lines 194074 194074
Branches 29883 29883
=========================================
+ Hits 122852 122865 +13
+ Misses 61582 61567 -15
- Partials 9640 9642 +2
Summary
In `TextMatchFilterOperator.getSearchableDocCount()`, the condition `searchableDocCount > 0` caused a fallback to `numDocs` when the Lucene searcher had 0 visible documents. This can occur on a newly created consuming segment before the first Lucene SearcherManager refresh has completed. This made NOT TEXT_MATCH invert over the full `[0, numDocs)` range, producing false positives for all tail documents.

Fix by changing the guard to `searchableDocCount >= 0`, so a zero searchable count is respected and NOT results are correctly empty.

This is a follow-up to #17880
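To see why the fence value matters for NOT, here is a small sketch of inverting a match set over `[0, fence)`. The class `NotTextMatchSketch` and its `invert` method are hypothetical and use plain Java collections rather than Pinot's doc ID sets; they only model the inversion universe described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of NOT inversion; not the Pinot implementation.
final class NotTextMatchSketch {
  private NotTextMatchSketch() {
  }

  // Returns every doc ID in [0, fence) that is absent from matchingDocIds.
  static List<Integer> invert(Set<Integer> matchingDocIds, int fence) {
    List<Integer> result = new ArrayList<>();
    for (int docId = 0; docId < fence; docId++) {
      if (!matchingDocIds.contains(docId)) {
        result.add(docId);
      }
    }
    return result;
  }
}
```

With an empty match set, a fence of `numDocs = 5` returns all five doc IDs, including docs the searcher cannot yet see, while a fence of 0 returns an empty list, which is the behavior the fix aims for.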
Test Plan
Added a unit test validating that `searchableDocCount` is returned even when it is 0.