
[BugFix] Fix NOT TEXT_MATCH fence to exclude all docs when Lucene searcher sees zero docs #18006

Merged
chenboat merged 1 commit into apache:master from heng-kuang-777:heng.kuang/update-default-searchable-fence-to-start-from-zero on Mar 27, 2026
@heng-kuang-777
Contributor

Summary

In TextMatchFilterOperator.getSearchableDocCount(), the condition searchableDocCount > 0 caused a fallback to numDocs when the Lucene searcher had 0 visible documents. This can occur on a newly created consuming segment before the first Lucene SearcherManager refresh has completed. This made NOT TEXT_MATCH invert over the full [0, numDocs) range, producing false positives for all tail documents.

Fix by changing the guard to searchableDocCount >= 0, so a zero searchable count is respected and NOT results are correctly empty.
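
The effect of the guard change can be illustrated with a minimal, self-contained sketch. This is not the actual Pinot code: the class `FenceSketch`, its method names, the use of `java.util.BitSet` in place of Pinot's doc-id sets, the clamping with `Math.min`, and the convention that a negative count means "unknown" are all assumptions made for illustration. Only the `> 0` versus `>= 0` comparison and the fallback to `numDocs` come from the description above.

```java
import java.util.BitSet;

// Hypothetical illustration only; names and structure are not taken from Pinot.
public class FenceSketch {

  // Guard before the fix: a searchable count of 0 is treated like "unknown"
  // and the fence falls back to numDocs.
  static int fenceBeforeFix(int searchableDocCount, int numDocs) {
    return searchableDocCount > 0 ? Math.min(searchableDocCount, numDocs) : numDocs;
  }

  // Guard after the fix: 0 is a valid count; only a negative value
  // (assumed here to mean "unknown") falls back to numDocs.
  static int fenceAfterFix(int searchableDocCount, int numDocs) {
    return searchableDocCount >= 0 ? Math.min(searchableDocCount, numDocs) : numDocs;
  }

  // NOT TEXT_MATCH modeled as: every doc id in [0, fence) that did not match.
  static BitSet notTextMatch(BitSet matches, int fence) {
    BitSet result = new BitSet();
    result.set(0, fence);
    result.andNot(matches);
    return result;
  }

  public static void main(String[] args) {
    int numDocs = 5;               // docs ingested into the consuming segment
    int searchableDocCount = 0;    // Lucene searcher has not refreshed yet
    BitSet matches = new BitSet(); // nothing is searchable, so nothing matches

    // Before the fix: the inversion covers [0, 5), so all 5 docs are returned.
    System.out.println(
        notTextMatch(matches, fenceBeforeFix(searchableDocCount, numDocs)).cardinality());

    // After the fix: the inversion covers [0, 0), so no docs are returned.
    System.out.println(
        notTextMatch(matches, fenceAfterFix(searchableDocCount, numDocs)).cardinality());
  }
}
```

Under these assumptions the first call returns every ingested document as a false positive, while the second returns none, which matches the behavior described in this PR.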

This is a follow-up to #17880.

Test Plan

Added a unit test validating that a searchableDocCount of 0 is returned as-is instead of being replaced by numDocs.

…rcher has zero visible docs

In `TextMatchFilterOperator.getSearchableDocCount()`, the condition
`searchableDocCount > 0` caused a fallback to `numDocs` when the Lucene
searcher had refreshed to 0 visible documents. This can occur on a newly
created consuming segment before the first Lucene SearcherManager refresh
has completed. This made NOT TEXT_MATCH invert over the full `[0, numDocs)`
range, producing false positives for all tail documents.

Fix by changing the guard to `searchableDocCount >= 0`, so a zero
searchable count is respected and NOT results are correctly empty.
Adds a regression test covering this corner case.
Contributor

Copilot AI left a comment


Pull request overview

Fixes a correctness issue in NOT TEXT_MATCH evaluation when Lucene’s searcher temporarily reports 0 visible documents on newly created consuming segments (before the first refresh), ensuring the NOT inversion universe does not incorrectly expand to the full [0, numDocs) range.

Changes:

  • Update TextMatchFilterOperator searchable-doc “fence” logic to treat searchableDocCount == 0 as a valid value (rather than falling back to numDocs).
  • Add a unit test covering the searchableDocCount = 0 case to ensure NOT results are empty.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| pinot-core/src/main/java/org/apache/pinot/core/operator/filter/TextMatchFilterOperator.java | Adjusts fence guard to respect a zero searchable doc count, preventing NOT inversion over non-searchable docs. |
| pinot-core/src/test/java/org/apache/pinot/core/operator/filter/FilterOperatorUtilsTest.java | Adds regression test validating NOT TEXT_MATCH returns empty when Lucene reports 0 visible docs. |

@codecov-commenter

codecov-commenter commented Mar 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.30%. Comparing base (a8baa36) to head (ab8c7f6).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #18006   +/-   ##
=========================================
  Coverage     63.30%   63.30%           
  Complexity     1543     1543           
=========================================
  Files          3200     3200           
  Lines        194074   194074           
  Branches      29883    29883           
=========================================
+ Hits         122852   122865   +13     
+ Misses        61582    61567   -15     
- Partials       9640     9642    +2     
| Flag | Coverage Δ | |
| --- | --- | --- |
| custom-integration1 | 100.00% <ø> (ø) | |
| integration | 100.00% <ø> (ø) | |
| integration1 | 100.00% <ø> (ø) | |
| integration2 | 0.00% <ø> (ø) | |
| java-11 | 63.28% <100.00%> (+0.01%) | ⬆️ |
| java-21 | 63.26% <100.00%> (+<0.01%) | ⬆️ |
| temurin | 63.30% <100.00%> (+<0.01%) | ⬆️ |
| unittests | 63.30% <100.00%> (+<0.01%) | ⬆️ |
| unittests1 | 55.53% <100.00%> (+<0.01%) | ⬆️ |
| unittests2 | 34.22% <0.00%> (+<0.01%) | ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.


@chenboat chenboat merged commit cfd7268 into apache:master Mar 27, 2026
20 checks passed