LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery #635

javanna · 2022-02-01T15:04:45Z

IndexSortSortedNumericDocValuesRangeQuery can count matches by computing the first and last matching doc IDs using binary search. I tried to share the code between the query execution and the newly implemented count method, as duplicating code between the two did not look great otherwise.
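
To illustrate the idea of counting through two binary searches, here is a minimal, self-contained sketch. It is not the code in this PR: a plain `long[]` sorted in ascending order stands in for the doc values of an index-sorted segment, every document is assumed to have exactly one value, and the names `SortedRangeCountSketch`, `firstAtLeast`, `firstGreaterThan` and `count` are hypothetical.

```java
final class SortedRangeCountSketch {

  /** Returns the index of the first element >= target, or values.length if there is none. */
  static int firstAtLeast(long[] values, long target) {
    int lo = 0;
    int hi = values.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow on large arrays
      if (values[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Returns the index of the first element > target, or values.length if there is none. */
  static int firstGreaterThan(long[] values, long target) {
    int lo = 0;
    int hi = values.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (values[mid] <= target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Counts elements in [lower, upper] (both inclusive) of an ascending array. */
  static int count(long[] sortedValues, long lower, long upper) {
    if (lower > upper) {
      return 0; // empty range
    }
    int first = firstAtLeast(sortedValues, lower);   // first matching position
    int end = firstGreaterThan(sortedValues, upper); // one past the last matching position
    return end - first;
  }

  public static void main(String[] args) {
    long[] values = {1, 3, 3, 5, 7, 9};
    System.out.println(count(values, 3, 7));
  }
}
```

Using a separate "first greater than" search instead of searching for `upper + 1` avoids overflow when `upper` is `Long.MAX_VALUE`. The same two boundaries delimit the documents a range query would iterate over, which is why the search and the count can share the boundary computation.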

I expanded the existing tests to issue an explicit search as well as an explicit count. The existing tests mostly exercised count, but now that Weight#count is implemented we want to exercise both code paths: executing the query as well as the count shortcut.

Checklist

Please review the following and check all that apply:

I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
I have created a Jira issue and added the issue ID to my pull request title.
I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
I have developed this patch against the main branch.
I have run ./gradlew check.
I have added tests for my changes.

IndexSortSortedNumericDocValuesRangeQuery can implement its count method and compute the count through a binary search, the same binary search that is used to execute the query itself, whenever all the required conditions are met.

...box/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java

vigyasharma · 2022-02-01T20:49:35Z

...src/test/org/apache/lucene/sandbox/search/TestIndexSortSortedNumericDocValuesRangeQuery.java

+    Query query = new IndexSortSortedNumericDocValuesRangeQuery("another", 1, 42, fallbackQuery);
+    Weight weight = query.createWeight(searcher, ScoreMode.COMPLETE, 1.0f);
+    for (LeafReaderContext context : searcher.getLeafContexts()) {
+      assertNotEquals(-1, weight.count(context));


Nice, thanks for adding a test for this!
Minor: It would be good to actually check the fallback weight count here and, in general, to have a different assertion here than the one in testCount().
Maybe assertEquals(0, weight.count(context)); here, and assertEquals(1, weight.count(context)); in testCount()?

you're right. I was initially hesitant on this because I was maybe planning to index more documents, then I could end up with more segments hence asserting on exact count could get more complicated. But if we keep a single doc we should be good and it's good to have a more precise check that also differs for the two scenarios. Thanks for your suggestion!
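
The delegation discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `SegmentCounter` is a hypothetical stand-in for a per-segment `Weight#count` call, and `fastPathApplies` is a hypothetical flag summarizing the conditions under which the binary-search count is usable. The only Lucene behavior relied upon is that `Weight#count` returns -1 when the count cannot be computed cheaply.

```java
final class FallbackCountSketch {

  /** Hypothetical stand-in for a per-segment count; -1 means "no cheap count available". */
  interface SegmentCounter {
    int count();
  }

  /**
   * Uses the fast (binary-search based) count when it applies; otherwise delegates to the
   * fallback's count, which may itself be an exact count or -1.
   */
  static int countWithFallback(
      boolean fastPathApplies, SegmentCounter fast, SegmentCounter fallback) {
    if (fastPathApplies) {
      return fast.count();
    }
    return fallback.count();
  }
}
```

With this shape, a test can tell the two paths apart by asserting on exact values, as suggested above: one expected count when the fast path applies and a different one when the fallback answers.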

jpountz · 2022-02-02T17:40:33Z

lucene/CHANGES.txt

@@ -128,6 +128,9 @@ New Features
  based on TotalHitCountCollector that allows users to parallelize counting the
  number of hits. (Luca Cavanna, Adrien Grand)

+* LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery
+  to speed up computing the number of hits when possible. (Luca Cavanna, Adrien Grand)


not sure I deserve having my name on this one :)

eheh, I thought you get the credit because you merge it :)

ryo0301 · 2022-02-04T09:31:36Z

...box/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java

@@ -195,7 +211,7 @@ public boolean isCacheable(LeafReaderContext ctx) {
   * {@link DocIdSetIterator} makes sure to wrap the original docvalues to skip over documents with
   * no value.
   */
-  private DocIdSetIterator getDocIdSetIterator(
+  private BoundedDocSetIdIterator getDocIdSetIterator(


Isn't this a typo?
BoundedDocSetIdIterator → BoundedDocIdSetIterator

Yes, I'm pretty sure it is. If you open a PR to rename, I'll merge it.

javanna added 2 commits February 1, 2022 15:53

spotless

80e9b14

vigyasharma reviewed Feb 1, 2022

...box/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java

javanna added 2 commits February 1, 2022 21:29

fall back to fallback.count and expand tests

8695b79

spotless

0a1c651

vigyasharma reviewed Feb 1, 2022

iter

58c0ca8

jpountz approved these changes Feb 2, 2022

jpountz approved these changes Feb 3, 2022

jpountz merged commit bade484 into apache:main Feb 3, 2022

ryo0301 reviewed Feb 4, 2022

javanna mentioned this pull request Feb 10, 2022

LUCENE-10385: Avoid SimpleText codec in TestIndexSortSortedNumericDocValuesRangeQuery #675

Merged

javanna added a commit to javanna/lucene that referenced this pull request Mar 14, 2022

Revert "LUCENE-10385: Implement Weight#count on IndexSortSortedNumeri…

2ffd6dd

…cDocValuesRangeQuery (apache#635)" This reverts commit e53f32d.

javanna mentioned this pull request Mar 14, 2022

Revert "LUCENE-10385: Implement Weight#count on IndexSortSortedNumeri… #745

Merged
