New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery #635
Conversation
IndexSortSortedNumericDocValuesRangeQuery can implement its count method and coompute count through a binary search, the same binary search that is used to execute the query itself, whenever all the required conditions are met.
...box/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
Outdated
Show resolved
Hide resolved
Query query = new IndexSortSortedNumericDocValuesRangeQuery("another", 1, 42, fallbackQuery); | ||
Weight weight = query.createWeight(searcher, ScoreMode.COMPLETE, 1.0f); | ||
for (LeafReaderContext context : searcher.getLeafContexts()) { | ||
assertNotEquals(-1, weight.count(context)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for adding a test for this!
Minor: Would be good to actually check the fallback weight count here, and in general, have a different assertion here than the one in testCount()
..
Maybe, assertEquals(0, weight.count(context));
here, and assertEquals(1, weight.count(context));
in testCount()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right. I was initially hesitant on this because I was maybe planning to index more documents, then I could end up with more segments hence asserting on exact count could get more complicated. But if we keep a single doc we should be good and it's good to have a more precise check that also differs for the two scenarios. Thanks for your suggestion!
@@ -128,6 +128,9 @@ New Features | |||
based on TotalHitCountCollector that allows users to parallelize counting the | |||
number of hits. (Luca Cavanna, Adrien Grand) | |||
|
|||
* LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery | |||
to speed up computing the number of hits when possible. (Luca Cavanna, Adrien Grand) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I deserve having my name on this one :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eheh, I thought you get the credit because you merge it :)
…esRangeQuery (#635) IndexSortSortedNumericDocValuesRangeQuery can implement its count method and coompute count through a binary search, the same binary search that is used to execute the query itself, whenever all the required conditions are met.
@@ -195,7 +211,7 @@ public boolean isCacheable(LeafReaderContext ctx) { | |||
* {@link DocIdSetIterator} makes sure to wrap the original docvalues to skip over documents with | |||
* no value. | |||
*/ | |||
private DocIdSetIterator getDocIdSetIterator( | |||
private BoundedDocSetIdIterator getDocIdSetIterator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a typo?
BoundedDocSetIdIterator → BoundedDocIdSetIterator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm pretty sure it is. If you open a PR to rename, I'll merge it.
…cDocValuesRangeQuery (apache#635)" This reverts commit e53f32d.
IndexSortSortedNumericDocValuesRangeQuery can count matches by computing the first and last matching doc IDs using binary search. I tried to share the code between the query execution and the newly implemented count method, as duplicating code between the two did not look great otherwise.
I expanded the existing tests by issuing an explicit search as well as an explicit count. The existing tests exercised mostly count but now that I have implemented Weight#count we want to exercise both codepath: executing the query as well as the count shortcut.
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.