Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery #635

Merged
merged 5 commits into from Feb 3, 2022

Conversation

javanna
Copy link
Contributor

@javanna javanna commented Feb 1, 2022

IndexSortSortedNumericDocValuesRangeQuery can count matches by computing the first and last matching doc IDs using binary search. I tried to share the code between the query execution and the newly implemented count method, as duplicating code between the two did not look great otherwise.

I expanded the existing tests by issuing an explicit search as well as an explicit count. The existing tests exercised mostly count but now that I have implemented Weight#count we want to exercise both codepath: executing the query as well as the count shortcut.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.

IndexSortSortedNumericDocValuesRangeQuery can implement its count method and coompute count through a binary search, the same binary search that is used to execute the query itself, whenever all the required conditions are met.
Query query = new IndexSortSortedNumericDocValuesRangeQuery("another", 1, 42, fallbackQuery);
Weight weight = query.createWeight(searcher, ScoreMode.COMPLETE, 1.0f);
for (LeafReaderContext context : searcher.getLeafContexts()) {
assertNotEquals(-1, weight.count(context));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for adding a test for this!
Minor: Would be good to actually check the fallback weight count here, and in general, have a different assertion here than the one in testCount()..
Maybe, assertEquals(0, weight.count(context)); here, and assertEquals(1, weight.count(context)); in testCount() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right. I was initially hesitant on this because I was maybe planning to index more documents, then I could end up with more segments hence asserting on exact count could get more complicated. But if we keep a single doc we should be good and it's good to have a more precise check that also differs for the two scenarios. Thanks for your suggestion!

@@ -128,6 +128,9 @@ New Features
based on TotalHitCountCollector that allows users to parallelize counting the
number of hits. (Luca Cavanna, Adrien Grand)

* LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery
to speed up computing the number of hits when possible. (Luca Cavanna, Adrien Grand)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I deserve having my name on this one :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eheh, I thought you get the credit because you merge it :)

@jpountz jpountz merged commit bade484 into apache:main Feb 3, 2022
jpountz pushed a commit that referenced this pull request Feb 3, 2022
…esRangeQuery (#635)

IndexSortSortedNumericDocValuesRangeQuery can implement its count method and coompute count through a binary search, the same binary search that is used to execute the query itself, whenever all the required conditions are met.
@@ -195,7 +211,7 @@ public boolean isCacheable(LeafReaderContext ctx) {
* {@link DocIdSetIterator} makes sure to wrap the original docvalues to skip over documents with
* no value.
*/
private DocIdSetIterator getDocIdSetIterator(
private BoundedDocSetIdIterator getDocIdSetIterator(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a typo?
BoundedDocSetIdIterator → BoundedDocIdSetIterator

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm pretty sure it is. If you open a PR to rename, I'll merge it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants