Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery #12003

gsmiller · 2022-12-08T21:44:54Z

Description

This PR contains some minor cleanup I thought might be useful after recently spending a little time looking at this code.

* Leverage DISI static factory methods instead of a custom DISI impl where possible.
* Assert points field is a single-dim in a couple places.
* Bound cost estimate by the cost of the doc values field (for sparse fields).
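
As a rough illustration of the first bullet, here is a plain-Java model (no Lucene dependency) of preferring built-in factories such as DocIdSetIterator.empty() and DocIdSetIterator.range(minDoc, maxDoc) over a custom iterator. It is a sketch under stated assumptions, not the PR's code: the class and method names are hypothetical, doc IDs are modeled as sorted int arrays, and a field counts as "dense" when every doc in the segment has a value.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public final class FactorySelectionSketch {

  /**
   * Hypothetical model: returns the doc IDs visited for the doc-ID range
   * [firstDoc, lastDoc). docsWithValue is the sorted list of docs that have
   * a value for the field; maxDoc is the number of docs in the segment.
   */
  static int[] docsInRange(int[] docsWithValue, int maxDoc, int firstDoc, int lastDoc) {
    if (firstDoc >= lastDoc) {
      // Nothing matches: the analogue of DocIdSetIterator.empty().
      return new int[0];
    }
    if (docsWithValue.length == maxDoc) {
      // Dense field: every doc in the range has a value, so a plain range
      // suffices (the analogue of DocIdSetIterator.range(firstDoc, lastDoc)).
      return IntStream.range(firstDoc, lastDoc).toArray();
    }
    // Sparse field: a custom iterator is still needed to skip docs without
    // values while staying inside [firstDoc, lastDoc).
    return Arrays.stream(docsWithValue).filter(d -> d >= firstDoc && d < lastDoc).toArray();
  }

  public static void main(String[] args) {
    int[] dense = IntStream.range(0, 10).toArray();
    System.out.println(Arrays.toString(docsInRange(dense, 10, 3, 6))); // [3, 4, 5]
    int[] sparse = {1, 4, 7, 9};
    System.out.println(Arrays.toString(docsInRange(sparse, 10, 3, 8))); // [4, 7]
  }
}
```

Only the sparse branch needs logic beyond a factory call, which is the case where a custom iterator still earns its keep.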

vigyasharma

Thanks for making these changes, Greg. They seem to simplify the overall code flow. The changes look good to me; I only have a couple of questions for my own understanding.

vigyasharma · 2022-12-09T21:10:21Z

...box/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java

@@ -692,7 +697,7 @@ public int advance(int target) throws IOException {

    @Override
    public long cost() {
-      return lastDoc - firstDoc;
+      return Math.min(delegate.cost(), lastDoc - firstDoc);


For my understanding, why can't we just return delegate.cost() here? It seems to me that we're returning DocIdSetIterator.range() wherever cost would be equal to lastDoc - firstDoc. Is that not the case?

gsmiller · Dec 10, 2022

Good question. delegate.cost() approximates the total number of documents the delegate (here, the NumericDocValues) can provide across the whole segment. lastDoc - firstDoc gives the exact number of matching docs only if every doc in the range has a value. We only wrap the delegate when not all docs have values, so lastDoc - firstDoc may overestimate. If docs with a value are very sparse, that overestimate can be large, so using delegate.cost() as a ceiling helps in those cases. Taking the minimum keeps whichever of the two upper bounds is tighter. Hope that makes sense?
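
A minimal sketch of the capping logic in the diff above, assuming only the two inputs it uses: the delegate's cost and the doc-ID bounds. The class and method names are hypothetical and the code does not depend on Lucene.

```java
public final class BoundedCostSketch {

  /**
   * Hypothetical helper: cost estimate for an iterator that walks a doc-values
   * iterator (which reports delegateCost) but only within doc IDs [firstDoc, lastDoc).
   */
  static long boundedCost(long delegateCost, int firstDoc, int lastDoc) {
    // Upper bound if every doc in the range had a value.
    long rangeSize = (long) lastDoc - firstDoc;
    // The delegate can never yield more docs than it has in total, so it acts as a ceiling.
    return Math.min(delegateCost, rangeSize);
  }

  public static void main(String[] args) {
    // Sparse field: only 50 docs have a value, but the range spans a million doc IDs.
    System.out.println(boundedCost(50, 0, 1_000_000)); // 50
    // Narrow range over a field with many values: the range size is the tighter bound.
    System.out.println(boundedCost(5_000_000, 100, 200)); // 100
  }
}
```

Neither bound alone is tight in both scenarios, which is why the diff combines them with Math.min rather than returning either one.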

vigyasharma · 2022-12-09T21:24:18Z

...box/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java

+    if (matchAll(points, queryLowerPoint, queryUpperPoint)) {
+      int maxDoc = context.reader().maxDoc();
+      if (points.getDocCount() == maxDoc) {
+        return delegate;


Curious if we can return DocIdSetIterator.all(maxDoc) here. Does it break correctness in some way?

gsmiller · Dec 10, 2022

Good suggestion! That should be functionally correct (and more efficient than relying on the delegate here). Thanks!
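
A plain-Java model of the match-all branch from the diff above, assuming doc IDs are represented as a sorted int array of the docs that have a value. The names are hypothetical; the sketch only shows why DocIdSetIterator.all(maxDoc) visits the same docs as the delegate once every doc has a value.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public final class MatchAllSketch {

  /**
   * Hypothetical model of the match-all branch: every indexed value falls inside
   * the query bounds, so the result is exactly the set of docs that have a value.
   */
  static int[] matchAllDocs(int[] docsWithValue, int maxDoc) {
    if (docsWithValue.length == maxDoc) {
      // Every doc has a value (points.getDocCount() == maxDoc in the diff), so the
      // result is simply all docs: the analogue of DocIdSetIterator.all(maxDoc),
      // which needs no doc-values lookups.
      return IntStream.range(0, maxDoc).toArray();
    }
    // Otherwise only docs with a value match, so the doc-values iterator is still needed.
    return docsWithValue.clone();
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(matchAllDocs(new int[] {0, 1, 2, 3}, 4))); // [0, 1, 2, 3]
    System.out.println(Arrays.toString(matchAllDocs(new int[] {0, 2}, 4))); // [0, 2]
  }
}
```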

jpountz

Thanks, this is much better this way indeed! There is just one bit I'm not comfortable with: relying on cost() being accurate. Maybe we can work around it by providing a match count separately?

jpountz · 2022-12-10T13:12:06Z

...box/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java

-            return disi.lastDoc - disi.firstDoc;
+          DocIdSetIterator disi = getDocIdSetIteratorOrNull(context);
+          if (disi != null && disi instanceof BoundedDocIdSetIterator == false) {
+            return Math.toIntExact(disi.cost());


I worry that this might be a bit fragile, since cost() is not guaranteed to be accurate. I wonder if we could make getDocIdSetIteratorOrNull() return both a DocIdSetIterator and a number of matches (possibly -1 when unknown) to make this less trappy.

gsmiller · Dec 10, 2022

That's a great point. I'll work on making this less fragile. Thanks!
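
One way to read jpountz's suggestion, sketched in plain Java without Lucene: pair the iterator with an exact match count, using -1 when the count is unknown, so a count never has to be inferred from cost(). LuXugang's later comment suggests the merged change used a Java record, so the sketch uses one too (Java 16+), but the record and factory names here are hypothetical and not necessarily what the commit uses. Doc IDs are modeled with PrimitiveIterator.OfInt.

```java
import java.util.Arrays;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

public final class IteratorAndCountSketch {

  /** Hypothetical pairing: matching doc IDs plus an exact count, or -1 if unknown. */
  record IteratorAndCount(PrimitiveIterator.OfInt docs, int count) {

    // Dense case: every doc in [firstDoc, lastDoc) matches, so the count is exact.
    static IteratorAndCount denseRange(int firstDoc, int lastDoc) {
      return new IteratorAndCount(
          IntStream.range(firstDoc, lastDoc).iterator(), Math.max(0, lastDoc - firstDoc));
    }

    // Sparse case: only docs with a value match; counting them would require
    // iterating, so report -1 instead of guessing from a cost estimate.
    static IteratorAndCount sparseRange(int[] docsWithValue, int firstDoc, int lastDoc) {
      PrimitiveIterator.OfInt it =
          Arrays.stream(docsWithValue).filter(d -> d >= firstDoc && d < lastDoc).iterator();
      return new IteratorAndCount(it, -1);
    }
  }

  public static void main(String[] args) {
    System.out.println(IteratorAndCount.denseRange(10, 20).count()); // 10
    System.out.println(IteratorAndCount.sparseRange(new int[] {1, 12, 15, 30}, 10, 20).count()); // -1
  }
}
```

The -1 sentinel mirrors the convention of Lucene's Weight#count, which returns -1 when a count cannot be computed cheaply, so a caller can fall back to iterating instead of trusting an estimate.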

jpountz

LGTM!

gsmiller · 2022-12-11T01:29:58Z

Thanks @jpountz. Merged/backported.

LuXugang · 2022-12-12T02:41:48Z

Thanks @gsmiller! The record syntax is new syntactic sugar to me, and this is the first time it has appeared in Lucene code.

Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery.

d309f76

* Leverage DISI static factory methods more over custom DISI impl where possible. * Assert points field is a single-dim. * Bound cost estimate by the cost of the doc values field (for sparse fields).

vigyasharma approved these changes Dec 9, 2022

use DISI#all instead of delegate in one case

99238d4

jpountz reviewed Dec 10, 2022

gsmiller added 2 commits December 10, 2022 08:46

less brittle solution for count

e03c9a6

changes

420d5f0

jpountz approved these changes Dec 10, 2022

gsmiller merged commit 8671e29 into apache:main Dec 10, 2022

gsmiller deleted the explore/IndexSortSortedNumericDocValuesRangeQuery-tweaks branch December 10, 2022 20:23

rmuir added this to the 9.5.0 milestone Jan 17, 2023
