-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery #12003
Some minor code cleanup in IndexSortSortedNumericDocValuesRangeQuery #12003
Conversation
* Leverage DISI static factory methods more over custom DISI impl where possible. * Assert points field is a single-dim. * Bound cost estimate by the cost of the doc values field (for sparse fields).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making these changes Greg, they seem to simplify the overall code flow. Changes look good to me, I only have a couple of questions for my understanding.
@@ -692,7 +697,7 @@ public int advance(int target) throws IOException { | |||
|
|||
@Override | |||
public long cost() { | |||
return lastDoc - firstDoc; | |||
return Math.min(delegate.cost(), lastDoc - firstDoc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding, why can't we just return delegate.cost()
here? It seems to me that we're returning DocIdSetIterator.range()
wherever cost would be equal to lastDoc - firstDoc
. Is that not the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. So delegate.cost()
will approximate the number of documents that could be provided in total by the delegate (in this case, it's NumericDocValues
). lastDoc - firstDoc
of course will provide the correct number of docs if every doc has a value. In this case, delegate
tells us what docs actually have values, and because we're using this, we know not all docs have values, so lastDoc - firstDoc
may overestimate. If the number of docs containing a value are very sparse though, it's possible we'll over-estimate by a lot, so providing a ceiling using delegate.cost()
could be useful in certain cases. Hope that makes sense?
if (matchAll(points, queryLowerPoint, queryUpperPoint)) { | ||
int maxDoc = context.reader().maxDoc(); | ||
if (points.getDocCount() == maxDoc) { | ||
return delegate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious if we can return DocIdSetIterator.all(maxDoc)
here. Does it break correctness is some way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion! That should be functionally correct (and more efficient than relying on delegate here). Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is much better this way indeed! There is just one bit I'm not comfortable with regarding relying on cost() being accurate but maybe we can work around it by providing a match count separately?
return disi.lastDoc - disi.firstDoc; | ||
DocIdSetIterator disi = getDocIdSetIteratorOrNull(context); | ||
if (disi != null && disi instanceof BoundedDocIdSetIterator == false) { | ||
return Math.toIntExact(disi.cost()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry that this might be a bit fragile since cost() has no guarantee to be accurate. I wonder if we could make getDocIdSetIteratorOrNull()
return both a DocIdSetIterator
and a number of matches (possibly -1 when unknown) to make this less trappy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great point. I'll work on making this less fragile. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…12003) * Leverage DISI static factory methods more over custom DISI impl where possible. * Assert points field is a single-dim. * Bound cost estimate by the cost of the doc values field (for sparse fields).
Thanks @jpountz. Merged/backported. |
Thanks @gsmiller , a new syntactic sugar |
Description
This PR contains some minor cleanup I thought might be useful after recently spending a little time looking at this code.