NumericComparator: immediately check whether a segment is competitive with the recorded bottom #15397
… with the recorded bottom. When constructing a new CompetitiveDISIBuilder, check whether the segment's global min/max point values or the global min/max from the doc values skipper are competitive with the bottom. If they are not, replace competitiveIterator with an empty iterator, because no document in the current segment has a value that is competitive with the currently recorded bottom. Doing this check at CompetitiveDISIBuilder construction is cheap and allows pruning immediately, instead of waiting until doUpdateCompetitiveIterator(...) is invoked.
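The segment-level check described above can be sketched in isolation as follows. This is a minimal illustration and not the code from the PR: the class and method names are hypothetical, values are assumed to be already converted to comparable longs, and ties with the bottom are treated as competitive, which is the conservative choice.

```java
// Hypothetical sketch, not Lucene's NumericComparator code.
final class SegmentBottomCheckSketch {

  /**
   * Returns true when no value in [segmentMin, segmentMax] can be competitive
   * with the current bottom of a full priority queue.
   *
   * @param reverse true for a descending sort, false for an ascending sort
   */
  static boolean segmentIsNonCompetitive(
      long segmentMin, long segmentMax, long bottom, boolean reverse) {
    if (reverse) {
      // Descending sort: a value must be >= bottom to be competitive,
      // so the segment is hopeless if even its maximum is below bottom.
      return segmentMax < bottom;
    }
    // Ascending sort: a value must be <= bottom to be competitive,
    // so the segment is hopeless if even its minimum is above bottom.
    return segmentMin > bottom;
  }
}
```

When the method returns true, the caller could install an empty competitive iterator for the segment without visiting any document in it.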
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.
romseygeek
left a comment
Nice!
if (queueFull) {
  long bottom = leafComparator.bottomAsComparableLong();
  long minValue = sortableBytesToLong(pointValues.getMinPackedValue());
This looks like a nice optimization. I am wondering if we still need to check for missing point values?
// if some documents have missing points, check that missing values prohibits optimization
if (docCount() < maxDoc && isMissingValueCompetitive()) {
return;
}
Good point, we still need it. Otherwise we would incorrectly ignore the docs with missing values when the missing sort value is competitive. I pushed this commit: 2df7908
I also added a test for this case. The TestSortOptimization test suite didn't catch this problem, even after running it thousands of times. I think this is because in the test, a document without the field is encountered before the queue is full, and after that the missing value is no longer competitive with the bottom.
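To make the interaction with missing values concrete, here is a minimal standalone sketch of the guard discussed in this thread. It is an assumption-laden illustration rather than the committed change: all names are hypothetical, `docCount` stands for the number of documents in the segment that have a value for the sort field, and ties are treated as competitive.

```java
// Hypothetical sketch, not Lucene's NumericComparator code.
final class MissingValueGuardSketch {

  /** A missing value is competitive if it could still enter the full queue. */
  static boolean missingValueIsCompetitive(long missingValue, long bottom, boolean reverse) {
    return reverse ? missingValue >= bottom : missingValue <= bottom;
  }

  /** Returns true when the whole segment can be skipped via an empty iterator. */
  static boolean canUseEmptyIterator(
      int docCount, int maxDoc,
      long segmentMin, long segmentMax,
      long missingValue, long bottom, boolean reverse) {
    // Some documents lack the field and would sort by the missing value:
    // if that value is competitive, those documents must still be collected.
    if (docCount < maxDoc && missingValueIsCompetitive(missingValue, bottom, reverse)) {
      return false;
    }
    // Otherwise only the indexed value range decides.
    return reverse ? segmentMax < bottom : segmentMin > bottom;
  }
}
```

For example, with an ascending sort, a bottom of 50, indexed values in [100, 200], and a missing value of 0, the segment can only be skipped when every document has the field.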
Do we also need to reach the hitsThreshold before setting the empty iterator?
Good point, this hitsThreshold also needs to be taken into account: 63fe755
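Putting the conditions from this review together, the gate for installing the empty iterator might look roughly like the sketch below. The names are hypothetical and the flags are plain booleans for illustration; the assumption is that documents must not be skipped before the hits threshold is reached, since the collector is still expected to count hits until then.

```java
// Hypothetical sketch, not Lucene's NumericComparator code.
final class EmptyIteratorGateSketch {

  /**
   * Decides whether a segment may be skipped entirely at construction time.
   *
   * @param queueFull               a bottom value exists only once the queue is full
   * @param hitsThresholdReached    hits no longer need to be counted
   * @param segmentNonCompetitive   the segment's min/max cannot be competitive with the bottom
   * @param missingValueBlocksPrune some docs lack the field and the missing value is competitive
   */
  static boolean mayInstallEmptyIterator(
      boolean queueFull,
      boolean hitsThresholdReached,
      boolean segmentNonCompetitive,
      boolean missingValueBlocksPrune) {
    if (!queueFull || !hitsThresholdReached) {
      return false; // too early to prune anything
    }
    if (missingValueBlocksPrune) {
      return false; // docs without a value could still be competitive
    }
    return segmentNonCompetitive;
  }
}
```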
jainankitk
left a comment
Looks good to me!
Thanks for the review @jainankitk!
…competitive with the recorded bottom Backporting apache#15397 to the branch_10x branch. When constructing a new CompetitiveDISIBuilder, check whether the segment's global min/max point values or the global min/max from the doc values skipper are competitive with the bottom. If they are not, replace competitiveIterator with an empty iterator, because no document in the current segment has a value that is competitive with the currently recorded bottom. Doing this check at CompetitiveDISIBuilder construction is cheap and allows pruning immediately, instead of waiting until doUpdateCompetitiveIterator(...) is invoked.
…competitive with the recorded bottom (#15410) Backporting #15397 to the branch_10x branch. When constructing a new CompetitiveDISIBuilder, check whether the segment's global min/max point values or the global min/max from the doc values skipper are competitive with the bottom. If they are not, replace competitiveIterator with an empty iterator, because no document in the current segment has a value that is competitive with the currently recorded bottom. Doing this check at CompetitiveDISIBuilder construction is cheap and allows pruning immediately, instead of waiting until doUpdateCompetitiveIterator(...) is invoked.
Not sure if it's related, but Jenkins started failing after this commit.
Ok, I don't think it's related; it may be just a coincidence. I checked before and after, and this fails in both cases:
Looking quickly at the test, I don't think it does any sorting or uses a numeric comparator. So I also don't think it is related to this change.
…ator (#138087) We seem to get the pruning performance back on the stock lucene iterator with apache/lucene#15397, which is also present in our forked version of the code.
When constructing a new CompetitiveDISIBuilder, check whether the segment's global min/max point values or the global min/max from the doc values skipper are competitive with the bottom. If they are not, replace competitiveIterator with an empty iterator, because no document in the current segment has a value that is competitive with the currently recorded bottom.
Doing this check at CompetitiveDISIBuilder construction is cheap and allows pruning immediately, instead of waiting until doUpdateCompetitiveIterator(...) is invoked.
I didn't add tests in this PR, because I think that TestSortOptimization provides sufficient coverage. But I'm happy to add more tests.