Prevent humongous allocations when calculating scalar quantiles #13090
Merged: benwtrent merged 7 commits into apache:main from benwtrent:feature/scalar-quantization-optimization on Feb 8, 2024
Conversation
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java
mayya-sharipova approved these changes on Feb 8, 2024:
@benwtrent Thanks, the Lucene implementation LGTM as long as we are ok with the math and decreased recall that it brings.
tveasey approved these changes on Feb 8, 2024:
LGTM
benwtrent added a commit that referenced this pull request on Feb 8, 2024:
The initial release of scalar quantization would periodically create a humongous allocation, which can put unwarranted pressure on the GC and on heap usage as a whole. This commit adjusts this by only allocating a float array of 20 * dimensions and averaging the discovered quantiles from there.

Why does this work?

- Quantiles based on confidence intervals are (generally) unbiased, and averaging them gives statistically good results.
- The selector algorithm scales linearly, so the cost is just about the same.
- We need to do more than `1` vector at a time to prevent extreme confidence intervals interacting strangely with edge cases.
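For illustration, here is a minimal sketch of that chunk-and-average approach in plain Java. The names (`ChunkedQuantileSketch`, `estimateQuantiles`, `CHUNK_VECTORS`) are made up for this example and are not Lucene's API; the real implementation lives in `ScalarQuantizer` and uses a linear-time selector where this sketch simply sorts.

```java
import java.util.Arrays;

/** Illustrative sketch only; not Lucene's ScalarQuantizer API. */
public final class ChunkedQuantileSketch {

  /** Vectors per scratch buffer, matching the PR's 20 * dimensions sizing. */
  private static final int CHUNK_VECTORS = 20;

  /**
   * Returns {lowerQuantile, upperQuantile}, averaged over fixed-size chunks.
   * Assumes vectors.length >= CHUNK_VECTORS; a trailing partial chunk is skipped.
   */
  static float[] estimateQuantiles(float[][] vectors, float confidenceInterval) {
    int dims = vectors[0].length;
    // One small, reusable buffer instead of a single vectors.length * dims array.
    float[] scratch = new float[CHUNK_VECTORS * dims];
    double lowerSum = 0, upperSum = 0;
    int chunks = 0;
    for (int start = 0; start + CHUNK_VECTORS <= vectors.length; start += CHUNK_VECTORS) {
      // Copy the next 20 vectors into the scratch buffer.
      for (int i = 0; i < CHUNK_VECTORS; i++) {
        System.arraycopy(vectors[start + i], 0, scratch, i * dims, dims);
      }
      // Find the values at the confidence-interval cut points; e.g. ci = 0.9
      // keeps the middle 90% of sampled values. A full sort stands in for the
      // linear-time selector used in Lucene.
      Arrays.sort(scratch);
      int lowIndex = Math.round((1f - confidenceInterval) / 2f * (scratch.length - 1));
      int highIndex = scratch.length - 1 - lowIndex;
      lowerSum += scratch[lowIndex];
      upperSum += scratch[highIndex];
      chunks++;
    }
    // Each chunk's quantile estimate is (roughly) unbiased, so the mean of the
    // per-chunk estimates stands in for the quantiles of the full data set.
    return new float[] {(float) (lowerSum / chunks), (float) (upperSum / chunks)};
  }
}
```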
The initial release of scalar quantization would periodically create a humongous allocation, which can put unwarranted pressure on the GC and on heap usage as a whole.

This commit adjusts this by only allocating a float array of 20 * dimensions and averaging the discovered quantiles from there (a rough sizing example follows below).

Why does this work?

- Quantiles based on confidence intervals are (generally) unbiased, and averaging them gives statistically good results.
- The selector algorithm scales linearly, so the cost is just about the same.
- We need to do more than `1` vector at a time to prevent extreme confidence intervals interacting strangely with edge cases.
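As a rough illustration of the allocation sizes involved (the vector count and dimensionality below are made-up numbers for the example, not figures from this benchmark):

```java
// Hypothetical sizing, assuming 1M vectors of 768 dims (4-byte floats):
long fullCopy = 1_000_000L * 768 * Float.BYTES; // ~3 GB in a single array
long scratch  = 20L * 768 * Float.BYTES;        // ~61 KB fixed-size buffer
```

With G1, a single allocation larger than half a region size is treated as humongous and handled outside the normal young-generation path, which is the GC pressure the title refers to.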
I benchmarked this over 500k vectors.

[Charts: candidate vs. baseline, 500k vectors]

And over 100k vectors:

[Charts: candidate vs. baseline, 100k vectors]
There does seem to be a slight increase in merge time (these are single-threaded numbers) and a slight change in recall. But to me these seem acceptable, given we are no longer allocating a ginormous array.