
Prevent humongous allocations when calculating scalar quantiles #13090

Merged

Conversation

benwtrent (Member) commented:

The initial release of scalar quantization would periodically create a humongous allocation, which can put unwarranted pressure on the GC and on heap usage as a whole.

This commit addresses that by allocating only a float array of 20 * dimensions at a time and averaging the quantiles discovered from each chunk (see the sketch after the list below).

Why does this work?

  • Quantiles based on confidence intervals are (generally) unbiased estimators, so averaging them gives statistically sound results
  • The selector algorithm scales linearly, so the total cost is about the same
  • We need to process more than one vector at a time to prevent extreme confidence intervals from interacting badly with edge cases
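
Below is a minimal, self-contained sketch of the chunk-and-average idea described above. The class and method names (`ChunkedQuantileSketch`, `estimateQuantiles`, `chunkQuantiles`) are illustrative assumptions rather than Lucene's actual `ScalarQuantizer` code, and a full per-chunk sort stands in for the linear-time selector the real implementation uses.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only -- not Lucene's ScalarQuantizer implementation. */
public class ChunkedQuantileSketch {

  /** Number of vectors copied into the reusable buffer per chunk (20 * dims floats). */
  private static final int VECTORS_PER_CHUNK = 20;

  /** A lower/upper quantile pair. */
  record Quantiles(float lower, float upper) {}

  /**
   * Estimates quantiles without materializing one giant array: vectors are copied into a
   * fixed float[VECTORS_PER_CHUNK * dims] buffer, quantiles are computed per chunk, and the
   * per-chunk results are averaged.
   */
  static Quantiles estimateQuantiles(List<float[]> vectors, int dims, float confidenceInterval) {
    float[] buffer = new float[VECTORS_PER_CHUNK * dims];
    double lowerSum = 0, upperSum = 0;
    int chunks = 0;
    int filled = 0;
    for (float[] vector : vectors) {
      System.arraycopy(vector, 0, buffer, filled * dims, dims);
      if (++filled == VECTORS_PER_CHUNK) {
        Quantiles q = chunkQuantiles(buffer, buffer.length, confidenceInterval);
        lowerSum += q.lower();
        upperSum += q.upper();
        chunks++;
        filled = 0;
      }
    }
    if (filled > 0) { // trailing partial chunk
      Quantiles q = chunkQuantiles(buffer, filled * dims, confidenceInterval);
      lowerSum += q.lower();
      upperSum += q.upper();
      chunks++;
    }
    if (chunks == 0) {
      throw new IllegalArgumentException("no vectors to quantize");
    }
    return new Quantiles((float) (lowerSum / chunks), (float) (upperSum / chunks));
  }

  /**
   * Quantiles of a single chunk for the given confidence interval. Sorting is used here for
   * brevity; a linear-time selector keeps the overall cost roughly unchanged in practice.
   */
  static Quantiles chunkQuantiles(float[] values, int len, float confidenceInterval) {
    float[] copy = Arrays.copyOf(values, len);
    Arrays.sort(copy);
    int tail = (int) ((1f - confidenceInterval) * 0.5f * len);
    return new Quantiles(copy[tail], copy[len - 1 - tail]);
  }
}
```

Because each chunk yields an (approximately) unbiased quantile estimate, averaging them converges on the dataset's quantiles while the working buffer stays fixed at 20 * dimensions floats.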

I benchmarked this over 500k vectors.

candidate

Force merge done in: 691533 ms
0.817	 0.04	500000	0	16	250	2343	596410	1.00	post-filter

baseline

Force merge done in: 685618 ms
0.818	 0.04	500000	0	16	250	2346	582242	1.00	post-filter

100k vectors
candidate

0.855	 0.03	100000	0	16	250	2207	144173	1.00	post-filter

baseline

0.858	 0.03	100000	0	16	250	2205	141578	1.00	post-filter

There does seem to be a slight increase in merge time (these are single-threaded numbers) and a slight change in recall.

But to me, these seem acceptable given that we are no longer allocating a ginormous array.

@mayya-sharipova (Contributor) left a comment:


@benwtrent Thanks, the Lucene implementation LGTM as long as we are ok with the math and decreased recall that it brings.

@tveasey left a comment:


LGTM

benwtrent merged commit 7da509b into apache:main on Feb 8, 2024
4 checks passed
benwtrent deleted the feature/scalar-quantization-optimization branch on February 8, 2024 at 20:56
benwtrent added a commit that referenced this pull request Feb 8, 2024