LUCENE-10577: enable quantization of HNSW vectors to 8 bits #947
Conversation
@@ -129,6 +128,12 @@ public void writeField(FieldInfo fieldInfo, KnnVectorsReader knnVectorsReader)
    try {
      // write the vector data to a temporary file
      DocsWithFieldSet docsWithField = writeVectorData(tempVectorData, vectors);
      int byteSize;
Oops, some left-over change that came in while rebasing; I'll remove it.
If folks have time to review, that would be great. The main thing to focus on, I think, is
@@ -41,11 +42,11 @@ abstract class OffHeapVectorValues extends VectorValues
    protected final int byteSize;
    protected final float[] value;

-   OffHeapVectorValues(int dimension, int size, IndexInput slice) {
+   OffHeapVectorValues(int dimension, int size, IndexInput slice, int byteSize) {
For the DOT_PRODUCT8 case, should the vectorValue(..) functions use slice.readBytes instead of slice.readFloats, and then convert the bytes to floats? Otherwise, it looks to me like we are reading wrong values.
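A minimal, self-contained sketch of the suggested fix, using java.nio.ByteBuffer as a stand-in for IndexInput (that substitution is an assumption here; the real code would call slice.readBytes): read the raw signed bytes, then widen each one to float.

```java
import java.nio.ByteBuffer;

// Sketch only: ByteBuffer stands in for IndexInput. Read the raw signed
// bytes of one vector, then widen each byte to float.
class ByteVectorDecoder {
  static float[] decode(ByteBuffer slice, int dimension) {
    byte[] raw = new byte[dimension];
    slice.get(raw); // analogous to slice.readBytes(raw, 0, dimension)
    float[] value = new float[dimension];
    for (int i = 0; i < dimension; i++) {
      value[i] = raw[i]; // widening conversion preserves the signed value
    }
    return value;
  }
}
```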
@msokolov Thanks for your work, this is a very exciting feature. Really looking forward to it. Extra benefit – less disk space used for vector values. Overall, this PR looks very good to me. I just have several questions.
I'm looking to address various comments; just pushed a commit that makes the vector encoding explicit by adding a new enum and parameter "vectorEncoding", splitting this out from "similarityFunction".
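A hypothetical sketch of the kind of enum described above (the field name and constants here are assumptions, not the actual Lucene API): the encoding carries its own per-dimension byte size, kept separate from the similarity function.

```java
// Sketch: an encoding enum split out from the similarity function, so a
// field can declare how its vector components are stored on disk.
enum VectorEncoding {
  BYTE(1),    // one signed byte per dimension
  FLOAT32(4); // Float.BYTES per dimension

  final int byteSize;

  VectorEncoding(int byteSize) {
    this.byteSize = byteSize;
  }
}
```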
Oh good catch, @mayya-sharipova I will look into addressing this.
I don't see how to do this efficiently (without many conversions from byte to float) and neatly (without code duplication in tricky algorithmic areas) and with complete API purity, so I sacrificed some purity. If you have any ideas how to do it better, I'm open to changing it though.
Also, if anybody has advice about how to rebase while maintaining this PR, I'd be interested. Should I
In fact, after checking, I don't think we are doing this expand/compress step even though getVectorValues() returns
OK, this last round of commits moves the new vector encoding parameter out of IndexableField and FieldInfo into the Codec constructor and, internally to the codec, into FieldEntry. It certainly has less visible surface area now. I also merged from main and resolved a bunch of conflicts with the scoring change. I think it is correct (all the unit tests pass), but it wasn't trivial, and I think it would be worth running some integration/performance tests just to make sure all is still well. There's a little bit of code duplication in HnswGraphSearcher, where we now have the logic for switching from approximate to exact knn in two places, which I don't like. Maybe that can be factored better?
This last commit moves "exact" KNN search from
I pushed an updated luceneutil PR adapting to these changes: mikemccand/luceneutil#181. Running that perf test, I saw consistent gains (20-55% depending on the test case) as compared to the earlier test runs. I also noticed that the profiler shows the most expensive function during indexing is FixedBitSet.clear(), which makes me think we might want to use sparse bitsets for the "upper" layers of the graph, which have many fewer nodes.
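To illustrate the sparse-visited-set idea (this is a hypothetical stand-in, not code from this PR): a hash-backed visited set costs time proportional to the nodes actually visited when reset between searches, instead of the O(#nodes) cost of FixedBitSet.clear() seen in the profile, which should pay off on upper graph layers that hold far fewer nodes.

```java
import java.util.HashSet;

// Sketch: a sparse visited set for graph search. Clearing is proportional
// to the number of nodes visited, not to the total node count.
class SparseVisitedSet {
  private final HashSet<Integer> visited = new HashSet<>();

  /** Returns true if the node was already visited; marks it visited either way. */
  boolean getAndSet(int node) {
    return !visited.add(node);
  }

  void clear() {
    visited.clear(); // O(#visited), unlike a dense bitset's O(#nodes)
  }
}
```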
I think the title of the PR is wrong? We shouldn't be quantizing anything. The user should be supplying a
switch (fieldEntry.vectorEncoding) {
  case BYTE -> numBytes = (long) fieldEntry.size() * dimension;
  case FLOAT32 -> numBytes = (long) fieldEntry.size() * dimension * Float.BYTES;
  default -> throw new AssertionError("unknown vector encoding " + fieldEntry.vectorEncoding);
Should we also update the error message that follows for the BYTE case?
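One way the follow-up length check could be made encoding-aware (the names and message wording here are assumptions, sketched for illustration): compute the expected length from a per-dimension byte size, so the BYTE case no longer reports a FLOAT32-based expectation.

```java
// Sketch: validate the on-disk vector data length against the expected
// size for the active encoding, reporting bytes-per-dimension in the
// message so the BYTE case produces an accurate error.
class VectorDataCheck {
  static void check(long actualBytes, long size, int dimension, int bytesPerDim) {
    long expected = size * dimension * bytesPerDim;
    if (actualBytes != expected) {
      throw new IllegalStateException(
          "Vector data length " + actualBytes + " does not match size (" + size
              + ") * dim (" + dimension + ") * bytesPerDim (" + bytesPerDim
              + ") = " + expected);
    }
  }
}
```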
  total += a.bytes[aOffset++] * b.bytes[bOffset++];
}
// divide by 2 * 2^14 (maximum absolute value of product of 2 signed bytes) * len
return (1 + total) / (float) (len * (1 << 15));
To make scores non-negative, should we instead do: `total / (float) (len * (1 << 15)) + 1`?
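A self-contained sketch of that normalization (a stand-in, not the PR's code): each per-component product of two signed bytes lies in [-2^14, 2^14], so `total / (len * 2^15)` lies in [-0.5, 0.5], and adding 1 after the division shifts every score into the non-negative range [0.5, 1.5].

```java
// Sketch: dot-product similarity over signed-byte vectors, normalized into
// [0.5, 1.5] by dividing first and adding 1 afterwards.
class ByteDotProduct {
  static float dotProductScore(byte[] a, byte[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] * b[i]; // each product is in [-2^14, 2^14]
    }
    // total / (len * 2^15) is in [-0.5, 0.5]; + 1 keeps scores non-negative
    return total / (float) (a.length * (1 << 15)) + 1;
  }
}
```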
protected static TopDocs exhaustiveSearch(
    VectorValues vectorValues,
    DocIdSetIterator acceptDocs,
    VectorSimilarityFunction similarityFunction,
Looks like similarityFunction is not necessary here, as we always use dotProductScore.
This is PR #3 for this feature. It is very close to the previous one, just "rebased" on top of the Lucene93 Codec. In this PR I moved the new vector utility methods to be package private in util.hnsw so they would be easier to change in the future. I did not attempt any loop-unrolling optimizations. I have tried some incubating vector API implementations, but nothing is ready to share. I re-ran the luceneutil tests, and will open a separate PR for adding support for this format to luceneutil. Results continue to look promising.