Add a MemorySegment Vector scorer - for scoring without copying on-heap #13339

ChrisHegarty · 2024-05-02T19:48:37Z

Add a MemorySegment Vector scorer - for scoring without copying on-heap.

The vector scorer loads values directly from the backing memory segment when available. Otherwise, if the vector data spans across segments, or is a query vector, the scorer copies the vector data on-heap.
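
To make the direct-versus-copy decision concrete, here is a minimal sketch of the idea. It is a simplified model, not the code in this PR: plain `byte[]` chunks stand in for memory segments, and `ChunkedVectorReader`, `View`, and `vector` are hypothetical names chosen for illustration.

```java
/** Simplified model of a file mapped as fixed-size chunks (stand-ins for memory segments). */
final class ChunkedVectorReader {
  /** Where to read one vector from: backing array, start offset, and whether it was copied. */
  static final class View {
    final byte[] array;
    final int offset;
    final boolean copied;

    View(byte[] array, int offset, boolean copied) {
      this.array = array;
      this.offset = offset;
      this.copied = copied;
    }
  }

  private final byte[][] chunks;
  private final int chunkSize;
  private final int vectorByteSize;

  ChunkedVectorReader(byte[][] chunks, int chunkSize, int vectorByteSize) {
    this.chunks = chunks;
    this.chunkSize = chunkSize;
    this.vectorByteSize = vectorByteSize;
  }

  /** Returns a direct view when the vector sits inside one chunk, else copies it into scratch. */
  View vector(int ord, byte[] scratch) {
    long start = (long) ord * vectorByteSize; // long math, so large files do not overflow
    int chunkIndex = (int) (start / chunkSize);
    int offsetInChunk = (int) (start % chunkSize);
    if (offsetInChunk + vectorByteSize <= chunkSize) {
      // Fast path: the whole vector lives in one chunk, so no copy is needed.
      return new View(chunks[chunkIndex], offsetInChunk, false);
    }
    // Slow path: the vector spans chunks, so gather the pieces into the scratch buffer.
    int copied = 0;
    while (copied < vectorByteSize) {
      int n = Math.min(vectorByteSize - copied, chunkSize - offsetInChunk);
      System.arraycopy(chunks[chunkIndex], offsetInChunk, scratch, copied, n);
      copied += n;
      chunkIndex++;
      offsetInChunk = 0;
    }
    return new View(scratch, 0, true);
  }
}
```

With 4-byte chunks and 3-byte vectors, ordinal 0 is served directly from the first chunk, while ordinal 1 straddles the first two chunks and is copied into `scratch`.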

A benchmark shows ~2x performance improvement of this scorer over the default copy-on-heap scorer. The benchmark needs a little more scrutiny and evaluation on different platforms. Here are the results on a Max M2:

Benchmark                                      (size)   Mode  Cnt  Score   Error   Units
VectorScorerBenchmark.binaryDotProductDefault    1024  thrpt    5  1.391 ± 0.016  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg     1024  thrpt    5  3.013 ± 0.207  ops/us
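
For reference, the "~2x" figure can be checked from the two scores above. The snippet below is only a back-of-the-envelope calculation. `SpeedupEstimate` is a hypothetical helper, and combining the Error columns as worst-case bounds is a crude assumption rather than a proper statistical treatment.

```java
/** Back-of-the-envelope speedup from the two throughput scores in the table. */
final class SpeedupEstimate {
  /** Ratio of the optimized throughput to the baseline throughput. */
  static double ratio(double optimized, double baseline) {
    return optimized / baseline;
  }

  public static void main(String[] args) {
    double def = 1.391, defErr = 0.016;       // binaryDotProductDefault, ops/us
    double memSeg = 3.013, memSegErr = 0.207; // binaryDotProductMemSeg, ops/us

    System.out.printf("point estimate: %.2fx%n", ratio(memSeg, def));
    // Crude bounds: pair the worst case of one run with the best case of the other.
    System.out.printf("conservative:   %.2fx%n", ratio(memSeg - memSegErr, def + defErr));
    System.out.printf("optimistic:     %.2fx%n", ratio(memSeg + memSegErr, def - defErr));
  }
}
```

The point estimate comes out at about 2.17x, which is consistent with the claim.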

The scorer currently only operates on vectors with an element size of byte, since loading vector data from float[] (the fallback) is only supported in JDK 22. We can evaluate if and how to support floats separately. See https://bugs.openjdk.org/browse/JDK-8318678

The vector scorer is implicitly tied to the Panama Vector Utils implementation - you can only have a Memory segment scorer if the Panama vector implementation is present. There is a little room for improvement in how these things are initialised and structured.

lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java

...c/java21/org/apache/lucene/internal/vectorization/MemorySegmentByteVectorScorerSupplier.java

ChrisHegarty · 2024-05-02T19:53:53Z

...c/java21/org/apache/lucene/internal/vectorization/MemorySegmentByteVectorScorerSupplier.java

+
+  protected final MemorySegment getSegment(int ord, byte[] scratch) throws IOException {
+    checkOrdinal(ord, maxOrd);
+    int byteOffset = ord * vectorByteSize; // TODO: random + meta size


I wanna generalise this so we can use it for scalar quantised too - so a vector byte size + N bytes

While I don't foresee us doing this, it is conceivable that the flat vector storage will change.

I think we have adequate test coverage to catch such a weird behavior, but maybe we need to change the name or add a comment reflecting that it relies on how Lucene95 stored the flat vectors?
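
One way to picture the generalisation discussed in this thread is an offset calculation with a per-vector stride of "vector byte size + N bytes". This is only a sketch under an assumed layout, where the extra bytes are stored contiguously after each vector. `FlatVectorLayoutSketch` and `extraBytesPerVector` are hypothetical names, not part of the PR.

```java
/** Hypothetical flat layout: each entry is vectorByteSize bytes followed by N extra bytes. */
final class FlatVectorLayoutSketch {
  /** Byte offset of the vector with the given ordinal, computed in long arithmetic. */
  static long byteOffset(int ord, int vectorByteSize, int extraBytesPerVector) {
    if (ord < 0) {
      throw new IndexOutOfBoundsException("negative ordinal: " + ord);
    }
    // The stride is the full on-disk entry size, not just the vector itself.
    long stride = (long) vectorByteSize + extraBytesPerVector;
    return ord * stride;
  }
}
```

With `extraBytesPerVector == 0` this reduces to the plain `ord * vectorByteSize` case. Any change to the flat storage layout would need a matching change to the stride, which is the coupling the comment above is worried about.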

uschindler

I don't like the additional Provider interface. There should only be one instantiating all implementations that can be vectorized.
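
A rough sketch of the "single provider" shape suggested here: one lookup point that hands out every implementation that can be vectorized, so callers never mix a default utility with an optimized scorer. All names are hypothetical and this is not Lucene's VectorizationProvider API. The unrolled loop is just a stand-in for a real SIMD implementation.

```java
/** Hypothetical scorer abstraction over two byte vectors. */
interface ByteVectorScorerSketch {
  int dotProduct(byte[] a, byte[] b);

  String name();
}

/** Single entry point that creates every vectorizable implementation (hypothetical). */
abstract class ProviderSketch {
  abstract ByteVectorScorerSketch byteScorer();

  /** The flag stands in for real runtime feature detection. */
  static ProviderSketch lookup(boolean optimizedAvailable) {
    return optimizedAvailable ? new OptimizedProviderSketch() : new DefaultProviderSketch();
  }
}

final class DefaultProviderSketch extends ProviderSketch {
  @Override
  ByteVectorScorerSketch byteScorer() {
    return new ByteVectorScorerSketch() {
      @Override
      public int dotProduct(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
          sum += a[i] * b[i];
        }
        return sum;
      }

      @Override
      public String name() {
        return "default";
      }
    };
  }
}

final class OptimizedProviderSketch extends ProviderSketch {
  @Override
  ByteVectorScorerSketch byteScorer() {
    return new ByteVectorScorerSketch() {
      @Override
      public int dotProduct(byte[] a, byte[] b) {
        // Stand-in for SIMD: four independent accumulators, then a scalar tail.
        int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int i = 0;
        int bound = a.length & ~3;
        for (; i < bound; i += 4) {
          s0 += a[i] * b[i];
          s1 += a[i + 1] * b[i + 1];
          s2 += a[i + 2] * b[i + 2];
          s3 += a[i + 3] * b[i + 3];
        }
        int sum = s0 + s1 + s2 + s3;
        for (; i < a.length; i++) {
          sum += a[i] * b[i];
        }
        return sum;
      }

      @Override
      public String name() {
        return "optimized";
      }
    };
  }
}
```

Both providers must return the same scores. Only the implementation strategy differs, and the choice is made once in `lookup`.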

lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java

lucene/core/src/java/org/apache/lucene/codecs/hnsw/FlatVectorScorerProvider.java

lucene/core/src/test/org/apache/lucene/search/TestKnnByteVectorQueryMMap.java

msokolov · 2024-05-03T14:59:49Z

So excited to see this finally come to fruition! No more double-buffering!

uschindler

Looks much better now. I will have a closer look later, so not yet a +1

uschindler · 2024-05-03T15:07:25Z

How can I change the review to "undecided"?

ChrisHegarty · 2024-05-03T15:08:55Z

How can I change the review to "undecided"?

I re-requested ur review - so there is no official reviewer yet. Take ur time. I have some luceneutil benchmarks to run, etc.

uschindler · 2024-05-03T15:10:28Z

How can I change the review to "undecided"?

I re-requested ur review - so there is no official reviewer yet. Take ur time. I have some luceneutil benchmarks to run, etc.

The benchmark code seems broken after your changes.

...c/java21/org/apache/lucene/internal/vectorization/MemorySegmentByteVectorScorerSupplier.java

Dismissing Uwe's review, since he is undecided. Can be explicitly added later, when we convince him ;-)

ChrisHegarty · 2024-05-03T16:01:43Z

Dismissing Uwe's review, since he is undecided. Can be explicitly added later, when we convince him ;-)

(Oh, this looks harsh!) I hope that I did this right. If not, I apologise. No offence intended. I just want to reflect @uschindler's comment above about being currently undecided.

ChrisHegarty · 2024-05-17T11:02:06Z

lucene/core/src/test/org/apache/lucene/internal/vectorization/TestVectorScorer.java

+  }
+
+  // Tests with a large amount of data (> 2GB), which ensures that data offsets do not overflow
+  @Nightly


This test creates a big file, but it's currently the only way to test that the offset calculations are correctly handled without overflow (if implemented using int)!
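
To illustrate the failure mode this test guards against, here is a small standalone sketch. `OffsetOverflowDemo` is a hypothetical class and the numbers are made up for illustration. Multiplying an ordinal by the vector size in `int` wraps around once the product passes 2^31 - 1, while widening to `long` first gives the right offset.

```java
/** Shows how an int offset calculation wraps for data beyond 2GB. */
final class OffsetOverflowDemo {
  /** Buggy variant: the multiplication happens in int and silently wraps. */
  static int intOffset(int ord, int vectorByteSize) {
    return ord * vectorByteSize;
  }

  /** Safe variant: widen to long before multiplying. */
  static long longOffset(int ord, int vectorByteSize) {
    return (long) ord * vectorByteSize;
  }

  /** Reports whether the int multiplication would overflow. */
  static boolean overflowsInt(int ord, int vectorByteSize) {
    try {
      Math.multiplyExact(ord, vectorByteSize);
      return false;
    } catch (ArithmeticException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    int ord = 3_000_000;       // three million vectors...
    int vectorByteSize = 1024; // ...of 1 KiB each is roughly 3 GB of data
    System.out.println("int offset:  " + intOffset(ord, vectorByteSize));
    System.out.println("long offset: " + longOffset(ord, vectorByteSize));
    System.out.println("overflows:   " + overflowsInt(ord, vectorByteSize));
  }
}
```

Because the wrap is silent, only a test with real data past the 2GB mark (or an explicit check such as `Math.multiplyExact`) will catch it.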

uschindler · 2024-05-18T17:00:57Z

Back from vacation. Will look into that till Tuesday!

uschindler

To me this looks fine. Very clean.
It's not easy to understand, but at some point we may change this to be in core classes and we can get rid of all those wrappers.

One thing to think about: I had another idea, which may also help to allow direct use of our vectors from NIOFSDirectory and ByteBuffersDirectory (used by NRT):

How about changing from MemorySegment to ByteBuffer views? They can also be used by the vectorization code.

This would also allow us to have the interface we need in the main classes, rather than requiring it to be in the Java 21 sourceSet. In addition, ByteBuffersDirectory and NIOFSDirectory could directly return ByteBuffer views, too.

We may add a method like getByteBufferSlice(....).

What do you think?
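
For illustration only, a method along the lines of the suggested getByteBufferSlice(....) could look like the sketch below. The signature, the class name `ByteBufferSliceSketch`, and the read-only choice are all assumptions. The sketch says nothing about how such views perform when handed to the vectorization code.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper returning a zero-copy, read-only view over part of a buffer. */
final class ByteBufferSliceSketch {
  static ByteBuffer getByteBufferSlice(ByteBuffer backing, int offset, int length) {
    // duplicate() shares the content but has its own position and limit,
    // so the caller's buffer state is left untouched.
    ByteBuffer view = backing.duplicate();
    view.clear(); // position = 0, limit = capacity (does not erase any data)
    view.limit(offset + length);
    view.position(offset);
    // slice() creates a buffer whose index 0 is the current position.
    return view.slice().asReadOnlyBuffer();
  }
}
```

The returned buffer shares storage with `backing`, so no vector bytes are copied. Writes made through `backing` remain visible through the view.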

...ne/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorizationProvider.java

lucene/core/src/java/org/apache/lucene/internal/vectorization/DefaultVectorizationProvider.java

lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java

lucene/core/src/java/org/apache/lucene/codecs/hnsw/FlatVectorScorerUtil.java

ChrisHegarty · 2024-05-21T10:46:41Z

We may add a method like getByteBufferSlice(....).

I experimented locally with similar before, and the performance impact when converting to/from MemorySegment was horrible. I don't disagree that for NIO/BB dir implementations this could be useful, but it would not intersect with the memory segment implementation, since it performs horribly (when converting from BB to MS, and vice versa).

uschindler · 2024-05-21T10:53:56Z

We may add a method like getByteBufferSlice(....).

I experimented locally with similar before, and the performance impact when converting to/from MemorySegment was horrible. I don't disagree that for NIO/BB dir implementations this could be useful, but it would not intersect with the memory segment implementation, since it performs horribly (when converting from BB to MS, and vice versa).

OK, this is not good. From my code review it looked like the DirectByteBuffer also only has an address, and therefore the whole thing should be identical in performance (except that we have to get a MemorySegment slice first and then call toByteBuffer()). So this is interesting, but then we can't do anything.

Maybe there are some optimizations missing in the ByteBuffer support of the Vector API.

So looks fine then. Working on a final review.

uschindler

One more question: How can we migrate to a later index format version? Because we have the MemorySegmentFlatVectorScorer (which has no knowledge about version) and this one returns the Lucene99MemorySegment* classes.

So in my opinion this is only solvable if we rename also the MemorySegmentFlatVectorScorer to Lucene99MemorySegmentFlatVectorScorer and also adapt the getter name.

Apart from this inconsistency, it looks fine. A bit hairy, but if performance is high priority, I accept this as a step towards getting rid of IndexInput totally.

...core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentFlatVectorsScorer.java

uschindler

now all fine.

…ap (apache#13339) Add a MemorySegment Vector scorer - for scoring without copying on-heap. The vector scorer loads values directly from the backing memory segment when available. Otherwise, if the vector data spans across segments the scorer copies the vector data on-heap. A benchmark shows ~2x performance improvement of this scorer over the default copy-on-heap scorer. The scorer currently only operates on vectors with an element size of byte. We can evaluate if and how to support floats separately.

uschindler · 2024-05-21T17:12:07Z

Looks like first Java 22 build also worked fine, so no API incompatibilities in JDK (foreign preview vs final): https://jenkins.thetaphi.de/job/Lucene-main-Linux/48322/consoleText

Add a MemorySegment Vector scorer - for scoring without copying on-heap

1f64bad

ChrisHegarty added the vector-based-search label May 2, 2024

ChrisHegarty commented May 2, 2024

lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java

ChrisHegarty commented May 2, 2024

...c/java21/org/apache/lucene/internal/vectorization/MemorySegmentByteVectorScorerSupplier.java

ChrisHegarty commented May 2, 2024

ChrisHegarty added 3 commits May 3, 2024 12:22

refactoring

8313c88

restore

8c6ab61

renames and cleanup

89aa9a2

ChrisHegarty force-pushed the msscorer branch from fd47ee5 to 89aa9a2 Compare May 3, 2024 11:36

Merge branch 'main' into msscorer

e9c24a0

ChrisHegarty requested review from uschindler and benwtrent May 3, 2024 12:59

uschindler requested changes May 3, 2024

lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java

lucene/core/src/java/org/apache/lucene/codecs/hnsw/FlatVectorScorerProvider.java

ChrisHegarty added 2 commits May 3, 2024 15:50

move creation to VectorizationProvider - much nicer!

ede3dfe

Merge remote-tracking branch 'origin/msscorer' into msscorer

86c47b2

ChrisHegarty commented May 3, 2024

lucene/core/src/test/org/apache/lucene/search/TestKnnByteVectorQueryMMap.java

uschindler reviewed May 3, 2024

uschindler previously approved these changes May 3, 2024

ChrisHegarty requested a review from uschindler May 3, 2024 15:07

uschindler reviewed May 3, 2024

...c/java21/org/apache/lucene/internal/vectorization/MemorySegmentByteVectorScorerSupplier.java

ChrisHegarty added 2 commits May 3, 2024 16:33

fix benchmark

2f6a9e2

unused import

c6ef6ea

ChrisHegarty requested a review from uschindler May 3, 2024 15:56

MemorySegmentAccessInput refactor

7a1faa1

ChrisHegarty added 3 commits May 12, 2024 19:22

fix license header

9edb423

Merge branch 'main' into msscorer

92dfdb2

clean up and more tests

244352e

ChrisHegarty commented May 17, 2024

ChrisHegarty added 3 commits May 21, 2024 09:33

Merge remote-tracking branch 'upstream/main' into msscorer

9742e1e

test copies in threads do not interfere with each other

2a7096e

fix compilation

e018da1

uschindler reviewed May 21, 2024

static instance

a743907

uschindler reviewed May 21, 2024

lucene/core/src/java/org/apache/lucene/codecs/hnsw/FlatVectorScorerUtil.java

new -> get

c8c70ee

one more INSTANCE

b5a3f45

uschindler reviewed May 21, 2024

...core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentFlatVectorsScorer.java

...core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentFlatVectorsScorer.java

make private

ad271f3

uschindler approved these changes May 21, 2024

ChrisHegarty added 4 commits May 21, 2024 15:55

add lucene99

c42c9a1

fix toString

e6cac8b

Merge remote-tracking branch 'upstream/main' into msscorer

d9bba27

tidy

80229fb

uschindler approved these changes May 21, 2024

ChrisHegarty merged commit 05f04aa into apache:main May 21, 2024
3 checks passed

ChrisHegarty deleted the msscorer branch May 21, 2024 16:34

ChrisHegarty mentioned this pull request May 21, 2024

[9.x] Add a MemorySegment Vector scorer - for scoring without copying on-heap #13402

Merged

ChrisHegarty commented May 2, 2024 •

edited

ChrisHegarty May 2, 2024

benwtrent May 8, 2024

uschindler left a comment

msokolov commented May 3, 2024

uschindler left a comment

uschindler commented May 3, 2024

ChrisHegarty commented May 3, 2024

uschindler commented May 3, 2024

ChrisHegarty commented May 3, 2024

ChrisHegarty May 17, 2024

uschindler commented May 18, 2024

uschindler left a comment

ChrisHegarty commented May 21, 2024 •

edited

uschindler commented May 21, 2024

uschindler left a comment

uschindler left a comment

uschindler commented May 21, 2024
