Enhancement 11236 lazy compute similarity score #12480

Merged

Conversation

@Jackyrie2 (Contributor) commented Aug 1, 2023

Description

This is an update to the previous PR #12371; a few issues were found:

  • the ordinal of newNode was used as the index into the scoringContext map, instead of using size as the index
  • the entire scoringContext was evaluated during the call to sortInternal, instead of just checking whether the new index being sorted has a pre-computed score

Two new unit tests were added to demonstrate the bugs; this PR fixes both issues. A sketch of the corrected bookkeeping follows.
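A minimal sketch of the corrected bookkeeping, with hypothetical names rather than the exact PR code (ScoringFunction, addDeferred, and the field layout are assumptions based on the review discussion below):

import java.util.HashMap;
import java.util.Map;

class NeighborArraySketch {
  interface ScoringFunction {
    float calculateScore();
  }

  final int[] node;
  final float[] score;
  // Maps the *slot index* in node[]/score[] to a deferred scoring function.
  final Map<Integer, ScoringFunction> scoringContext = new HashMap<>();
  int size;

  NeighborArraySketch(int maxSize) {
    node = new int[maxSize];
    score = new float[maxSize];
  }

  void addDeferred(int newNode, ScoringFunction scoringFunction) {
    node[size] = newNode;
    // First fix: key on size (the slot this node lands in), not on newNode's
    // ordinal, because the sort later looks pending scores up by array position.
    scoringContext.put(size, scoringFunction);
    size++;
  }
}

The second fix is the mirror image: sortInternal now checks only the slot currently being sorted against scoringContext instead of evaluating the whole map.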

Benchmarking Result

To measure any meaningful latency improvement, we first have to create one big index and several small ones; then, once we force an index merge, the index writer will invoke HnswGraphBuilder.initializeFromGraph. KnnGraphTester was modified as follows (a sketch of the sequence appears after the list):

  1. add the first 90% of the documents
  2. iw.commit()
  3. forceMerge into 1 segment
  4. set the merge policy to NoMergePolicy and add the rest of the documents
  5. set the merge policy back to LogDocMergePolicy
  6. forceMerge into 1 segment again <- this step is specifically captured in the benchmark, and I have verified in the logs that initializeFromGraph is called exactly once during it
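A rough sketch of that sequence with stock Lucene APIs (the addDocs helper, the field name, and the numDocs split are placeholders; the real changes live in KnnGraphTester):

import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnFloatVectorField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogDocMergePolicy;
import org.apache.lucene.index.NoMergePolicy;
import org.apache.lucene.store.FSDirectory;

static void buildAndMerge(int numDocs) throws Exception {
  int split = (int) (numDocs * 0.9);
  try (FSDirectory dir = FSDirectory.open(Paths.get("knn-index"))) {
    try (IndexWriter iw = new IndexWriter(dir, new IndexWriterConfig())) {
      addDocs(iw, 0, split); // 1. first 90% of the documents
      iw.commit();           // 2.
      iw.forceMerge(1);      // 3. one big segment holding one big HNSW graph
    }
    // 4. NoMergePolicy: the remaining docs stay in fresh, small segments.
    try (IndexWriter iw =
        new IndexWriter(dir, new IndexWriterConfig().setMergePolicy(NoMergePolicy.INSTANCE))) {
      addDocs(iw, split, numDocs);
    }
    // 5 + 6. Back to LogDocMergePolicy; this forced merge is the measured step,
    // in which the writer calls HnswGraphBuilder.initializeFromGraph once.
    try (IndexWriter iw =
        new IndexWriter(dir, new IndexWriterConfig().setMergePolicy(new LogDocMergePolicy()))) {
      iw.forceMerge(1);
    }
  }
}

// Placeholder: the real tester indexes documents carrying KNN vector fields.
static void addDocs(IndexWriter iw, int from, int to) throws Exception {
  for (int i = from; i < to; i++) {
    Document doc = new Document();
    doc.add(new KnnFloatVectorField("vector", new float[768])); // dummy vector
    iw.addDocument(doc);
  }
}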

From the benchmark results, we are computing significantly fewer scores with the lazy-evaluation enhancement. However, indexMergeTime did not decrease as expected.
[benchmark results image]

@Jackyrie2 (Contributor Author):
@benwtrent @zhaih please take a look when you get a chance! I couldn't figure out how to update the existing PR, so I created a new one.

@benwtrent (Member) left a comment:

Thank you so much for running the benchmark.

I have two optimizations you should apply. I suggest the following to re-measure:

  • Use realistic vector sizes: test with at least 768 Float32 dimensions so the distance calculation has its typical expense.
  • If performance still isn't improved (with my optimizations and on 768-dim vectors), I suggest utilizing JFR to see where the time is spent; a capture sketch follows this list. It could be that creating and deleting objects and adding/removing from the hashmap washes out any real improvement :/
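For instance, the measured merge step alone can be profiled with the JDK Flight Recorder API (a sketch; it assumes the iw writer from the tester flow above, and jdk.jfr ships with JDK 11+):

import java.nio.file.Path;
import jdk.jfr.Recording;

try (Recording recording = new Recording()) {
  recording.start();
  iw.forceMerge(1); // the step under investigation
  recording.stop();
  // Inspect with `jfr print merge.jfr` or JDK Mission Control to see whether
  // time goes to distance computation or to HashMap/object churn.
  recording.dump(Path.of("merge.jfr"));
}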


public NeighborArray(int maxSize, boolean descOrder) {
  node = new int[maxSize];
  score = new float[maxSize];
  this.scoresDescOrder = descOrder;
  scoringContext = new HashMap<>();

@benwtrent (Member):

Pre-allocate to maxSize. We should avoid making the hashmap grow as we add things.
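One way to apply the suggestion (a sketch: HashMap's default load factor is 0.75, so the initial capacity must exceed maxSize / 0.75 to guarantee no rehash while filling it):

public NeighborArray(int maxSize, boolean descOrder) {
  node = new int[maxSize];
  score = new float[maxSize];
  this.scoresDescOrder = descOrder;
  // Pre-sized: room for maxSize entries without any resize on the hot path.
  scoringContext = new HashMap<>((int) Math.ceil(maxSize / 0.75f), 0.75f);
}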

@zhaih (Contributor):

We don't actually need a hashmap; I believe an array of length maxSize is more than enough, since we're at most mapping idx to ScoringFunction, right?

@benwtrent (Member):

@zhaih yes! You are correct. Just having an array of maxSize works, and null means there is no scoring function. A sketch follows.
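A sketch of the array-based replacement (field and local names are assumptions):

private final ScoringFunction[] scoringContext; // index = slot in node[]/score[]

public NeighborArray(int maxSize, boolean descOrder) {
  node = new int[maxSize];
  score = new float[maxSize];
  this.scoresDescOrder = descOrder;
  scoringContext = new ScoringFunction[maxSize]; // null entry = nothing deferred
}

// During sorting, plain array indexing replaces every hashmap probe:
ScoringFunction deferred = scoringContext[sortedNodeSize];
if (deferred != null) {
  score[sortedNodeSize] = deferred.calculateScore();
  scoringContext[sortedNodeSize] = null; // computed; clear the slot
}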

Comment on lines 132 to 136
// Check if we need to compute score
if (scoringContext.containsKey(sortedNodeSize)) {
  score[sortedNodeSize] = scoringContext.get(sortedNodeSize).calculateScore();
  scoringContext.remove(sortedNodeSize);
}

@benwtrent (Member):

You should simplify and only probe the map once: Map#remove returns the previous value (or null), so the containsKey/get/remove triple collapses into a single call.

Suggested change
// Check if we need to compute score
if (scoringContext.containsKey(sortedNodeSize)) {
  score[sortedNodeSize] = scoringContext.get(sortedNodeSize).calculateScore();
  scoringContext.remove(sortedNodeSize);
}
// Check if we need to compute score
ScoringFunction maybeScoringFunction = scoringContext.remove(sortedNodeSize);
if (maybeScoringFunction != null) {
  score[sortedNodeSize] = maybeScoringFunction.calculateScore();
}

@zhaih (Contributor) left a comment:

Hi Jack, thanks for doing the benchmark; the count of scoring computations does indicate the optimization is worth pursuing.

To further squeeze out performance and show the benefit, I think we need to optimize the data structure so that it doesn't create too much overhead during the graph build.
I have several suggestions below; let me know what you think!

Also +1 to @benwtrent's suggestion about using 768-dim vectors for further testing.
Thanks again for the hard work!


@@ -111,6 +129,12 @@ public int[] sort() {
private int insertSortedInternal() {

@zhaih (Contributor):

To further optimize memory usage and eliminate potential GC overhead, I would suggest not storing ScoringFunction at all.
In theory, to compute a score we need 3 things: the SimilarityFunction, node_emb_1, and node_emb_2. The node embeddings can be obtained by calling vectors.vectorValue(nodeId) from HnswGraphBuilder, and the SimilarityFunction is also held by HnswGraphBuilder.
That means we don't even need to store the similarityFunction and the two byte arrays beforehand: HnswGraphBuilder can pass in a BiFunction<Integer, Integer, Float> that performs the scoring when we need it, and inside NeighborArray we just need to give the two nodes' ids to that function. That way we don't store extra context inside NeighborArray, and we also avoid holding on to a lot of byte[] for too long. (If you think about it, we're currently holding every byte/float array necessary for score computation until the score is computed or the graph is constructed; that's a huge GC load if your machine doesn't have enough memory.) A sketch follows.
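A sketch of this wiring, assuming the field names quoted elsewhere in the review (vectors, vectorsCopy, similarityFunction) and Float32 vectors:

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.BiFunction;

// Built inside HnswGraphBuilder, which already owns everything needed;
// NeighborArray then stores only node ids, never vector data.
BiFunction<Integer, Integer, Float> scorer =
    (node1, node2) -> {
      try {
        return similarityFunction.compare(
            (float[]) vectors.vectorValue(node1),
            (float[]) vectorsCopy.vectorValue(node2));
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    };

As the next comments point out, the boxed Integer/Float of BiFunction can be avoided with a primitive-specialized functional interface.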

@benwtrent (Member):

I like this idea.

@zhaih instead of adding a function reference to the NeighborArray#ctor, why not pass the function to the NeighborArray#sort() method?

@benwtrent (Member):

This obviates the need for any additional structures: if score == -1, calculate via the method provided to sort(). Though it probably won't be a BiFunction<Integer, Integer, Float> but instead a Function<Integer, Float>, as only the caller will know which is the "current" node for the neighbor array. Or even better, a custom functional interface that works on native values, int -> float; this way we don't box the integers and scoring results unnecessarily.

@zhaih (Contributor) commented Aug 1, 2023:

> why not pass the function to the NeighborArray#sort() method?

Yeah, +1.

> This obviates the need for any additional structures, if score==-1 calculate via the method provided to sort().

I have one concern with this: -1 might be a valid score itself. But we can always keep an additional integer, like noScoreIndex or so, to indicate up to which index we haven't calculated the score. (Because those nodes will always sit at the front of the score array.)

> Though it probably won't be BiFunction<Integer, Integer, Float> but instead a Function<Integer, Float> as only the caller will know which is the "current" node for the neighbor array. Or even better, a custom functional interface that works on native values int -> float

+1, thanks for thinking it through in more detail!

@benwtrent (Member):

Score "NaN" is probably best to cover our bases :)

@zhaih (Contributor):

Oh right, I forgot Java has NaN; yeah, then I guess NaN will be best. A sketch combining these ideas follows.
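Putting the thread's conclusions together, a sketch (the interface name and helper are hypothetical): a primitive int -> float functional interface handed to sort(), with NaN as the "not yet scored" sentinel, since -1 could be a legitimate score.

import java.io.IOException;

@FunctionalInterface
interface NeighborScorer {
  // Native int -> float: no Integer/Float boxing on the scoring hot path.
  float score(int neighborNode) throws IOException;
}

// Inside NeighborArray: resolve deferred scores before the actual sorting.
// Deferred (NaN-scored) entries always sit at the front of the array.
private void computeDeferredScores(NeighborScorer scorer) throws IOException {
  for (int i = 0; i < size; i++) {
    if (Float.isNaN(score[i])) {
      score[i] = scorer.score(node[i]);
    }
  }
}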

@Jackyrie2 (Contributor Author):

This is a wonderful thread, and I am learning a lot! Let me take y'all's suggestions and see how much more performance we can gain. Again, I really appreciate your help!

@zhaih linked an issue Aug 1, 2023 that may be closed by this pull request
@Jackyrie2 (Contributor Author) commented:
Here is a quick re-run of the benchmark (100-dim vectors) on the optimized code, with a 90% - 10% split on document addition:

Baseline -> old candidate -> optimized candidate:
  • 253,234,833 -> 265,153,500 -> 246,373,375
  • 3,845,039,583 -> 4,338,130,875 -> 3,674,674,583
  • 10,131,094,959 -> 9,869,441,083 -> 9,569,986,875

I will run the benchmark on 768 dim vectors later.

@Jackyrie2 (Contributor Author) commented:

Benchmark with 768-dimension vectors

[screenshot: benchmark results with 768-dimension vectors, 2023-08-07]

@zhaih (Contributor) commented Aug 7, 2023:

Thank you @Jackyrie2 for all the benchmarking! We have already made it lightweight enough (it requires no extra memory usage in NeighborArray), and the benchmark has shown mostly positive results. (Benchmarks do have quite a lot of noise, especially when run on a local machine; if you want, maybe write a simple bash loop, run the evaluation 3 - 5 times, and use the average.) I would say this PR is a good optimization.

But I think it does need some more documentation, as NeighborArray is already complex enough, so I will try to carefully review it again soon.

Thank you!

@zhaih (Contributor) left a comment:

I left some small comments; overall it looks good to me!

float[] vectorValue = null;
byte[] binaryValue = null;
switch (this.vectorEncoding) {
  case FLOAT32 -> vectorValue = (float[]) vectors.vectorValue(nodeOrd);

@zhaih (Contributor):

Let's move this part outside of the lambda to reduce the number of times we call vectorValue; this operation involves some seek-and-parse work on off-heap memory. A hoisting sketch follows.
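A sketch of the hoisting (types and field names come from the quoted hunk; NeighborScorer is the hypothetical interface sketched earlier, and the enclosing method is assumed to declare throws IOException): the current node's vector is resolved once, before the lambda is built, so each deferred score pays only one vectorValue call, for the neighbor.

float[] vectorValue = null;
byte[] binaryValue = null;
switch (this.vectorEncoding) {
  case FLOAT32 -> vectorValue = (float[]) vectors.vectorValue(nodeOrd);
  case BYTE -> binaryValue = (byte[]) vectors.vectorValue(nodeOrd);
}
// Effectively-final copies so the lambda can capture them.
final float[] currentVector = vectorValue;
final byte[] currentBinary = binaryValue;
NeighborScorer scorer =
    newNeighbor ->
        switch (this.vectorEncoding) {
          case FLOAT32 -> this.similarityFunction.compare(
              currentVector, (float[]) vectorsCopy.vectorValue(newNeighbor));
          case BYTE -> this.similarityFunction.compare(
              currentBinary, (byte[]) vectorsCopy.vectorValue(newNeighbor));
        };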

  case BYTE -> this.similarityFunction.compare(
      binaryValue, (byte[]) vectorsCopy.vectorValue(newNeighbor));
};
// float score =

@zhaih (Contributor):

Let's remove this commented-out code?


if (Float.isNaN(tmpScore)) {
  tmpScore = scoringFunction.computeScore(tmpNode);
  System.out.println("Node: " + tmpNode + " Score: " + tmpScore);

@zhaih (Contributor):

Remove this debug print?

@zhaih (Contributor) commented Aug 22, 2023:

Ah sorry, I almost forgot: don't forget to create a CHANGES.txt entry :)

@zhaih (Contributor) left a comment:

LGTM. I don't think there's a need to make it Lucene 10 only? (We're not breaking any APIs.)

@@ -90,6 +90,8 @@ Optimizations

* GITHUB#12408: Lazy initialization improvements for Facets implementations when there are segments with no hits
to count. (Greg Miller)

* GITHUB##12371: enable lazy computation of similarity score during initializeFromGraph (Jack Wang)

@zhaih (Contributor):

Put it under Lucene 9.8?

vectorValue, (float[]) vectorsCopy.vectorValue(newNeighbor));
case BYTE -> this.similarityFunction.compare(
binaryValue, (byte[]) vectorsCopy.vectorValue(newNeighbor));
};
// we are not sure whether the previous graph contains

@zhaih (Contributor):

Update this comment to something like: "We will compute those scores lazily when we need to pop out the non-diverse nodes"?

@zhaih (Contributor) commented Sep 1, 2023:

@Jackyrie2 It seems the precommit fails?

@zhaih merged commit 9fd45e3 into apache:main Sep 1, 2023
4 checks passed
@zhaih (Contributor) commented Sep 1, 2023:

Merged and backported

Successfully merging this pull request may close these issues:

  • Lazily compute similarity score when reuse the old HNSW graph