Lucene 9.9: Benchmark HNSW improvements #2292

lintool · 2023-12-05T17:51:21Z

It'd be great to benchmark HNSW, reproducing the experiments here: Vector Search with OpenAI Embeddings: Lucene Is All You Need. arXiv:2308.14963.

It's actually quite simple, we've gotten it down to a single command:

python src/main/python/run_regression.py --download --index --verify --search --regression msmarco-passage-openai-ada2

See: https://github.com/castorini/anserini/blob/master/docs/regressions/regressions-msmarco-passage-openai-ada2.md

@jpountz @benwtrent @ChrisHegarty Do either of you have cycles to try it out?

The text was updated successfully, but these errors were encountered:

ChrisHegarty · 2023-12-05T21:04:11Z

I’ll try to find time to run this.

lintool · 2023-12-14T14:07:02Z

Reporting on experiments with code on #2302 at commit 0b9f79e "Added -noMerge option".

MS MARCO v1 passage, dev queries, cosDPR-distil model. This is the model reported here:

Experimental Setup

Comparing HNSW fp32 vs. int8. In both cases, I'm forcing no merge to reduce variance on indexing times (which varies from trial to trial depending on which segments get selected for merging). For both, we end up with 16 segments.

Experiments on my Mac Studio, stats:

Mac Studio, M1 Ultra (20 cores; 16 performance and 4 efficiency), 128 GB RAM
macOS Sonoma 14.1.2; openjdk 11.0.13 2021-10-19.
Using 16 threads.

Four trials for each condition. Commands:

python src/main/python/run_regression.py --index --verify --search --regression msmarco-passage-cos-dpr-distil-hnsw >& logs/log.msmarco-passage-cos-dpr-distil-hnsw.lucene990.fp32.1
python src/main/python/run_regression.py --index --verify --search --regression msmarco-passage-cos-dpr-distil-hnsw-int8 >& logs/log.msmarco-passage-cos-dpr-distil-hnsw.lucene990.int8.1

python src/main/python/run_regression.py --index --verify --search --regression msmarco-passage-cos-dpr-distil-hnsw >& logs/log.msmarco-passage-cos-dpr-distil-hnsw.lucene990.fp32.2
python src/main/python/run_regression.py --index --verify --search --regression msmarco-passage-cos-dpr-distil-hnsw >& logs/log.msmarco-passage-cos-dpr-distil-hnsw.lucene990.fp32.3
python src/main/python/run_regression.py --index --verify --search --regression msmarco-passage-cos-dpr-distil-hnsw >& logs/log.msmarco-passage-cos-dpr-distil-hnsw.lucene990.fp32.4

python src/main/python/run_regression.py --index --verify --search --regression msmarco-passage-cos-dpr-distil-hnsw-int8 >& logs/log.msmarco-passage-cos-dpr-distil-hnsw.lucene990.int8.2
python src/main/python/run_regression.py --index --verify --search --regression msmarco-passage-cos-dpr-distil-hnsw-int8 >& logs/log.msmarco-passage-cos-dpr-distil-hnsw.lucene990.int8.3
python src/main/python/run_regression.py --index --verify --search --regression msmarco-passage-cos-dpr-distil-hnsw-int8 >& logs/log.msmarco-passage-cos-dpr-distil-hnsw.lucene990.int8.4

Results

Indexing:

# fp32
Total 8,841,823 documents indexed in 00:31:04
Total 8,841,823 documents indexed in 00:31:08
Total 8,841,823 documents indexed in 00:30:34
Total 8,841,823 documents indexed in 00:30:24
                                 avg 00:30:48

# int8
Total 8,841,823 documents indexed in 00:35:55
Total 8,841,823 documents indexed in 00:35:32
Total 8,841,823 documents indexed in 00:35:42
Total 8,841,823 documents indexed in 00:36:15
                                 avg 00:35:51
                                 +16% slowdown

Retrieval:

# fp32
6980 queries processed in 00:02:33 = ~45.34 q/s
6980 queries processed in 00:02:33 = ~45.57 q/s
6980 queries processed in 00:02:33 = ~45.33 q/s
6980 queries processed in 00:02:33 = ~45.55 q/s
                                  avg 45.45 q/s

# int8
6980 queries processed in 00:01:48 = ~64.54 q/s
6980 queries processed in 00:01:48 = ~64.50 q/s
6980 queries processed in 00:01:48 = ~64.56 q/s
6980 queries processed in 00:01:48 = ~64.45 q/s
                                  avg 64.51 q/s
                                   +42% speedup

Finally, effectiveness: remains the same, no difference.

benwtrent · 2023-12-14T14:25:06Z

Finally, effectiveness: remains the same, no difference.

🎉

 +16% slowdown

On index and initial flush, this isn't surprising. We build the graph with float32 and then have the small additional overhead of calculating quantiles and storing the quantized representation of everything.

But, I have noticed that merging is faster (about 30-40%).

lintool · 2023-12-14T15:22:31Z

@benwtrent merging is difficult to benchmark because I get high variance in running times... my diagnosis is that running time is idiosyncratically dependent on which segments get selected for merging... is this a convincing explanation?

benwtrent · 2023-12-14T18:04:39Z

@lintool I understand :)

I am happy to see that effectiveness is still, well, effective.

lintool · 2023-12-15T02:40:14Z

Hrm... update: trying the same experiments, but with OpenAI embeddings. Getting errors:

...
2023-12-14 20:46:43,421 INFO  [main] index.AbstractIndexer (AbstractIndexer.java:260) - 53.93% of files completed, 4,830,000 documents indexed
[2336.220s][warning][gc,alloc] pool-2-thread-16: Retried waiting for GCLocker too often allocating 244355 words
2023-12-14 20:47:44,104 INFO  [main] index.AbstractIndexer (AbstractIndexer.java:260) - 53.93% of files completed, 4,830,000 documents indexed
[2373.702s][warning][gc,alloc] pool-2-thread-8: Retried waiting for GCLocker too often allocating 249714 words
2023-12-14 20:48:44,511 INFO  [main] index.AbstractIndexer (AbstractIndexer.java:260) - 53.93% of files completed, 4,830,000 documents indexed
2023-12-14 20:49:46,485 INFO  [main] index.AbstractIndexer (AbstractIndexer.java:260) - 53.93% of files completed, 4,830,000 documents indexed
2023-12-14 20:50:49,971 INFO  [main] index.AbstractIndexer (AbstractIndexer.java:260) - 53.93% of files completed, 4,830,000 documents indexed
2023-12-14 20:51:52,476 INFO  [main] index.AbstractIndexer (AbstractIndexer.java:260) - 53.93% of files completed, 4,830,000 documents indexed
2023-12-14 20:52:53,176 INFO  [main] index.AbstractIndexer (AbstractIndexer.java:260) - 53.93% of files completed, 4,830,000 documents indexed
2023-12-14 20:53:56,204 INFO  [main] index.AbstractIndexer (AbstractIndexer.java:260) - 53.93% of files completed, 4,830,000 documents indexed
2023-12-14 20:54:58,288 INFO  [main] index.AbstractIndexer (AbstractIndexer.java:260) - 53.93% of files completed, 4,830,000 documents indexed
[2792.142s][warning][gc,alloc] pool-2-thread-5: Retried waiting for GCLocker too often allocating 249321 words
[2793.036s][warning][gc,alloc] pool-2-thread-8: Retried waiting for GCLocker too often allocating 249713 words
2023-12-14 20:56:13,898 INFO  [main] index.AbstractIndexer (AbstractIndexer.java:260) - 53.93% of files completed, 4,830,000 documents indexed
[2869.427s][warning][gc,alloc] pool-2-thread-9: Retried waiting for GCLocker too often allocating 247479 words
[2872.043s][warning][gc,alloc] pool-2-thread-3: Retried waiting for GCLocker too often allocating 237701 words

@benwtrent any ideas? cc/ @ChrisHegarty @jpountz

How should I start debugging?

tteofili · 2023-12-18T16:41:58Z

do these warnings result in indexing errors? or is it "just" polluting the output logs?
it might be the GC not managing to free up enough memory (also considering the larger embeddings' size of 1536), either we can give the JVM a larger heap or allow more retries from the GC (e.g., passing -XX:GCLockerRetryAllocationCount=100 to the indexing java command)

lintool · 2023-12-20T01:33:41Z

We're still having issues with openai-ada2-int8, but #2302 (Lucene 9.9.1) has been merged into master. Will create new issue focused on debugging this.

lintool · 2023-12-21T19:42:21Z

Ref: #2314

lintool · 2023-12-28T13:27:31Z

Ref: #2318

lintool · 2024-02-24T22:19:20Z

No further follow-up, closing.

lintool mentioned this issue Dec 14, 2023

Upgrade to Lucene 9.9.1 #2302

Merged

lintool closed this as completed Feb 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lucene 9.9: Benchmark HNSW improvements #2292

Lucene 9.9: Benchmark HNSW improvements #2292

lintool commented Dec 5, 2023

ChrisHegarty commented Dec 5, 2023

lintool commented Dec 14, 2023

benwtrent commented Dec 14, 2023

lintool commented Dec 14, 2023

benwtrent commented Dec 14, 2023

lintool commented Dec 15, 2023

tteofili commented Dec 18, 2023 •

edited

Loading

lintool commented Dec 20, 2023

lintool commented Dec 21, 2023

lintool commented Dec 28, 2023

lintool commented Feb 24, 2024

Lucene 9.9: Benchmark HNSW improvements #2292

Lucene 9.9: Benchmark HNSW improvements #2292

Comments

lintool commented Dec 5, 2023

ChrisHegarty commented Dec 5, 2023

lintool commented Dec 14, 2023

Experimental Setup

Results

benwtrent commented Dec 14, 2023

lintool commented Dec 14, 2023

benwtrent commented Dec 14, 2023

lintool commented Dec 15, 2023

tteofili commented Dec 18, 2023 • edited Loading

lintool commented Dec 20, 2023

lintool commented Dec 21, 2023

lintool commented Dec 28, 2023

lintool commented Feb 24, 2024

tteofili commented Dec 18, 2023 •

edited

Loading