
Upgrade to Lucene 9.9 #2288

Closed
jpountz opened this issue Dec 5, 2023 · 13 comments

Comments

@jpountz
Contributor

jpountz commented Dec 5, 2023

Lucene 9.9 was just released, let's upgrade Anserini? https://lucene.apache.org/core/corenews.html#apache-lucenetm-990-available

@lintool
Member

lintool commented Dec 5, 2023

Definitely! I'm in the middle of running our regressions, and then planning on merging in #2275, which is a big code dump.

But let's queue up after that?

BTW, are there any new codecs introduced that we gotta upgrade to? The HNSW indexer currently hard-codes Lucene95Codec:
https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/index/IndexHnswDenseVectors.java#L283C14-L283C14

@jpountz
Contributor Author

jpountz commented Dec 5, 2023

But let's queue up after that?

Sure, no hurry.

are there any new codecs introduced that we gotta upgrade to?

Indeed, you'll need to replace Lucene95Codec with Lucene99Codec and Lucene95HnswVectorsFormat with Lucene99HnswVectorsFormat when you upgrade. There are new options as well, e.g. you could use Lucene99HnswScalarQuantizedVectorsFormat to enable int8 scalar quantization.
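For reference, the swap might look something like this — a hedged sketch against the Lucene 9.9 API, not Anserini's actual code; the `config` method name and the `M`/`efConstruction` parameters here are illustrative:

```java
import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.lucene99.Lucene99Codec;
import org.apache.lucene.codecs.lucene99.Lucene99HnswScalarQuantizedVectorsFormat;
import org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat;
import org.apache.lucene.index.IndexWriterConfig;

public class HnswCodecConfig {
  // Build an IndexWriterConfig that uses the 9.9 codec, optionally with
  // int8 scalar quantization for the vector format.
  public static IndexWriterConfig config(int M, int efConstruction, boolean quantize) {
    IndexWriterConfig cfg = new IndexWriterConfig();
    cfg.setCodec(new Lucene99Codec() {
      @Override
      public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
        return quantize
            ? new Lucene99HnswScalarQuantizedVectorsFormat(M, efConstruction)
            : new Lucene99HnswVectorsFormat(M, efConstruction);
      }
    });
    return cfg;
  }
}
```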

@lintool
Member

lintool commented Dec 5, 2023

int8

Nice. Is there int16 or float16 as an intermediate step?

When we're ready for that, can you and @tteofili work on that together?

@jpountz
Contributor Author

jpountz commented Dec 5, 2023

Nice. Is there int16 or float16 as an intermediate step?

Not at this point; we're missing native support for float16 in the JVM.

@tteofili
Collaborator

tteofili commented Dec 5, 2023

sure I can work with @jpountz on the upgrade (and perhaps on config options for enabling quantization in HNSW in Anserini)

@lintool
Member

lintool commented Dec 5, 2023

Nice. Is there int16 or float16 as an intermediate step?

Not at this point; we're missing native support for float16 in the JVM.

And do you have numbers on speed/effectiveness tradeoffs vs. full float32?

If not, I guess we should rerun https://arxiv.org/abs/2308.14963 ?

@jpountz
Contributor Author

jpountz commented Dec 5, 2023

Mileage varies; the main benefit is that you only need one byte per dimension to fit in RAM to get decent performance, vs. 4 bytes per dimension without scalar quantization. So this allows addressing more data with the same amount of RAM.
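A quick back-of-the-envelope on that RAM point (the corpus size and dimensionality below are illustrative assumptions, not benchmarked figures):

```java
public class QuantizationRam {
  // Raw storage for float32 vectors: 4 bytes per dimension.
  static long bytesFloat32(long numVectors, int dims) {
    return numVectors * dims * 4L;
  }

  // With int8 scalar quantization: 1 byte per dimension.
  static long bytesInt8(long numVectors, int dims) {
    return numVectors * dims * 1L;
  }

  public static void main(String[] args) {
    long n = 10_000_000L; // hypothetical corpus size
    int dims = 768;       // a typical BERT-style embedding width
    System.out.println(bytesFloat32(n, dims)); // 30720000000 (~30.7 GB)
    System.out.println(bytesInt8(n, dims));    // 7680000000 (~7.7 GB)
  }
}
```

Same hardware, roughly 4x more vectors resident in memory (ignoring the quantization metadata, which is small per vector).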

It turns out that we accidentally turned on quantization in Lucene's nightly benchmarks between Nov 13th and yesterday; there was a noticeable ~30% speedup, even though all vectors already fit in memory at 4 bytes per dimension. http://people.apache.org/~mikemccand/lucenebench/VectorSearch.html

@benwtrent might have more info than I do.

@jpountz
Contributor Author

jpountz commented Dec 5, 2023

For reference, there have been lots of performance improvements in 9.8 and 9.9 for sparse retrieval too, see e.g. http://people.apache.org/~mikemccand/lucenebench/OrHighHigh.html over recent months. One optimization in particular, apache/lucene#12444 (annotation FK on the nightly charts, and a blog that describes the optimization) should help significantly with cases that are hard for dynamic pruning, such as learned sparse representations. So I would expect much better numbers for Lucene if you were to run benchmarks from https://arxiv.org/abs/2110.11540 again.

@jpountz
Contributor Author

jpountz commented Dec 5, 2023

And do you have numbers on speed/effectiveness tradeoffs vs. full float32?

The PR that did the change has a few more numbers about speed and effectiveness: apache/lucene#12582 (comment)

@lintool
Member

lintool commented Dec 5, 2023

re: HNSW - yup, I suppose faster is a given... my question is more about how much you give up in terms of effectiveness...

@lintool
Member

lintool commented Dec 5, 2023

And do you have numbers on speed/effectiveness tradeoffs vs. full float32?

The PR that did the change has a few more numbers about speed and effectiveness: apache/lucene#12582 (comment)

Thanks, this is good info.

But as I always say... you need a real search task like MS MARCO, BEIR, etc.

@benwtrent

The JVM just doesn't support f16. Reading from disk, doing fast vector operations, etc. — it's just bad. Even in JDK 21.

There have been steps to fix this (finally adding an intrinsic for de/encoding f16), but it's not there yet.

We cannot add f16 until there is something in Panama Vector that handles it.

@lintool
Member

lintool commented Dec 20, 2023

Upgrade completed in #2302

@lintool lintool closed this as completed Dec 20, 2023

4 participants