Multiprobe L2 LSH first pass #123

alexklibisz · 2020-08-02T01:30:09Z

First pass at multiprobe L2 LSH, based on the paper "Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search" by Lv et al.
This uses the naive method for picking perturbation vectors: it enumerates all 3^k possible perturbation vectors (each of the k hash values in a table is perturbed by -1, 0, or +1) and uses a heap to pick the best ones, scored with the x_i(δ) distances from section 4.3 of the paper.
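The naive selection can be sketched as below. This is a minimal illustration, not the elastiknn code: it assumes each projection has already been divided by the bucket width r, so f_i = (a_i · q + b_i) / r, and it scores a perturbation vector as the sum of squared boundary distances x_i(δ_i), with x_i(0) = 0. The function names `boundary_distances` and `naive_probes` are hypothetical.

```python
import heapq
import itertools
import math
from typing import List, Sequence, Tuple


def boundary_distances(projections: Sequence[float]) -> List[Tuple[int, float, float]]:
    """For each normalized projection f_i = (a_i . q + b_i) / r, return the
    bucket h_i = floor(f_i) and the distances x_i(-1), x_i(+1) from f_i to
    the lower and upper edges of that bucket (in units of r)."""
    out = []
    for f in projections:
        h = math.floor(f)
        lower = f - h                          # x_i(-1)
        out.append((h, lower, 1.0 - lower))    # x_i(+1) = 1 - x_i(-1)
    return out


def naive_probes(projections: Sequence[float], n_probes: int) -> List[Tuple[int, ...]]:
    """Enumerate all 3^k vectors delta in {-1, 0, +1}^k, score each as
    sum_i x_i(delta_i)^2, and return the buckets of the n_probes lowest-scoring
    ones. The all-zero vector is the query's own bucket and has score 0."""
    dists = boundary_distances(projections)

    def score(delta: Tuple[int, ...]) -> float:
        total = 0.0
        for (_, lower, upper), d in zip(dists, delta):
            if d == -1:
                total += lower * lower
            elif d == 1:
                total += upper * upper
        return total

    deltas = itertools.product((-1, 0, 1), repeat=len(dists))
    # heapq.nsmallest keeps a bounded heap of the n_probes best candidates.
    best = heapq.nsmallest(n_probes, deltas, key=score)
    return [tuple(h + d for (h, _, _), d in zip(dists, delta)) for delta in best]


print(naive_probes([0.2, 3.9], 4))  # prints [(0, 3), (0, 4), (-1, 3), (-1, 4)]
```

The cost is dominated by scoring all 3^k vectors, which is cheap for small k such as k=2 but grows quickly as k increases.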
Next I'll implement the optimizations from sections 4.4 and 4.5.
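One optimization the paper describes, and presumably among those planned here (an assumption), avoids enumerating all 3^k vectors: sort the 2k single-coordinate moves by boundary distance, then generate subsets of moves best-first from a min-heap using "shift" (replace the largest move with the next one) and "expand" (add the next move). Subsets that move the same hash function both down and up are skipped. The sketch below is hypothetical (`query_directed_probes` is not an elastiknn function) and again assumes projections are pre-divided by r.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def query_directed_probes(projections: Sequence[float], n_probes: int) -> List[Tuple[int, ...]]:
    """Return up to n_probes buckets in non-decreasing order of
    score = sum of squared boundary distances, without scoring all 3^k vectors."""
    buckets = [math.floor(f) for f in projections]
    # Each of the 2k single-coordinate moves: (distance to boundary, hash index, direction).
    moves = []
    for i, f in enumerate(projections):
        lower = f - buckets[i]
        moves.append((lower, i, -1))
        moves.append((1.0 - lower, i, +1))
    moves.sort()
    sq = [dist * dist for dist, _, _ in moves]

    def is_valid(subset: Tuple[int, ...]) -> bool:
        # Invalid if one hash function would be moved both down and up.
        indices = [moves[j][1] for j in subset]
        return len(indices) == len(set(indices))

    def to_bucket(subset: Tuple[int, ...]) -> Tuple[int, ...]:
        b = list(buckets)
        for j in subset:
            _, i, direction = moves[j]
            b[i] += direction
        return tuple(b)

    result = [tuple(buckets)][:n_probes]        # unperturbed bucket, score 0
    heap = [(sq[0], (0,))] if moves else []     # subsets are sorted tuples of move indices
    while heap and len(result) < n_probes:
        score, subset = heapq.heappop(heap)
        last = subset[-1]
        if last + 1 < len(moves):
            # shift: swap the largest move for the next one; expand: append the next one.
            heapq.heappush(heap, (score - sq[last] + sq[last + 1], subset[:-1] + (last + 1,)))
            heapq.heappush(heap, (score + sq[last + 1], subset + (last + 1,)))
        # Children of invalid subsets are still pushed, since they may be valid.
        if is_valid(subset):
            result.append(to_bucket(subset))
    return result


print(query_directed_probes([0.2, 3.9], 4))  # prints [(0, 3), (0, 4), (-1, 3), (-1, 4)]
```

Because every child subset scores at least as much as its parent, buckets come out in non-decreasing score order, and the work grows roughly with the number of probes requested rather than with 3^k.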

This already shows a meaningful improvement in the SIFT dataset benchmarks, specifically:

- Mapping with L=300, k=2, r=1 and querying with 4000 candidates: p90 recall/duration = (0.73, 54ms)
- Mapping with L=100, k=2, r=1 and querying with 4000 candidates and 3 probes: p90 recall/duration = (0.78, 54ms)

So you can store 3x fewer hashes (L=100 instead of L=300) and get better recall at the same p90 latency.
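To make the storage comparison concrete, here is a minimal, hypothetical sketch of the textbook L2 LSH family (Datar et al.): L tables, each concatenating k functions h(v) = floor((a · v + b) / r) with a drawn from a standard Gaussian and b uniform on [0, r]. `L2LshSketch` is not elastiknn's class; the assumption is that L counts tables, k counts functions per table, and r is the bucket width, so each indexed vector carries one hash per table.

```python
import math
import random
from typing import List, Sequence, Tuple


class L2LshSketch:
    """Hypothetical E2LSH-style hasher: L tables of k functions each,
    h(v) = floor((a . v + b) / r), a ~ N(0, I), b ~ Uniform[0, r]."""

    def __init__(self, dims: int, L: int, k: int, r: float, seed: int = 0):
        rng = random.Random(seed)
        self.r = r
        self.tables = [
            [([rng.gauss(0.0, 1.0) for _ in range(dims)], rng.uniform(0.0, r)) for _ in range(k)]
            for _ in range(L)
        ]

    def hash(self, v: Sequence[float]) -> List[Tuple[int, ...]]:
        """Return one k-tuple per table, so each stored vector costs L hashes."""
        return [
            tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.r) for a, b in table)
            for table in self.tables
        ]


rng = random.Random(1)
v = [rng.random() for _ in range(128)]
print(len(L2LshSketch(128, L=300, k=2, r=1.0).hash(v)))  # 300
print(len(L2LshSketch(128, L=100, k=2, r=1.0).hash(v)))  # 100
```

Under this model, multiprobing recovers the recall lost by dropping tables by checking neighboring buckets at query time instead of storing more hashes at index time.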

This required some internal changes to the way the `HashingFunctions` are cached.
I also simplified how `MatchHashesAndScoreQuery` handles cases with all-zero counts: it now just returns `DocIdSetIterator.empty()`.


alexklibisz added 30 commits July 29, 2020 07:58

Trying to checkout LFS files

7c508fa

Debugging

f6d1ae8

set lfs: true

ccbf71b

Debugging

d2e7045

Checking out LFS seems to work now

1926f08

Added a step to freeze the index

446950f

Get rid of freezing

6a98eb5

remove git lfs files

2931c3a

Aws cli

5d26964

Aws cli

655e900

gitignore what a fucking mess

1526e87

Overcoming terrible git tooling

16248ae

First bit of translation from the old ES pr

3bdc07c

Get rid of FreezeRequest. Better continuous benchmark results.

3a6229e

Filtered down to promising benchmark hyperparams for glove25 and sift

95a0dc7

fix s3 sync target

5985359

Increase timeout because github is slower

00cc50c

Merge branch 'perf-fix-benchmarks' into l2-lsh

f9b7048

Added a hashWithProbes method to L2Lsh model

aa34235

first pass implementation compiles

9f8f8be

Reasonable results with L2 LSH. Will have to likely modify L2 recall tests to test them accurately.

d83143d

Fixed a bug in the multiprobe implementation and added some tests

a5b47b2

L=100,k=2,r=2,probes=5 gets 88% recall in 88ms (120ms for exact)

bd2199b

Increase memory of ES data notes in docker-compose

2ab985e

Merge branch 'master' into l2-lsh

75caaab

Safer way to handle 0 matches

4265961

continuous benchmark example where you can get the same performance with 3x fewer hashes using multiprobing

86e8d8b

Added tests for multiprobe LSH

8f1dc94

cleanup

7160f3b

cleanup'

b235cf4

alexklibisz added 3 commits August 1, 2020 21:42

docs for probes parameter

d768952

Changelog

d6f15ee

Fix example app

522834a

alexklibisz merged commit 6fe9301 into master Aug 2, 2020

alexklibisz deleted the l2-lsh branch August 2, 2020 02:15

alexklibisz mentioned this pull request Aug 2, 2020

MultiProbe for L2 Similarity #73

Closed

This was referenced Sep 25, 2022

Build: Use Github Actions caching instead of SBT caching #431

Merged

Build: Caching followups #432

Merged

This was referenced Oct 25, 2022

Build: Debug Jekyll permissions #449

Merged

Docs: Update performance page with latest ann-benchmarks results and automate page updates #451

Merged

nfsantos mentioned this pull request Nov 15, 2022

Support ES 7.17.7 #456

Closed

alexklibisz mentioned this pull request Jan 17, 2023

Dependencies: downgrade elasticsearch-7x branch to Elasticsearch 7.17.8 #465

Merged

alexklibisz mentioned this pull request May 29, 2023

Performance: Remove usage of Math.fma in PanamaFloatVectorOps #520

Merged

hindog mentioned this pull request Sep 28, 2023

- Upgrade ES to 8.10.2 #557

Closed
