Add approximate LSH approach for MinHash #23

@fnothaft

Description

Currently, our MinHashing scheme falls back to an LSH scheme for approximate MinHashing. This reduces data replication from n to b (where n is the number of elements and b is the number of buckets). However, more efficient approximate LSH schemes can reduce replication further. We should add a method such as multi-probe LSH:

Lv, Qin, et al. "Multi-probe LSH: efficient indexing for high-dimensional similarity search." Proceedings of the 33rd international conference on Very large data bases. VLDB Endowment, 2007.
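
For reference, here is a minimal sketch of where the factor of b comes from in a banded LSH scheme over MinHash signatures. It is an illustration only, not this project's implementation: the function names (`minhash_signature`, `band_keys`, `build_index`, `candidate_pairs`), the use of BLAKE2b as the hash family, and the equal-width banding are all assumptions made for the example. Here `bands` plays the role of b.

```python
import hashlib
from collections import defaultdict


def _h(seed, token):
    # Hypothetical hash family: one deterministic 64-bit hash per seed.
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes):
    # One minimum per hash function; assumes `tokens` is non-empty.
    return [min(_h(seed, t) for t in tokens) for seed in range(num_hashes)]


def band_keys(signature, bands):
    # Split the signature into `bands` equal slices; assumes the signature
    # length is divisible by `bands`. The band index is part of the key so
    # that different bands never share a bucket.
    rows = len(signature) // bands
    return [(i, tuple(signature[i * rows:(i + 1) * rows])) for i in range(bands)]


def build_index(sets, num_hashes, bands):
    # Each item is written to one bucket per band, so it is replicated
    # `bands` times in total.
    index = defaultdict(set)
    for name, tokens in sets.items():
        for key in band_keys(minhash_signature(tokens, num_hashes), bands):
            index[key].add(name)
    return index


def candidate_pairs(index):
    # Only items that share at least one bucket are compared.
    pairs = set()
    for members in index.values():
        ordered = sorted(members)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pairs.add((a, b))
    return pairs
```

In this sketch, every item appears in exactly `bands` buckets, which is the replication cost described above. As I understand the paper, multi-probe LSH lowers that cost by building fewer hash tables and probing several nearby buckets per table at query time; how best to define "nearby" for MinHash band keys is an open design question for this issue.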
