Vectors in Search
Dice.com code implementing the ideas discussed in the 'Vectors in Search' talk from the Activate 2018 conference, by Simon Hughes (Chief Data Scientist, Dice.com), and in my later talk 'Searching with Vectors' from the Haystack conference in 2019. This extends my earlier work on 'Conceptual Search', which can be found at https://github.com/DiceTechJobs/ConceptualSearch (including slides and video links). In these talks, I present several approaches for searching vectors at scale using an inverted index. This repository implements approaches to approximate k-nearest neighbor search, including:
- LSH (using SimHash)
- K-Means Tree
- Vector Thresholding
It also describes how these ideas can be implemented and queried efficiently within an inverted index.
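As a rough illustration of how two of these techniques can turn a dense vector into tokens that an inverted index can store and match, here is a minimal sketch. It is not the code from this repository: the function names, the token formats (`b3_1`, `d0_pos`) and the choice of Gaussian random hyperplanes are assumptions made for the example, and vector thresholding is assumed here to mean keeping only the largest-magnitude components.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes of dimension dim (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def simhash_tokens(vector, hyperplanes):
    """SimHash: one bit per hyperplane, set by the sign of the projection.

    Each bit is emitted as a token that encodes its position, so two vectors
    share a token exactly when they fall on the same side of that hyperplane.
    """
    tokens = []
    for i, plane in enumerate(hyperplanes):
        dot = sum(v * p for v, p in zip(vector, plane))
        tokens.append("b{}_{}".format(i, 1 if dot >= 0 else 0))
    return tokens


def threshold_tokens(vector, top_k):
    """Thresholding (assumed variant): keep the top_k largest-magnitude
    components and emit one signed token per surviving dimension."""
    ranked = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    kept = sorted(ranked[:top_k])
    return ["d{}_{}".format(i, "pos" if vector[i] >= 0 else "neg") for i in kept]


planes = make_hyperplanes(num_bits=16, dim=4)
doc = [0.9, -0.1, 0.05, -0.7]
print(simhash_tokens(doc, planes))     # 16 tokens, one per hyperplane
print(threshold_tokens(doc, top_k=2))  # ['d0_pos', 'd3_neg']
```

In both cases the resulting tokens could be indexed in an ordinary text field, and a query vector hashed the same way then matches documents by token overlap. More bits or a larger `top_k` trade index size for recall.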
- Code for implementing the k-means tree, LSH SimHash and vector thresholding algorithms, and for indexing and searching vectors in Solr using these techniques.
- Java code for implementing the custom similarity classes and the payloadEdismax parser described in the talk.
- XML snippets for importing the Solr plugins from the 'solr_vectors_in_search_plugins' Java code.
- Solr Version - 7.5
- Python Version - 3.x (3.5 used)