Dice.com repo to accompany the Dice.com 'Vectors in Search' talk by Simon Hughes from the Activate 2018 search conference, and the 'Searching with Vectors' talk from Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015.


Vectors in Search

Dice.com code for implementing the ideas discussed in the 'Vectors in Search' talk from the Activate 2018 conference by Simon Hughes (Chief Data Scientist, Dice.com), and in my later talk 'Searching with Vectors' from the Haystack conference in 2019. This extends my earlier work on 'Conceptual Search', which can be found at https://github.com/DiceTechJobs/ConceptualSearch (including slides and video links). In these talks, I present a number of different approaches for searching vectors at scale using an inverted index. This repo implements approaches to approximate k-nearest neighbor search, including:

  • LSH (using SimHash)
  • K-Means Tree
  • Vector Thresholding

and describes how these ideas can be implemented and queried efficiently within an inverted index.
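
To make the LSH idea concrete, here is a minimal sketch of SimHash using random hyperplanes. It is not taken from this repo: the function names, the `band_size` parameter and the `b<band>_<bits>` token format are hypothetical, chosen only to illustrate how a dense vector might be turned into discrete tokens that an inverted index can match on.

```python
import random


def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplanes (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def simhash_bits(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    bits = []
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        bits.append(1 if dot >= 0 else 0)
    return bits


def bits_to_tokens(bits, band_size=4):
    """Group bits into bands and emit one token per band.

    The banding scheme and token format are assumptions made for
    illustration, not the encoding used by this repo.
    """
    tokens = []
    for start in range(0, len(bits), band_size):
        band = bits[start:start + band_size]
        tokens.append("b{}_{}".format(start // band_size, "".join(map(str, band))))
    return tokens


def hamming(a, b):
    """Number of positions at which two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


planes = make_hyperplanes(dim=4, num_bits=16, seed=42)
doc_vector = [0.9, 0.1, -0.3, 0.5]
doc_tokens = bits_to_tokens(simhash_bits(doc_vector, planes))
```

Vectors separated by a small angle tend to agree on most bits, so they share many band tokens. A query encoded the same way can then be run as an ordinary term query, with the number of matching tokens acting as a rough proxy for cosine similarity.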

Directory Structure

  • python
    • Code implementing the k-means tree, LSH (SimHash) and vector thresholding algorithms, and for indexing and searching vectors in Solr using these techniques.
  • solr_plugins
    • Java code implementing the custom similarity classes and the payloadEdismax parser described in the talk.
  • solr_configs
    • XML snippets for importing the Solr plugins from the 'solr_vectors_in_search_plugins' Java code.
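
As an illustration of the vector thresholding technique listed under the python directory, the sketch below keeps only the largest-magnitude components of a vector and writes them as `token|weight` pairs, the shape expected by a delimited-payload text field. This is an assumption-laden example rather than the repo's code: the function names, the `top_k` cut-off and the `<index><p|n>` token naming are all hypothetical.

```python
def threshold_vector(vector, top_k=3):
    """Keep the top_k components with the largest absolute value.

    Returns (index, value) pairs in index order. Using a fixed top_k
    rather than a magnitude cut-off is an assumption for this sketch.
    """
    ranked = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    kept = sorted(ranked[:top_k])
    return [(i, vector[i]) for i in kept]


def to_payload_tokens(components):
    """Encode components as 'token|weight' strings (hypothetical format).

    The sign goes into the token name so that positive and negative
    components of the same dimension do not match each other.
    """
    tokens = []
    for index, value in components:
        sign = "p" if value >= 0 else "n"
        tokens.append("{}{}|{:.3f}".format(index, sign, abs(value)))
    return " ".join(tokens)


vector = [0.1, -0.8, 0.05, 0.6, -0.2]
field_value = to_payload_tokens(threshold_vector(vector, top_k=2))
# field_value == "1n|0.800 3p|0.600"
```

Dropping the small components keeps the number of indexed tokens per document low, while the payload weights on the surviving tokens let a payload-aware similarity approximate the dot product at query time.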

Implementation Details

  • Solr Version - 7.5
  • Python Version - 3.x+ (3.5 used)

Links
