Investigate and Implement Approximate Nearest Neighbor Search #106

@m-goggins

The first round of results from exact nearest neighbor search showed prohibitively slow search times (about 1.8 seconds per input). That won't work in production, so we'll need the speedup promised by approximate nearest neighbor search. The scope of this ticket is twofold:

  1. Implement the HNSW approximate nearest neighbor search algorithm using hnswlib, and tie it to sentence transformers in the performance evaluation script.
  2. Create a hyperparameter optimization script for HNSW that runs a grid search over its relevant construction parameters to find the values that best balance recall (measured against exact search results) and search time. We use recall here because an approximate search is judged not by how right it is in a vacuum, but by how closely it reproduces exact search.
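
As a rough illustration of item 2, here is a minimal sketch of the scoring loop, not the planned implementation. The names `recall_at_k`, `grid_search`, and `build_index` are hypothetical, and the sketch assumes the exact-search neighbor IDs are already available from the first round of results. With hnswlib, `build_index` would presumably create an `hnswlib.Index`, pass `M` and `ef_construction` to `init_index`, call `add_items`, and return a closure around `knn_query`; that part is left out so the sketch stays library-free.

```python
import itertools
import time
from typing import Callable, Dict, List, Sequence

NeighborIds = List[List[int]]


def recall_at_k(approx_ids: NeighborIds, exact_ids: NeighborIds) -> float:
    """Mean fraction of exact neighbors that the approximate search also found."""
    if len(approx_ids) != len(exact_ids):
        raise ValueError("approx_ids and exact_ids must cover the same queries")
    per_query = [
        len(set(approx) & set(exact)) / len(exact)
        for approx, exact in zip(approx_ids, exact_ids)
    ]
    return sum(per_query) / len(per_query)


def grid_search(
    param_grid: Dict[str, Sequence[int]],
    build_index: Callable[[Dict[str, int]], Callable[[], NeighborIds]],
    exact_ids: NeighborIds,
) -> List[dict]:
    """Try every parameter combination; report recall and search time for each.

    build_index (hypothetical) takes one parameter combination and returns a
    zero-argument function that runs all queries and returns neighbor IDs.
    Only the search call is timed, not index construction.
    """
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        search = build_index(params)
        start = time.perf_counter()
        approx_ids = search()
        elapsed = time.perf_counter() - start
        results.append(
            {
                "params": params,
                "recall": recall_at_k(approx_ids, exact_ids),
                "seconds_per_query": elapsed / len(exact_ids),
            }
        )
    # Best recall first; ties broken by faster search.
    results.sort(key=lambda r: (-r["recall"], r["seconds_per_query"]))
    return results
```

A call might look like `grid_search({"M": [8, 16, 32], "ef_construction": [100, 200]}, build_index, exact_ids)`. The grid values are placeholders, not recommendations. Ranking by recall first and time second is only one possible trade-off; a combined score or a minimum-recall threshold could fit the production constraint better.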

Labels

Algorithm Development: Tasks related to training, testing, evaluating, and improving language models
