Elasticsearch-Aknn

Elasticsearch plugin for approximate K-nearest-neighbor queries on floating-point vectors using locality-sensitive hashing.
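As a rough illustration of the general idea, the sketch below hashes a vector with random hyperplanes, one common locality-sensitive hashing scheme. It is not taken from the plugin's source: the function names (`make_hyperplanes`, `lsh_hash`, `hamming_similarity`), the Gaussian sampling, and the single-table layout are all assumptions made for the example, and the plugin's actual hashing may differ.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw `num_bits` random hyperplane normals of dimension `dim`.

    Hypothetical helper: Gaussian normals are one common choice,
    not necessarily what the plugin does.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_hash(vector, hyperplanes):
    """Return one bit per hyperplane: 1 if the dot product is >= 0, else 0."""
    bits = []
    for normal in hyperplanes:
        dot = sum(v * n for v, n in zip(vector, normal))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


def hamming_similarity(hash_a, hash_b):
    """Count the positions where two hashes agree."""
    return sum(a == b for a, b in zip(hash_a, hash_b))


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=16, dim=4, seed=42)
    query = [0.1, -0.3, 0.7, 0.2]
    nearby = [0.12, -0.28, 0.69, 0.21]
    # Vectors pointing in similar directions tend to share most hash bits.
    print(hamming_similarity(lsh_hash(query, planes), lsh_hash(nearby, planes)))
```

Vectors whose hashes agree on many bits are candidates for an exact distance comparison, which is what makes the search approximate rather than exhaustive.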

The API for the three main endpoints and the key implementation details are documented at the root of this repository.

See the testplugin.sh script for an outline of building and installing the plugin.

See the benchmark directory for examples of interacting with the plugin programmatically via Python and the requests library.

The long-term plan for this plugin is to extract it, polish it up, and move it to its own repository. I've begun doing this on the dev branch of the elasticsearch-aknn repository.

Planned Improvements

  1. Implement integration tests. Elasticsearch has some nice integration testing functionality, but the documentation is very scarce.
  2. Add proper error checking and error responses to the endpoints to prevent silent/ambiguous errors. For example, Elasticsearch rejects index names containing uppercase characters and fails to index such a document, but the endpoint still returns 200.
  3. Clean up the JSON-Java serialization and deserialization, especially the conversion of JSON lists of lists to Java List<List<Double>> to Java Double[][] to RealMatrix.
  4. Enforce an explicit mapping and types for new Aknn LSH models. For example, the LSH hyperplanes should not be indexed and can likely be stored as a half_float / Java float to save space and reduce network latency.
  5. Enforce an explicit mapping and types for _aknn_vector and _aknn_hashes entries. For example, _aknn_vector should not be indexed and can likely be stored as a half_float / Java float.
  6. Determine a proper place for defining/changing plugin configurations. For example, the name of the vector and hashes items.
  7. Implement alternative distance functions, starting with cosine distance.
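For reference, the cosine distance named in item 7 can be sketched as follows. This is a standalone illustration in Python rather than plugin code; the function name `cosine_distance` and the choice to raise on zero-length vectors are assumptions made for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b).

    0.0 means the vectors point the same way, 1.0 means they are
    orthogonal, and 2.0 means they point in opposite directions.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption for this sketch: treat zero vectors as an error,
        # since the angle is undefined for them.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Unlike Euclidean distance, this measure ignores vector magnitude, so vectors that differ only by a positive scale factor are at distance 0.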