Learned Metric Index Framework

Learned Metric Index (LMI) is an ML-based index for answering approximate nearest neighbor queries over complex data.

You can use LMI to index any kind of high-dimensional vectors. The core idea is to use machine learning models as the index nodes and to let the inference of these models produce the search results. In [1] we employ supervised learning from a pre-existing partitioning of the data. In [2] the process is completely unsupervised.
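To make the "models as nodes" idea concrete, here is a minimal sketch of a one-level index in the supervised setting, where the labels come from a pre-existing partitioning. It is not the LMI implementation: the class names `CentroidNode` and `ToyLearnedIndex`, the nearest-centroid classifier standing in for the ML model, and the `n_buckets` parameter are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CentroidNode:
    """Hypothetical stand-in for an ML model node: a nearest-centroid classifier."""

    def fit(self, vectors, labels):
        groups = defaultdict(list)
        for vec, label in zip(vectors, labels):
            groups[label].append(vec)
        # One centroid per partition of the pre-existing partitioning.
        self.centroids = {
            label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in groups.items()
        }
        return self

    def rank_buckets(self, query):
        """'Inference': order the buckets from most to least promising."""
        return sorted(
            self.centroids,
            key=lambda label: euclidean(query, self.centroids[label]),
        )


class ToyLearnedIndex:
    """One-level index: the node's prediction decides which buckets to scan."""

    def __init__(self, vectors, labels):
        self.node = CentroidNode().fit(vectors, labels)
        self.buckets = defaultdict(list)
        for obj_id, (vec, label) in enumerate(zip(vectors, labels)):
            self.buckets[label].append((obj_id, vec))

    def search(self, query, k=1, n_buckets=1):
        """Approximate k-NN: scan only the n_buckets top-ranked buckets."""
        candidates = []
        for label in self.node.rank_buckets(query)[:n_buckets]:
            candidates.extend(self.buckets[label])
        candidates.sort(key=lambda item: euclidean(query, item[1]))
        return [obj_id for obj_id, _ in candidates[:k]]


# Synthetic data (an assumption): two well-separated 2-D clusters.
random.seed(0)
left = [[random.gauss(0, 0.1), random.gauss(0, 0.1)] for _ in range(50)]
right = [[random.gauss(5, 0.1), random.gauss(5, 0.1)] for _ in range(50)]
vectors = left + right
labels = [0] * 50 + [1] * 50

index = ToyLearnedIndex(vectors, labels)
query = [5.0, 5.0]
approx = index.search(query, k=3, n_buckets=1)  # scans one bucket only
exact = index.search(query, k=3, n_buckets=2)   # scans everything
print(approx, exact)
```

Scanning fewer buckets makes the search faster but approximate; raising `n_buckets` trades time for accuracy until the search degenerates into a full scan.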

We also worked with students who used LMI to search text data and protein data.

Searching in action:

You can check out our web application demonstrating our search on images and protein chains.

Download the data using Mendeley data: https://data.mendeley.com/datasets/8wp73zxr47/6

Unzip the data:

$ # (1) navigate to the directory you want to use
$ # (2) Extract the compressed input data files either manually or using the unzip command (if available)
$ unzip data.zip
$ # (3) Extract the compressed source code files
$ unzip LMIF.zip

Installation

Prerequisites:

Docker (ver. >=20.10.13)
45GB of storage space
300GB of main memory. This amount is needed to run all the experiments; if you wish to run only a subset, less memory might be sufficient.
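Before starting, it can help to compare the machine against these prerequisites. The following is a rough sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, gigabytes are assumed to mean 10^9 bytes, the memory query relies on `os.sysconf` (available on Linux, not on Windows), and the Docker version itself is not checked, only whether a `docker` executable is on the `PATH`.

```python
import os
import shutil

# Thresholds taken from the prerequisites list above.
REQUIRED_DISK_GB = 45
REQUIRED_RAM_GB = 300


def bytes_to_gb(n_bytes):
    """Convert bytes to gigabytes (assumption: 1 GB = 10**9 bytes)."""
    return n_bytes / 1e9


def check_prerequisites(path="."):
    """Collect basic facts about the machine; makes no changes to it."""
    report = {
        # Only checks that a docker executable exists, not its version.
        "docker_found": shutil.which("docker") is not None,
        "free_disk_gb": bytes_to_gb(shutil.disk_usage(path).free),
    }
    try:
        total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        report["total_ram_gb"] = bytes_to_gb(total_ram)
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these keys are unavailable on this platform.
        report["total_ram_gb"] = None
    return report


report = check_prerequisites()
print("docker on PATH:", report["docker_found"])
print("disk OK:", report["free_disk_gb"] >= REQUIRED_DISK_GB)
if report["total_ram_gb"] is None:
    print("RAM: could not be determined on this platform")
else:
    print("RAM OK for all experiments:", report["total_ram_gb"] >= REQUIRED_RAM_GB)
```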

Installation Method 1: With docker

$ # (4) Navigate to the source code directory and build the image:
$ cd LMIF && docker build -t repro-lmi -f Dockerfile . --network host
$ # (5) Check if the `repro-lmi` image was successfully built by listing your docker images:
$ docker images
$ # (6) Start the interactive session and map the input and output directories from/to your local machine. Note that the full path to your current directory needs to be provided:
$ docker run -it -v <full-path-host-machine>/LMIF/outputs:/learned-indexes/outputs -v <full-path-host-machine>/datasets:/learned-indexes/data repro-lmi /bin/bash

Installation Method 2: Without docker

Note that this method was not tested on Windows or macOS.

$ # create and activate the python virtual environment
$ python -m venv env && source env/bin/activate
$ # install the required packages
$ pip install -r requirements.txt

Usage

$ # (8) Run the quick experiments, generate the report
$ python3 run-experiments.py `cat quick-experiments.txt` 2>&1 | tee outputs/experiments-quick.log
$ python3 create-report.py
$ # The partial output is in outputs/report.html
$ # (9) Run the rest of the experiments
$ python3 run-experiments.py `cat experiments.txt` 2>&1 | tee outputs/experiments.log
$ python3 create-report.py

Note that running all of the experiments is very time- and memory-consuming: it took us 74 days and 350GB of RAM on a one-core Intel Xeon Gold 5120.
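The commands above pass the experiment list through `` `cat experiments.txt` ``, which the shell splits on whitespace into separate arguments for `run-experiments.py`. The sketch below reproduces that word splitting in Python, which may be convenient for running a hand-picked subset given the cost of the full run. The helper name `build_command` and the entries `setup-a`, `setup-b`, `setup-c` are hypothetical placeholders; the real contents of `experiments.txt` are not shown here, and unlike the shell this sketch performs no glob expansion.

```python
import shlex


def build_command(experiments_text, script="run-experiments.py"):
    """Mimic `python3 SCRIPT $(cat FILE)`: one argument per whitespace-separated token."""
    return ["python3", script, *experiments_text.split()]


# Hypothetical list contents; substitute the text of quick-experiments.txt
# or experiments.txt, or any subset of their entries.
example_text = "setup-a setup-b\nsetup-c\n"
command = build_command(example_text)
print(shlex.join(command))
# prints: python3 run-experiments.py setup-a setup-b setup-c

# To actually launch it from the source code directory (not executed here):
#   import subprocess
#   subprocess.run(command, check=True)
```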

Generate the report:

$ python3 create-report.py outputs/

Datasets

Dataset	No. of objects	Descriptor length	File size (GB)
CoPhIR	1M	284	0.748
Profiset	1M	4096	23
MoCap	350k	4096	6.8