cuulee/faiss-on-disk-example

 
 

faiss-on-disk-example

This repo contains example code that uses faiss to search for nearest neighbors in a dense vector dataset that does not fit into RAM (see blogpost).

Running the examples

To run the examples on a machine with Docker installed, run:

docker build -t nnsearch:latest .
docker run --name nn -d nnsearch:latest
docker exec -it nn bash
cd workspace

Then download and extract the 1M GIST vectors (a benchmark dataset for nearest-neighbor search over vectors) with:

wget ftp://ftp.irisa.fr/local/texmex/corpus/gist.tar.gz
tar -xzvf gist.tar.gz 
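
The extracted vectors are stored in the `.fvecs` format, where each record is an int32 dimension followed by that many float32 values. As a minimal sketch (the helper name `read_fvecs` and its `mmap` parameter are illustrative and not necessarily what the scripts in `src` use), such a file can be loaded with numpy like this:

```python
import numpy as np


def read_fvecs(path, mmap=False):
    """Read a .fvecs file into an (n, d) float32 array.

    Hypothetical helper for illustration. With mmap=True the file is
    memory-mapped instead of being loaded into RAM up front.
    """
    # Each record is: one int32 holding the dimension d, then d float32 values.
    if mmap:
        raw = np.memmap(path, dtype="int32", mode="r")
    else:
        raw = np.fromfile(path, dtype="int32")
    if raw.size == 0:
        raise ValueError(f"{path} is empty")

    d = int(raw[0])
    records = raw.reshape(-1, d + 1)
    if not (records[:, 0] == d).all():
        raise ValueError("inconsistent vector dimensions in file")

    # Drop the leading dimension column and reinterpret the payload as float32.
    vecs = records[:, 1:].view("float32")
    return vecs if mmap else np.ascontiguousarray(vecs)
```

Assuming the archive unpacks to a `gist` folder, a call would look like `xb = read_fvecs("gist/gist_base.fvecs", mmap=True)`; check the extracted file names before relying on that path.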

To perform nearest-neighbor search with numpy (this can fail on machines with less than 8 GB of RAM available to the process), run:

cd src
python numpy_inference.py
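
To illustrate why the numpy approach is memory-hungry, here is a minimal sketch of exact (brute-force) k-nearest-neighbor search. It is not the code of `numpy_inference.py`; the function name `knn_bruteforce` and the `batch_size` parameter are assumptions made for this example. All base vectors must sit in RAM (about 3.84 GB for 1M float32 vectors, assuming 960 dimensions), and each query batch adds a distance matrix of `batch_size x n_base` floats.

```python
import numpy as np


def knn_bruteforce(xb, xq, k, batch_size=1024):
    """Exact k-NN under squared L2 distance, computed in query batches.

    xb: (n_base, d) base vectors, xq: (n_query, d) query vectors.
    Returns (distances, indices), each of shape (n_query, k).
    """
    xb = np.asarray(xb, dtype="float32")
    xq = np.asarray(xq, dtype="float32")
    nq = xq.shape[0]

    # Squared norms of the base vectors, computed once.
    base_sq = np.einsum("ij,ij->i", xb, xb)

    D = np.empty((nq, k), dtype="float32")
    I = np.empty((nq, k), dtype="int64")
    for start in range(0, nq, batch_size):
        q = xq[start:start + batch_size]
        q_sq = np.einsum("ij,ij->i", q, q)
        # ||q - b||^2 = ||q||^2 - 2 q.b + ||b||^2
        dist = q_sq[:, None] - 2.0 * (q @ xb.T) + base_sq[None, :]

        # Select the k smallest distances per row, then sort just those k.
        idx = np.argpartition(dist, k - 1, axis=1)[:, :k]
        part = np.take_along_axis(dist, idx, axis=1)
        order = np.argsort(part, axis=1)
        I[start:start + batch_size] = np.take_along_axis(idx, order, axis=1)
        D[start:start + batch_size] = np.take_along_axis(part, order, axis=1)
    return D, I
```

Lowering `batch_size` shrinks the temporary distance matrix, but the base vectors themselves still have to fit in memory, which is the limitation the faiss example addresses.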

To perform the same search with faiss (meant to scale to large numbers of vectors), run:

python faiss_training.py
python faiss_inference.py

When you are done with the runs, running make clean in the root folder should remove all files created along the way.

Profiling

To monitor memory usage while a script runs, you can use memory_profiler:

# requires python faiss_training.py to have been run first
mprof run faiss_inference.py
# generate memory usage plot vs time
mprof plot -o faiss_inference

About

Example of out-of-RAM k-nearest neighbors search using faiss
