Skip to content
forked from 1flei/aws_alsh

Sublinear Time NNS over Generalized Weighted Space (ICML 2019)

Notifications You must be signed in to change notification settings

HuangQiang/aws_alsh

 
 

Repository files navigation

Sublinear Time Nearest Neighbor Search over Generalized Weighted Space

Welcome to the AWS_ALSH GitHub!

AWS_ALSH is a package for the problem of Nearest Neighbor Search (NNS) over generalized weighted space. Given a dataset of n data points and a query q with a weight vector w in the Eculidean space, this problem aims to find the nearest point to q such that the weighted Euclidean distance is minimized. It is a fundamental problem, and it has wide applications in many data mining and machine learning tasks.

In the package (AWS_ALSH), we introduce two Asymmetric Locality-Sensitive Hashing (ALSH) schemes that can deal with this problem in sublinear time and work on Arbitraty Weighted Space (AWS). We call them weight-oblivious. The details of AWS_ALSH can be found from our work Sublinear Time Nearest Neighbor Search over Generalized Weighted Space which has been published in ICML 2019.

Datasets

We provide three datasets Mnist, MovieLens, and Sift for evaluations. They can be downloaded from OneDrive. The statistics of the datasets are summarized as follows.

Datasets #Data Points #Queries Dimensionality
Mnist 60,000 1,000 784
MovieLens 52,889 1,000 150
Sift 1,000,000 1,000 128

For each query set, we generate the weight vector w's from five typical distributions, whihc are illustrated as follows.

Types Illustrations
identical all "1"s
binary uniformly distributed in {0,1}d
normal d-dimensional normal distributions N(0,I)
uniform uniformly distributed in [0,1]d
negative all "-1"s

Compilation

To build the project, use the following instructions:

mkdir build
cd build
cmake ..
make -j

Usages

In order to run the scripts, the datasets/query should be put in the data/ folder (You should create such a folder by yourself and copy the datasets and queries to this folder), or modify the scripts/dataset_config.py to change the path of datasets and queries accordingly.

We provide an example as follows to illustrate how to run this package:

./alsh -A fraction_recall_s2alsh -n 60000 -q 1000 -d 784 -D ../data/Mnist784/Mnist784.ds -Q ../data/Mnist784/Mnist784.q -W ../data/Mnist784/Mnist784_normal.w -G ../data/Mnist784/Mnist784_normal.gt -O output.out -U 3.140000 --data_hash_filename data_hash.dh --query_hash_filename query_hash.qh

We also provide the python scripts on the scripts/ folder that can be used to run the precision_recall and fraction_recall experiments automatically, e.g.

python ../scripts/run_ground_truth.py
python ../scripts/run_precision_recall.py
python ../scripts/run_fraction_recall.py

The results for the precision-recall experiments can be found as follows.

drawing

If you need more details to run this package, please type the following command to get more information.

./alsh --help

Reference

Please use the following bibtex to cite this work when you use AWS_ALSH in your paper.

@inproceedings{lei2019sublinear,
  title={Sublinear time nearest neighbor search over generalized weighted space},
  author={Lei, Yifan and Huang, Qiang and Kankanhalli, Mohan and Tung, Anthony},
  booktitle={International Conference on Machine Learning},
  pages={3773--3781},
  year={2019},
  organization={PMLR}
}

It is welcome to contact Lei Yifan (leiyifan@u.nus.edu) or Huang Qiang (huangq@comp.nus.edu.sg) if you meet any issue. Thank you.

About

Sublinear Time NNS over Generalized Weighted Space (ICML 2019)

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 83.9%
  • Python 13.0%
  • C 2.0%
  • CMake 1.1%