Benchmark the performance of KNN algorithms
===========================================

In this document, we benchmark the performance of the multiple K-Nearest Neighbor (KNN) algorithms implemented by ``dgl.knn_graph``.

Given a dataset of N samples with D dimensions, the common use case of KNN algorithms in graph learning is to build a KNN graph by finding the K nearest neighbors for each of the N samples among the dataset.

Empirically, all three parameters N, D, and K affect the computation cost. To benchmark the algorithms, we pick a few representative datasets that cover the most common scenarios:

- A synthetic dataset with mixed Gaussian samples: N = 1000, D = 3.
- A point cloud sampled from ModelNet: N = 10000, D = 3.
- Subsets of MNIST:

  - A small subset: N = 1000, D = 784
  - A medium subset: N = 10000, D = 784
  - A large subset: N = 50000, D = 784

Some notes:

- bruteforce-sharemem is an optimized implementation of bruteforce on GPU.
- kd-tree is currently implemented on CPU only.
- bruteforce-blas computes distances through matrix multiplication and is therefore memory-inefficient.
- nn-descent is an approximate algorithm, so we also report the recall rate of its results.
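
To see why bruteforce-blas trades memory for speed: with the expansion ``||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2``, all pairwise distances reduce to one large matrix multiplication, but the full N x N distance matrix must be materialized. A minimal NumPy sketch of the idea (our own illustration, not DGL's actual implementation):

```python
import numpy as np

def knn_blas(x, k):
    """Brute-force KNN via one BLAS matrix multiplication.

    Materializes the full N x N squared-distance matrix, so memory
    grows quadratically with the number of samples N.
    """
    sq = (x ** 2).sum(axis=1)  # squared norms, shape (N,)
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, for all pairs at once.
    dist2 = sq[:, None] - 2.0 * (x @ x.T) + sq[None, :]
    # For each row, the indices of the k smallest distances
    # (each point's own index is included, since its self-distance is ~0).
    return np.argsort(dist2, axis=1)[:, :k]

x = np.random.randn(1000, 3)
neighbors = knn_blas(x, 8)
print(neighbors.shape)  # (1000, 8)
```

The N x N intermediate is exactly what makes this approach run out of memory for large N, as the large MNIST results below show.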

Results
-------

In this section, we show the runtime in seconds and, where applicable, the recall rate of the algorithms under various scenarios.

The experiments are run on an Amazon EC2 p3.2xlarge instance, which has 8 vCPUs, 61 GB of RAM, and one Tesla V100 GPU with 16 GB of memory. The numbers are obtained with DGL==0.7.0 (64d0f3f), PyTorch==1.8.1, and CUDA==11.1 on Ubuntu 18.04.5 LTS.

- Mixed Gaussian:

  ===================  =================  =================  =================  =================
  Model                CPU, K = 8         CPU, K = 64        GPU, K = 8         GPU, K = 64
  ===================  =================  =================  =================  =================
  bruteforce-blas      0.010              0.011              0.002              0.003
  kd-tree              0.004              0.006              n/a                n/a
  bruteforce           0.004              0.006              0.126              0.009
  bruteforce-sharemem  n/a                n/a                0.002              0.003
  nn-descent           0.014 (R: 0.985)   0.148 (R: 1.000)   0.016 (R: 0.973)   0.077 (R: 1.000)
  ===================  =================  =================  =================  =================
- Point Cloud:

  ===================  =================  =================  =================  =================
  Model                CPU, K = 8         CPU, K = 64        GPU, K = 8         GPU, K = 64
  ===================  =================  =================  =================  =================
  bruteforce-blas      0.359              0.432              0.010              0.010
  kd-tree              0.007              0.026              n/a                n/a
  bruteforce           0.074              0.167              0.008              0.039
  bruteforce-sharemem  n/a                n/a                0.004              0.017
  nn-descent           0.161 (R: 0.977)   1.345 (R: 0.999)   0.086 (R: 0.966)   0.445 (R: 0.999)
  ===================  =================  =================  =================  =================
- Small MNIST:

  ===================  =================  =================  =================  =================
  Model                CPU, K = 8         CPU, K = 64        GPU, K = 8         GPU, K = 64
  ===================  =================  =================  =================  =================
  bruteforce-blas      0.014              0.015              0.002              0.002
  kd-tree              0.179              0.182              n/a                n/a
  bruteforce           0.173              0.228              0.123              0.170
  bruteforce-sharemem  n/a                n/a                0.045              0.054
  nn-descent           0.060 (R: 0.878)   1.077 (R: 0.999)   0.030 (R: 0.952)   0.457 (R: 0.999)
  ===================  =================  =================  =================  =================
- Medium MNIST:

  ===================  =================  =================  =================  =================
  Model                CPU, K = 8         CPU, K = 64        GPU, K = 8         GPU, K = 64
  ===================  =================  =================  =================  =================
  bruteforce-blas      0.897              0.970              0.019              0.023
  kd-tree              18.902             18.928             n/a                n/a
  bruteforce           14.495             17.652             2.058              2.588
  bruteforce-sharemem  n/a                n/a                2.257              2.524
  nn-descent           0.804 (R: 0.755)   14.108 (R: 0.999)  0.158 (R: 0.900)   1.794 (R: 0.999)
  ===================  =================  =================  =================  =================
- Large MNIST:

  ===================  =================  =================  =================  =================
  Model                CPU, K = 8         CPU, K = 64        GPU, K = 8         GPU, K = 64
  ===================  =================  =================  =================  =================
  bruteforce-blas      21.829             22.135             Out of Memory      Out of Memory
  kd-tree              542.688            573.379            n/a                n/a
  bruteforce           373.823            432.963            10.317             12.639
  bruteforce-sharemem  n/a                n/a                53.133             58.419
  nn-descent           4.995 (R: 0.658)   75.487 (R: 0.999)  1.478 (R: 0.860)   15.698 (R: 0.999)
  ===================  =================  =================  =================  =================
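
For reference, the recall rate reported for nn-descent is the fraction of the true K nearest neighbors that the approximate result recovers, averaged over all samples. A minimal sketch of such a computation (our own illustration, not the benchmark's code):

```python
import numpy as np

def knn_recall(approx, exact):
    """Average overlap between approximate and exact neighbor sets.

    Both arguments are (N, K) integer arrays of neighbor indices;
    the result is in [0, 1], with 1.0 meaning a perfect match.
    """
    n, k = exact.shape
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    return hits / (n * k)

exact = np.array([[1, 2, 3], [0, 2, 3]])
approx = np.array([[1, 2, 4], [0, 2, 3]])  # one wrong neighbor out of six
print(knn_recall(approx, exact))  # 5 / 6 = 0.8333...
```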

Conclusion
----------

- As long as memory permits, bruteforce-blas is the default algorithm to go with.
- The exception is when D is small and the data resides on CPU; in that case, kd-tree is the best algorithm.