In [7]:
using NearestNeighbors
using BenchmarkTools

Let us create some data: one million random 128-dimensional `Float32` vectors, stored as the columns of a matrix

In [8]:
data = rand(Float32, 128, 10^6)

128×1000000 Matrix{Float32}:
 0.687857   0.785659   0.676999    …  0.523741    0.809215    0.295437
 0.149933   0.339917   0.816843       0.00970232  0.757438    0.243993
 0.658733   0.385988   0.43249        0.378       0.419239    0.139605
 0.459716   0.195461   0.177259       0.287488    0.344925    0.282759
 0.224169   0.814845   0.42296        0.51748     0.775577    0.601469
 0.899717   0.144047   0.240218    …  0.344248    0.00593603  0.136317
 0.341651   0.245336   0.74273        0.442731    0.461624    0.15544
 0.557462   0.570779   0.715726       0.947273    0.353633    0.807687
 0.339329   0.681877   0.776807       0.402773    0.865587    0.743952
 0.356522   0.502739   0.206005       0.831999    0.118927    0.98116
 0.549483   0.729211   0.641927    …  0.145109    0.171613    0.22724
 0.598691   0.332117   0.60057        0.887142    0.328095    0.986986
 0.394814   0.402753   0.939232       0.508874    0.558687    0.34845
 ⋮                                 ⋱                

Then we can instantiate a tree with the data and use it to search for the vectors nearest to a query vector

In [9]:
brutetree = BruteTree(data)

BruteTree{StaticArraysCore.SVector{128, Float32}, Euclidean}
  Number of points: 1000000
  Dimensions: 128
  Metric: Euclidean(0.0)
  Reordered: false

Let us consider a query vector

In [10]:
query = data[:,4];

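The method **`knn`** returns the indices of the `k` points nearest to the query, together with their distances. As the output below shows, the results are not necessarily sorted by distance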

In [13]:
k = 10
idx, distances = knn(brutetree, query, k) 

([368682, 593392, 446008, 8071, 512932, 777413, 763039, 163584, 525605, 4], Float32[3.702029, 3.6918645, 3.6857388, 3.660195, 3.674085, 3.6798215, 3.6688938, 3.6510324, 3.5827184, 0.0])
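To make concrete what a brute-force search computes, here is a minimal sketch in plain Julia. It is not how NearestNeighbors.jl implements `BruteTree`; the helper name `brute_knn` and the tiny `small` matrix are assumptions introduced only for illustration. The idea is simply to compute the Euclidean distance from the query to every column and keep the `k` smallest.

```julia
using LinearAlgebra

# Hypothetical helper for illustration; not part of NearestNeighbors.jl.
# Each column of `data` is one point, as in the notebook above.
function brute_knn(data::AbstractMatrix, query::AbstractVector, k::Integer)
    n = size(data, 2)
    # Euclidean distance from the query to every column
    dists = [norm(view(data, :, j) .- query) for j in 1:n]
    # Indices of the k smallest distances, in increasing order of distance
    idx = collect(partialsortperm(dists, 1:k))
    return idx, dists[idx]
end

# Four 2-dimensional points lying on the x-axis at x = 0, 1, 2, 3
small = Float32[0 1 2 3; 0 0 0 0]
idx_small, dist_small = brute_knn(small, Float32[1.1, 0.0], 2)
```

Unlike the `knn` output above, this sketch returns its results sorted by distance, because `partialsortperm` orders the selected indices. Every query costs one pass over all points, which matches the roughly constant per-query time measured in the following cells.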

In [29]:
@btime knn($brutetree, $query, 1) 

  24.324 ms (2 allocations: 128 bytes)


([4], Float32[0.0])

In [31]:
@benchmark idx, distances = knn(brutetree, query, k) 

BenchmarkTools.Trial: 204 samples with 1 evaluation.
 Range (min … max):  24.327 ms …  26.432 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     24.498 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.589 ms ± 280.331 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [15]:
query_batch = data[:,10:20];

In [8]:
@benchmark idx, distances = knn(brutetree, query_batch, 1) 

BenchmarkTools.Trial: 19 samples with 1 evaluation.
 Range (min … max):  263.135 ms … 267.479 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     265.197 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   265.090 ms ±   1.074 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

The query can also be a batch of vectors, one per column; `knn` then returns one result per query

In [17]:
idx, distances = knn(brutetree, query_batch, 1) 

([[10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20]], Vector{Float32}[[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]])

Here we can see that vectors from the query batch have the closest matches in indices 10 to 20 (which are the indices we used to select them!)

In [10]:
idx

11-element Vector{Vector{Int64}}:
 [10]
 [11]
 [12]
 [13]
 [14]
 [15]
 [16]
 [17]
 [18]
 [19]
 [20]

Note that creating a `BruteTree` instance can take some time

In [11]:
@benchmark brutetree = BruteTree(data)

BenchmarkTools.Trial: 55 samples with 1 evaluation.
 Range (min … max):  90.558 ms …  94.109 ms  ┊ GC (min … max): 0.00% … 0.43%
 Time  (median):     91.591 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   91.803 ms ± 721.254 μs  ┊ GC (mean ± σ):  0.15% ± 0.21%

Note that even if the data is not stored, the construction time is similar

In [12]:
@benchmark brutetree2 = BruteTree(data; storedata=false)

BenchmarkTools.Trial: 55 samples with 1 evaluation.
 Range (min … max):  90.483 ms …  93.861 ms  ┊ GC (min … max): 0.00% … 0.43%
 Time  (median):     91.286 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   91.537 ms ± 857.910 μs  ┊ GC (mean ± σ):  0.22% ± 0.22%

If we want to decrease the construction time of the `BruteTree`, we can build it from a single data point

In [20]:
aux = data[:,1];
@benchmark brutetree2 = BruteTree(aux;storedata=false)

BenchmarkTools.Trial: 10000 samples with 136 evaluations.
 Range (min … max):  705.272 ns … 102.496 μs  ┊ GC (min … max): 0.00% … 98.90%
 Time  (median):     724.265 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   771.833 ns ±   1.020 μs  ┊ GC (mean ± σ):  1.31% ±  0.99%

In [21]:
idx, distances = knn(brutetree2, data, query, 1) 

LoadError: UndefVarError: brutetree2 not defined
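
The `UndefVarError` above is consistent with how BenchmarkTools evaluates its argument: an assignment written inside `@benchmark` is local to the benchmarked code and does not create a global variable, so `brutetree2` was never defined outside the benchmark cells. A minimal sketch of this behaviour follows; `build` is a hypothetical stand-in for an expensive constructor such as `BruteTree(...)`, and the variable names are assumptions made for illustration.

```julia
using BenchmarkTools

# Hypothetical stand-in for an expensive constructor such as BruteTree(...)
build() = collect(1:1_000)

# The assignment runs inside the benchmarked code, so no global `tmp` is created
@benchmark tmp = build()
defined_after_benchmark = @isdefined(tmp)

# Assign in ordinary code first, then benchmark the construction separately
tree_like = build()
@benchmark build()
defined_after_assignment = @isdefined(tree_like)
```

Under this reading, constructing the tree in its own cell (outside `@benchmark`) before calling `knn` would avoid the error.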