In [1]:
using NearestNeighbors
using BenchmarkTools

Let us create some data: one million random 128-dimensional `Float32` vectors, stored as the columns of a matrix.

In [2]:
data = rand(Float32, 128, 10^6)

128×1000000 Matrix{Float32}:
 0.103815   0.359401   0.0542722  …  0.601437   0.0161192  0.613765
 0.787693   0.0236868  0.335288      0.536818   0.906848   0.673667
 0.474631   0.771912   0.864373      0.96388    0.4876     0.181068
 0.0924482  0.214075   0.102792      0.0629652  0.248358   0.367585
 0.411883   0.112835   0.735768      0.547558   0.164784   0.783969
 0.574419   0.866531   0.332136   …  0.2064     0.226139   0.590548
 0.254123   0.188232   0.538184      0.653873   0.80007    0.617379
 0.844911   0.938796   0.696874      0.159127   0.987661   0.358852
 0.552221   0.426752   0.962379      0.208436   0.512279   0.709651
 0.62083    0.56921    0.252262      0.990145   0.486527   0.536565
 0.129705   0.225983   0.951813   …  0.964453   0.0843336  0.670711
 0.113989   0.27302    0.160227      0.826938   0.106803   0.633877
 0.510465   0.635516   0.730226      0.260457   0.281112   0.0579612
 ⋮                                ⋱                        
 0.477103   0.681787   0.8

Then we can instantiate a tree with the data and use it to search for the vectors closest to a query vector.

In [3]:
brutetree = BruteTree(data)

BruteTree{StaticArraysCore.SVector{128, Float32}, Euclidean}
  Number of points: 1000000
  Dimensions: 128
  Metric: Euclidean(0.0)
  Reordered: false

Let us consider a query vector

In [4]:
query = data[:,4];

The function **`knn`** returns the indices of the `k` items closest to the query, together with their distances.

In [5]:
k = 10
idx, distances = knn(brutetree, query, k) 

([672697, 263180, 968110, 435367, 30159, 334875, 810169, 100328, 301652, 4], Float32[3.611783, 3.6114755, 3.6000571, 3.581974, 3.5440912, 3.541534, 3.5718088, 3.5450552, 3.5250273, 0.0])
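The distances in the output above are not in ascending order (the exact match, index 4 with distance 0.0, comes last). The following is a minimal sketch of how sorted results can be requested, assuming the fourth positional argument of `knn` is the `sortres` flag described in the NearestNeighbors.jl README. The names `small_data`, `small_tree`, and `small_query` are illustrative, and the dataset is deliberately small so the example runs quickly.

```julia
using NearestNeighbors

# Small synthetic dataset (sizes are illustrative, not the ones used above).
small_data = rand(Float32, 8, 1_000)
small_tree = BruteTree(small_data)
small_query = small_data[:, 4]

# Assumption: the fourth positional argument is `sortres`; `true` asks for
# the neighbours to be returned ordered by distance, closest first.
sorted_idx, sorted_dist = knn(small_tree, small_query, 10, true)
```

With sorted results, the first entry of `sorted_idx` should be the query's own index, since the query was taken from the dataset.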

In [6]:
@benchmark idx, distances = knn(brutetree, query, k) 

BenchmarkTools.Trial: 199 samples with 1 evaluation.
 Range (min … max):  24.899 ms …  25.517 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     25.129 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.113 ms ± 102.701 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [10]:
query_batch = data[:,10:20];

In [11]:
@benchmark idx, distances = knn(brutetree, query_batch, 1) 

BenchmarkTools.Trial: 19 samples with 1 evaluation.
 Range (min … max):  268.285 ms … 270.872 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     270.028 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   269.614 ms ± 942.168 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

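The query can also be a batch of vectors, given as the columns of a matrix. In that case `knn` returns one vector of indices and one vector of distances per query.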

In [12]:
idx, distances = knn(brutetree, query_batch, 1) 

([[10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20]], Vector{Float32}[[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]])

Here we can see that the vectors in the query batch have their closest matches at indices 10 to 20, which are exactly the indices we used to select them.

In [14]:
idx

11-element Vector{Vector{Int64}}:
 [10]
 [11]
 [12]
 [13]
 [14]
 [15]
 [16]
 [17]
 [18]
 [19]
 [20]
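As a hedged illustration of how the batched result can be consumed, the sketch below runs a batch query with more than one neighbour per query and extracts the best match for each column. It uses a small synthetic dataset, and the names `small_data`, `small_tree`, `batch`, and `best_match` are illustrative. It also assumes that passing `true` as the fourth positional argument sorts each result by distance.

```julia
using NearestNeighbors

# Small synthetic dataset (sizes are illustrative).
small_data = rand(Float32, 8, 1_000)
small_tree = BruteTree(small_data)

# Columns 10 to 20 of the matrix form a batch of 11 queries.
batch = small_data[:, 10:20]

# Assumption: the fourth positional argument requests results sorted by distance.
batch_idx, batch_dist = knn(small_tree, batch, 3, true)

# One inner vector per query column; with sorted results the first entry
# of each inner vector is that query's closest match.
best_match = first.(batch_idx)
```

Because every query here is a column of the dataset, `best_match` should recover the column indices 10 to 20.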

Note that creating a `BruteTree` instance can take some time 

In [39]:
@benchmark brutetree = BruteTree(data)

BenchmarkTools.Trial: 55 samples with 1 evaluation.
 Range (min … max):  90.668 ms …  93.859 ms  ┊ GC (min … max): 0.00% … 0.42%
 Time  (median):     91.376 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   91.523 ms ± 615.101 μs  ┊ GC (mean ± σ):  0.15% ± 0.21%

Note that even if the data is not stored (`storedata=false`), the construction time is similar.

In [40]:
@benchmark brutetree2 = BruteTree(data; storedata=false)

BenchmarkTools.Trial: 55 samples with 1 evaluation.
 Range (min … max):  90.756 ms … 95.558 ms  ┊ GC (min … max): 0.00% … 0.44%
 Time  (median):     92.104 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   92.291 ms ±  1.142 ms  ┊ GC (mean ± σ):  0.15% ± 0.21%

If we want to decrease the construction time of the `BruteTree`, we can build it from a single data point. Note that the resulting tree reports zero points.

In [64]:
aux = data[:,1];
@benchmark brutetree2 = BruteTree(aux;storedata=false)

BenchmarkTools.Trial: 10000 samples with 91 evaluations.
 Range (min … max):  800.824 ns … 294.567 μs  ┊ GC (min … max): 0.00% … 99.41%
 Time  (median):     822.352 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   920.741 ns ±   2.945 μs  ┊ GC (mean ± σ):  3.18% ±  0.99%

In [66]:
brutetree2

BruteTree{StaticArraysCore.SVector{128, Float32}, Euclidean}
  Number of points: 0
  Dimensions: 128
  Metric: Euclidean(0.0)
  Reordered: false

In [69]:
idx, distances = knn(brutetree2, data, query, 1) 

LoadError: MethodError: no method matching knn(::BruteTree{StaticArraysCore.SVector{128, Float32}, Euclidean}, ::Matrix{Float32}, ::Matrix{Float32}, ::Int64)
Closest candidates are:
  knn(::NNTree{V}, ::AbstractMatrix{T}, ::Int64, ::Any) where {V, T<:Number} at ~/.julia/packages/NearestNeighbors/8gDgr/src/knn.jl:55
  knn(::NNTree{V}, ::AbstractMatrix{T}, ::Int64, ::Any, ::F) where {V, T<:Number, F<:Function} at ~/.julia/packages/NearestNeighbors/8gDgr/src/knn.jl:55
  knn(::NNTree{V}, ::Vector{T}, ::Int64, ::Any) where {V, T<:(AbstractVector)} at ~/.julia/packages/NearestNeighbors/8gDgr/src/knn.jl:17
  ...
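The `MethodError` above shows that, in the installed version, `knn` has no method that takes the data matrix as a separate argument: every listed candidate expects the query points directly after the tree. If the aim is to avoid the tree construction cost altogether, one option is an exhaustive search written by hand. The following is a minimal sketch under that assumption. `brute_knn` is a hypothetical helper, not part of NearestNeighbors.jl, and it has not been tuned for performance.

```julia
# Hypothetical helper (not a NearestNeighbors.jl function): exhaustive
# k-nearest-neighbour search over the columns of `data`, without a tree.
function brute_knn(data::AbstractMatrix{T}, query::AbstractVector{T}, k::Integer) where {T<:Real}
    size(data, 1) == length(query) ||
        throw(DimensionMismatch("query length must equal the number of rows of data"))
    n = size(data, 2)
    dists = Vector{T}(undef, n)
    for j in 1:n
        acc = zero(T)
        for i in axes(data, 1)
            d = data[i, j] - query[i]
            acc += d * d
        end
        dists[j] = sqrt(acc)          # Euclidean distance to column j
    end
    # Indices of the k smallest distances, in ascending order of distance.
    nearest = collect(partialsortperm(dists, 1:k))
    return nearest, dists[nearest]
end

# Usage on a small synthetic dataset (sizes are illustrative).
small_data = rand(Float32, 16, 500)
idx, distances = brute_knn(small_data, small_data[:, 4], 3)
```

Unlike the unsorted output of the earlier `knn` call, this sketch returns neighbours ordered by increasing distance, so the query's own column comes first when the query is taken from the dataset.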