# Read SIFT 1 million 

In [1]:
path = joinpath(homedir(), "Datasets", "SIFT1M",
    "sift-128-euclidean.hdf5")

"/Users/davidbuchaca/Datasets/SIFT1M/sift-128-euclidean.hdf5"

In [2]:
using HDF5

In [3]:
f = h5open(path, "r")

🗂️ HDF5.File: (read-only) /Users/davidbuchaca/Datasets/SIFT1M/sift-128-euclidean.hdf5
├─ 🏷️ distance
├─ 🔢 distances
├─ 🔢 neighbors
├─ 🔢 test
└─ 🔢 train

In [4]:
X_tr_vecs = read(f["train"])
X_te_vecs = read(f["test"]);
neighbors = read(f["neighbors"])
distances = read(f["distances"])

@show size(X_tr_vecs)
@show size(X_te_vecs)
@show size(neighbors)
@show size(distances)

size(X_tr_vecs) = (128, 1000000)
size(X_te_vecs) = (128, 10000)
size(neighbors) = (100, 10000)
size(distances) = (100, 10000)


(100, 10000)

### Mean Squared Error
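
For reference, the quantity that the two implementations below compute can be written as follows. The symbols are notation introduced here for illustration, not taken from the notebook: $q$ is the query vector, $x_m$ is the $m$-th column of `X`, and $d$ is the number of features (128 for SIFT).

$$
\mathrm{MSE}(q, x_m) = \frac{1}{d} \sum_{j=1}^{d} \left(q_j - x_{j,m}\right)^2
$$

Since $d$ is the same for every column, ranking candidates by this value gives the same order as ranking them by squared Euclidean distance.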

In [5]:
function MSE(X, query)
    d = (query .- X) .* (query .- X)
    res = d /length(query);
    return sum(res, dims=1)
end

MSE (generic function with 1 method)

In [6]:
function MSE_2(X, query)
    n_features, n_examples = size(X)
    result = zeros(n_examples)
    for m in 1:n_examples
        res = zero(eltype(X))
        for j in 1:n_features
            aux = (query[j] .- X[j,m])
            res += aux * aux
        end
        result[m] = res/n_features
    end
    return result
end

MSE_2 (generic function with 1 method)

In [7]:
query = X_te_vecs[:,1];

In [8]:
using BenchmarkTools

In [9]:
@benchmark MSE(X_te_vecs, query)

BenchmarkTools.Trial: 677 samples with 1 evaluation.
 Range (min … max):  2.326 ms … 42.588 ms  ┊ GC (min … max):  0.00% … 82.58%
 Time  (median):     6.954 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   7.367 ms ±  6.235 ms  ┊ GC (mean ± σ):  15.24% ± 15.16%

In [10]:
@benchmark MSE_2(X_te_vecs, query)

BenchmarkTools.Trial: 3317 samples with 1 evaluation.
 Range (min … max):  1.364 ms …   2.351 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.454 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.491 ms ± 109.394 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [11]:
@benchmark MSE(X_tr_vecs, query)

BenchmarkTools.Trial: 7 samples with 1 evaluation.
 Range (min … max):  673.161 ms … 895.941 ms  ┊ GC (min … max):  0.20% … 16.72%
 Time  (median):     821.614 ms               ┊ GC (median):     9.06%
 Time  (mean ± σ):   819.089 ms ±  73.914 ms  ┊ GC (mean ± σ):  10.33% ±  5.52%

In [12]:
@benchmark MSE_2(X_tr_vecs, query)

BenchmarkTools.Trial: 32 samples with 1 evaluation.
 Range (min … max):  144.341 ms … 164.304 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     157.739 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   156.720 ms ±   4.986 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

### Finding top k distances (and their ids)


A first, naive approach is to compute all distances and then sort them to get the k vectors closest to the query vector.

In [13]:
function top_k_ids(X, query)
    distances = MSE_2(X, query)
    top_k_indices = sortperm(distances)
    return top_k_indices
end

top_k_ids (generic function with 1 method)

In [14]:
@benchmark top_k_ids(X_te_vecs, query)[1:10]

BenchmarkTools.Trial: 2131 samples with 1 evaluation.
 Range (min … max):  2.001 ms …  40.367 ms  ┊ GC (min … max): 0.00% … 94.28%
 Time  (median):     2.311 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.332 ms ± 837.968 μs  ┊ GC (mean ± σ):  0.77% ±  2.04%

A slightly better approach is to use `partialsortperm`, which sorts only the first k positions of the distances vector instead of the whole vector.

In [15]:
function top_k_ids_2(X, query, k)
    distances = MSE_2(X, query)
    top_k_indices = partialsortperm(distances, 1:k)
    return top_k_indices
end

top_k_ids_2 (generic function with 1 method)
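
To see what `partialsortperm` returns compared with a full `sortperm`, here is a minimal sketch on a toy vector. The values and the names `toy_distances`, `full_order` and `top_2` are made up for illustration and do not come from SIFT1M.

```julia
# Toy distances, chosen only for illustration (not taken from SIFT1M)
toy_distances = [5.0, 1.0, 4.0, 2.0, 3.0]

full_order = sortperm(toy_distances)               # permutation that sorts all five entries
top_2      = partialsortperm(toy_distances, 1:2)   # indices of the two smallest entries only

println(full_order[1:2])   # ids of the two smallest distances, after a full sort
println(collect(top_2))    # the same ids, without ordering the rest of the vector
```

Both calls agree on the first k indices; the partial version simply skips the work of ordering everything after position k.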

In [16]:
@benchmark top_k_ids_2(X_te_vecs, query, 10)

BenchmarkTools.Trial: 2941 samples with 1 evaluation.
 Range (min … max):  1.367 ms …  41.973 ms  ┊ GC (min … max): 0.00% … 95.87%
 Time  (median):     1.676 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.688 ms ± 756.365 μs  ┊ GC (mean ± σ):  0.81% ±  1.77%

### Storing top k distances in a priority queue

A better alternative is to use a priority queue. This queue keeps only k distances in memory, so there is no need to store the distances between the query point and every candidate.
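
As a rough sketch of that idea, the function below keeps a sorted buffer of size k that plays the role of the priority queue, and it also records the column id of each kept distance. The name `top_k_bounded` and its return format are assumptions made for this illustration; it uses only Base Julia and is not tuned for speed.

```julia
# Hypothetical helper (not part of the notebook): return the k smallest MSE
# values and the ids of the columns they belong to, storing only k candidates.
function top_k_bounded(X::AbstractMatrix, query::AbstractVector, k::Integer)
    n_features, n_examples = size(X)
    best_d = fill(Inf, k)    # kept sorted ascending; best_d[end] is the current worst
    best_i = zeros(Int, k)   # column ids aligned with best_d (0 means "slot unused")
    for m in 1:n_examples
        res = 0.0
        for j in 1:n_features
            aux = query[j] - X[j, m]
            res += aux * aux
        end
        dist = res / n_features
        if dist < best_d[end]
            pos = searchsortedfirst(best_d, dist)
            # shift the tail one slot to the right, dropping the worst entry
            for t in k:-1:(pos + 1)
                best_d[t] = best_d[t - 1]
                best_i[t] = best_i[t - 1]
            end
            best_d[pos] = dist
            best_i[pos] = m
        end
    end
    return best_d, best_i
end

# Tiny usage example on made-up data: one feature, four candidates
X_toy = [0.0 1.0 3.0 -2.0]
d_toy, ids_toy = top_k_bounded(X_toy, [0.0], 2)
println(d_toy, " ", ids_toy)
```

Starting the buffer at `Inf` avoids a separate pass over the first k columns; if `X` has fewer than k columns, the unused slots keep `Inf` and id `0`.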

In [17]:
sort!([1,54,3,24,10])

5-element Vector{Int64}:
  1
  3
 10
 24
 54

In [18]:
a = [1,2,3,4,5]

5-element Vector{Int64}:
 1
 2
 3
 4
 5

In [19]:
using BenchmarkTools

In [20]:
function MSE_3(X, query, top_k)
    result = sort(MSE_2(X[:, 1:top_k], query))
    n_features, n_examples = size(X)
    
    # columns 1:top_k are already in `result`, so start at the next one
    for m in (top_k + 1):n_examples
        res = zero(eltype(X))
        for j in 1:n_features
            aux = (query[j] .- X[j,m])
            res += aux * aux
        end
        dist = res/n_features
        
        # see if current mse is in the top pile
        if dist < result[end]
            j = top_k 
            # scan the sorted result list from right to left for the insertion slot
            while j > 1 && dist < result[j-1]
                j = j-1
            end
            
            result[j+1:end] .= result[j:end-1]
            result[j] = dist
         end

    end
    return result
end

MSE_3 (generic function with 1 method)

In [21]:
@benchmark MSE_3(X_tr_vecs, query, 10) 

BenchmarkTools.Trial: 32 samples with 1 evaluation.
 Range (min … max):  150.011 ms … 163.876 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     157.522 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   157.223 ms ±   3.915 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [22]:
@benchmark top_k_ids_2(X_tr_vecs, query, 10)

BenchmarkTools.Trial: 29 samples with 1 evaluation.
 Range (min … max):  162.555 ms … 204.958 ms  ┊ GC (min … max): 0.00% … 15.07%
 Time  (median):     173.513 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   173.905 ms ±   7.639 ms  ┊ GC (mean ± σ):  0.61% ±  2.80%

In [23]:
function MSE_4(X, query, top_k)
    
    result = sort(MSE_2(X[:, 1:top_k], query))
    n_features, n_examples = size(X)
    
    # columns 1:top_k are already in `result`, so start at the next one
    @inbounds @fastmath for m in (top_k + 1):n_examples
        res = zero(eltype(X))
        @simd for j in 1:n_features
            aux = (query[j] .- X[j,m])
            res += aux * aux
        end
        dist = res/n_features
        
        # see if current mse is in the top pile
        if dist < result[end]
            j = top_k 
            # scan the sorted result list from right to left for the insertion slot
            while j > 1 && dist < result[j-1]
                j = j-1
            end
            result[j+1:end] .= result[j:end-1]
            result[j] = dist
         end
    end
    return result
end


MSE_4 (generic function with 1 method)

In [24]:
@benchmark MSE_4(X_tr_vecs, query, 10) 

BenchmarkTools.Trial: 149 samples with 1 evaluation.
 Range (min … max):  30.109 ms … 36.762 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     34.136 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   33.736 ms ±  1.437 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [29]:
MSE_4(X_tr_vecs, query, 10)

10-element Vector{Float64}:
 423.6640625
 430.3984375
 465.0859375
 509.84375
 513.2578125
 523.515625
 533.1796875
 545.65625
 558.1328125
 561.4140625

In [39]:
X_tr_200k = X_tr_vecs[:,1:200_000]
@benchmark MSE_4(X_tr_200k, query, 10) 

BenchmarkTools.Trial: 735 samples with 1 evaluation.
 Range (min … max):  5.780 ms …  11.656 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.756 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.781 ms ± 602.044 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [44]:
using LoopVectorization

In [53]:
function MSE_5(X, query, top_k)
    
    result = sort(MSE_2(X[:, 1:top_k], query))
    n_features, n_examples = size(X)
    
    # columns 1:top_k are already in `result`, so start at the next one
    for m in (top_k + 1):n_examples
        res = zero(eltype(X))
        @turbo  for j in 1:n_features
            aux = (query[j] - X[j,m])
            res += aux * aux
        end
        dist = res/n_features
        
        # see if current mse is in the top pile
        if dist < result[end]
            j = top_k 
            # scan the sorted result list from right to left for the insertion slot
            while j > 1 && dist < result[j-1]
                j = j-1
            end
            result[j+1:end] .= result[j:end-1]
            result[j] = dist
         end
    end
    return result
end

MSE_5 (generic function with 1 method)

In [54]:
@benchmark MSE_5(X_tr_200k, query, 10) 

BenchmarkTools.Trial: 784 samples with 1 evaluation.
 Range (min … max):  5.306 ms …   8.956 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.326 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.364 ms ± 532.248 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [55]:
function jdotavx(a, b)
    s = zero(eltype(a))
    @turbo for i ∈ eachindex(a, b)
        s += a[i] * b[i]
    end
    s
end

jdotavx (generic function with 1 method)

# Distances

In [40]:
using Distances
using BenchmarkTools

In [35]:
query_mat = X_te_vecs[:,1:1];

In [36]:
 x = reshape([0.1, 0.3, -0.1], 3, 1);
pairwise(Euclidean(), x, x)

1×1 Matrix{Float64}:
 0.0

In [37]:
@benchmark R = pairwise(Euclidean(), query_mat, X_te_vecs)

BenchmarkTools.Trial: 5380 samples with 1 evaluation.
 Range (min … max):  709.139 μs …  53.472 ms  ┊ GC (min … max): 0.00% … 98.09%
 Time  (median):     858.063 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   910.383 μs ± 764.679 μs  ┊ GC (mean ± σ):  1.07% ±  1.34%

In [38]:
R

LoadError: UndefVarError: R not defined

In [None]:
@benchmark R = pairwise(Euclidean(), query_mat , X_tr_vecs)

In [None]:
?pairwise

In [None]:
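using SIMD

# Return true if `val` occurs in `x`, comparing `n_simd` elements per iteration.
function find_val_in_array_simd(x::Array{T}, val::T) where {T}
    n_simd = 64
    # last index at which a full chunk of n_simd elements still fits in x
    last_pos_simd_chunk = length(x) - n_simd + 1
    i = 1
    @inbounds while i <= last_pos_simd_chunk
        vec_i = vload(Vec{n_simd, T}, x, i)
        # `vec_i == val` compares lane by lane; `any` reduces the boolean lanes
        if any(vec_i == val)
            return true
        end
        i += n_simd
    end

    # scalar loop over the remaining (fewer than n_simd) elements
    @inbounds for t in i:length(x)
        if x[t] == val
            return true
        end
    end

    return false
end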