# Usage

This example of `BuzzHash` usage requires the packages [`MNIST`](https://github.com/johnmyleswhite/MNIST.jl), [`GR`](https://github.com/jheinen/GR.jl), and [`Plots`](https://github.com/JuliaPlots/Plots.jl). It was originally implemented in `Julia 0.6.0` and is in the process of being upgraded to `Julia 1.9.2`. It uses the [MNIST handwritten digits](http://yann.lecun.com/exdb/mnist/), which are included in the [MLDatasets](https://docs.juliahub.com/MLDatasets/9CUQK/0.5.3/datasets/MNIST/) package.

In [1]:
using MLDatasets.MNIST
using Plots
gr() # use the GR back end for plots
include("../src/BuzzHash.jl")

The `MNIST` data consists of 60,000 training images with associated labels, `0, ..., 9`, and 10,000 test images with associated labels. These can be accessed with the functions `trainfeatures(i)`, `trainlabel(i)`, `testfeatures(i)`, and `testlabel(i)` from the `MNIST` package. Below we pick 4 training images at random and print their labels.

In [2]:
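srand(1234); # set the RNG seed to 1234 (Julia 0.6 syntax; on Julia 1.x: using Random; Random.seed!(1234), which may draw different indices)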
indices = rand(1:60000,4);
for i in indices println(i,"\t",trainlabel(i)) end

8429	2.0
35352	7.0
18860	9.0
14356	2.0


To display the associated images we use the function `heatmap` from the `Plots` package (because it's easy). The function `trainfeatures(i)` returns a vector of length 784, which we reshape to a 28x28 matrix. Note that the indices of the reshaped matrix must be adjusted to make the digits appear in their proper orientation. The necessary adjustments vary with the `Plots` back end (e.g., `pyplot()` vs. `gr()`) and with whether a notebook or the REPL is used. I have no idea why.

In [3]:
x1 = trainfeatures(indices[1])
x2 = trainfeatures(indices[2])
x3 = trainfeatures(indices[3])
x4 = trainfeatures(indices[4])

plot(
    heatmap(reshape(x1,28,28)[:,28:-1:1],flip=true,color=:blues, title="2"),
    heatmap(reshape(x2,28,28)[:,28:-1:1],flip=true,color=:blues, title="7"),
    heatmap(reshape(x3,28,28)[:,28:-1:1],flip=true,color=:blues, title="9"),
    heatmap(reshape(x4,28,28)[:,28:-1:1],flip=true,color=:blues, title="2"),
    layout = @layout [a b ; c d]
)
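One plausible contributor to the index adjustments in the cell above is storage order: Julia's `reshape` fills a matrix column by column, while image files are commonly stored row by row. The sketch below illustrates this on a tiny 2x3 stand-in image. That the MNIST vectors are row-major is an assumption here, and the sketch does not explain the differences between back ends.

```julia
# A 2x3 "image" stored row by row, as a stand-in for a flattened 28x28 digit.
# Assumption: the MNIST feature vectors are flattened in this row-major order.
img = [1 2 3;
       4 5 6]
v = vec(permutedims(img))              # row-major flattening: [1, 2, 3, 4, 5, 6]

# Julia fills matrices column by column, so a direct reshape scrambles the layout.
naive = reshape(v, 2, 3)               # [1 3 5; 2 4 6]

# Reshaping to (columns, rows) and transposing recovers the original layout.
fixed = permutedims(reshape(v, 3, 2))  # [1 2 3; 4 5 6]

# Reversing the column index, as in `[:, 28:-1:1]`, mirrors the matrix left to right.
mirrored = fixed[:, end:-1:1]          # [3 2 1; 6 5 4]
```

For a square 28x28 image the direct reshape yields the transpose of the picture, which is why some combination of transposing, reversing an index, and flipping an axis is needed before plotting.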

We create a random, sparse, binary matrix that expands the 784-pixel data by a factor of 9, to 7056 dimensions. We use a sparsity of 12% (that is, 12% of the entries are nonzero), the same as that of the fly's olfactory system (6/50).

In [4]:
A = sprand_fd(784*9, 784, 0.12);
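For readers without the package at hand, a matrix of the same shape and approximate density can be built with the standard `SparseArrays` library. This is only a stand-in: how `sprand_fd` samples its entries is not shown here, and `sprand` draws each entry independently, so the number of nonzeros per row varies rather than being fixed. The name `B` is used to avoid confusion with `A` above.

```julia
using SparseArrays

m, n, p = 784 * 9, 784, 0.12

# Stand-in for `sprand_fd` (an assumption, not the package's implementation):
# each entry is stored as `true` independently with probability p.
B = sprand(Bool, m, n, p)

# Fraction of stored (nonzero) entries; should be close to 0.12.
density = nnz(B) / length(B)
println("size = ", size(B), ", density = ", round(density, digits=4))
```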

We use `A` to hash the 4 data samples, zeroing all but the largest 5% (353) of the 7056 entries and (by default) setting the remaining entries to 1.0.

In [5]:
h1 = buzzhash(A,x1,353)
h2 = buzzhash(A,x2,353)
h3 = buzzhash(A,x3,353)
h4 = buzzhash(A,x4,353);
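The hashing step described above (expand, keep the largest entries, optionally binarize) can be sketched as follows. The function `wta_hash` and its keyword argument `clip` are hypothetical names chosen for illustration; this is not the source of `buzzhash`, whose actual signature and tie-breaking behavior may differ.

```julia
# Hypothetical winner-take-all hash, written as an illustration only.
function wta_hash(A::AbstractMatrix, x::AbstractVector, k::Integer; clip::Bool=true)
    y = A * x                                   # expand x into the higher dimension
    keep = partialsortperm(y, 1:k; rev=true)    # indices of the k largest entries
    h = zeros(Float64, length(y))               # every other entry stays zero
    h[keep] .= clip ? 1.0 : y[keep]             # binarize, or keep the raw values
    return h
end

# Tiny worked example: 2 inputs expanded to 3 outputs, keeping the single largest.
A_small = [1 0;
           0 1;
           1 1]
x_small = [2.0, 3.0]                            # A_small * x_small == [2.0, 3.0, 5.0]

h_binary = wta_hash(A_small, x_small, 1)              # [0.0, 0.0, 1.0]
h_raw    = wta_hash(A_small, x_small, 1; clip=false)  # [0.0, 0.0, 5.0]

# The value 353 used above is 5% of the 7056 expanded dimensions, rounded up.
k = ceil(Int, 0.05 * 784 * 9)
```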

We display the data (blue) and their hashes (green) side by side. 

In [6]:
plot(
    heatmap(reshape(x1,28,28)[:,28:-1:1],flip=true,color=:blues, title="2"),
    heatmap(reshape(h1,28*3,28*3),color=:greens, title="hash of 2"),
    heatmap(reshape(x2,28,28)[:,28:-1:1],flip=true,color=:blues, title="7"),
    heatmap(reshape(h2,28*3,28*3),color=:greens, title="hash of 7"),
    layout = @layout [a b ; c d]
)

In [7]:
plot(
    heatmap(reshape(x3,28,28)[:,28:-1:1],flip=true,color=:blues, title="9"),
    heatmap(reshape(h3,28*3,28*3),color=:greens, title="hash of 9"),
    heatmap(reshape(x4,28,28)[:,28:-1:1],flip=true,color=:blues, title="2"),
    heatmap(reshape(h4,28*3,28*3),color=:greens, title="hash of 2"),
    layout = @layout [a b ; c d]
)

Of course, the hashes, or ensemble codes, are not visually instructive, but the main reference proves that they are locality sensitive, i.e., they preserve similarity in a well-defined formal sense.

A binary hash is computationally efficient to compare. The binary default may be overridden by setting the optional parameter `clip` to `false`.
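As a rough illustration of why binary hashes are cheap to compare, the sketch below packs two small hashes into `BitVector`s and compares them with bitwise operations. The helper names `overlap`, `hamming`, and `jaccard` are hypothetical and not part of `BuzzHash`; a real hash vector `h` of zeros and ones could be converted with `h .> 0`.

```julia
# Hypothetical comparison helpers for binary hashes stored as BitVectors.
overlap(a::BitVector, b::BitVector) = count(a .& b)   # active positions in both
hamming(a::BitVector, b::BitVector) = count(a .⊻ b)   # positions that differ
jaccard(a::BitVector, b::BitVector) = overlap(a, b) / count(a .| b)

# Two toy 4-bit hashes standing in for the 7056-bit hashes above.
a = BitVector([1, 0, 1, 1])
b = BitVector([1, 1, 0, 1])

println("overlap = ", overlap(a, b))   # positions 1 and 4 are active in both
println("hamming = ", hamming(a, b))   # positions 2 and 3 differ
println("jaccard = ", jaccard(a, b))
```

Because every hash has the same number of active entries, a larger overlap corresponds directly to a smaller Hamming distance.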