# What is a neural network?

_"...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs."_

In "Neural Network Primer: Part I" by Maureen Caudill, AI Expert, Feb. 1989


## Here is an example of a simple Neural Network (NN):

![ann](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/Artificial_neural_network.svg/269px-Artificial_neural_network.svg.png)

## What are these processing units (artificial neurons)?

An artificial neuron is a mathematical function conceived as a model of a biological neuron. It receives one or more inputs (_features_ $X_0$, $X_1$, $X_2$, ...), computes their weighted sum using the weights ($W_0$, $W_1$, $W_2$, ...), and passes this sum as the argument to a nonlinear function $f$, also called the _activation function_.

$X_0$ is typically equal to 1 and is called the _bias_ input; its weight $W_0$ plays the role of the bias term.
The weights of each neuron are determined when we train the neural network. Before training, these weights are initialized randomly.
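
As a minimal sketch of the description above, the snippet below computes the output of a single artificial neuron with NumPy. The input values, the weights, and the choice of the sigmoid as $f$ are illustrative assumptions, not values taken from a trained network.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-z))

def neuron(x, w, f=sigmoid):
    # Weighted sum of the inputs followed by the activation function f
    return f(np.dot(w, x))

# Hypothetical inputs: X_0 = 1 is the constant bias input, X_1 and X_2 are features
x = np.array([1.0, 0.5, -2.0])
# Hypothetical weights, one per input
w = np.array([0.1, 0.8, 0.3])

output = neuron(x, w)
print(output)
```

Training a network amounts to adjusting `w` so that outputs like this one get closer to the desired values.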

# What are these nonlinear activation functions $f$ ?

In [None]:
# We use numpy to do math
import numpy as np

# We use matplotlib to plot graphics
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

# We use pickle to load data
import pickle

In [None]:
!pip3 install keras==2.1.3

In [None]:
# This cell won't run without you making some changes

# Here is a definition of Rectified linear unit (ReLU) function
def relu(x):
    return np.maximum(x, 0, x)

# Here is a definition of Sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# We generate X = [-10, ... 10]
X=np.linspace(-10,10, 100)

# We calculate the output of activation functions
relu_Y=relu(X.copy())
sigmoid_Y=sigmoid(X.copy())

# One of the activation functions is a hyperbolic tangent function
# This function can be found in every mathematical package, try to use numpy to calculate the tanh values of X

tanh_Y = np.tanh(X.copy())# <---- calculate tanh of X here

In [None]:
# We plot these functions
plt.figure(figsize=(20,6))

plt.subplot(1,3,1)
plt.title('Tanh')
plt.plot(X,tanh_Y)
plt.grid(True)

plt.subplot(1,3,2)
plt.title("Rectified linear unit (ReLU)")
plt.plot(X,relu_Y)
plt.grid(True)

plt.subplot(1,3,3)
plt.title("Sigmoid")
plt.plot(X,sigmoid_Y)
plt.grid(True)

# Let's go through a demo of a NN:
Execute the following cell to run the demo. There is a play button to train the NN.

In [None]:
from IPython.display import IFrame; 
IFrame('http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.85093&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false&learningRate_hide=true&regularizationRate_hide=true&percTrainData_hide=true&noise_hide=true&regularization_hide=true&problem_hide=true',width=1020, height=800)

# How to deal with images?

Mathematically, images can be represented as matrices of data. An image consists of pixels, and each element of the matrix stores information about one pixel.

## RGB channels

For human vision, three colors are enough to represent the visible spectrum: Red (R), Green (G) and Blue (B), also called _channels_.
The intensity of each channel is commonly stored either as a float in the range [0, 1] or as an 8-bit integer in the range [0, 255]. Different colors are the result of mixing the R, G and B channels with different intensities.

Some images have more than 3 channels. For example, satellites also take pictures in the infrared (IR) spectrum and store additional IR and near-IR channels.
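
To make the matrix view concrete, here is a small sketch that builds a 2x2 RGB image by hand. The pixel values and variable names are made up for illustration.

```python
import numpy as np

# A hypothetical 2x2 image: the shape is (height, width, channels)
tiny_img = np.array([
    [[255, 0, 0], [0, 255, 0]],      # a red pixel, a green pixel
    [[0, 0, 255], [255, 255, 0]],    # a blue pixel, a yellow pixel (red + green)
], dtype=np.uint8)

# Each channel is a separate 2x2 matrix of intensities
red, green, blue = tiny_img[..., 0], tiny_img[..., 1], tiny_img[..., 2]

# Convert 8-bit intensities [0, 255] to floats in [0, 1]
tiny_img_float = tiny_img / 255.0

print(tiny_img.shape)
print(red)
```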

In [None]:
img=matplotlib.image.imread('rgb.png')

plt.figure(figsize=(20,6));
plt.subplot(1,4,1);
plt.imshow(img)

plt.subplot(1,4,2);
plt.imshow(img[...,0], cmap='Reds')

plt.subplot(1,4,3);
plt.imshow(img[...,1], cmap='Greens')

plt.subplot(1,4,4);
plt.imshow(img[...,2], cmap='Blues')

# How do artificial neurons deal with pixels?

In the demo above we used a 2D dataset, and the initial features were the coordinates of the points.

For images, the initial features are the intensities of the 3 channels, and instead of fully connected neurons we will use neurons based on a mathematical operation called _convolution_.

# Convolutions

For images, a convolution can be simplified to sliding a small matrix (called a _kernel_ or _filter_) over the values of an image channel (another matrix). At each position the overlapping elements are multiplied pairwise and the products are summed.
Let's imagine we have a kernel (in red) and pixel values (in green):

\begin{align*}\color{red}{\begin{bmatrix}
1 & 0 & 1\\
0 & 1 & 0\\
1 & 0 & 1\\
\end{bmatrix}} * 
\color{green}{\begin{bmatrix}
1 & 1 & 1\\
0 & 1 & 1\\
0 & 0 & 1\\
\end{bmatrix}} =
\end{align*}
\begin{align*}
=1\cdot 1 + 0\cdot 1 + 1\cdot 1 + 0\cdot 0 + 1\cdot 1+ 0\cdot 1 + 1\cdot 0 + 0\cdot 0 + 1\cdot 1 = 4
\end{align*}
The resulting feature map is smaller than the original image, so in order to keep the same size we can pad the original image with zeros on every side. This is called _padding_.
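
The single step worked out above can be checked with NumPy, and zero padding can be sketched with `np.pad`. The variable names are chosen here for illustration only.

```python
import numpy as np

kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
patch = np.array([[1, 1, 1],
                  [0, 1, 1],
                  [0, 0, 1]])

# Multiply element by element and add everything up
feature_value = np.sum(kernel * patch)
print(feature_value)

# Zero padding: one extra row/column of zeros on every side (3x3 becomes 5x5)
padded = np.pad(patch, pad_width=1, mode='constant', constant_values=0)
print(padded.shape)
```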


# Now we slide this kernel/filter along the image
Below you can see an example of multiplying a kernel (red numbers in the corners of the yellow matrix) with some values of the image (green). This results in a new matrix (red) which we will use as a new feature in our NN. The number of pixels the window moves at each step is called the _stride_.
![Convolution](http://deeplearning.stanford.edu/wiki/images/6/6c/Convolution_schematic.gif)
[Source](http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution)
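
The sliding procedure can be sketched as a pair of loops. This is an illustrative implementation, not the one used by Keras or SciPy; the function name `slide_kernel` and the 5x5 example image are assumptions made for this sketch.

```python
import numpy as np

def slide_kernel(image, kernel, stride=1):
    # Slide the kernel over the image without padding and collect one value per position
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = image[top:top + kh, left:left + kw]
            out[i, j] = np.sum(window * kernel)
    return out

# Hypothetical 5x5 binary image
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

features_stride1 = slide_kernel(image, kernel, stride=1)  # 3x3 result
features_stride2 = slide_kernel(image, kernel, stride=2)  # 2x2 result
print(features_stride1)
print(features_stride2)
```

Note that this sketch does not flip the kernel, which is also how convolutional layers in neural networks are usually computed. For a symmetric kernel like this one, flipping makes no difference.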

# An example of an edge detector convolution

In [None]:
# We use scipy to do convolutions
from scipy import signal

In [None]:
img=matplotlib.image.imread('cat.jpg')

# A special kernel used for edge detection
k = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1]
    ]

plt.figure(figsize=(10,6))

# We plot a red channel of the original image
plt.title('original image')
plt.axis('off')
plt.imshow(img[...,0],cmap='gray')

In [None]:
# We calculate a new feature by applying the kernel k to the red channel
new_feature = signal.convolve2d(img[...,0], k, boundary='symm', mode='same')

# We plot the new feature
plt.figure(figsize=(10,6))
plt.title('edge feature image')
plt.axis('off')
plt.imshow(new_feature,cmap='gray')

# Play around with different convolutions

Check the Wikipedia Kernel page [https://en.wikipedia.org/wiki/Kernel_(image_processing)](https://bit.ly/2yfaapD) to find different kernels used in image processing, and try them out to see how they work.

In [None]:
# Your code goes here

# Max pooling

Another mathematical operation used in NNs is _max pooling_. We split the matrix into parts and keep only the largest number from each part. This is used to reduce the size of the image.

![Max pooling](https://upload.wikimedia.org/wikipedia/commons/e/e9/Max_pooling.png)
[Source](https://en.wikipedia.org/wiki/Convolutional_neural_network)
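
A 2x2 max pooling can be sketched in NumPy with a reshape. The helper name `max_pool_2x2` and the 4x4 example matrix are assumptions for illustration, and the sketch only handles matrices with an even height and width.

```python
import numpy as np

def max_pool_2x2(matrix):
    # Split the matrix into non-overlapping 2x2 blocks and keep the maximum of each
    h, w = matrix.shape
    blocks = matrix.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Hypothetical 4x4 feature map
feature_map = np.array([[12, 20, 30, 0],
                        [8, 12, 2, 0],
                        [34, 70, 37, 4],
                        [112, 100, 25, 12]])

pooled = max_pool_2x2(feature_map)
print(pooled)
```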

# Training the neural network

It's important to understand that when we define the neural network architecture, we only specify how many layers it has (the number of hidden layers) and what type each layer is (convolutional, max pooling, dropout, ...). We don't define the kernels/filters that will be used; those are learned by the NN during training.
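
To get a feeling for how many numbers are learned, the sketch below counts the trainable values of a 2D convolutional layer, assuming the common setup of one bias per filter. The function name and the layer sizes are hypothetical examples.

```python
def conv2d_param_count(kernel_size, in_channels, filters):
    # Each filter has kernel_size x kernel_size weights per input channel, plus one bias
    return (kernel_size * kernel_size * in_channels + 1) * filters

# Hypothetical layer: 16 filters of size 3x3 applied to a single-channel image
params_first = conv2d_param_count(3, 1, 16)
# Hypothetical layer: 8 filters of size 3x3 applied to 16 input channels
params_second = conv2d_param_count(3, 16, 8)

print(params_first, params_second)
```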

# Simple Neural Network
Let's see how a neural network generates features for a simple black and white image. Please execute the following cell to run the demo or follow the link [https://transcranial.github.io/keras-js/#/mnist-cnn](https://transcranial.github.io/keras-js/#/mnist-cnn)

In [None]:
from IPython.display import IFrame
IFrame('https://transcranial.github.io/keras-js/#/mnist-cnn', width=900, height=3100)

# MNIST and Fashion MNIST

_MNIST_ is a large database of handwritten digits that is commonly used for training various image processing systems.

![mnist](https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png)

It was used to train the second demo shown above. For this workshop we will use another public dataset, [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist), which has similar 28 x 28 pixel grayscale images of 10 classes.

![fashion_mnist](https://4.bp.blogspot.com/-OQZGt_5WqDo/Wa_Dfa4U15I/AAAAAAAAAUI/veRmAmUUKFA19dVw6XCOV2YLO6n-y_omwCLcBGAs/s1600/out.jpg)

In [None]:
# We use Fashion mnist dataset
from keras.datasets import fashion_mnist

# We download and load the data
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

In [None]:
print("Train dataset size {0}".format(x_train.shape))
print("Test dataset size {0}".format(x_test.shape))

We have 60000 train images and 10000 test images.

In [None]:
# Let's take 4 random images, by generating 4 random indices and reshaping it
random_images=x_train[np.random.choice(60000,4)].reshape(4,28,28)

plt.figure(figsize=(20,6))
plt.subplot(1,4,1)
plt.imshow(random_images[0], cmap=plt.get_cmap('gray'))

plt.subplot(1,4,2)
plt.imshow(random_images[1], cmap=plt.get_cmap('gray'))

plt.subplot(1,4,3)
plt.imshow(random_images[2], cmap=plt.get_cmap('gray'))

plt.subplot(1,4,4)
plt.imshow(random_images[3], cmap=plt.get_cmap('gray'))

# You can explore the Fashion dataset yourself
Try to show some random images from the dataset.

In [None]:
# Your code goes here


# Let's make a neural network image search engine

Our goal is to build a simple image search engine that finds images similar to a requested one.
The training part of Fashion MNIST will serve as our dataset of known images. Then we will take random images from the test dataset, which our search engine has never seen before, and find similar images in the train dataset.

```
Images in the train dataset: 👚 👞 👖 👜 ...
Images in the test dataset: 👢 👙 👕 👟 ...

User wants to find an image most similar to 👟. 
The search engine will return 👞.
```

Under the hood of our search engine we will have a neural network that will take an image as input and give a vector as an output (it will encode the image into a vector). Something like this:

\begin{align*}
NN_{encoder}(👚) = [3.5, 1.2, 4.3, ... 1.1, -0.9]\\
NN_{encoder}(👞) = [0.4, -1.3, 5.6, ... 2.0, 7.5]\\
NN_{encoder}(👕) = [1.2, -0.3, 1.1, ... -0.4, 1.3]\\
NN_{encoder}(👟) = [0.3, -1.1, 4.1, ... 1.1, 6.3]
\end{align*}

So basically, we will have a short numeric representation of the image.

To find similar images we compare their output vectors: two vectors **q** and **p** that are closest to each other (have the minimal _Euclidean distance_) represent the most similar images.

\begin{align} 
distance_{euclidean}(\mathbf{q},\mathbf{p}) = d(\mathbf{q},\mathbf{p}) & = \sqrt{(q_1-p_1)^2 + (q_2-p_2)^2 + \cdots + (q_n-p_n)^2} = \sqrt{\sum_{i=1}^n (q_i-p_i)^2}\end{align}

For example, if we compare 👚 and 👟 we get a distance of 95.7:

\begin{align*}
d(NN_{encoder}(👚), NN_{encoder}(👟)) = \sqrt{(3.5-0.3)^2 + (1.2+1.1)^2 + \cdots + (-0.9-6.3)^2} = 95.7 
\end{align*}

But if we compare 👞 and 👟 we get a much smaller distance of 12.6:

\begin{align*}
d(NN_{encoder}(👞), NN_{encoder}(👟)) = \sqrt{(0.4-0.3)^2 + (-1.3+1.1)^2 + \cdots + (7.5-6.3)^2} = 12.6 
\end{align*}


The output vectors of 👞 and 👟 are still different, but they are much closer to each other than those of 👚 and 👟. So the search engine will return 👞 for the request 👟.
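
The comparison can be sketched in a few lines of NumPy. The three-dimensional vectors below are toy values made up for illustration; they are not real encoder outputs, and the names `known` and `query` are assumptions.

```python
import numpy as np

def euclidean_distance(q, p):
    # Square root of the sum of squared coordinate differences
    return np.sqrt(np.sum((q - p) ** 2))

# Toy 3-dimensional vectors standing in for encoded known images
known = {
    'shirt': np.array([3.5, 1.2, 4.3]),
    'shoe': np.array([0.4, -1.3, 5.6]),
}
# Toy vector standing in for the encoded requested image (e.g. a sneaker)
query = np.array([0.3, -1.1, 4.1])

distances = {name: euclidean_distance(vec, query) for name, vec in known.items()}
best_match = min(distances, key=distances.get)
print(distances)
print(best_match)
```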

# The goal of our NN is to calculate these vectors. How can we do that?

For this purpose we will build two NNs.

* Firstly, we will use an auto-encoder NN architecture. It tries to encode-decode the images. It takes an image as input and tries to encode it to some $\color{cyan}{\text{vector}}$ (that's the $\color{LightSteelBlue}{\text{encoding part}}$), and then it tries to reconstruct the image from the vector (that's the $\color{LightPink}{\text{decoding part}}$).

   ![autoencoder](https://skymind.ai/images/wiki/deep_autoencoder.png)

   In our pseudomath notation, this can be represented as:

\begin{align}
&NN_{autoencoder}(👚) = 👚\\ 
\\
&👚 \xrightarrow{encoding} [3.5, 1.2, 4.3, \ldots 1.1, -0.9] \xrightarrow{decoding} 👚 \\
\\
\\
&NN_{autoencoder}(👞) = 👞\\ 
\\
&👞 \xrightarrow{encoding} [0.4, -1.3, 5.6, \ldots 2.0, 7.5] \xrightarrow{decoding} 👞
\end{align}

------

* Secondly, we will use only the $\color{LightSteelBlue}{\text{encoding part}}$ of our $NN_{autoencoder}(X)$ as our $NN_{encoder}(X)$ for the search engine, deleting the decoder part:


\begin{align*}
NN_{encoder}(👚) = [3.5, 1.2, 4.3, \ldots 1.1, -0.9]\\
NN_{encoder}(👞) = [0.4, -1.3, 5.6, \ldots 2.0, 7.5]\\
\end{align*}

In [None]:
# We use Keras framework to build Neural networks
import keras
from keras import backend as K

from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

# Let's code the encoder part


\begin{align}
👚 \xrightarrow{encoding} [128\text{-dimensional vector}]
\end{align}
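
Where does the number 128 come from? The sketch below traces the image size through the encoder defined in the next cell, assuming three 2x2 max pooling layers with `padding='same'` and 8 filters in the last convolutional layer. The helper name `pool_same` is made up for this illustration.

```python
import math

def pool_same(size, pool=2):
    # A 2x2 max pooling with padding='same' rounds the halved size up
    return math.ceil(size / pool)

size = 28            # input images are 28 x 28
sizes = [size]
for _ in range(3):   # three pooling layers; 'same'-padded convolutions keep the size
    size = pool_same(size)
    sizes.append(size)

channels = 8         # number of filters in the last convolutional layer
vector_length = size * size * channels
print(sizes)
print(vector_length)
```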

In [None]:
# This cell won't run without you making some changes

input_img = Input(shape=(28, 28, 1)) # This will be the input 👚

x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded_feature_vector = MaxPooling2D((2, 2), padding='same', name='encoded_feature_vector')(x) # naming the layer makes it easy to find in the model plot

# at this point the representation is (4, 4, 8) i.e. 128-dimensional compressed feature vector 

# Let's code the decoder part


\begin{align}
[128\text{-dimensional vector}] \xrightarrow{decoding} 👚
\end{align}
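
The decoder in the next cell has to bring the 4 x 4 representation back to 28 x 28. The sketch below traces the sizes, assuming that each `UpSampling2D((2, 2))` doubles the size and that a 3x3 convolution without `padding='same'` (the Keras default is `'valid'`) shrinks it by 2. The helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel=3, padding='same'):
    # 'same' keeps the size; 'valid' (no padding) shrinks it by kernel - 1
    return size if padding == 'same' else size - (kernel - 1)

size = 4                                        # encoded representation is 4 x 4
size = conv_output_size(size) * 2               # conv 'same' + upsampling: 8
size = conv_output_size(size) * 2               # conv 'same' + upsampling: 16
size = conv_output_size(size, padding='valid')  # conv without padding: 14
size = size * 2                                 # upsampling: 28
size = conv_output_size(size)                   # final conv 'same': 28
print(size)
```

This is why exactly one convolution in the decoder is left without padding: 4 doubled three times would give 32, not 28.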

In [None]:
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded_feature_vector)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
# No padding here: 16x16 shrinks to 14x14, so the last upsampling gives 28x28
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded_output = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

# Let's combine encoder and decoder parts into an autoencoder model

\begin{align}
&NN_{autoencoder}(👚) = 👚\\ 
\\
&👚 \xrightarrow{encoding} [128\text{-dimensional vector}] \xrightarrow{decoding} 👚 \\
\\
\\
&NN_{autoencoder}(👞) = 👞\\ 
\\
&👞 \xrightarrow{encoding} [128\text{-dimensional vector}] \xrightarrow{decoding} 👞
\end{align}

In [None]:
# The first model is the autoencoder: it takes the input image and outputs the decoded image
autoencoder_model = Model(input_img, decoded_output)
# Compile the first model
autoencoder_model.compile(optimizer='adadelta', loss='binary_crossentropy')

In [None]:
SVG(model_to_dot(autoencoder_model, show_shapes=True, show_layer_names=True).create(prog='dot', format='svg'))

# Let's take only the first part for the encoder model


\begin{align}
&NN_{encoder}(👚) = [128\text{-dimensional vector}]\\ 
\\
&👚 \xrightarrow{encoding} [128\text{-dimensional vector}]\\
\\
\\
&NN_{encoder}(👞) = [128\text{-dimensional vector}]\\ 
\\
&👞 \xrightarrow{encoding} [128\text{-dimensional vector}]
\end{align}

In [None]:
# The second NN model is only the first half of the first model: it takes the input image and gives the encoded vector as output
encoder_model = Model(inputs=autoencoder_model.input,
                      outputs=encoded_feature_vector) # the output of the last encoder layer
# Compile the second model
encoder_model.compile(optimizer='adadelta', loss='binary_crossentropy')

In [None]:
SVG(model_to_dot(encoder_model, show_shapes=True, show_layer_names=True).create(prog='dot', format='svg'))

In [None]:
# This cell won't run without you making some changes

# We need to scale the image from [0-255] to [0-1] for better performance of activation functions
# Please normalize the datasets

x_train = x_train / 255.
x_test = x_test / 255.

In [None]:
# Conv2D layers expect a channel axis, so we reshape the datasets to (number of images, 28, 28, 1)

x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))

print("Train dataset size is {0}".format(x_train.shape))
print("Test dataset size is {0}".format(x_test.shape))

# Let's train the neural network

Here we will train our NN. It will learn the optimal weights used in the convolutional layers by itself.
Both `x` and `y` in the `fit` function are set to `x_train`, because we want the output of the autoencoder to equal its input: $NN_{autoencoder}(👚) = 👚$
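
How is "output equal to input" measured? The autoencoder above is compiled with the `binary_crossentropy` loss, which compares the reconstruction with the original pixel by pixel. The NumPy sketch below follows the textbook formula; the clipping constant `eps` and the tiny example arrays are assumptions, and details of the Keras implementation may differ.

```python
import numpy as np

def binary_crossentropy(target, prediction, eps=1e-7):
    # Clip to avoid log(0), then average the per-pixel cross-entropy
    p = np.clip(prediction, eps, 1 - eps)
    return float(np.mean(-(target * np.log(p) + (1 - target) * np.log(1 - p))))

# Hypothetical 2x2 "image" with pixel values in [0, 1]
original = np.array([[0.0, 1.0],
                     [1.0, 0.0]])
good_reconstruction = np.array([[0.1, 0.9],
                                [0.8, 0.2]])
bad_reconstruction = np.array([[0.9, 0.1],
                               [0.2, 0.8]])

loss_good = binary_crossentropy(original, good_reconstruction)
loss_bad = binary_crossentropy(original, bad_reconstruction)
print(loss_good, loss_bad)
```

The closer the reconstruction is to the original, the smaller the loss, so minimizing it pushes the output towards the input.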

In [None]:
# It takes 10 minutes to train this neural network 
# If you want to skip the training process, please load the weights directly in the next cell

learning_history=autoencoder_model.fit(x=x_train, y=x_train, epochs=10, batch_size=128, 
                                 shuffle=True, validation_data=(x_test, x_test), verbose=1)

In [None]:
autoencoder_model.load_weights('autoencoder_0.2925.weights')

# Let's see how it performs

In our pseudomath notation, this can be represented as:

\begin{align}
&NN_{autoencoder}(👚) = 👚\\ 
\\
&👚 \xrightarrow{encoding} [128\text{-dimensional vector}] \xrightarrow{decoding} 👚 \\
\\
\\
&NN_{autoencoder}(👞) = 👞\\ 
\\
&👞 \xrightarrow{encoding} [128\text{-dimensional vector}] \xrightarrow{decoding} 👞
\end{align}

We will try to encode-decode the `x_test` part of the dataset, which the NN was not trained on.

In [None]:
encoded_decoded_image=autoencoder_model.predict(x_test)

In [None]:
# we take 5 consecutive images from the test dataset starting from a random index
# (the upper bound leaves room for 5 images)
random_index=np.random.randint(0, len(x_test) - 4)

for i,r in zip(x_test[random_index:random_index+5], encoded_decoded_image[random_index:random_index+5]):
    plt.figure()
    ax=plt.subplot(1,2,1)
    ax.set_title("Original:")
    ax.imshow(np.squeeze(i), cmap=plt.get_cmap('gray'))
    
    ax=plt.subplot(1,2,2)
    ax.set_title("Encoded-Decoded:")
    ax.imshow(np.squeeze(r), cmap=plt.get_cmap('gray'))
    plt.show()

## Time to encode all the images in the train dataset
 
\begin{align*}
&NN_{encoder}(👚) = [128\text{-dimensional vector}]\\ 
&NN_{encoder}(👞) = [128\text{-dimensional vector}]\\ 
&NN_{encoder}(👖) = [128\text{-dimensional vector}]\\ 
&NN_{encoder}(👜) = [128\text{-dimensional vector}]\\ 
&NN_{encoder}(👠) = [128\text{-dimensional vector}]\\ 
&\cdots
\end{align*}

Later we will compare a vector of the requested image with these encoded vectors, to find similar images.

In [None]:
x_train_encoded = encoder_model.predict(x_train)
print("Shape of the encoded dataset: {0}".format(x_train_encoded.shape))

## Let's flatten the dataset
We flatten (4, 4, 8) to a flat 128-dimensional vector

In [None]:
x_train_encoded_128=x_train_encoded.reshape(x_train_encoded.shape[0],128)
print("Shape of the encoded dataset: {0}".format(x_train_encoded_128.shape))

## Plotting a 128-dimensional vector is difficult, let's reduce it to 2D and 3D

We will use the t-SNE algorithm to reduce the number of dimensions:

\begin{align}
&[128\text{-dimensional vector}] \xrightarrow{t-SNE} [x, y, z]\\
&[128\text{-dimensional vector}] \xrightarrow{t-SNE} [x, y]
\end{align}

In [None]:
!pip install -U --no-deps git+https://github.com/DmitryUlyanov/Multicore-TSNE.git#egg=MulticoreTSNE

In [None]:
# We use t-SNE to shrink the 128-dimensional vectors to 2D or 3D
from MulticoreTSNE import MulticoreTSNE as TSNE

# We use plotly to plot 3D
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
init_notebook_mode(connected=True)

### Reducing to 3D

In [None]:
tsne = TSNE(n_components=3, verbose=1000, n_jobs=2)

# Warning, it will take more than 30 min to run the following line in Azure cloud notebook
#transformation_3d = tsne.fit_transform(x_train_encoded_128[...,:1000])

# We load the 3-dimensional t-SNE result from the pickle file
transformation_3d = pickle.load(open("transformation_3d.p", "rb"))

In [None]:
print("Shape of the encoded dataset: {0}".format(transformation_3d.shape))

### Plotting 3D cloud
Every dot is an encoded image:
\begin{align}
👚 \xrightarrow{NN_{encoder}} [128\text{-dimensional vector}] \xrightarrow{t-SNE} [x, y, z]
\end{align}
The color of the dot represents the class the image belongs to.

In [None]:
x, y, z = np.rollaxis(transformation_3d, 1, 0)[...,:1000]
trace1 = go.Scatter3d(
    x=x,
    y=y,
    z=z,
    mode='markers',
    marker=dict(
        size=5,
        color=plt.cm.tab10(y_train[:1000]),
        opacity=0.8
    )
)


data = [trace1]
layout = go.Layout(
    margin=dict(
        l=0,
        r=0,
        b=0,
        t=0
    )
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
%%time 
# Try to reduce the first 1000 examples of the 128-dimensional vectors to 2D vectors
# It will take about 3 min 20 sec to do this task on Azure cloud notebook
# Don't run tsne on all 60000 samples, since it will take up to 20 min

tsne = TSNE(n_components=2, verbose=1000, n_jobs=2)
# [:1000] selects the first 1000 samples ([..., :1000] would slice the 128 features instead)
transformation_2d = tsne.fit_transform(x_train_encoded_128[:1000])

In [None]:
transformation_2d = pickle.load(open("transformation_2d.p", "rb"))

In [None]:
print("Shape of the encoded dataset: {0}".format(transformation_2d.shape))

### Plotting 2D cloud
Every dot is an encoded image:
\begin{align}
👚 \xrightarrow{NN_{encoder}} [128\text{-dimensional vector}] \xrightarrow{t-SNE} [x, y]
\end{align}
The color of the dot represents the class the image belongs to.

In [None]:
plt.figure(figsize=(16,10))
# The colors must match the 1000 plotted points, so we slice the labels as well
plt.scatter(transformation_2d[:1000,0], transformation_2d[:1000,1], marker='.', c=plt.cm.tab10(y_train[:1000]))

# Time to build a search engine

At this point we have all the images encoded to their 128-dimensional representations:
\begin{align*}
&NN_{encoder}(👚) = [128\text{-dimensional vector}]\\ 
&NN_{encoder}(👞) = [128\text{-dimensional vector}]\\ 
&NN_{encoder}(👖) = [128\text{-dimensional vector}]\\ 
&NN_{encoder}(👜) = [128\text{-dimensional vector}]\\ 
&NN_{encoder}(👠) = [128\text{-dimensional vector}]\\ 
&\cdots
\end{align*}

Every time we have a new image 👟 and want to find a similar image, we will encode it into a $[128\text{-dimensional vector}]$ and compare it to the existing dataset. This search and comparison will be done automatically by SciPy's `KDTree`. It calculates the Euclidean distances and returns the index of the closest vector.
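
To see what such a nearest-neighbor query returns, here is a brute-force sketch in NumPy that checks every stored vector. It is not how a KD-tree works internally (the tree is meant to avoid scanning everything); the function name `nearest_neighbor` and the random stand-in data are assumptions for illustration.

```python
import numpy as np

def nearest_neighbor(vectors, query):
    # Euclidean distance from the query to every stored vector
    distances = np.sqrt(np.sum((vectors - query) ** 2, axis=1))
    index = int(np.argmin(distances))
    return distances[index], index

# Hypothetical stand-in for the encoded dataset: 1000 random 128-dimensional vectors
rng = np.random.RandomState(0)
stored_vectors = rng.rand(1000, 128)
# A query that is a slightly perturbed copy of stored vector number 42
query_vector = stored_vectors[42] + 0.001

distance, index = nearest_neighbor(stored_vectors, query_vector)
print(distance, index)
```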

In [None]:
from scipy import spatial

# We create a KDTree and put all the existing vectors in the tree
tree = spatial.KDTree(x_train_encoded_128)

# Now we encode the test dataset

\begin{align*}
&NN_{encoder}(👢 ) = [128\text{-dimensional vector}]\\ 
&NN_{encoder}(👙) = [128\text{-dimensional vector}]\\ 
&NN_{encoder}(👕) = [128\text{-dimensional vector}]\\ 
&NN_{encoder}(👟) = [128\text{-dimensional vector}]\\ 
&NN_{encoder}(💼) = [128\text{-dimensional vector}]\\ 
&\cdots\\
\end{align*}

\begin{align*}
NN_{encoder}(test) \rightarrow [\text{test_result128}]
\end{align*}

In [None]:
# Now we encode the whole test dataset
test_result = encoder_model.predict(x_test)
# Flatten it to 128 dimensions
test_result128=test_result.reshape(10000,128)

# The search process
Let's take a random index in the test dataset and pull 5 consecutive images starting from it (👢 👙 👕 👟 💼).

At the same time, we will pull the 5 vectors that encode these images from `test_result128`.

Using KDTree we will find the closest vector from the train dataset for each of the requested vectors.
KDTree will return the index of the image from the train dataset, which has the closest vector to the requested one.

In [None]:
# Taking random index
random_index=np.random.randint(0,9995)
for i,f in zip(x_test[random_index:random_index+5], test_result128[random_index:random_index+5]):
    # KDTree returns the Euclidean distance and the index from the train dataset
    distance, index = tree.query(f)
    plt.figure()
    plt.subplot(1,2,1)
    plt.title("Requested:")
    plt.imshow(np.squeeze(i), cmap=plt.get_cmap('gray'))
    
    plt.subplot(1,2,2)
    plt.title("Found:")
    plt.imshow(np.squeeze(x_train[index]), cmap=plt.get_cmap('gray'))
    plt.show()
    print("Distance between two 128-dimensional vectors: %f\n" % distance)

# Advanced version
For the advanced version, you can try to encode any image yourself with an existing deep neural network.
Deep neural networks for image classification, such as VGG16 and VGG19, have many layers:

### We can use the output of the fully connected layer of the pretrained NN as a 4096-dimensional feature vector


In [None]:
# do some imports
from keras import applications
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense
from keras.preprocessing import image
from keras.models import Model

In [None]:
# download the model (might take time, since we download ~500 MB)
base_model = applications.VGG19(weights='imagenet')
model = Model(inputs=base_model.input, outputs=base_model.get_layer('fc1').output)

In [None]:
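img = image.load_img('fashion.png', target_size=(224, 224))
img = image.img_to_array(img)
img = np.expand_dims(img, axis=0)
# Apply the same preprocessing that was used when VGG19 was trained on ImageNet
img = applications.vgg19.preprocess_input(img)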

In [None]:
features = model.predict(img)

print("Feature vector shape: {0}".format(features.shape))