<a href="https://colab.research.google.com/github/danielbauer1979/MSDIA_PredictiveModelingAndMachineLearning/blob/main/GB888_VI_9_AutoEncoderLab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab: A Simple Image Autoencoder

In this lab, we demonstrate a simple autoencoder on the Fashion MNIST dataset. We opt for simplicity and use a single-layer encoder and a single-layer decoder. We could likely improve performance by using deeper, convolutional layers in the encoding and decoding steps. For a more detailed example on the MNIST digit dataset, see this [Keras blog article](https://blog.keras.io/building-autoencoders-in-keras.html).

## Import Packages and Data

As usual, we import the Keras functionality:

In [None]:
import numpy as np
import matplotlib.pyplot as plt

import keras
from keras import layers

And let's import the Fashion MNIST data:

In [None]:
from keras.datasets import fashion_mnist
# The labels are discarded: the autoencoder is trained to reproduce its own input
(x_train, _), (x_test, _) = fashion_mnist.load_data()

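Next, we normalize the pixel values to lie between zero and one and flatten each 28x28 image into a vector of 784 values (since we use a fully connected network rather than a convolutional one):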

In [None]:
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape)
print(x_test.shape)
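To see what the normalization and flattening do without downloading the data, here is a minimal sketch on a small synthetic batch. The array `fake_images` and its random contents are stand-ins chosen for illustration; they are not part of the lab data.

```python
import numpy as np

# Synthetic stand-in for a batch of 5 grayscale 28x28 images with uint8 pixels
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Scale pixel values from [0, 255] to [0, 1]
scaled = fake_images.astype('float32') / 255.

# Flatten each 28x28 image into a single row of 784 values
flat = scaled.reshape((len(scaled), np.prod(scaled.shape[1:])))

print(flat.shape)              # (5, 784)
print(flat.min(), flat.max())  # both values lie between 0 and 1
```

Reshaping a row back with `flat[0].reshape(28, 28)` recovers the original image layout, which is what the plotting code later in the lab relies on.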

## A Simple Autoencoder

### Defining the autoencoder

We will consider a simple autoencoder with 32 neurons in the middle (bottleneck) layer. This is the size of our encoded representations! Since the input consists of 784 floats and the encoding of 32 floats, we have a compression factor of 784 / 32 = 24.5.

In [None]:
encoding_dim = 32
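The compression factor quoted above is simply the ratio of input size to encoding size. As a quick check (the names `input_dim` and `compression_factor` are introduced here only for illustration):

```python
input_dim = 28 * 28   # 784 pixels per flattened image
encoding_dim = 32     # size of the encoded representation

compression_factor = input_dim / encoding_dim
print(compression_factor)  # 24.5
```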

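Our autoencoder consists of the input, a single encoder layer with ReLU activations, and a single decoder layer that maps back to the input size. The decoder uses a sigmoid activation so that its outputs lie between zero and one, like the normalized pixels: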

In [None]:
input_img = keras.Input(shape=(784,))
encoded = layers.Dense(encoding_dim, activation='relu')(input_img)
decoded = layers.Dense(784, activation='sigmoid')(encoded)
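Conceptually, each `Dense` layer computes an affine map followed by its activation. The following numpy sketch mimics the forward pass and counts the parameters. The weights here are randomly initialized for illustration (the names `W_enc`, `b_enc`, `W_dec`, `b_dec` are hypothetical); they are not the weights Keras will learn.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, encoding_dim = 784, 32

# Randomly initialized parameters (illustrative; Keras learns these during training)
W_enc = rng.normal(scale=0.05, size=(input_dim, encoding_dim))
b_enc = np.zeros(encoding_dim)
W_dec = rng.normal(scale=0.05, size=(encoding_dim, input_dim))
b_dec = np.zeros(input_dim)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.random((4, input_dim))         # 4 fake flattened images in [0, 1)
code = relu(x @ W_enc + b_enc)         # encoder output: shape (4, 32), non-negative
recon = sigmoid(code @ W_dec + b_dec)  # decoder output: shape (4, 784), values in (0, 1)

# Weights plus biases of both layers
n_params = W_enc.size + b_enc.size + W_dec.size + b_dec.size
print(code.shape, recon.shape, n_params)  # (4, 32) (4, 784) 50992
```

The count of 50,992 parameters comes from (784 * 32 + 32) for the encoder plus (32 * 784 + 784) for the decoder.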

Our autoencoder combines encoder and decoder:

In [None]:
autoencoder = keras.Model(input_img, decoded)

We also separately define the encoder. This model maps an input to its encoded representation.

In [None]:
encoder = keras.Model(input_img, encoded)

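And let's define our decoder. It reuses the last layer of the autoencoder, so it shares the trained weights and maps an encoded representation back to an image: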

In [None]:
encoded_input = keras.Input(shape=(encoding_dim,))
decoder_layer = autoencoder.layers[-1]
decoder = keras.Model(encoded_input, decoder_layer(encoded_input))

### Training our autoencoder

We use binary cross-entropy to assess the similarity between the original and the reconstructed pixels (recall these are between zero and one). We could also use a regression objective such as the mean squared error.

In [None]:
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
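For reference, the per-pixel binary cross-entropy for a target t and a prediction p is -[t log(p) + (1 - t) log(1 - p)], averaged over all pixels. Below is a minimal numpy sketch of this loss next to the mean squared error. The clipping constant `eps` is an assumption that mirrors the usual numerical safeguard; it is not claimed to match the exact Keras implementation.

```python
import numpy as np

def binary_crossentropy(target, pred, eps=1e-7):
    """Mean binary cross-entropy over all pixels; pred is clipped to avoid log(0)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    losses = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(np.mean(losses))

def mean_squared_error(target, pred):
    """Mean squared pixel difference (a regression-style alternative)."""
    return float(np.mean((target - pred) ** 2))

target = np.array([0.0, 0.2, 0.8, 1.0])   # "true" pixel intensities
good = np.array([0.05, 0.25, 0.75, 0.95])  # close reconstruction
bad = np.array([0.9, 0.8, 0.2, 0.1])       # poor reconstruction

print(binary_crossentropy(target, good), binary_crossentropy(target, bad))
print(mean_squared_error(target, good), mean_squared_error(target, bad))
```

Both losses are smaller for the closer reconstruction. Note that for targets strictly between zero and one, the binary cross-entropy does not reach zero even for a perfect reconstruction, although it is still minimized there.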

And let's train the model. Again, the idea is that x is our feature vector *and* our target!

In [None]:
autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))

### Evaluating our Autoencoder

Let's evaluate our autoencoder based on the test set.

In [None]:
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)
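One way to quantify the reconstruction quality is a per-image mean squared error between the original and the reconstructed pixels. The helper `per_image_mse` below is a hypothetical function written for this illustration, and it is demonstrated on synthetic arrays rather than on the lab's outputs.

```python
import numpy as np

def per_image_mse(originals, reconstructions):
    """Mean squared pixel error for each image (rows are flattened images)."""
    return np.mean((originals - reconstructions) ** 2, axis=1)

rng = np.random.default_rng(0)
originals = rng.random((3, 784))   # 3 fake flattened images
reconstructions = originals.copy()
reconstructions[1] += 0.1          # perturb only the second image

errors = per_image_mse(originals, reconstructions)
print(errors)  # zero for the untouched images, about 0.01 for the perturbed one
```

In the lab, a call such as `per_image_mse(x_test, decoded_imgs)` would return one error value per test image.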

Let's do a visual inspection of a few images:

In [None]:
n = 10  # number of images to display
plt.figure(figsize=(20, 4))
for i in range(n):
    # Display original (top row)
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28), cmap='gray')
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display reconstruction (bottom row)
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28), cmap='gray')
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

So, we note that by compressing, we definitely lose some detail. But the autoencoder does capture the basic shape!

One interesting application is that we can evaluate similarities by considering the distance in the compressed space. Let's check how similar the two pairs of pants are (third and fourth image):

In [None]:
first_image_encoded = encoded_imgs[2]
second_image_encoded = encoded_imgs[3]

distance = np.linalg.norm(first_image_encoded - second_image_encoded)

print(f"The Euclidean distance between the images is: {distance}")

Let's compare this to the distance between the shoe and the sweater in the first two images:

In [None]:
first_image_encoded = encoded_imgs[0]
second_image_encoded = encoded_imgs[1]

distance = np.linalg.norm(first_image_encoded - second_image_encoded)

print(f"The Euclidean distance between the images is: {distance}")

This distance is much larger.

How about the sweater and the jacket (second and fifth image)?

In [None]:
first_image_encoded = encoded_imgs[1]
second_image_encoded = encoded_imgs[4]

distance = np.linalg.norm(first_image_encoded - second_image_encoded)

print(f"The Euclidean distance between the images is: {distance}")

This distance falls in between. You get the idea: the encoder gives a numerical representation of each image in 32-dimensional space, and more similar images tend to lie closer together than more different images!
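The same idea extends from comparing single pairs to searching for the most similar images. Here is a minimal sketch of a nearest-neighbor lookup by Euclidean distance in the encoded space. The helper `nearest_neighbors` and the tiny two-dimensional `codes` array are made up for illustration; in the lab one would pass `encoded_imgs` instead.

```python
import numpy as np

def nearest_neighbors(codes, query_index, k=3):
    """Indices of the k rows of `codes` closest (Euclidean) to row `query_index`."""
    distances = np.linalg.norm(codes - codes[query_index], axis=1)
    distances[query_index] = np.inf  # exclude the query itself
    return np.argsort(distances)[:k]

# Toy encodings: rows 0, 1, 3 form one cluster, rows 2 and 4 another
codes = np.array([[0.0, 0.0],
                  [0.1, 0.0],
                  [5.0, 5.0],
                  [0.0, 0.3],
                  [5.1, 5.0]])

print(nearest_neighbors(codes, query_index=0, k=2))  # [1 3]
```

For example, `nearest_neighbors(encoded_imgs, 2, k=5)` would list the five test images whose encodings are closest to that of the third image.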