In [None]:
## Adapted from https://keras.io/examples/generative/vae/

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import Sequential
from matplotlib import pyplot as plt

# Week 4.1 Autoencoders

For our generative models, we want something that will:

1. Compress our original **high dimensional data** into a lower dimensional ``representation`` or ``latent`` vector


2. Use this ``latent vector`` to generate **new images** that are plausibly from the original set, but also interesting in their variety.


To do this, we will build **two neural networks**, one for each task, and then train them both **at the same time**!

These neural networks will use very similar structures to those we've seen before (``Dense`` layers and ``Convolution`` operations)


### The Encoder 

Using convolution layers, this first model will take an input image (first we'll be working with some ``28x28`` grayscale pictures) and output a ``2 dimensional vector``.

This will place the image somewhere in latent space!


In [None]:
image_dims = (28, 28, 1)

In [None]:
latent_dim = 2
encoder = Sequential([
    layers.Input(shape = image_dims),
    layers.Conv2D(32, 3, activation="relu", strides=2, padding="same"),
    layers.Conv2D(64, 3, activation="relu", strides=2, padding="same"),
    layers.Flatten(),
    layers.Dense(16, activation="relu"),
    ##Output is two numbers
    layers.Dense(latent_dim)
])

In [None]:
encoder.summary()
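
As a sanity check on the shapes that ``encoder.summary()`` reports, we can work out the spatial sizes by hand. This is a minimal sketch, assuming the usual rule that a convolution with ``padding="same"`` and stride ``s`` gives an output of size ``ceil(input / s)``. The helper name ``same_conv_output`` is made up for this illustration.

```python
import math

def same_conv_output(size, stride):
    """Spatial output size of a convolution with padding="same"."""
    return math.ceil(size / stride)

input_size = 28                                   # height and width of the image
after_first = same_conv_output(input_size, 2)     # first Conv2D (32 filters)
after_second = same_conv_output(after_first, 2)   # second Conv2D (64 filters)
flattened = after_second * after_second * 64      # what Flatten passes to Dense

print(after_first, after_second, flattened)
```

So each strided convolution halves the image (28 to 14 to 7), and the ``Flatten`` layer hands 3136 numbers to the ``Dense`` layers.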

### The Decoder 

The second model will take a ``2 dimensional vector`` and convert this back into a ``28x28`` grayscale image!

The structures of the ``Encoder`` and ``Decoder`` don't have to match. However:


* The input of the ``Encoder`` and the output of the ``Decoder`` **must have the same shape**


* The input of the ``Decoder`` and the output of the ``Encoder`` **must have the same shape**

In [None]:
image_dims = (28, 28, 1)

In [None]:
quarter = int(image_dims[0]/4)

In [None]:
decoder = Sequential([
    keras.Input(shape=(latent_dim,)),
    layers.Dense(quarter * quarter * 64, activation="relu"),
    layers.Reshape((quarter, quarter, 64)),
    layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same"),
    layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same"),
    layers.Conv2DTranspose(1, 3, activation="sigmoid", padding="same")
])

In [None]:
decoder.summary()
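
The shape rules above can be checked by hand for this decoder. This is a rough sketch, assuming that a transposed convolution with ``padding="same"`` and stride ``s`` gives an output of size ``input * s``. The helper name ``same_transpose_output`` is hypothetical.

```python
def same_transpose_output(size, stride):
    """Spatial output size of a transposed convolution with padding="same"."""
    return size * stride

image_dims = (28, 28, 1)
quarter = image_dims[0] // 4                  # 7, the size after Reshape

size = same_transpose_output(quarter, 2)      # first Conv2DTranspose
size = same_transpose_output(size, 2)         # second Conv2DTranspose
size = same_transpose_output(size, 1)         # final layer uses the default stride of 1
decoder_output = (size, size, 1)              # final layer has a single filter

print(decoder_output == image_dims)
```

Starting from a quarter of the image size is what lets two stride-2 layers land back on exactly ``28x28``.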

### Putting it together 

We combine these two models into a final ``Sequential`` model.

In [None]:
auto_encoder = Sequential([encoder, decoder])

### MNIST

We're first going to try and learn from a dataset consisting of **handwritten digits**. This is called [MNIST](http://yann.lecun.com/exdb/mnist/) and has been a mainstay of computer vision research for decades.

We can load this directly from ``Keras`` and it has **70,000** examples in all. 

In [None]:
##Load in images
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
##Join test and train sets together
mnist_digits = np.concatenate([x_train, x_test], axis=0)
##Normalise down to 0-1
mnist_digits = np.expand_dims(mnist_digits, -1).astype("float32") / 255

In [None]:
mnist_digits.shape
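
To see what the preprocessing does without loading the full dataset, here is a small sketch on a made-up ``2x2`` "image". The array ``fake_images`` is just a stand-in for raw MNIST data, which comes as ``uint8`` values from 0 to 255 with no channel axis.

```python
import numpy as np

# Stand-in for raw images: shape (1, 2, 2), values in 0..255
fake_images = np.array([[[0, 255], [128, 64]]], dtype="uint8")

# Same steps as above: add a channel axis, convert to float, rescale to 0-1
prepared = np.expand_dims(fake_images, -1).astype("float32") / 255

print(prepared.shape)
print(prepared.min(), prepared.max())
```

On the real data the same steps turn ``(70000, 28, 28)`` into ``(70000, 28, 28, 1)``, which matches ``image_dims``.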

### Custom Loss Functions

So **how do we train these models?**

Our broader goal in terms of generative modelling is to have something that can **generate new images similar to the original set**.

In practical terms, if all is working well, the model will be able to **accurately reconstruct** the input image when it comes out of the ``decoder``. 

This means the dataset we use is **the same for both input and output**.

We define our **loss function** as the **difference** between the ``original image`` and ``decoded image``.

``Keras`` has a built-in mean squared error, but we want to sum the error over every pixel of an image rather than average it, so we define our own ``custom function``. ``Keras`` calls it after every **forward pass**, passing in the expected output from the dataset and the output of the model.

We use these to calculate the loss (however we see fit!) and return a number from this function. ``Keras`` then uses this number to update the weights.

In [None]:
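#Define our custom function
def reconstruction_loss(original, decoded):
    #Squared error averaged over the channel axis, giving one value per pixel
    per_pixel = keras.losses.mean_squared_error(original, decoded)
    #Sum over height and width, giving one value per image
    per_image = tf.reduce_sum(per_pixel, axis=(1, 2))
    #Average over the batch
    return tf.reduce_mean(per_image)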
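
To make the arithmetic inside ``reconstruction_loss`` concrete, here is a NumPy sketch of the same reduction on a tiny made-up batch. It assumes ``keras.losses.mean_squared_error`` averages the squared error over the last (channel) axis. The name ``reconstruction_loss_np`` is only for illustration and isn't used for training.

```python
import numpy as np

def reconstruction_loss_np(original, decoded):
    # Squared error averaged over the channel axis -> shape (batch, H, W)
    per_pixel = np.mean((original - decoded) ** 2, axis=-1)
    # Sum over height and width -> one number per image
    per_image = np.sum(per_pixel, axis=(1, 2))
    # Average over the batch -> a single scalar
    return np.mean(per_image)

# Two all-black "originals" and two uniformly grey "reconstructions"
original = np.zeros((2, 28, 28, 1), dtype="float32")
decoded = np.full((2, 28, 28, 1), 0.5, dtype="float32")

loss = reconstruction_loss_np(original, decoded)
print(loss)
```

Every pixel is off by 0.5, so each contributes 0.25, and summing over the 784 pixels gives a loss of 196 per image.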

In [None]:
#Compile the model (giving custom loss function)
auto_encoder.compile(optimizer=keras.optimizers.Adam(), loss = reconstruction_loss)

In [None]:
#Train (x and y are the **same data**)
auto_encoder.fit(x = mnist_digits, y = mnist_digits, epochs=30, batch_size=128)

## Exploring the Latent Space 

So we've trained the model. Let's generate some digits!

First we'll pick a random point (``z``), and use the ``decoder`` to generate an image.

In [None]:
#Generate random point
z = (np.random.random((1, latent_dim)) * 10) - 5  #uniform between -5 and 5
z

In [None]:
#Decode image
x_decoded = decoder.predict(z)
digit = x_decoded[0].reshape(image_dims[0], image_dims[1])
plt.imshow(digit,cmap="Greys_r")

### All of Latent space!

Now let's see what the whole latent space looks like in a grid.

We can see how things that are near to each other in latent space have similar characteristics!

The learning process has found a representation that has organised the numbers based on some of the underlying features they possess!

In an ideal world, all of our digit classes would be equally represented, so that when we sample from the latent space to generate new images, we get an accurate representation of the original set.

In [None]:
#Import functions from cci_autoencoders.py
from cci_autoencoders import plot_label_clusters
from cci_autoencoders import plot_latent_space

In [None]:
plot_latent_space(decoder)
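
``plot_latent_space`` lives in ``cci_autoencoders.py``, which isn't shown here. As a rough sketch of the general idea (an assumption about the approach, not the helper's actual code), we can build an evenly spaced grid of latent points with NumPy and decode each one. The grid size ``n`` and the range are hypothetical values.

```python
import numpy as np

n = 5                      # points per axis (hypothetical)
low, high = -5.0, 5.0      # same range as the random point sampled earlier

grid_x = np.linspace(low, high, n)
grid_y = np.linspace(low, high, n)

# Every (x, y) combination, one latent vector per row -> shape (n * n, 2)
xx, yy = np.meshgrid(grid_x, grid_y)
latent_points = np.stack([xx.ravel(), yy.ravel()], axis=1)

print(latent_points.shape)
# Each row could then be turned into an image with decoder.predict(latent_points)
```

Tiling the decoded images in the same order as the grid gives a picture of how digits change as we move through latent space.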

### Plotting the original dataset

Now let's look at the original dataset. We can use the ``encoder`` to take each image and plot it in latent space, with the colour representing which digit it is.

In an ideal world, these would be well spread across the space. However, we can see that the area for 0's and 1's is much bigger and more separated than other digits. 

In [None]:
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = np.expand_dims(x_train, -1).astype("float32") / 255
# display a 2D plot of the digit classes in the latent space
z_mean = encoder.predict(x_train)
plt.figure(figsize=(12, 10))
plt.scatter(z_mean[:, 0], z_mean[:, 1], c=y_train)
plt.colorbar()
plt.xlabel("z[0]")
plt.ylabel("z[1]")
plt.show()  

## Variational Autoencoders

There are in fact a couple of tweaks to the standard autoencoder setup that allow for both **better quality images** and a **more interesting** and **spread out** latent space. In practice, **Variational Autoencoders** are what you would actually use.

We've already learned so much new stuff (like total champs), so we won't cover this in much detail now. The main intuition to take away is that instead of encoding each image as **a single point** in latent space, we encode it as a **normal distribution** around a point, and sample from that distribution when decoding (see picture).


This means the model is slightly different, and we have to account for some extra terms in the **loss function**.
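
For the curious, here is a NumPy sketch of the two extra ingredients, assuming ``init_VAE`` follows the Keras VAE example this notebook is adapted from (the contents of ``cci_autoencoders.py`` aren't shown, so treat this as an illustration rather than the actual implementation). The encoder outputs a mean and a log-variance for each image, we sample a point from that distribution, and a KL divergence term in the loss pulls each distribution towards a standard normal. The function names are made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(z_mean, z_log_var, rng):
    """Draw z = mean + std * epsilon, with epsilon from a standard normal."""
    epsilon = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * epsilon

def kl_divergence(z_mean, z_log_var):
    """KL divergence between N(mean, var) and N(0, 1), summed over latent dims."""
    return -0.5 * np.sum(1 + z_log_var - z_mean**2 - np.exp(z_log_var), axis=1)

# Two made-up encoder outputs with a 2 dimensional latent space
z_mean = np.array([[0.0, 0.0], [1.0, -2.0]])
z_log_var = np.zeros((2, 2))          # log(1) = 0, so unit variance

z = sample_latent(z_mean, z_log_var, rng)
kl = kl_divergence(z_mean, z_log_var)

print(z.shape)
print(kl)
```

The first example already matches a standard normal, so its KL term is zero. The second sits away from the origin and is penalised, which is what encourages the latent space to stay compact and well spread.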


In [None]:
import cci_autoencoders
from tensorflow import keras

In [None]:
from cci_autoencoders import init_VAE

In [None]:
vae = init_VAE(latent_dim=2)

In [None]:
vae.compile(optimizer=keras.optimizers.Adam())
vae.fit(mnist_digits, epochs=30, batch_size=128)

In [None]:
import matplotlib.pyplot as plt

In [None]:
plot_latent_space(vae.decoder, scale = (-5,3))

In [None]:
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = np.expand_dims(x_train, -1).astype("float32") / 255
z_mean, _, _ = vae.encoder.predict(x_train)
plt.figure(figsize=(12, 10))
plt.scatter(z_mean[:, 0], z_mean[:, 1], c=y_train)
plt.colorbar()
plt.xlabel("z[0]")
plt.ylabel("z[1]")
plt.show()