From [Building Autoencoders in Keras](https://blog.keras.io/building-autoencoders-in-keras.html) by [Francois Chollet](https://twitter.com/fchollet)
# Autoencoders


![autoencoder](https://www.compthree.com/images/blog/ae/ae.png)

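An autoencoder is trained to reconstruct its own input: the encoder maps the input to a latent representation and the decoder maps that representation back to the input space. Usually, autoencoders are used to extract latent representations with a lower dimensionality than the input data, i.e. for data compression.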

## Case: MNIST handwritten digits

### Dataset

* Train images: 60,000
* Test images: 10,000
* Image size: 28x28 pixels (grayscale)

![dataset](https://www.researchgate.net/profile/Steven_Young11/publication/306056875/figure/fig1/AS:393921575309346@1470929630835/Example-images-from-the-MNIST-dataset_W640.jpg)
![image of a digit](https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2016/05/Examples-from-the-MNIST-dataset.png)




In [None]:
# Import MNIST dataset

from keras.datasets import mnist
import numpy as np

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# labels
print(y_train)

# data
print(x_train.shape)
print(x_test.shape)

In [None]:
# show images
import matplotlib.pyplot as plt

def see_images(dataset, n):
    """Plot the first n images of a dataset (28x28 arrays or flattened 784-vectors) in one row."""
    plt.figure(figsize=(20, 4))
    for i in range(n):
        ax = plt.subplot(1, n, i + 1)
        plt.imshow(dataset[i].reshape(28, 28))
        plt.gray()
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
    plt.show()

In [None]:
see_images(x_train, 6)

In [None]:
# We need to preprocess the data before using it as input to the network:
# normalise every pixel to [0, 1] and flatten each 28x28 image into a 784-dimensional vector.

x_train = x_train.astype('float32') / 255  # 255 is the maximum 8-bit grayscale intensity
x_test = x_test.astype('float32') / 255
x_train = x_train.reshape(len(x_train), np.prod(x_train.shape[1:]))
x_test = x_test.reshape(len(x_test), np.prod(x_test.shape[1:]))

input_dim = x_train.shape[1]

# data
print(x_train.shape)
print(x_test.shape)
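The same two steps can be checked on a small synthetic batch. This is a minimal sketch that assumes a random `uint8` array as a stand-in for MNIST, so it runs without downloading the data: dividing by 255 maps the 8-bit pixel intensities into [0, 1], and reshaping turns each 28x28 image into a 784-dimensional vector without losing information.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 3 random 28x28 images with 8-bit pixels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Scale every pixel to [0, 1] and flatten each image into a row vector.
flat = images.astype('float32') / 255
flat = flat.reshape(len(flat), np.prod(flat.shape[1:]))
print(flat.shape)  # (3, 784)

# Undoing both steps recovers the original 28x28 uint8 images exactly.
restored = (flat.reshape(-1, 28, 28) * 255).round().astype(np.uint8)
print(np.array_equal(restored, images))
```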

### Simplest autoencoder
We will use a single fully-connected layer for the encoder and another one for the decoder.

Elements:

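Input Layer

* Input layer: $y = xI$ (identity: the flattened image $x$ is passed through unchanged)

Encoder

* Fully-connected layer: $y = xW_e^T + b_e$, with $W_e \in \mathbb{R}^{32 \times 784}$
* Rectified Linear Unit function: $\mathrm{ReLU}(x) = \max(0, x)$

Decoder

* Fully-connected layer: $y = xW_d^T + b_d$, with $W_d \in \mathbb{R}^{784 \times 32}$
* Sigmoid function: $\sigma(x) = \frac{1}{1 + \exp(-x)}$, which keeps every reconstructed pixel in $(0, 1)$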
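The layer equations above can be traced with a plain NumPy forward pass. This is a sketch with randomly initialised weights; the names `W_e`, `b_e`, `W_d`, `b_d` are hypothetical, and Keras `Dense` keeps its own weights internally, storing the kernel with shape `(inputs, units)` and computing `x @ kernel + bias`, so its kernel plays the role of $W^T$ here. Sizes follow the notebook: 784 inputs and a 32-dimensional code.

```python
import numpy as np

input_dim, latent_dim = 784, 32
rng = np.random.default_rng(0)

# Hypothetical weights in the (out, in) layout used by y = x W^T + b.
W_e = rng.normal(0, 0.05, size=(latent_dim, input_dim))
b_e = np.zeros(latent_dim)
W_d = rng.normal(0, 0.05, size=(input_dim, latent_dim))
b_d = np.zeros(input_dim)

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def encode(x):
    # fully-connected layer followed by ReLU: (batch, 784) -> (batch, 32)
    return relu(x @ W_e.T + b_e)

def decode(h):
    # fully-connected layer followed by sigmoid: (batch, 32) -> (batch, 784)
    return sigmoid(h @ W_d.T + b_d)

x = rng.random((5, input_dim))  # 5 fake flattened images with pixels in [0, 1)
code = encode(x)
reconstruction = decode(code)
print(code.shape, reconstruction.shape)  # (5, 32) (5, 784)
```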

![autoencoder for MNIST](https://blog.keras.io/img/ae/autoencoder_schema.jpg)

In [None]:
from keras.layers import Input, Dense
from keras.models import Model  # functional-API model: wraps input and output tensors into a trainable model

# latent representations dimension
latent_dim = 32
# input layer
input_img = Input(shape=(input_dim, ))

# encoder
encoder = Dense(latent_dim, activation='relu')
encoder_feats = encoder(input_img)

# decoder
decoder = Dense(input_dim, activation='sigmoid')
decoder_out = decoder(encoder_feats)

# model
simple_autoencoder = Model(input_img, decoder_out)
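As a rough check on model size, each `Dense` layer has an `(inputs, units)` weight matrix plus one bias per unit. The arithmetic below is a sketch (`dense_params` is a hypothetical helper; `simple_autoencoder.summary()` can be used to compare) giving the parameter count and the compression ratio of the 32-dimensional code.

```python
input_dim, latent_dim = 784, 32

def dense_params(n_in, n_out):
    # a Dense layer has an (n_in, n_out) kernel plus one bias per output unit
    return n_in * n_out + n_out

encoder_params = dense_params(input_dim, latent_dim)  # 25120
decoder_params = dense_params(latent_dim, input_dim)  # 25872
total = encoder_params + decoder_params               # 50992

# each image is compressed from 784 floats to a 32-float code
compression_ratio = input_dim / latent_dim            # 24.5
print(encoder_params, decoder_params, total, compression_ratio)
```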

In [None]:
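# Standalone encoder: image -> 32-dimensional code (shares its weights with simple_autoencoder)
encoder_model = Model(input_img, encoder_feats)
# Standalone decoder: reuses the trained `decoder` layer on a new latent-code input
encoded_input = Input(shape=(latent_dim, ))
decoder_model = Model(encoded_input, decoder(encoded_input))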

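Before training the model, we need to set up the loss function and the optimizer.

Loss function: per-pixel binary cross-entropy, averaged over the $D = 784$ pixels of an image

$l_{re} = -\frac{1}{D}\sum_{i=1}^{D}\left[y_i \log(y'_i + \epsilon) + (1 - y_i)\log(1 - y'_i + \epsilon)\right]$

where $y_i$ is the value of pixel $i$ in the input image (the target), $y'_i$ is the reconstructed value predicted by the network, and $\epsilon$ is a small constant that avoids $\log(0)$.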
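A NumPy sketch of this loss for one image, averaging over the pixel axis; the value `eps=1e-7` is an assumption for illustration. A close reconstruction gives a small loss, while predicting 0.5 everywhere gives $\log 2 \approx 0.693$.

```python
import numpy as np

def bce_per_image(y, y_pred, eps=1e-7):
    """Per-pixel binary cross-entropy, averaged over the last (pixel) axis."""
    per_pixel = -(y * np.log(y_pred + eps) + (1 - y) * np.log(1 - y_pred + eps))
    return per_pixel.mean(axis=-1)

y = np.array([[1.0, 0.0, 1.0, 0.0]])     # target pixels
good = np.array([[0.9, 0.1, 0.9, 0.1]])  # close reconstruction
bad = np.array([[0.5, 0.5, 0.5, 0.5]])   # uninformative reconstruction

print(bce_per_image(y, good))  # about 0.105 (= -log 0.9)
print(bce_per_image(y, bad))   # about 0.693 (= log 2)
```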


In [None]:
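# Note: in recent tf.keras versions Adadelta's default learning rate is 0.001 (standalone Keras 2.2 used 1.0),
# which makes training very slow; 'adam' is a common alternative.
simple_autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')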

In [None]:
# Train model
simple_autoencoder.fit(x_train, x_train,  # the target is the same input data!
                       epochs=50, 
                       batch_size=256, 
                       shuffle=True, 
                       validation_data=(x_test, x_test)) 

In [None]:
# Reconstruct the test digits with the trained autoencoder
prediction = simple_autoencoder.predict(x_test)

# Original test digits, followed by their reconstructions
see_images(x_test, 5)
see_images(prediction, 5)

# Same reconstruction in two steps: encode to the 32-dimensional code, then decode
encoded_imgs = encoder_model.predict(x_test)
decoded_imgs = decoder_model.predict(encoded_imgs)

see_images(decoded_imgs, 5)

In [None]:
# latent representation (32 values) of the first test image
encoded_imgs[0]

### Variational Autoencoder (VAE)

A VAE is a generative model that learns the parameters of the probability distribution modelling the input data.

"A VAE encodes data Y (e.g., a sentence) as hidden random variables Z, based on which the decoder reconstructs Y. Consider a generative model, parameterized by $\theta$, as"

$p_{\theta}(Z, Y) = p_{\theta}(Z)\,p_{\theta}(Y \mid Z)$ (Bahuleyan et al., 2018)
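For reference, this generative model leads to the evidence lower bound (ELBO) that a VAE maximises (Kingma and Welling, 2014). Writing $q_{\phi}(Z \mid Y)$ for the distribution produced by the encoder (the subscript $\phi$ for the encoder parameters is our notation):

$$\log p_{\theta}(Y) \geq \mathbb{E}_{q_{\phi}(Z \mid Y)}\left[\log p_{\theta}(Y \mid Z)\right] - D_{KL}\left(q_{\phi}(Z \mid Y) \,\|\, p_{\theta}(Z)\right)$$

The first term corresponds to the reconstruction loss and the second to the KL divergence, so minimising reconstruction loss plus KL divergence maximises this bound.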

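**Encoder**: it learns, from the input data, two parameters of a normal distribution in the latent space: the mean `z_mean` and `z_log_sigma`, which (as the `exp(0.5 * z_log_sigma)` factor shows) is the log-variance $\log\sigma^2$. It then samples a latent point from this distribution with the reparameterization trick, `z = z_mean + exp(0.5 * z_log_sigma) * epsilon`, where `epsilon` is drawn from a standard normal distribution.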

**Decoder**: it maps the sampled latent points back to the original input space.
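The sampling step is the reparameterization trick: instead of drawing $z$ directly from $\mathcal{N}(\mu, \sigma^2)$, we draw $\epsilon \sim \mathcal{N}(0, I)$ and then shift and scale it, so that gradients can flow through `z_mean` and `z_log_sigma`. A NumPy sketch, assuming (as the notebook's code does) that `z_log_sigma` holds the log-variance, with hypothetical encoder outputs `mu` and `log_var`, checks that the samples have the intended mean and standard deviation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for one input: a 2-D mean and log-variance.
mu = np.array([1.0, -2.0])
log_var = np.log(np.array([0.25, 4.0]))  # variances 0.25 and 4.0, i.e. std 0.5 and 2.0

def sample_z(mu, log_var, n_samples):
    # reparameterization: z = mu + sigma * eps, eps ~ N(0, I), sigma = exp(0.5 * log_var)
    eps = rng.standard_normal((n_samples, mu.shape[-1]))
    return mu + np.exp(0.5 * log_var) * eps

samples = sample_z(mu, log_var, 100_000)
print(samples.mean(axis=0))  # close to [1, -2]
print(samples.std(axis=0))   # close to [0.5, 2]
```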


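**Loss functions**: the VAE loss has two terms: the reconstruction loss $l_{re}$, as in the simple autoencoder, and the Kullback-Leibler divergence $D_{KL}$ between the latent distribution learned by the encoder, $\mathcal{N}(\mu, \sigma^2)$ with $\mu$ = `z_mean` and $\log\sigma^2$ = `z_log_sigma`, and the prior distribution of the latent space, a standard normal $\mathcal{N}(0, I)$. The KL term works as a regulariser that keeps the latent codes close to the prior.

$l_{vae} = l_{re} + D_{KL}$

$l_{vae} = -\sum_{i=1}^{D}\left[y_i \log(y'_i + \epsilon) + (1 - y_i)\log(1 - y'_i + \epsilon)\right] - \frac{1}{2}\sum_{j=1}^{d}\left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$

where $D$ = `input_dim` is the number of pixels (here the reconstruction term is summed over pixels rather than averaged) and $d$ = `latent_dim` is the number of latent dimensions; the loss is then averaged over the batch.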
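A NumPy sketch of this loss, mirroring the Keras code further down (reconstruction summed over pixels, KL summed over latent dimensions, then averaged over the batch); `eps=1e-7` and the test values are assumptions for illustration. The Monte Carlo check is our addition: it estimates $\mathbb{E}_q[\log q(z) - \log p(z)]$ by sampling and should agree with the closed-form KL.

```python
import numpy as np

def kl_closed_form(mu, log_var):
    # D_KL(N(mu, sigma^2) || N(0, I)) summed over latent dims, with log_var = log sigma^2
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=-1)

def vae_loss(y, y_pred, mu, log_var, eps=1e-7):
    # reconstruction: binary cross-entropy summed over all pixels of each image
    l_re = -np.sum(y * np.log(y_pred + eps) + (1 - y) * np.log(1 - y_pred + eps), axis=-1)
    return np.mean(l_re + kl_closed_form(mu, log_var))

# KL is zero when the encoder outputs exactly the prior N(0, I).
kl_at_prior = kl_closed_form(np.zeros((1, 2)), np.zeros((1, 2)))

# Monte Carlo check for a 1-D latent with mu = 1 and sigma = 0.5.
rng = np.random.default_rng(0)
mu, log_var = np.array([1.0]), np.log(np.array([0.25]))
sigma = np.exp(0.5 * log_var)
z_mc = mu + sigma * rng.standard_normal((200_000, 1))
log_q = -0.5 * (np.log(2 * np.pi) + log_var + (z_mc - mu) ** 2 / sigma**2)
log_p = -0.5 * (np.log(2 * np.pi) + z_mc**2)
kl_mc = np.mean(np.sum(log_q - log_p, axis=-1))
kl_exact = kl_closed_form(mu, log_var)
print(kl_mc, kl_exact)  # both close to 0.818

# Two-pixel example: uninformative reconstruction, latent equal to the prior.
loss_example = vae_loss(np.array([[1.0, 0.0]]), np.array([[0.5, 0.5]]),
                        np.zeros((1, 2)), np.zeros((1, 2)))
print(loss_example)  # about 1.386 (= 2 log 2)
```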


In [None]:
# parameters
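intermediate_dim = 512  # number of units in the hidden Dense layer of the encoder and of the decoder
batch_size = 128
latent_dim = 2  # dimension of the latent space (2, so it can be plotted directly); z_mean and z_log_sigma each have this size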

![architecture sequential VAE](img/VAE_arch.png)
(Kingma and Welling, 2014)

In [None]:
# architecture

inputs = Input(shape=(input_dim, ), name='encoder_input')

# encoder: a shared hidden layer that branches into z_mean and z_log_sigma (a 'Y' shape)
encoder_1 = Dense(intermediate_dim, activation='relu')
encoder_out_1 = encoder_1(inputs)
z_mean = Dense(latent_dim, name='z_mean')
z_mean_out = z_mean(encoder_out_1)
z_log_sigma = Dense(latent_dim, name='z_log_sigma')
z_log_sigma_out = z_log_sigma(encoder_out_1)



In [None]:
# sampling
from keras import backend as K # operations with tensors

def sampling(params):
    """Reparameterization trick: z = z_mean + sigma * epsilon, with sigma = exp(0.5 * log-variance)."""
    z_mean, z_log_sigma = params
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim))  # mean = 0, std = 1
    return z_mean + K.exp(0.5 * z_log_sigma) * epsilon

# We need to wrap the output of sampling in a layer to connect it to the decoder.
# For that we can use a Lambda layer.
from keras.layers import Lambda
z = Lambda(sampling, output_shape=(latent_dim, ))
z_out = z([z_mean_out, z_log_sigma_out])

In [None]:
# decoder
decoder_1 = Dense(intermediate_dim, activation='relu')
decoder_2 = Dense(input_dim, activation='sigmoid')
decoder_1_out = decoder_1(z_out)
decoder_2_out = decoder_2(decoder_1_out)

![Models](img/vae_ex_arch.png)

In [None]:
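# models

# end-to-end autoencoder: image -> reconstruction
vae = Model(inputs, decoder_2_out)

# from inputs to latent space (mean of the latent distribution)
encoder_model = Model(inputs, z_mean_out)

# generator: latent point -> image, reusing the trained decoder layers
decoder_input = Input(shape=(latent_dim, ))
generator = Model(decoder_input, decoder_2(decoder_1(decoder_input)))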

In [None]:
# loss
from keras.losses import binary_crossentropy

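# Reconstruction loss and KL divergence
# binary_crossentropy averages over the pixels, so multiplying by input_dim sums over them
l_re = binary_crossentropy(inputs, decoder_2_out) * input_dim
d_kl = - 0.5 * K.sum(1 + z_log_sigma_out - K.square(z_mean_out) - K.exp(z_log_sigma_out), axis=-1)

# average the per-image loss over the batch
loss = K.mean(l_re + d_kl)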

vae.add_loss(loss)

vae.compile(optimizer='adam')

In [None]:
# train 
vae.fit(x_train,
        shuffle=True,
        epochs=20,
        batch_size=batch_size,
        validation_data=(x_test, None))

In [None]:
x_test_encoded = encoder_model.predict(x_test, batch_size=batch_size)

# latent means of the test images, coloured by digit class
plt.figure(figsize=(12, 10))
plt.scatter(x_test_encoded[:, 0], x_test_encoded[:, 1], c=y_test, cmap='viridis')
plt.colorbar()
plt.xlabel("z[0]")
plt.ylabel("z[1]")
plt.show()


In [None]:
# display images produced by the generator from a regular grid of latent points
# within [-4, 4], the range covered by the encoded test set

n = 12  # figure with 12x12 digits
digit_size = 28
figure = np.zeros((digit_size * n, digit_size * n))

# we will sample n points per axis within [-4, 4] standard deviations
grid_x = np.linspace(-4, 4, n)  # range obtained with the encoder
grid_y = np.linspace(-4, 4, n)[::-1]  # reversed so that z[1] increases upwards in the figure

# rows follow z[1] (grid_y) and columns follow z[0] (grid_x), matching the tick labels below
for i, yi in enumerate(grid_y):
    for j, xi in enumerate(grid_x):
        z_sample = np.array([[xi, yi]])
        x_decoded = generator.predict(z_sample)
        digit = x_decoded[0].reshape(digit_size, digit_size)
        figure[i * digit_size: (i + 1) * digit_size,
               j * digit_size: (j + 1) * digit_size] = digit

plt.figure(figsize=(10, 10))

# axis labels 
start_range = digit_size // 2
end_range = (n - 1) * digit_size + start_range + 1

pixel_range = np.arange(start_range, end_range, digit_size)

sample_range_x = np.round(grid_x, 1)
sample_range_y = np.round(grid_y, 1)

plt.xticks(pixel_range, sample_range_x)
plt.yticks(pixel_range, sample_range_y)
plt.xlabel("z[0]")
plt.ylabel("z[1]")

plt.imshow(figure)
plt.show()
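The double loop above calls `generator.predict` once per grid point (144 calls for `n = 12`). One possible alternative, sketched here in NumPy with a hypothetical `fake_generator` standing in for the trained Keras model, is to decode the whole grid in one batch and tile the results with `reshape` and `transpose`. The check at the end compares it against a per-sample loop in which rows follow `grid_y` and columns follow `grid_x`, matching the tick labels.

```python
import numpy as np

n, digit_size = 12, 28
grid_x = np.linspace(-4, 4, n)
grid_y = np.linspace(-4, 4, n)[::-1]

def fake_generator(z):
    # Hypothetical stand-in for generator.predict: maps (batch, 2) -> (batch, 784).
    return np.tanh(np.outer(z[:, 0] + 2 * z[:, 1], np.linspace(0, 1, digit_size * digit_size)))

# Batched version: zx[i, j] = grid_x[j] and zy[i, j] = grid_y[i].
zx, zy = np.meshgrid(grid_x, grid_y)
z_grid = np.column_stack([zx.ravel(), zy.ravel()])  # (n * n, 2), row-major order
decoded = fake_generator(z_grid).reshape(n, n, digit_size, digit_size)
# (row, col, y, x) -> (row, y, col, x), then merge into one (n*28, n*28) image
figure_batched = decoded.transpose(0, 2, 1, 3).reshape(n * digit_size, n * digit_size)

# Reference: one generator call per grid point.
figure_loop = np.zeros((digit_size * n, digit_size * n))
for i, yi in enumerate(grid_y):
    for j, xi in enumerate(grid_x):
        digit = fake_generator(np.array([[xi, yi]]))[0].reshape(digit_size, digit_size)
        figure_loop[i * digit_size:(i + 1) * digit_size,
                    j * digit_size:(j + 1) * digit_size] = digit

print(np.allclose(figure_batched, figure_loop))  # True
```

With the trained model, `fake_generator(z_grid)` would be replaced by `generator.predict(z_grid)`, which returns an array of shape `(n * n, 784)`.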