# DDA4210 Tutorial 7: Generative Models

1. TA: Dong QIAO
2. Email: dongqiao@link.cuhk.edu.cn
3. Office: Big Room 74, SDS Research Space on 4th Floor of Zhixin Bldg
4. Version: Beta

# Introduction to Variational Autoencoder

1. An AutoEncoder (AE) is a neural network trained in an unsupervised way to learn an approximate identity transformation, i.e., to reconstruct its high-dimensional input. It consists of an encoder network $f_\phi$ and a decoder network $g_\theta$, parameterized by $\phi$ and $\theta$ respectively.

2. The middle layer of an AE usually forms a narrow bottleneck that compresses the original high-dimensional data into a low-dimensional representation.

3. A VAE uses the encoder $f_\phi$ to map $\mathbf{x}$ to a distribution over latent codes (rather than to a fixed vector $\mathbf{z}$), which training pushes towards a prior distribution $p_z$, and then reconstructs $\mathbf{x}$ from a sampled code using the decoder $g_\theta$.

4. Suppose we are given training data $\mathbf{X} = \{\mathbf{x}_1, \mathbf{x}_2, \dots , \mathbf{x}_n\}$ and a prior distribution $p_z$, and that we have successfully trained $f_\phi$ and $g_\theta$ of a VAE. To generate a new sample that looks like a real data point $\mathbf{x}_i$, we follow these steps:
    * First, sample a $\mathbf{z}_i$ from the prior distribution $p_z$.
    * Then, a new sample can be generated via the decoder, i.e., $g_\theta(\mathbf{z}_i)$.
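
To make the encoder, decoder, and bottleneck from items 1 and 2 concrete, here is a minimal sketch of a plain autoencoder. The class name `TinyAutoEncoder`, the layer sizes, and the random stand-in batch are assumptions made for illustration only; they are not the model trained later in this tutorial.

```python
import torch
import torch.nn as nn

class TinyAutoEncoder(nn.Module):
    """Hypothetical plain autoencoder: 784 -> 32 -> 784 (sizes are illustrative)."""
    def __init__(self, input_dim=784, bottleneck_dim=32):
        super().__init__()
        # Encoder f_phi: compress the input to a low-dimensional code
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, bottleneck_dim),
        )
        # Decoder g_theta: map the code back to the input space
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)              # a fixed code vector, no sampling involved
        return self.decoder(z), z

torch.manual_seed(0)
ae = TinyAutoEncoder()
x = torch.rand(16, 784)                  # stand-in batch of flattened images in [0, 1]
x_hat, z = ae(x)
recon_error = nn.functional.mse_loss(x_hat, x)   # how far g(f(x)) is from x
print(z.shape, x_hat.shape, recon_error.item())
```

Unlike this sketch, the VAE built below has the encoder output a mean and a log-variance, and samples the code from the resulting Gaussian.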

# Table of Contents

1. Basic Implementation
    * Import packages
    * Download dataset
    * Define our model
    * Define the loss function
    * Instantiate VAE and train it
2. Assessing a Variational Autoencoder on MNIST using Pytorch
    * Comparing Original Images with their Reconstructions
    * Data Generation
    * Visualize the latent space (optional)

# Basic Implementation

## Import packages

In [None]:
# Import libraries
import torchvision                             # contains image datasets and many functions to manipulate images
import torchvision.transforms as transforms    # to normalize, scale etc the dataset
from torch.utils.data import DataLoader        # to load data into batches (for SGD)
from torchvision.utils import make_grid        # Plotting. Makes a grid of tensors
from torchvision.datasets import MNIST         # the classic handwritten digits dataset
import matplotlib.pyplot as plt                # to plot our images
import numpy as np

## Download dataset

In [None]:
# Create Dataset objects. Notice that ToTensor() transforms images to PyTorch
# tensors AND scales the pixel values to be within [0, 1]. Also, we have separate Dataset
# objects for training and test sets. Data will be downloaded to a folder called 'data'.
trainset = MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())
testset  = MNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())

# Create DataLoader objects. These will give us our batches of training and testing data.
batch_size = 100

#########################
# load dataset
#########################
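trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True)    # reshuffle training data every epoch
testloader = DataLoader(testset, batch_size=batch_size, shuffle=False)     # keep test order fixed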
#########################
# End
#########################
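
As a quick illustration of what a `DataLoader` yields, the sketch below batches a synthetic stand-in for MNIST, so it runs without downloading anything. The tensors `fake_images` and `fake_labels` and the dataset size of 250 are assumptions chosen for the example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for MNIST: 250 random "images" in [0, 1] with dummy labels
fake_images = torch.rand(250, 1, 28, 28)
fake_labels = torch.randint(0, 10, (250,))
fake_set = TensorDataset(fake_images, fake_labels)

loader = DataLoader(fake_set, batch_size=100, shuffle=True)

# Each iteration yields an (images, labels) pair; collect the image batch shapes
batch_shapes = [tuple(images.shape) for images, _ in loader]
print(batch_shapes)   # the last batch is smaller because 250 is not a multiple of 100
```

With the real MNIST training set (60,000 images) and `batch_size = 100`, every batch is full, so this edge case does not show up in the tutorial.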

In [None]:
import torch
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

In [None]:
import torch.nn as nn                          # Class that implements a model (such as a Neural Network)
import torch.nn.functional as F                # contains activation functions, sampling layers etc
import torch.optim as optim                    # For optimization routines such as SGD, ADAM, ADAGRAD, etc

In [None]:
e_hidden = 500        # Number of hidden units in the encoder. See AEVB paper page 7, section "Marginal Likelihood"
d_hidden = 500        # Number of hidden units in the decoder. See AEVB paper page 7, section "Marginal Likelihood"
latent_dim = 2        # Dimension of latent space. See AEVB paper, page 7, section "Marginal Likelihood"
learning_rate = 0.001 # For optimizer (SGD or Adam)
weight_decay = 1e-5   # For optimizer (SGD or Adam)
epochs = 10           # Number of sweeps through the whole dataset

## Define our model

In [None]:
class VAE(nn.Module):
    def __init__(self):
        """Variational Auto-Encoder Class"""
        super(VAE, self).__init__()
        # Encoding Layers
        self.e_input2hidden = nn.Linear(in_features=784, out_features=e_hidden)
        self.e_hidden2mean = nn.Linear(in_features=e_hidden, out_features=latent_dim)
        self.e_hidden2logvar = nn.Linear(in_features=e_hidden, out_features=latent_dim)
        
        # Decoding Layers
        self.d_latent2hidden = nn.Linear(in_features=latent_dim, out_features=d_hidden)
        self.d_hidden2image = nn.Linear(in_features=d_hidden, out_features=784)
        
    def encoder(self, x):
        #####################
        # Define encoder
        #####################
        
        # Flatten image to shape [batch_size, input_features]
        x = x.view(-1, 784)
        # Feed x into Encoder to obtain mean and logvar
        x = F.relu(self.e_input2hidden(x))
        
        #####################
        # End
        #####################
        return self.e_hidden2mean(x), self.e_hidden2logvar(x)
        
    def decoder(self, z):
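        #####################
        # Define decoder
        #####################
        # Map latent code to the hidden layer, then to pixel probabilities in (0, 1)
        return torch.sigmoid(self.d_hidden2image(torch.relu(self.d_latent2hidden(z))))
        #####################
        # End
        #####################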
        
    def forward(self, x):
        # Encode image to latent mean & log-variance
        mu, logvar = self.encoder(x)
        
        # Sample z from latent space using mu and logvar
        if self.training:
            z = torch.randn_like(mu).mul(torch.exp(0.5*logvar)).add_(mu)
        else:
            z = mu
        
        # Feed z into Decoder to obtain reconstructed image. Use Sigmoid as output activation (=probabilities)
        x_recon = self.decoder(z)
        
        return x_recon, mu, logvar
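
The sampling line in `forward` uses the reparameterisation trick: z = mu + sigma * eps with eps drawn from a standard normal. The sketch below, with made-up values for `mu` and `logvar`, checks that such samples have the intended mean and standard deviation and that gradients reach both encoder outputs.

```python
import torch

torch.manual_seed(0)

# Hypothetical encoder outputs for one input, created as trainable leaves
mu = torch.tensor([1.0, -2.0], requires_grad=True)
logvar = torch.tensor([0.0, -1.0], requires_grad=True)

# Reparameterisation: z = mu + sigma * eps with eps ~ N(0, I)
n_samples = 100_000
eps = torch.randn(n_samples, 2)
std = torch.exp(0.5 * logvar)            # sigma = exp(logvar / 2)
z = mu + std * eps

# The samples follow N(mu, sigma^2) ...
sample_mean = z.mean(dim=0)
sample_std = z.std(dim=0)

# ... and, because the randomness sits in eps, gradients flow back to mu and logvar
z.sum().backward()
print(sample_mean, sample_std, mu.grad, logvar.grad)
```

Sampling `z` directly from a distribution parameterised by `mu` and `logvar` would not give these gradients, which is why the noise is drawn separately.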

## Define the loss function
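
For reference, the loss minimised here is the negative evidence lower bound for one data point. The notation $q_\phi(\mathbf{z} \mid \mathbf{x})$ and $p_\theta(\mathbf{x} \mid \mathbf{z})$ is introduced for this derivation: $q_\phi$ denotes the Gaussian whose mean $\boldsymbol{\mu}$ and log-variance $\log \boldsymbol{\sigma}^2$ are output by the encoder $f_\phi$, and $p_\theta$ is the Bernoulli likelihood whose pixel probabilities $\hat{\mathbf{x}} = g_\theta(\mathbf{z})$ are output by the decoder. The prior is assumed to be $p_z = \mathcal{N}(\mathbf{0}, \mathbf{I})$ and $d$ is the latent dimension.

$$
\mathcal{L}(\theta, \phi; \mathbf{x}) = -\mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x})}\left[\log p_\theta(\mathbf{x} \mid \mathbf{z})\right] + \mathrm{KL}\left(q_\phi(\mathbf{z} \mid \mathbf{x}) \,\|\, p_z\right)
$$

With a single sample of $\mathbf{z}$, the first term is the binary cross entropy summed over the pixels:

$$
-\log p_\theta(\mathbf{x} \mid \mathbf{z}) = -\sum_{k=1}^{784} \left[ x_k \log \hat{x}_k + (1 - x_k) \log (1 - \hat{x}_k) \right]
$$

For a diagonal Gaussian against a standard normal prior, the second term has the closed form

$$
\mathrm{KL}\left(\mathcal{N}(\boldsymbol{\mu}, \mathrm{diag}(\boldsymbol{\sigma}^2)) \,\|\, \mathcal{N}(\mathbf{0}, \mathbf{I})\right) = -\frac{1}{2} \sum_{j=1}^{d} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
$$

The KL term is non-negative and is added to the reconstruction term, so it acts as a penalty that keeps the encoder's distribution close to the prior.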

In [None]:
# Loss
def vae_loss(image, reconstruction, mu, logvar):
    """Loss for the Variational AutoEncoder."""
    # Binary Cross Entropy for batch
    BCE = F.binary_cross_entropy(input=reconstruction.view(-1, 28*28), target=image.view(-1, 28*28), reduction='sum')
    
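    ###############################
    # Closed-form KL Divergence
    ###############################
    # KL(N(mu, sigma^2) || N(0, I)), summed over the batch and latent dimensions
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    ###############################
    # End
    ###############################
    
    # Negative ELBO: reconstruction term plus KL penalty
    return BCE + KLD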
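
As a sanity check on the closed-form KL divergence between a diagonal Gaussian and a standard normal, the sketch below compares it with the value computed by `torch.distributions`. The random `mu` and `logvar` are stand-ins for encoder outputs.

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)
mu = torch.randn(4, 2)        # hypothetical latent means for a batch of 4
logvar = torch.randn(4, 2)    # hypothetical latent log-variances

# Closed form, summed over the batch and latent dimensions
kld_closed = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

# Reference value: elementwise KL between N(mu, sigma^2) and N(0, 1), then summed
q = Normal(mu, torch.exp(0.5 * logvar))
p = Normal(torch.zeros_like(mu), torch.ones_like(mu))
kld_reference = kl_divergence(q, p).sum()

print(kld_closed.item(), kld_reference.item())
```

The two numbers should agree up to floating-point error, and both are non-negative.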

## Instantiate VAE and train it

In [None]:
# Instantiate VAE with Adam optimizer
vae = VAE()
vae = vae.to(device)    # send weights to GPU. Do this BEFORE defining Optimizer
optimizer = optim.Adam(params=vae.parameters(), lr=learning_rate, weight_decay=weight_decay)
vae.train()            # tell the network to be in training mode. Useful to activate Dropout layers & other stuff

# Train
losses = []

for epoch in range(epochs):
    # Store training losses & instantiate batch counter
    losses.append(0)
    number_of_batches = 0

    ##################################
    # Train the model for each batch
    ##################################
    
    # Grab the batch; we are only interested in the images, not their labels
    for images, _ in trainloader:
        # Send batch to the device (GPU/CPU), remove existing gradients from previous iterations
        images = images.to(device)
        optimizer.zero_grad()

        # Feed images to VAE. Compute Loss.
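        reconstructions, latent_mu, latent_logvar = vae(images)
        loss = vae_loss(images, reconstructions, latent_mu, latent_logvar)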

        # Backpropagate the loss & perform optimization step with such gradients
        loss.backward()
        optimizer.step()

        # Add loss to the cumulative sum
        losses[-1] += loss.item()  
        number_of_batches += 1
        
    ##################################
    # End
    ##################################

    # Update average loss & Log information
    losses[-1] /= number_of_batches
    print('Epoch [%d / %d] average reconstruction error: %f' % (epoch+1, epochs, losses[-1]))

# Assessing a Variational Autoencoder on MNIST using Pytorch

## Comparing Original Images with their Reconstructions

In [None]:
# Display original images (here: the last batch seen in the training loop)
test_images = images
    
with torch.no_grad():
    print("Original Images")
    test_images = test_images.cpu()
    test_images = test_images.clamp(0, 1)
    test_images = test_images[:50]
    test_images = make_grid(test_images, 10, 5)
    test_images = test_images.numpy()
    test_images = np.transpose(test_images, (1, 2, 0))
    plt.imshow(test_images)
    plt.show()

In [None]:
# Display reconstructed images
with torch.no_grad():
    print("Reconstructions")
    reconstructions = reconstructions.view(reconstructions.size(0), 1, 28, 28)
    reconstructions = reconstructions.cpu()
    reconstructions = reconstructions.clamp(0, 1)
    reconstructions = reconstructions[:50]
    plt.imshow(np.transpose(make_grid(reconstructions, 10, 5).numpy(), (1, 2, 0)))
    plt.show()

In [None]:
# Set VAE to evaluation mode (deactivates potential dropout layers)
# So that we use the latent mean and we don't sample from the latent space
vae.eval()

# Keep track of test loss (notice, we have no epochs)
test_loss, number_of_batches = 0.0, 0

for test_images, _ in testloader:

  # Do not track gradients
  with torch.no_grad():

    # Send images to the GPU/CPU
    test_images = test_images.to(device)

    ##############################################################################
    # Feed images through the VAE to obtain their reconstruction & compute loss
    ##############################################################################
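    reconstructions, latent_mu, latent_logvar = vae(test_images)
    loss = vae_loss(test_images, reconstructions, latent_mu, latent_logvar)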
    ##############################################################################
    # End
    ##############################################################################

    # Cumulative loss & Number of batches
    test_loss += loss.item()
    number_of_batches += 1

# Now divide by number of batches to get average loss per batch
test_loss /= number_of_batches
print('average reconstruction error: %f' % (test_loss))

## Data Generation

To sample images from the generative model, we can proceed as follows.

1. Sample a tensor of shape [50, latent_dim] of standard normal variates, making sure that it is created on the same device as the model (e.g., the GPU).
2. Feed it through the decoder to obtain generated images.
3. Reshape the output to the correct image shape.
4. Use the usual trick to display PyTorch tensors with matplotlib: move them to the CPU, arrange them with make_grid, and transpose to height, width, channel order.

Notice how we set the VAE to evaluation mode and make sure that PyTorch doesn’t keep track of gradients.

In [None]:
vae.eval()
with torch.no_grad():
    # Sample from standard normal distribution
    z = torch.randn(50, latent_dim, device=device)
    
    # Reconstruct images from sampled latent vectors
    recon_images = vae.decoder(z)
    recon_images = recon_images.view(recon_images.size(0), 1, 28, 28)
    recon_images = recon_images.cpu()
    recon_images = recon_images.clamp(0, 1)
    
    # Plot Generated Images
    plt.imshow(np.transpose(make_grid(recon_images, 10, 5).numpy(), (1, 2, 0)))
    plt.show()

## Visualize the latent space (optional)

Finally, we can also visualize the latent space. We generate a grid of values in the latent space, decode each grid point into an image, reshape the images, and plot them:

In [None]:
with torch.no_grad():
    # Create empty (x, y) grid
    latent_x = np.linspace(-1.5, 1.5, 20)
    latent_y = np.linspace(-1.5, 1.5, 20)
    latents = torch.FloatTensor(len(latent_x), len(latent_y), 2)
    # Fill up the grid
    for i, lx in enumerate(latent_x):
        for j, ly in enumerate(latent_y):
            latents[j, i, 0] = lx
            latents[j, i, 1] = ly
    # Flatten the grid
    latents = latents.view(-1, 2)
    # Send to GPU
    latents = latents.to(device)
    # Find their representation
    reconstructions = vae.decoder(latents).view(-1, 1, 28, 28)
    reconstructions = reconstructions.cpu()
    # Finally, plot
    fig, ax = plt.subplots(figsize=(10, 10))
    plt.imshow(np.transpose(make_grid(reconstructions.data[:400], 20, 5).clamp(0, 1).numpy(), (1, 2, 0)))
    plt.show()

Notice that, because we use simple linear layers and only a few epochs, the result will look rather messy. For better results we could, for example, use convolutional layers in the encoder and decoder.
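
The grid-filling loop in the latent-space cell can also be written without Python loops. The following sketch builds the same grid with `np.meshgrid` and checks that it matches the loop version; it uses `torch.empty` for the loop buffer, and the names `latents_loop` and `latents_vec` are chosen here for illustration.

```python
import numpy as np
import torch

latent_x = np.linspace(-1.5, 1.5, 20)
latent_y = np.linspace(-1.5, 1.5, 20)

# Loop version: entry [j, i] holds the point (latent_x[i], latent_y[j])
latents_loop = torch.empty(len(latent_y), len(latent_x), 2)
for i, lx in enumerate(latent_x):
    for j, ly in enumerate(latent_y):
        latents_loop[j, i, 0] = lx
        latents_loop[j, i, 1] = ly

# Vectorised version: with the default 'xy' indexing, np.meshgrid returns
# arrays of shape [len(latent_y), len(latent_x)], matching the [j, i] layout
grid_x, grid_y = np.meshgrid(latent_x, latent_y)
latents_vec = torch.from_numpy(np.stack([grid_x, grid_y], axis=-1)).float()

latents_flat = latents_vec.view(-1, 2)   # [400, 2], ready to be sent to the decoder
print(latents_flat.shape, torch.equal(latents_loop, latents_vec))
```

Both versions produce the same 400 latent points, so the choice between them is mainly one of readability.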

# References

* Mauro Camara Escudero, 2020, Minimalist Variational Autoencoder in Pytorch with CUDA GPU, https://maurocamaraescudero.netlify.app/post/minimalist-variational-autoencoder-in-pytorch-with-cuda-gpu/

* Mauro Camara Escudero, 2020, Assessing a Variational Autoencoder on MNIST using Pytorch, https://maurocamaraescudero.netlify.app/post/assessing-a-variational-autoencoder-on-mnist-using-pytorch/

* Durk Kingma, 2020, Variational Autoencoder in Pytorch, https://github.com/dpkingma/examples/blob/master/vae/main.py

* Federico Bergamin, 2020, Variational-Autoencoders, https://github.com/federicobergamin/Variational-Autoencoders/blob/master/VAE.py

* Paul Guerrero, Nils Thuerey, 2020, Smart Geometry Ucl - Variational Autoencoder Pytorch, https://colab.research.google.com/github/smartgeometry-ucl/dl4g/blob/master/variational_autoencoder.ipynb#scrollTo=TglLFCT1N7iF

* Mauro Camara Escudero, 2020, Variational Auto-Encoders and the Expectation-Maximization Algorithm, https://maurocamaraescudero.netlify.app/post/variational-auto-encoders-and-the-expectation-maximization-algorithm/