<img src="https://drive.google.com/thumbnail?id=1cXtXdAcwedVDbapmz1pj_hULsQrhEcff&sz=w1500" width="500"/>

---

# **Variational AutoEncoders (VAEs)**

#### **Morning contents/agenda**

1. AutoEncoders

2. Variational AutoEncoders

3. Other VAE architectures

4. A simple VAE demo

#### **Learning outcomes**

1. Understand how Autoencoders can be used for data dimensionality reduction

2. Gain intuition about what a latent space is and how the latent space of Autoencoders and VAEs differ

3. Understand how the reparametrization trick makes VAEs trainable

4. Differentiate the role of the KL Divergence and reconstruction fidelity terms in the loss function of VAEs

<br>

#### **Afternoon contents/agenda**

1. Load the FashionMNIST dataset

2. Implement a convolutional VAE

3. Implement a conditioned convolutional VAE

#### **Learning outcomes**

1. Practice implementing VAEs

2. Understand how to refactor a linear VAE into a convolutional VAE

3. Learn how to perform class-conditioned generation

<br/>

---

<br/>

### **Conditional VAEs**

There is one type of VAE that is of special interest in generative problems: **Conditional Variational AutoEncoders (cVAEs)**.

We have been looking at how VAEs can learn latent representations of a dataset, which can then be used to generate new, original data points. However, what happens if we want to constrain the decoder to produce examples within a certain category?

cVAEs address this by introducing an additional conditional input $c$ to both the encoder and the decoder:

<br>

<center>
<img src="https://drive.google.com/thumbnail?id=1CuNqPhxOCkkwTN7HtIInKo7o3JsppUic&sz=w1500" width="600"/>
</center>

<br>

At generation time, we may simply pass a class of our choosing along with the random vector, and condition our generation.

In this afternoon practical we will look at how to implement these conditional VAEs in PyTorch.


In [None]:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

import random
from tqdm import tqdm

from torchvision.datasets import FashionMNIST
from torchvision.utils import make_grid
from torchvision import transforms
from torch.utils.data import DataLoader, Dataset, Subset

from sklearn.model_selection import StratifiedShuffleSplit

from IPython.display import clear_output

In [None]:
def set_seed(seed):
    """
    Use this to set ALL the random seeds to a fixed value and take out any randomness from cuda kernels
    """
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

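    # Disable the cuDNN auto-tuner, which benchmarks convolution algorithms and can pick a different one on each run
    torch.backends.cudnn.benchmark = False
    # Disable cuDNN entirely (slower, but removes cuDNN as a source of run-to-run variation)
    torch.backends.cudnn.enabled   = False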

    return True

device = 'cpu'
if torch.cuda.device_count() > 0 and torch.cuda.is_available():
    print("Cuda installed! Running on GPU!")
    device = 'cuda'
else:
    print("No GPU available!")

## 1. Loading and Visualising the Data

Tasks


- Define a preprocessing transform using `transforms.ToTensor()` so that images are converted to `PyTorch` tensors in the `[0, 1]` range.

- Download and load the training split of `FashionMNIST` from `torchvision.datasets`, applying your transform.

- Use `StratifiedShuffleSplit` to create an `80/20` stratified split of the dataset indices, using the class labels to preserve class balance between training and validation sets.

- Construct `torch.utils.data.Subset` objects for the training and validation subsets, and wrap each one in a `DataLoader` with `batch_size=128`.

- Draw a batch from the training dataloader and plot its first 32 images.
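
To make the stratified-split step concrete, here is a minimal sketch under stated assumptions: it uses random tensors as a stand-in for FashionMNIST so that it runs without a download, and the variable names are illustrative. With the real dataset, the class labels would typically come from the dataset's `targets` attribute.

```python
import numpy as np
import torch
from sklearn.model_selection import StratifiedShuffleSplit
from torch.utils.data import DataLoader, Subset, TensorDataset

# Stand-in data (assumption): 1000 random "images", 100 per class, instead of FashionMNIST.
labels = np.repeat(np.arange(10), 100)
images = torch.rand(len(labels), 1, 28, 28)
dataset = TensorDataset(images, torch.from_numpy(labels))

# A single 80/20 split; passing the labels as `y` keeps class proportions equal in both parts.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(splitter.split(np.zeros(len(labels)), labels))

# Subsets index into the same underlying dataset; only the training loader is shuffled.
train_loader = DataLoader(Subset(dataset, train_idx.tolist()), batch_size=128, shuffle=True)
val_loader = DataLoader(Subset(dataset, val_idx.tolist()), batch_size=128, shuffle=False)

# Count samples per class in each part to confirm the split is balanced.
train_counts = np.bincount(labels[train_idx], minlength=10)
val_counts = np.bincount(labels[val_idx], minlength=10)
print(train_counts, val_counts)
```

Checking the per-class counts is a quick way to confirm that stratification worked before moving on to training.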

## 2. Training Helper Functions


Write the `train` and `validation` functions that will handle optimisation of your VAE. They follow the same structure we used in class, but try to re-write them yourself rather than copying and pasting. This is a good chance to check that you really understand what each step in the training loop is doing, from computing the loss to updating the weights and keeping track of performance. Can you write the `ELBO` loss as its own `nn.Module` class?
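
As a starting point for the last question, one possible shape for such a loss module is sketched below. The class name `ELBOLoss`, the choice of binary cross-entropy for the reconstruction term, and the averaging over the batch are all assumptions made for illustration; the version used in class may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ELBOLoss(nn.Module):
    """Negative ELBO: reconstruction term plus beta-weighted KL term (hypothetical name)."""

    def __init__(self, beta=1.0):
        super().__init__()
        self.beta = beta

    def forward(self, x_hat, x, mu, logvar):
        # Reconstruction fidelity: BCE summed over pixels (assumes x_hat and x lie in [0, 1]).
        recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
        # Closed-form KL divergence between N(mu, sigma^2) and the standard normal prior.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        # Average over the batch so the value does not depend on the batch size.
        return (recon + self.beta * kl) / x.size(0)


# Usage with dummy tensors of plausible shapes.
criterion = ELBOLoss(beta=1.0)
x = torch.rand(4, 1, 28, 28)
x_hat = torch.rand(4, 1, 28, 28)
mu, logvar = torch.zeros(4, 32), torch.zeros(4, 32)
loss = criterion(x_hat, x, mu, logvar)
```

With `mu = 0` and `logvar = 0` the KL term vanishes, which is a convenient sanity check when debugging.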

## 3. Convolutional VAE
Now let's refactor our linear VAE into a convolutional VAE. Like in class, you will build an encoder, a decoder, and a VAE wrapper, and then check that the tensor shapes line up correctly.

Tasks
- Implement a `ConvEncoder` that:
    - Takes `input_channels` and `latent_size` as arguments.

    - Uses a `nn.Sequential` of convolution → batch norm → GELU → max-pooling blocks to map an input image to a compact feature map. Use three blocks with channel sizes 20 → 40 → 60, kernel size 3, appropriate padding to preserve the shape, and `MaxPool2d(2)` to downsample from $32\times32$ to $4\times4$. Since FashionMNIST images are $28\times28$, this assumes the input is first padded to $32\times32$ (for example, by 2 pixels on each side).

    - Flattens the final feature map (of size $60\times4\times4$) and passes it through two linear layers to output `mu` and `logvar`, each of dimension `latent_size`.

- Implement a `ConvDecoder` that:
    - First maps a latent vector of size `latent_size` back to a flattened feature map using a linear layer, then reshapes it to `(60, 4, 4)`, padding it with 3 so that the effective input size is $32\times32$.

    - Uses a `nn.Sequential` of convolution → batch norm → GELU → nearest-neighbour upsampling blocks to progressively upsample back to the image resolution. Mirror the encoder with channel sizes 60 → 60 → 40 → 20 → `input_channels`, and use `Upsample(scale_factor=2, mode="nearest")` between blocks. If you wish, you can also write your decoder with `ConvTranspose2d`

    - Ends with a Sigmoid activation to produce outputs in [0, 1], and crops the borders so the final output has the same spatial size as the original input.

- Implement a `ConvVAE` that:
    - Composes the `ConvEncoder` and `ConvDecoder` in the initialisation.
    - Includes a `sample(mu, logvar)` function that applies the reparameterisation trick $z = \mu + \epsilon\, \sigma$ with $\epsilon \sim \mathcal{N}(0, I)$.
    - In `forward(x)`, encodes x to `mu`, `logvar`, samples `z`, and returns the reconstructed image together with `mu` and `logvar`.

- Test your implementation by:
    - Instantiating `ConvEncoder`, `ConvDecoder`, and `ConvVAE` with `input_channels=1` and `latent_size=128`.

    - Passing a mini-batch from your FashionMNIST dataloader through each component and printing the output shapes to confirm they match the expected dimensions.
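
The shape bookkeeping and the reparameterisation trick described above can be checked in isolation before writing the full classes. The sketch below is illustrative only: the helper `conv_block`, the zero-padding from $28\times28$ to $32\times32$ via `F.pad`, and the random stand-in batch are assumptions, not the reference solution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch):
    # A 3x3 convolution with padding=1 preserves height and width; MaxPool2d(2) halves them.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.GELU(),
        nn.MaxPool2d(2),
    )


def sample(mu, logvar):
    # Reparameterisation trick: z = mu + eps * sigma, with sigma = exp(0.5 * logvar).
    eps = torch.randn_like(mu)
    return mu + eps * torch.exp(0.5 * logvar)


torch.manual_seed(0)
x = torch.rand(8, 1, 28, 28)   # stand-in for a FashionMNIST mini-batch
x = F.pad(x, (2, 2, 2, 2))     # zero-pad 28x28 -> 32x32 (assumed padding scheme)

features = nn.Sequential(conv_block(1, 20), conv_block(20, 40), conv_block(40, 60))
h = features(x)                # spatial size: 32 -> 16 -> 8 -> 4
print(h.shape)                 # torch.Size([8, 60, 4, 4])

latent_size = 128
fc_mu = nn.Linear(60 * 4 * 4, latent_size)
fc_logvar = nn.Linear(60 * 4 * 4, latent_size)
flat = h.flatten(start_dim=1)
mu, logvar = fc_mu(flat), fc_logvar(flat)
z = sample(mu, logvar)
print(z.shape)                 # torch.Size([8, 128])
```

Because the noise `eps` is sampled separately from `mu` and `logvar`, gradients can flow through `z` back to both encoder heads.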

## 4. Train your ConvVAE
Tasks
- Train the `ConvVAE` for 10 epochs using `latent_size=32`, an Adam optimiser with learning rate $1\times10^{-3}$, and an ELBO loss with $\beta = 1$. Track both training and validation losses throughout.

- After training, take a validation batch and plot the first 32 original images next to their reconstructions to assess reconstruction quality.

- Sample 32 latent vectors from a standard normal distribution, decode them, and plot the resulting generated images to inspect the model’s generative behaviour.

## 5. Expanding our ConvVAE with Class-Conditioning

In a conditional VAE, additional information such as class labels is incorporated into the model so that both the encoder and decoder are aware of the conditioning variable. This is usually achieved by concatenating the label (or a learned embedding of it) to the encoder input and to the latent vector, or by injecting the conditioning value directly into these inputs, so that the latent space becomes class-aware during training.

For this task, we will implement conditioning via addition: we “add” the value of our label to our input. Before we can do that, we must project the one-hot labels through a linear layer so that they have the same dimensionality as the corresponding input. We need two projections: one matching the dimensions of the encoder input, and another matching the dimensionality of the latent space so that conditioning can also be applied within the decoder. One fun outcome of this is that our model also learns the "best" projection during optimisation!
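
A minimal sketch of conditioning via addition is shown below, outside of any encoder or decoder class. The layer names `label_to_image` and `label_to_latent` and the dummy tensors are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, latent_size = 10, 32

# Two learnable projections of the one-hot label (hypothetical names):
label_to_image = nn.Linear(num_classes, 1 * 28 * 28)   # matches the encoder input
label_to_latent = nn.Linear(num_classes, latent_size)  # matches the latent vector

x = torch.rand(4, 1, 28, 28)     # dummy image batch
y = torch.tensor([0, 3, 3, 9])   # dummy integer class labels
z = torch.randn(4, latent_size)  # dummy latent vectors

y_onehot = F.one_hot(y, num_classes=num_classes).float()

# Encoder side: reshape the projected label to image shape and add it to the input.
x_cond = x + label_to_image(y_onehot).view(-1, 1, 28, 28)
# Decoder side: add the projected label to the latent vector.
z_cond = z + label_to_latent(y_onehot)

print(x_cond.shape, z_cond.shape)
```

Samples that share a label receive the same additive offset, and the offsets themselves are trained together with the rest of the network.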


Tasks


- Write a function that converts a batch of integer class labels into one-hot vectors of dimension 10, ensuring the tensor is created on the same device as the labels. 

- Rewrite the convolutional encoder and decoder from the previous task so that each includes a linear layer mapping a one-hot vector of 10 classes to the appropriate dimension: $1\times 28\times 28$ for the encoder input, and the latent size for the decoder. Then add these embeddings to the inputs before forwarding through the network and adjust your final convolutional VAE accordingly. In this step, you may find it easier and more succinct to *inherit* from your previously implemented classes.

- Adjust your `train` and `validate` functions accordingly to use those class labels.

- Train your conditional VAE with the same hyperparameters as in Section 4.

- Generate 10 samples for each class by drawing random latent vectors, assigning a fixed label to each group, decoding them with the conditioned decoder, and plotting a row of images per label to inspect how well the model controls class-specific generation.
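
The one-hot helper and the label layout for class-wise generation might look like the sketch below. The function name `to_one_hot` is hypothetical, and the decoding step is left out because it depends on your own conditioned decoder.

```python
import torch


def to_one_hot(labels, num_classes=10):
    # Allocate on the same device as the labels, then set a single 1 per row.
    one_hot = torch.zeros(labels.size(0), num_classes, device=labels.device)
    one_hot.scatter_(1, labels.long().unsqueeze(1), 1.0)
    return one_hot


samples_per_class, num_classes, latent_size = 10, 10, 32

# Labels laid out as 0,0,...,0,1,1,...,1,...,9 so that each group of 10 shares a class.
labels = torch.arange(num_classes).repeat_interleave(samples_per_class)
z = torch.randn(labels.size(0), latent_size)   # one random latent vector per sample
y = to_one_hot(labels)

print(z.shape, y.shape)   # torch.Size([100, 32]) torch.Size([100, 10])
```

Passing `z` and `y` to the conditioned decoder and arranging the outputs with `make_grid(..., nrow=10)` would then give one row of images per class.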