# Lab 1 : Generative Models (Autoencoders)
```
- [S25] Advanced Machine Learning, Innopolis University
- Teaching Assistant: Gcinizwe Dlamini
```
<hr>


```
Lab Plan
1. Undercomplete, overcomplete, Sparse and Denoising Autoencoder
2. Task 1
3. Variational Autoencoders
4. Task 2
```

<hr>

## 1. Undercomplete, overcomplete, Sparse and Denoising Autoencoder

PCA vs. Undercomplete autoencoders
* Autoencoders are much flexible than PCA.
* Neural Network activation functions introduce “non-linearities” in encoding, but PCA only linear transformation.

### 1.1 Undercomplete Example

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchsummary import summary
from torch.utils.data import TensorDataset, DataLoader

import torchvision
import torchvision.transforms as transforms


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

### 1.2 Defining Undercomplete Autoencoder

In [2]:
## Undercomplete
class autoencoder(nn.Module):
    def __init__(self, input_size, latent_dim):
      super(autoencoder, self).__init__()
      # Step 1 : Define the encoder
      self.encoder = nn.Sequential(
          nn.Linear(input_size, input_size//2),
          nn.ReLU(),
          nn.Linear(input_size//2, latent_dim)
      )

      # Step 2 : Define the decoder
      self.decoder = nn.Sequential(
          nn.Linear(latent_dim, input_size//2),
          nn.ReLU(),
          nn.Linear(input_size//2, input_size)
      )

      # Step 3 : Initialize the weights (optional)
      self.encoder.apply(self.__init_weights)
      self.decoder.apply(self.__init_weights)

    def forward(self, x):
      # Step 1: Pass the input through encoder to get latent representation
      z = self.encoder(x)
      # Step 2: Take latent representation and pass through decoder
      x = self.decoder(z)
      return x

    def encode(self,input):
      #Step 1: Pass the input through the encoder to get latent representation
      return self.encoder(input)

    def __init_weights(self,m):
      #Init the weights (optional)
      if type(m) == nn.Linear:
          torch.nn.init.xavier_uniform_(m.weight)
          m.bias.data.fill_(0.01)

### 1.3 Define training parameters

```
Step 1: Set training parameters (batch size, learning rate, optimizer, number of epochs, loss function)
Step 2: Create dataset (Randomly generated)
Step 3: Create data loader
Step 4: Define the training loop
```

In [3]:
batchSize = 100
learning_rate = 0.01
num_epochs = 3
sample = torch.randn((batchSize,1,64))
AE = autoencoder(64,5).to(device)
print(AE)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(AE.parameters(),lr=learning_rate)

#Create a random dataset
data_loader = DataLoader(TensorDataset(torch.randn((1000,1,64))),batch_size=32,shuffle=True)

autoencoder(
  (encoder): Sequential(
    (0): Linear(in_features=64, out_features=32, bias=True)
    (1): ReLU()
    (2): Linear(in_features=32, out_features=5, bias=True)
  )
  (decoder): Sequential(
    (0): Linear(in_features=5, out_features=32, bias=True)
    (1): ReLU()
    (2): Linear(in_features=32, out_features=64, bias=True)
  )
)


### 1.4 AE Training Loop

In [4]:
for epoch in range(num_epochs):
    epoch_loss = 0.0
    for X in data_loader:
        X = X[0].to(device)

        optimizer.zero_grad()
        # forward
        output = AE(X)
        loss = criterion(output, X)

        # backward
        loss.backward()
        optimizer.step()

        epoch_loss += loss.item()

    # log
    print('epoch [{}/{}], loss:{:.4f}'.format(epoch + 1, num_epochs, loss.item()))

epoch [1/3], loss:1.0779
epoch [2/3], loss:0.8597
epoch [3/3], loss:0.8648


## 2. Task 1 (Regularized Autoencoder)

Regularized autoencoders use a loss function that encourages the model to have other properties besides the ability to copy its input to its output.

* **Sparse Autoencoders** : It impose a constraint in its loss by adding a regularization term in the loss function.
$$L(x,\hat{x}) + λ \sum_{i}||h_i||$$
where $h_i$ is the activations of the hidden layers

  **Regularization Form** : It can be L1 regularization or Any other kinds of penalties are possible


* **Denoising Autoencoder** : a special autoencoder that is robust to noise. By adding stochastic noise, we force Autoencoder to learn more robust features


<font color='red'><strong>TASK 1.1 :</strong> Implement and train a Sparse Autoencoder for 1D data of your choice (data points should be more than 5000)</font>


<font color='red'><strong>TASK 1.2 :</strong> Implement and train a Denoising Autoencoder for CIFAR 10 dataset. Choose one class from the 10 classes</font>


In [None]:
class SparseAutoencoder(nn.Module):
  pass


class DenoisingAutoencoder(nn.Module):
  pass


## 3. Variational Autoencoders

![caption](https://learnopencv.com/wp-content/uploads/2020/11/vae-diagram-1-1024x563.jpg)


![](https://learnopencv.com/wp-content/uploads/2020/11/reparam-vae-2048x959.jpg)

Backpropagation works fine!!
However, we simply cannot do this for a random sampling process.

$$z \sim q(z|x^{i})$$

* It is basically divert the non-differentiable operation out of the network
* So that, even though we still involve a thing that is non-differentiable, at least it is out of the network
* Hence the network could still be trained.

To do so, we sample $\epsilon \sim N(0,I)$ and calculate:

$$z = \mu_{\phi}(x^{(i)}) + Σ^{1/2}_{\phi}(x^{(i)})\epsilon$$

**Key theories behind :** <br>
1. Change of variable
2. Loacation-Scale Transformation
3. [Law of The Unconscious Statistician](https://en.wikipedia.org/wiki/Law_of_the_unconscious_statistician)
4. [Evidence lower bound (ELBO)](https://en.wikipedia.org/wiki/Evidence_lower_bound)

Then, $z$ will be a sample from $q(z|x^{(i)})$ as its a linear transformation of $\epsilon$ with mean $\mu_{\phi}(x^{(i)})$ and covariance $Σ^{1/2}_{\phi}(x^{(i)})$.

The sampling operation now occurs only for $\epsilon$, which we don’t need to backpropagate through.


**NOTE:** make a simplifying assumption that our covariance matrix only has nonzero values on the diagonal, allowing us to describe this information in a simple vector.

**NOTE** In order to deal with the fact that the network may learn negative values for $σ$ , we'll typically have the network learn $log$ $\sigma$ and exponentiate this value to get the latent distribution's variance

In [5]:
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms
from torchvision.utils import save_image

### 3.1 Get data (MNIST) and set Hyper-parameters

In [6]:
# Hyper-parameters
image_size = 784
h_dim = 400
z_dim = 20
num_epochs = 15
batch_size = 128
learning_rate = 1e-3

# MNIST dataset
dataset = torchvision.datasets.MNIST(root='../../data',
                                     train=True,
                                     transform=transforms.ToTensor(),
                                     download=True)

# Data loader
data_loader = torch.utils.data.DataLoader(dataset=dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../../data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 119973772.85it/s]


Extracting ../../data/MNIST/raw/train-images-idx3-ubyte.gz to ../../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../../data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 58832294.23it/s]


Extracting ../../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../../data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 32175167.59it/s]


Extracting ../../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../../data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 10058357.32it/s]


Extracting ../../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../../data/MNIST/raw



### 3.2 Defining Variational Autoencoder

In [7]:
# VAE model
class VAE(nn.Module):
  def __init__(self, image_size=784, h_dim=400, z_dim=20):
    super(VAE, self).__init__()
    # Encoder part
    self.fc1 = nn.Linear(image_size, h_dim)

    self.fc2 = nn.Linear(h_dim, z_dim)
    self.fc3 = nn.Linear(h_dim, z_dim)

    # Decoder part
    self.fc4 = nn.Linear(z_dim, h_dim)
    self.fc5 = nn.Linear(h_dim, image_size)

  def encode(self, x):
    h = F.relu(self.fc1(x))
    return self.fc2(h), self.fc3(h)

  def reparameterize(self, mu, log_var):
    std = torch.exp(log_var/2)
    eps = torch.randn_like(std)
    return mu + eps * std

  def decode(self, z):
    h = F.relu(self.fc4(z))
    return F.sigmoid(self.fc5(h))

  def forward(self, x):
    mu, log_var = self.encode(x)
    z = self.reparameterize(mu, log_var)
    x_reconst = self.decode(z)
    return x_reconst, mu, log_var

### 3.3 Training Variational Autoencoder

In [8]:
model = VAE(image_size, h_dim, z_dim).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

In [None]:
# Start training
mse_loss = nn.MSELoss()
for epoch in range(num_epochs):
    for i, (x, _) in enumerate(data_loader):
        # Forward pass
        x = x.to(device).view(-1, image_size)
        x_reconst, mu, log_var = model(x)

        # Compute reconstruction loss and kl divergence
        reconst_loss = mse_loss(x_reconst, x)
        kl_div = - 0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

        # Backprop and optimize
        loss = reconst_loss + kl_div
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 10 == 0:
            print ("Epoch[{}/{}], Step [{}/{}], Reconst Loss: {:.4f}"
                   .format(epoch+1, num_epochs, i+1, len(data_loader), reconst_loss.item()))

    with torch.no_grad():
        # Save the sampled images
        z = torch.randn(batch_size, z_dim).to(device)
        out = model.decode(z).view(-1, 1, 28, 28)
        save_image(out,'./sampled-{}.png'.format(epoch+1))

        # Save the reconstructed images
        out, _, _ = model(x)
        x_concat = torch.cat([x.view(-1, 1, 28, 28), out.view(-1, 1, 28, 28)], dim=3)
        save_image(x_concat, './reconst-{}.png'.format(epoch+1))

## 4. Task 2 (variational autoencoder)

<font color='red'><strong>TASK 2.1 :</strong> Implement and train a variational autoencoder for cifar 10 data using the achitecture below for encoder and decoder as baseline (you can only improve the achitecture by extending the achitecture on top of baseline) </font>
<br>

<font color='red'><strong>TASK 2.2 :</strong> Implement a function that will generate images using the implemented vae</font>  


**Encoder Achitecture (baseline)**
- 3 convolutional layers whereby each layer is followed by batch normalization and relu activation function
  - Layer 1 : applies 32 filters
  - Layer 2 : applies 64 filters
  - Layer 3 : applies 128 filters

**Latent space**
- without activation

**Decoder Achitecture (baseline)**
- 3 deconvolutional layers whereby each layer is followed by batch normalization and relu activation function
- Mirror of the encoder network

**NB**: For both convolution and deconvolution `kernel_size=4, stride=2, padding=1`

In [None]:
## Template code
## Write your code here

class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        ## Your code here


    def reparametrize(self, mean, logvar):
        ## Your code here
        pass


    def forward(self, x):
        ## Your code here
        pass


vae_model = VAE()
vae_model

## Resources

* [Auto-Encoding Variational Bayes](https://arxiv.org/pdf/1312.6114.pdf)
* [Variational inference: A review for statisticians](https://arxiv.org/pdf/1601.00670.pdf)
* [Tutorial on variational autoencoders](https://arxiv.org/pdf/1606.05908.pdf)
* [Stochastic Backpropagation and Approximate Inference in Deep Generative Models](https://arxiv.org/pdf/1401.4082.pdf)