## Midterm Lab Exam

```
- Generative Artificial Intelligence (Fall semester 2023)
- Professor: Muhammad Fahim
- Teaching Assistant: Gcinizwe Dlamini
```
<hr>

## Tasks
```
1. Variational AutoEncoder
2. Conditional GAN (bonus)
```
All implementations are in Python and use the PyTorch framework for the neural networks.
<hr>

## 0. Dataset

The CIFAR-10 dataset is used for both tasks.

In [20]:
import os
import numpy as np
import torch
from torch import nn
import torchvision
import torchvision.transforms as transforms
from torchvision.utils import save_image
import torch.nn.functional as F

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'available device : {device}')

# ToTensor scales pixels to [0, 1]; Normalize with mean = std = 0.5 maps them to [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 32

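# NB: train=False loads the 10,000-image CIFAR-10 test split;
# set train=True to use the 50,000-image training split
trainset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                        download=True, transform=transform)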
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

available device : cuda
Files already downloaded and verified


## 2. Task 1 (variational autoencoder)

* Implement and train a variational autoencoder for CIFAR-10 data, using the architecture below for the encoder and decoder as a baseline (you may only improve the architecture by extending it on top of the baseline)
* Implement a function that generates images using the implemented VAE


### 2.1  VAE definition (20 points)

**Encoder Architecture (baseline)**
- 3 convolutional layers, each followed by batch normalization and a ReLU activation function
  - Layer 1 : applies 32 filters
  - Layer 2 : applies 64 filters
  - Layer 3 : applies 128 filters

**Latent space**
- without activation
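Because the latent heads have no activation, the log-variance head can output any real number. Sampling then uses the reparameterization trick, z = μ + σ ⊙ ε with σ = exp(½ · log σ²) and ε ~ N(0, I), which keeps z differentiable with respect to both heads. A small standalone sketch with arbitrary tensor sizes:

```python
import torch

torch.manual_seed(0)

mean = torch.zeros(4, 8, requires_grad=True)    # stands in for the mean head output
logvar = torch.zeros(4, 8, requires_grad=True)  # stands in for the (unconstrained) log-variance head

std = torch.exp(0.5 * logvar)  # sigma = exp(logvar / 2) is always positive
eps = torch.randn_like(std)    # the randomness is sampled outside the learnable path
z = mean + eps * std           # differentiable w.r.t. mean and logvar

z.sum().backward()
print(mean.grad.unique())  # tensor([1.])
```

With `logvar = 0` the gradient reaching `logvar` is `0.5 * eps`, which shows the noise scale is learned through σ rather than through the sampling step itself.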

**Decoder Architecture (baseline)**
- 3 deconvolutional (transposed convolution) layers, each followed by batch normalization and a ReLU activation function
- Mirror of the encoder network

**NB**: For both the convolution and deconvolution layers, use `kernel_size=4, stride=2, padding=1`.
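With `kernel_size=4, stride=2, padding=1`, each convolution halves the spatial size and each transposed convolution doubles it. The standard PyTorch output-size formulas (for `dilation=1` and `output_padding=0`) make this concrete and explain why the flattened latent has 128 × 4 × 4 = 2048 entries; `conv_out` and `deconv_out` are hypothetical helper names:

```python
def conv_out(size, kernel_size=4, stride=2, padding=1):
    # nn.Conv2d output size (dilation=1): floor((size + 2p - k) / s) + 1
    return (size + 2 * padding - kernel_size) // stride + 1

def deconv_out(size, kernel_size=4, stride=2, padding=1):
    # nn.ConvTranspose2d output size (dilation=1, output_padding=0): (size - 1) * s - 2p + k
    return (size - 1) * stride - 2 * padding + kernel_size

sizes = [32]
for _ in range(3):           # three encoder layers
    sizes.append(conv_out(sizes[-1]))
print(sizes)                 # [32, 16, 8, 4]
print(128 * sizes[-1] ** 2)  # 2048, the flattened latent size (z_size)

up = [sizes[-1]]
for _ in range(3):           # three decoder layers
    up.append(deconv_out(up[-1]))
print(up)                    # [4, 8, 16, 32]
```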

In [14]:
class VAE(nn.Module):
    def __init__(self, channel_num=3, kernel_num=128, z_size=2048, image_size=32):
        super().__init__()

        # Encoder architecture:
        # 3 convolutional layers, each followed by batch normalization and a ReLU activation
        #     Layer 1 : applies 32 filters
        #     Layer 2 : applies 64 filters
        #     Layer 3 : applies 128 filters
        # set up the parameters
        n_filters_list = [32, 64, 128]
        kernel_size = 4
        stride = 2
        padding = 1
        self.z_size = z_size   # length of the mean and log-variance vectors (128 * 4 * 4 = 2048 for 32x32 inputs)

        def build_conv_layer(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding
        ):
            return nn.Conv2d(
                in_channels=in_channels,
                out_channels=out_channels,
                kernel_size=kernel_size,
                stride=stride,
                padding=padding
            )

        def build_deconv_layer(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding
        ):
            return nn.ConvTranspose2d(
                in_channels=in_channels,
                out_channels=out_channels,
                kernel_size=kernel_size,
                stride=stride,
                padding=padding
            )

        # build an encoder Sequential object
        self.encoder = nn.Sequential(
            build_conv_layer(in_channels=channel_num, out_channels=n_filters_list[0]),
            nn.BatchNorm2d(n_filters_list[0]),
            nn.ReLU(),
            build_conv_layer(in_channels=n_filters_list[0], out_channels=n_filters_list[1]),
            nn.BatchNorm2d(n_filters_list[1]),
            nn.ReLU(),
            build_conv_layer(in_channels=n_filters_list[1], out_channels=n_filters_list[2]),
            nn.BatchNorm2d(n_filters_list[2]),
            nn.ReLU()
        )

        # latent representation layers (no activation) that produce the mean and
        # log-variance vectors; the attribute is named `variance` but holds log(sigma^2)
        self.mean = nn.Linear(z_size, z_size)
        self.variance = nn.Linear(z_size, z_size)

        """
        Decoder architecture:
        3 deconvolutional layers whereby each layer is followed by batch normalization and relu activation function (a mirror of the encoder network)
        """
        self.decoder = nn.Sequential(
            build_deconv_layer(in_channels=n_filters_list[2], out_channels=n_filters_list[1]),
            nn.BatchNorm2d(n_filters_list[1]),
            nn.ReLU(),
            build_deconv_layer(in_channels=n_filters_list[1], out_channels=n_filters_list[0]),
            nn.BatchNorm2d(n_filters_list[0]),
            nn.ReLU(),
            build_deconv_layer(in_channels=n_filters_list[0], out_channels=channel_num),
            nn.BatchNorm2d(channel_num),
            nn.ReLU()
        )


    def reparametrize(self, mean, logvar):
        # reparameterization trick: z = mean + eps * std, with eps ~ N(0, I)
        std = torch.exp(logvar / 2)
        epsilon = torch.randn_like(std)
        return mean + epsilon * std


    def forward(self, x):
        latent = self.encoder(x)                     # (batch, 128, 4, 4)
        latent_flatten = latent.view(x.size(0), -1)  # flatten to (batch, 2048)

        mean, logvar = self.mean(latent_flatten), self.variance(latent_flatten)

        r_flatten = self.reparametrize(mean, logvar)  # reparameterization
        r = torch.reshape(r_flatten, latent.shape)    # unflatten to (batch, 128, 4, 4)
        x_reconstructed = self.decoder(r)

        return (mean, logvar), x_reconstructed




vae_model = VAE()
vae_model

VAE(
  (encoder): Sequential(
    (0): Conv2d(3, 32, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU()
    (6): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU()
  )
  (mean): Linear(in_features=2048, out_features=2048, bias=True)
  (variance): Linear(in_features=2048, out_features=2048, bias=True)
  (decoder): Sequential(
    (0): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): ConvTranspose2d(64, 32, kernel_size=(4, 4), stride=(2, 2), padding=

In [37]:
for data, _ in trainloader:
    latent_shape = vae_model.encoder(data).shape
    break
latent_shape

torch.Size([32, 128, 4, 4])

### 2.2 Training parameters definition (10 points)

- Define the optimizer: Adam (default `weight_decay=1e-5, learning_rate=3e-04`)
- Define the criterion: KL-divergence loss and reconstruction loss
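For a diagonal Gaussian posterior N(μ, σ²) and a standard normal prior, the KL term has the closed form $D_{KL} = -\tfrac{1}{2}\sum_j \left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$, which is what a `kl_divergence_loss(mean, logvar)` function computes. A quick numerical sanity check of that formula against `torch.distributions.kl_divergence`, on random inputs (illustrative only):

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)
mean = torch.randn(16, 10)
logvar = torch.randn(16, 10)

# closed form, summed over the batch and latent dimensions
kl_closed = -0.5 * torch.sum(1 + logvar - mean ** 2 - logvar.exp())

# reference: element-wise KL(N(mean, std) || N(0, 1)) computed by PyTorch, then summed
posterior = Normal(mean, torch.exp(0.5 * logvar))
prior = Normal(torch.zeros_like(mean), torch.ones_like(mean))
kl_ref = kl_divergence(posterior, prior).sum()

print(torch.allclose(kl_closed, kl_ref, rtol=1e-4, atol=1e-4))  # True
```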

In [15]:
optimizer = torch.optim.Adam(vae_model.parameters(), weight_decay=1e-5, lr=3e-04)

def reconstruction_loss(x_reconstructed, x):
    # pixel-wise squared error, summed over the batch and all pixels
    return F.mse_loss(x_reconstructed, x, reduction='sum')

def kl_divergence_loss(mean, logvar):
    # closed-form KL(N(mean, exp(logvar)) || N(0, I)), summed over the batch
    return -0.5 * torch.sum(1 + logvar - mean ** 2 - logvar.exp())

### 2.3 VAE Training and Evaluation (15 points)

- Define the training procedure and train the VAE model
- Evaluate the model every n batches or every n epochs
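The training loop below sums the losses over the training data but does not evaluate the model. A minimal sketch of an evaluation helper that could be called every n epochs, assuming a held-out `DataLoader` (for example, one over whichever CIFAR-10 split is not used for training) and a model that returns `((mean, logvar), x_reconstructed)` like the `VAE` above. `evaluate_vae` and the tiny stand-in model are hypothetical, used here only so the sketch runs on its own:

```python
import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

@torch.no_grad()
def evaluate_vae(model, loader, device='cpu'):
    """Average per-image reconstruction and KL losses over a held-out loader."""
    was_training = model.training
    model.eval()                   # BatchNorm uses running statistics in eval mode
    rec_sum, kl_sum, n = 0.0, 0.0, 0
    for data, _ in loader:
        data = data.to(device)
        (mean, logvar), x_rec = model(data)
        rec_sum += F.mse_loss(x_rec, data, reduction='sum').item()
        kl_sum += (-0.5 * torch.sum(1 + logvar - mean ** 2 - logvar.exp())).item()
        n += data.size(0)
    model.train(was_training)      # restore the previous mode
    return rec_sum / n, kl_sum / n

# stand-in model with the same output signature as the VAE (illustration only)
class TinyVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(12, 4)
        self.dec = nn.Linear(2, 12)

    def forward(self, x):
        h = self.enc(x.flatten(1))
        mean, logvar = h[:, :2], h[:, 2:]
        z = mean + torch.randn_like(mean) * torch.exp(0.5 * logvar)
        return (mean, logvar), self.dec(z).view_as(x)

loader = DataLoader(TensorDataset(torch.randn(10, 3, 2, 2), torch.zeros(10)), batch_size=4)
rec, kl = evaluate_vae(TinyVAE(), loader)
print(f'val reconstruction/image: {rec:.3f}, val KL/image: {kl:.3f}')
```

Averaging per image (rather than summing over the split) keeps the numbers comparable between splits of different sizes. In the epoch loop, a check such as `if (epoch + 1) % n == 0:` around a call like this would give "evaluation every n epochs".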

In [19]:
from tqdm import tqdm

epochs = 10
vae_model.train()

for epoch in tqdm(range(epochs)):
    reconstruction_loss_sum, kl_divergence_loss_sum = 0, 0

    for data, _ in trainloader:
        # forward
        optimizer.zero_grad()
        (mean, logvar), x_reconstructed = vae_model(data)
        # compute the losses
        reconstruction_loss_batch = reconstruction_loss(x_reconstructed, data)
        kl_divergence_loss_batch = kl_divergence_loss(mean, logvar)
        loss_batch = reconstruction_loss_batch + kl_divergence_loss_batch
        # backprop
        loss_batch.backward()
        optimizer.step()
        # sum the losses
        reconstruction_loss_sum += reconstruction_loss_batch.item()
        kl_divergence_loss_sum += kl_divergence_loss_batch.item()

    # print the losses summed over the epoch
    print(f'Epoch {epoch}:')
    print(f'Reconstruction loss: {reconstruction_loss_sum}')
    print(f'KL divergence loss: {kl_divergence_loss_sum}')


 10%|█         | 1/10 [00:58<08:48, 58.74s/it]

Epoch 0:
Reconstruction loss: 6331143.8525390625
KL divergence loss: 469905.4650878906


 20%|██        | 2/10 [01:54<07:37, 57.20s/it]

Epoch 1:
Reconstruction loss: 6083635.541015625
KL divergence loss: 363860.82287597656


 30%|███       | 3/10 [02:41<06:06, 52.31s/it]

Epoch 2:
Reconstruction loss: 6011450.5732421875
KL divergence loss: 347919.7263183594


 40%|████      | 4/10 [03:29<05:04, 50.71s/it]

Epoch 3:
Reconstruction loss: 5968876.3818359375
KL divergence loss: 339364.62127685547


 50%|█████     | 5/10 [04:18<04:10, 50.07s/it]

Epoch 4:
Reconstruction loss: 5937293.4833984375
KL divergence loss: 331096.41525268555


 60%|██████    | 6/10 [05:07<03:18, 49.67s/it]

Epoch 5:
Reconstruction loss: 5906473.2529296875
KL divergence loss: 331338.4747314453


 70%|███████   | 7/10 [05:54<02:26, 48.69s/it]

Epoch 6:
Reconstruction loss: 5841626.037109375
KL divergence loss: 322981.98828125


 80%|████████  | 8/10 [06:41<01:36, 48.18s/it]

Epoch 7:
Reconstruction loss: 5686894.275390625
KL divergence loss: 320126.23767089844


 90%|█████████ | 9/10 [07:28<00:47, 47.76s/it]

Epoch 8:
Reconstruction loss: 5593394.751953125
KL divergence loss: 321101.8046875


100%|██████████| 10/10 [08:14<00:00, 49.43s/it]

Epoch 9:
Reconstruction loss: 5565116.6240234375
KL divergence loss: 322868.0987548828





### 2.4 Images generation (5 points)

- Implement a function that takes the number of images to generate, generates images using the previously implemented VAE, and saves them to a folder

In [41]:
def generate_images(num_images=5, model=vae_model):
    os.makedirs("./output", exist_ok=True)

    model.eval()
    device = next(model.parameters()).device  # sample on the same device as the model
    with torch.no_grad():
        # sample z ~ N(0, I) and unflatten it to the encoder output shape (128, 4, 4)
        z = torch.randn(num_images, model.z_size, device=device)
        z = z.view(num_images, 128, 4, 4)
        generated = model.decoder(z)
        # undo Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)): map [-1, 1] back to [0, 1]
        generated = (generated * 0.5 + 0.5).clamp(0, 1)
        for i in range(num_images):
            save_image(generated[i], f'./output/img{i}.png')

In [42]:
generate_images()

## Bonus task (50 points)

- Grading for the bonus task is binary (either you get it all correct or you get zero): one mistake equals zero
- The bonus points apply to the midterm lab exam only (bonus points cannot be transferred to other parts of the course)
- Implement a conditional GAN for CIFAR-10 data
- Train and evaluate the cGAN
- Log the training and validation performance metrics to `TensorBoard`
- Implement a function that takes a condition, generates images using the previously implemented conditional GAN, and visualizes the result
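A conditional GAN feeds the class label to both networks, so the generator learns p(image | class) and the discriminator judges (image, label) pairs. The sketch below is one common way to do this, not a reference solution: the generator concatenates a learned label embedding with the noise vector, and the discriminator receives the label as an extra 32×32 input channel. The class and function names, `Z_DIM = 100`, and the Adam settings (`lr=2e-4, betas=(0.5, 0.999)`, a frequent DCGAN-style choice) are all assumptions:

```python
import torch
from torch import nn

NUM_CLASSES = 10  # CIFAR-10 classes
Z_DIM = 100       # assumed noise dimension

class CondGenerator(nn.Module):
    """Hypothetical conditional generator: (noise, label) -> 3x32x32 image in [-1, 1]."""
    def __init__(self, z_dim=Z_DIM, num_classes=NUM_CLASSES):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, num_classes)
        self.fc = nn.Linear(z_dim + num_classes, 128 * 4 * 4)
        self.net = nn.Sequential(
            nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),  # 4 -> 8
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),   # 8 -> 16
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),                        # 16 -> 32
        )

    def forward(self, z, labels):
        h = torch.cat([z, self.label_emb(labels)], dim=1)  # condition by concatenation
        return self.net(self.fc(h).view(-1, 128, 4, 4))

class CondDiscriminator(nn.Module):
    """Hypothetical conditional discriminator: (image, label) -> real/fake logit."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, 32 * 32)  # label as an extra channel
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 4, 2, 1), nn.LeakyReLU(0.2),                         # 32 -> 16
            nn.Conv2d(32, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.LeakyReLU(0.2),    # 16 -> 8
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),  # 8 -> 4
            nn.Flatten(), nn.Linear(128 * 4 * 4, 1),
        )

    def forward(self, x, labels):
        label_map = self.label_emb(labels).view(-1, 1, 32, 32)
        return self.net(torch.cat([x, label_map], dim=1))

def train_step(G, D, opt_G, opt_D, real, labels, z_dim=Z_DIM):
    bce = nn.BCEWithLogitsLoss()
    n, device = real.size(0), real.device
    ones, zeros = torch.ones(n, 1, device=device), torch.zeros(n, 1, device=device)

    # discriminator: (real image, label) -> 1, (fake image, same label) -> 0
    fake = G(torch.randn(n, z_dim, device=device), labels)
    d_loss = bce(D(real, labels), ones) + bce(D(fake.detach(), labels), zeros)
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # generator: push D to classify its (fake image, label) pairs as real
    g_loss = bce(D(fake, labels), ones)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()

@torch.no_grad()
def generate_for_class(G, class_idx, n=8, z_dim=Z_DIM):
    """Generate n images for one CIFAR-10 class index, mapped to [0, 1] for display."""
    was_training = G.training
    G.eval()
    device = next(G.parameters()).device
    labels = torch.full((n,), class_idx, dtype=torch.long, device=device)
    imgs = G(torch.randn(n, z_dim, device=device), labels)
    G.train(was_training)
    return imgs * 0.5 + 0.5

# one illustrative step on random tensors standing in for a normalized CIFAR-10 batch
torch.manual_seed(0)
G, D = CondGenerator(), CondDiscriminator()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
real = torch.rand(4, 3, 32, 32) * 2 - 1
labels = torch.randint(0, NUM_CLASSES, (4,))
d_loss, g_loss = train_step(G, D, opt_G, opt_D, real, labels)
cats = generate_for_class(G, class_idx=3)  # index 3 is 'cat' in the classes tuple
```

The returned `[0, 1]` tensor can be visualized with `save_image(cats, 'cat.png', nrow=4)`. For the TensorBoard requirement, the per-step losses could be logged with `torch.utils.tensorboard.SummaryWriter` and `writer.add_scalar(tag, value, step)`; that part is not shown here.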

In [None]:
## Your code here