# Deep Learning for Computer Vision

---

**Goethe University Frankfurt am Main**

Winter Semester 2022/23

<br>

## *Assignment 9 (Generative Adversarial Networks)*

---

**Points:** 110<br>
**Due:** 15.2.2023, 10 am<br>
**Contact:** Matthias Fulde ([fulde@cs.uni-frankfurt.de](mailto:fulde@cs.uni-frankfurt.de))<br>

---

**Your Name:** Tilo-Lars Flasche

<br>

<br>

## Table of Contents

---

- [1 Networks](#1-Networks-(50-Points))
  - [1.1 Discriminator](#1.1-Discriminator-(20-Points))
  - [1.2 Generator](#1.2-Generator-(25-Points))
  - [1.3 Initialization](#1.3-Initialization-(5-Points))
- [2 Training](#2-Training-(40-Points))
  - [2.1 Discriminator Loss](#2.1-Discriminator-Loss-(10-Points))
  - [2.2 Generator Loss](#2.2-Generator-Loss-(5-Points))
  - [2.3 Gradients](#2.3-Gradients-(5-Points))
  - [2.4 Game](#2.4-Game-(20-Points))
- [3 Image Generation](#3-Image-Generation-(20-Points))
  - [3.1 Hyperparameter Tuning](#3.1-Hyperparameter-Tuning-(10-Points))
  - [3.2 Improvements](#3.2-Improvements-(10-Points))

<br>

<br>

## Setup

---

In this notebook we're working with PyTorch again. For training a generative adversarial network, we definitely need a GPU, so make sure that you're training on the correct device. You can use the statements below to check.

In [None]:
import torch
import torchvision
import torchvision.transforms as T

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print(device)

We import some utility functions to count the number of parameters in a model and to plot the training losses of the generator and discriminator networks. Furthermore, we import the classes for the generator and discriminator networks and for handling the training process.

In [None]:
from utils import count_params, show_loss

from networks import Discriminator, Generator
from training import Game

%load_ext autoreload
%autoreload 2

### TODO remove the following cells until the 'Dataset' cell

In [None]:
# Sanity check: pass a batch of latent vectors through the generator.
G = Generator(latent_dim=100, channels=1, depth=4)
z = torch.ones(16, 100, 1, 1)
print(z.shape)
print(G(z).shape)

In [None]:
D = Discriminator(channels=1, depth=4)
x = torch.ones(16, 1, 28, 28)
D(x).shape

In [None]:
D(G(z)).shape

<br>

## Dataset

---

We're going to use the MNIST dataset again for the image generation task in order to reduce the necessary model complexity and the runtime of the algorithm. However, if you want something more difficult, feel free to load another image dataset from Torchvision!

In [None]:
# We can use a larger batch size with this dataset.
batch_size = 128

# Cast to tensor and normalize.
transform = T.Compose([
    T.ToTensor(),
    T.Normalize([0.5], [0.5])
])

# We just merge the training and test sets to get more data.
dataset = torch.utils.data.ConcatDataset([
    torchvision.datasets.MNIST(
        root='./datasets',
        train=True,
        download=True,
        transform=transform
    ),
    torchvision.datasets.MNIST(
        root='./datasets',
        train=False,
        download=True,
        transform=transform
    )
])

# Make data loader for whole dataset.
dataloader = torch.utils.data.DataLoader(
    dataset=dataset,
    batch_size=batch_size,
    num_workers=2,
    shuffle=True
)

Note that the given batch size above is just a suggestion. No matter whether you use the MNIST dataset or something else, you can adjust this hyperparameter to see if you get better results.
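The `T.Normalize([0.5], [0.5])` transform in the cell above rescales pixel values from $[0, 1]$ to $[-1, 1]$. A minimal sketch of this mapping and its inverse is shown below; the helper names `normalize` and `denormalize` are hypothetical and only serve as illustration. The inverse may be handy when displaying images.

```python
import torch

def normalize(x, mean=0.5, std=0.5):
    # Same arithmetic as T.Normalize([0.5], [0.5]): (x - mean) / std.
    return (x - mean) / std

def denormalize(y, mean=0.5, std=0.5):
    # Inverse mapping, bringing values from [-1, 1] back to [0, 1].
    return y * std + mean

x = torch.tensor([0.0, 0.5, 1.0])
y = normalize(x)          # 0 -> -1, 0.5 -> 0, 1 -> 1
x_back = denormalize(y)   # recovers the original values
print(y, x_back)
```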

<br>

## Exercises

---

Our goal in this assignment is to create a more or less vanilla convolutional GAN to generate images similar to those from the MNIST dataset (or the dataset of your choice). In order to do this, we have to implement both the generator and discriminator networks, the loss functions, and the training algorithm.

<br>

### 1 Networks (50 Points)

---

The first step is to create the two networks and implement the weight initialization.

<br>

### 1.1 Discriminator (20 Points)

---

The task is to complete the definition of the `Discriminator` class in the `networks.py` file.

The discriminator network is basically just a binary classifier. You're generally free to come up with a convolutional architecture on your own, with some restrictions. Your network should be fully convolutional without max pooling or linear layers, except for the output layer, where you're allowed to use a linear layer for the final projection.

Furthermore, your discriminator network should incrementally decrease the size of the feature maps. For the MNIST dataset, intermediate sizes for the feature maps of $28, 14, 7$ should be sufficient. Downsample the feature maps by using appropriate values for kernel size, padding, and stride.
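To choose kernel size, padding, and stride, it can help to check the feature map sizes by hand. The sketch below uses the standard output size formula for a convolution without dilation, $\lfloor (D_\text{in} + 2 \times \text{padding} - \text{kernel size}) / \text{stride} \rfloor + 1$. The helper name `conv_out_size` and the particular values (kernel size 4, stride 2, padding 1) are assumptions for illustration, not a required configuration.

```python
def conv_out_size(d_in, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution without dilation."""
    return (d_in + 2 * padding - kernel_size) // stride + 1

# One possible choice that halves the feature maps twice: 28 -> 14 -> 7.
size = 28
for _ in range(2):
    size = conv_out_size(size, kernel_size=4, stride=2, padding=1)
    print(size)
```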

You should use a sigmoid function for the final activation, mapping predictions to the $(0, 1)$ range. For the hidden layer activations it is recommended to use leaky ReLU. Other non-saturating activation functions that do not produce sparse gradients can also work, though.

Normalize the activations of each layer, except for the first and last layer. If you use a larger batch size, such as the default value given above, using regular spatial batch norm should be fine. If you monitor the generated images during training and find that they become correlated, try using virtual batch norm or instance normalization instead.

There are no restrictions regarding the size of your model, but keep in mind that training a larger model takes more time. So it's recommended to start small and increase the capacity only if necessary.
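As a rough orientation, the constraints above could be met by a small network like the one sketched below. This is only one possible layout, not the expected solution: the class name `SketchDiscriminator`, the `width` parameter, the leaky ReLU slope of 0.2, and the layer sizes are all assumptions, and the constructor differs from the `Discriminator` class in `networks.py`.

```python
import torch
from torch import nn

class SketchDiscriminator(nn.Module):
    """Hypothetical fully convolutional discriminator for 28x28 images."""

    def __init__(self, channels=1, width=64):
        super().__init__()
        self.net = nn.Sequential(
            # 28 -> 14, first layer without normalization.
            nn.Conv2d(channels, width, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            # 14 -> 7, no bias because batch norm follows.
            nn.Conv2d(width, 2 * width, kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(2 * width),
            nn.LeakyReLU(0.2),
            # 7 -> 1, last layer without normalization.
            nn.Conv2d(2 * width, 1, kernel_size=7),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Flatten (N, 1, 1, 1) to (N, 1) so each image gets one probability.
        return self.net(x).flatten(1)

D_sketch = SketchDiscriminator()
pred = D_sketch(torch.randn(4, 1, 28, 28))
print(pred.shape)
```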

<br>

### 1.2 Generator (25 Points)

---

The task is to complete the definition of the `Generator` class in the `networks.py` file.

The generator network should map latent vectors of some given dimension to image tensors with the same shape and value range as the images from the dataset. You're again free to come up with a convolutional architecture on your own, with some restrictions. As with the discriminator, your network should be fully convolutional except for the very first layer, where you're allowed to use a linear layer for the initial projection of the latent vectors.

Furthermore, your generator should incrementally increase the size of the feature maps. For the MNIST dataset, intermediate sizes for the feature maps of $7, 14, 28$ should be sufficient. Use either transposed convolution or the pixel shuffle operator for upsampling the feature maps.

For transposed convolution you can compute the output dimensions as:

<br>

$$
    D_\text{out} = (D_\text{in} - 1) \times \text{stride} - 2 \times \text{padding} + \text{kernel_size}
$$

<br>
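The formula can be checked quickly in plain Python. The helper name `conv_transpose_out_size` and the chosen kernel sizes, strides, and paddings are assumptions for illustration; other combinations reach the same sizes.

```python
def conv_transpose_out_size(d_in, kernel_size, stride=1, padding=0):
    """Output size of a transposed convolution, following the formula above."""
    return (d_in - 1) * stride - 2 * padding + kernel_size

# Project a 1x1 latent "image" to 7x7, then double the size twice.
s7 = conv_transpose_out_size(1, kernel_size=7)                        # 1 -> 7
s14 = conv_transpose_out_size(s7, kernel_size=4, stride=2, padding=1)   # 7 -> 14
s28 = conv_transpose_out_size(s14, kernel_size=4, stride=2, padding=1)  # 14 -> 28
print(s7, s14, s28)
```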

You should use a tanh function for the final activation of the network, mapping pixel values to the $(-1, 1)$ range. For the hidden layer activations it is again recommended to use the leaky ReLU function.

As with the discriminator network, you should normalize the activations of each layer, except for the first and last layer of the network. Regular batch norm should be fine. If you encounter issues, use virtual batch norm or instance normalization as already noted above.

Again, there is no restriction regarding the number of parameters in your model, but it is recommended to start small and increase the size only if necessary. The network sizes do not have to match, but the differences in size shouldn't become too large.
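A small generator that follows the constraints above might look like the sketch below. It is one possible layout under stated assumptions, not the expected solution: the class name `SketchGenerator`, the `width` parameter, and the layer sizes are hypothetical, and latent vectors are assumed to have shape `(N, latent_dim, 1, 1)`.

```python
import torch
from torch import nn

class SketchGenerator(nn.Module):
    """Hypothetical generator using transposed convolutions for upsampling."""

    def __init__(self, latent_dim=100, channels=1, width=64):
        super().__init__()
        self.net = nn.Sequential(
            # 1 -> 7, first layer without normalization.
            nn.ConvTranspose2d(latent_dim, 2 * width, kernel_size=7),
            nn.LeakyReLU(0.2),
            # 7 -> 14, no bias because batch norm follows.
            nn.ConvTranspose2d(2 * width, width, kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(width),
            nn.LeakyReLU(0.2),
            # 14 -> 28, last layer without normalization.
            nn.ConvTranspose2d(width, channels, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

G_sketch = SketchGenerator()
fake = G_sketch(torch.randn(4, 100, 1, 1))
print(fake.shape)
```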

<br>

### 1.3 Initialization (5 Points)

---

The task is to complete the definition of the `init_params` function in the `networks.py` file.

We want to initialize the parameters of the convolution, transposed convolution, and normalization layers in a particular way. The function is a callback that is recursively called with each submodule of your generator and discriminator networks. You should check whether the argument is a convolutional or normalization layer and if so, initialize the parameters as follows:

- `Conv2d` or `ConvTranspose2d`: Initialize the weights with values drawn from $\mathcal{N}(0, 0.02)$.
- `Norm2d` (e.g. `BatchNorm2d` or `InstanceNorm2d`): Initialize the weights with values drawn from $\mathcal{N}(1, 0.02)$ and the biases with constant $0$.

Note that you shouldn't use biases for the convolutional layers when using normalization. Remember that you've shown in a previous assignment that the biases are redundant.
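A minimal sketch of such a callback is given below. It assumes that $0.02$ denotes the standard deviation and that the normalization layers are `BatchNorm2d` or `InstanceNorm2d`; both are assumptions, so check them against the lecture. The checks for `None` are there because normalization layers created without affine parameters have no weights or biases.

```python
import torch
from torch import nn

def init_params(module):
    """Callback for nn.Module.apply, called once per submodule."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        # Assumption: 0.02 is the standard deviation.
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
    elif isinstance(module, (nn.BatchNorm2d, nn.InstanceNorm2d)):
        if module.weight is not None:
            nn.init.normal_(module.weight, mean=1.0, std=0.02)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Usage on a tiny stand-in network; apply() visits all submodules recursively.
net = nn.Sequential(nn.Conv2d(1, 8, 3, bias=False), nn.BatchNorm2d(8))
net.apply(init_params)
```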

<br>

### 2 Training (40 Points)

---

The next step is to create the training routine for the GAN game.

<br>

### 2.1 Discriminator Loss (10 Points)

---

The task is to complete the definition of the `discriminator_loss` function in the `training.py` file.

Implement the discriminator loss as introduced in the lecture:

<br>

$$
    J^{(D)} =
    -\frac{1}{2}\mathbb{E}_{\mathbf{x} \sim p_\text{data}(\mathbf{x})} \ln D(\mathbf{x})
    -\frac{1}{2}\mathbb{E}_{\mathbf{z} \sim p(\mathbf{z})} \ln\left(1 - D(G(\mathbf{z}))\right)
$$

<br>

The function takes as input the predictions for a minibatch of samples from the training set and a minibatch of samples created by the generator network and returns the sum of the means of the respective losses.
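The formula could be translated roughly as in the sketch below. The parameter names and the small `eps` added inside the logarithm for numerical safety are assumptions and not part of the formula; the actual signature in `training.py` may differ.

```python
import torch

def discriminator_loss(d_real, d_fake, eps=1e-8):
    """Sketch of J^(D) given predictions D(x) and D(G(z)) in (0, 1)."""
    # Assumption: eps guards against log(0); it is not part of the formula.
    real_term = -0.5 * torch.log(d_real + eps).mean()
    fake_term = -0.5 * torch.log(1.0 - d_fake + eps).mean()
    return real_term + fake_term

# An undecided discriminator that predicts 0.5 everywhere has loss ln(2).
d_real = torch.full((4, 1), 0.5)
d_fake = torch.full((4, 1), 0.5)
loss = discriminator_loss(d_real, d_fake)
print(loss.item())
```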

<br>

### 2.2 Generator Loss (5 Points)

---

The task is to complete the definition of the `generator_loss` function in the `training.py` file.

Implement the *non-saturating* loss as introduced in the lecture:

<br>

$$
    J^{(G)} = -\frac{1}{2}\mathbb{E}_{\mathbf{z} \sim p(\mathbf{z})}\ln D(G(\mathbf{z}))
$$

<br>

Recall that this is different from the generator loss in the zero-sum game. We use this version to have a stronger gradient signal when the generated images are bad, so that the generator can better learn from its errors.
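The difference in gradient strength can be made concrete with a small experiment. The sketch below implements the non-saturating loss and compares its gradient with that of the zero-sum loss $\frac{1}{2}\ln(1 - D(G(\mathbf{z})))$ for a fake image that the discriminator clearly rejects. The parameter names, the `eps` term, and the logit value of $-5$ are assumptions chosen for illustration.

```python
import torch

def generator_loss(d_fake, eps=1e-8):
    """Sketch of the non-saturating J^(G) given predictions D(G(z))."""
    # Assumption: eps guards against log(0); it is not part of the formula.
    return -0.5 * torch.log(d_fake + eps).mean()

# Discriminator logit for a clearly rejected fake, so D(G(z)) is close to 0.
logit = torch.tensor([-5.0], requires_grad=True)
d_fake = torch.sigmoid(logit)

# Gradient of the non-saturating loss with respect to the logit.
(grad_ns,) = torch.autograd.grad(generator_loss(d_fake), logit, retain_graph=True)

# Gradient of the zero-sum (saturating) generator loss for comparison.
saturating = 0.5 * torch.log(1.0 - d_fake).mean()
(grad_sat,) = torch.autograd.grad(saturating, logit)

print(grad_ns.abs().item(), grad_sat.abs().item())
```

In this setting the gradient of the non-saturating loss is larger in magnitude by roughly two orders of magnitude.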

<br>

### 2.3 Gradients (5 Points)

---

The task is to complete the definition of the `requires_grad` function in the `training.py` file.

The generator and discriminator networks are trained jointly, in alternating runs. When training the discriminator, we also need to employ the generator, and vice versa. In order to avoid unnecessary computations, we will set the `requires_grad` attribute of parameters of the network that is currently not optimized to `False`.

The function should iterate over the parameters of the given model and set the attributes accordingly.
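A minimal sketch of such a helper follows. The signature with a boolean `flag` argument is an assumption; use the one given in `training.py`.

```python
from torch import nn

def requires_grad(model, flag=True):
    """Enable or disable gradient tracking for all parameters of a model."""
    for param in model.parameters():
        param.requires_grad = flag

# Usage on a stand-in module: freeze it, e.g. while the other network trains.
model = nn.Linear(4, 2)
requires_grad(model, False)
print([p.requires_grad for p in model.parameters()])
```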

<br>

### 2.4 Game (20 Points)

---

The task is to complete the definition of the `Game` class in the `training.py` file.

This class implements the GAN game. Check the parameters and what attributes are stored in the constructor. The class provides methods for training, saving and loading state, and creating images for visual inspection during training.

Your task is to implement the main training loop in the `play` method of the class. We use the function `iterate` to sample an arbitrary number of random minibatches during training, specified by the `num_iter` parameter of the method. In each iteration you should do the following:

- **Discriminator training**
  - Use the `requires_grad` function to set the corresponding attributes on the networks.
  - Sample a minibatch of random vectors of size `latent_dim` from $\mathcal{N}(0,1)$.
  - Generate fake images.
  - Compute predictions for real and fake images.
  - Compute discriminator loss and append the value to the `self.d_loss` list.
  - Update discriminator parameters.


- **Generator training**
  - Use the `requires_grad` function to set the corresponding attributes on the networks.
  - Sample a minibatch of random vectors of size `latent_dim` from $\mathcal{N}(0,1)$.
  - Generate fake images.
  - Compute prediction for fake images.
  - Compute generator loss and append the value to the `self.g_loss` list.
  - Update generator parameters.
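The two phases listed above can be sketched as a single training step. The code below is a self-contained illustration under several assumptions: the tiny stand-in networks, the function `train_step`, the learning rate of `2e-4`, and the random "real" batch are all hypothetical and only make the sketch runnable. In the actual `play` method you would use the attributes stored by the `Game` constructor instead.

```python
import torch
from torch import nn

torch.manual_seed(0)
latent_dim, batch_size = 8, 16

# Tiny stand-in networks on 7x7 images, only to keep the sketch runnable.
G = nn.Sequential(nn.ConvTranspose2d(latent_dim, 1, kernel_size=7), nn.Tanh())
D = nn.Sequential(nn.Conv2d(1, 1, kernel_size=7), nn.Flatten(), nn.Sigmoid())

# Adam with beta_1 = 0.5; the learning rate is an assumption.
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))

def requires_grad(model, flag):
    for param in model.parameters():
        param.requires_grad = flag

def d_loss_fn(d_real, d_fake):
    return -0.5 * torch.log(d_real).mean() - 0.5 * torch.log(1 - d_fake).mean()

def g_loss_fn(d_fake):
    return -0.5 * torch.log(d_fake).mean()

def train_step(x_real):
    # Discriminator training: only the discriminator receives gradients.
    requires_grad(D, True)
    requires_grad(G, False)
    z = torch.randn(x_real.size(0), latent_dim, 1, 1)
    x_fake = G(z)
    d_loss = d_loss_fn(D(x_real), D(x_fake))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator training: gradients flow through D but only update G.
    requires_grad(D, False)
    requires_grad(G, True)
    z = torch.randn(x_real.size(0), latent_dim, 1, 1)
    g_loss = g_loss_fn(D(G(z)))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    return d_loss.item(), g_loss.item()

# Random stand-in for a minibatch of real images in [-1, 1].
x_real = torch.rand(batch_size, 1, 7, 7) * 2 - 1
d_loss, g_loss = train_step(x_real)
print(d_loss, g_loss)
```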

Note that we use the Adam optimizer, with $\beta_1$ set to $0.5$, and a uniform learning rate for both networks. You may change that if you want to try out different algorithms, betas, or different learning rates for generator and discriminator.

<br>

### 3 Image Generation (20 Points)

---

Now that we have everything implemented, let's train our model for some time to see how well it works!

<br>

### 3.1 Hyperparameter Tuning (10 Points)

---

The task is to set up your model and training algorithm to get some good results. Fine-tune your hyperparameters to get a stable training run without convergence failure or mode collapse. An important part of training is visual inspection. The `Game` class saves grids of generated images in the `images` folder. Use these images to monitor the progress of your model and use early stopping if it's clear that the model doesn't converge.

You can use the `prefix` parameter of the constructor to give a unique name to the saved images and the `show_every` parameter to set an interval for generating samples. After training, keep *only* the best result for this exercise in the `images` folder for your submission.

The goal in this exercise is not to perfectly match $p_\text{data}(\mathbf{x})$. It's sufficient if your generator produces reasonably good results that are clearly recognizable as being close to samples from the target distribution. See the image below for an acceptable result for the MNIST dataset.

<br>

![Example outputs](images/example.jpg)

<br>

#### 3.1.1 Solution

Adjust the code below to your needs for training the model.

In [None]:
############################################################
###                  START OF YOUR CODE                  ###
############################################################

# Set latent dimension and number of channels.
latent_dim = 100
channels = 1
depth = 4

# Create new training setup.
game = Game(
    Discriminator(channels, depth),
    Generator(latent_dim, channels, depth),
    dataloader,
    device,
    batch_size=batch_size,
    latent_dim=latent_dim,

    # kwargs ...
)

############################################################
###                   END OF YOUR CODE                   ###
############################################################

Note that the `Game` class saves the training state. You can execute the following cell with the call to the `play` method multiple times and don't have to train your model in a single run. Also note that you can use the `save` and `load` methods of the class to save the training state to disk and continue training at a later time. You don't have to upload your model for submission.

In [None]:
history = game.play(num_iter=10_000)

<br>

#### 3.1.2 Results

Use the function call below to show the accumulated training loss history.

In [None]:
show_loss(history)

<br>

### 3.2 Improvements (10 Points)

---

The task is to add at least *one* additional feature to the training algorithm that could improve your previous results. For instance, you can implement R1 regularization, one-sided label smoothing, a replay buffer, additional noise, or other techniques not used so far that were introduced in the lecture or that you find in the literature (give references when you implement something not mentioned in the lecture).

You're free to add additional parameters and methods to the `Game` class for your implementation, or additional functions in the `training.py` file.

Again, keep only your best result in the `images` folder for your submission. (Remember to use a different prefix for this training run.)
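As an illustration of one of the listed options, one-sided label smoothing replaces the target $1$ for real images with a slightly smaller value, while the target for fake images stays at $0$. The sketch below is a possible variant of the discriminator loss under stated assumptions: the function name, the `real_target` parameter, and the value $0.9$ are hypothetical, so check the lecture for the exact formulation.

```python
import torch
import torch.nn.functional as F

def discriminator_loss_smoothed(d_real, d_fake, real_target=0.9):
    """Discriminator loss with one-sided label smoothing (sketch)."""
    # Only the labels for real images are smoothed; fake labels stay at 0.
    real_labels = torch.full_like(d_real, real_target)
    fake_labels = torch.zeros_like(d_fake)
    real_term = F.binary_cross_entropy(d_real, real_labels)
    fake_term = F.binary_cross_entropy(d_fake, fake_labels)
    return 0.5 * (real_term + fake_term)

d_real = torch.full((4, 1), 0.99)
d_fake = torch.full((4, 1), 0.3)
plain = discriminator_loss_smoothed(d_real, d_fake, real_target=1.0)
smoothed = discriminator_loss_smoothed(d_real, d_fake, real_target=0.9)
print(plain.item(), smoothed.item())
```

With `real_target=1.0` the function reduces to the unsmoothed loss. With smoothing, overconfident predictions on real images are penalized.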

<br>

#### 3.2.1 Solution

Adjust the code below to train your model again using the new implementation.

In [None]:
############################################################
###                  START OF YOUR CODE                  ###
############################################################

# Set latent dimension and number of channels.
latent_dim = None
channels = 1
depth = None

# Create new training setup.
game = Game(
    Discriminator(channels, depth),
    Generator(latent_dim, channels, depth),
    dataloader,
    device,
    batch_size=batch_size,
    latent_dim=latent_dim,

    # kwargs ...
)

############################################################
###                   END OF YOUR CODE                   ###
############################################################

Train the networks for some time using the updated algorithm. For comparability, you should run the training for the same number of iterations as before.

In [None]:
history = game.play(num_iter=10_000)

<br>

#### 3.2.2 Results

Use the function call below to show the accumulated training loss history.

In [None]:
show_loss(history)

<br>

#### 3.2.3 Discussion

Briefly describe what you changed and how it affected the training.

*Write your descriptions here.*

<br>