# Assignment 6

Team:

- Bertan Karacora

Tasks:

- Task 1:
    - Implement a **conditional DCGAN** model (https://arxiv.org/abs/1411.1784)
    - Train the model for conditional generation on the SVHN dataset
    - Requirements:
        - Use TensorBoard, WandB, or some other experiment tracker
        - Show the capabilities of the model to generate data based on given label
     
- Task 2:
    - Implement a fully convolutional DCGAN-like model (https://arxiv.org/abs/1511.06434)
    - Train the model on the CelebA dataset to generate new faces
    - Requirements:
        - Use TensorBoard, WandB, or some other experiment tracker
        - Show the capabilities of your model to generate images
        - Evaluate and track during training using one quantitative metric (e.g. FID)
 
- Extra point:
    - Train a SAGAN (self-attention GAN, https://arxiv.org/abs/1805.08318) or BigGAN (https://arxiv.org/abs/1809.11096) model on the CelebA dataset
    - You are allowed to use open-source implementations
    - Compare, both qualitatively and quantitatively, SAGAN/BigGAN with the DCGAN from task 2
    

## Contents

- [x] [Setup](#setup)
    - [x] [Config](#setup_config)
    - [x] [Modules](#setup_modules)
    - [x] [Paths and names](#setup_paths_and_names)
- [x] [Data](#data)
    - [x] [Visualization](#data_visualization)
        - [x] [SVHN](#data_visualization_svhn)
        - [x] [CelebA](#data_visualization_celeba)
    - [x] [Remarks](#data_remarks)
- [x] [Models](#models)
    - [x] [DCGAN](#models_dcgan)
    - [x] [CDCGAN](#models_cdcgan)
- [x] [Experiments](#experiments)
    - [x] [CDCGAN on SVHN](#experiments_svhn_cdcgan)
    - [x] [DCGAN on CelebA](#experiments_celeba_dcgan)
    - [x] [Discussion](#experiments_discussion)

## Setup
<a id="setup"></a>

In [1]:
%load_ext autoreload
%autoreload 2

### Config
<a id="setup_config"></a>

In [2]:
import assignment.config as config

config.list_available()

Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_6/assignment/config.yaml


['celeba_dcgan', 'celeba_unnormalized', 'svhn_cdcgan', 'svhn_unnormalized']

### Modules
<a id="setup_modules"></a>

In [3]:
from pathlib import Path

import torchsummary

from assignment.evaluation.evaluator import Evaluator
import assignment.libs.utils_checkpoints as utils_checkpoints

### Paths and names
<a id="setup_paths_and_names"></a>

In [4]:
name_exp_svhn_unnormalized = "svhn_unnormalized"
name_exp_celeba_unnormalized = "celeba_unnormalized"
name_exp_svhn_cdcgan = "svhn_cdcgan"
name_exp_celeba_dcgan = "celeba_dcgan"

path_dir_exp_svhn_cdcgan = Path(config._PATH_DIR_EXPS) / name_exp_svhn_cdcgan
path_dir_exp_celeba_dcgan = Path(config._PATH_DIR_EXPS) / name_exp_celeba_dcgan

## Data
<a id="data"></a>

### Visualization
<a id="data_visualization"></a>

#### SVHN
<a id="data_visualization_svhn"></a>

##### Test dataset

![Test dataset](experiments/svhn_cdcgan/visualizations/Sample_test.png)

##### Validation dataset

![Validation dataset](experiments/svhn_cdcgan/visualizations/Sample_validation.png)

##### Training dataset

![Training dataset](experiments/svhn_cdcgan/visualizations/Sample_training.png)

##### Training dataset (normalized)

![Training dataset (normalized)](experiments/svhn_cdcgan/visualizations/Sample_training_normalized.png)

#### CelebA
<a id="data_visualization_celeba"></a>

##### Test dataset

![Test dataset](experiments/celeba_dcgan/visualizations/Sample_test.png)

##### Validation dataset

![Validation dataset](experiments/celeba_dcgan/visualizations/Sample_validation.png)

##### Training dataset

![Training dataset](experiments/celeba_dcgan/visualizations/Sample_training.png)

##### Training dataset (normalized)

![Training dataset (normalized)](experiments/celeba_dcgan/visualizations/Sample_training_normalized.png)

### Remarks
<a id="data_remarks"></a>

> Implementation:
>
> - [SVHN dataset class](assignment/datasets/schn.py)
> - [CelebA dataset class](assignment/datasets/celeba.py)
> - [Script for computing mean and standard deviation of training dataset](assignment/scripts/compute_mean_and_std.py)

> Remarks:
> 
> - No data augmentation is used. As in the previous assignment, most augmentations would be unreasonable when training generative models. At most, slight spatial augmentations might be useful to create additional samples, but the datasets are large enough without them.
> - CelebA: The original images of shape $178 \times 218$ are cropped to $178 \times 178$ and subsequently resized to $128 \times 128$.
> - The images are normalized using the mean and standard deviation of the training dataset. The same normalization is also applied during validation and inference.
> - Since the real images are normalized, the same must be done for any generated images (first mapping them to [0, 1], then applying the normalization).
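
The normalization steps described above can be sketched as follows. This is only an illustration with NumPy, not the code from the linked script; all function names are hypothetical, and the assumption that the generator output lies in $[-1, 1]$ (e.g. after a tanh) is mine.

```python
import numpy as np


def channel_stats(images: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-channel mean and std of images in [0, 1] with shape (N, C, H, W)."""
    mean = images.mean(axis=(0, 2, 3))
    std = images.std(axis=(0, 2, 3))
    return mean, std


def normalize(images: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Standardize each channel with the training statistics."""
    return (images - mean[None, :, None, None]) / std[None, :, None, None]


def denormalize(images: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Invert `normalize`, e.g. before visualizing images."""
    return images * std[None, :, None, None] + mean[None, :, None, None]


def generated_to_normalized(generated: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Map generator output to the normalized image space.

    Assumption: the generator output lies in [-1, 1].
    """
    unit = (generated + 1.0) / 2.0  # first map to [0, 1]
    return normalize(unit, mean, std)  # then apply the dataset normalization
```

The statistics are computed once on the training images and then reused for validation, inference, and generated images, so that real and fake images live in the same value range.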

## Models
<a id="models"></a>

### DCGAN
<a id="models_dcgan"></a>

In [5]:
config.set_config_exp(path_dir_exp_celeba_dcgan)

model_generator = utils_checkpoints.load_model_generator(path_dir_exp_celeba_dcgan / "checkpoints" / "latest.pth")
model_discriminator = utils_checkpoints.load_model_discriminator(path_dir_exp_celeba_dcgan / "checkpoints" / "latest.pth")

print(torchsummary.summary(model_generator, [config.MODEL_GENERATOR["shape_input"]], verbose=0))
print(torchsummary.summary(model_discriminator, [config.MODEL_DISCRIMINATOR["shape_input"]], verbose=0))

Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_6/experiments/celeba_dcgan/config.yaml
Layer (type:depth-idx)                   Output Shape              Param #
├─Sequential: 1-1                        [-1, 3, 128, 128]         --
|    └─BlockConvTranspose2d: 2-1         [-1, 1024, 2, 2]          --
|    |    └─ConvTranspose2d: 3-1         [-1, 1024, 2, 2]          2,098,176
|    |    └─BatchNorm2d: 3-2             [-1, 1024, 2, 2]          2,048
|    |    └─ReLU: 3-3                    [-1, 1024, 2, 2]          --
|    └─BlockConvTranspose2d: 2-2         [-1, 512, 4, 4]           --
|    |    └─ConvTranspose2d: 3-4         [-1, 512, 4, 4]           8,389,120
|    |    └─BatchNorm2d: 3-5             [-1, 512, 4, 4]           1,024
|    |    └─ReLU: 3-6                    [-1, 512, 4, 4]           --
|    └─BlockConvTranspose2d: 2-3         [-1, 256, 8, 8]           --
|    |    └─ConvTranspose2d: 3-7         [-1, 256, 8, 8]           2,097,408
|    |  

### CDCGAN
<a id="models_cdcgan"></a>

In [6]:
config.set_config_exp(path_dir_exp_svhn_cdcgan)

model_generator = utils_checkpoints.load_model_generator(path_dir_exp_svhn_cdcgan / "checkpoints" / "latest.pth")
model_discriminator = utils_checkpoints.load_model_discriminator(path_dir_exp_svhn_cdcgan / "checkpoints" / "latest.pth")

print(torchsummary.summary(model_generator, [config.MODEL_GENERATOR["shape_input"], [1]], verbose=0))
print(torchsummary.summary(model_discriminator, [config.MODEL_DISCRIMINATOR["shape_input"], [1]], verbose=0))

Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_6/experiments/svhn_cdcgan/config.yaml
Layer (type:depth-idx)                   Output Shape              Param #
├─BlockConvTranspose2d: 1-1              [-1, 512, 2, 2]           --
|    └─ConvTranspose2d: 2-1              [-1, 512, 2, 2]           262,656
|    └─BatchNorm2d: 2-2                  [-1, 512, 2, 2]           1,024
|    └─ReLU: 2-3                         [-1, 512, 2, 2]           --
├─BlockConvTranspose2d: 1-2              [-1, 512, 2, 2]           --
|    └─ConvTranspose2d: 2-4              [-1, 512, 2, 2]           8,704
|    └─BatchNorm2d: 2-5                  [-1, 512, 2, 2]           1,024
|    └─ReLU: 2-6                         [-1, 512, 2, 2]           --
├─Sequential: 1-3                        [-1, 3, 32, 32]           --
|    └─BlockConvTranspose2d: 2-7         [-1, 512, 4, 4]           --
|    |    └─ConvTranspose2d: 3-1         [-1, 512, 4, 4]           8,389,120
|    |    └─Bat

### Remarks
<a id="models_remarks"></a>

> Implementation:
>
> - [Training configs](assignment/configs)
> - [DCGAN model](assignment/models/gan.py)
> - [CDCGAN model class](assignment/models/gan.py)
> - [Convolutional blocks](assignment/models/cnn.py)

> Remarks:
> 
> - The discriminator and generator are almost symmetric. Most importantly, they share the same dimensions (in opposite order) for the hidden layers.
> - Each block doubles (generator) or halves (discriminator) the spatial resolution. As CelebA consists of $3 \times 128 \times 128$ images and SVHN of $3 \times 32 \times 32$ images, the configured DCGAN uses 6 blocks for both the discriminator and the generator. The CDCGAN for SVHN uses 4 blocks, just like in the original publication.
> - The models are fully convolutional, no linear layers are used.
> - The latent codes are small compared to the images: CelebA latent codes contain $128$ scalars for images with $3 \cdot 128 \cdot 128 = 49152$ values. SVHN latent codes contain $32$ scalars for images with $3 \cdot 32 \cdot 32 = 3072$ values.
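
The shapes and parameter counts in the summaries above can be reproduced with a little arithmetic. The sketch below assumes transposed convolutions with kernel size 4, stride 2, padding 1, and a bias term. The kernel size is consistent with the listed parameter counts and latent sizes, while stride and padding are my assumption. The function names are hypothetical.

```python
def conv_transpose2d_out(size: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Output size of a transposed convolution (no dilation, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def conv_transpose2d_params(channels_in: int, channels_out: int, kernel: int = 4, bias: bool = True) -> int:
    """Number of learnable parameters of a transposed convolution with a square kernel."""
    return channels_in * channels_out * kernel * kernel + (channels_out if bias else 0)


def upsample(size: int, num_blocks: int) -> int:
    """Spatial size after applying `num_blocks` doubling blocks."""
    for _ in range(num_blocks):
        size = conv_transpose2d_out(size)
    return size


# Starting from the 2x2 feature map, 6 doublings reach 128 and 4 doublings reach 32.
print(upsample(2, 6), upsample(2, 4))

# CelebA generator: 128 latent scalars -> 1024 channels -> 512 channels.
print(conv_transpose2d_params(128, 1024))  # 2,098,176 as in the summary
print(conv_transpose2d_params(1024, 512))  # 8,389,120 as in the summary

# SVHN generator: 32 latent scalars -> 512 channels, 1 label channel -> 512 channels.
print(conv_transpose2d_params(32, 512))  # 262,656 as in the summary
print(conv_transpose2d_params(1, 512))  # 8,704 as in the summary
```

Under these assumptions, the counts match the `torchsummary` output: the CDCGAN generator processes the noise vector and the label in two separate blocks before the shared upsampling stack.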

## Experiments
<a id="experiments"></a>

### CDCGAN on SVHN
<a id="experiments_svhn_cdcgan"></a>

#### Training

##### Discriminator

![Loss](experiments/svhn_cdcgan/plots/Loss_discriminator.png)
![Learning rate](experiments/svhn_cdcgan/plots/Learning_rate_discriminator.png)
![Metrics_BinaryAccuracy_discriminator](experiments/svhn_cdcgan/plots/Metrics_BinaryAccuracy_discriminator.png)

##### Generator

![Loss](experiments/svhn_cdcgan/plots/Loss_generator.png)
![Learning rate](experiments/svhn_cdcgan/plots/Learning_rate_generator.png)
![FrechetInceptionDistance](experiments/svhn_cdcgan/plots/Metrics_FrechetInceptionDistance_generator.png)

#### Inference

![Samples generated unnormalized](experiments/svhn_cdcgan/visualizations/Samples_generated_unnormalized.png)
![Interpolation grid](experiments/svhn_cdcgan/visualizations/Interpolation_grid.png)

### DCGAN on CelebA
<a id="experiments_celeba_dcgan"></a>

#### Training

##### Discriminator

![Loss](experiments/celeba_dcgan/plots/Loss_discriminator.png)
![Learning rate](experiments/celeba_dcgan/plots/Learning_rate_discriminator.png)
![Metrics_BinaryAccuracy_discriminator](experiments/celeba_dcgan/plots/Metrics_BinaryAccuracy_discriminator.png)

##### Generator

![Loss](experiments/celeba_dcgan/plots/Loss_generator.png)
![Learning rate](experiments/celeba_dcgan/plots/Learning_rate_generator.png)
![FrechetInceptionDistance](experiments/celeba_dcgan/plots/Metrics_FrechetInceptionDistance_generator.png)

#### Inference

![Samples generated unnormalized](experiments/celeba_dcgan/visualizations/Samples_generated_unnormalized.png)
![Interpolation grid](experiments/celeba_dcgan/visualizations/Interpolation_grid.png)

### Discussion
<a id="experiments_discussion"></a>

> Implementation:
>
> - [Configs](assignment/configs)
> - [Trainer](assignment/training/trainer_gan.py)
> - [Jupyter notebook for CelebA experiment](assignment_6_run_celeba.ipynb)
> - [Jupyter notebook for SVHN experiment](assignment_6_run_svhn.ipynb)
>

> Some remarks:
>
> - All models have been trained for 20 epochs. Training took a long time because the datasets are very large, the models are medium-sized, and some operations, such as visualizing generated images in TensorBoard or evaluating the FID score, are somewhat costly.
> - A warmup + exponential decay scheduler has been used. Since the training was planned for only 20 epochs, the decay is rather strong.
> - The Fréchet Inception Distance (FID) is rather costly to evaluate, but it shows the training progress nicely.
>
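
A warmup + exponential decay schedule of the kind mentioned above can be sketched as a multiplicative factor on the base learning rate. The warmup length and decay rate below are hypothetical placeholders, not the values from the training configs.

```python
def lr_factor(epoch: int, epochs_warmup: int = 2, gamma: float = 0.8) -> float:
    """Learning-rate multiplier: linear warmup followed by exponential decay.

    `epochs_warmup` and `gamma` are illustrative values only.
    """
    if epoch < epochs_warmup:
        # Linear ramp that reaches 1.0 in the last warmup epoch.
        return (epoch + 1) / epochs_warmup
    # Exponential decay, starting right after the warmup.
    return gamma ** (epoch - epochs_warmup + 1)


for epoch in range(20):
    print(epoch, round(lr_factor(epoch), 4))
```

In PyTorch, such a function can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`. With a short run of 20 epochs, a small `gamma` drives the learning rate down quickly, which is what "rather strong" decay means here.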

> Results:
>
> The learnable parameters refer to the generator model. The FID is computed on the training dataset:

| Model           |  FID  | Learnable params | Total mult-adds (M) |
| :-------------- | :---: | :--------------: | ------------------: |
| DCGAN on CelebA | 3.823 |    13,278,627    |              731.19 |
| CDCGAN on SVHN  | 0.493 |    11,292,291    |              432.33 |
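
For reference, the FID is the Fréchet distance between two Gaussians fitted to Inception features of real and generated images. The sketch below computes only this distance with NumPy and assumes that the feature matrices are already given; the function name is hypothetical and this is not the metric implementation used in the experiments.

```python
import numpy as np


def frechet_distance(features_real: np.ndarray, features_fake: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets of shape (N, D), D >= 2."""
    mu_real = features_real.mean(axis=0)
    mu_fake = features_fake.mean(axis=0)
    cov_real = np.cov(features_real, rowvar=False)
    cov_fake = np.cov(features_fake, rowvar=False)

    # Tr((cov_real cov_fake)^(1/2)) equals the sum of the square roots
    # of the eigenvalues of the product of the two covariance matrices.
    eigenvalues = np.linalg.eigvals(cov_real @ cov_fake)
    trace_sqrt = np.sqrt(np.clip(eigenvalues.real, 0.0, None)).sum()

    diff = mu_real - mu_fake
    return float(diff @ diff + np.trace(cov_real) + np.trace(cov_fake) - 2.0 * trace_sqrt)
```

Identical feature distributions give a distance of zero, and a pure shift of the mean contributes its squared length. The cost mentioned above comes mainly from running the Inception network on many images, not from this formula.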


> Observations/Conclusion:
>
> - In both experiments, I would say that reasonable results have been obtained.
> - Generally, the loss curves do not tell us much. If one model improves, the loss of the other increases. Therefore, we might see wild changes and stagnation instead of a smooth progression. Also, it is probably safe to say that the discriminator has the easier task, which is consistent with its high binary classification accuracy of about $0.8$ for most of the training.
> - The FID score for the CDCGAN on the SVHN dataset shows a nice and smooth progression over the 20 training epochs. The generated samples for the digit 0 are not clearly identifiable as this digit. Arguably, however, among all digits they resemble the digit 0 most.
> - The CelebA images are larger and much more complex (face vs. digit). Therefore, a worse generation performance is to be expected. The results are consistent with this assumption.
> - The generated CelebA face images clearly show different human faces. However, the quality is not good enough for fake and real images to be indistinguishable to a human (which I also would not expect here). One can say, however, that the images do show some complex facial traits. One can also see that some depicted persons are smiling, looking in different directions, and have different backgrounds, hairstyles, etc. Some additional regularization and hyperparameter tuning might already be sufficient to improve the performance further.
> - One can possibly see some rectangular artifacts in the generated images. These might be a consequence of the strided transposed convolutions that have been used to generate them. A deeper network with smaller strides and/or pooling and linear layers might be able to mitigate this effect.
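
The opposing behavior of the two loss curves can be illustrated with a small example. It assumes the standard binary cross-entropy GAN objective with the non-saturating generator loss; whether the trainer uses exactly this form is an assumption, and the function names are hypothetical.

```python
import math


def loss_discriminator(p_real: list[float], p_fake: list[float]) -> float:
    """BCE loss of the discriminator from its predicted probabilities of 'real'."""
    loss_real = sum(-math.log(p) for p in p_real) / len(p_real)
    loss_fake = sum(-math.log(1.0 - p) for p in p_fake) / len(p_fake)
    return loss_real + loss_fake


def loss_generator(p_fake: list[float]) -> float:
    """Non-saturating generator loss: fake images should be classified as real."""
    return sum(-math.log(p) for p in p_fake) / len(p_fake)


def accuracy_discriminator(p_real: list[float], p_fake: list[float], threshold: float = 0.5) -> float:
    """Fraction of real and fake samples that the discriminator classifies correctly."""
    correct = sum(p >= threshold for p in p_real) + sum(p < threshold for p in p_fake)
    return correct / (len(p_real) + len(p_fake))


# A confident discriminator versus an uncertain one (made-up probabilities).
strong = ([0.9, 0.8], [0.1, 0.2])
weak = ([0.6, 0.5], [0.5, 0.4])
for p_real, p_fake in (strong, weak):
    print(loss_discriminator(p_real, p_fake), loss_generator(p_fake), accuracy_discriminator(p_real, p_fake))
```

When the discriminator becomes more confident, its own loss drops while the generator loss rises, and vice versa. This is why neither curve decreases monotonically, and why a metric such as the FID is more informative about training progress.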
