# Assignment 6

Team:

- Bertan Karacora

Tasks:

- Task 1:
    - Implement a **conditional DCGAN** model (https://arxiv.org/abs/1411.1784)
    - Train the model for conditional generation on the SVHN dataset
    - Requirements:
        - Use TensorBoard, WandB, or some other experiment tracker
        - Show the capabilities of the model to generate data based on a given label
     
- Task 2:
    - Implement a fully convolutional DCGAN-like model (https://arxiv.org/abs/1511.06434)
    - Train the model on the CelebA dataset to generate new faces
    - Requirements:
        - Use TensorBoard, WandB, or some other experiment tracker
        - Show the capabilities of your model to generate images
        - Evaluate and track during training using one quantitative metric (e.g. FID)
 
- Extra point:
    - Train a SAGAN (self-attention GAN, https://arxiv.org/abs/1805.08318) or BigGAN (https://arxiv.org/abs/1809.11096) model on the CelebA dataset
    - You are allowed to use open-source implementations
    - Compare, both qualitatively and quantitatively, SAGAN/BigGAN with the DCGAN from task 2
    

## Contents

- [x] [Setup](#setup)
    - [x] [Config](#setup_config)
    - [x] [Modules](#setup_modules)
    - [x] [Paths and names](#setup_paths_and_names)
- [x] [Data](#data)
    - [x] [Visualization](#data_visualization)
    - [x] [Remarks](#data_remarks)
- [x] [Models](#models)
    - [x] [Convolutional VAE](#models_convolutional_vae)
    - [x] [Remarks](#models_remarks)
- [x] [Experiments](#experiments)
    - [x] [ELBO weight 1.0e-2](#experiments_elboweight12)
    - [x] [ELBO weight 1.0e-3](#experiments_elboweight13)
    - [x] [ELBO weight 1.0e-4](#experiments_elboweight14)
    - [x] [Discussion](#experiments_discussion)

## Setup
<a id="setup"></a>

In [1]:
%load_ext autoreload
%autoreload 2

### Config
<a id="setup_config"></a>

In [2]:
import assignment.config as config

config.list_available()

Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_5/assignment/config.yaml


['celeba_unnormalized',
 'celeba_vaeconv_elboweight12',
 'celeba_vaeconv_elboweight13',
 'celeba_vaeconv_elboweight14',
 'celeba_vaeconv_elboweight15']

### Modules
<a id="setup_modules"></a>

In [3]:
from pathlib import Path

from assignment.evaluation.evaluator import Evaluator
import assignment.libs.utils_checkpoints as utils_checkpoints

### Paths and names
<a id="setup_paths_and_names"></a>

In [4]:
name_exp_vaeconv12 = "celeba_vaeconv_elboweight12"
name_exp_vaeconv13 = "celeba_vaeconv_elboweight13"
name_exp_vaeconv14 = "celeba_vaeconv_elboweight14"
name_exp_vaeconv15 = "celeba_vaeconv_elboweight15"
name_exp_unnormalized = "celeba_unnormalized"

path_dir_exp_vaeconv12 = Path(config._PATH_DIR_EXPS) / name_exp_vaeconv12
path_dir_exp_vaeconv13 = Path(config._PATH_DIR_EXPS) / name_exp_vaeconv13
path_dir_exp_vaeconv14 = Path(config._PATH_DIR_EXPS) / name_exp_vaeconv14
path_dir_exp_vaeconv15 = Path(config._PATH_DIR_EXPS) / name_exp_vaeconv15

## Data
<a id="data"></a>

### Visualization
<a id="data_visualization"></a>

#### Test dataset

![Test dataset](experiments/celeba_vaeconv_elboweight14/visualizations/Sample_test.png)

#### Validation dataset

![Validation dataset](experiments/celeba_vaeconv_elboweight14/visualizations/Sample_validation.png)

#### Training dataset

![Training dataset](experiments/celeba_vaeconv_elboweight14/visualizations/Sample_training.png)

#### Training dataset (normalized)

![Training dataset (normalized)](experiments/celeba_vaeconv_elboweight14/visualizations/Sample_training_normalized.png)

### Remarks
<a id="data_remarks"></a>

> Implementation:
>
> - [CelebA dataset class](assignment/datasets/celeba.py)
> - [Script for computing mean and standard deviation of training dataset](assignment/scripts/compute_mean_and_std.py)

> Remarks:
> 
> - No data augmentation is used since the model is trained to reconstruct real images. Spatial augmentations could have been used to generate additional training samples; however, CelebA is already a rather large dataset, so this should not be necessary. Other augmentations, such as noise augmentations, would likely be counterproductive.
> - Original images of shape $178 \times 218$ are center-cropped to $178 \times 178$. Subsequently, they are resized to $128 \times 128$.
> - The images are normalized using the mean and standard deviation of the training dataset. The same normalization is also applied during validation and inference.
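
The cropping and normalization steps above can be sketched with plain arithmetic. This is only an illustration, not the code in `assignment/datasets/celeba.py`: the helper names `center_crop_box` and `normalize_pixel` are hypothetical, and the channel statistics are the values shown in the `Normalize` transform printed by the evaluator.

```python
def center_crop_box(width, height, size):
    """Return (left, top, right, bottom) of a centered size x size crop."""
    left = round((width - size) / 2)
    top = round((height - size) / 2)
    return left, top, left + size, top + size


def normalize_pixel(rgb_uint8, mean, std):
    """Scale 8-bit RGB values to [0, 1], then standardize each channel."""
    return [(value / 255.0 - m) / s for value, m, s in zip(rgb_uint8, mean, std)]


# Channel statistics as printed in the Normalize transform of the evaluator.
MEAN = [0.5084, 0.4224, 0.3768]
STD = [0.3048, 0.2824, 0.2808]

# CelebA images are 178 pixels wide and 218 pixels high.
box = center_crop_box(width=178, height=218, size=178)
print(box)  # (0, 20, 178, 198)

# A pure white pixel ends up well above zero in every channel.
print(normalize_pixel([255, 255, 255], MEAN, STD))
```

The crop therefore removes 20 rows at the top and 20 rows at the bottom and keeps the full width.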

## Models
<a id="models"></a>

### Convolutional VAE
<a id="models_convolutional_vae"></a>

In [5]:
config.set_config_exp(path_dir_exp_vaeconv14)

evaluator = Evaluator(name_exp_vaeconv14)

Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_5/experiments/celeba_vaeconv_elboweight14/config.yaml
Evaluator for experiment celeba_vaeconv_elboweight14
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_5/experiments/celeba_vaeconv_elboweight14
    Dataset: Dataset CelebA
    Number of samples: 19962
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_5/data/celeba
    Split: test
    Transform of samples: Compose(
      PILToTensor()
      CenterCrop(size=[178, 178])
      Resize(size=[128, 128], interpolation=InterpolationMode.BILINEAR, antialias=True)
      ToDtype(
    scale=True
    (transform_tv): ToDtype(scale=True)
  )
      Normalize(mean=[0.5084, 0.4224, 0.3768], std=[0.3048, 0.2824, 0.2808], inplace=False)
)
    Transform of targets: None
    Model: VAEGaussian(
  (linear_mean): Linear(in_features=4096, out_features=64, bias=True)
  (linear_log_var): Linear(in_features=4096, out_features=64, bias=Tr

### Remarks
<a id="models_remarks"></a>

> Implementation:
>
> - [Model config](assignment/configs/celeba_vaeconv_elboweight14.yaml)
> - [VAE model class](assignment/models/vae.py)
> - [Convolutional encoder and decoder](assignment/models/cnn.py)

> Remarks:
> 
> - The encoder and decoder are designed to be symmetric (except for bias parameter sizes).
> - They consist of 6 blocks each, including (transposed) convolutional layers, batch normalization, and LeakyReLU layers for non-linearity.
> - For the bottleneck, two linear layers map the features to the mean and the log-variance of the Gaussian distribution being modeled. Another linear layer maps the low-dimensional latent code to the input dimension expected by the decoder.
> - Each latent code consists of $64$ scalars and represents an image of original input size $128 \times 128 \times 3 = 49152$.
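
How a latent code can be drawn from the predicted mean and log-variance is sketched below using the standard reparameterization trick. This is a minimal pure-Python sketch under the assumption that the model samples this way; the function name `reparameterize` is hypothetical, and the actual implementation in `assignment/models/vae.py` operates on tensors and may differ.

```python
import math
import random


def reparameterize(mean, log_var, rng=random):
    """Sample z = mean + std * eps with eps ~ N(0, 1), element-wise."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


DIM_LATENT = 64            # out_features of linear_mean / linear_log_var
DIM_INPUT = 128 * 128 * 3  # number of scalars in one input image

mean = [0.0] * DIM_LATENT
log_var = [0.0] * DIM_LATENT  # log(1) = 0, i.e. unit variance
z = reparameterize(mean, log_var, rng=random.Random(0))

print(len(z))                   # 64
print(DIM_INPUT // DIM_LATENT)  # 768
```

With these dimensions, the bottleneck compresses each image by a factor of 768 in terms of the number of scalars.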

## Experiments
<a id="experiments"></a>

### ELBO weight 1.0e-2
<a id="experiments_elboweight12"></a>

#### Training

![Loss](experiments/celeba_vaeconv_elboweight12/plots/Loss.png)
![Learning rate](experiments/celeba_vaeconv_elboweight12/plots/Learning_rate.png)
![SmoothL1Loss](experiments/celeba_vaeconv_elboweight12/plots/Metrics_SmoothL1Loss.png)
![ELBOGaussian](experiments/celeba_vaeconv_elboweight12/plots/Metrics_ELBOGaussian.png)
![FrechetInceptionDistance](experiments/celeba_vaeconv_elboweight12/plots/Metrics_FrechetInceptionDistance.png)

#### Quantitative evaluation

In [6]:
config.set_config_exp(path_dir_exp_vaeconv12)

evaluator = Evaluator(name_exp_vaeconv12)
evaluator.evaluate()

print("Metrics on test data")
for name, metrics in evaluator.log["test"]["total"]["metrics"].items():
    print(f"    {name:<10}: {metrics}")

Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_5/experiments/celeba_vaeconv_elboweight12/config.yaml
Evaluator for experiment celeba_vaeconv_elboweight12
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_5/experiments/celeba_vaeconv_elboweight12
    Dataset: Dataset CelebA
    Number of samples: 19962
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_5/data/celeba
    Split: test
    Transform of samples: Compose(
      PILToTensor()
      CenterCrop(size=[178, 178])
      Resize(size=[128, 128], interpolation=InterpolationMode.BILINEAR, antialias=True)
      ToDtype(
    scale=True
    (transform_tv): ToDtype(scale=True)
  )
      Normalize(mean=[0.5084, 0.4224, 0.3768], std=[0.3048, 0.2824, 0.2808], inplace=False)
)
    Transform of targets: None
    Model: VAEGaussian(
  (linear_mean): Linear(in_features=4096, out_features=64, bias=True)
  (linear_log_var): Linear(in_features=4096, out_features=64, bias=Tr

Validating: Batch 150: 100%|██████████| 156/156 [00:19<00:00,  8.11it/s]

Evaluation finished
Metrics on test data
    SmoothL1Loss: 0.4192132704835559
    ELBOGaussian: 1.7758202906446052
    FrechetInceptionDistance: 8.295232089269783





#### Qualitative evaluation

##### Reconstructions

![Predictions](experiments/celeba_vaeconv_elboweight12/visualizations/Predictions_test.png)

The reconstruction accuracy is easier to judge in the normalized images:
![Predictions normalized](experiments/celeba_vaeconv_elboweight12/visualizations/Predictions_test_normalized.png)

##### Image generation

![Samples generated unnormalized](experiments/celeba_vaeconv_elboweight12/visualizations/Samples_generated_unnormalized.png)

##### Latent space analysis

![Projection_pca_samples](experiments/celeba_vaeconv_elboweight12/plots/Projection_pca_samples.png)
![Projection_pca_generated](experiments/celeba_vaeconv_elboweight12/plots/Projection_pca_generated.png)
![Projection_tsne_samples](experiments/celeba_vaeconv_elboweight12/plots/Projection_tsne_samples.png)
![Projection_tsne_generated](experiments/celeba_vaeconv_elboweight12/plots/Projection_tsne_generated.png)
![Equispace_generated](experiments/celeba_vaeconv_elboweight12/visualizations/Equispace_generated.png)

##### Interpolation

![Endpoints_interpolation](experiments/celeba_vaeconv_elboweight12/visualizations/Endpoints_interpolation.png)
![Interpolation](experiments/celeba_vaeconv_elboweight12/visualizations/Interpolation.png)

### ELBO weight 1.0e-3
<a id="experiments_elboweight13"></a>

#### Training

![Loss](experiments/celeba_vaeconv_elboweight13/plots/Loss.png)
![Learning rate](experiments/celeba_vaeconv_elboweight13/plots/Learning_rate.png)
![SmoothL1Loss](experiments/celeba_vaeconv_elboweight13/plots/Metrics_SmoothL1Loss.png)
![ELBOGaussian](experiments/celeba_vaeconv_elboweight13/plots/Metrics_ELBOGaussian.png)
![FrechetInceptionDistance](experiments/celeba_vaeconv_elboweight13/plots/Metrics_FrechetInceptionDistance.png)

#### Quantitative evaluation

In [8]:
config.set_config_exp(path_dir_exp_vaeconv13)

evaluator = Evaluator(name_exp_vaeconv13)
evaluator.evaluate()

print("Metrics on test data")
for name, metrics in evaluator.log["test"]["total"]["metrics"].items():
    print(f"    {name:<10}: {metrics}")

Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_5/experiments/celeba_vaeconv_elboweight13/config.yaml
Evaluator for experiment celeba_vaeconv_elboweight13
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_5/experiments/celeba_vaeconv_elboweight13
    Dataset: Dataset CelebA
    Number of samples: 19962
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_5/data/celeba
    Split: test
    Transform of samples: Compose(
      PILToTensor()
      CenterCrop(size=[178, 178])
      Resize(size=[128, 128], interpolation=InterpolationMode.BILINEAR, antialias=True)
      ToDtype(
    scale=True
    (transform_tv): ToDtype(scale=True)
  )
      Normalize(mean=[0.5084, 0.4224, 0.3768], std=[0.3048, 0.2824, 0.2808], inplace=False)
)
    Transform of targets: None
    Model: VAEGaussian(
  (linear_mean): Linear(in_features=4096, out_features=64, bias=True)
  (linear_log_var): Linear(in_features=4096, out_features=64, bias=Tr

Validating: Batch 150: 100%|██████████| 156/156 [00:19<00:00,  8.05it/s]

Evaluation finished
Metrics on test data
    SmoothL1Loss: 0.3743369400758039
    ELBOGaussian: 12.619806736746504
    FrechetInceptionDistance: 7.872773217242415





#### Qualitative evaluation

##### Reconstructions

![Predictions](experiments/celeba_vaeconv_elboweight13/visualizations/Predictions_test.png)

The reconstruction accuracy is easier to judge in the normalized images:
![Predictions normalized](experiments/celeba_vaeconv_elboweight13/visualizations/Predictions_test_normalized.png)

##### Image generation

![Samples generated unnormalized](experiments/celeba_vaeconv_elboweight13/visualizations/Samples_generated_unnormalized.png)

##### Latent space analysis

![Projection_pca_samples](experiments/celeba_vaeconv_elboweight13/plots/Projection_pca_samples.png)
![Projection_pca_generated](experiments/celeba_vaeconv_elboweight13/plots/Projection_pca_generated.png)
![Projection_tsne_samples](experiments/celeba_vaeconv_elboweight13/plots/Projection_tsne_samples.png)
![Projection_tsne_generated](experiments/celeba_vaeconv_elboweight13/plots/Projection_tsne_generated.png)
![Equispace_generated](experiments/celeba_vaeconv_elboweight13/visualizations/Equispace_generated.png)

##### Interpolation

![Endpoints_interpolation](experiments/celeba_vaeconv_elboweight13/visualizations/Endpoints_interpolation.png)
![Interpolation](experiments/celeba_vaeconv_elboweight13/visualizations/Interpolation.png)

### ELBO weight 1.0e-4
<a id="experiments_elboweight14"></a>

#### Training

![Loss](experiments/celeba_vaeconv_elboweight14/plots/Loss.png)
![Learning rate](experiments/celeba_vaeconv_elboweight14/plots/Learning_rate.png)
![SmoothL1Loss](experiments/celeba_vaeconv_elboweight14/plots/Metrics_SmoothL1Loss.png)
![ELBOGaussian](experiments/celeba_vaeconv_elboweight14/plots/Metrics_ELBOGaussian.png)
![FrechetInceptionDistance](experiments/celeba_vaeconv_elboweight14/plots/Metrics_FrechetInceptionDistance.png)

#### Quantitative evaluation

In [9]:
config.set_config_exp(path_dir_exp_vaeconv14)

evaluator = Evaluator(name_exp_vaeconv14)
evaluator.evaluate()

print("Metrics on test data")
for name, metrics in evaluator.log["test"]["total"]["metrics"].items():
    print(f"    {name:<10}: {metrics}")

Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_5/experiments/celeba_vaeconv_elboweight14/config.yaml
Evaluator for experiment celeba_vaeconv_elboweight14
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_5/experiments/celeba_vaeconv_elboweight14
    Dataset: Dataset CelebA
    Number of samples: 19962
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_5/data/celeba
    Split: test
    Transform of samples: Compose(
      PILToTensor()
      CenterCrop(size=[178, 178])
      Resize(size=[128, 128], interpolation=InterpolationMode.BILINEAR, antialias=True)
      ToDtype(
    scale=True
    (transform_tv): ToDtype(scale=True)
  )
      Normalize(mean=[0.5084, 0.4224, 0.3768], std=[0.3048, 0.2824, 0.2808], inplace=False)
)
    Transform of targets: None
    Model: VAEGaussian(
  (linear_mean): Linear(in_features=4096, out_features=64, bias=True)
  (linear_log_var): Linear(in_features=4096, out_features=64, bias=Tr

Validating: Batch 150: 100%|██████████| 156/156 [00:14<00:00, 10.42it/s]

Evaluation finished
Metrics on test data
    SmoothL1Loss: 0.3577469870389887
    ELBOGaussian: 60.51262213077589
    FrechetInceptionDistance: 7.242685194257242





#### Qualitative evaluation

##### Reconstructions

![Predictions](experiments/celeba_vaeconv_elboweight14/visualizations/Predictions_test.png)

The reconstruction accuracy is easier to judge in the normalized images:
![Predictions normalized](experiments/celeba_vaeconv_elboweight14/visualizations/Predictions_test_normalized.png)

##### Image generation

![Samples generated unnormalized](experiments/celeba_vaeconv_elboweight14/visualizations/Samples_generated_unnormalized.png)

##### Latent space analysis

![Projection_pca_samples](experiments/celeba_vaeconv_elboweight14/plots/Projection_pca_samples.png)
![Projection_pca_generated](experiments/celeba_vaeconv_elboweight14/plots/Projection_pca_generated.png)
![Projection_tsne_samples](experiments/celeba_vaeconv_elboweight14/plots/Projection_tsne_samples.png)
![Projection_tsne_generated](experiments/celeba_vaeconv_elboweight14/plots/Projection_tsne_generated.png)
![Equispace_generated](experiments/celeba_vaeconv_elboweight14/visualizations/Equispace_generated.png)

##### Interpolation

![Endpoints_interpolation](experiments/celeba_vaeconv_elboweight14/visualizations/Endpoints_interpolation.png)
![Interpolation](experiments/celeba_vaeconv_elboweight14/visualizations/Interpolation.png)

### Discussion
<a id="experiments_discussion"></a>

> Implementation:
>
> - [Configs](assignment/configs)
> - [Loss: weighted sum of SmoothL1Loss and ELBO (i.e., KL divergence of Gaussians)](assignment/losses)
> - [FrechetInceptionDistance](assignment/metrics/frechet.py)
> - [Run script (Jupyter notebook in this case)](assignment_5_run.ipynb)
>
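
The weighted loss listed above can be illustrated as follows. This is a sketch under assumptions, not the code in `assignment/losses`: the function names are hypothetical, the reconstruction term is averaged over all elements with `beta=1.0`, the KL term is summed over the latent dimensions, and the weight is applied to the KL term only.

```python
import math


def smooth_l1(prediction, target, beta=1.0):
    """Mean Smooth L1 loss: quadratic below beta, linear above."""
    total = 0.0
    for p, t in zip(prediction, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(prediction)


def kl_gaussian(mean, log_var):
    """KL(N(mean, exp(log_var)) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mean, log_var)
    )


def loss_vae(prediction, target, mean, log_var, weight_elbo):
    """Reconstruction term plus weighted KL term (assumed form of the loss)."""
    return smooth_l1(prediction, target) + weight_elbo * kl_gaussian(mean, log_var)


print(smooth_l1([0.0, 2.0], [0.5, 0.0]))  # 0.8125
print(kl_gaussian([1.0], [0.0]))          # 0.5
print(loss_vae([0.0, 2.0], [0.5, 0.0], [1.0], [0.0], weight_elbo=1.0e-2))
```

The KL term is zero only when the predicted mean is 0 and the log-variance is 0, so a larger weight pulls the latent codes more strongly toward a standard Gaussian.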

> Some remarks:
>
> - All models were trained for 20 epochs. Training took some time because the dataset is very large, but it converged after a few epochs in every experiment.
> - A warmup + exponential decay scheduler was used (partly in an attempt to make the model learn more consistently instead of mostly in the first epoch).
> - In the plots, the loss looks like it stagnates, but in TensorBoard it looked reasonable, especially the validation loss. I did not have the time to make my plotting function show this.
> - The Fréchet Inception Distance (FID) is rather costly to evaluate, but it shows the training progress nicely.
> - No class labels are available in this dataset of faces, so the latent space analysis cannot be used to assess how well the latent space clusters. One could have used the identities of the persons or some of the 40 attributes that each image is annotated with, but this would still lead to far too many classes to visualize and distinguish. Therefore, the analysis can only be used to make out the general "shape" of the data distributions.
>
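
A warmup followed by exponential decay, as mentioned in the remarks, can be written as a per-epoch multiplicative factor on the base learning rate. The sketch below is illustrative only: the function name `lr_factor` and the values `epochs_warmup=2` and `gamma=0.9` are assumptions, not the settings used in the experiments.

```python
def lr_factor(epoch, epochs_warmup=2, gamma=0.9):
    """Learning-rate factor: linear warmup, then exponential decay."""
    if epoch < epochs_warmup:
        # Ramp up linearly and reach 1.0 in the last warmup epoch.
        return (epoch + 1) / epochs_warmup
    # Decay by a constant factor gamma per epoch after the warmup.
    return gamma ** (epoch - epochs_warmup + 1)


print([round(lr_factor(epoch), 3) for epoch in range(5)])
# [0.5, 1.0, 0.9, 0.81, 0.729]
```

A function of this shape could be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the initial learning rate by the returned factor.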

> Results:
>
> On the test data (without weighting of the losses):

| Model            |  FID  | ELBO loss | SmoothL1Loss | Learnable params |
| :--------------- | :---: | :-------: | :----------: | ---------------: |
| ELBO weight 1e-2 | 8.295 |   1.776   |    0.419     |       13,372,041 |
| ELBO weight 1e-3 | 7.873 |  12.620   |    0.374     |       13,372,041 |
| ELBO weight 1e-4 | 7.243 |  60.513   |    0.358     |       13,372,041 |


> Observations/Conclusion:
>
> - Training has mostly converged after a few epochs. The FID seems to increase in later epochs, but the loss gives no indication of overfitting.
> - Generally, the SmoothL1Loss is up to about 2 orders of magnitude smaller than the ELBO loss (this was different when the data was not normalized). In all cases, the reconstruction loss (SmoothL1Loss) still dominates the weighted total. Therefore, during training, it is reduced early while the ELBO loss may increase during the first epochs. Later, little further improvement of the reconstruction is possible, so the ELBO loss decreases after some epochs.
> - At the cost of a seemingly only slightly worse reconstruction, the ELBO loss (informally, the deviation of the latent distribution from a Gaussian) can be reduced greatly. Notably, a qualitative look at the reconstructions shows that the difference in reconstruction quality is much greater than the metrics and losses suggest. Many images of the 1e-2 weight model look identical, whereas with the 1e-4 weight the faces are well recognizable. The opposite can be observed for the generated images: here, the lowest ELBO weight makes the images unrealistic and unnatural, and a more Gaussian-like latent distribution leads to better generated images. This can also be observed in the PCA plots. The interpolations look nice for all three models, but the better generalization capabilities of the higher-weight models suggest that they might be better suited for use cases like latent space interpolation.
> - However, in practice, the reconstruction is of course more important. I would say that a weight value of 1.0e-3 or slightly below is best (in this setting, using normalized data and the SmoothL1Loss).

> 
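
To make the relative weighting of the two loss terms concrete, the share of the weighted ELBO term in the total loss can be computed from the test metrics printed by the evaluator (rounded to four decimals here). The sketch assumes that the total is `SmoothL1Loss + weight * ELBO`; the helper name `share_elbo` is hypothetical.

```python
def share_elbo(weight, loss_rec, loss_elbo):
    """Fraction of the weighted total loss contributed by the ELBO term."""
    term_elbo = weight * loss_elbo
    return term_elbo / (loss_rec + term_elbo)


# ELBO weight -> (SmoothL1Loss, ELBOGaussian) on the test data.
METRICS_TEST = {
    1.0e-2: (0.4192, 1.7758),
    1.0e-3: (0.3743, 12.6198),
    1.0e-4: (0.3577, 60.5126),
}

for weight, (loss_rec, loss_elbo) in METRICS_TEST.items():
    share = share_elbo(weight, loss_rec, loss_elbo)
    print(f"ELBO weight {weight:.0e}: {share:.1%} of the total loss")
```

Under this assumption, the weighted ELBO term stays below 5% of the total for all three weights, which is consistent with the reconstruction loss dominating the training signal.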