# Assignment 5

I attempted to use the bounding boxes for cropping but they are wrong.

Team:

- Bertan Karacora

Tasks:

- a) Write **Convolutional** Variational Autoencoder (ConvVAE) for **CelebA** dataset: https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
    - Use Conv. layers for encoder and Transposed-Conv. layers for decoder.
    - You are allowed to use one FC-layer in each module for the bottleneck, but it's not necessary
- b) Investigate the importance of the KL-divergence weight. For this purpose, train multiple models using different weighting values and investigate how this value affects the generation performance.
- c) Generate new images by sampling latent vectors, investigate latent space and visualize some interpolations.
- d) Compare the models from a) and b)
    - Qualitative comparison. Which images look better?
    - Quantitative comparison between models using the Fréchet Inception Distance: https://arxiv.org/abs/1706.08500

## Contents

- [x] [Setup](#setup)
    - [x] [Config](#setup_config)
    - [x] [Modules](#setup_modules)
    - [x] [Paths and names](#setup_paths_and_names)
- [x] [Data](#data)
    - [x] [Visualization](#data_visualization)
    - [x] [Discussion](#data_discussion)
- [x] [Comparison of recurrent models](#comparison_of_recurrent_models)
    - [x] [LSTM](#comparison_of_recurrent_models_lstm)
    - [x] [Own LSTM](#comparison_of_recurrent_models_own_lstm)
    - [x] [Own ConvLSTM](#comparison_of_recurrent_models_own_convlstm)
    - [x] [GRU](#comparison_of_recurrent_models_gru)
    - [x] [Discussion](#comparison_of_recurrent_models_discussion)
- [x] [Comparison with 3D CNN](#comparison_with_3d_cnn)
    - [x] [Results](#comparison_with_3d_cnn_results)
    - [x] [Discussion](#comparison_with_3d_cnn_discussion)

## Setup
<a id="setup"></a>

In [1]:
%load_ext autoreload
%autoreload 2

### Config
<a id="setup_config"></a>

In [2]:
import assignment.config as config

Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_4/assignment/config.yaml


In [3]:
config.list_available()

['kth_customlstm', 'kth_gru', 'kth_lstm', 'kth_lstmconv', 'kth_resnet3d']

### Modules
<a id="setup_modules"></a>

In [4]:
from pathlib import Path

from assignment.evaluation.evaluator import Evaluator
import assignment.libs.utils_checkpoints as utils_checkpoints

### Paths and names
<a id="setup_paths_and_names"></a>

In [14]:
name_exp_lstm = "kth_lstm"
name_exp_customlstm = "kth_customlstm"
name_exp_lstmconv = "kth_lstmconv"
name_exp_gru = "kth_gru"
name_exp_resnet3d = "kth_resnet3d"

path_dir_exp_lstm = Path(config._PATH_DIR_EXPS) / name_exp_lstm
path_dir_exp_customlstm = Path(config._PATH_DIR_EXPS) / name_exp_customlstm
path_dir_exp_lstmconv = Path(config._PATH_DIR_EXPS) / name_exp_lstmconv
path_dir_exp_gru = Path(config._PATH_DIR_EXPS) / name_exp_gru
path_dir_exp_resnet3d = Path(config._PATH_DIR_EXPS) / name_exp_resnet3d

## Data
<a id="data"></a>

### Visualization
<a id="data_visualization"></a>

#### Test dataset

![Test dataset](experiments/kth_lstm/visualizations/Images_test.png)

#### Validate dataset

![Validate dataset](experiments/kth_lstm/visualizations/Images_validate.png)

#### Train dataset

![Train dataset](experiments/kth_lstm/visualizations/Images_train.png)

### Discussion
<a id="data_discussion"></a>

> Just some remarks:
> 
> - The same data setup and the same augmentations are used in all experiments
> - The preprocessed data is used (split into frames, cut to $64 \times 64$)
> - A sequence length of $15$ has been used
> - The following augmentations are applied to the train dataset:
>   - RandomHorizontalFlip
>   - RandomRotation
>   - ColorJitter
>   - GaussianNoise (own implementation) + Clipping
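
The noise augmentation listed above can be sketched without any framework. This is a minimal sketch, not the notebook's own transform: the function name `add_gaussian_noise`, the `std` values, and the use of plain Python lists instead of tensors are assumptions made for illustration. It assumes pixel values are already scaled to $[0, 1]$, so clipping keeps the augmented frame in the valid range.

```python
import random


def add_gaussian_noise(pixels, std=0.05, rng=None):
    """Add zero-mean Gaussian noise to each pixel and clip to [0, 1].

    `pixels` is a flat list of floats in [0, 1]; `std` and `rng` are
    hypothetical parameters chosen for this sketch.
    """
    rng = rng or random.Random()
    noisy = (p + rng.gauss(0.0, std) for p in pixels)
    # Clipping keeps the augmented image in the valid intensity range.
    return [min(1.0, max(0.0, value)) for value in noisy]


frame = [0.0, 0.02, 0.5, 0.98, 1.0]
augmented = add_gaussian_noise(frame, std=0.1, rng=random.Random(0))
print(augmented)
```

Without the clipping step, pixels close to 0 or 1 could leave the range the model sees at test time, where no noise is applied.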

## Comparison of recurrent models
<a id="comparison_of_recurrent_models"></a>

### LSTM
<a id="comparison_of_recurrent_models_lstm"></a>

![LSTM Loss](experiments/kth_lstm/plots/Loss.png)
![LSTM Metrics](experiments/kth_lstm/plots/Metrics.png)

In [9]:
config.set_config_exp(path_dir_exp_lstm)
_, model, _, _ = utils_checkpoints.load(path_dir_exp_lstm / "checkpoints" / "final.pth")

evaluator = Evaluator(name_exp_lstm, model)
evaluator.evaluate()

print(f"Loss on test data: {evaluator.log['total']['loss']}")
print("Metrics on test data")
for name, metrics in evaluator.log["total"]["metrics"].items():
    print(f"    {name:<10}: {metrics}")

Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_4/experiments/kth_lstm/config.yaml
Initializing dataloader...
Test dataset
Dataset KTH
    Number of datapoints: 4670
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_4/data/kth
    Split: test
    Transform: ToDtype(
  scale=True
  (transform_tv): ToDtype(scale=True)
)
    Transform of target: None
Initializing dataloader finished
Initializing criterion...
Initializing criterion finished
Initializing measurers...
Initializing measurers finished
Layer (type:depth-idx)                   Output Shape              Param #
├─CNN2dEncoder: 1-1                      [-1, 512]                 --
|    └─Sequential: 2-1                   [-1, 512, 4, 4]           --
|    |    └─BlockCNN2d: 3-1              [-1, 64, 32, 32]          1,792
|    |    └─BlockCNN2d: 3-2              [-1, 128, 16, 16]         205,184
|    |    └─BlockCNN2d: 3-3              [-1, 256, 8, 8]           819,968
|    |  

Validating: Batch 070 | Loss 3.04440: 100%|██████████| 73/73 [00:10<00:00,  6.80it/s]

Loss on test data: 1.759734357079806
Metrics on test data
    Accuracy  : 0.6143468950749464





### Own LSTM
<a id="comparison_of_recurrent_models_own_lstm"></a>

![Own LSTM Loss](experiments/kth_customlstm/plots/Loss.png)
![Own LSTM Metrics](experiments/kth_customlstm/plots/Metrics.png)

In [7]:
config.set_config_exp(path_dir_exp_customlstm)
_, model, _, _ = utils_checkpoints.load(path_dir_exp_customlstm / "checkpoints" / "final.pth")

evaluator = Evaluator(name_exp_customlstm, model)
evaluator.evaluate()

print(f"Loss on test data: {evaluator.log['total']['loss']}")
print("Metrics on test data")
for name, metrics in evaluator.log["total"]["metrics"].items():
    print(f"    {name:<10}: {metrics}")

Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_4/experiments/kth_customlstm/config.yaml
Initializing dataloader...
Test dataset
Dataset KTH
    Number of datapoints: 4670
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_4/data/kth
    Split: test
    Transform: ToDtype(
  scale=True
  (transform_tv): ToDtype(scale=True)
)
    Transform of target: None
Initializing dataloader finished
Initializing criterion...
Initializing criterion finished
Initializing measurers...
Initializing measurers finished
Layer (type:depth-idx)                   Output Shape              Param #
├─CNN2dEncoder: 1-1                      [-1, 512]                 --
|    └─Sequential: 2-1                   [-1, 512, 4, 4]           --
|    |    └─BlockCNN2d: 3-1              [-1, 64, 32, 32]          1,792
|    |    └─BlockCNN2d: 3-2              [-1, 128, 16, 16]         205,184
|    |    └─BlockCNN2d: 3-3              [-1, 256, 8, 8]           819,968
| 

Validating: Batch 070 | Loss 1.13969: 100%|██████████| 73/73 [00:11<00:00,  6.41it/s]

Loss on test data: 1.0212732574286247
Metrics on test data
    Accuracy  : 0.6910064239828694





### Own ConvLSTM
<a id="comparison_of_recurrent_models_own_convlstm"></a>

![Own ConvLSTM Loss](experiments/kth_lstmconv/plots/Loss.png)
![Own ConvLSTM Metrics](experiments/kth_lstmconv/plots/Metrics.png)

In [11]:
config.set_config_exp(path_dir_exp_lstmconv)
_, model, _, _ = utils_checkpoints.load(path_dir_exp_lstmconv / "checkpoints" / "final.pth")

evaluator = Evaluator(name_exp_lstmconv, model)
evaluator.evaluate()

print(f"Loss on test data: {evaluator.log['total']['loss']}")
print("Metrics on test data")
for name, metrics in evaluator.log["total"]["metrics"].items():
    print(f"    {name:<10}: {metrics}")

Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_4/experiments/kth_lstmconv/config.yaml
Initializing dataloader...
Test dataset
Dataset KTH
    Number of datapoints: 4670
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_4/data/kth
    Split: test
    Transform: ToDtype(
  scale=True
  (transform_tv): ToDtype(scale=True)
)
    Transform of target: None
Initializing dataloader finished
Initializing criterion...
Initializing criterion finished
Initializing measurers...
Initializing measurers finished
Layer (type:depth-idx)                   Output Shape              Param #
├─CNN2dEncoderSpatial: 1-1               [-1, 64, 2, 2]            --
|    └─Sequential: 2-1                   [-1, 64, 2, 2]            --
|    |    └─BlockCNN2d: 3-1              [-1, 32, 32, 32]          896
|    |    └─BlockCNN2d: 3-2              [-1, 64, 16, 16]          51,392
|    |    └─BlockCNN2d: 3-3              [-1, 128, 8, 8]           205,184
|    | 

Validating: Batch 070 | Loss 1.21665: 100%|██████████| 73/73 [00:05<00:00, 12.62it/s]

Loss on test data: 0.9880165955494506
Metrics on test data
    Accuracy  : 0.7109207708779444





### GRU
<a id="comparison_of_recurrent_models_gru"></a>

![GRU Loss](experiments/kth_gru/plots/Loss.png)
![GRU Metrics](experiments/kth_gru/plots/Metrics.png)

In [15]:
config.set_config_exp(path_dir_exp_gru)
_, model, _, _ = utils_checkpoints.load(path_dir_exp_gru / "checkpoints" / "final.pth")

evaluator = Evaluator(name_exp_gru, model)
evaluator.evaluate()

print(f"Loss on test data: {evaluator.log['total']['loss']}")
print("Metrics on test data")
for name, metrics in evaluator.log["total"]["metrics"].items():
    print(f"    {name:<10}: {metrics}")

Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_4/experiments/kth_gru/config.yaml
Initializing dataloader...
Test dataset
Dataset KTH
    Number of datapoints: 4670
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_4/data/kth
    Split: test
    Transform: ToDtype(
  scale=True
  (transform_tv): ToDtype(scale=True)
)
    Transform of target: None
Initializing dataloader finished
Initializing criterion...
Initializing criterion finished
Initializing measurers...
Initializing measurers finished
Layer (type:depth-idx)                   Output Shape              Param #
├─CNN2dEncoder: 1-1                      [-1, 512]                 --
|    └─Sequential: 2-1                   [-1, 512, 4, 4]           --
|    |    └─BlockCNN2d: 3-1              [-1, 64, 32, 32]          1,792
|    |    └─BlockCNN2d: 3-2              [-1, 128, 16, 16]         205,184
|    |    └─BlockCNN2d: 3-3              [-1, 256, 8, 8]           819,968
|    |   

Validating: Batch 070 | Loss 0.06183: 100%|██████████| 73/73 [00:04<00:00, 14.80it/s]

Loss on test data: 1.3556768455616413
Metrics on test data
    Accuracy  : 0.6884368308351178





### Discussion
<a id="comparison_of_recurrent_models_discussion"></a>

> Implementation:
>
> - [Own LSTM and LSTM using PyTorch: assignment/models/lstm.py](assignment/models/lstm.py)
> - [GRU using PyTorch: assignment/models/gru.py](assignment/models/gru.py)
> - [RNN classifier wrapper: assignment/models/rnn.py](assignment/models/rnn.py)
> - [CNN encoder: assignment/models/cnn.py](assignment/models/cnn.py)
> - [MLP classifier: assignment/models/mlp.py](assignment/models/mlp.py)
> - [Experiment recipes: assignment/configs/...](assignment/configs)
> - [KTH dataset with automatic filtering of empty frames: assignment/datasets/kth.py](assignment/datasets/kth.py)
> - [Results, tensorboard logs, visualizations, plots: experiments/...](experiments/)
>

> Some remarks:
>
> - A rather deep convolutional encoder made the difference compared with earlier experiments not shown here.
> - Despite weight regularization, a learning rate scheduler, and data augmentation, all experiments overfit.
> - Except for the LSTMConv, the CNN encoder uses the hidden dimensions $[64, 128, 256, 512]$.
> - All recurrent models consist of 5 cells/layers with channel depth 512.
> - A ReduceLROnPlateau scheduler and L2 regularization have been applied.
> - The accuracy is reported for the final model, although earlier epochs may have reached a higher accuracy. This is why the PyTorch LSTM has a lower accuracy. For reference, see the accuracy plots above (for the validation dataset). To fix this, I will implement functionality to save the best-performing checkpoint (in addition to the final one) by the next assignment. The accuracies still give a good rough indication.
>
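
The planned "save the best checkpoint" behaviour could look roughly like the following. This is a minimal sketch under stated assumptions, not the project's implementation: the class `BestCheckpointTracker` and the `save_fn` callback are hypothetical names, and the accuracies in the usage loop are made up to mimic a run that peaks before the final epoch.

```python
import math


class BestCheckpointTracker:
    """Remember the epoch with the highest validation accuracy.

    `save_fn` is a hypothetical callback (for example a wrapper around the
    project's checkpoint utilities); it is called only on improvement.
    """

    def __init__(self, save_fn):
        self.save_fn = save_fn
        self.best_accuracy = -math.inf
        self.best_epoch = None

    def update(self, epoch, accuracy):
        """Return True and trigger a save if `accuracy` is a new best."""
        if accuracy > self.best_accuracy:
            self.best_accuracy = accuracy
            self.best_epoch = epoch
            self.save_fn(epoch, accuracy)
            return True
        return False


saved = []
tracker = BestCheckpointTracker(lambda epoch, acc: saved.append((epoch, acc)))
# Made-up validation accuracies that peak before the final epoch.
for epoch, accuracy in enumerate([0.55, 0.68, 0.72, 0.66, 0.61], start=1):
    tracker.update(epoch, accuracy)

print(tracker.best_epoch, tracker.best_accuracy)  # 3 0.72
print(saved)
```

Evaluating the checkpoint from the best epoch instead of the final one would remove the dependence on where training happened to stop.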

> Results:
>
> Times are for one loop over the test dataset (4670 sequences of 15 frames each).
> Training time is for a single epoch looping over the train set (4469 sequences).

| Model             | Accuracy | Training time | Inference time (still batched) | Learnable parameters |
| :---------------- | :------: | :-----------: | :----------------------------: | -------------------: |
| LSTM (PyTorch)    |  0.614   |     17 s      |             10 s               |           15,207,558 |
| LSTM (Own)        |  0.691   |     17 s      |             11 s               |           15,197,318 |
| LSTMConv (Own)    |  0.710   |      8 s      |              5 s               |            5,687,494 |
| GRU               |  0.688   |     16 s      |              4 s               |           12,580,998 |
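
To make the inference times easier to compare, they can be converted into throughput. The sketch below only uses numbers stated above (4670 test sequences of 15 frames, and the inference times from the table); because the times are rounded to whole seconds, the resulting figures are rough estimates rather than measurements.

```python
# Inference times (seconds) copied from the table above; they are rounded
# to whole seconds, so the derived throughput is only approximate.
NUM_TEST_SEQUENCES = 4670
FRAMES_PER_SEQUENCE = 15

inference_seconds = {
    "LSTM (PyTorch)": 10,
    "LSTM (Own)": 11,
    "LSTMConv (Own)": 5,
    "GRU": 4,
}

throughput = {
    name: NUM_TEST_SEQUENCES / seconds
    for name, seconds in inference_seconds.items()
}

for name, sequences_per_second in throughput.items():
    frames_per_second = sequences_per_second * FRAMES_PER_SEQUENCE
    print(f"{name:<15} {sequences_per_second:7.1f} seq/s {frames_per_second:9.1f} frames/s")
```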


> Observations/Conclusion:
>
> - My own implementation of the LSTM seems to behave very similarly to the PyTorch one, but not identically.
> - The LSTMConv model performed best (accuracy 0.710) while requiring far fewer parameters than the other models and being the fastest to train; only the GRU was marginally faster at inference (4 s vs. 5 s). This is similar to what we observed when comparing 2D CNNs with MLPs: there is a great advantage in acknowledging the spatial structure of an image instead of disregarding it completely.
> - The GRU cells are lighter than the LSTM cells because they have fewer gates. Their performance is very similar.
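
The gate-count argument can be checked with a little arithmetic. The sketch below assumes the PyTorch parameter layout for `nn.LSTM` and `nn.GRU` (per layer: an input-to-hidden matrix, a hidden-to-hidden matrix, and two bias vectors, each stacked over 4 gate blocks for the LSTM and 3 for the GRU). It also assumes that all 5 layers have input and hidden size 512, as the remarks suggest. The helper `recurrent_params` is a hypothetical name. Under these assumptions, the LSTM/GRU difference matches the gap between the parameter counts in the table: 15,207,558 minus 12,580,998 equals 2,626,560.

```python
def recurrent_params(num_gates, input_size, hidden_size, num_layers, num_biases=2):
    """Count parameters of a stacked gated RNN.

    Each layer holds `num_gates` blocks of an input-to-hidden matrix, a
    hidden-to-hidden matrix, and `num_biases` bias vectors.
    """
    total = 0
    for layer in range(num_layers):
        # Only the first layer sees the raw input; later layers see hidden states.
        layer_input = input_size if layer == 0 else hidden_size
        weights = num_gates * hidden_size * (layer_input + hidden_size)
        biases = num_biases * num_gates * hidden_size
        total += weights + biases
    return total


# Assumed sizes: 5 layers, 512-dimensional input and hidden state.
lstm = recurrent_params(num_gates=4, input_size=512, hidden_size=512, num_layers=5)
gru = recurrent_params(num_gates=3, input_size=512, hidden_size=512, num_layers=5)
lstm_single_bias = recurrent_params(4, 512, 512, 5, num_biases=1)

print(f"LSTM: {lstm:,}")                                # LSTM: 10,506,240
print(f"GRU:  {gru:,}")                                 # GRU:  7,879,680
print(f"LSTM - GRU: {lstm - gru:,}")                    # LSTM - GRU: 2,626,560
print(f"Bias difference: {lstm - lstm_single_bias:,}")  # Bias difference: 10,240
```

The 10,240-parameter gap between the two LSTM variants in the table equals $5 \times 4 \times 512$, which would be consistent with the own implementation using one bias vector per layer instead of two. This is an inference from the numbers, not something confirmed by the code shown here.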

## Comparison with 3D CNN
<a id="comparison_with_3d_cnn"></a>

### Results
<a id="comparison_with_3d_cnn_results"></a>

![ResNet3d Loss](experiments/kth_resnet3d/plots/Loss.png)
![ResNet3d Metrics](experiments/kth_resnet3d/plots/Metrics.png)

In [None]:
config.set_config_exp(path_dir_exp_resnet3d)
_, model, _, _ = utils_checkpoints.load(path_dir_exp_resnet3d / "checkpoints" / "final.pth")

evaluator = Evaluator(name_exp_resnet3d, model)
evaluator.evaluate()

print(f"Loss on test data: {evaluator.log['total']['loss']}")
print("Metrics on test data")
for name, metrics in evaluator.log["total"]["metrics"].items():
    print(f"    {name:<10}: {metrics}")

### Discussion
<a id="comparison_with_3d_cnn_discussion"></a>

> Implementation:
>
> - [3D ResNet (mostly not own implementation, see code): assignment/models/resnet.py](assignment/models/resnet.py)
>


> Observations/Conclusion:
>
> - Didn't finish in time...
