# Assignment 4

Team:

- Bertan Karacora

Tasks:

- For your experiments, use at least one augmentation from each of the following types:
    - Spatial augmentations (rotation, mirroring, cropping, ...)
    - Other augmentations (color jitter, Gaussian noise, ...)
    - One (or more) of the following advanced augmentations:
        - **CutMix**: https://arxiv.org/pdf/1905.04899.pdf
        - **Mixup**: https://arxiv.org/pdf/1710.09412.pdf

- **Experiment 1:** Using your aforementioned augmentations:
    - Fine-tune ResNet, MobileNet, and ConvNext on your augmented dataset for car type classification and compare them.
    - Compare the following on a model of your choice: fine-tuned model, model as fixed feature extractor, and model with a combined approach.
    - Log your losses and accuracies into Tensorboard (or some other logging tool)
    - **Extra Point**:
        - Fine-tune a Transformer-based model (e.g. SwinTransformer). Compare the performance (accuracy, confusion matrix, training time, loss landscape, ...) with the one from the convolutional models.
   
- **Experiment 2:** Try to get the best performance possible on this dataset
    - Fine-tune a pretrained neural network of your choice for classification.
    - Select a good training recipe: augmentations, optimizer, learning rate scheduling, classifier, loss function, ...

## Contents

- [x] [Setup](#setup)
    - [x] [Config](#setup_config)
    - [x] [Modules](#setup_modules)
    - [x] [Paths and names](#setup_paths_and_names)
- [x] [Data augmentation](#data_augmentation)
    - [x] [Visualization](#data_augmentation_visualization)
    - [x] [Discussion](#data_augmentation_discussion)
- [x] [Comparison of fine-tuned models](#comparison_of_fine_tuned_models)
    - [x] [ResNet](#comparison_of_fine_tuned_models_resnet)
    - [x] [MobileNet](#comparison_of_fine_tuned_models_mobilenet)
    - [x] [ConvNext](#comparison_of_fine_tuned_models_convnext)
    - [x] [Discussion](#comparison_of_fine_tuned_models_discussion)
- [x] [Comparison of transfer learning approaches](#comparison_of_transfer_learning_approaches)
    - [x] [Fixed feature extraction](#comparison_of_transfer_learning_approaches_fixed_feature_extraction)
    - [x] [Fine-tuning](#comparison_of_transfer_learning_approaches_fine_tuning)
    - [x] [Combined approach](#comparison_of_transfer_learning_approaches_combined_approach)
    - [x] [Discussion](#comparison_of_transfer_learning_approaches_discussion)
- [x] [Tensorboard](#tensorboard)
    - [x] [Visualization](#tensorboard_visualization)
    - [x] [Discussion](#tensorboard_discussion)
- [ ] [Fine-tuning a transformer-based model](#fine_tuning_a_transformer_based_model)
    - [ ] [Training and evaluation](#fine_tuning_a_transformer_based_model_training_and_evaluation)
    - [ ] [Discussion](#fine_tuning_a_transformer_based_model_discussion)
- [ ] [Car type classification](#car_type_classification)
    - [ ] [Training and evaluation](#car_type_classification_training_and_evaluation)
    - [ ] [Discussion](#car_type_classification_discussion)

[LSTM implementation](assignment/models/lstm.py)

## Data

### Visualization

#### Test dataset

![Test dataset](experiments/kth_lstm/visualizations/Images_test.png)

#### Validate dataset

![Validate dataset](experiments/kth_lstm/visualizations/Images_validate.png)

#### Train dataset

![Train dataset](experiments/kth_lstm/visualizations/Images_train.png)

### Discussion
<a id="data_augmentation_discussion"></a>

> Besides data type conversion and normalization, the following augmentations are applied in all experiments:
>
> - Random cropping (and then resizing the resulting patch).
> - Random horizontal flip with probability $p=0.5$. A vertical flip would not be sensible, since cars are not photographed upside down.
> - Random rotation by an angle $d \sim [-20, 20]$ (in degrees). Small rotations are a realistic and cheap way to augment images.
> - Brightness jitter by a factor $f \sim [0.6, 1.4]$. Brightness varies strongly in real-world photographs (e.g., with the time of day).
> - Contrast jitter by a factor $f \sim [0.8, 1.2]$.
> - Saturation jitter by a factor $f \sim [0.9, 1.1]$.
> - Hue jitter by a shift $f \sim [-0.2, 0.2]$. The color of a car is largely arbitrary and independent of its type, so this augmentation makes sense.
> - Gaussian noise with mean $0.0$ and standard deviation $0.05$. Since the noise can push intensities outside the valid range, the values are clipped back to the interval $[0.0, 1.0]$ afterwards. This is done in a separate transform.
> - MixUp with the interpolation coefficient drawn from a beta distribution with $\alpha=\beta=1.0$.
>
> Some transforms that are not available in Torchvision have been implemented in `assignment/transforms/`.
> The config file of each experiment is used to define these parameters.
> CutMix has been tested, but MixUp seems to fulfill the same purpose in a smoother way by interpolating between images instead of pasting unrealistic cut-out patches.
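
The noise-then-clip step and the MixUp interpolation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in `assignment/transforms/`: plain Python lists stand in for image tensors, labels are assumed to be one-hot vectors, and the function names `add_gaussian_noise`, `clip`, and `mixup` are hypothetical.

```python
import random


def add_gaussian_noise(pixels, mean=0.0, std=0.05, rng=random):
    """Add independent Gaussian noise to every intensity value."""
    return [p + rng.gauss(mean, std) for p in pixels]


def clip(pixels, low=0.0, high=1.0):
    """Clip intensities back to [low, high] after the noise step."""
    return [min(max(p, low), high) for p in pixels]


def mixup(x_a, y_a, x_b, y_b, alpha=1.0, beta=1.0, rng=random):
    """Blend two samples and their one-hot labels with the same coefficient."""
    # With alpha = beta = 1.0 the beta distribution is uniform on [0, 1].
    lam = rng.betavariate(alpha, beta)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


# Toy data (assumption): two 3-pixel "images" with 2-class one-hot labels.
rng = random.Random(0)
image_a, label_a = [0.0, 0.5, 1.0], [1.0, 0.0]
image_b, label_b = [1.0, 0.5, 0.0], [0.0, 1.0]

noisy = clip(add_gaussian_noise(image_a, rng=rng))
mixed_image, mixed_label, lam = mixup(noisy, label_a, image_b, label_b, rng=rng)
```

Because the same coefficient is applied to images and labels, the mixed label remains a valid probability vector, which is why MixUp pairs naturally with a cross-entropy loss on soft targets.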

### LSTM

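```python
# `utils_checkpoints`, `Evaluator`, `path_dir_exp_lstm`, and `name_exp_lstm`
# are assumed to be imported or defined earlier in the notebook.

# Load the final checkpoint; only the model is needed for evaluation.
_, model, _, _ = utils_checkpoints.load(path_dir_exp_lstm / "checkpoints" / "final.pth")

evaluator = Evaluator(name_exp_lstm, model)
evaluator.evaluate()

# Single quotes inside the f-string keep this valid on Python versions before 3.12.
print(f"Loss on test data: {evaluator.log['total']['loss']}")
print("Metrics on test data")
for name, metrics in evaluator.log["total"]["metrics"].items():
    print(f"    {name:<10}: {metrics}")
```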