# Validation

In [7]:
import warnings
import os
import sys

%load_ext autoreload
%autoreload 2

warnings.filterwarnings('ignore')
current_dir = %pwd

parent_dir = os.path.abspath(os.path.join(current_dir, '../'))
sys.path.append(parent_dir)

from src.model_selection import continual_hyperparameter_selection


## Overview

In this part of the showcase we use the continual hyperparameter selection framework [[M. De Lange et al. 2022](https://arxiv.org/pdf/1909.08383)] to validate the selected models and find their best hyperparameters on all MNIST datasets. We do not include the other datasets, since their backbone models are too expensive to train.

Each model has its own set of hyperparameters, specified in the corresponding YAML file in the hyperparameter folder. Of these, only the learning rate and the buffer size are searched when looking for the optimal plasticity; the others are annealed by the `hyperparameter_drop` constant when considering stability.
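The two phases described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src.model_selection`: the function `select_for_task`, the `train_eval` callable, the toy accuracy model, the decay factor of 0.5, and the choice of a relative accuracy margin are all assumptions made for the example.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical signature: (lr, buffer_size, stability hyperparameters) -> accuracy in %.
TrainEval = Callable[[float, int, Dict[str, float]], float]


def select_for_task(
    train_eval: TrainEval,
    lrs: Sequence[float],
    buffer_sizes: Sequence[int],
    stability_init: Dict[str, float],
    accuracy_drop: float = 0.03,
    hyperparameter_drop: float = 0.5,  # assumed decay factor
    max_decays: int = 10,
) -> Tuple[float, int, Dict[str, float]]:
    # Phase 1 (plasticity): grid-search lr and buffer size with the
    # stability terms switched off, and keep the best reference accuracy.
    no_stability = {name: 0.0 for name in stability_init}
    scores = {
        (lr, size): train_eval(lr, size, no_stability)
        for lr, size in product(lrs, buffer_sizes)
    }
    best_lr, best_size = max(scores, key=scores.get)
    threshold = (1.0 - accuracy_drop) * scores[(best_lr, best_size)]

    # Phase 2 (stability): start from the strongest setting and anneal it
    # until the new-task accuracy is back within the allowed margin.
    stability = dict(stability_init)
    for _ in range(max_decays):
        if train_eval(best_lr, best_size, stability) >= threshold:
            break
        stability = {k: v * hyperparameter_drop for k, v in stability.items()}
    return best_lr, best_size, stability


def toy_train_eval(lr, buffer_size, stability):
    # Stand-in for real training: stronger regularisation lowers new-task accuracy.
    return 95.0 + 10.0 * lr + buffer_size / 1000.0 - 4.0 * stability["alpha"]


result = select_for_task(
    toy_train_eval,
    lrs=[0.1, 0.05, 0.03, 0.01],
    buffer_sizes=[100, 200, 500],
    stability_init={"alpha": 1.0},
)
print(result)
```

With a small `accuracy_drop` the loop keeps annealing until the stability terms are weak enough, which favors plasticity; a larger margin lets it stop earlier, with stronger stability terms.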

For every dataset, the validation split is 10% of the training set; it keeps the original augmentation function and uses the same seed for the split.
Both $\alpha$ and $\beta$ are initially set to 1 ($\beta$ is used only by the DER++ model).
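A seeded 90/10 split of this kind could look like the sketch below. The helper `split_indices` and the seed value 42 are hypothetical; the notebook's actual split logic is not shown here. The 60,000 figure is the size of the standard MNIST training set.

```python
import random


def split_indices(n_samples, val_fraction=0.1, seed=42):
    """Return (train_indices, val_indices) for a reproducible hold-out split."""
    rng = random.Random(seed)  # local generator, so the global RNG state is untouched
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(60_000)
print(len(train_idx), len(val_idx))
```

Because the generator is seeded, calling `split_indices` again with the same arguments returns the same partition, which keeps validation results comparable across runs.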

## Sequential MNIST with DER

We look for the best plasticity parameters among the following. 

In [8]:
import utils

utils.load_hparams('seq-mnist')

{'lr': [0.1, 0.05, 0.03, 0.01], 'buffer_size': [100, 200, 500]}

For this dataset, the metric for each task is the average of the task-incremental (TIL) and class-incremental (CIL) accuracies, since the dataset can be evaluated in both settings. We allow a maximum drop of 3% with respect to the best accuracy.
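As a small worked example of this criterion, the sketch below averages the two accuracies and checks a candidate against the margin. The function names and the sample accuracies are made up for illustration, and treating the 3% as a drop relative to the best accuracy (rather than an absolute number of points) is an assumption.

```python
def averaged_accuracy(til_acc, cil_acc):
    # Mean of the task-incremental and class-incremental accuracies (in %).
    return 0.5 * (til_acc + cil_acc)


def within_margin(candidate, best, accuracy_drop=0.03):
    # Accept a candidate that loses at most `accuracy_drop` relative to the best.
    return candidate >= (1.0 - accuracy_drop) * best


best = averaged_accuracy(til_acc=99.0, cil_acc=97.0)  # illustrative values
print(within_margin(96.0, best))
print(within_margin(94.0, best))
```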

As can be seen in the output below, the model achieves good hold-out performance on the test set after the continual selection process.

Let's see how a low accuracy-drop margin favors performance on new tasks.

In [12]:
continual_hyperparameter_selection('SequentialMNIST', accuracy_drop=0.03)

Epoch 1/1 - Loss: 0.21879360079765326
Task 0 - Best LR: 0.1 - Best Buffer Size: 100 - Best Accuracy on Validation set: 99.76303317535546

Epoch 1/1 - Loss: 0.197294950485229566
Task 1 - Best LR: 0.1 - Best Buffer Size: 200 - Best Accuracy on Validation set: 97.01986754966887

Epoch 1/1 - Loss: 0.47033986449241646
Task 2 - Best LR: 0.1 - Best Buffer Size: 500 - Best Accuracy on Validation set: 99.20071047957371

Epoch 1/1 - Loss: 0.24144992232322693
Task 3 - Best LR: 0.1 - Best Buffer Size: 100 - Best Accuracy on Validation set: 99.91789819376025

Epoch 1/1 - Loss: 0.24856895208358765
Task 4 - Best LR: 0.1 - Best Buffer Size: 100 - Best Accuracy on Validation set: 98.05084745762713

[[99.8108747   0.          0.          0.          0.        ]
 [99.90543735 97.2575906   0.          0.          0.        ]
 [99.62174941 86.19000979 98.29242263  0.          0.        ]
 [96.35933806 89.91185113 98.02561366 99.59718026  0.        ]
 [82.93144208 51.3712047  96.42475987 97.88519637 96.5708

{'best_lr': 0.1, 'best_buffer_size': 100, 'best_alpha': 1.0, 'best_beta': None}

If we instead increase the margin, we expect a more stable model: backward transfer should be higher, as should every metric that takes past performance into account.

In [13]:
continual_hyperparameter_selection('SequentialMNIST', accuracy_drop=0.2)

Epoch 1/1 - Loss: 0.21860808134078986
Task 0 - Best LR: 0.1 - Best Buffer Size: 500 - Best Accuracy on Validation set: 99.84202211690362

Epoch 1/1 - Loss: 0.34778621792793274
Task 1 - Best LR: 0.05 - Best Buffer Size: 200 - Best Accuracy on Validation set: 96.64735099337749

Epoch 1/1 - Loss: 0.44592320919036865
Task 2 - Best LR: 0.1 - Best Buffer Size: 500 - Best Accuracy on Validation set: 98.84547069271758

Epoch 1/1 - Loss: 0.19618040323257446
Task 3 - Best LR: 0.1 - Best Buffer Size: 100 - Best Accuracy on Validation set: 99.67159277504105

Epoch 1/1 - Loss: 0.30878353118896484
Task 4 - Best LR: 0.1 - Best Buffer Size: 200 - Best Accuracy on Validation set: 97.6271186440678

[[99.85815603  0.          0.          0.          0.        ]
 [99.71631206 96.52301665  0.          0.          0.        ]
 [99.47990544 87.80607248 98.02561366  0.          0.        ]
 [99.62174941 92.65426053 98.39914621 99.6978852   0.        ]
 [97.96690307 85.45543585 97.86552828 99.09365559 97.07513

{'best_lr': 0.1, 'best_buffer_size': 200, 'best_alpha': 1.0, 'best_beta': None}
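The expected effect on backward transfer can be checked on the two accuracy matrices printed above. Since the last row of each matrix is cut off in the output, this sketch only uses the first four rows, which are complete. It assumes that row $i$ holds the accuracies measured after training on task $i$ and column $j$ refers to task $j$, and it uses the usual definition of backward transfer, $\frac{1}{T-1}\sum_{j<T}(R_{T,j} - R_{j,j})$; the helper name `backward_transfer` is ours.

```python
import numpy as np


def backward_transfer(acc):
    """Mean change in accuracy on earlier tasks after learning the last task."""
    acc = np.asarray(acc, dtype=float)
    # Last row: accuracy on each earlier task after the final task was learned.
    # Diagonal: accuracy on each task right after it was learned.
    return float(np.mean(acc[-1, :-1] - np.diag(acc)[:-1]))


# First four rows and columns of the printed matrices.
drop_003 = [
    [99.8108747, 0.0, 0.0, 0.0],
    [99.90543735, 97.2575906, 0.0, 0.0],
    [99.62174941, 86.19000979, 98.29242263, 0.0],
    [96.35933806, 89.91185113, 98.02561366, 99.59718026],
]
drop_020 = [
    [99.85815603, 0.0, 0.0, 0.0],
    [99.71631206, 96.52301665, 0.0, 0.0],
    [99.47990544, 87.80607248, 98.02561366, 0.0],
    [99.62174941, 92.65426053, 98.39914621, 99.6978852],
]

bwt_003 = backward_transfer(drop_003)
bwt_020 = backward_transfer(drop_020)
print(f"accuracy_drop=0.03: {bwt_003:.2f}")
print(f"accuracy_drop=0.2:  {bwt_020:.2f}")
```

On these four tasks, the run with the larger margin loses noticeably less accuracy on earlier tasks, in line with the expectation stated above.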

## Sequential MNIST with DER++

The setting is the same as for the standard DER model, but we also look for the best $\beta$ parameter. 

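In the output below, we can see a slight improvement over DER on most tasks, with better hold-out performance on the test set, although accuracy on the second task still drops to about 62.5% after the last task.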

In [10]:
continual_hyperparameter_selection('SequentialMNIST', accuracy_drop=0.1, plus_plus=True)

Epoch 1/1 - Loss: 0.24193820357322693
Task 0 - Best LR: 0.1 - Best Buffer Size: 200 - Best Accuracy on Validation set: 99.92101105845181

Epoch 1/1 - Loss: 0.19866521656513214
Task 1 - Best LR: 0.1 - Best Buffer Size: 200 - Best Accuracy on Validation set: 97.26821192052981

Epoch 1/1 - Loss: 0.16898572444915771
Task 2 - Best LR: 0.1 - Best Buffer Size: 500 - Best Accuracy on Validation set: 98.84547069271758

Epoch 1/1 - Loss: 0.33798587322235115
Task 3 - Best LR: 0.05 - Best Buffer Size: 200 - Best Accuracy on Validation set: 99.83579638752053

Epoch 1/1 - Loss: 0.26001423597335815
Task 4 - Best LR: 0.1 - Best Buffer Size: 200 - Best Accuracy on Validation set: 97.96610169491525

[[99.85815603  0.          0.          0.          0.        ]
 [99.85815603 96.08227228  0.          0.          0.        ]
 [99.85815603 86.53281097 99.30629669  0.          0.        ]
 [99.71631206 92.85014691 98.98612593 99.54682779  0.        ]
 [99.71631206 62.5367287  98.82604055 97.2306143  97.5794

{'best_lr': 0.1, 'best_buffer_size': 200, 'best_alpha': 0.5, 'best_beta': 0.5}

## Permuted MNIST with DER

In [6]:
continual_hyperparameter_selection('PermutedMNIST', accuracy_drop=0.03)

Epoch 1/1 - Loss: 0.8987474441528322
Task 0 - Best LR: 0.1 - Best Buffer Size: 500 - Best Accuracy on Validation set: 90.66666666666666

Epoch 1/1 - Loss: 0.52269858121871957
Task 1 - Best LR: 0.1 - Best Buffer Size: 100 - Best Accuracy on Validation set: 91.5

Epoch 1/1 - Loss: 0.48922467231750495
Task 2 - Best LR: 0.1 - Best Buffer Size: 500 - Best Accuracy on Validation set: 92.05

Epoch 1/1 - Loss: 0.34915137290954595
Task 3 - Best LR: 0.1 - Best Buffer Size: 100 - Best Accuracy on Validation set: 92.95

Epoch 1/1 - Loss: 0.35405570268630984
Task 4 - Best LR: 0.1 - Best Buffer Size: 200 - Best Accuracy on Validation set: 92.93333333333334

Epoch 1/1 - Loss: 0.38096842169761667
Task 5 - Best LR: 0.1 - Best Buffer Size: 100 - Best Accuracy on Validation set: 93.28333333333333

Epoch 1/1 - Loss: 0.32035407423973083
Task 6 - Best LR: 0.1 - Best Buffer Size: 200 - Best Accuracy on Validation set: 93.13333333333334

Epoch 1/1 - Loss: 0.29441887140274055
Task 7 - Best LR: 0.1 - Best Buffe

{'best_lr': 0.1, 'best_buffer_size': 100, 'best_alpha': 1.0, 'best_beta': None}

## Permuted MNIST with DER++

In [5]:
continual_hyperparameter_selection('PermutedMNIST', accuracy_drop=0.03, plus_plus=True)

Epoch 1/1 - Loss: 0.6964246034622192
Task 0 - Best LR: 0.1 - Best Buffer Size: 200 - Best Accuracy on Validation set: 91.25

Epoch 1/1 - Loss: 0.59029185771942146
Task 1 - Best LR: 0.1 - Best Buffer Size: 200 - Best Accuracy on Validation set: 92.98333333333333

Epoch 1/1 - Loss: 0.8846891522407532
Task 2 - Best LR: 0.1 - Best Buffer Size: 500 - Best Accuracy on Validation set: 93.03333333333333

Epoch 1/1 - Loss: 0.5628597736358643
Task 3 - Best LR: 0.1 - Best Buffer Size: 500 - Best Accuracy on Validation set: 93.91666666666667

Epoch 1/1 - Loss: 0.5331328511238098
Task 4 - Best LR: 0.1 - Best Buffer Size: 200 - Best Accuracy on Validation set: 93.4

Epoch 1/1 - Loss: 0.53180593252182015
Task 5 - Best LR: 0.1 - Best Buffer Size: 500 - Best Accuracy on Validation set: 93.25

Epoch 1/1 - Loss: 0.5902520418167114
Task 6 - Best LR: 0.1 - Best Buffer Size: 500 - Best Accuracy on Validation set: 93.73333333333333

Epoch 1/1 - Loss: 0.7139091491699219
Task 7 - Best LR: 0.1 - Best Buffer Siz

{'best_lr': 0.1, 'best_buffer_size': 500, 'best_alpha': 1.0, 'best_beta': 1.0}

# Conclusion

The choice of the accuracy-drop margin influences continual hyperparameter selection:
- A higher allowed drop in accuracy favors the stability of the model, so we should see higher backward transfer and less catastrophic forgetting.
- A lower allowed drop in accuracy leads to a more plastic model, which learns new tasks better but runs a higher risk of catastrophic forgetting.

We could even optimize different metrics, depending on the scenario we are dealing with.