# Avalanche Standalone

This notebook shows how to use Avalanche components inside your own training loops. We will cover:
- Benchmarks
- Dynamic and MultiTask Models
- Replay methods

## 🤝 Run it on Google Colab

You can run _this chapter_ and play with it on Google Colaboratory: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AntonioCarta/avalanche-demo/blob/main/avl_standalone.ipynb)

https://github.com/AntonioCarta/avalanche-demo/blob/main/avl_standalone.ipynb

## Install Avalanche

First, let's install Avalanche. You can skip this step if you have installed it already.

In [None]:
!pip install avalanche-lib==0.5.0

## Benchmarks

You can import benchmarks from the `avalanche.benchmarks` module. We are going to use `SplitMNIST`, the class-incremental MNIST stream.

In [1]:
from avalanche.benchmarks import SplitMNIST

In [2]:
benchmark = SplitMNIST(
    n_experiences=5,
    return_task_id=False
)
train_stream = benchmark.train_stream
test_stream = benchmark.test_stream

print(f"len train: {len(train_stream)}, len test: {len(test_stream)}")

len train: 5, len test: 5


In [3]:
for exp in train_stream:
    eid = exp.current_experience
    curr_classes = exp.classes_in_this_experience
    tid = exp.task_label
    print(f"({eid}) - T{tid}, classes={curr_classes}")

(0) - T0, classes=[2, 3]
(1) - T0, classes=[8, 6]
(2) - T0, classes=[5, 7]
(3) - T0, classes=[0, 9]
(4) - T0, classes=[1, 4]


Notice that Avalanche does not assign classes to experiences in ascending order: by default, the class order is random.
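
As a rough mental model of what a class-incremental split does, the sketch below shuffles the class ids and deals them out evenly across experiences. `split_classes` is a hypothetical helper written for illustration, not an Avalanche function, and the real benchmark also partitions the samples themselves.

```python
import random

def split_classes(n_classes, n_experiences, seed=None):
    """Shuffle class ids and deal them out evenly across experiences."""
    if n_classes % n_experiences != 0:
        raise ValueError("n_classes must be divisible by n_experiences")
    rng = random.Random(seed)            # local RNG, global state is untouched
    class_order = list(range(n_classes))
    rng.shuffle(class_order)             # random class order
    per_exp = n_classes // n_experiences
    return [class_order[i * per_exp:(i + 1) * per_exp]
            for i in range(n_experiences)]

split_a = split_classes(10, 5, seed=1)
split_b = split_classes(10, 5, seed=1)
print(split_a)
print(split_a == split_b)  # the same seed gives the same split
```

Fixing the seed, as the later cells do with `seed=1`, makes the class order reproducible across runs.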

You can create a task-aware benchmark by setting `return_task_id=True`. Each experience then gets its own task label.

In [4]:
benchmark = SplitMNIST(
    n_experiences=5,
    return_task_id=True
)
train_stream = benchmark.train_stream
test_stream = benchmark.test_stream

print(f"len train: {len(train_stream)}, len test: {len(test_stream)}")

len train: 5, len test: 5


In [5]:
for exp in train_stream:
    eid = exp.current_experience
    curr_classes = exp.classes_in_this_experience
    tid = exp.task_label
    print(f"({eid}) - T{tid}, classes={curr_classes}")

(0) - T0, classes=[0, 5]
(1) - T1, classes=[8, 7]
(2) - T2, classes=[9, 3]
(3) - T3, classes=[2, 4]
(4) - T4, classes=[1, 6]


## Multi-Task Model

You can use standard PyTorch models.

In [6]:
import torch
from torch import nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(784, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU()
        )
        self.classifier = nn.Linear(512, 10)
    
    def forward(self, x, **kwargs):
        x = x.reshape(x.shape[0], -1)
        x = self.features(x)
        return self.classifier(x)
    
model = MLP()
model(torch.randn(32, 784))
print(model)

MLP(
  (features): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
  )
  (classifier): Linear(in_features=512, out_features=10, bias=True)
)


This MLP ignores `task_labels`: they are absorbed by `**kwargs` and never used. You can use Avalanche's `MultiHeadClassifier` to split the output layer by task id.

In [7]:
x = torch.randn(32, 784)
t = torch.randint(low=0, high=4, size=(32,))
model(x, task_labels=t)

tensor([[ 0.1224, -0.0981,  0.0701,  0.1571, -0.0390,  0.0817, -0.0917,  0.0924,
          0.0396, -0.0584],
        [-0.0256, -0.2319,  0.0117,  0.2477,  0.0060,  0.1637, -0.0587, -0.0685,
          0.0925, -0.0490],
        [ 0.1753, -0.0766,  0.0769,  0.1029,  0.0381,  0.0859, -0.0243,  0.0646,
          0.0877, -0.0614],
        [ 0.1560, -0.1191,  0.0392,  0.1476, -0.0180,  0.0283, -0.1122, -0.0116,
          0.1618,  0.0541],
        [ 0.1224,  0.0200,  0.0185,  0.1189,  0.0350,  0.0684, -0.2346,  0.0555,
         -0.0674, -0.1834],
        [ 0.0324, -0.0390,  0.0744,  0.1455, -0.0191, -0.0209, -0.1493, -0.1253,
          0.0086, -0.1311],
        [ 0.0764, -0.0901, -0.0229,  0.1111, -0.0336, -0.0343, -0.1621, -0.0245,
          0.0695, -0.0850],
        [-0.0245, -0.0969,  0.0976,  0.1234,  0.0315,  0.0225, -0.0494,  0.0012,
          0.1043, -0.0882],
        [-0.0150, -0.0201, -0.0356,  0.0671,  0.0365,  0.0346, -0.1827,  0.1166,
          0.0902, -0.0169],
        [ 0.1559, -

In [8]:
from avalanche.models import as_multitask

model_mt = as_multitask(MLP(), 'classifier')
print(model_mt)

MultiTaskDecorator(
  (model): MLP(
    (features): Sequential(
      (0): Linear(in_features=784, out_features=512, bias=True)
      (1): ReLU()
      (2): Linear(in_features=512, out_features=512, bias=True)
      (3): ReLU()
    )
    (classifier): Sequential()
  )
  (classifier): MultiHeadClassifier(
    (classifiers): ModuleDict(
      (0): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
    )
  )
)


In [9]:
t

tensor([3, 0, 0, 0, 1, 2, 1, 2, 2, 0, 0, 2, 2, 2, 2, 3, 2, 2, 0, 0, 1, 0, 1, 0,
        1, 2, 1, 2, 2, 0, 1, 1])

The model still doesn't know about all the tasks because it has never seen them. Only the head for task 0 exists so far.

In [10]:
model_mt(x, task_labels=t)

KeyError: '1'

You have to adapt the model to each experience so that the missing heads are created.

In [11]:
from avalanche.models.utils import avalanche_model_adaptation

for exp in benchmark.train_stream:
    avalanche_model_adaptation(model_mt, exp)

print(model_mt)

MultiTaskDecorator(
  (model): MLP(
    (features): Sequential(
      (0): Linear(in_features=784, out_features=512, bias=True)
      (1): ReLU()
      (2): Linear(in_features=512, out_features=512, bias=True)
      (3): ReLU()
    )
    (classifier): Sequential()
  )
  (classifier): MultiHeadClassifier(
    (classifiers): ModuleDict(
      (0): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
      (1): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
      (2): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
      (3): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
      (4): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
    )
  )
)


Now the model has been adapted to all the tasks. A separate head for each task is available for classification.

In [12]:
model_mt(x, task_labels=t)

tensor([[-1.0000e+03, -1.0000e+03,  4.3231e-02, -1.0000e+03, -3.0058e-02,
         -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03],
        [ 9.6094e-02, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03,
         -2.3990e-01, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03],
        [ 7.2177e-02, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03,
         -9.8190e-02, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03],
        [ 2.4720e-03, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03,
         -9.9390e-02, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03],
        [-1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03,
         -1.0000e+03, -1.0000e+03, -1.0763e-01,  2.4434e-01, -1.0000e+03],
        [-1.0000e+03, -1.0000e+03, -1.0000e+03,  4.2126e-02, -1.0000e+03,
         -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03,  1.4921e-02],
        [-1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03,
         -1.0000e+03, -1.0000e+0

## Training loop - Finetuning (with Multi-head model)

We can train the model continually by iterating over the `train_stream` provided by the benchmark.

In [13]:
from torch.optim import SGD
from torch.nn import CrossEntropyLoss
from avalanche.models import SimpleMLP
from avalanche.training import Naive
from torch.utils.data import DataLoader
import torch.nn.functional as F

# scenario
benchmark = SplitMNIST(
    n_experiences=5,
    return_task_id=True,
    seed=1
)

# model
model = as_multitask(MLP(), 'classifier')
optimizer = SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = CrossEntropyLoss()

In [14]:
from avalanche.models.utils import avalanche_model_adaptation
from avalanche.models.dynamic_optimizers import reset_optimizer

device = 'cpu'
num_epochs = 2

for exp in benchmark.train_stream:
    print(f"Experience ({exp.current_experience})")
    model.train()

    # AVALANCHE: model adaptation step adds new parameters
    # In our model: adds a new head for each task
    avalanche_model_adaptation(model, exp)
    # AVALANCHE: We just added parameters to the model. We must also update the optimizer
    reset_optimizer(optimizer, model)
    
    dataset = exp.dataset
    dataset = dataset.train()  # AVALANCHE: activate correct transformation group
    
    for epoch in range(num_epochs):
        dl = DataLoader(dataset, batch_size=128)
        for x, y, t in dl:
            x, y, t = x.to(device), y.to(device), t.to(device)

            optimizer.zero_grad()
            # AVALANCHE: MultiTaskModels need task labels
            output = model(x, t)
            loss = F.cross_entropy(output, y)
            loss.backward()
            optimizer.step()
        print('Train Epoch: {} \tLoss: {:.6f}'.format(epoch, loss.item()))


Experience (0)
Train Epoch: 0 	Loss: 0.040961
Train Epoch: 1 	Loss: 0.023422
Experience (1)
Train Epoch: 0 	Loss: 0.028009
Train Epoch: 1 	Loss: 0.038673
Experience (2)
Train Epoch: 0 	Loss: 0.063915
Train Epoch: 1 	Loss: 0.040975
Experience (3)
Train Epoch: 0 	Loss: 0.059366
Train Epoch: 1 	Loss: 0.019216
Experience (4)
Train Epoch: 0 	Loss: 0.066897
Train Epoch: 1 	Loss: 0.011423
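
The loop above resets the optimizer after every adaptation step. The reason is that an optimizer typically captures its list of parameters when it is constructed, so parameters added later are never updated. The toy sketch below shows the effect with hypothetical `Param` and `ToySGD` classes, which are simplified stand-ins and not PyTorch or Avalanche code.

```python
class Param:
    """A scalar parameter with a gradient slot."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

class ToySGD:
    """Plain SGD over a list of parameters captured at construction time."""
    def __init__(self, params, lr):
        self.params = list(params)  # snapshot: later additions are not seen
        self.lr = lr

    def step(self):
        for p in self.params:
            p.value -= self.lr * p.grad

model_params = [Param(1.0)]
opt = ToySGD(model_params, lr=0.1)

# "Adaptation": the model grows a new parameter after the optimizer was built.
new_head = Param(1.0)
model_params.append(new_head)
for p in model_params:
    p.grad = 1.0

opt.step()
stale_value = new_head.value        # unchanged: the optimizer never saw new_head

opt = ToySGD(model_params, lr=0.1)  # rebuild the parameter list, like a reset
opt.step()
updated_value = new_head.value      # now the new parameter is updated too
print(stale_value, updated_value)
```

One design consequence to keep in mind: rebuilding an optimizer this way discards any per-parameter state, such as momentum buffers, that it had accumulated.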


## Example: Replay Buffers

In [15]:
from avalanche.training.storage_policy import ParametricBuffer, RandomExemplarsSelectionStrategy

# scenario
benchmark = SplitMNIST(
    n_experiences=5,
    return_task_id=True,
    seed=1
)

# model
model = as_multitask(MLP(), 'classifier')
optimizer = SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = CrossEntropyLoss()
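
The next cell builds a `ParametricBuffer` with `groupby='class'` and updates it by passing a `SimpleNamespace` that carries the current experience. The sketch below imitates the idea with a hypothetical `ToyClassBalancedBuffer`: a fixed budget is shared evenly among all classes seen so far, so older classes shrink as new ones arrive. It is a simplified illustration, not Avalanche's implementation. In particular, it treats an experience as a plain list of `(sample, label)` pairs and drops any remainder of the budget.

```python
import random
from types import SimpleNamespace

class ToyClassBalancedBuffer:
    """Keeps at most max_size samples, split evenly across classes seen so far."""

    def __init__(self, max_size, seed=0):
        self.max_size = max_size
        self.groups = {}                  # class id -> list of stored samples
        self.rng = random.Random(seed)

    def update(self, state):
        # Only `state.experience` is read, so any object with that attribute works.
        for sample, label in state.experience:
            self.groups.setdefault(label, []).append(sample)
        quota = self.max_size // len(self.groups)
        for label, samples in self.groups.items():
            if len(samples) > quota:
                # Random exemplar selection; old classes shrink as new ones arrive.
                self.groups[label] = self.rng.sample(samples, quota)

    @property
    def targets(self):
        return [label for label, s in self.groups.items() for _ in s]

buf = ToyClassBalancedBuffer(max_size=12)
exp0 = [(f"x{i}", i % 2) for i in range(40)]       # classes 0 and 1
exp1 = [(f"z{i}", 2 + i % 2) for i in range(40)]   # classes 2 and 3

buf.update(SimpleNamespace(experience=exp0))
print(buf.targets)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
buf.update(SimpleNamespace(experience=exp1))
print(buf.targets)  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

The `SimpleNamespace` works here because the buffer only reads attributes from the object it receives, which is the same trick the training loop relies on to use an Avalanche component without a full strategy object.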

In [16]:
from types import SimpleNamespace
from avalanche.benchmarks.utils.data_loader import ReplayDataLoader

device = 'cpu'
num_epochs = 2

# AVALANCHE: init replay buffer
storage_p = ParametricBuffer(
    max_size=30,
    groupby='class',
    selection_strategy=RandomExemplarsSelectionStrategy()
)

print(f"Max buffer size: {storage_p.max_size}, current size: {len(storage_p.buffer)}")
print(f"Max buffer size: {storage_p.max_size}, current size: {len(storage_p.buffer)}")   
for exp in benchmark.train_stream:
    print(f"Experience ({exp.current_experience})")
    model.train()
    avalanche_model_adaptation(model, exp)
    reset_optimizer(optimizer, model)
    dataset = exp.dataset
    dataset = dataset.train()
 
    for epoch in range(num_epochs):
        # AVALANCHE: ReplayDataLoader to sample jointly from buffer and current data.
        dl = ReplayDataLoader(dataset, storage_p.buffer, batch_size=128)
        for x, y, t in dl:
            x, y, t = x.to(device), y.to(device), t.to(device)

            optimizer.zero_grad()
            output = model(x, t)
            loss = F.cross_entropy(output, y)
            loss.backward()
            optimizer.step()
        print('Train Epoch: {} \tLoss: {:.6f}'.format(epoch, loss.item()))
    
    # AVALANCHE: you can use a SimpleNamespace if you want to use Avalanche components with your own code.
    strategy_state = SimpleNamespace(experience=exp)
    # AVALANCHE: update replay buffer
    storage_p.update(strategy_state)
    print(f"Max buffer size: {storage_p.max_size}, current size: {len(storage_p.buffer)}")
    print(f"class targets: {list(storage_p.buffer.targets)}\n")


Max buffer size: 30, current size: 0
Max buffer size: 30, current size: 0
Experience (0)
Train Epoch: 0 	Loss: 0.040961
Train Epoch: 1 	Loss: 0.023422
Max buffer size: 30, current size: 30
class targets: [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]

Experience (1)
Train Epoch: 0 	Loss: 0.028027
Train Epoch: 1 	Loss: 0.038651
Max buffer size: 30, current size: 30
class targets: [5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]

Experience (2)
Train Epoch: 0 	Loss: 0.063074
Train Epoch: 1 	Loss: 0.041174
Max buffer size: 30, current size: 30
class targets: [5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8]

Experience (3)
Train Epoch: 0 	Loss: 0.059068
Train Epoch: 1 	Loss: 0.019269
Max buffer size: 30, current size: 30
class targets: [5, 5, 5, 5, 6, 6, 6, 6, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 8, 8, 8, 9, 9, 9, 3, 3, 3, 3]

Experience (4)
Train Epoch: 0 	Loss: 0