# Avalanche Standalone

This notebook shows how to use Avalanche components inside your own training loops. We will see how to use:
- Benchmarks
- Dynamic and MultiTask Models
- Replay methods

## 🤝 Run it on Google Colab

You can run _this chapter_ and play with it on Google Colaboratory: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AntonioCarta/avalanche-demo/blob/main/avl_standalone.ipynb)

https://github.com/AntonioCarta/avalanche-demo/blob/main/avl_standalone.ipynb

## Install Avalanche

First, let's install Avalanche. You can skip this step if you have installed it already.

In [None]:
!pip install avalanche-lib==0.5.0

## Benchmarks

You can import benchmarks from the `avalanche.benchmarks` module. We are going to use `SplitMNIST`, the class-incremental MNIST stream.

In [3]:
from avalanche.benchmarks import SplitMNIST

In [5]:
benchmark = SplitMNIST(
    n_experiences=5,
    return_task_id=False
)
train_stream = benchmark.train_stream
test_stream = benchmark.test_stream

print(f"len train: {len(train_stream)}, len test: {len(test_stream)}")

len train: 5, len test: 5


In [6]:
for exp in train_stream:
    eid = exp.current_experience
    curr_classes = exp.classes_in_this_experience
    tid = exp.task_label
    print(f"({eid}) - T{tid}, classes={curr_classes}")

(0) - T0, classes=[9, 3]
(1) - T0, classes=[2, 5]
(2) - T0, classes=[0, 7]
(3) - T0, classes=[4, 6]
(4) - T0, classes=[8, 1]


Notice that Avalanche does not keep the classes in numerical order: the output above shows them assigned to experiences in a shuffled order.
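
As a rough illustration of where such a class order can come from, here is a minimal sketch of a class-incremental split in plain Python. The helper `split_classes` is hypothetical and is not part of Avalanche. It only mimics the idea of shuffling class ids with a seed and cutting them into equally sized experiences.

```python
import random


def split_classes(n_classes, n_experiences, seed=None, shuffle=True):
    """Assign class ids to experiences, optionally in a shuffled order.

    Hypothetical helper for illustration only (not an Avalanche function).
    """
    if n_classes % n_experiences != 0:
        raise ValueError("this sketch assumes an even split of the classes")
    class_ids = list(range(n_classes))
    if shuffle:
        # A dedicated RNG keeps the split reproducible for a given seed.
        random.Random(seed).shuffle(class_ids)
    per_exp = n_classes // n_experiences
    return [
        class_ids[i * per_exp:(i + 1) * per_exp]
        for i in range(n_experiences)
    ]


for eid, classes in enumerate(split_classes(10, 5, seed=1)):
    print(f"({eid}) classes={classes}")
```

Passing the same seed yields the same split, which is why the training cells later in this notebook pass `seed=1` to `SplitMNIST`.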

You can create a task-aware benchmark by setting `return_task_id=True`. Each experience then gets its own task label.

In [7]:
benchmark = SplitMNIST(
    n_experiences=5,
    return_task_id=True
)
train_stream = benchmark.train_stream
test_stream = benchmark.test_stream

print(f"len train: {len(train_stream)}, len test: {len(test_stream)}")

len train: 5, len test: 5


In [8]:
for exp in train_stream:
    eid = exp.current_experience
    curr_classes = exp.classes_in_this_experience
    tid = exp.task_label
    print(f"({eid}) - T{tid}, classes={curr_classes}")

(0) - T0, classes=[3, 5]
(1) - T1, classes=[0, 6]
(2) - T2, classes=[2, 7]
(3) - T3, classes=[8, 4]
(4) - T4, classes=[1, 9]


## Multi-Task Model

You can use plain PyTorch models.

In [9]:
import torch
from torch import nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(784, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU()
        )
        self.classifier = nn.Linear(512, 10)
    
    def forward(self, x, **kwargs):
        x = x.reshape(x.shape[0], -1)
        x = self.features(x)
        return self.classifier(x)
    
model = MLP()
model(torch.randn(32, 784))
print(model)

MLP(
  (features): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
  )
  (classifier): Linear(in_features=512, out_features=10, bias=True)
)


The MLP ignores `task_labels`. You can use Avalanche's `MultiHeadClassifier` to split the output layer by task ID.

In [10]:
x = torch.randn(32, 784)
t = torch.randint(low=0, high=4, size=(32,))
model(x, task_labels=t)

tensor([[-0.0281, -0.0436, -0.0610,  0.0595, -0.0422,  0.0715, -0.0210, -0.1347,
          0.1593,  0.1408],
        [-0.0713, -0.1221, -0.0039, -0.1138,  0.0577,  0.1070,  0.0991, -0.0745,
          0.0892,  0.1129],
        [-0.0980, -0.1482, -0.0773, -0.0455, -0.0408,  0.0084, -0.0562,  0.0264,
          0.0495,  0.0881],
        [-0.1388, -0.1217,  0.1134,  0.0038, -0.0263,  0.0118, -0.0708, -0.1494,
          0.0318,  0.1721],
        [-0.1134, -0.0184, -0.0700,  0.0270,  0.0119,  0.0987, -0.1898, -0.0032,
          0.0830,  0.1565],
        [-0.1071, -0.0931, -0.0257,  0.1175,  0.0004,  0.1834, -0.0406, -0.0176,
          0.1134,  0.1953],
        [-0.1933,  0.0057, -0.0682, -0.0544,  0.0223,  0.0401, -0.0104, -0.1890,
          0.0552,  0.1008],
        [-0.0128, -0.0296, -0.1234,  0.0269, -0.0506,  0.0852, -0.0787, -0.0647,
          0.0424,  0.0081],
        [-0.1174, -0.0601,  0.0206,  0.1607,  0.1086,  0.0784, -0.1163, -0.0462,
          0.0632,  0.0500],
        [-0.0510, -

In [12]:
from avalanche.models import as_multitask

model_mt = as_multitask(MLP(), 'classifier')
print(model_mt)

MultiTaskDecorator(
  (model): MLP(
    (features): Sequential(
      (0): Linear(in_features=784, out_features=512, bias=True)
      (1): ReLU()
      (2): Linear(in_features=512, out_features=512, bias=True)
      (3): ReLU()
    )
    (classifier): Sequential()
  )
  (classifier): MultiHeadClassifier(
    (classifiers): ModuleDict(
      (0): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
    )
  )
)


In [13]:
t

tensor([3, 3, 0, 1, 0, 1, 1, 0, 2, 1, 2, 0, 3, 3, 1, 3, 3, 3, 2, 3, 1, 0, 2, 2,
        0, 1, 3, 3, 1, 1, 1, 3])

The model still doesn't know about all the tasks because it has never seen them.

In [14]:
model_mt(x, task_labels=t)

KeyError: '1'

You have to adapt the model to each experience so that it can create the missing heads.
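
To make the `KeyError` above more concrete, here is a toy sketch of a multi-head container in plain Python. `ToyMultiHead` is a hypothetical stand-in and not Avalanche's implementation. It assumes only what the printouts show: heads are stored in a dictionary keyed by the task label as a string, and a head exists only for tasks the model has been adapted to.

```python
class ToyMultiHead:
    """Toy stand-in for a multi-head classifier: one head per task label."""

    def __init__(self):
        # Keys are strings, mirroring the ModuleDict keys in the printout.
        self.heads = {"0": self._new_head("0")}

    @staticmethod
    def _new_head(key):
        # A "head" here is just a function that tags its input.
        return lambda x: f"head{key}({x})"

    def adapt(self, task_labels):
        """Create a head for every task label that does not have one yet."""
        for t in task_labels:
            key = str(t)
            if key not in self.heads:
                self.heads[key] = self._new_head(key)

    def forward(self, xs, task_labels):
        """Route each sample to the head that matches its task label."""
        return [self.heads[str(t)](x) for x, t in zip(xs, task_labels)]


toy = ToyMultiHead()
xs, ts = ["a", "b", "c"], [0, 1, 0]

try:
    toy.forward(xs, ts)
except KeyError as err:
    print("missing head for task", err)

toy.adapt(ts)  # adds the head for task 1
print(toy.forward(xs, ts))
```

In the real model the heads are trainable layers, but the control flow is similar in spirit: adaptation has to run before the forward pass sees a new task label.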

In [15]:
from avalanche.models.utils import avalanche_model_adaptation

for exp in benchmark.train_stream:
    avalanche_model_adaptation(model_mt, exp)

print(model_mt)

MultiTaskDecorator(
  (model): MLP(
    (features): Sequential(
      (0): Linear(in_features=784, out_features=512, bias=True)
      (1): ReLU()
      (2): Linear(in_features=512, out_features=512, bias=True)
      (3): ReLU()
    )
    (classifier): Sequential()
  )
  (classifier): MultiHeadClassifier(
    (classifiers): ModuleDict(
      (0): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
      (1): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
      (2): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
      (3): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
      (4): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
    )
  )
)


Now the model has been adapted to all the tasks. A separate head for each task is available for classification.

In [16]:
model_mt(x, task_labels=t)

tensor([[-1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03,  2.9020e-02,
         -1.0000e+03, -1.0000e+03, -1.0000e+03, -5.3654e-02, -1.0000e+03],
        [-1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03,  2.3616e-02,
         -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0296e-02, -1.0000e+03],
        [-1.0000e+03, -1.0000e+03, -1.0000e+03,  7.9147e-02, -1.0000e+03,
          5.7862e-02, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03],
        [ 3.6879e-02, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03,
         -1.0000e+03,  8.3463e-02, -1.0000e+03, -1.0000e+03, -1.0000e+03],
        [-1.0000e+03, -1.0000e+03, -1.0000e+03,  1.5846e-01, -1.0000e+03,
          1.1499e-01, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03],
        [-8.2920e-02, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03,
         -1.0000e+03, -1.5061e-01, -1.0000e+03, -1.0000e+03, -1.0000e+03],
        [ 7.6269e-02, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03,
         -1.0000e+03,  1.0547e-0
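
The `-1.0000e+03` entries in the output above appear to be masked output units: for each sample, only the classes belonging to its task keep their real logits. For example, the first row has task label 3, and only indices 4 and 8 are unmasked, which matches the classes of task 3 printed earlier. The following plain-Python sketch reproduces that effect under this assumption. The helpers `mask_logits` and `softmax` are hypothetical and are not Avalanche functions.

```python
import math

MASK_VALUE = -1000.0  # value observed in the masked positions above

# Classes per task, copied from the task-aware stream printed earlier.
classes_per_task = {0: [3, 5], 1: [0, 6], 2: [2, 7], 3: [8, 4], 4: [1, 9]}


def mask_logits(logits, task_label):
    """Keep logits of the task's classes and replace the rest with MASK_VALUE."""
    active = set(classes_per_task[task_label])
    return [v if i in active else MASK_VALUE for i, v in enumerate(logits)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


raw = [0.1 * i for i in range(10)]      # made-up logits for one sample
masked = mask_logits(raw, task_label=3)
probs = softmax(masked)
print([round(p, 3) for p in probs])     # only indices 4 and 8 are non-zero
```

Because the masked logits are so negative, a softmax or cross-entropy loss effectively ignores classes from other tasks.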

## Training loop - Finetuning (with Multi-head model)

We can train the model continually by iterating over the `train_stream` provided by the scenario.

In [17]:
from torch.optim import SGD
from torch.nn import CrossEntropyLoss
from avalanche.models import SimpleMLP
from avalanche.training import Naive
from torch.utils.data import DataLoader
import torch.nn.functional as F

# scenario
benchmark = SplitMNIST(
    n_experiences=5,
    return_task_id=True,
    seed=1
)

# model
model = as_multitask(MLP(), 'classifier')
optimizer = SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = CrossEntropyLoss()

In [14]:
from avalanche.models.utils import avalanche_model_adaptation
from avalanche.models.dynamic_optimizers import reset_optimizer

device = 'cpu'
num_epochs = 2

for exp in benchmark.train_stream:
    print(f"Experience ({exp.current_experience})")
    model.train()

    # AVALANCHE: model adaptation step adds new parameters
    # In our model: adds a new head for each task
    avalanche_model_adaptation(model, exp)
    # AVALANCHE: We just added parameters to the model. We must also update the optimizer
    reset_optimizer(optimizer, model)
    
    dataset = exp.dataset
    dataset = dataset.train()  # AVALANCHE: activate correct transformation group
    
    for epoch in range(num_epochs):
        dl = DataLoader(dataset, batch_size=128)
        for x, y, t in dl:
            x, y, t = x.to(device), y.to(device), t.to(device)

            optimizer.zero_grad()
            # AVALANCHE: MultiTaskModels need task labels
            output = model(x, t)
            loss = F.cross_entropy(output, y)
            loss.backward()
            optimizer.step()
        print('Train Epoch: {} \tLoss: {:.6f}'.format(epoch, loss.item()))


Experience (0)
Train Epoch: 0 	Loss: 0.040961
Train Epoch: 1 	Loss: 0.023422
Experience (1)
Train Epoch: 0 	Loss: 0.028009
Train Epoch: 1 	Loss: 0.038673
Experience (2)
Train Epoch: 0 	Loss: 0.063915
Train Epoch: 1 	Loss: 0.040975
Experience (3)
Train Epoch: 0 	Loss: 0.059366
Train Epoch: 1 	Loss: 0.019216
Experience (4)
Train Epoch: 0 	Loss: 0.066897
Train Epoch: 1 	Loss: 0.011423
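
The `reset_optimizer` call in the loop above matters because an optimizer typically keeps references to the parameters it was given when it was created. Parameters added later by model adaptation are not updated unless they are registered again. The toy sketch below illustrates the problem in plain Python. `ToyParam` and `ToySGD` are hypothetical classes written for this example. They are not PyTorch or Avalanche APIs and they ignore details such as momentum.

```python
class ToyParam:
    """A scalar parameter with a gradient slot."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class ToySGD:
    """Minimal optimizer that remembers the parameters passed at construction."""

    def __init__(self, params, lr):
        self.params = list(params)  # snapshot of the parameter list
        self.lr = lr

    def step(self):
        for p in self.params:
            p.value -= self.lr * p.grad


model_params = {"head0": ToyParam(1.0)}
toy_opt = ToySGD(model_params.values(), lr=0.1)

# "Adaptation" adds a new head after the optimizer was created.
model_params["head1"] = ToyParam(1.0)
for p in model_params.values():
    p.grad = 1.0

toy_opt.step()
print({k: p.value for k, p in model_params.items()})  # head1 is unchanged

# Rebuilding the optimizer over the full parameter list fixes it.
toy_opt = ToySGD(model_params.values(), lr=0.1)
toy_opt.step()
print({k: p.value for k, p in model_params.items()})  # both heads move
```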


## Example: Replay Buffers

In [15]:
from avalanche.training.storage_policy import ParametricBuffer, RandomExemplarsSelectionStrategy
from types import SimpleNamespace

# scenario
benchmark = SplitMNIST(
    n_experiences=5,
    return_task_id=True,
    seed=1
)

# model
model = as_multitask(MLP(), 'classifier')
optimizer = SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = CrossEntropyLoss()
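
The next cell creates a `ParametricBuffer` with `max_size=30` and `groupby='class'`, so the fixed capacity is shared among all classes seen so far. The plain-Python sketch below illustrates that idea. `group_quotas` and `ToyClassBalancedBuffer` are hypothetical names, not Avalanche APIs. Which groups receive the leftover slots is an assumption of this sketch and may differ from what Avalanche does.

```python
import random


def group_quotas(max_size, n_groups):
    """Split a fixed capacity as evenly as possible across groups."""
    base, rem = divmod(max_size, n_groups)
    # Assumption of this sketch: the first `rem` groups get one extra slot.
    return [base + 1 if i < rem else base for i in range(n_groups)]


class ToyClassBalancedBuffer:
    """Fixed-size buffer that keeps a random subset of samples per class."""

    def __init__(self, max_size, seed=0):
        self.max_size = max_size
        self.groups = {}  # class id -> list of stored samples
        self.rng = random.Random(seed)

    def update(self, samples_by_class):
        """Store the new samples, then shrink every class to its quota."""
        for cls, samples in samples_by_class.items():
            self.groups.setdefault(cls, []).extend(samples)
        quotas = group_quotas(self.max_size, len(self.groups))
        for (cls, stored), quota in zip(list(self.groups.items()), quotas):
            if len(stored) > quota:
                # Random exemplar selection: keep `quota` random samples.
                self.groups[cls] = self.rng.sample(stored, quota)

    def __len__(self):
        return sum(len(s) for s in self.groups.values())


toy_buffer = ToyClassBalancedBuffer(max_size=30)
for exp_classes in ([5, 6], [1, 2], [0, 8]):
    new_data = {c: [f"img{c}_{i}" for i in range(100)] for c in exp_classes}
    toy_buffer.update(new_data)
    sizes = {c: len(s) for c, s in toy_buffer.groups.items()}
    print(f"size={len(toy_buffer)}, per class={sizes}")
```

As more classes arrive, the per-class share shrinks while the total stays at the maximum size.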

In [16]:
from types import SimpleNamespace
from avalanche.benchmarks.utils.data_loader import ReplayDataLoader

device = 'cpu'
num_epochs = 2

# AVALANCHE: init replay buffer
storage_p = ParametricBuffer(
    max_size=30,
    groupby='class',
    selection_strategy=RandomExemplarsSelectionStrategy()
)

print(f"Max buffer size: {storage_p.max_size}, current size: {len(storage_p.buffer)}")
print(f"Max buffer size: {storage_p.max_size}, current size: {len(storage_p.buffer)}")   
for exp in benchmark.train_stream:
    print(f"Experience ({exp.current_experience})")
    model.train()
    avalanche_model_adaptation(model, exp)
    reset_optimizer(optimizer, model)
    dataset = exp.dataset
    dataset = dataset.train()
 
    for epoch in range(num_epochs):
        # AVALANCHE: ReplayDataLoader to sample jointly from buffer and current data.
        dl = ReplayDataLoader(dataset, storage_p.buffer, batch_size=128)
        for x, y, t in dl:
            x, y, t = x.to(device), y.to(device), t.to(device)

            optimizer.zero_grad()
            output = model(x, t)
            loss = F.cross_entropy(output, y)
            loss.backward()
            optimizer.step()
        print('Train Epoch: {} \tLoss: {:.6f}'.format(epoch, loss.item()))
    
    # AVALANCHE: you can use a SimpleNamespace if you want to use Avalanche components with your own code.
    strategy_state = SimpleNamespace(experience=exp)
    # AVALANCHE: update replay buffer
    storage_p.update(strategy_state)
    print(f"Max buffer size: {storage_p.max_size}, current size: {len(storage_p.buffer)}")
    print(f"class targets: {list(storage_p.buffer.targets)}\n")


Max buffer size: 30, current size: 0
Max buffer size: 30, current size: 0
Experience (0)
Train Epoch: 0 	Loss: 0.040961
Train Epoch: 1 	Loss: 0.023422
Max buffer size: 30, current size: 30
class targets: [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]

Experience (1)
Train Epoch: 0 	Loss: 0.028027
Train Epoch: 1 	Loss: 0.038651
Max buffer size: 30, current size: 30
class targets: [5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]

Experience (2)
Train Epoch: 0 	Loss: 0.063074
Train Epoch: 1 	Loss: 0.041174
Max buffer size: 30, current size: 30
class targets: [5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8]

Experience (3)
Train Epoch: 0 	Loss: 0.059068
Train Epoch: 1 	Loss: 0.019269
Max buffer size: 30, current size: 30
class targets: [5, 5, 5, 5, 6, 6, 6, 6, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 8, 8, 8, 9, 9, 9, 3, 3, 3, 3]

Experience (4)
Train Epoch: 0 	Loss: 0