# Avalanche Standalone

This notebook shows how to use Avalanche components inside your own training loops. We will cover:
- Benchmarks
- Dynamic and MultiTask Models
- Replay methods

## 🤝 Run it on Google Colab

You can run _this chapter_ and play with it on Google Colaboratory: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AntonioCarta/avalanche-demo/blob/main/avl_standalone.ipynb)

## Install Avalanche

First, let's install Avalanche. You can skip this step if you have installed it already.

In [None]:
!pip install avalanche-lib==0.3.1

## Benchmarks

You can import benchmarks from the `avalanche.benchmarks` module. We are going to use `SplitMNIST`, the class-incremental MNIST stream.

In [16]:
from avalanche.benchmarks import SplitMNIST

In [17]:
benchmark = SplitMNIST(
    n_experiences=5,
    return_task_id=False
)
train_stream = benchmark.train_stream
test_stream = benchmark.test_stream

print(f"len train: {len(train_stream)}, len test: {len(test_stream)}")

len train: 5, len test: 5


In [18]:
for exp in train_stream:
    eid = exp.current_experience
    curr_classes = exp.classes_in_this_experience
    tid = exp.task_label
    print(f"({eid}) - T{tid}, classes={curr_classes}")

(0) - T0, classes=[3, 5]
(1) - T0, classes=[8, 7]
(2) - T0, classes=[1, 2]
(3) - T0, classes=[0, 4]
(4) - T0, classes=[9, 6]


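Notice that Avalanche does not assign classes to experiences in sorted order: in the output above, the first experience contains classes 3 and 5 rather than 0 and 1. Passing a `seed`, as the training cells later in this notebook do, makes the assignment reproducible.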
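
To make the idea concrete, here is a minimal pure-Python sketch of a class-incremental split: shuffle the class ids, then cut them into equally sized groups, one per experience. The helper `split_classes` is hypothetical and only illustrates the concept; it is not Avalanche's implementation, so the groups it produces will not match the ones printed by `SplitMNIST`.

```python
import random


def split_classes(n_classes, n_experiences, seed=None):
    """Hypothetical helper: shuffle class ids, then cut them into equal groups."""
    if n_classes % n_experiences != 0:
        raise ValueError("n_classes must be divisible by n_experiences")
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    classes = list(range(n_classes))
    rng.shuffle(classes)
    per_exp = n_classes // n_experiences
    return [classes[i * per_exp:(i + 1) * per_exp] for i in range(n_experiences)]


splits = split_classes(n_classes=10, n_experiences=5, seed=1)
for eid, classes in enumerate(splits):
    print(f"({eid}) classes={classes}")
```

Calling the helper twice with the same seed returns the same groups, which is the property you want when comparing runs.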

You can create a task-aware benchmark by setting `return_task_id=True`.

In [19]:
benchmark = SplitMNIST(
    n_experiences=5,
    return_task_id=True
)
train_stream = benchmark.train_stream
test_stream = benchmark.test_stream

print(f"len train: {len(train_stream)}, len test: {len(test_stream)}")

len train: 5, len test: 5


In [20]:
for exp in train_stream:
    eid = exp.current_experience
    curr_classes = exp.classes_in_this_experience
    tid = exp.task_label
    print(f"({eid}) - T{tid}, classes={curr_classes}")

(0) - T0, classes=[0, 6]
(1) - T1, classes=[1, 3]
(2) - T2, classes=[2, 5]
(3) - T3, classes=[9, 4]
(4) - T4, classes=[8, 7]


## Multi-Task Model

You can use standard PyTorch models with Avalanche. Here we define a simple MLP.

In [21]:
import torch
from torch import nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(784, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU()
        )
        self.classifier = nn.Linear(512, 10)
    
    def forward(self, x, **kwargs):
        x = x.reshape(x.shape[0], -1)
        x = self.features(x)
        return self.classifier(x)
    
model = MLP()
model(torch.randn(32, 784))
print(model)

MLP(
  (features): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
  )
  (classifier): Linear(in_features=512, out_features=10, bias=True)
)


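The plain `MLP` ignores `task_labels`: the argument is absorbed by `**kwargs` and never used, so every sample goes through the same output layer. You can use Avalanche's `MultiHeadClassifier` to split the output layer by task id.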

In [22]:
x = torch.randn(32, 784)
t = torch.randint(low=0, high=4, size=(32,))
model(x, task_labels=t)

tensor([[-6.4492e-02, -1.4380e-02, -9.4812e-02,  1.0382e-01, -6.8324e-02,
         -2.9779e-02, -6.6003e-03,  1.0679e-01, -3.6159e-04,  8.6781e-02],
        [-6.0239e-02,  2.0708e-01, -6.8435e-02,  1.3024e-01, -8.3926e-02,
          1.1956e-01,  9.2401e-02,  7.5060e-03,  7.6041e-02,  1.5360e-01],
        [-4.3249e-02, -1.6782e-03,  1.8559e-02,  2.1451e-01,  1.9979e-02,
          1.1966e-01,  3.8635e-03,  1.1762e-01,  9.6107e-02,  1.8521e-01],
        [-4.0077e-02,  1.1503e-02, -1.1938e-01,  5.5516e-02, -8.8268e-02,
          8.8493e-02,  6.3556e-02, -1.1909e-02,  7.1932e-02,  1.0055e-01],
        [-9.6918e-02, -3.9918e-02, -4.4975e-02,  8.0528e-02, -6.5870e-02,
          1.4336e-01,  3.1440e-02,  1.2471e-02,  9.2671e-02,  9.8498e-02],
        [-2.3040e-02,  5.0516e-02, -9.0975e-02,  2.4542e-02,  1.5074e-03,
          7.6702e-02,  1.5388e-01,  1.0248e-01,  2.0437e-01,  1.4489e-01],
        [ 3.4005e-02, -4.0677e-02, -6.5232e-02,  4.5179e-02,  6.2408e-02,
          6.8461e-02,  7.8795e-0

In [23]:
from avalanche.models import as_multitask

model_mt = as_multitask(MLP(), 'classifier')
print(model_mt)

MultiTaskDecorator(
  (model): MLP(
    (features): Sequential(
      (0): Linear(in_features=784, out_features=512, bias=True)
      (1): ReLU()
      (2): Linear(in_features=512, out_features=512, bias=True)
      (3): ReLU()
    )
    (classifier): Sequential()
  )
  (classifier): MultiHeadClassifier(
    (classifiers): ModuleDict(
      (0): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
    )
  )
)


In [25]:
t

tensor([1, 2, 2, 2, 3, 1, 1, 2, 3, 1, 2, 1, 1, 1, 0, 1, 2, 0, 0, 3, 2, 0, 1, 1,
        1, 3, 0, 3, 1, 0, 0, 2])

The model still doesn't know about all the tasks because it has never seen them, so calling it with unseen task labels fails.

In [24]:
model_mt(x, task_labels=t)

KeyError: '1'

You have to adapt the model to each experience so that the missing heads are created.
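
The failure and its fix can be pictured with a toy stand-in written in plain Python. The class `MultiHeadSketch` below is hypothetical and far simpler than Avalanche's `MultiHeadClassifier`: it only tracks which task labels have a head, assuming (as the error above suggests) that heads are stored in a dictionary keyed by the task label as a string.

```python
class MultiHeadSketch:
    """Toy stand-in for a multi-head classifier: one entry per known task."""

    def __init__(self):
        # Like the printed model above, start with a single head for task 0.
        self.heads = {"0": set()}

    def adapt(self, task_label, classes):
        """Create the head for `task_label` if missing and register its classes."""
        self.heads.setdefault(str(task_label), set()).update(classes)

    def classes_for(self, task_label):
        # Raises KeyError for a task that has never been registered.
        return self.heads[str(task_label)]


heads = MultiHeadSketch()
try:
    heads.classes_for(1)
except KeyError as err:
    print(f"KeyError: {err}")  # task 1 has no head yet

heads.adapt(1, [1, 3])  # adaptation creates the missing head
print(sorted(heads.classes_for(1)))
```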

In [27]:
from avalanche.models.utils import avalanche_model_adaptation

for exp in benchmark.train_stream:
    avalanche_model_adaptation(model_mt, exp)

print(model_mt)

MultiTaskDecorator(
  (model): MLP(
    (features): Sequential(
      (0): Linear(in_features=784, out_features=512, bias=True)
      (1): ReLU()
      (2): Linear(in_features=512, out_features=512, bias=True)
      (3): ReLU()
    )
    (classifier): Sequential()
  )
  (classifier): MultiHeadClassifier(
    (classifiers): ModuleDict(
      (0): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
      (1): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
      (2): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
      (3): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
      (4): IncrementalClassifier(
        (classifier): Linear(in_features=512, out_features=10, bias=True)
      )
    )
  )
)


Now the model has been adapted to all the tasks. A separate head for each task is available for classification.

In [28]:
model_mt(x, task_labels=t)

tensor([[-1.0000e+03,  1.0238e-01, -1.0000e+03,  7.3385e-04, -1.0000e+03,
         -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03],
        [-1.0000e+03, -1.0000e+03, -3.9755e-02, -1.0000e+03, -1.0000e+03,
          8.0407e-02, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03],
        [-1.0000e+03, -1.0000e+03,  6.7919e-02, -1.0000e+03, -1.0000e+03,
         -2.1614e-03, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03],
        [-1.0000e+03, -1.0000e+03, -2.8848e-02, -1.0000e+03, -1.0000e+03,
          3.4406e-02, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03],
        [-1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03,  1.4304e-01,
         -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03,  3.4524e-02],
        [-1.0000e+03,  2.4361e-02, -1.0000e+03, -1.2844e-02, -1.0000e+03,
         -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03, -1.0000e+03],
        [-1.0000e+03,  1.4934e-01, -1.0000e+03, -2.3455e-02, -1.0000e+03,
         -1.0000e+03, -1.0000e+0
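
In the output above, most entries are `-1000`. The pattern suggests that each head exposes only the output units of the classes seen for its task and masks the others with a large negative value, so they can never win the argmax. The sketch below reproduces that masking on plain Python lists. `mask_logits` is a hypothetical helper, the logits are made up, and the class sets are copied from the task-aware benchmark printed earlier.

```python
MASK_VALUE = -1000.0  # value seen in the output above for inactive units


def mask_logits(logits, active_classes, mask_value=MASK_VALUE):
    """Keep the logits of active classes, replace all others with mask_value."""
    return [v if c in active_classes else mask_value for c, v in enumerate(logits)]


# Classes per task as printed earlier for the task-aware benchmark.
task_classes = {0: {0, 6}, 1: {1, 3}, 2: {2, 5}, 3: {9, 4}, 4: {8, 7}}

raw = [0.1 * i for i in range(10)]  # made-up logits for one sample
masked = mask_logits(raw, task_classes[1])

# Only classes 1 and 3 can be predicted for a sample from task 1.
prediction = max(range(10), key=lambda c: masked[c])
print(masked, prediction)
```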

## Training loop - Finetuning (with Multi-head model)

We can train the model continually by iterating over the `train_stream` provided by the benchmark.

In [29]:
from torch.optim import SGD
from torch.nn import CrossEntropyLoss
from avalanche.models import SimpleMLP
from avalanche.training import Naive
from torch.utils.data import DataLoader
import torch.nn.functional as F

# scenario
benchmark = SplitMNIST(
    n_experiences=5,
    return_task_id=True,
    seed=1
)

# model
model = as_multitask(MLP(), 'classifier')
optimizer = SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = CrossEntropyLoss()

In [30]:
from avalanche.models.utils import avalanche_model_adaptation
from avalanche.models.dynamic_optimizers import reset_optimizer

device = 'cpu'
num_epochs = 2

for exp in benchmark.train_stream:
    print(f"Experience ({exp.current_experience})")
    model.train()

    # AVALANCHE: model adaptation step adds new parameters
    # In our model: adds a new head for each task
    avalanche_model_adaptation(model, exp)
    # AVALANCHE: We just added parameters to the model. We must also update the optimizer
    reset_optimizer(optimizer, model)
    
    dataset = exp.dataset
    dataset = dataset.train()  # AVALANCHE: activate correct transformation group
    
    for epoch in range(num_epochs):
        dl = DataLoader(dataset, batch_size=128)
        for x, y, t in dl:
            x, y, t = x.to(device), y.to(device), t.to(device)

            optimizer.zero_grad()
            # AVALANCHE: MultiTaskModels need task labels
            output = model(x, t)
            loss = F.cross_entropy(output, y)
            loss.backward()
            optimizer.step()
        print('Train Epoch: {} \tLoss: {:.6f}'.format(epoch, loss.item()))


Experience (0)
Train Epoch: 0 	Loss: 0.219103
Train Epoch: 1 	Loss: 0.205423
Experience (1)
Train Epoch: 0 	Loss: 0.100193
Train Epoch: 1 	Loss: 0.044830
Experience (2)
Train Epoch: 0 	Loss: 0.089650
Train Epoch: 1 	Loss: 0.037385
Experience (3)
Train Epoch: 0 	Loss: 0.077952
Train Epoch: 1 	Loss: 0.032585
Experience (4)
Train Epoch: 0 	Loss: 0.099708
Train Epoch: 1 	Loss: 0.101100
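
Why does the loop call `reset_optimizer` after every adaptation? An optimizer typically stores the list of parameters it receives at construction, so parameters added later are not updated unless the optimizer is told about them. The toy below illustrates this with plain Python dictionaries. `ToySGD` is a hypothetical stand-in, not PyTorch's `SGD`, and the manual reset only mimics the idea behind Avalanche's helper.

```python
class ToySGD:
    """Toy optimizer: stores the parameter list it was given at construction."""

    def __init__(self, params, lr):
        self.params = list(params)  # snapshot of the parameters known right now
        self.lr = lr

    def step(self):
        for p in self.params:
            p["value"] -= self.lr * p["grad"]


model_params = [{"name": "head_0", "value": 1.0, "grad": 0.5}]
opt = ToySGD(model_params, lr=0.1)

# "Adaptation": the model grows a new head after the optimizer was created.
model_params.append({"name": "head_1", "value": 1.0, "grad": 0.5})
opt.step()
stale = [p["value"] for p in model_params]  # head_1 was not updated

# Hypothetical reset: point the optimizer at the current parameter list.
opt.params = list(model_params)
opt.step()
fresh = [p["value"] for p in model_params]  # both heads are updated now
print(stale, fresh)
```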


## Example: Replay Buffers

In [31]:
from avalanche.training.storage_policy import ParametricBuffer, RandomExemplarsSelectionStrategy
from types import SimpleNamespace

# scenario
benchmark = SplitMNIST(
    n_experiences=5,
    return_task_id=True,
    seed=1
)

# model
model = as_multitask(MLP(), 'classifier')
optimizer = SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = CrossEntropyLoss()
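
The next cell uses a `ParametricBuffer` with `groupby='class'` and random exemplar selection. The idea behind a class-balanced buffer of fixed capacity can be sketched in plain Python: divide the available slots among the classes seen so far, and randomly drop samples from any class that exceeds its quota. `ClassBalancedBufferSketch` and `group_quotas` are hypothetical names, and which class receives a leftover slot is an assumption that may differ from Avalanche.

```python
import random
from collections import defaultdict


def group_quotas(max_size, n_groups):
    """Split max_size slots over n_groups so quotas differ by at most one."""
    base, extra = divmod(max_size, n_groups)
    return [base + 1 if i < extra else base for i in range(n_groups)]


class ClassBalancedBufferSketch:
    """Toy class-balanced buffer with random exemplar selection."""

    def __init__(self, max_size, seed=0):
        self.max_size = max_size
        self.groups = defaultdict(list)  # class id -> stored samples
        self.rng = random.Random(seed)

    def update(self, samples):
        """Add (x, y) pairs, then shrink every class to its new quota."""
        for x, y in samples:
            self.groups[y].append(x)
        quotas = group_quotas(self.max_size, len(self.groups))
        for quota, y in zip(quotas, list(self.groups)):
            items = self.groups[y]
            if len(items) > quota:
                self.groups[y] = self.rng.sample(items, quota)

    def __len__(self):
        return sum(len(v) for v in self.groups.values())


buffer = ClassBalancedBufferSketch(max_size=30)
for first_class in (0, 2, 4):  # three experiences with two classes each
    data = [(f"img{i}", first_class + i % 2) for i in range(100)]
    buffer.update(data)
    print(len(buffer), {y: len(v) for y, v in buffer.groups.items()})
```

The total stays at the capacity while the per-class share shrinks as new classes arrive, which is the same qualitative behaviour as the `class targets` lists printed by the notebook.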

In [32]:
from types import SimpleNamespace
from avalanche.benchmarks.utils.data_loader import ReplayDataLoader

device = 'cpu'
num_epochs = 2

# AVALANCHE: init replay buffer
storage_p = ParametricBuffer(
    max_size=30,
    groupby='class',
    selection_strategy=RandomExemplarsSelectionStrategy()
)

print(f"Max buffer size: {storage_p.max_size}, current size: {len(storage_p.buffer)}")
print(f"Max buffer size: {storage_p.max_size}, current size: {len(storage_p.buffer)}")   
for exp in benchmark.train_stream:
    print(f"Experience ({exp.current_experience})")
    model.train()
    avalanche_model_adaptation(model, exp)
    reset_optimizer(optimizer, model)
    dataset = exp.dataset
    dataset = dataset.train()
 
    for epoch in range(num_epochs):
        # AVALANCHE: ReplayDataLoader to sample jointly from buffer and current data.
        dl = ReplayDataLoader(dataset, storage_p.buffer, batch_size=128)
        for x, y, t in dl:
            x, y, t = x.to(device), y.to(device), t.to(device)

            optimizer.zero_grad()
            output = model(x, t)
            loss = F.cross_entropy(output, y)
            loss.backward()
            optimizer.step()
        print('Train Epoch: {} \tLoss: {:.6f}'.format(epoch, loss.item()))
    
    # AVALANCHE: you can use a SimpleNamespace if you want to use Avalanche components with your own code.
    strategy_state = SimpleNamespace(experience=exp)
    # AVALANCHE: update replay buffer
    storage_p.update(strategy_state)
    print(f"Max buffer size: {storage_p.max_size}, current size: {len(storage_p.buffer)}")
    print(f"class targets: {list(storage_p.buffer.targets)}\n")


Max buffer size: 30, current size: 0
Max buffer size: 30, current size: 0
Experience (0)
Train Epoch: 0 	Loss: 0.219103
Train Epoch: 1 	Loss: 0.205423
Max buffer size: 30, current size: 30
class targets: [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]

Experience (1)
Train Epoch: 0 	Loss: 0.097878
Train Epoch: 1 	Loss: 0.042620
Max buffer size: 30, current size: 30
class targets: [5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]

Experience (2)
Train Epoch: 0 	Loss: 0.107203
Train Epoch: 1 	Loss: 0.046786
Max buffer size: 30, current size: 30
class targets: [5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8]

Experience (3)
Train Epoch: 0 	Loss: 0.079960
Train Epoch: 1 	Loss: 0.032946
Max buffer size: 30, current size: 30
class targets: [5, 5, 5, 5, 6, 6, 6, 6, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 8, 8, 8, 9, 9, 9, 3, 3, 3, 3]

Experience (4)
Train Epoch: 0 	Loss: 0