Welcome to the official repository for the CLAD benchmark. The goal of CLAD is to provide a more realistic test bed for continual learning. We used SODA10M, an industry-scale dataset for autonomous driving, to create two benchmarks. CLAD-C is an online classification benchmark with natural, temporally correlated and continuous distribution shifts. CLAD-D is a domain-incremental continual object detection benchmark. Below are further details, examples and installation instructions for both benchmarks.
A paper describing the benchmark in more detail, as well as a discussion of current continual learning benchmarks and the solutions proposed by the participants of the 2021 ICCV challenge on CLAD, can be found here.
If you use this benchmark, please cite:
@article{verwimp2023clad,
  title={CLAD: A realistic Continual Learning benchmark for Autonomous Driving},
  author={Verwimp, Eli and Yang, Kuo and Parisot, Sarah and Hong, Lanqing and McDonagh, Steven and P{\'e}rez-Pellitero, Eduardo and De Lange, Matthias and Tuytelaars, Tinne},
  journal={Neural Networks},
  volume={161},
  pages={659--669},
  year={2023},
  publisher={Elsevier}
}
CLAD is provided as a Python module and depends only on PyTorch and TorchVision. Optionally, you can also use Avalanche and Detectron2 to easily benchmark your own solutions.
Clone this GitHub repo:
git clone git@github.com:VerwimpEli/CLAD.git
Add the installation directory to your Python path. On Linux:
export PYTHONPATH=$PYTHONPATH:[clad_installation_folder]
(Optional) Install Avalanche version 0.2.0. There are some breaking changes in later versions; I will address them at some point.
pip install avalanche-lib[detection]==0.2.0
Note: there might be an import issue with Kinetics400 in this version of Avalanche. Just comment it out; we don't need it.
(Optional) Install Detectron2, following the instructions here for your PyTorch and CUDA installations.
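For example, the generic build-from-source command looks like this (check the Detectron2 installation page for a command matching your exact PyTorch/CUDA combination):
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'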
CLAD-C is a classification benchmark for continual learning from a stream of chronologically ordered images.
A chronological stream induces continuous, yet realistic distribution shifts, both in the label distribution and the domain distribution. The image below gives an overview of the distribution changes throughout the stream; the x-axis displays the time along which the images arrive.
An example of a distribution shift happens between day and night: at night, pedestrians and cyclists are much rarer than during the day.
As an example, these are three subsequent batches when the batch size is set to 10. Note the dominance of cars and the multiple appearances of the same images from slightly different angles.
The goal of the challenge is to maximize the Average Mean Class Accuracy (AMCA):

$$\mathrm{AMCA} = \frac{1}{T} \sum_{t=1}^{T} \mathrm{MCA}_t, \qquad \mathrm{MCA}_t = \frac{1}{C} \sum_{c=1}^{C} \mathrm{acc}_{c,t},$$

where $\mathrm{acc}_{c,t}$ is the accuracy of class $c$ at evaluation point $t$, $C$ is the number of classes and $T$ the number of evaluation points.
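For reference, a minimal sketch of how AMCA could be computed from stored per-class accuracies (hypothetical variable names; the AMCAtester used below takes care of this bookkeeping for you):
import torch

# acc[t, c]: hypothetical accuracy of class c at evaluation point t
acc = torch.rand(10, 7)       # e.g. 10 evaluation points, 7 classes
mca = acc.mean(dim=1)         # mean class accuracy (MCA) at each point t
amca = mca.mean().item()      # AMCA: average over all evaluation points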
The original challenge at ICCV had some restrictions, which we believe are still worth considering now. Of course, if there's a good reason to deviate from them, there's nothing stopping you from doing so now. Below are the original rules, ordered by our perceived importance at this point.
- Maximal replay memory size is 1000 samples.
- The data should be trained on as a stream, i.e. no repetitions of data that is not in the memory.
- Maximum batch size is 10.
- No computationally heavy operations are allowed between training and testing (i.e. ideally the model should almost always be directly usable for predictions).
- The maximum number of model parameters is 105% of those of a standard ResNet50 (see the sketch below for a quick check).
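For instance, the parameter budget in the last rule could be checked with a small sketch like this (assuming torchvision's ResNet50 as the reference model):
import torchvision.models

def within_budget(model):
    # 105% of a plain torchvision ResNet50's parameter count
    budget = 1.05 * sum(p.numel() for p in torchvision.models.resnet50().parameters())
    return sum(p.numel() for p in model.parameters()) <= budget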
The method get_cladc_train returns a sequence of training sets (which together form one large stream of data), which should be trained on once, in the returned order. After each set, the model should be tested. get_cladc_val or get_cladc_test returns a single validation or test set. For more elaborate examples, both with and without Avalanche, see here.
import clad
import torch
import torchvision.models
from torch.nn import Linear
from torch.utils.data import DataLoader
model = torchvision.models.resnet18(weights=None)
model.fc = Linear(model.fc.in_features, 7, bias=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
train_sets = clad.get_cladc_train('../../data')
val_set = clad.get_cladc_val('../../data')
val_loader = DataLoader(val_set, batch_size=10)
tester = clad.AMCAtester(val_loader, model)
for t, ts in enumerate(train_sets):
    print(f'Training task {t}')
    loader = DataLoader(ts, batch_size=10, shuffle=False)
    for data, target in loader:
        optimizer.zero_grad()
        output = model(data)
        loss = torch.nn.functional.cross_entropy(output, target)
        loss.backward()
        optimizer.step()
    print('testing....')
    tester.evaluate()
tester.summarize(print_results=True)
The graph below shows the results of finetuning, and of training with a rehearsal memory while oversampling rare classes from that memory, on the CLAD-C benchmark. The model used is a ResNet50; for more details, see the CLAD paper.
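For illustration only (this is not the repository's implementation), a class-balanced replay memory that respects the 1000-sample limit and oversamples rare classes could look like:
import random
from collections import defaultdict

class ReplayMemory:
    # Keeps at most `capacity` samples; evicts from the largest class so the
    # memory stays roughly class-balanced.
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.buffer = defaultdict(list)  # class label -> list of samples

    def add(self, sample, label):
        self.buffer[label].append(sample)
        if sum(len(s) for s in self.buffer.values()) > self.capacity:
            largest = max(self.buffer, key=lambda c: len(self.buffer[c]))
            self.buffer[largest].pop(random.randrange(len(self.buffer[largest])))

    def sample(self, n):
        # Pick a class uniformly first, then a sample from that class: rare
        # classes are oversampled relative to their frequency in the stream.
        labels = list(self.buffer)
        return [(random.choice(self.buffer[c]), c)
                for c in random.choices(labels, k=n)]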
To be expected soon: some baseline models and the results of the ICCV '21 challenge on this benchmark.
CLAD-D is a domain-incremental continual object detection benchmark. Images from the SODA10M dataset are divided into four tasks, which should be learned incrementally by a machine learning model, without accessing past data. The final performance of the model is the average mAP over all tasks. The 4 tasks are defined as:
Task 1: clear weather - daytime - city street (4470 - 497 - 2433)
Task 2: clear weather - daytime - highway (1329 - 148 - 3126)
Task 3: night (1480 - 165 - 2968)
Task 4: rainy - daytime (524 - 59 - 1442)
The numbers between brackets indicate, respectively, the number of training, validation and test images per task. Below are some example images of each task, with the corresponding bounding box annotations. The domain gaps in this benchmark are less harsh than those in typical domain-incremental learning benchmarks, yet still not trivial to overcome.
CLAD-D is evaluated using the average mAP at IoU = 0.5, as in Pascal VOC. We then average this over all four tasks, giving each task equal weight.
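In formula form, with $\mathrm{mAP}_t$ the mAP at IoU 0.5 on task $t$:

$$\text{score} = \frac{1}{4} \sum_{t=1}^{4} \mathrm{mAP}_t$$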
The original challenge had some restrictions, some of which we believe are still worth considering. Of course, if there's a good reason to deviate from them, there's nothing stopping you from doing so now. Below are the original rules, ordered by our perceived importance at this point.
- Maximal rehearsal memory of 250 samples.
- Only pretraining on Microsoft COCO and/or ImageNet1K.
To get the Avalanche-style benchmark, simply use the get_cladd_avalanche method, which will create an Avalanche benchmark in the usual format. Then your strategy can be created using the ObjectDetectionTemplate, with optional training and testing plugins.
import clad
import logging
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from avalanche.training.supervised.naive_object_detection import ObjectDetectionTemplate
from avalanche.evaluation.metrics import loss_metrics
from avalanche.evaluation.metrics.detection import DetectionMetrics
from avalanche.logging import InteractiveLogger
from avalanche.training.plugins import EvaluationPlugin
logging.basicConfig(level=logging.NOTSET)
# Get benchmark and models
benchmark = clad.get_cladd_avalanche(root='../../data')
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).to('cuda')
# Update model and create optimizer
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, 6 + 1)  # 6 SODA10M classes + background
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)
# Create Avalanche strategy
cl_strategy = ObjectDetectionTemplate(
    model=model,
    optimizer=optimizer,
    train_mb_size=5,
    train_epochs=1,
    eval_mb_size=5,
    device='cuda',
    evaluator=EvaluationPlugin(
        loss_metrics(epoch_running=True),
        DetectionMetrics(default_to_coco=True),
        loggers=[InteractiveLogger()],
    ),
)
# Train and test loop
for i, experience in enumerate(benchmark.train_stream):
    cl_strategy.train(experience, num_workers=4)
    cl_strategy.eval(benchmark.test_stream, num_workers=4)
To use Detectron2 to train CLAD-D, you only have to call register_cladd_detectron, which will register the CLAD-D dataset names in the DatasetCatalog of Detectron2. Then you can just use the names of the datasets in your config files. Detectron2 doesn't support training multiple datasets sequentially out of the box, but a small script can work around that (see the sketch below the example).
import clad
from detectron2 import model_zoo
from detectron2.engine import DefaultTrainer, DefaultPredictor
from detectron2.config import get_cfg
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader
# This registers the CLAD-D datasets in Detectron2. They're accessible with the names cladd_T[i]_[split], with
# i the task-ID and [split] one of train/val/test.
clad.register_cladd_detectron(root='../../data')
cfg = get_cfg()
# Loads basic config file and then merges with our config file
cfg.merge_from_file(model_zoo.get_config_file("PascalVOC-Detection/faster_rcnn_R_50_C4.yaml"))
cfg.merge_from_file('./examples/cladd_detectron_ex.yaml')
trainer = DefaultTrainer(cfg)
trainer.resume_or_load()
trainer.train()
predictor = DefaultPredictor(cfg)
for test_dataset in cfg.DATASETS.TEST:
    evaluator = COCOEvaluator(test_dataset, output_dir=f"{cfg.OUTPUT_DIR}/{test_dataset}")
    val_loader = build_detection_test_loader(cfg, test_dataset)
    print(inference_on_dataset(predictor.model, val_loader, evaluator))
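As mentioned above, Detectron2 has no built-in notion of sequential tasks; a rough workaround is to loop over the registered task names yourself. A hypothetical sketch, assuming Detectron2's default checkpoint name model_final.pth:
for task in range(1, 5):
    cfg.DATASETS.TRAIN = (f"cladd_T{task}_train",)
    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()
    # The next trainer will start from this task's final checkpoint
    cfg.MODEL.WEIGHTS = f"{cfg.OUTPUT_DIR}/model_final.pth"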
The graph below shows the per-class mAPs at IoU 0.5 when finetuning a Faster R-CNN model with a ResNet50 backbone. While the model doesn't catastrophically forget, training on a different domain distribution does harm the performance on previous domains. For more details, see the CLAD paper.
To be expected: more baselines and the results of the ICCV '21 challenge.