Welcome to the official repository for the CLAD benchmark. The goal of CLAD is to provide a more realistic test bed for continual learning. We used SODA10M, an industry-scale dataset for autonomous driving, to create two benchmarks. CLAD-C is an online classification benchmark with natural, temporally correlated and continuous distribution shifts. CLAD-D is a domain-incremental continual object detection benchmark. Below are further details, examples and installation instructions for both benchmarks.
A paper describing the benchmark in more detail, as well as a discussion of current continual learning benchmarks and the solutions proposed by the participants of the 2021 ICCV challenge on CLAD, can be found here.
If you use this benchmark, please cite:
@article{verwimp2023clad,
  title={CLAD: A realistic Continual Learning benchmark for Autonomous Driving},
  author={Verwimp, Eli and Yang, Kuo and Parisot, Sarah and Hong, Lanqing and McDonagh, Steven and P{\'e}rez-Pellitero, Eduardo and De Lange, Matthias and Tuytelaars, Tinne},
  journal={Neural Networks},
  volume={161},
  pages={659--669},
  year={2023},
  publisher={Elsevier}
}
CLAD is provided as a Python module and depends only on PyTorch and TorchVision. Optionally, you can also use Avalanche and Detectron2 to easily benchmark your own solutions.
Clone this GitHub repo:
git clone git@github.com:VerwimpEli/CLAD.git
Add the installation directory to your Python path. On Linux:
export PYTHONPATH=$PYTHONPATH:[clad_installation_folder]
(Optional) Install Avalanche version 0.2.0. There are some breaking changes in later versions; I will address them at some point.
pip install avalanche-lib[detection]==0.2.0
Note: there might be an import issue with Kinetics400 in this version of Avalanche. Just comment it out; we don't need it.
(Optional) Install Detectron2, following the instructions here for your PyTorch and CUDA installations.
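For example, the generic build-from-source command looks like this (check the Detectron2 installation page for a command matching your exact PyTorch/CUDA combination):
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'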
CLAD-C is a classification benchmark for continual learning from a stream of chronologically ordered images.
A chronological stream induces continuous, yet realistic distribution shifts, both in the label distribution and the domain distribution. The image below gives an overview of the distribution changes throughout the stream; the x-axis displays the time along which the images arrive.
An example of a distribution shift happens between day and night: at night, pedestrians and cyclists are much rarer than during the day.
As an example, these are three subsequent batches when the batch size is set to 10. Note the dominance of cars and the multiple appearances of the same images from slightly different angles.
The goal of the challenge is to maximize the Average Mean Class Accuracy (AMCA):

$$\mathrm{AMCA} = \frac{1}{T} \sum_{t=1}^{T} \mathrm{MCA}_t, \qquad \mathrm{MCA}_t = \frac{1}{C} \sum_{c=1}^{C} \mathrm{acc}_{c,t},$$

where $\mathrm{acc}_{c,t}$ is the accuracy of class $c$ at evaluation point $t$, $C$ is the number of classes and $T$ the number of evaluation points.
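For reference, a minimal sketch of how AMCA could be computed from stored per-class accuracies (hypothetical variable names; the AMCAtester used below takes care of this bookkeeping for you):
import torch

# acc[t, c]: hypothetical accuracy of class c at evaluation point t
acc = torch.rand(10, 7)       # e.g. 10 evaluation points, 7 classes
mca = acc.mean(dim=1)         # mean class accuracy (MCA) at each point t
amca = mca.mean().item()      # AMCA: average over all evaluation points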
The original challenge at ICCV had some restrictions, which we believe are still worth considering now. Of course, if there's a good reason to deviate from them, there's nothing stopping you from doing so now. Below are the original rules, ordered by our perceived importance at this point.
- Maximal replay memory size is 1000 samples.
- The data should be trained on as a stream, i.e. no repetitions of data that is not in the memory.
- Maximum batch size is 10.
- No computationally heavy operations are allowed between training and testing (i.e. ideally the model should almost always be directly usable for predictions).
- The maximum number of model parameters is 105% of those of a standard ResNet50 (see the sketch below for a quick check).
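For instance, the parameter budget in the last rule could be checked with a small sketch like this (assuming torchvision's ResNet50 as the reference model):
import torchvision.models

def within_budget(model):
    # 105% of a plain torchvision ResNet50's parameter count
    budget = 1.05 * sum(p.numel() for p in torchvision.models.resnet50().parameters())
    return sum(p.numel() for p in model.parameters()) <= budget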
The method get_cladc_train returns a sequence of training sets (which together form one large stream of data), which should be trained on once, in the returned order. After each set, the model should be tested. get_cladc_val or get_cladc_test returns a single validation or test set. For more elaborate examples, both with and without Avalanche, see here.
import clad
import torch
import torchvision.models
from torch.nn import Linear
from torch.utils.data import DataLoader
model = torchvision.models.resnet18(weights=None)
model.fc = Linear(model.fc.in_features, 7, bias=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
train_sets = clad.get_cladc_train('../../data')
val_set = clad.get_cladc_val('../../data')
val_loader = DataLoader(val_set, batch_size=10)
tester = clad.AMCAtester(val_loader, model)
for t, ts in enumerate(train_sets):
    print(f'Training task {t}')
    loader = DataLoader(ts, batch_size=10, shuffle=False)
    for data, target in loader:
        optimizer.zero_grad()
        output = model(data)
        loss = torch.nn.functional.cross_entropy(output, target)
        loss.backward()
        optimizer.step()
    print('testing....')
    tester.evaluate()
tester.summarize(print_results=True)
The graph below shows the results of finetuning, and of training with a rehearsal memory while oversampling rare classes from that memory, on the CLAD-C benchmark. The model used is a ResNet50; for more details, see the CLAD paper.
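For illustration only (this is not the repository's implementation), a class-balanced replay memory that respects the 1000-sample limit and oversamples rare classes could look like:
import random
from collections import defaultdict

class ReplayMemory:
    # Keeps at most `capacity` samples; evicts from the largest class so the
    # memory stays roughly class-balanced.
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.buffer = defaultdict(list)  # class label -> list of samples

    def add(self, sample, label):
        self.buffer[label].append(sample)
        if sum(len(s) for s in self.buffer.values()) > self.capacity:
            largest = max(self.buffer, key=lambda c: len(self.buffer[c]))
            self.buffer[largest].pop(random.randrange(len(self.buffer[largest])))

    def sample(self, n):
        # Pick a class uniformly first, then a sample from that class: rare
        # classes are oversampled relative to their frequency in the stream.
        labels = list(self.buffer)
        return [(random.choice(self.buffer[c]), c)
                for c in random.choices(labels, k=n)]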
To be expected soon: some baseline models and the results of the ICCV '21 challenge on this benchmark.
CLAD-D is a domain-incremental continual object detection benchmark. Images from the SODA10M dataset are divided into four tasks, which should be learned incrementally by a machine learning model, without accessing past data. The final performance of the model is the average mAP over all tasks. The 4 tasks are defined as:
Task 1: clear weather - daytime - city street (4470 - 497 - 2433)
Task 2: clear weather - daytime - highway (1329 - 148 - 3126)
Task 3: night (1480 - 165 - 2968)
Task 4: rainy - daytime (524 - 59 - 1442)
The numbers between brackets indicate, respectively, the number of training, validation and test images per task. Below are some example images of each task, with the corresponding bounding box annotations. The domain gaps in this benchmark are less harsh than those in typical domain-incremental learning benchmarks, yet still not trivial to overcome.
CLAD-D is evaluated using the average mAP at IoU = 0.5, as in Pascal VOC. We then average this over all four tasks, giving each task equal weight.
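In formula form, with $\mathrm{mAP}_t$ the mAP at IoU 0.5 on task $t$:

$$\text{score} = \frac{1}{4} \sum_{t=1}^{4} \mathrm{mAP}_t$$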
The original challenge had some restrictions, some of which we believe are still worth considering. Of course, if there's a good reason to deviate from them, there's nothing stopping you from doing so now. Below are the original rules, ordered by our perceived importance at this point.
- Maximal rehearsal memory of 250 samples.
- Only pretraining on Microsoft COCO and/or ImageNet1K.
To get the Avalanche-style benchmark, simply use the get_cladd_avalanche method, which will create an Avalanche benchmark in the usual format. Then your strategy can be created using the ObjectDetectionTemplate, with optional training and testing plugins.
import clad
import logging
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from avalanche.training.supervised.naive_object_detection import ObjectDetectionTemplate
from avalanche.evaluation.metrics import loss_metrics
from avalanche.evaluation.metrics.detection import DetectionMetrics
from avalanche.logging import InteractiveLogger
from avalanche.training.plugins import EvaluationPlugin
logging.basicConfig(level=logging.NOTSET)
# Get benchmark and models
benchmark = clad.get_cladd_avalanche(root='../../data')
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).to('cuda')
# Update model and create optimizer
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, 6 + 1)  # 6 SODA10M classes + background
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)
# Create Avalanche strategy
cl_strategy = ObjectDetectionTemplate(
    model=model,
    optimizer=optimizer,
    train_mb_size=5,
    train_epochs=1,
    eval_mb_size=5,
    device='cuda',
    evaluator=EvaluationPlugin(
        loss_metrics(epoch_running=True),
        DetectionMetrics(default_to_coco=True),
        loggers=[InteractiveLogger()],
    ),
)
# Train and test loop
for i, experience in enumerate(benchmark.train_stream):
    cl_strategy.train(experience, num_workers=4)
    cl_strategy.eval(benchmark.test_stream, num_workers=4)
To use Detectron2 to train CLAD-D, you only have to call register_cladd_detectron, which will register the CLAD-D dataset names in the DatasetCatalog of Detectron2. Then you can just use the names of the datasets in your config files. Detectron2 doesn't support training multiple datasets sequentially out of the box, but a small script can work around that (see the sketch below the example).
import clad
from detectron2 import model_zoo
from detectron2.engine import DefaultTrainer, DefaultPredictor
from detectron2.config import get_cfg
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader
# This registers the CLAD-D datasets in Detectron2. They're accessible with the names cladd_T[i]_[split], with
# i the task-ID and [split] one of train/val/test.
clad.register_cladd_detectron(root='../../data')
cfg = get_cfg()
# Loads basic config file and then merges with our config file
cfg.merge_from_file(model_zoo.get_config_file("PascalVOC-Detection/faster_rcnn_R_50_C4.yaml"))
cfg.merge_from_file('./examples/cladd_detectron_ex.yaml')
trainer = DefaultTrainer(cfg)
trainer.resume_or_load()
trainer.train()
predictor = DefaultPredictor(cfg)
for test_dataset in cfg.DATASETS.TEST:
    evaluator = COCOEvaluator(test_dataset, output_dir=f"{cfg.OUTPUT_DIR}/{test_dataset}")
    val_loader = build_detection_test_loader(cfg, test_dataset)
    print(inference_on_dataset(predictor.model, val_loader, evaluator))
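As mentioned above, Detectron2 has no built-in notion of sequential tasks; a rough workaround is to loop over the registered task names yourself. A hypothetical sketch, assuming Detectron2's default checkpoint name model_final.pth:
for task in range(1, 5):
    cfg.DATASETS.TRAIN = (f"cladd_T{task}_train",)
    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()
    # The next trainer will start from this task's final checkpoint
    cfg.MODEL.WEIGHTS = f"{cfg.OUTPUT_DIR}/model_final.pth"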
The graph below shows the per-class mAPs at IoU 0.5 when finetuning a Faster R-CNN model with a ResNet50 backbone. While the model doesn't catastrophically forget, training on a different domain distribution does harm the performance on previous domains. For more details, see the CLAD paper.
To be expected: more baselines and the results of the ICCV '21 challenge.