Add force_data_batch_size to ReplayPlugin for manual assignment of data-mem ratio #834

Merged
merged 86 commits into from
Jan 7, 2022
Commits
4aef41d
add CTrL integration
TomVeniat Apr 23, 2021
e2e6e60
Add normalization to each task
TomVeniat Apr 23, 2021
ae65f14
Add tests
TomVeniat May 4, 2021
d0e7936
Add example script
TomVeniat May 23, 2021
1ca7cca
Clone model to compute transfer
TomVeniat May 25, 2021
648062a
Fix PEP8 errors
TomVeniat Jun 10, 2021
3cda277
Example script cleaning
TomVeniat Jun 14, 2021
67f611c
Add support for the long stream + early stopping
TomVeniat Jun 16, 2021
ac183eb
Add modes to the early stopping plugin
TomVeniat Jun 16, 2021
21426fa
Add S_long to the demo
TomVeniat Jun 17, 2021
dc75875
Use early stopping plugin + add license
TomVeniat Jun 22, 2021
e8e0741
Shorten the long_stream for testing
TomVeniat Jun 24, 2021
6a0e97f
Fix PEP8 error
TomVeniat Jun 24, 2021
f43544a
update ctrl dependency
TomVeniat Aug 16, 2021
511ff18
Merge remote-tracking branch 'upstream/master'
lrzpellegrini Nov 17, 2021
d751a05
FIX an issue with mem_batch_size in ReplayDataLoader when forcing dat…
HamedHemati Nov 19, 2021
42edd44
Add force_data_batch_size option to ReplayPlugin to enable manual ass…
HamedHemati Nov 19, 2021
4858b48
FIX an issue with mem_batch_size in ReplayDataLoader [PEP8-FIX]
HamedHemati Nov 19, 2021
17fc1a8
minor changes to docstrings
AntonioCarta Nov 25, 2021
89a4989
minor changes to docstrings
AntonioCarta Nov 25, 2021
39a49a6
Merge branch 'doc'
AntonioCarta Nov 25, 2021
8964dcd
Merge branch 'master' of https://github.com/ContinualAI/avalanche
AntonioCarta Nov 25, 2021
454534e
change GroupBalancedDataloader to have a fixed batch size for any num…
HamedHemati Nov 26, 2021
1cd5609
Merge remote-tracking branch 'upstream/master'
lrzpellegrini Nov 28, 2021
d59a9e2
Added first 3 notebooks of the AvalancheDataset How-To series.
lrzpellegrini Nov 28, 2021
51c2ac4
ImageSamples metric now works with tensors
AntonioCarta Nov 29, 2021
660c905
Merge branch 'master' into fixes
AntonioCarta Nov 29, 2021
ac97308
GitBook: [#123] how-to added
vlomonaco Nov 29, 2021
0a35699
Merge remote-tracking branch 'upstream/master'
lrzpellegrini Nov 29, 2021
e03b399
REFACTOR tests
AntonioCarta Nov 29, 2021
6d60a28
GitBook: [#125] Adding preamble page to AvalancheDataset How-To
vlomonaco Nov 29, 2021
f24d325
Merge remote-tracking branch 'upstream/master'
lrzpellegrini Nov 29, 2021
b6aaf1d
Adapted AvalancheDataset How-To notebooks to the GitBook structure.
lrzpellegrini Nov 29, 2021
508821b
Remove old How-To structure.
lrzpellegrini Nov 29, 2021
a238df7
Merge pull request #842 from lrzpellegrini/master
vlomonaco Nov 29, 2021
11e374f
Update gitbook documentation
ContinualAI-bot Nov 29, 2021
f25043b
update notebooks
AntonioCarta Nov 30, 2021
e3855b0
Merge pull request #840 from AntonioCarta/master
AntonioCarta Nov 30, 2021
5d01290
Update gitbook documentation
ContinualAI-bot Nov 30, 2021
8bba427
Update bug_report.md
AntonioCarta Dec 1, 2021
466d1aa
ADD periodic eval at each iteration
AntonioCarta Dec 2, 2021
d3f58b2
FIX issue #838
AntonioCarta Dec 2, 2021
d9a0e7b
UPDATE target metrics
AntonioCarta Dec 2, 2021
4d3d923
Fixed
ashok-arjun Dec 5, 2021
6ad2428
Merge remote-tracking branch 'upstream/master'
lrzpellegrini Dec 14, 2021
3efab4c
update metric targets
AntonioCarta Dec 14, 2021
896a2be
Added sphinx doc for the benchmarks module. Fixed import issues.
lrzpellegrini Dec 14, 2021
8ce0f49
Merge pull request #852 from lrzpellegrini/master
AntonioCarta Dec 15, 2021
66878cb
Changed version
AndreaCossu Dec 16, 2021
30ba7a9
GitBook: [#126] No subject
Dec 16, 2021
ac83c45
Fixed bug in setuptools
AndreaCossu Dec 16, 2021
c82526a
Added beta release and pip install details
AndreaCossu Dec 17, 2021
7fa8655
Merge remote-tracking branch 'upstream/master'
lrzpellegrini Dec 19, 2021
eb68a71
GitBook: [#129] No subject
AntonioCarta Dec 20, 2021
0cb205b
GitBook: [#130] No subject
AntonioCarta Dec 20, 2021
c3e9f33
GitBook: [#131] No subject
AntonioCarta Dec 20, 2021
49fc725
Merge remote-tracking branch 'upstream/master'
lrzpellegrini Dec 20, 2021
1ff70f7
Support for ReduceLROnPlateau. Fixes #858. Refactored scheduler tests.
lrzpellegrini Dec 21, 2021
9ceb11d
Updated metrics files
AndreaCossu Dec 21, 2021
572bcb5
Updated metrics
AndreaCossu Dec 21, 2021
8fb4e9b
Merge branch 'ContinualAI:master' into Issue#771
ashok-arjun Dec 22, 2021
7cb42ae
Change tensor to int type
ashok-arjun Dec 22, 2021
b6e9239
Merge pull request #843 from AntonioCarta/peval_iterations
AndreaCossu Dec 22, 2021
e068e77
Merge remote-tracking branch 'origin/master'
lrzpellegrini Dec 24, 2021
382e052
Merge pull request #859 from lrzpellegrini/master
lrzpellegrini Dec 24, 2021
8606ec1
Prevents TensorboardLogger from blocking process exit. Fixes issue #864
lrzpellegrini Dec 24, 2021
5db9dd0
Fixes PEP8 issue.
lrzpellegrini Dec 24, 2021
c92174c
GitBook: fixed a bug in documentation
zalakbhalani Dec 24, 2021
3aeac8b
Merge branch 'ContinualAI:master' into Issue#771
ashok-arjun Dec 25, 2021
f67492b
Merge pull request #866 from zalakbhalani/master
lrzpellegrini Dec 25, 2021
b875547
Merge pull request #865 from lrzpellegrini/master
lrzpellegrini Dec 25, 2021
390c9e5
Merge branch 'ContinualAI:master' into Issue#771
ashok-arjun Dec 29, 2021
8484008
Get classes by `dataset.targets`
ashok-arjun Dec 29, 2021
66337bb
Correct error in var name
ashok-arjun Dec 29, 2021
095460d
Merge branch 'master' into master
TomVeniat Dec 30, 2021
2b62cb8
Added tests for `concat_datasets_sequentially`
ashok-arjun Dec 31, 2021
438f700
Fix PEP8 errors
ashok-arjun Dec 31, 2021
2164acf
Force added ctrl-benchmark dependency
AndreaCossu Jan 3, 2022
bf5776f
Force add ctrl-benchmark as dependency
AndreaCossu Jan 3, 2022
743d232
Added ctrl-benchmark as dependency
AndreaCossu Jan 3, 2022
6e89f99
Merge remote-tracking branch 'upstream/master'
AndreaCossu Jan 3, 2022
93fb791
Fixed pep
AndreaCossu Jan 3, 2022
bf67e20
Merge pull request #844 from ashok-arjun/Issue#771
AndreaCossu Jan 3, 2022
21589ea
Merge pull request #561 from TomVeniat/master
AndreaCossu Jan 3, 2022
0313607
Merge branch 'fixes' of https://github.com/hamedhemati/avalanche into…
AntonioCarta Jan 5, 2022
19a5ffd
update tests and target metrics
AntonioCarta Jan 7, 2022
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -11,7 +11,7 @@ assignees: ''
A clear and concise description of what the bug is.

🐜 **To Reproduce**
Steps / minimal snipped of code to reproduce the issue.
A [minimal working example](https://en.wikipedia.org/wiki/Minimal_working_example) to reproduce the issue. The code should be executable without modifications.

🐝 **Expected behavior**
A clear and concise description of what you expected to happen.
6 changes: 3 additions & 3 deletions README.md
@@ -83,10 +83,10 @@ Current Release

Avalanche is a framework in constant development. Thanks to the support of the [ContinualAI]() community and its active members, we are quickly extending its features and improving its usability based on the demands of our research community!

A the moment, Avalanche is in [**Alpha v0.0.1**](https://avalanche.continualai.org/getting-started/alpha-version), but we already support [several *Benchmarks*, *Strategies* and *Metrics*](https://avalanche.continualai.org/getting-started/alpha-version), that make it, we believe, the best tool out there for your continual learning research! 💪
At the moment, Avalanche is in [**Beta (v0.1.0)**](https://github.com/ContinualAI/avalanche/releases/tag/v0.1.0). We support [several *Benchmarks*, *Strategies* and *Metrics*](https://avalanche.continualai.org/getting-started/alpha-version), which make it, we believe, the best tool out there for your continual learning research! 💪

*Please note that, at the moment, we **do not** support stable releases and packaged versions of the library.*
*We do this intentionally as in this early phase we would like to stimulate contributions only from experienced CL researchers and coders.*
**You can install Avalanche by running `pip install avalanche-lib`.**
Look [here](https://avalanche.continualai.org/getting-started/how-to-install) for a more complete guide on the different ways available to install Avalanche.

Getting Started
----------------
2 changes: 1 addition & 1 deletion avalanche/__init__.py
@@ -5,7 +5,7 @@
from avalanche import training


__version__ = "0.0.1"
__version__ = "0.1.0"

_dataset_add = None

2 changes: 2 additions & 0 deletions avalanche/benchmarks/classic/__init__.py
@@ -3,10 +3,12 @@
from .ccub200 import *
from .cfashion_mnist import *
from .cimagenet import *
from .cinaturalist import *
from .cmnist import *
from .comniglot import *
from .core50 import CORe50
from .ctiny_imagenet import *
from .ctrl import *
from .endless_cl_sim import *
from .openloris import *
from .stream51 import *
105 changes: 105 additions & 0 deletions avalanche/benchmarks/classic/ctrl.py
@@ -0,0 +1,105 @@
################################################################################
# Copyright (c) 2021 ContinualAI. #
# Copyrights licensed under the MIT License. #
# See the accompanying LICENSE file for terms. #
# #
# Date: 22-06-2021 #
# Author(s): Tom Veniat #
# E-mail: contact@continualai.org #
# Website: avalanche.continualai.org #
################################################################################

import random
import sys
from pathlib import Path

import torchvision.transforms.functional as F
from torchvision import transforms
from tqdm import tqdm

import ctrl
from avalanche.benchmarks import dataset_benchmark
from avalanche.benchmarks.datasets import default_dataset_location
from avalanche.benchmarks.utils import AvalancheTensorDataset, \
common_paths_root, AvalancheDataset, PathsDataset


def CTrL(stream_name: str, save_to_disk: bool = False,
path: Path = default_dataset_location(''), seed: int = None,
n_tasks: int = None):
"""
Gives access to the Continual Transfer Learning benchmark streams
introduced in https://arxiv.org/abs/2012.12631.
:param stream_name: Name of the test stream to generate. Must be one of
`s_plus`, `s_minus`, `s_in`, `s_out` and `s_pl`.
:param save_to_disk: Whether to save each stream on the disk or load
everything in memory. Setting it to `True` will save memory but take more
time on the first generation using the corresponding seed.
:param path: The path under which the generated stream will be saved if
save_to_disk is True.
:param seed: The seed to use to generate the streams. If no seed is given,
a random one is drawn; since the seed fully determines the streams,
reusing the same seed reproduces the same generated streams.
:param n_tasks: The number of tasks to generate. This parameter is only
relevant for the `s_long` stream, as all other streams have a fixed number
of tasks.
:return: A scenario containing 3 streams: train, val and test.
"""
seed = seed or random.randint(0, sys.maxsize)
if stream_name != 's_long' and n_tasks is not None:
raise ValueError('The n_tasks parameter can only be used with the '
f'"s_long" stream, asked {n_tasks} for {stream_name}')
elif stream_name == 's_long' and n_tasks is None:
n_tasks = 100

stream = ctrl.get_stream(stream_name, seed)

if save_to_disk:
folder = path / 'ctrl' / stream_name / f'seed_{seed}'

# Train, val and test experiences
exps = [[], [], []]
for t_id, t in enumerate(tqdm(stream, desc=f'Loading {stream_name}')):
trans = transforms.Normalize(t.statistics['mean'],
t.statistics['std'])
for split, split_name, exp in zip(t.datasets, t.split_names, exps):
samples, labels = split.tensors
task_labels = [t.id] * samples.size(0)
if save_to_disk:
exp_folder = folder / f'exp_{t_id}' / split_name
exp_folder.mkdir(parents=True, exist_ok=True)
files = []
for i, (sample, label) in enumerate(zip(samples, labels)):
sample_path = exp_folder / f'sample_{i}.png'
if not sample_path.exists():
F.to_pil_image(sample).save(sample_path)
files.append((sample_path, label.item()))

common_root, exp_paths_list = common_paths_root(files)
paths_dataset = PathsDataset(common_root, exp_paths_list)
dataset = AvalancheDataset(
paths_dataset,
task_labels=task_labels,
transform=transforms.Compose([
transforms.ToTensor(),
trans
])
)
else:
dataset = AvalancheTensorDataset(samples, labels.squeeze(1),
task_labels=task_labels,
transform=trans)
exp.append(dataset)
if stream_name == 's_long' and t_id == n_tasks - 1:
break

return dataset_benchmark(
train_datasets=exps[0],
test_datasets=exps[2],
other_streams_datasets=dict(val=exps[1]),
)


__all__ = [
'CTrL'
]
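The seed handling and on-disk layout in `CTrL` above can be sketched in isolation. This is a minimal sketch, not Avalanche code: the helper name and the `/tmp/data` path are illustrative, and an explicit `is None` check is used so that an intentional seed of 0 would not be replaced (the `seed or random.randint(...)` form above would replace it).

```python
import random
import sys
from pathlib import Path


def resolve_seed_and_folder(stream_name, root, seed=None):
    # Mirror `seed = seed or random.randint(0, sys.maxsize)` from CTrL,
    # but with an explicit None check so a deliberate seed of 0 survives.
    if seed is None:
        seed = random.randint(0, sys.maxsize)
    # When save_to_disk is True, the stream lands under
    # <root>/ctrl/<stream_name>/seed_<seed>, as in the function above.
    return seed, root / 'ctrl' / stream_name / f'seed_{seed}'


seed, folder = resolve_seed_and_folder('s_plus', Path('/tmp/data'), seed=7)
print(folder)  # → /tmp/data/ctrl/s_plus/seed_7
```

Because the seed fully determines the generated streams, passing the same seed (or reusing the folder name it produced) reproduces the same benchmark.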
2 changes: 1 addition & 1 deletion avalanche/benchmarks/classic/stream51.py
@@ -247,7 +247,7 @@ def CLStream51(


__all__ = [
'Stream51'
'CLStream51'
]

if __name__ == "__main__":
3 changes: 2 additions & 1 deletion avalanche/benchmarks/datasets/__init__.py
@@ -2,10 +2,11 @@
from .downloadable_dataset import *
from .core50 import *
from .cub200 import *
from .endless_cl_sim import *
from .mini_imagenet import *
from .openloris import *
from .stream51 import *
from .tiny_imagenet import *
from .omniglot import *
from .stream51 import *
from .torchvision_wrapper import *
from .inaturalist import *
26 changes: 20 additions & 6 deletions avalanche/benchmarks/utils/avalanche_dataset.py
@@ -2004,20 +2004,34 @@ def concat_datasets_sequentially(

new_class_ids_per_dataset = []
for dataset_idx in range(len(train_dataset_list)):

# Get the train and test sets of the dataset
train_set = train_dataset_list[dataset_idx]
test_set = test_dataset_list[dataset_idx]

# Get the classes in the dataset
dataset_classes = set(map(int, train_set.targets))

# The class IDs for this dataset will be in range
# [n_classes_in_previous_datasets,
# n_classes_in_previous_datasets + classes_in_this_dataset)
class_mapping = list(
new_classes = list(
range(next_remapped_idx,
next_remapped_idx + classes_per_dataset[dataset_idx]))
new_class_ids_per_dataset.append(class_mapping)

train_set = train_dataset_list[dataset_idx]
test_set = test_dataset_list[dataset_idx]

new_class_ids_per_dataset.append(new_classes)

# AvalancheSubset is used to apply the class IDs transformation.
# Remember, the class_mapping parameter must be a list in which:
# new_class_id = class_mapping[original_class_id]
# Hence, a list of size equal to the maximum class index + 1 is created;
# only elements corresponding to the classes present in this dataset
# are remapped, while the rest stay at -1
class_mapping = [-1] * (max(dataset_classes) + 1)
j = 0
for i in dataset_classes:
class_mapping[i] = new_classes[j]
j += 1

# Create remapped datasets and append them to the final list
remapped_train_datasets.append(
AvalancheSubset(train_set, class_mapping=class_mapping))
remapped_test_datasets.append(
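The sparse `class_mapping` construction in this hunk can be illustrated standalone. A minimal sketch under two assumptions: the helper name is invented for illustration, and iteration over the class set is sorted here for determinism (the original iterates the set directly).

```python
def build_class_mapping(dataset_classes, new_classes):
    # Positions for class IDs absent from this dataset stay -1; present IDs
    # are remapped so that new_class_id = class_mapping[original_class_id],
    # as required by AvalancheSubset's class_mapping parameter.
    class_mapping = [-1] * (max(dataset_classes) + 1)
    for original_id, remapped_id in zip(sorted(dataset_classes), new_classes):
        class_mapping[original_id] = remapped_id
    return class_mapping


print(build_class_mapping({1, 3, 5}, [10, 11, 12]))
# → [-1, 10, -1, 11, -1, 12]
```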
43 changes: 36 additions & 7 deletions avalanche/benchmarks/utils/data_loader.py
@@ -41,6 +41,8 @@ def _default_collate_mbatches_fn(mbatches):


class TaskBalancedDataLoader:
""" Task-balanced data loader for Avalanche's datasets."""

def __init__(self, data: AvalancheDataset,
oversample_small_tasks: bool = False,
collate_mbatches=_default_collate_mbatches_fn,
@@ -95,9 +97,12 @@ def __len__(self):


class GroupBalancedDataLoader:
""" Data loader that balances data from multiple datasets."""

def __init__(self, datasets: Sequence[AvalancheDataset],
oversample_small_groups: bool = False,
collate_mbatches=_default_collate_mbatches_fn,
batch_size: int = 32,
**kwargs):
""" Data loader that balances data from multiple datasets.

@@ -115,6 +120,8 @@ def __init__(self, datasets: Sequence[AvalancheDataset],
:param collate_mbatches: function that given a sequence of mini-batches
(one for each task) combines them into a single mini-batch. Used to
combine the mini-batches obtained separately from each task.
:param batch_size: the size of the batch. It must be greater than or
equal to the number of groups.
:param kwargs: data loader arguments used to instantiate the loader for
each group separately. See pytorch :class:`DataLoader`.
"""
@@ -123,8 +130,19 @@ def __init__(self, datasets: Sequence[AvalancheDataset],
self.oversample_small_groups = oversample_small_groups
self.collate_mbatches = collate_mbatches

# check if batch_size is larger than or equal to the number of datasets
assert batch_size >= len(datasets)

# divide the batch between all datasets in the group
ds_batch_size = batch_size // len(datasets)
remaining = batch_size % len(datasets)

for data in self.datasets:
self.dataloaders.append(DataLoader(data, **kwargs))
bs = ds_batch_size
if remaining > 0:
bs += 1
remaining -= 1
self.dataloaders.append(DataLoader(data, batch_size=bs, **kwargs))
self.max_len = max([len(d) for d in self.dataloaders])

def __iter__(self):
@@ -166,6 +184,9 @@ def __len__(self):
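The batch-size division added to `GroupBalancedDataLoader` above can be sketched as a standalone helper (the function name is assumed for illustration): the integer quotient goes to every group, and the first `remainder` groups each get one extra sample.

```python
def split_batch_size(batch_size, n_groups):
    # Divide batch_size across groups, spreading the remainder over the
    # first groups, mirroring the loop in GroupBalancedDataLoader.__init__.
    assert batch_size >= n_groups
    base, remaining = divmod(batch_size, n_groups)
    sizes = []
    for _ in range(n_groups):
        bs = base
        if remaining > 0:
            bs += 1
            remaining -= 1
        sizes.append(bs)
    return sizes


print(split_batch_size(32, 5))  # → [7, 7, 6, 6, 6]
```

Note the per-group sizes always sum to `batch_size`, so the combined mini-batch keeps a fixed size regardless of the number of groups, which is the point of this change.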


class GroupBalancedInfiniteDataLoader:
""" Data loader that balances data from multiple datasets emitting an
infinite stream."""

def __init__(self, datasets: Sequence[AvalancheDataset],
collate_mbatches=_default_collate_mbatches_fn,
**kwargs):
@@ -214,6 +235,8 @@ def __len__(self):


class ReplayDataLoader:
""" Custom data loader for rehearsal/replay strategies."""

def __init__(self, data: AvalancheDataset, memory: AvalancheDataset = None,
oversample_small_tasks: bool = False,
collate_mbatches=_default_collate_mbatches_fn,
@@ -240,7 +263,9 @@ def __init__(self, data: AvalancheDataset, memory: AvalancheDataset = None,
combine the mini-batches obtained separately from each task.
:param batch_size: the size of the batch. It must be greater than or
equal to the number of tasks.
:param ratio_data_mem: How many of the samples should be from
:param force_data_batch_size: How many of the samples should be from the
current `data`. If None, it will equally divide each batch between
samples from all seen tasks in the current `data` and `memory`.
:param kwargs: data loader arguments used to instantiate the loader for
each task separately. See pytorch :class:`DataLoader`.
"""
@@ -256,19 +281,23 @@ def __init__(self, data: AvalancheDataset, memory: AvalancheDataset = None,
assert force_data_batch_size <= batch_size, \
"Forced batch size of data must be <= entire batch size"

mem_batch_size = batch_size - force_data_batch_size
remaining_example = 0
remaining_example_data = 0

mem_keys = len(self.memory.task_set)
mem_batch_size = batch_size - force_data_batch_size
mem_batch_size_k = mem_batch_size // mem_keys
remaining_example_mem = mem_batch_size % mem_keys

assert mem_batch_size >= mem_keys, \
"Batch size must be greater than or equal " \
"to the number of tasks in the memory."

self.loader_data, _ = self._create_dataloaders(
data, force_data_batch_size,
remaining_example, **kwargs)
remaining_example_data, **kwargs)
self.loader_memory, _ = self._create_dataloaders(
memory, mem_batch_size,
remaining_example, **kwargs)
memory, mem_batch_size_k,
remaining_example_mem, **kwargs)
else:
num_keys = len(self.data.task_set) + len(self.memory.task_set)
assert batch_size >= num_keys, \
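The `force_data_batch_size` branch above reserves a fixed share of each mini-batch for the current data and divides the rest across the tasks held in memory. A minimal sketch of that arithmetic (the helper name is assumed, not part of Avalanche):

```python
def replay_split(batch_size, force_data_batch_size, n_mem_tasks):
    # Reserve force_data_batch_size samples for the current experience and
    # split the remainder evenly over the memory tasks, as in the
    # force_data_batch_size branch of ReplayDataLoader.__init__.
    assert force_data_batch_size <= batch_size, \
        "Forced batch size of data must be <= entire batch size"
    mem_batch_size = batch_size - force_data_batch_size
    assert mem_batch_size >= n_mem_tasks, \
        "Batch size must be greater than or equal to the number of " \
        "tasks in the memory."
    per_task, leftover = divmod(mem_batch_size, n_mem_tasks)
    return force_data_batch_size, per_task, leftover


print(replay_split(32, 20, 4))  # → (20, 3, 0)
```

With `force_data_batch_size=None`, the plugin instead falls back to dividing the whole batch equally across all tasks seen in `data` and `memory`, as described in the docstring.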