As part of the effort to add support for FFCV data loading, this PR brings an overall change in the way data loaders are created.

Avalanche data loaders return batches that are the concatenation of smaller mini-batches drawn from multiple datasets. Up to now, this was accomplished by creating a separate loader for each sub-dataset and merging their outputs. This led to several issues: `num_workers` workers were spawned for each sub-dataset (for a total of `N * num_workers` processes), the collate step ran on the main thread and became a bottleneck, and the code of each loader replicated operations already implemented in other loaders.

With the improvements in this PR, a single PyTorch `DataLoader` is created on top of a newly implemented `Sampler` that returns the indices in the desired order. The new mechanism supports distributed sampling, ensures that the correct number of workers is created, and that flags like
`persistent_workers` and `pin_memory` actually take effect. In addition, mini-batch collate is now done at the `DataLoader` level, removing a problematic bottleneck.

Note: experiment results may change slightly w.r.t. the previous implementation. This is not a bug; it is related to the way random number generators are consumed, which may lead to a different choice of replay examples in the `ReplayPlugin`. Setting `shuffle=False`, or setting a manual seed in the `ReplayPlugin` update method, reproduces the results of the previous loader implementations.
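To illustrate the mechanism, here is a minimal sketch of the approach described above: a single `DataLoader` over a `ConcatDataset`, driven by a custom `Sampler` that interleaves indices from the sub-datasets. The `InterleavedSampler` name and its round-robin policy are illustrative assumptions for this sketch, not the actual Avalanche classes or the exact ordering the PR implements; note how a fixed seed on the generator makes the index order (and thus the batches) reproducible, as discussed in the note above.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Sampler, TensorDataset


class InterleavedSampler(Sampler):
    """Yields indices into a ConcatDataset so that each batch mixes
    examples from every sub-dataset (round-robin over the per-dataset
    index lists). Hypothetical sketch, not the Avalanche implementation."""

    def __init__(self, dataset_sizes, shuffle=True, seed=0):
        self.dataset_sizes = dataset_sizes
        self.shuffle = shuffle
        self.seed = seed

    def __iter__(self):
        g = torch.Generator()
        g.manual_seed(self.seed)  # fixed seed -> reproducible index order
        per_dataset, offset = [], 0
        for size in self.dataset_sizes:
            idx = torch.randperm(size, generator=g) if self.shuffle \
                else torch.arange(size)
            # shift local indices into the ConcatDataset's global range
            per_dataset.append((idx + offset).tolist())
            offset += size
        # round-robin interleave; stops when the shortest dataset runs out
        out = []
        for group in zip(*per_dataset):
            out.extend(group)
        return iter(out)

    def __len__(self):
        return min(self.dataset_sizes) * len(self.dataset_sizes)


# Two toy "experience" datasets of different sizes
ds_a = TensorDataset(torch.zeros(8, 3))
ds_b = TensorDataset(torch.ones(6, 3))
concat = ConcatDataset([ds_a, ds_b])

sampler = InterleavedSampler([len(ds_a), len(ds_b)])
# One loader, one pool of workers; collate happens inside the DataLoader
loader = DataLoader(concat, batch_size=4, sampler=sampler, num_workers=0)
batch = next(iter(loader))[0]
```

Because all indices flow through one sampler, only one set of workers is spawned and flags such as `persistent_workers` apply to the whole pipeline, which is the core benefit the PR describes.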