Make CoPE example data incremental #616

Closed
Mattdl opened this issue May 17, 2021 · 8 comments
Labels: Benchmarks (Related to the Benchmarks module), Training (Related to the Training module)

Comments

@Mattdl
Collaborator

Mattdl commented May 17, 2021

I'd like to put the new data-incremental interface to use in the CoPE example, which is currently still class-incremental.
Can someone assign this issue to me? Thanks!
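For readers unfamiliar with the distinction, here is a plain-Python sketch contrasting the two ways of splitting a stream into experiences. The helpers `class_incremental` and `data_incremental` are hypothetical and are not Avalanche's API. A class-incremental split gives each experience a disjoint group of classes. A data-incremental split just cuts the stream into consecutive fixed-size chunks, so the same class can reappear in later experiences.

```python
from typing import List, Sequence, Tuple

# A sample is (sample_id, class_label); purely illustrative data.
Sample = Tuple[int, int]


def class_incremental(samples: Sequence[Sample], classes_per_exp: int) -> List[List[Sample]]:
    """Hypothetical helper: each experience holds a disjoint group of classes."""
    labels = sorted({label for _, label in samples})
    experiences = []
    for start in range(0, len(labels), classes_per_exp):
        group = set(labels[start:start + classes_per_exp])
        experiences.append([s for s in samples if s[1] in group])
    return experiences


def data_incremental(samples: Sequence[Sample], samples_per_exp: int) -> List[List[Sample]]:
    """Hypothetical helper: consecutive fixed-size chunks of the stream, any classes."""
    return [list(samples[i:i + samples_per_exp])
            for i in range(0, len(samples), samples_per_exp)]


if __name__ == "__main__":
    stream = [(i, i % 4) for i in range(8)]  # 8 samples, labels 0..3 repeating
    print(class_incremental(stream, classes_per_exp=2))
    print(data_incremental(stream, samples_per_exp=2))
```

With this toy stream, class 0 appears only in the first class-incremental experience. In the data-incremental split it appears in both the first and the third experience.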

@vlomonaco vlomonaco added the Training Related to the Training module label May 18, 2021
@Mattdl
Collaborator Author

Mattdl commented May 18, 2021

I get an error when implementing the data-incremental loader:

-- >> Start of training phase << --
-- Starting training on experience 242 (Task 0) from train stream --

[...] RecursionError: maximum recursion depth exceeded while calling a Python object

Should I open a PR with an example that reproduces the error?
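A RecursionError in this kind of setting can come from lazily wrapped datasets. If each new experience wraps the previous concatenation instead of extending it, every item lookup walks down a chain as deep as the number of experiences seen so far. The plain-Python sketch below illustrates that failure mode and the flattening remedy. `NestedConcat`, `FlatConcat` and `build` are hypothetical names, not Avalanche classes, and this is an assumption about the general mechanism rather than a diagnosis of this specific bug. How deep a real library can go before failing depends on how many stack frames each wrapping level costs.

```python
from typing import List, Sequence


class NestedConcat:
    """Hypothetical lazy concatenation of two sequences (not Avalanche code).

    Re-wrapping the previous result at every experience builds a chain whose
    depth equals the number of experiences; len() and indexing recurse through it.
    """

    def __init__(self, left: Sequence, right: Sequence) -> None:
        self.left = left
        self.right = right

    def __len__(self) -> int:
        return len(self.left) + len(self.right)

    def __getitem__(self, idx: int):
        n_left = len(self.left)
        if idx < n_left:
            return self.left[idx]
        return self.right[idx - n_left]


class FlatConcat:
    """Hypothetical concatenation that keeps a flat list of leaf sequences."""

    def __init__(self, parts: List[Sequence]) -> None:
        self.parts: List[Sequence] = []
        for part in parts:
            if isinstance(part, FlatConcat):
                # Splice in the existing leaves instead of nesting the wrapper.
                self.parts.extend(part.parts)
            else:
                self.parts.append(part)

    def __len__(self) -> int:
        return sum(len(p) for p in self.parts)

    def __getitem__(self, idx: int):
        for part in self.parts:
            if idx < len(part):
                return part[idx]
            idx -= len(part)
        raise IndexError("index out of range")


def build(n_experiences: int, flat: bool):
    """Grow a buffer by one single-sample 'experience' at a time."""
    buffer = [0]
    for exp in range(1, n_experiences):
        new_data = [exp]
        buffer = FlatConcat([buffer, new_data]) if flat else NestedConcat(buffer, new_data)
    return buffer


if __name__ == "__main__":
    depth = 5000  # deliberately above CPython's default recursion limit of 1000
    try:
        build(depth, flat=False)[0]
    except RecursionError:
        print("nested: RecursionError")
    flat_buffer = build(depth, flat=True)
    print("flat:", len(flat_buffer), flat_buffer[0], flat_buffer[depth - 1])
```

The flat variant trades a little work at construction time (copying the leaf list) for lookups whose stack depth no longer grows with the number of experiences.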

@vlomonaco
Member

Yes, so that we can help in case you need it!

@Mattdl
Collaborator Author

Mattdl commented May 19, 2021

I've opened a PR that reproduces the error. See also #543, which had the same error.
@lrzpellegrini you found a fix for #543, do you have an idea what may cause the error in this case?

@lrzpellegrini
Collaborator

Hi @Mattdl, sorry for the delay. I reproduced the bug with the CoPE example in your fork. The recursion bug is very similar to the one I previously fixed, but it seems to be linked to a different corner case. I'll fix it as soon as possible.

@lrzpellegrini
Collaborator

My latest PR should fix the recursion issue; it will be merged soon. Let me know if it works!

@Mattdl
Collaborator Author

Mattdl commented May 26, 2021

Hi, thanks for the effort! The recursion bug seems to be solved, but after 201 iterations another corner-case problem arises. It seems to happen when concatenating datasets for the replay buffer, i.e. in AvalancheConcatDataset.
This is the stack trace; it ends in an index-out-of-range exception:

Traceback (most recent call last):
  File ".../avalanche/training/plugins/cope.py", line 88, in before_training_exp
    AvalancheConcatDataset(self.replay_mem.values()),
  File ".../avalanche/benchmarks/utils/avalanche_dataset.py", line 1616, in __init__
    super().__init__(ClassificationDataset(),  # not used
  File ".../avalanche/benchmarks/utils/avalanche_dataset.py", line 306, in __init__
    self._flatten_dataset()
  File ".../avalanche/benchmarks/utils/avalanche_dataset.py", line 1849, in _flatten_dataset
    flattened_list += self._flatten_subset_concat_branch(dataset)
  File ".../avalanche/benchmarks/utils/avalanche_dataset.py", line 1904, in _flatten_subset_concat_branch
    last_c_targets.append(dataset.targets[idx])
IndexError: list index out of range

The PR should already make it reproducible: #623
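As a hedged illustration of the class of bug that produces an IndexError while collecting targets for a subset of a concatenation (an assumption, not a diagnosis of `_flatten_subset_concat_branch`): indices into a concatenation are global, so they must first be mapped to a (child, local index) pair before indexing that child's targets. The helpers `to_local` and `subset_targets` below are hypothetical and use only the standard library (`bisect.bisect_right`, `itertools.accumulate`).

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence, Tuple


def to_local(cumulative_ends: List[int], global_idx: int) -> Tuple[int, int]:
    """Map an index into a concatenation to (child position, index inside that child).

    cumulative_ends[k] is the total length of children 0..k.
    """
    total = cumulative_ends[-1] if cumulative_ends else 0
    if not 0 <= global_idx < total:
        raise IndexError(f"index {global_idx} out of range for length {total}")
    child = bisect_right(cumulative_ends, global_idx)  # first child whose end exceeds idx
    start = cumulative_ends[child - 1] if child > 0 else 0
    return child, global_idx - start


def subset_targets(children_targets: List[Sequence[int]],
                   subset_indices: List[int]) -> List[int]:
    """Hypothetical helper: targets of a subset taken over a concatenation."""
    ends = list(accumulate(len(t) for t in children_targets))
    collected = []
    for idx in subset_indices:
        child, local = to_local(ends, idx)
        # Index the child with the *local* index; using idx directly can overflow it.
        collected.append(children_targets[child][local])
    return collected


if __name__ == "__main__":
    children = [[0, 0, 1], [2, 2], [3]]  # targets of three concatenated datasets
    print(subset_targets(children, [5, 0, 3]))  # [3, 0, 2]
```

Passing the global index straight to a child instead (for instance `children[1][4]`, where that child has length 2) raises `IndexError: list index out of range`, the same message as in the trace.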

@Mattdl
Collaborator Author

Mattdl commented Jun 3, 2021

The issue seems to be solved. However, training time increases dramatically compared to the class-incremental setup.
Split-MNIST now takes about 6 hours to complete a single epoch, compared to the usual 10 minutes on my setup. That is roughly a 36x slowdown (360 min vs. 10 min), which makes larger setups infeasible.

Should I open a separate issue for this?

@lrzpellegrini
Collaborator

Yes, I think it's better to create a separate issue for this.

Do you have a log of how the running time is split across the various procedures (replay update, training, etc.)?
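One low-effort way to collect such a breakdown is to accumulate wall-clock time per phase with a small context manager. The phase names (`replay_update`, `train`) and the placeholder workloads below are assumptions. In practice one would wrap, for example, the body of the plugin's `before_training_exp` (the callback shown in the traceback above) and the training loop. For a function-level view, the standard-library profiler (`python -m cProfile -s cumtime your_script.py`) is an alternative.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator

# Accumulated wall-clock seconds per (hypothetical) phase name.
phase_totals: Dict[str, float] = defaultdict(float)


@contextmanager
def timed(phase: str) -> Iterator[None]:
    """Add the wall-clock time spent inside the block to phase_totals[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_totals[phase] += time.perf_counter() - start


def report(totals: Dict[str, float]) -> str:
    """Format each phase's total time and share, slowest first."""
    grand_total = sum(totals.values()) or 1.0  # avoid division by zero
    rows = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return "\n".join(
        f"{name:>15}: {secs:8.3f} s ({100 * secs / grand_total:5.1f}%)"
        for name, secs in rows
    )


if __name__ == "__main__":
    for experience in range(3):
        with timed("replay_update"):
            sum(range(200_000))  # placeholder for the replay-buffer update
        with timed("train"):
            sum(range(1_000_000))  # placeholder for the training loop
    print(report(phase_totals))
```

Accumulating per phase, rather than logging every call, keeps the overhead negligible even over hundreds of small experiences.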

3 participants