
Faster attributes #1383

Merged: 5 commits into ContinualAI:master from faster_attributes, May 29, 2023

Conversation

AntonioCarta (Collaborator)

closes #1357

The high-level idea: previously, data attributes were part of the "dataset tree". However, we can manage them independently. Now an AvalancheDataset is a DatasetWithTransforms plus a set of attributes. Overall this is easier to manage, and it speeds up #1357, which is now on par with a normal concat.
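A rough toy sketch of this idea (not the actual Avalanche implementation; the `SimpleDataset` class and its `concat` method are hypothetical names for illustration): attributes live alongside the samples and are merged in one step on concatenation, instead of being recovered by walking a dataset tree.

```python
class SimpleDataset:
    """Hypothetical stand-in for AvalancheDataset: samples plus
    per-sample attributes, managed independently of each other."""

    def __init__(self, samples, attributes):
        # attributes maps a name (e.g. "targets") to a per-sample list
        self.samples = list(samples)
        self.attributes = {k: list(v) for k, v in attributes.items()}

    def concat(self, other):
        # Attributes are concatenated directly, independently of the
        # sample storage -- no recursive walk over a "dataset tree".
        merged_attrs = {
            k: self.attributes[k] + other.attributes[k]
            for k in self.attributes
        }
        return SimpleDataset(self.samples + other.samples, merged_attrs)


d1 = SimpleDataset([0, 1], {"targets": ["a", "b"]})
d2 = SimpleDataset([2], {"targets": ["c"]})
d3 = d1.concat(d2)
print(d3.samples)                # [0, 1, 2]
print(d3.attributes["targets"])  # ['a', 'b', 'c']
```

Because the merge touches each attribute list once, concatenation cost no longer grows with the depth of the dataset composition history.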

@lrzpellegrini as you can see, there are two failing tests:

  • top-k: I think this is a change in the API, so it no longer allows checking k=4
  • I have no idea what the equality check in custom_streams_name_and_length is supposed to do. Shouldn't we be checking equality of the samples?

@neuperc @AlbinSou this fixes the DER slowdown.

@ContinualAI-bot (Collaborator)

Oh no! It seems there are some PEP8 errors! 😕
Don't worry, you can fix them! 💪
Here's a report about the errors and where you can find them:

avalanche/benchmarks/utils/classification_dataset.py:103:81: E501 line too long (84 > 80 characters)
avalanche/benchmarks/utils/classification_dataset.py:107:81: E501 line too long (84 > 80 characters)
avalanche/benchmarks/utils/data.py:238:81: E501 line too long (88 > 80 characters)
avalanche/benchmarks/utils/data.py:241:81: E501 line too long (89 > 80 characters)
avalanche/benchmarks/utils/data.py:410:81: E501 line too long (82 > 80 characters)
avalanche/benchmarks/utils/data.py:417:81: E501 line too long (92 > 80 characters)
avalanche/benchmarks/utils/data.py:512:81: E501 line too long (91 > 80 characters)
avalanche/benchmarks/utils/utils.py:487:81: E501 line too long (88 > 80 characters)
8       E501 line too long (84 > 80 characters)

@lrzpellegrini (Collaborator)

Please correct me if I'm wrong. With this PR, AvalancheDataset will now have two main components/fields:

  • flat_data: in charge of managing transformations
  • data_attributes: in charge of data attributes

with each field managing its own subset operations. If this is the case, it seems reasonable.
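The two-component split described above can be sketched as follows. This is a hedged illustration under assumed names (`FlatData`, `DataAttribute`, and their `subset` methods are hypothetical simplifications, not Avalanche's real API): each component applies subset indices on its own, so the dataset wrapper only delegates.

```python
class FlatData:
    """Hypothetical sample container in charge of transformations."""

    def __init__(self, samples, transform=None):
        self.samples = list(samples)
        self.transform = transform or (lambda x: x)

    def subset(self, indices):
        # subsetting keeps the transform, selects the samples
        return FlatData([self.samples[i] for i in indices], self.transform)

    def __getitem__(self, i):
        return self.transform(self.samples[i])


class DataAttribute:
    """Hypothetical per-sample attribute (e.g. targets, task labels)."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def subset(self, indices):
        return DataAttribute(self.name, [self.values[i] for i in indices])


class Dataset:
    """Wrapper holding both components; subset delegates to each one."""

    def __init__(self, flat_data, attrs):
        self.flat_data = flat_data
        self.attrs = attrs

    def subset(self, indices):
        return Dataset(self.flat_data.subset(indices),
                       [a.subset(indices) for a in self.attrs])


data = Dataset(FlatData([1, 2, 3, 4], transform=lambda x: x * 10),
               [DataAttribute("targets", [0, 0, 1, 1])])
sub = data.subset([0, 3])
print(sub.flat_data[1])     # 40
print(sub.attrs[0].values)  # [0, 1]
```

The design choice is that neither component needs to know about the other: transformations and attributes stay consistent under subsetting because both receive the same index list.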

As for the tests, custom_streams_name_and_length seems to fail correctly: an object should be equal to itself, especially if it is the very same object (with the same memory address). The equality check method may need a fix before merging.

@AntonioCarta (Collaborator, Author)

Yes, you are correct about the PR implementation.

> As for the tests, custom_streams_name_and_length seems to fail correctly: an object should be equal to itself, especially if it is the very same object (with the same memory address). The equality check method may need a fix before merging.

I had misunderstood the test: I broke the equality method. It's fixed now.

@AntonioCarta AntonioCarta merged commit f7aa170 into ContinualAI:master May 29, 2023
5 of 13 checks passed
@AntonioCarta AntonioCarta deleted the faster_attributes branch June 1, 2023 08:38
Successfully merging this pull request may close these issues.

concat_datasets is slow when used for grouped ReservoirSampling buffers with attributed dataset
3 participants