
faster indices for FlatData with lazy operations #1416

Merged

merged 5 commits into ContinualAI:master from replay-flattening on Jun 14, 2023

Conversation

AntonioCarta
Collaborator

Lazy indices for FlatData to support faster operations.

@ContinualAI-bot
Collaborator

Oh no! It seems there are some PEP8 errors! 😕
Don't worry, you can fix them! 💪
Here's a report about the errors and where you can find them:

avalanche/benchmarks/utils/data_attribute.py:167:81: E501 line too long (83 > 80 characters)
avalanche/benchmarks/utils/flat_data.py:60:21: E117 over-indented (comment)
avalanche/benchmarks/utils/flat_data.py:61:21: E117 over-indented
avalanche/benchmarks/utils/flat_data.py:204:81: E501 line too long (87 > 80 characters)
avalanche/benchmarks/utils/flat_data.py:216:81: E501 line too long (86 > 80 characters)
avalanche/benchmarks/utils/flat_data.py:229:81: E501 line too long (86 > 80 characters)
avalanche/benchmarks/utils/flat_data.py:240:81: E501 line too long (86 > 80 characters)
2       E117 over-indented (comment)
5       E501 line too long (83 > 80 characters)

@AntonioCarta
Collaborator Author

@lrzpellegrini do you have any suggestions on how to check for performance (runtime) regressions with automated tests?

@coveralls

coveralls commented Jun 14, 2023

Pull Request Test Coverage Report for Build 5267889477

  • 81 of 88 (92.05%) changed or added relevant lines in 4 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.1%) to 72.682%

Changes Missing Coverage:

| File | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| avalanche/benchmarks/utils/utils.py | 8 | 9 | 88.89% |
| avalanche/benchmarks/utils/flat_data.py | 68 | 74 | 91.89% |

Files with Coverage Reduction:

| File | New Missed Lines | % |
| --- | --- | --- |
| avalanche/benchmarks/utils/flat_data.py | 1 | 90.94% |

Totals:
  • Change from base Build 5257858549: 0.1%
  • Covered Lines: 16325
  • Relevant Lines: 22461

💛 - Coveralls

@lrzpellegrini
Collaborator

lrzpellegrini commented Jun 14, 2023

Everything seems ok!

As for the performance, it's hard to check times in an uncontrolled environment (such as GH Actions), where the processor may change between runs. Our best shot is to run performance checks on ContinualAI servers. There are ways to tell GitHub to run actions on external runners, but I don't know how hard that is to set up.
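One hedged possibility, given that absolute timings are unreliable on shared runners: assert on a ratio against a baseline operation timed in the same process, so the machine's speed cancels out. This is only a sketch; the test name, operations, and threshold are made up for illustration:

```python
# Illustrative sketch: a relative performance-regression test. Instead of an
# absolute wall-clock threshold (flaky on shared CI), we compare the operation
# under test against a baseline timed on the same machine in the same process.
import timeit


def time_op(fn, repeat=5):
    """Return the best-of-n wall-clock time for calling fn() once."""
    return min(timeit.repeat(fn, number=1, repeat=repeat))


def test_offset_not_slower_than_baseline():
    data = list(range(100_000))
    # Baseline: an identity pass over the data.
    baseline = time_op(lambda: [x for x in data])
    # Candidate: the (hypothetical) operation under test.
    candidate = time_op(lambda: [x + 1 for x in data])
    # Generous slack; we only want to catch order-of-magnitude regressions.
    assert candidate < 10 * baseline
```

The ratio-based assertion catches gross regressions while tolerating the machine-to-machine variance that makes absolute thresholds flaky.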

As for the PR, I wonder if we can further improve performance by storing the eager indices in PyTorch or NumPy tensors.

def _to_eager(self):
    if self._eager_list is not None:
        return
    self._eager_list = [el + self._offset for el in self._lazy_sequence]
Collaborator


Maybe we can further optimize using NumPy both as a storage for _eager_list and both to optimize summing the _offset to each index.
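A hedged illustration of that suggestion (the class and attribute names mirror the snippet above but are otherwise hypothetical, not the PR's actual code):

```python
import numpy as np


class LazyIndices:
    """Illustrative stand-in for a lazy index holder, with a NumPy-backed
    eager representation as suggested in the review comment."""

    def __init__(self, lazy_sequence, offset=0):
        self._lazy_sequence = lazy_sequence
        self._offset = offset
        self._eager_array = None

    def _to_eager(self):
        if self._eager_array is not None:
            return
        # Materialize the sequence once with np.fromiter, then apply the
        # offset as a single vectorized addition instead of per element.
        self._eager_array = (
            np.fromiter(self._lazy_sequence, dtype=np.int64) + self._offset
        )

    def __getitem__(self, i):
        self._to_eager()
        return int(self._eager_array[i])
```

The vectorized `+ self._offset` replaces the Python-level loop, and the array doubles as compact storage for the eager indices.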

Collaborator Author


This would be useful if we did it every time we apply an offset (there are many places). Here, most of the cost comes from fully iterating the lazy sequence, so I'm not sure NumPy would help much in this spot. In general, though, converting everything to tensors may speed up subset and concat operations.

@AntonioCarta
Collaborator Author

As for the PR, I wonder if we can further improve performance by storing the eager indices in PyTorch or NumPy tensors.

Maybe. Most of the cost was the concatenation of large lists on every concat operation, which should be more efficient in NumPy.
For example, I added a LazyRange, which should avoid the "eagerification" step in some common cases.

I think we can speed this up further in the future, either by pushing the laziness further or by using NumPy arrays.
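A minimal sketch of how such a LazyRange could work, assuming the goal is that offsetting a contiguous range returns another range rather than an eager list (names and API are illustrative, not the PR's actual implementation):

```python
class LazyRange:
    """Illustrative: a contiguous index range that composes offsets lazily,
    so subsetting/offsetting never materializes a Python list."""

    def __init__(self, start, stop, offset=0):
        self._start, self._stop, self._offset = start, stop, offset

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Index arithmetic is done on the fly; nothing is stored per element.
        return self._start + i + self._offset

    def add_offset(self, off):
        # Offsetting returns another LazyRange: still O(1) space and time.
        return LazyRange(self._start, self._stop, self._offset + off)
```

Because the common "subset of a contiguous block, then shift" pattern stays within this class, the expensive list materialization is deferred until an actual element lookup is needed.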

@AntonioCarta
Collaborator Author

I will leave further improvements as future work.

@AntonioCarta AntonioCarta merged commit e915622 into ContinualAI:master Jun 14, 2023
18 checks passed
@AntonioCarta AntonioCarta deleted the replay-flattening branch June 14, 2023 14:03