faster indices for FlatData with lazy operations #1416
Conversation
Oh no! It seems there are some PEP8 errors! 😕
@lrzpellegrini do you have any suggestions on how to check for performance (runtime) regressions with automated tests?
Pull Request Test Coverage Report for Build 5267889477
💛 - Coveralls
Everything seems ok! As for the performance, it's hard to check timings in an uncontrolled environment (such as GitHub Actions), where the processor may change between runs. Our best shot is to run performance checks on ContinualAI servers. There are ways to tell GitHub to run actions on external runners, but I don't know how hard that is to set up. As for the PR, I wonder if we can further improve performance by storing the eager indices in PyTorch or NumPy tensors.
def _to_eager(self):
    if self._eager_list is not None:
        return
    self._eager_list = [el + self._offset for el in self._lazy_sequence]
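For context, the method under review fits a lazy-index pattern that can be sketched minimally as follows. The class name and `__getitem__` wrapper here are hypothetical; only `_to_eager`, `_eager_list`, `_offset`, and `_lazy_sequence` come from the diff.

```python
class LazyIndices:
    """Minimal sketch of a lazily-offset index sequence (hypothetical class)."""

    def __init__(self, lazy_sequence, offset=0):
        self._lazy_sequence = lazy_sequence  # any iterable of ints
        self._offset = offset
        self._eager_list = None  # materialized on first access

    def _to_eager(self):
        # Materialize once; later calls are no-ops.
        if self._eager_list is not None:
            return
        self._eager_list = [el + self._offset for el in self._lazy_sequence]

    def __getitem__(self, i):
        self._to_eager()
        return self._eager_list[i]


idx = LazyIndices(range(5), offset=10)
assert idx[0] == 10 and idx[4] == 14  # offset applied only at materialization
```

The point of the pattern is that building `LazyIndices` is O(1); the full list with offsets applied is only paid for on first element access.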
Maybe we can further optimize by using NumPy both as the storage for `_eager_list` and to vectorize adding `_offset` to each index.
This is useful if we do it every time we offset (there are many places). Here most of the cost is due to fully iterating the lazy sequence, so I'm not sure NumPy would help much. But in general, converting everything to tensors may speed up subset and concat ops.
Maybe. Most of the cost was the concatenation of large lists for each concat operation, which should be more efficient in NumPy. I think we can speed this up further in the future, either by pushing the laziness further or by using NumPy arrays.
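The two operations being discussed (offsetting every index, and concatenating large index lists) could be vectorized roughly like this. This is only a sketch under the assumption that indices fit in `int64`; the function names are hypothetical and not part of the PR.

```python
import numpy as np


def offset_list(indices, offset):
    # Pure-Python version: full iteration and a new list per offset.
    return [i + offset for i in indices]


def offset_numpy(indices, offset):
    # NumPy version: a single vectorized add over a contiguous int array.
    return np.asarray(indices, dtype=np.int64) + offset


def concat_lists(parts):
    # Pure-Python concat: builds a new list, copying element by element.
    out = []
    for p in parts:
        out.extend(p)
    return out


def concat_numpy(parts):
    # NumPy concat: one contiguous allocation plus bulk copies.
    return np.concatenate([np.asarray(p, dtype=np.int64) for p in parts])


idx = list(range(10))
assert offset_numpy(idx, 5).tolist() == offset_list(idx, 5)

parts = [list(range(3)), list(range(3, 6))]
assert concat_numpy(parts).tolist() == concat_lists(parts)
```

The results are identical; the NumPy variants mainly trade per-element Python overhead for bulk array operations, which is where the comment expects the win for large index lists.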
I will leave further improvements as future work.
Lazy indices for FlatData to support faster operations.