faster indices for FlatData with lazy operations #1416
Conversation
Oh no! It seems there are some PEP8 errors! 😕
@lrzpellegrini do you have any suggestions on how to check for performance (runtime) regressions with automated tests?
Pull Request Test Coverage Report for Build 5267889477
💛 - Coveralls
Everything seems ok! As for the performance, it's hard to check timings in an uncontrolled environment (such as GitHub Actions), where the processor may change between runs. Our best shot is to run performance checks on ContinualAI servers. There are ways to tell GitHub to run actions on external runners, but I don't know how hard that is to set up. As for the PR, I wonder if we can further improve performance by storing the eager indices in PyTorch or NumPy tensors.
def _to_eager(self):
    if self._eager_list is not None:
        return
    self._eager_list = [el + self._offset for el in self._lazy_sequence]
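For context, the method under review fits a lazy-index pattern that can be sketched minimally as follows. The class name and `__getitem__` wrapper here are hypothetical; only `_to_eager`, `_eager_list`, `_offset`, and `_lazy_sequence` come from the diff.

```python
class LazyIndices:
    """Minimal sketch of a lazily-offset index sequence (hypothetical class)."""

    def __init__(self, lazy_sequence, offset=0):
        self._lazy_sequence = lazy_sequence  # any iterable of ints
        self._offset = offset
        self._eager_list = None  # materialized on first access

    def _to_eager(self):
        # Materialize once; later calls are no-ops.
        if self._eager_list is not None:
            return
        self._eager_list = [el + self._offset for el in self._lazy_sequence]

    def __getitem__(self, i):
        self._to_eager()
        return self._eager_list[i]


idx = LazyIndices(range(5), offset=10)
assert idx[0] == 10 and idx[4] == 14  # offset applied only at materialization
```

The point of the pattern is that building `LazyIndices` is O(1); the full list with offsets applied is only paid for on first element access.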
Maybe we can further optimize by using NumPy both as the storage for `_eager_list` and to vectorize adding `_offset` to each index.
This is useful if we do it every time we offset (there are many places). Here most of the cost is due to fully iterating the lazy sequence, so I'm not sure NumPy would help much. But in general, converting everything to tensors may speed up subset and concat ops.
Maybe. Most of the cost was the concatenation of large lists for each concat operation, which should be more efficient in NumPy. I think we can speed this up further in the future, either by pushing the laziness further or by using NumPy arrays.
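The two operations being discussed (offsetting every index, and concatenating large index lists) could be vectorized roughly like this. This is only a sketch under the assumption that indices fit in `int64`; the function names are hypothetical and not part of the PR.

```python
import numpy as np


def offset_list(indices, offset):
    # Pure-Python version: full iteration and a new list per offset.
    return [i + offset for i in indices]


def offset_numpy(indices, offset):
    # NumPy version: a single vectorized add over a contiguous int array.
    return np.asarray(indices, dtype=np.int64) + offset


def concat_lists(parts):
    # Pure-Python concat: builds a new list, copying element by element.
    out = []
    for p in parts:
        out.extend(p)
    return out


def concat_numpy(parts):
    # NumPy concat: one contiguous allocation plus bulk copies.
    return np.concatenate([np.asarray(p, dtype=np.int64) for p in parts])


idx = list(range(10))
assert offset_numpy(idx, 5).tolist() == offset_list(idx, 5)

parts = [list(range(3)), list(range(3, 6))]
assert concat_numpy(parts).tolist() == concat_lists(parts)
```

The results are identical; the NumPy variants mainly trade per-element Python overhead for bulk array operations, which is where the comment expects the win for large index lists.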
I will leave further improvements as future work.
Lazy indices for FlatData to support faster operations.