Enable GPU-only operations in CudaTensor class #42
Looks great! I especially like the implementation of `SliceIndex()` in terms of `Transpose()`.
It's also a nice bonus that we don't need to change any of the tests!
Looks good to me!
Co-authored-by: Mikhail Andrenkov <Mandrenkov@users.noreply.github.com>
Context: This PR removes the intermediate transfers to the `Tensor` class for slicing, and provides `Transpose` functionality via the cuTENSOR library.
Description of the Change: Through calls to the cuTENSOR library, we enable permuting a tensor's indices into a given order. Building on this permutation, we also enable tensor slicing.
Benefits: We avoid the intermediate GPU-CPU-GPU transfers that were previously needed to perform tensor slicing.
Possible Drawbacks: Debugging on-device code can be more challenging.
Related GitHub Issues: