Enable GPU-only operations in CudaTensor class #42

Merged
merged 41 commits into from Jul 7, 2021

Member

@mlxd mlxd commented Jul 1, 2021

Context: This PR removes the intermediate transfers to the Tensor class that were needed for slicing, and adds Transpose functionality via the cuTENSOR library.

Description of the Change: Calls to the cuTENSOR library enable permutation of a tensor's indices into a given order. Tensor slicing is then implemented on top of this permutation.

Benefits: Tensor slicing no longer requires intermediate GPU-CPU-GPU transfers.

Possible Drawbacks: Debugging on-device code can be more challenging.
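
The slicing-via-permutation idea described above can be illustrated with a minimal host-side sketch. This is not the code in this PR: `DenseTensor` and the signatures of `Transpose` and `SliceIndex` below are hypothetical, the loops run on the CPU where the PR relies on cuTENSOR on the device, and row-major storage is assumed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical dense row-major tensor used only for this sketch;
// it is NOT the Jet CudaTensor class.
struct DenseTensor {
    std::vector<std::size_t> shape;
    std::vector<float> data;
};

// Returns a copy of `t` with reordered axes: output axis k is input axis perm[k].
DenseTensor Transpose(const DenseTensor &t, const std::vector<std::size_t> &perm)
{
    const std::size_t rank = t.shape.size();
    if (perm.size() != rank) {
        throw std::invalid_argument("perm must have one entry per axis");
    }
    for (std::size_t p : perm) {
        if (p >= rank) throw std::out_of_range("perm entry out of range");
    }

    // Row-major strides of the input tensor.
    std::vector<std::size_t> in_strides(rank, 1);
    for (std::size_t i = rank; i-- > 1;) {
        in_strides[i - 1] = in_strides[i] * t.shape[i];
    }

    DenseTensor out;
    out.shape.resize(rank);
    for (std::size_t k = 0; k < rank; ++k) {
        out.shape[k] = t.shape[perm[k]];
    }
    out.data.resize(t.data.size());

    // Walk the output in row-major order, tracking its multi-index.
    std::vector<std::size_t> idx(rank, 0);
    for (std::size_t flat = 0; flat < out.data.size(); ++flat) {
        std::size_t src = 0;
        for (std::size_t k = 0; k < rank; ++k) {
            src += idx[k] * in_strides[perm[k]];
        }
        out.data[flat] = t.data[src];

        // Advance the multi-index like an odometer (last axis fastest).
        for (std::size_t k = rank; k-- > 0;) {
            if (++idx[k] < out.shape[k]) break;
            idx[k] = 0;
        }
    }
    return out;
}

// Fixes axis `axis` to `value` and returns the remaining sub-tensor.
// The sliced axis is first moved to the outermost position, after which
// the requested slice is one contiguous block of the transposed data.
DenseTensor SliceIndex(const DenseTensor &t, std::size_t axis, std::size_t value)
{
    const std::size_t rank = t.shape.size();
    if (axis >= rank || value >= t.shape[axis]) {
        throw std::out_of_range("axis or value out of range");
    }

    std::vector<std::size_t> perm{axis};
    for (std::size_t i = 0; i < rank; ++i) {
        if (i != axis) perm.push_back(i);
    }

    const DenseTensor moved = Transpose(t, perm);
    const std::size_t block = moved.data.size() / moved.shape[0];

    DenseTensor out;
    out.shape.assign(moved.shape.begin() + 1, moved.shape.end());
    out.data.assign(moved.data.begin() + value * block,
                    moved.data.begin() + (value + 1) * block);
    return out;
}
```

Moving the sliced index to the outermost position makes every slice a contiguous range, so the slice itself reduces to a single range copy. In a device-side version that copy could presumably stay on the GPU, which is the point of avoiding the round trip through host memory.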

Related GitHub Issues:

mlxd and others added 28 commits June 15, 2021 10:46
github-actions bot commented Jul 1, 2021

Test Report (C++) on Ubuntu

    1 files  ±0      1 suites  ±0   0s ⏱️ ±0s
521 tests ±0  521 ✔️ ±0  0 💤 ±0  0 ❌ ±0 
870 runs  ±0  870 ✔️ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit b44fdf3. ± Comparison against base commit b44fdf3.

♻️ This comment has been updated with latest results.

github-actions bot commented Jul 1, 2021

Test Report (C++) on MacOS

    1 files  ±0      1 suites  ±0   0s ⏱️ ±0s
521 tests ±0  521 ✔️ ±0  0 💤 ±0  0 ❌ ±0 
870 runs  ±0  870 ✔️ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 6b80040. ± Comparison against base commit 29805e4.

♻️ This comment has been updated with latest results.

Collaborator

@Mandrenkov Mandrenkov left a comment

Looks great! I especially like the implementation of SliceIndex() in terms of Transpose().

It's also a nice bonus that we don't need to change any of the tests!

.github/CHANGELOG.md (outdated, resolved)
CMakeLists.txt (outdated, resolved)
include/jet/TensorNetwork.hpp (resolved)
include/jet/CudaTensor.hpp (outdated, resolved)
include/jet/CudaTensor.hpp (outdated, resolved)
include/jet/CudaTensor.hpp (outdated, resolved)
include/jet/CudaTensor.hpp (outdated, resolved)
include/jet/CudaTensor.hpp (outdated, resolved)
include/jet/CudaTensor.hpp (outdated, resolved)
include/jet/CudaTensor.hpp (resolved)
github-actions bot commented Jul 7, 2021

Test Report (Python) on Ubuntu

    1 files  ±0      1 suites  ±0   7s ⏱️ ±0s
490 tests ±0  490 ✔️ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit b44fdf3. ± Comparison against base commit b44fdf3.

♻️ This comment has been updated with latest results.

Collaborator

@Mandrenkov Mandrenkov left a comment

Looks good to me!

.github/CHANGELOG.md (outdated, resolved)
Co-authored-by: Mikhail Andrenkov <Mandrenkov@users.noreply.github.com>
@mlxd mlxd merged commit b44fdf3 into main Jul 7, 2021
@mlxd mlxd deleted the cudatensor_nocpu branch July 7, 2021 15:15

2 participants