Enable GPU-only operations in CudaTensor class #42

mlxd · 2021-07-01T14:46:18Z

Context: This PR removes the intermediate transfers to the Tensor class for slicing and provides Transpose functionality via the cuTENSOR library.

Description of the Change: Calls to the cuTENSOR library enable permuting a tensor for a given set of indices. Tensor slicing is then implemented on top of this permutation.
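The idea of slicing through a permutation can be illustrated with a minimal host-side sketch. This is not the code from this PR: `HostTensor`, `PermuteHost` and `SliceHost` are hypothetical names, the permutation runs in plain CPU loops rather than through cuTENSOR, and the layout is assumed to be column-major. Under those assumptions, moving the sliced index to the last (slowest-varying) position makes every slice a contiguous block, so it can be read with a single pointer offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a column-major tensor (not Jet's CudaTensor).
struct HostTensor {
    std::vector<std::size_t> shape;
    std::vector<float> data; // column-major: the first index varies fastest
};

// Returns a copy whose k-th index is the perm[k]-th index of the input.
HostTensor PermuteHost(const HostTensor &in, const std::vector<std::size_t> &perm)
{
    assert(perm.size() == in.shape.size());
    const std::size_t rank = in.shape.size();

    // Column-major strides of the input.
    std::vector<std::size_t> in_strides(rank, 1);
    for (std::size_t i = 1; i < rank; ++i) {
        in_strides[i] = in_strides[i - 1] * in.shape[i - 1];
    }

    HostTensor out;
    out.shape.resize(rank);
    for (std::size_t k = 0; k < rank; ++k) {
        out.shape[k] = in.shape[perm[k]];
    }
    out.data.resize(in.data.size());

    for (std::size_t o = 0; o < out.data.size(); ++o) {
        // Decode the output offset into a multi-index, then re-encode it
        // using the input strides of the permuted axes.
        std::size_t rem = o;
        std::size_t src = 0;
        for (std::size_t k = 0; k < rank; ++k) {
            src += (rem % out.shape[k]) * in_strides[perm[k]];
            rem /= out.shape[k];
        }
        out.data[o] = in.data[src];
    }
    return out;
}

// Fixes index `axis` at `value` and returns the remaining lower-rank tensor.
HostTensor SliceHost(const HostTensor &in, std::size_t axis, std::size_t value)
{
    assert(axis < in.shape.size() && value < in.shape[axis]);

    // Permutation that keeps the other indices in order and moves `axis` last.
    std::vector<std::size_t> perm;
    for (std::size_t i = 0; i < in.shape.size(); ++i) {
        if (i != axis) {
            perm.push_back(i);
        }
    }
    perm.push_back(axis);

    const HostTensor moved = PermuteHost(in, perm);

    // Each value of the last index now owns one contiguous block.
    const std::size_t block = moved.data.size() / moved.shape.back();
    const auto first = moved.data.begin() + static_cast<std::ptrdiff_t>(value * block);

    HostTensor out;
    out.shape.assign(moved.shape.begin(), moved.shape.end() - 1);
    out.data.assign(first, first + static_cast<std::ptrdiff_t>(block));
    return out;
}
```

For example, for a 2x3 tensor stored column-major as `{0, 10, 1, 11, 2, 12}`, `SliceHost(t, 0, 1)` yields a tensor of shape `{3}` holding `{10, 11, 12}`. On the device, the same offset arithmetic would apply to the permuted buffer, which is what avoids a round trip through host memory.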

Benefits: We can avoid intermediate GPU-CPU-GPU transfers to perform tensor slicing.

Possible Drawbacks: Debugging on-device code can be more challenging.

Related GitHub Issues:

github-actions · 2021-07-01T14:51:10Z

Test Report (C++) on Ubuntu

    1 files ±0     1 suites ±0 0s ⏱️ ±0s
521 tests ±0 521 ✔️ ±0 0 💤 ±0 0 ❌ ±0
870 runs ±0 870 ✔️ ±0 0 💤 ±0 0 ❌ ±0

Results for commit b44fdf3. ± Comparison against base commit b44fdf3.

♻️ This comment has been updated with latest results.

github-actions · 2021-07-01T14:51:49Z

Test Report (C++) on MacOS

    1 files ±0     1 suites ±0 0s ⏱️ ±0s
521 tests ±0 521 ✔️ ±0 0 💤 ±0 0 ❌ ±0
870 runs ±0 870 ✔️ ±0 0 💤 ±0 0 ❌ ±0

Results for commit 6b80040. ± Comparison against base commit 29805e4.

♻️ This comment has been updated with latest results.

Mandrenkov

Looks great! I especially like the implementation of SliceIndex() in terms of Transpose().

It's also a nice bonus that we don't need to change any of the tests!

.github/CHANGELOG.md

CMakeLists.txt

include/jet/TensorNetwork.hpp

include/jet/CudaTensor.hpp

github-actions · 2021-07-07T10:38:06Z

Test Report (Python) on Ubuntu

    1 files ±0     1 suites ±0 7s ⏱️ ±0s
490 tests ±0 490 ✔️ ±0 0 💤 ±0 0 ❌ ±0

Results for commit b44fdf3. ± Comparison against base commit b44fdf3.

♻️ This comment has been updated with latest results.

Mandrenkov

Looks good to me!

.github/CHANGELOG.md

Co-authored-by: Mikhail Andrenkov <Mandrenkov@users.noreply.github.com>

mlxd and others added 28 commits June 15, 2021 10:46

Add support to load datafile as column major for CUDA

52a2468

Ensure Tensor and CudaTensor supported for serializer

2bebe07

Fix conversion between Tensor and CudaTensor indices and sizes

3cf9050

Rename TBCC to TBC

7818dee

Additional renaming for TBCC to TBC

3204acb

Renaming tbcc to tbc and avoiding datatype ambiguity in class

3d8e24d

Refactor to generalise methods and interface between Tensor and CudaTensor

8bca4fa

Add tests for CudaTensor backed TN

34800f7

Intermediate slicing and reshaping support via Tensor for CudaTensor

15834ac

Add initial slice and reshape support for CudaTensor

59b305a

Allow CudaTensor network contractions

8c991b9

Merge branch 'main' into tbc_cuda

a0d831a

Add support for incremental CuTensor naming convention

2c12afe

Enable CudaTensor single GPU support for TBC

5e8cb46

Add CuTensor AddTensor support for CudaTensor class

3a1b9ad

Remove redundant calls for cutensorElementwiseBinary

dfb95cd

Extend TBC tests to CudaTensor

cffb452

Ensure testing compliance for CudaTensor with TBC

5d6a4a1

Add private equality op for CudaTensor

3915f38

Merge branch 'main' into tbc_cuda

f1ca92f

Add safety check for CUDA and fix GPU memory leak

5f6155a

Fix CudaTensor formatting

e45bcb6

Add compile-time removal of CUDA safety checks

fbab9f4

Add prelim support for removing CPU intermediate ops for CudaTensor

d81147d

Merge branch 'main' into cudatensor_nocpu

fd521b4

Fix pointer offsets in slicing

119c850

Add transpose-permute to CudaTensor

6cee0f2

Ensure CudaTensor ordering preserved after slicing

1427f5e

mlxd requested review from Mandrenkov and trevor-vincent July 1, 2021 14:46

Merge branch 'main' into cudatensor_nocpu

ef407ce

Merge branch 'main' into cudatensor_nocpu

6b80040

Mandrenkov reviewed Jul 5, 2021

mlxd and others added 10 commits July 7, 2021 10:59

Fix changelog typo

bdc34bf

Re-add missing CudaTensor safety check option in CMake

c8c15e7

Update abort macro usage in CudaTensor

08a2562

Update CudaTensor Permute and Slice comments

be796f5

Update the ptr offset value calculation for CudaTensor

4cbffc5

Favour insert and emplace over count, find and emplace

690029c

Insert and emplace replacement

f006961

Remove unneeded comment blocks

4154c86

Fix formatting

ef9a5d4

Merge branch 'main' into cudatensor_nocpu

4ca521e

Mandrenkov approved these changes Jul 7, 2021

Update .github/CHANGELOG.md

13f5255

Co-authored-by: Mikhail Andrenkov <Mandrenkov@users.noreply.github.com>

mlxd merged commit b44fdf3 into main Jul 7, 2021

mlxd deleted the cudatensor_nocpu branch July 7, 2021 15:15
