
[ITensorGPU][Bug] Restrict to CUDA.jl v4.0.1 to fix compatibility issues with cuTENSOR #1107

Merged

mtfishman merged 11 commits into ITensor:main from kmp5/bug/fix_jenkins on Apr 3, 2023


kmp5VT (Collaborator) commented Mar 31, 2023

Description

It looks like the updated version of cuTENSOR is not properly linking to its library binaries. To work around the issue, this PR uses older versions of CUDA.jl and cuTENSOR.

kmp5VT requested a review from mtfishman March 31, 2023 20:06
mtfishman (Member) commented Mar 31, 2023

I was wondering what was going on with the CI; glad you figured out the cause.

Instead of changing the compat entries in ITensorGPU/test/Project.toml, I think it makes more sense to change them in ITensorGPU/Project.toml. Otherwise we are just fixing the tests, but we're still allowing people to install a version of cuTENSOR that we know breaks ITensorGPU.

I think this should work:

cuTENSOR = "1.0.0 - 1.0.1"

to prevent upgrading to the problematic cuTENSOR v1.0.2 (the notation for compat entries is specified here: https://pkgdocs.julialang.org/v1/compatibility/).
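For illustration, here is a minimal sketch of where that entry would sit in ITensorGPU/Project.toml. It assumes the standard `[compat]` table layout and omits the other entries. Under the hyphen-specifier rules in the Pkg docs linked above, full versions on both ends give an inclusive range.

```toml
# ITensorGPU/Project.toml (sketch: only the relevant [compat] entry is shown)
[compat]
# Hyphen specifier with full versions on both ends: allows [1.0.0, 1.0.1],
# so the resolver cannot pick cuTENSOR v1.0.2.
cuTENSOR = "1.0.0 - 1.0.1"
```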

EDIT: The simpler fix implemented in this PR is restricting CUDA.jl compat to be below v4.1, i.e.:

CUDA = "4.0 - 4.0.1"
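Per the same compat rules, a partial lower bound is padded with zeros, so "4.0 - 4.0.1" resolves to [4.0.0, 4.0.1]. The sketch below shows the entry. It also includes a commented-out tilde specifier that expresses "below v4.1" directly. That alternative is an assumption for comparison only and is not what this PR uses.

```toml
# ITensorGPU/Project.toml (sketch: only the relevant [compat] entry is shown)
[compat]
# Hyphen specifier: allows [4.0.0, 4.0.1], excluding CUDA.jl v4.1.0
# (and any later 4.0.x patch release).
CUDA = "4.0 - 4.0.1"
# Alternative, not used in this PR: "~4.0" allows [4.0.0, 4.1.0),
# i.e. it would also accept any future 4.0.x patch release.
# CUDA = "~4.0"
```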

Also, I don't see any issues about this in CUDA.jl. Should we raise one over there?

codecov-commenter commented Mar 31, 2023

Codecov Report

Merging #1107 (eecb117) into main (3f2ab60) will decrease coverage by 31.58%.
The diff coverage is n/a.

❗ Current head eecb117 differs from pull request most recent head 8a9938d. Consider uploading reports for the commit 8a9938d to get more accurate results


@@             Coverage Diff             @@
##             main    #1107       +/-   ##
===========================================
- Coverage   85.11%   53.53%   -31.58%     
===========================================
  Files          86       85        -1     
  Lines        8305     7770      -535     
===========================================
- Hits         7069     4160     -2909     
- Misses       1236     3610     +2374     

see 65 files with indirect coverage changes


mtfishman changed the title [ITensorGPU][Bug] Fix Jenkins using older CUDA/cuTENSOR [ITensorGPU][Bug] Restrict to CUDA.jl v4.0.1 to fix compatibility issues with cuTENSOR Apr 3, 2023
mtfishman (Member)

@kshyatt FYI, it looks like CUDA.jl v4.1 breaks compatibility with cuTENSOR. I think the cause is that CUDA.jl v4.1 bumps to CUDA 12.1 [1], but CUDA 12.1 isn't supported by cuTENSOR [2].

[1] JuliaGPU/CUDA.jl#1793
[2] https://docs.nvidia.com/cuda/cutensor/index.html#support

mtfishman (Member)

This issue reported about CUDA.jl v4 and cuDNN looks like it may be related: JuliaGPU/CUDA.jl#1850

mtfishman merged commit ef779f4 into ITensor:main Apr 3, 2023
8 checks passed
kmp5VT deleted the kmp5/bug/fix_jenkins branch April 3, 2023 20:50

3 participants