cudnn missing after downloading artifact #521

Closed
denizyuret opened this issue Nov 3, 2020 · 3 comments

@denizyuret
Contributor

We got some new machines on the cluster, and when I try to build CUDA.jl on them it reports CUDNN and CUTENSOR as missing, even though they seem to have been downloaded. Any ideas what might be going wrong?

julia> CUDA.versioninfo()
CUDA.versioninfo()
Downloading artifact: CUDA110
Downloading artifact: CUDA110
Downloading artifact: CUDA102
Downloading artifact: CUDA102
Downloading artifact: CUDNN_CUDA101
Downloading artifact: CUDNN_CUDA101
Downloading artifact: CUTENSOR_CUDA101
Downloading artifact: CUTENSOR_CUDA101
CUDA toolkit 10.1.243, artifact installation
CUDA driver 11.0.0
NVIDIA driver 450.80.2

Libraries:
- CUBLAS: 10.2.0
- CURAND: 10.1.1
- CUFFT: 10.1.1
- CUSOLVER: 10.2.0
- CUSPARSE: 10.3.0
- CUPTI: 12.0.0
- NVML: 11.0.0+450.80.2
- CUDNN: missing
- CUTENSOR: missing

Toolchain:
- Julia: 1.5.2
- LLVM: 9.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

Environment:
- JULIA_CUDA_USE_BINARYBUILDER: true

1 device:
  0: Tesla V100-SXM2-32GB (sm_70, 31.385 GiB / 31.749 GiB available)

@maleadt
Member

maleadt commented Nov 3, 2020

Which exact version of CUDA.jl?

@maleadt
Member

maleadt commented Nov 3, 2020

Also, something is wrong here, as it's downloading all artifacts:

Downloading artifact: CUDA110
Downloading artifact: CUDA110
Downloading artifact: CUDA102
Downloading artifact: CUDA102

My guess is that those downloads failed and it fell back to the 10.1 installation, which I presume you had downloaded earlier. The CUDNN and CUTENSOR artifact downloads then failed as well.

Try running with JULIA_DEBUG=CUDA; it should print details about those failures:

@debug "Could not load the CUDA $(cuda.release) artifact" exception=(ex,catch_backtrace())
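
To illustrate why such a failure stays invisible by default, here is a minimal, self-contained sketch. It assumes nothing about CUDA.jl's internals: `ArtifactDemo` and `try_load` are hypothetical stand-ins that only mimic the pattern of the `@debug` line above, and the download failure is simulated. The point is that a debug-level message is dropped by an Info-level logger unless `JULIA_DEBUG` names the module that emits it.

```julia
using Logging

# Hypothetical stand-in module; this is NOT CUDA.jl's actual loader code.
module ArtifactDemo
# Simulates a loader that swallows a failed download and reports it only
# at debug level, like the @debug line quoted above.
function try_load(release)
    try
        error("simulated download failure")
    catch ex
        @debug "Could not load the CUDA $(release) artifact" exception=(ex, catch_backtrace())
        return nothing
    end
end
end # module

# Run the loader under an Info-level logger and return whatever it printed.
function captured_output()
    io = IOBuffer()
    with_logger(ConsoleLogger(io, Logging.Info)) do
        ArtifactDemo.try_load("11.0")
    end
    return String(take!(io))
end

delete!(ENV, "JULIA_DEBUG")
quiet = captured_output()            # debug message is filtered out

ENV["JULIA_DEBUG"] = "ArtifactDemo"  # same mechanism as JULIA_DEBUG=CUDA
verbose = captured_output()          # message and exception are now logged

println("without JULIA_DEBUG: ", isempty(quiet) ? "(nothing logged)" : quiet)
println("with JULIA_DEBUG:\n", verbose)
```

For the real package, the equivalent is to set `JULIA_DEBUG=CUDA` in the environment before starting Julia and then call `CUDA.versioninfo()` again; the logged exception should show why each artifact could not be loaded.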

@denizyuret
Contributor Author

You are right, the download was failing because of a firewall on the cluster workers. When I tried a build on the host node everything worked fine. Thank you.
