import CuArrays always fails with CUDA 10.2.89 (but works fine with CUDA 10.0.130 and 10.1.105)
#601
Comments
cc: @maleadt
The relevant code is here: Lines 69 to 98 in 6feb28f
There isn't any mention of searching for libcublas in the debug output, which is strange given how the dlopen subsequently fails. The initial value for … The difference between CUDA versions is probably caused by …
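For context, the discovery logic being discussed can be summarized as follows. This is a minimal sketch, not the actual CuArrays code; the function name and version handling are illustrative. The key point is that `Libdl.dlopen_e` does not throw on failure, so an unfound library silently yields `nothing`:

```julia
import Libdl

# Sketch of the discovery fallback (illustrative, not the real CuArrays code):
# try versioned sonames first, then the bare name. dlopen_e returns C_NULL
# instead of throwing, so a failed lookup yields `nothing`.
function find_first_loadable(base, versions)
    candidates = ["$base.so.$v" for v in versions]
    push!(candidates, base)
    for name in candidates
        Libdl.dlopen_e(name) == C_NULL || return name
    end
    return nothing
end
```

With the relocated libraries absent from the loader's search path, every candidate fails and the lookup returns `nothing`, consistent with the behavior reported below.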
With CUDA 10.0.130:
julia> import Libdl
julia> Libdl.dlopen("libcublas")
Ptr{Nothing} @0x00000000013c7360
julia> Libdl.dlopen_e("libcublas")
Ptr{Nothing} @0x00000000013c7360
With CUDA 10.1.105:
julia> import Libdl
julia> Libdl.dlopen("libcublas")
Ptr{Nothing} @0x00000000012f0300
julia> Libdl.dlopen_e("libcublas")
Ptr{Nothing} @0x00000000012f0300
With CUDA 10.2.89:
julia> import Libdl
julia> Libdl.dlopen("libcublas")
ERROR: could not load library "libcublas"
libcublas.so: cannot open shared object file: No such file or directory
Stacktrace:
[1] dlopen(::String, ::UInt32; throw_error::Bool) at /gpfs_home/daluthge/dev/JuliaLang/julia/usr/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109
[2] dlopen at /gpfs_home/daluthge/dev/JuliaLang/julia/usr/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109 [inlined] (repeats 2 times)
[3] top-level scope at REPL[2]:1
julia> Libdl.dlopen_e("libcublas")
Ptr{Nothing} @0x0000000000000000
julia> Libdl.dlopen_e("libcublas") == C_NULL
true
julia> Libdl.dlopen_e("libcublas") === C_NULL
true
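For reference, the distinction the transcript relies on: `Libdl.dlopen` throws on a missing library, while `Libdl.dlopen_e` returns a null pointer. A generic illustration with a made-up library name:

```julia
import Libdl

# dlopen_e is the non-throwing variant: a missing library yields C_NULL
# instead of an ERROR like the one shown above.
h = Libdl.dlopen_e("libdoesnotexist_xyz")  # made-up name, will not be found
h == C_NULL  # true
```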
Confirms my suspicion, but doesn't explain the lack of debug info. Could you run the following with JULIA_DEBUG=CUDAapi:
julia> ENV["JULIA_DEBUG"] = "CUDAapi"
"CUDAapi"
julia> import CUDAapi
julia> import CUDAnative
┌ Warning: Incompatibility detected between CUDA and LLVM 8.0+; disabling debug info emission for CUDA kernels
└ @ CUDAnative ~/.julia/packages/CUDAnative/hfulr/src/CUDAnative.jl:114
┌ Debug: Looking for CUDA toolkit via environment variables CUDA_HOME
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/CUDAapi.jl:8
┌ Debug: Request to look for binary nvdisasm
│ locations =
│ 1-element Array{String,1}:
│ "/gpfs/runtime/opt/cuda/10.2/cuda"
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/CUDAapi.jl:8
┌ Debug: Looking for binary nvdisasm
│ locations =
│ 20-element Array{String,1}:
│ "/gpfs/runtime/opt/cuda/10.2/cuda"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/bin"
│ "/gpfs/runtime/opt/gcc/8.3/bin"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/bin"
│ "/users/daluthge/bin"
│ "/gpfs/runtime/opt/python/3.7.4/bin"
│ "/gpfs/runtime/opt/git/2.20.2/bin"
│ "/gpfs/runtime/opt/binutils/2.31/bin"
│ ⋮
│ "/usr/bin"
│ "/usr/local/sbin"
│ "/usr/sbin"
│ "/usr/lpp/mmfs/bin"
│ "/usr/lpp/mmfs/sbin"
│ "/opt/ibutils/bin"
│ "/gpfs/runtime/bin"
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/CUDAapi.jl:8
┌ Debug: Found binary nvdisasm at /gpfs/runtime/opt/cuda/10.2/cuda/bin
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/discovery.jl:141
┌ Debug: CUDA toolkit identified as 10.2.89
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/discovery.jl:297
┌ Debug: Request to look for libdevice
│ locations =
│ 1-element Array{String,1}:
│ "/gpfs/runtime/opt/cuda/10.2/cuda"
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/CUDAapi.jl:8
┌ Debug: Look for libdevice
│ locations =
│ 2-element Array{String,1}:
│ "/gpfs/runtime/opt/cuda/10.2/cuda"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/nvvm/libdevice"
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/CUDAapi.jl:8
┌ Debug: Found unified device library at /gpfs/runtime/opt/cuda/10.2/cuda/nvvm/libdevice/libdevice.10.bc
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/discovery.jl:327
┌ Debug: Request to look for libcudadevrt
│ locations =
│ 1-element Array{String,1}:
│ "/gpfs/runtime/opt/cuda/10.2/cuda"
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/CUDAapi.jl:8
┌ Debug: Looking for CUDA device runtime library libcudadevrt.a
│ locations =
│ 3-element Array{String,1}:
│ "/gpfs/runtime/opt/cuda/10.2/cuda"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/lib"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/lib64"
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/CUDAapi.jl:8
┌ Debug: Found CUDA device runtime library libcudadevrt.a at /gpfs/runtime/opt/cuda/10.2/cuda/lib64
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/discovery.jl:379
┌ Debug: Request to look for library nvToolsExt
│ locations =
│ 1-element Array{String,1}:
│ "/gpfs/runtime/opt/cuda/10.2/cuda"
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/CUDAapi.jl:8
┌ Debug: Looking for library libnvToolsExt.so, libnvToolsExt.so.1, libnvToolsExt.so.1.0
│ locations =
│ 4-element Array{String,1}:
│ "/gpfs/runtime/opt/cuda/10.2/cuda"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/lib"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/lib64"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/libx64"
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/CUDAapi.jl:8
┌ Debug: Found library libnvToolsExt.so at /gpfs/runtime/opt/cuda/10.2/cuda/lib64
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/discovery.jl:90
┌ Debug: Request to look for library cupti
│ locations =
│ 2-element Array{String,1}:
│ "/gpfs/runtime/opt/cuda/10.2/cuda"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/extras/CUPTI"
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/CUDAapi.jl:8
┌ Debug: Looking for library libcupti.so, libcupti.so.10, libcupti.so.10.2
│ locations =
│ 8-element Array{String,1}:
│ "/gpfs/runtime/opt/cuda/10.2/cuda"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/lib"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/lib64"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/libx64"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/extras/CUPTI"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/extras/CUPTI/lib"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/extras/CUPTI/lib64"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/extras/CUPTI/libx64"
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/CUDAapi.jl:8
┌ Debug: Found library libcupti.so at /gpfs/runtime/opt/cuda/10.2/cuda/extras/CUPTI/lib64
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/discovery.jl:90
julia> CUDAnative.prefix()
2-element Array{String,1}:
"/gpfs/runtime/opt/cuda/10.2/cuda"
"/gpfs/runtime/opt/cuda/10.2/cuda/extras/CUPTI"
julia> CUDAnative.version()
v"10.2.89"
julia> CUDAapi.find_cuda_library("cublas", CUDAnative.prefix(), [CUDAnative.version()])
┌ Debug: Request to look for library cublas
│ locations =
│ 2-element Array{String,1}:
│ "/gpfs/runtime/opt/cuda/10.2/cuda"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/extras/CUPTI"
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/CUDAapi.jl:8
┌ Debug: Looking for library libcublas.so, libcublas.so.10, libcublas.so.10.2
│ locations =
│ 8-element Array{String,1}:
│ "/gpfs/runtime/opt/cuda/10.2/cuda"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/lib"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/lib64"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/libx64"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/extras/CUPTI"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/extras/CUPTI/lib"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/extras/CUPTI/lib64"
│ "/gpfs/runtime/opt/cuda/10.2/cuda/extras/CUPTI/libx64"
└ @ CUDAapi ~/.julia/packages/CUDAapi/wYUAO/src/CUDAapi.jl:8
So it returns …
Yeah, it returns …
$ find /gpfs/runtime/opt/cuda/10.2/ -name "*blas*"
/gpfs/runtime/opt/cuda/10.2/cuda/doc/man/man7/libcublas.7
/gpfs/runtime/opt/cuda/10.2/cuda/doc/man/man7/libcublas.so.7
/gpfs/runtime/opt/cuda/10.2/cuda/doc/html/cublas
/gpfs/runtime/opt/cuda/10.2/cuda/doc/html/cublas/graphics/cublasmg_gemm.jpg
/gpfs/runtime/opt/cuda/10.2/cuda/doc/html/nvblas
/gpfs/runtime/opt/cuda/10.2/cuda/targets/x86_64-linux/include/cublas_api.h
/gpfs/runtime/opt/cuda/10.2/cuda/targets/x86_64-linux/include/nvblas.h
/gpfs/runtime/opt/cuda/10.2/cuda/targets/x86_64-linux/include/cublas.h
/gpfs/runtime/opt/cuda/10.2/cuda/targets/x86_64-linux/include/cublas_v2.h
/gpfs/runtime/opt/cuda/10.2/cuda/targets/x86_64-linux/include/cublasLt.h
/gpfs/runtime/opt/cuda/10.2/cuda/targets/x86_64-linux/include/cublasXt.h
/gpfs/runtime/opt/cuda/10.2/src/include/cublas_api.h
/gpfs/runtime/opt/cuda/10.2/src/include/nvblas.h
/gpfs/runtime/opt/cuda/10.2/src/include/cublas.h
/gpfs/runtime/opt/cuda/10.2/src/include/cublas_v2.h
/gpfs/runtime/opt/cuda/10.2/src/include/cublasLt.h
/gpfs/runtime/opt/cuda/10.2/src/include/cublasXt.h
/gpfs/runtime/opt/cuda/10.2/src/lib64/libcublasLt.so
/gpfs/runtime/opt/cuda/10.2/src/lib64/libcublas.so.10.2.2.89
/gpfs/runtime/opt/cuda/10.2/src/lib64/libnvblas.so.10.2.2.89
/gpfs/runtime/opt/cuda/10.2/src/lib64/libcublasLt.so.10
/gpfs/runtime/opt/cuda/10.2/src/lib64/libcublas_static.a
/gpfs/runtime/opt/cuda/10.2/src/lib64/libcublas.so.10
/gpfs/runtime/opt/cuda/10.2/src/lib64/libcublasLt_static.a
/gpfs/runtime/opt/cuda/10.2/src/lib64/stubs/libcublasLt.so
/gpfs/runtime/opt/cuda/10.2/src/lib64/stubs/libcublas.so
/gpfs/runtime/opt/cuda/10.2/src/lib64/libnvblas.so.10
/gpfs/runtime/opt/cuda/10.2/src/lib64/libcublasLt.so.10.2.2.89
/gpfs/runtime/opt/cuda/10.2/src/lib64/libnvblas.so
/gpfs/runtime/opt/cuda/10.2/src/lib64/libcublas.so
$ find /gpfs/runtime/opt/cuda/10.1.105/ -name "*blas*"
/gpfs/runtime/opt/cuda/10.1.105/cuda/doc/man/man7/libcublas.7
/gpfs/runtime/opt/cuda/10.1.105/cuda/doc/man/man7/libcublas.so.7
/gpfs/runtime/opt/cuda/10.1.105/cuda/doc/html/cublas
/gpfs/runtime/opt/cuda/10.1.105/cuda/doc/html/cublas/graphics/cublasmg_gemm.jpg
/gpfs/runtime/opt/cuda/10.1.105/cuda/doc/html/nvblas
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/include/cublas_api.h
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/include/nvblas.h
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/include/cublas.h
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/include/cublas_v2.h
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/include/cublasLt.h
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/include/cublasXt.h
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/lib/libcublasLt.so
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/lib/libnvblas.so.10.1.0.105
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/lib/libcublasLt.so.10.1.0.105
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/lib/libcublasLt.so.10
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/lib/libcublas_static.a
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/lib/libcublas.so.10
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/lib/libcublasLt_static.a
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/lib/stubs/libcublasLt.so
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/lib/stubs/libcublas.so
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/lib/libnvblas.so.10
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/lib/libcublas.so.10.1.0.105
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/lib/libnvblas.so
/gpfs/runtime/opt/cuda/10.1.105/cuda/targets/x86_64-linux/lib/libcublas.so
$ find /gpfs/runtime/opt/cuda/10.0.130/ -name "*blas*"
/gpfs/runtime/opt/cuda/10.0.130/cuda/include/cublas_api.h
/gpfs/runtime/opt/cuda/10.0.130/cuda/include/nvblas.h
/gpfs/runtime/opt/cuda/10.0.130/cuda/include/cublas.h
/gpfs/runtime/opt/cuda/10.0.130/cuda/include/cublas_v2.h
/gpfs/runtime/opt/cuda/10.0.130/cuda/include/cublasXt.h
/gpfs/runtime/opt/cuda/10.0.130/cuda/lib64/libnvblas.so.10.0
/gpfs/runtime/opt/cuda/10.0.130/cuda/lib64/libcublas_static.a
/gpfs/runtime/opt/cuda/10.0.130/cuda/lib64/stubs/libcublas.so
/gpfs/runtime/opt/cuda/10.0.130/cuda/lib64/libnvblas.so
/gpfs/runtime/opt/cuda/10.0.130/cuda/lib64/libnvblas.so.10.0.130
/gpfs/runtime/opt/cuda/10.0.130/cuda/lib64/libcublas.so.10.0.130
/gpfs/runtime/opt/cuda/10.0.130/cuda/lib64/libcublas.so.10.0
/gpfs/runtime/opt/cuda/10.0.130/cuda/lib64/libcublas.so
/gpfs/runtime/opt/cuda/10.0.130/cuda/doc/man/man7/libcublas.7
/gpfs/runtime/opt/cuda/10.0.130/cuda/doc/man/man7/libcublas.so.7
/gpfs/runtime/opt/cuda/10.0.130/cuda/doc/html/cublas
/gpfs/runtime/opt/cuda/10.0.130/cuda/doc/html/cublas/graphics/cublasmg_gemm.jpg
/gpfs/runtime/opt/cuda/10.0.130/cuda/doc/html/nvblas
/gpfs/runtime/opt/cuda/10.0.130/cuda/pkgconfig/cublas-10.0.pc
Why is it in the …
OK, this is your cluster being set up all weird.
You have to load the different environments with "modules". So e.g. when I run … So presumably the cluster admins need to do something similar for their …
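One possible shape of such a fix, purely illustrative: the real fix is for the admins to adjust their module files, but the net effect would be to expose the relocated libraries under a directory the loader actually searches. This sketch simulates that with temporary scratch directories rather than the real cluster paths:

```shell
# Illustrative only: on the real cluster, src would be .../10.2/src/lib64
# and dst would be .../10.2/cuda/lib64 (the directory CUDAapi searches).
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/libcublas.so.10.2.2.89"

# Link the versioned soname into the searched directory.
ln -s "$src/libcublas.so.10.2.2.89" "$dst/libcublas.so.10"
ls -l "$dst/libcublas.so.10"
```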
While I wait for them, is there any way I can point Julia directly to the location of …?
Define LD_LIBRARY_PATH yourself? CUDAapi doesn't support per-library overrides, but picks up whatever is loadable out of the box. Alternatively, if that src directory contains the entire toolkit, you can set CUDA_ROOT to force the prefix to that directory.
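Concretely, the two suggested workarounds would look something like this. The paths are taken from the find output above; whether CUDA_ROOT helps depends on that directory actually containing a full toolkit:

```shell
# Workaround 1: put the relocated lib64 directory on the loader's search path.
export LD_LIBRARY_PATH="/gpfs/runtime/opt/cuda/10.2/src/lib64:${LD_LIBRARY_PATH:-}"

# Workaround 2: force CUDAapi's toolkit prefix. Only useful if this
# directory holds a complete toolkit, not just the relocated libraries.
export CUDA_ROOT="/gpfs/runtime/opt/cuda/10.2/src"
```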
Unfortunately it does not. It only contains the … All the other libraries (e.g. …) … Seems like maybe a botched install of CUDA by the cluster admins? It's unclear to me why you would move those specific libraries to a separate location.
I'll probably just use their CUDA 10.0 or 10.1 install (since those seem to be installed correctly) until they fix it.
Summary
I am unable to run
import CuArrays
with CUDA 10.2.89. However, I am able to successfully run
import CuArrays
with either CUDA 10.0.130 or CUDA 10.1.105 on the same cluster. (This is an HPC cluster with multiple different versions of CUDA available.) The error I get looks like this:
How to reproduce
First run these commands in Bash:
Then open Julia and run the following:
Full output
CUDA 10.2.89: (fails)
Click to expand
CUDA 10.1.105: (works fine)
Click to expand
CUDA 10.0.130: (works fine)
Click to expand