CUBLAS fails on new CUDA.jl v4 #1852

Closed
marmarelis opened this issue Apr 3, 2023 · 10 comments

Labels
bug (Something isn't working) · needs information (Further information is requested)
Comments

marmarelis commented Apr 3, 2023

Describe the bug

When any CUBLAS operation is invoked, it crashes. Non-CUBLAS operations seem to work fine. I've tried to use older CUDA runtimes to no avail.

To reproduce

The Minimal Working Example (MWE) for this bug:

using CUDA
a = randn(100, 100) |> cu
a * a
ERROR: CUBLASError: the GPU program failed to execute (code 13, CUBLAS_STATUS_EXECUTION_FAILED)
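
(For contrast, a minimal sketch of CUBLAS-free operations that exercise the "non-CUBLAS operations work fine" claim; broadcasts and reductions are lowered to CUDA.jl's own native kernels rather than calling into CUBLAS:)

using CUDA
a = cu(randn(100, 100))
a .+ a   # broadcast: compiled to a native CUDA kernel, no CUBLAS involved
sum(a)   # reduction: likewise handled by CUDA.jl's own kernels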

Version info

CUDA.jl 4.1.3

Details on Julia:

Julia Version 1.8.0
Commit 5544a0fab76 (2022-08-17 13:38 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 72 × Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, cascadelake)
  Threads: 1 on 72 virtual cores

Details on CUDA:

CUDA runtime 11.8, artifact installation
CUDA driver 11.4
NVIDIA driver 470.57.2

Libraries: 
- CUBLAS: 11.5.1
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 11.0.0+470.57.2

Toolchain:
- Julia: 1.8.0
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

4 devices:
  0: NVIDIA GeForce RTX 2080 Ti (sm_75, 10.757 GiB / 10.761 GiB available)
  1: NVIDIA GeForce RTX 2080 Ti (sm_75, 10.757 GiB / 10.761 GiB available)
  2: NVIDIA GeForce RTX 2080 Ti (sm_75, 10.757 GiB / 10.761 GiB available)
  3: NVIDIA GeForce RTX 2080 Ti (sm_75, 10.757 GiB / 10.761 GiB available)
marmarelis added the bug (Something isn't working) label Apr 3, 2023
maleadt added the needs information (Further information is requested) label Apr 4, 2023
maleadt (Member) commented Apr 4, 2023

Please run with JULIA_DEBUG=CUBLAS and post the output.
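
For example, from a shell (a minimal sketch; any equivalent way of setting the environment variable works):

$ JULIA_DEBUG=CUBLAS julia
julia> using CUDA
julia> a = cu(randn(100, 100));
julia> a * a   # each cuBLAS API call is now traced in the debug log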

marmarelis (Author) commented Apr 5, 2023

Here is the error output:

┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasGetProperty(libraryPropertyType, int*) called:
│   type: type=SOME TYPE; val=0
│   value: type=int; val=POINTER (IN HEX:0x0x7f5075c79f90)
│  Time: 2023-04-05T09:44:37 elapsed from start 0.183333 minutes or 11.000000 seconds
│ Process=20286; Thread=139997437605696; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/CUBLAS.jl:224
┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasGetProperty(libraryPropertyType, int*) called:
│   type: type=SOME TYPE; val=1
│   value: type=int; val=POINTER (IN HEX:0x0x7f5075c79fa0)
│  Time: 2023-04-05T09:44:37 elapsed from start 0.183333 minutes or 11.000000 seconds
│ Process=20286; Thread=139997437605696; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/CUBLAS.jl:224
┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasGetProperty(libraryPropertyType, int*) called:
│   type: type=SOME TYPE; val=2
│   value: type=int; val=POINTER (IN HEX:0x0x7f5075c79fb0)
│  Time: 2023-04-05T09:44:37 elapsed from start 0.183333 minutes or 11.000000 seconds
│ Process=20286; Thread=139997437605696; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/CUBLAS.jl:224
┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasCreate_v2(cublasContext**) called:
│   handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x7ffff6596530)
│  Time: 2023-04-05T09:44:38 elapsed from start 0.200000 minutes or 12.000000 seconds
│ Process=20286; Thread=139997437605696; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/CUBLAS.jl:224
┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasSetStream_v2(cublasHandle_t, cudaStream_t) called:
│   handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0xa2cf1a0)
│   streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0x1b49680)
│  Time: 2023-04-05T09:44:42 elapsed from start 0.266667 minutes or 16.000000 seconds
│ Process=20286; Thread=139997437605696; GPU=0; Handle=POINTER (IN HEX:0x0xa2cf1a0); StreamId=POINTER (IN HEX:0x(nil)) (defaultStream); MathMode=CUBLAS_DEFAULT_MATH
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/CUBLAS.jl:224
┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasGetProperty(libraryPropertyType, int*) called:
│   type: type=SOME TYPE; val=0
│   value: type=int; val=POINTER (IN HEX:0x0x7f5074134390)
│  Time: 2023-04-05T09:44:42 elapsed from start 0.266667 minutes or 16.000000 seconds
│ Process=20286; Thread=139997437605696; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/CUBLAS.jl:224
┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasGetProperty(libraryPropertyType, int*) called:
│   type: type=SOME TYPE; val=1
│   value: type=int; val=POINTER (IN HEX:0x0x7f50741343a0)
│  Time: 2023-04-05T09:44:42 elapsed from start 0.266667 minutes or 16.000000 seconds
│ Process=20286; Thread=139997437605696; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/CUBLAS.jl:224
┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasGetProperty(libraryPropertyType, int*) called:
│   type: type=SOME TYPE; val=2
│   value: type=int; val=POINTER (IN HEX:0x0x7f50741343b0)
│  Time: 2023-04-05T09:44:42 elapsed from start 0.266667 minutes or 16.000000 seconds
│ Process=20286; Thread=139997437605696; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/CUBLAS.jl:224
┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasGetProperty(libraryPropertyType, int*) called:
│   type: type=SOME TYPE; val=0
│   value: type=int; val=POINTER (IN HEX:0x0x7f50741343c0)
│  Time: 2023-04-05T09:44:42 elapsed from start 0.266667 minutes or 16.000000 seconds
│ Process=20286; Thread=139997437605696; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/CUBLAS.jl:224
┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasGetProperty(libraryPropertyType, int*) called:
│   type: type=SOME TYPE; val=1
│   value: type=int; val=POINTER (IN HEX:0x0x7f50741343d0)
│  Time: 2023-04-05T09:44:42 elapsed from start 0.266667 minutes or 16.000000 seconds
│ Process=20286; Thread=139997437605696; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/CUBLAS.jl:224
┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasGetProperty(libraryPropertyType, int*) called:
│   type: type=SOME TYPE; val=2
│   value: type=int; val=POINTER (IN HEX:0x0x7f50741343e0)
│  Time: 2023-04-05T09:44:42 elapsed from start 0.266667 minutes or 16.000000 seconds
│ Process=20286; Thread=139997437605696; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/CUBLAS.jl:224
┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasSetMathMode(cublasHandle_t, cublasMath_t) called:
│   handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0xa2cf1a0)
│   mode: type=cublasMath_t; val=CUBLAS_DEFAULT_MATH | CUBLAS_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION(16)
│  Time: 2023-04-05T09:44:42 elapsed from start 0.266667 minutes or 16.000000 seconds
│ Process=20286; Thread=139997437605696; GPU=0; Handle=POINTER (IN HEX:0x0xa2cf1a0); StreamId=POINTER (IN HEX:0x0x1b49680); MathMode=CUBLAS_DEFAULT_MATH
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/CUBLAS.jl:224
┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasGemmEx(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const void*, const void*, cudaDataType_t, int, const void*, cudaDataType_t, int, const void*, void*, cudaDataType_t, int, cublasComputeType_t, cublasGemmAlgo_t) called:
│   handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0xa2cf1a0)
│   transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│   transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│   m: type=int; val=100
│   n: type=int; val=100
│   k: type=int; val=100
│   alpha: type=void; val=POINTER (IN HEX:0x0x7f5074134410)
│   A: type=void; val=POINTER (IN HEX:0x0x602000000)
│   Atype: type=cudaDataType_t; val=CUDA_R_32F(0)
│   lda: type=int; val=100
│   B: type=void; val=POINTER (IN HEX:0x0x602000000)
│   Btype: type=cudaDataType_t; val=CUDA_R_32F(0)
│   ldb: type=int; val=100
│   beta: type=void; val=POINTER (IN HEX:0x0x7f5074134420)
│   C: type=void; val=POINTER (IN HEX:0x0x602009e00)
│   Ctype: type=cudaDataType_t; val=CUDA_R_32F(0)
│   ldc: type=int; val=100
│   computeType: type=cublasComputeType_t; val=CUBLAS_COMPUTE_32F(68)
│   algo: type=SOME TYPE; val=CUBLAS_GEMM_DEFAULT(-1)
│  Time: 2023-04-05T09:44:42 elapsed from start 0.266667 minutes or 16.000000 seconds
│ Process=20286; Thread=139997437605696; GPU=0; Handle=POINTER (IN HEX:0x0xa2cf1a0); StreamId=POINTER (IN HEX:0x0x1b49680); MathMode=CUBLAS_DEFAULT_MATH | CUBLAS_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/CUBLAS.jl:224

CUBLASError: the GPU program failed to execute (code 13, CUBLAS_STATUS_EXECUTION_FAILED)
Stacktrace:
  [1] throw_api_error(res::CUDA.CUBLAS.cublasStatus_t)
    @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/libcublas.jl:11
  [2] macro expansion
    @ ~/.julia/packages/CUDA/is36v/lib/cublas/libcublas.jl:24 [inlined]
  [3] cublasGemmEx(handle::Ptr{CUDA.CUBLAS.cublasContext}, transa::Char, transb::Char, m::Int64, n::Int64, k::Int64, alpha::Base.RefValue{Float32}, A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Atype::Type, lda::Int64, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Btype::Type, ldb::Int64, beta::Base.RefValue{Float32}, C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Ctype::Type, ldc::Int64, computeType::CUDA.CUBLAS.cublasComputeType_t, algo::CUDA.CUBLAS.cublasGemmAlgo_t)
    @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/utils/call.jl:26
  [4] gemmEx!(transA::Char, transB::Char, alpha::Number, A::StridedCuVecOrMat, B::StridedCuVecOrMat, beta::Number, C::StridedCuVecOrMat; algo::CUDA.CUBLAS.cublasGemmAlgo_t)
    @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/wrappers.jl:897
  [5] gemmEx!
    @ ~/.julia/packages/CUDA/is36v/lib/cublas/wrappers.jl:875 [inlined]
  [6] gemm_dispatch!(C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, alpha::Bool, beta::Bool)
    @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/linalg.jl:298
  [7] mul!
    @ ~/.julia/packages/CUDA/is36v/lib/cublas/linalg.jl:309 [inlined]
  [8] mul!
    @ ~/julia-1.8.0/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:276 [inlined]
  [9] *(A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
    @ LinearAlgebra ~/julia-1.8.0/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:148
 [10] top-level scope
    @ REPL[3]:1
 [11] top-level scope
    @ ~/.julia/packages/CUDA/is36v/src/initialization.jl:162

maleadt (Member) commented Apr 7, 2023

┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasGemmEx(cublasHandle_t, cublasOperation_t, cublasOperation_t, int, int, int, const void*, const void*, cudaDataType_t, int, const void*, cudaDataType_t, int, const void*, void*, cudaDataType_t, int, cublasComputeType_t, cublasGemmAlgo_t) called:
│   handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0xa2cf1a0)
│   transa: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│   transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
│   m: type=int; val=100
│   n: type=int; val=100
│   k: type=int; val=100
│   alpha: type=void; val=POINTER (IN HEX:0x0x7f5074134410)
│   A: type=void; val=POINTER (IN HEX:0x0x602000000)
│   Atype: type=cudaDataType_t; val=CUDA_R_32F(0)
│   lda: type=int; val=100
│   B: type=void; val=POINTER (IN HEX:0x0x602000000)
│   Btype: type=cudaDataType_t; val=CUDA_R_32F(0)
│   ldb: type=int; val=100
│   beta: type=void; val=POINTER (IN HEX:0x0x7f5074134420)
│   C: type=void; val=POINTER (IN HEX:0x0x602009e00)
│   Ctype: type=cudaDataType_t; val=CUDA_R_32F(0)
│   ldc: type=int; val=100
│   computeType: type=cublasComputeType_t; val=CUBLAS_COMPUTE_32F(68)
│   algo: type=SOME TYPE; val=CUBLAS_GEMM_DEFAULT(-1)
│  Time: 2023-04-05T09:44:42 elapsed from start 0.266667 minutes or 16.000000 seconds
│ Process=20286; Thread=139997437605696; GPU=0; Handle=POINTER (IN HEX:0x0xa2cf1a0); StreamId=POINTER (IN HEX:0x0x1b49680); MathMode=CUBLAS_DEFAULT_MATH | CUBLAS_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/is36v/lib/cublas/CUBLAS.jl:224

Nothing looks wrong with that. Maybe try upgrading your NVIDIA driver; 11.4 is pretty old.
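
(For reference, one way to check the installed driver, and the newest CUDA version it supports, is nvidia-smi; a sketch using its standard query flags, with illustrative output:)

$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
470.57.02
$ nvidia-smi | head -n 3   # the banner also reports "CUDA Version: ...", the newest runtime the driver supports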

bjarthur (Contributor) commented:

I cannot reproduce this with:

julia> CUDA.versioninfo()
CUDA runtime 11.8, artifact installation
CUDA driver 12.1
NVIDIA driver 530.30.2

Libraries: 
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 12.0.0+530.30.2

Toolchain:
- Julia: 1.8.5
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

2 devices:
  0: NVIDIA TITAN RTX (sm_75, 23.246 GiB / 24.000 GiB available)
  1: NVIDIA TITAN RTX (sm_75, 23.636 GiB / 24.000 GiB available)

oscarvdvelde commented Jun 5, 2023

Not sure if this is useful for you; I came to this thread because my CUDA.jl test suite fails 22 times on CUBLAS, e.g.
Test Failed at C:\Users\oscar\.julia\packages\CUDA\pCcGc\test\cublas.jl:1649 Expression: ≈(C, Array(dC), rtol = rtol)
Maybe I need a clean install, because it could not update certain packages.

However, when I run the a * a test provided here, it succeeds.
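
(For what it's worth, a minimal sketch of re-testing CUDA.jl from a throwaway environment, as an alternative to a full clean install:)

julia> using Pkg
julia> Pkg.activate(; temp=true)   # fresh, disposable environment
julia> Pkg.add("CUDA")
julia> Pkg.test("CUDA")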

┌ Info: System information:
│ CUDA runtime 11.8, artifact installation
│ CUDA driver 11.4
│ NVIDIA driver 471.86.0
│ 
│ CUDA libraries:
│ - CUBLAS: 11.11.3
│ - CURAND: 10.3.0
│ - CUFFT: 10.9.0
│ - CUSOLVER: 11.4.1
│ - CUSPARSE: 11.7.5
│ - CUPTI: 18.0.0
│ - NVML: 11.0.0+471.86
│ 
│ Julia packages:
│ - CUDA.jl: 4.3.2
│ - CUDA_Driver_jll: 0.5.0+1
│ - CUDA_Runtime_jll: 0.6.0+0
│ - CUDA_Runtime_Discovery: 0.2.2
│ 
│ Toolchain:
│ - Julia: 1.9.0
│ - LLVM: 14.0.6
│ - PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4
│ - Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
│ 
│ 1 device:
└   0: NVIDIA RTX A3000 Laptop GPU (sm_86, 5.875 GiB / 6.000 GiB available)

maleadt (Member) commented Jun 5, 2023

Can you verify you don't have LD_LIBRARY_PATH set?

oscarvdvelde commented Jun 5, 2023

> Can you verify you don't have LD_LIBRARY_PATH set?

julia> ENV["LD_LIBRARY_PATH"]
ERROR: KeyError: key "LD_LIBRARY_PATH" not found

maleadt (Member) commented Jun 5, 2023

Can you try upgrading your driver so that CUDA 12.1 can be installed? Maybe this is a CUBLAS bug that got fixed.
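
(After a driver upgrade, CUDA.jl normally selects the newest compatible runtime on its own; if it doesn't, the runtime can be pinned explicitly via CUDA.jl v4's preference mechanism. A sketch:)

julia> using CUDA
julia> CUDA.set_runtime_version!(v"12.1")   # writes LocalPreferences.toml; restart Julia for it to take effect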

oscarvdvelde commented Jun 5, 2023

Upgraded the driver (the Asus and NVIDIA software did not suggest any new updates, so I downloaded it manually) and made a fresh project environment in which I added only CUDA.

The CUDA test suite completed successfully!

⌅ [856f044c] MKL_jll v2022.2.0+0
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading.  
Precompiling project...
  31 dependencies successfully precompiled in 65 seconds. 29 already precompiled.
     Testing Running tests...
┌ Info: System information:
│ CUDA runtime 12.1, artifact installation
│ CUDA driver 12.0
│ NVIDIA driver 528.2.0
│ 
│ CUDA libraries: 
│ - CUBLAS: 12.1.3
│ - CURAND: 10.3.2
│ - CUFFT: 11.0.2
│ - CUSOLVER: 11.4.5
│ - CUSPARSE: 12.1.0
│ - CUPTI: 18.0.0
│ - NVML: 12.0.0+528.2
│ 
│ Julia packages:
│ - CUDA.jl: 4.3.2
│ - CUDA_Driver_jll: 0.5.0+1
│ - CUDA_Runtime_jll: 0.6.0+0
│ - CUDA_Runtime_Discovery: 0.2.2
│ 
│ Toolchain:
│ - Julia: 1.9.0
│ - LLVM: 14.0.6
│ - PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
│ - Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
│ 
│ 1 device:
└   0: NVIDIA RTX A3000 Laptop GPU (sm_86, 5.863 GiB / 6.000 GiB available)
Testing finished in 10 minutes, 23 seconds, 641 milliseconds

Test Summary: |  Pass  Broken  Total  Time
  Overall     | 20340       9  20349
    SUCCESS
     Testing CUDA tests passed 

It did give a warning:

cufft                                         (8) |    41.69 |   0.01 |  0.0 |     144.27 |      N/A |   3.38 |  8.1 |    1936.71 |  4226.76 |
      From worker 7:    WARNING: Method definition #4326#kernel(Any) in module Main at C:\Users\oscar\.julia\packages\CUDA\pCcGc\test\execution.jl:321 overwritten at C:\Users\oscar\.julia\packages\CUDA\pCcGc\test\execution.jl:329.

The a * a test also works. It seems the problems with CUBLAS are resolved.

maleadt (Member) commented Jun 5, 2023

That's great! I guess we'll chalk this up to a CUDA/CUBLAS bug, which isn't unreasonable for Lovelace/Hopper (i.e. fairly recent) hardware on older CUDA toolkits.

If this still reproduces on recent toolkits, feel free to comment or open a new issue.

maleadt closed this as completed Jun 5, 2023