gemmEx on sm_52 results in CUBLAS_STATUS_ARCH_MISMATCH #609
Can you run the CUBLAS tests with `JULIA_DEBUG=CUBLAS` set? This will generate a lot of output.
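For reference, a minimal sketch of running just the CUBLAS portion of the test suite with debug logging enabled; the `test_args` filter for selecting a suite by name is an assumption about CUDA.jl's test runner, not something stated in this thread:

```julia
# Run only the CUBLAS tests of CUDA.jl with CUBLAS debug logging enabled.
# The "cublas" test selector passed via test_args is assumed, not confirmed here.
using Pkg
ENV["JULIA_DEBUG"] = "CUBLAS"   # make CUDA.jl log every CUBLAS call
Pkg.test("CUDA"; test_args=["cublas"])
```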
Thanks @maleadt, please see attached for outputs from the 980ti and 3090.
Those logs don't actually contain any test results; they seem to have hung as you describe. Can you try the following instead (you may have to install some of the dependencies here)?

```julia
julia> ENV["JULIA_DEBUG"] = "CUBLAS"
"CUBLAS"

julia> using CUDA

julia> using LinearAlgebra, Test, BFloat16s

julia> m = 20
20

julia> n = 35
35

julia> k = 13
13

julia> Base.eps(::Type{BFloat16}) = Base.bitcast(BFloat16, 0x3c00)

julia> @testset "mixed-precision matmul" begin
           m, k, n = 4, 4, 4
           cudaTypes = (Float16, Complex{Float16}, BFloat16, Complex{BFloat16},
                        Float32, Complex{Float32}, Float64, Complex{Float64},
                        Int8, Complex{Int8}, UInt8, Complex{UInt8},
                        Int16, Complex{Int16}, UInt16, Complex{UInt16},
                        Int32, Complex{Int32}, UInt32, Complex{UInt32},
                        Int64, Complex{Int64}, UInt64, Complex{UInt64})
           for AT in cudaTypes, CT in cudaTypes
               BT = AT  # gemmEx requires identical A and B types
               # we only test combinations of types that are supported by gemmEx
               if CUBLAS.gemmExComputeType(AT, BT, CT, m, k, n) !== nothing
                   A = AT <: BFloat16 ? AT.(rand(m, k)) : rand(AT, m, k)
                   B = BT <: BFloat16 ? BT.(rand(k, n)) : rand(BT, k, n)
                   C = similar(B, CT)
                   mul!(C, A, B)
                   # Base can't do Int8*Int8 without losing accuracy
                   if (AT == Int8 && BT == Int8) || (AT == Complex{Int8} && BT == Complex{Int8})
                       C = CT.(A) * CT.(B)
                   end
                   dA = CuArray(A)
                   dB = CuArray(B)
                   dC = similar(dB, CT)
                   mul!(dC, dA, dB)  # on the GPU this dispatches to cublasGemmEx
                   rtol = Base.rtoldefault(AT, BT, 0)
                   @test C ≈ Array(dC) rtol=rtol
               end
           end
       end
```
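As a quicker check of which input/output type combinations CUBLAS will accept on a given device, the helper the testset filters on can be queried directly; a small sketch, assuming only the call shape used above:

```julia
using CUDA
# Returns the compute type gemmEx would use for a Float16*Float16=Float16 GEMM
# of the given dimensions, or `nothing` if the combination is unsupported.
CUBLAS.gemmExComputeType(Float16, Float16, Float16, 4, 4, 4)
```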
Ok, thanks. The GPU is tied up right now, but I'll try to post these in a day or so.
And FWIW, after #649 it should now also be possible to run the entire CUBLAS test suite under `JULIA_DEBUG=CUBLAS`.
I have the same issue. I found this forum post; maybe it is related?

Manifest.toml

Version info

Details on Julia (compiled Julia from sources, Ubuntu 20.04):

Details on CUDA:
No, that post details CUBLAS_STATUS_NOT_SUPPORTED for Int8 multiplication. You are getting CUBLAS_STATUS_ARCH_MISMATCH for a Float16 = Float16 * Float16 GEMM. This looks like a CUBLAS bug... You could try upgrading to CUDA 11.2 by using the CUDA.jl master branch.
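For anyone wanting to check whether their setup is affected, a minimal sketch of the failing pattern described above (a plain Float16 GEMM, which CUDA.jl dispatches to cublasGemmEx):

```julia
# Plain Float16 GEMM; on an affected sm_52 card with the buggy CUBLAS this
# raises CUBLAS_STATUS_ARCH_MISMATCH instead of computing the product.
using CUDA, LinearAlgebra
A = CuArray(rand(Float16, 4, 4))
B = CuArray(rand(Float16, 4, 4))
C = similar(A)
mul!(C, A, B)
```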
Thank you, that works! (Julia 1.6.0-rc1, CUDA.jl from master, CUDA 11.2.0)
It looks like several people are running into this, but I don't have a clear debug log yet that points to the failing invocation of gemmEx.
Coming here from #722, and feeling silly now. Repro:

Notably, this is a plain Float32 GEMM. Environment info:

Using 11.2:

Using 11.1:

Using local 11.2 toolkit:
Well, this is absurd; a plain Float32-based GEMM with no special flags seems to be failing here. Could you list your installed artifacts?

The fact that your local CUDA toolkit 11.2 Update 1 doesn't work is expected; support for that version of CUDA only lives on the CUDA.jl master branch (so you could try that).
Oh okay, thanks for the explanation! Without setting any environment variables, this is what gets installed by default:
Setting JULIA_CUDA_VERSION=11.2 gives this working setup:
Interestingly, some of the libraries have different version numbers, but CUBLAS is not one of them. ... Even more interestingly, CUDNN and CUTENSOR are missing in the working setup, lmao. Is it using the system-installed CUDNN/CUTENSOR in that case? That might explain it, since I'm on driver 11.3.
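For reference, a sketch of pinning the artifact toolkit and confirming what CUDA.jl actually selected; the only assumption here is that the environment variable is read before CUDA.jl initializes, so it must be set before `using CUDA`:

```julia
# Pin the artifact toolkit to 11.2 and inspect the resulting setup.
ENV["JULIA_CUDA_VERSION"] = "11.2"  # must happen before `using CUDA`
using CUDA
CUDA.versioninfo()  # prints the toolkit, driver, and library versions in use
```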
That's what I meant with:

But yeah, it's funny that CUBLAS doesn't change between 11.1 and 11.2... That's not correct; what I get from the release notes is encoded here (lines 204 to 231 in a34e69e):
And indeed I get:
I wonder if it's picking up
Ah, that makes sense; I'm getting CUBLAS to work at the cost of no CUDNN/CUTENSOR availability. And indeed libcublas is exactly as you say:

And yet we still have the same error:
Can you show the output of `Libdl.dllist()`?
It's picking up the right libcublas, but also libcublasLt? (edit: all 62 elements of the vector)
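A sketch of inspecting the loaded libraries from within Julia, assuming the vector discussed above came from `Libdl.dllist()`:

```julia
# List every shared library loaded into the process, then keep the CUBLAS ones,
# to check whether libcublas and libcublasLt come from the same toolkit.
using Libdl
filter(contains("cublas"), Libdl.dllist())
```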
It works :)
Thanks for the confirmation, I've pushed a similar fix here: #729. Could you please test? |
Probably a dumb question, but how do I test this at the head of the repo?
@cchan You probably want to do `] add CUDA#master`.
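For completeness, a sketch of the equivalent Pkg API call; `master` here stands in for whatever branch actually carries the fix from #729:

```julia
# Track an unreleased branch of CUDA.jl instead of a registered release.
using Pkg
Pkg.add(PackageSpec(name="CUDA", rev="master"))
```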
Ah works great now 😋 |
Describe the bug
`Pkg.test("CUDA")` fails with CUBLAS errors.
To reproduce
Manifest.toml
Expected behavior
All tests should pass
Version info
Details on Julia:
Details on CUDA:
Output [install via Artifacts and run Pkg.test("CUDA")]:
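A sketch of the reproduction steps as listed above:

```julia
# Install CUDA.jl (which downloads the CUDA toolkit via artifacts by default)
# and run its full test suite; the CUBLAS tests fail as reported.
using Pkg
Pkg.add("CUDA")
Pkg.test("CUDA")
```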