JuliaLang/julia #39756 (Closed)
Labels: good first issue (Good for newcomers), needs tests (Tests are requested.)
Description
Describe the bug
CUDA.pinv does not work on a CuArray variable (stored in GPU memory). It works fine on a Matrix variable (stored in CPU memory).
To reproduce
a = CUDA.rand(Float32, (4, 4))
CUDA.pinv(a)
Actual results:
ERROR: GPU compilation of kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(*), Tuple{Base.Broadcast.Extruded{Vector{Float32}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64) failed
KernelError: passing and using non-bitstype argument
Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(*), Tuple{Base.Broadcast.Extruded{Vector{Float32}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, which is not isbits:
.args is of type Tuple{Base.Broadcast.Extruded{Vector{Float32}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}} which
is not isbits.
.1 is of type Base.Broadcast.Extruded{Vector{Float32}, Tuple{Bool}, Tuple{Int64}} which is not isbits.
.x is of type Vector{Float32} which is not isbits.
Stacktrace:
[1] check_invocation(job::GPUCompiler.CompilerJob, entry::LLVM.Function)
@ GPUCompiler C:\Users\User\.julia\packages\GPUCompiler\8sSXl\src\validation.jl:66
[2] macro expansion
@ C:\Users\User\.julia\packages\GPUCompiler\8sSXl\src\driver.jl:301 [inlined]
[3] macro expansion
@ C:\Users\User\.julia\packages\TimerOutputs\4QAIk\src\TimerOutput.jl:206 [inlined]
[4] macro expansion
@ C:\Users\User\.julia\packages\GPUCompiler\8sSXl\src\driver.jl:300 [inlined]
[5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module, kernel::LLVM.Function; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler C:\Users\User\.julia\packages\GPUCompiler\8sSXl\src\utils.jl:62
[6] cufunction_compile(job::GPUCompiler.CompilerJob)
@ CUDA C:\Users\User\.julia\packages\CUDA\k52QH\src\compiler\execution.jl:301
[7] check_cache
@ C:\Users\User\.julia\packages\GPUCompiler\8sSXl\src\cache.jl:47 [inlined]
[8] cached_compilation
@ C:\Users\User\.julia\packages\GPUArrays\0ShDd\src\host\broadcast.jl:57 [inlined]
[9] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#16", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(*), Tuple{Base.Broadcast.Extruded{Vector{Float32}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}}, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler C:\Users\User\.julia\packages\GPUCompiler\8sSXl\src\cache.jl:0
[10] cufunction(f::GPUArrays.var"#broadcast_kernel#16", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(*), Tuple{Base.Broadcast.Extruded{Vector{Float32}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA C:\Users\User\.julia\packages\CUDA\k52QH\src\compiler\execution.jl:289
[11] cufunction
@ C:\Users\User\.julia\packages\CUDA\k52QH\src\compiler\execution.jl:283 [inlined]
[12] macro expansion
@ C:\Users\User\.julia\packages\CUDA\k52QH\src\compiler\execution.jl:102 [inlined]
[13] #launch_heuristic#309
@ C:\Users\User\.julia\packages\CUDA\k52QH\src\gpuarrays.jl:17 [inlined]
[14] launch_heuristic
@ C:\Users\User\.julia\packages\CUDA\k52QH\src\gpuarrays.jl:17 [inlined]
[15] copyto!
@ C:\Users\User\.julia\packages\GPUArrays\0ShDd\src\host\broadcast.jl:63 [inlined]
[16] copyto!
@ .\broadcast.jl:936 [inlined]
[17] materialize!
@ .\broadcast.jl:894 [inlined]
[18] materialize!
@ .\broadcast.jl:891 [inlined]
[19] lmul!(D::LinearAlgebra.Diagonal{Float32, Vector{Float32}}, B::CuArray{Float32, 2})
@ LinearAlgebra C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\diagonal.jl:212
[20] *
@ C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\diagonal.jl:275 [inlined]
[21] pinv(A::CuArray{Float32, 2}; atol::Float64, rtol::Float32)
@ LinearAlgebra C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\dense.jl:1395
[22] pinv(A::CuArray{Float32, 2})
@ LinearAlgebra C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\dense.jl:1367
[23] top-level scope
@ none:1
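Judging from frames [18] and [19], the failure comes from LinearAlgebra's generic lmul! fallback broadcasting the host-resident D.diag::Vector{Float32} into a GPU kernel. If that reading is right, the error should reproduce in isolation without pinv at all (an untested sketch; requires an NVIDIA GPU):

```julia
using CUDA, LinearAlgebra

B = CUDA.rand(Float32, 4, 4)
D = Diagonal(rand(Float32, 4))  # diag is a plain Vector{Float32} in host memory

# The generic fallback effectively does `B .= D.diag .* B`; the host Vector
# is not isbits, so it cannot be captured by the broadcast kernel.
lmul!(D, B)  # expected to throw the same KernelError
```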
CUDA version:
[052768ef] CUDA v3.1.0
Expected behavior
CUDA.pinv should accept a CuArray and return a CuArray.
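As a possible workaround until this is fixed, the pseudoinverse can be assembled from the CUSOLVER-backed svd, keeping every intermediate on the device. This is a sketch, not tested on this exact setup: gpu_pinv is a hypothetical helper (not part of CUDA.jl), and its rtol default merely mirrors the usual pinv convention.

```julia
using CUDA, LinearAlgebra

# Hypothetical helper: pinv(A) = V * Σ⁺ * U', built from the GPU SVD.
function gpu_pinv(A::CuMatrix{T}; rtol = eps(real(T)) * minimum(size(A))) where {T}
    F = svd(A)                                        # runs on the GPU via CUSOLVER
    tol = rtol * maximum(F.S)
    Sinv = map(s -> s > tol ? inv(s) : zero(s), F.S)  # stays a CuVector
    # Broadcasting Sinv against the rows of U' scales row i by Sinv[i],
    # i.e. Diagonal(Sinv) * U', without the Diagonal lmul! path that
    # triggers the KernelError above.
    return F.V * (Sinv .* F.U')
end

a = CUDA.rand(Float32, 4, 4)
gpu_pinv(a)  # a CuArray, with no host round-trip
```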
Version info
Details on Julia:
# please post the output of:
versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i9-9900 CPU @ 3.10GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = "C:\Users\User\AppData\Local\atom\app-1.56.0\atom.exe" -a
JULIA_NUM_THREADS = 8
Details on CUDA:
# please post the output of:
CUDA.versioninfo()
CUDA toolkit 11.2.2, artifact installation
CUDA driver 11.3.0
NVIDIA driver 465.89.0
Libraries:
- CUBLAS: 11.4.1
- CURAND: 10.2.3
- CUFFT: 10.4.1
- CUSOLVER: 11.1.0
- CUSPARSE: 11.4.1
- CUPTI: 14.0.0
- NVML: 11.0.0+465.89
- CUDNN: 8.10.0 (for CUDA 11.2.0)
- CUTENSOR: 1.2.2 (for CUDA 11.1.0)
Toolchain:
- Julia: 1.6.1
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80
1 device:
0: NVIDIA Quadro P2200 (sm_61, 4.272 GiB / 5.000 GiB available)
Additional context
Currently, CUDA.pinv does work on matrices stored in CPU memory and returns the result as a CPU array. If the computation is meant to run on the GPU, this does not make sense: transfers between CPU and GPU memory are very inefficient, and whether to pay that cost should be left to the developer's discretion.
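For reference, the behaviour described above corresponds to making both transfers explicit, which a caller can already do today (sketch, untested here; requires an NVIDIA GPU):

```julia
using CUDA, LinearAlgebra

a = CUDA.rand(Float32, 4, 4)
p_cpu = pinv(Array(a))   # Array(a) copies device -> host; pinv runs on the CPU
p_gpu = CuArray(p_cpu)   # copy the result back, making both transfers explicit
```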