LinearMapsAA is not CUDA compatible? #43

Closed
ZongyuLi-umich opened this issue May 22, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@ZongyuLi-umich

I found that a LinearMapAA object built from CuSPECTrecon does not support CUDA arrays.
Here is my test code:

using CuSPECTrecon
using LinearMapsAA
using CUDA
CUDA.allowscalar(false)

T = Float32
nx = 8; ny = nx
nz = 6
nview = 7

mumap = rand(T, nx, ny, nz)

px = 5
pz = 3
psfs = rand(T, px, pz, ny, nview)
psfs = psfs .+ mapslices(reverse, psfs, dims = [1, 2]) # symmetrize
psfs = psfs ./ mapslices(sum, psfs, dims = [1, 2])

dy = T(4.7952)

Cuimage = CuArray(zeros(T, nx, ny, nz))
Cuviews = CuArray(rand(T, nx, nz, nview))

Cumumap = CuArray(mumap)
Cupsfs = CuArray(psfs)
Cuplan = CuSPECTplan(Cumumap, Cupsfs, dy; T)

idim = (nx, ny, nz)
odim = (nx, nz, nview)

A = LinearMapAA(x -> Cuproject(x, Cuplan),
                y -> Cubackproject(y, Cuplan),
                (prod(odim), prod(idim));
                idim = idim,
                odim = odim,
                T = Float32)

I can do both

mul!(Cuviews, A, Cuimage)

and

Cuviews = Cuproject(Cuimage, Cuplan)

but when I run

Cuviews = A * Cuimage

Julia throws the following error:

Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
error(s::String) at error.jl:33
assertscalar(op::String) at indexing.jl:53
getindex(xs::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, I::Int64) at indexing.jl:86
copyto_unaliased!(deststyle::IndexLinear, dest::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, srcstyle::IndexLinear, src::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}) at abstractarray.jl:1018
copyto!(dest::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, src::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}) at abstractarray.jl:998
_unsafe_mul!(y::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, A::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#9#11"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#10#12"}}, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}) at functionmap.jl:114
mul! at LinearMaps.jl:163 [inlined]
_generic_mapvec_mul!(y::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, A::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#9#11"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#10#12"}}, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, α::Int64, β::Int64) at LinearMaps.jl:204
_unsafe_mul! at LinearMaps.jl:249 [inlined]
mul! at LinearMaps.jl:198 [inlined]
lm_mul! at multiply.jl:88 [inlined]
lmao_mul!(Y::Array{Float32, 3}, Lm::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#9#11"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#10#12"}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, α::Int64, β::Int64; idim::Tuple{Int64, Int64, Int64}, odim::Tuple{Int64, Int64, Int64}) at multiply.jl:153
(::LinearMapsAA.var"#lmao_mul!##kw")(::NamedTuple{(:idim, :odim), Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}}}, ::typeof(LinearMapsAA.lmao_mul!), Y::Array{Float32, 3}, Lm::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#9#11"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#10#12"}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, α::Int64, β::Int64) at multiply.jl:136
lmax_mul(A::LinearMapAO{Float32, 3, 3, LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#9#11"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#10#12"}}, NamedTuple{(), Tuple{}}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}) at multiply.jl:212
*(A::LinearMapAO{Float32, 3, 3, LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#9#11"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#10#12"}}, NamedTuple{(), Tuple{}}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}) at multiply.jl:191
top-level scope at Cuautodiff.jl:36
eval at boot.jl:373 [inlined]
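The traceback hints at the cause: `lmao_mul!` receives `Y::Array{Float32, 3}` (a CPU array) while `X` is a `CuArray`, and `copyto!` from a GPU array into a view of a CPU matrix falls back to element-wise iteration, which is exactly what `CUDA.allowscalar(false)` forbids. A minimal reduction of that failure mode (my sketch, based on the `copyto_unaliased!` frame above, not code from either package):

```julia
using CUDA
CUDA.allowscalar(false)

src = CUDA.rand(Float32, 8)    # GPU array
dest = zeros(Float32, 8, 1)    # CPU matrix, like the internal output buffer
# The generic fallback iterates `src` element by element on the CPU,
# so this should raise the same "Scalar indexing is disallowed" error:
copyto!(view(dest, :, 1), src)
```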
@JeffFessler
Owner

The error occurs at LinearMaps.jl:163, so we probably need to make an MWE to report to that repo.

@ZongyuLi-umich
Author

I did a simple test:

using CUDA
CUDA.allowscalar(false)
using LinearMapsAA
nx = 8; ny = nx
nz = 6
idim = (nx, ny, nz)
Cuimage = CuArray(rand(Float32, nx, ny, nz))
forw1 = x -> 2 * x
back1 = y -> 0.5 * y
B = LinearMapAA(forw1, back1,
               (prod(idim), prod(idim));
               idim = idim,
               odim = idim,
               T = Float32)

And running

B * Cuimage

throws an error:

Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
error(s::String) at error.jl:33
assertscalar(op::String) at indexing.jl:53
getindex(xs::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, I::Int64) at indexing.jl:86
copyto_unaliased!(deststyle::IndexLinear, dest::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, srcstyle::IndexLinear, src::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}) at abstractarray.jl:1018
copyto! at abstractarray.jl:998 [inlined]
_unsafe_mul!(y::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, A::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#29#30"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#31#32"}}, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}) at functionmap.jl:114
mul! at LinearMaps.jl:163 [inlined]
_generic_mapvec_mul!(y::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, A::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#29#30"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#31#32"}}, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, α::Int64, β::Int64) at LinearMaps.jl:204
_unsafe_mul! at LinearMaps.jl:249 [inlined]
mul! at LinearMaps.jl:198 [inlined]
lm_mul! at multiply.jl:88 [inlined]
lmao_mul!(Y::Array{Float32, 3}, Lm::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#29#30"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#31#32"}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, α::Int64, β::Int64; idim::Tuple{Int64, Int64, Int64}, odim::Tuple{Int64, Int64, Int64}) at multiply.jl:153
(::LinearMapsAA.var"#lmao_mul!##kw")(::NamedTuple{(:idim, :odim), Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}}}, ::typeof(LinearMapsAA.lmao_mul!), Y::Array{Float32, 3}, Lm::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#29#30"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#31#32"}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, α::Int64, β::Int64) at multiply.jl:136
lmax_mul(A::LinearMapAO{Float32, 3, 3, LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#29#30"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#31#32"}}, NamedTuple{(), Tuple{}}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}) at multiply.jl:212
*(A::LinearMapAO{Float32, 3, 3, LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#29#30"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#31#32"}}, NamedTuple{(), Tuple{}}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}) at multiply.jl:191
top-level scope at Cuautodiff.jl:65
eval at boot.jl:373 [inlined]

@JeffFessler
Owner

We need a test that uses LinearMaps.jl directly, without LinearMapsAA.jl.

@ZongyuLi-umich
Author

Yes, here is the test code:

using CUDA
using LinearMaps
using LinearMapsAA
nx = 8; ny = nx
nz = 6
idim = (nx, ny, nz)
Cuimage = CuArray(rand(Float32, idim))
forw1 = x -> 2 * x
back1 = y -> 0.5 * y
A = LinearMapAA(forw1, back1, (prod(idim), prod(idim));
               odim = idim, idim = idim)
B = LinearMap(forw1, back1, prod(idim))
C = LinearMapAA(forw1, back1, (prod(idim), prod(idim)))

Running

A * Cuimage

gives me the scalar indexing error.
But running

B * vec(Cuimage)
C * vec(Cuimage)

throws no error.
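So the failing path is specific to the `idim`/`odim` (LinearMapAO) case, where the package allocates the multidimensional output internally; the plain-vector maps `B` and `C` avoid that allocation. Until that path is fixed, a workaround grounded in the observation earlier in this thread (`mul!` works) is to preallocate the output on the GPU and use the in-place form. A sketch under those assumptions:

```julia
using CUDA, LinearMapsAA
using LinearAlgebra: mul!
CUDA.allowscalar(false)

idim = (8, 8, 6)
A = LinearMapAA(x -> 2 .* x, y -> 0.5 .* y,
                (prod(idim), prod(idim));
                idim = idim, odim = idim, T = Float32)

x = CUDA.rand(Float32, idim)
y = CuArray{Float32}(undef, idim)  # preallocate the output on the GPU
mul!(y, A, x)  # in-place path sidesteps the internal CPU allocation
```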

@JeffFessler
Owner

Hopefully fixed by #44.
I should add more tests to be sure, so leaving this open for now.
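The device-agnostic idiom for this kind of bug is to allocate the output via `similar`, which follows the array type of its argument, so a `CuArray` input yields a `CuArray` output. A small illustration of the idiom (my sketch, not necessarily the exact change in #44):

```julia
using CUDA

x_cpu = rand(Float32, 8, 8, 6)
x_gpu = CuArray(x_cpu)

# `similar` preserves the argument's array type, so downstream code
# stays on whichever device the input lives on:
similar(x_cpu, Float32, (8, 6, 7)) isa Array    # true
similar(x_gpu, Float32, (8, 6, 7)) isa CuArray  # true
```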

@JeffFessler
Owner

Closed by #46
