LinearMapsAA is not CUDA compatible? #43

Closed
ZongyuLi-umich opened this issue May 22, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@ZongyuLi-umich

I found that a LinearMapAA object built from CuSPECTrecon does not support CUDA arrays.
Here is my test code:

using CuSPECTrecon
using LinearMapsAA
using CUDA
CUDA.allowscalar(false)

T = Float32
nx = 8; ny = nx
nz = 6
nview = 7

mumap = rand(T, nx, ny, nz)

px = 5
pz = 3
psfs = rand(T, px, pz, ny, nview)
psfs = psfs .+ mapslices(reverse, psfs, dims = [1, 2]) # symmetrize
psfs = psfs ./ mapslices(sum, psfs, dims = [1, 2])

dy = T(4.7952)

Cuimage = CuArray(zeros(T, nx, ny, nz))
Cuviews = CuArray(rand(T, nx, nz, nview))

Cumumap = CuArray(mumap)
Cupsfs = CuArray(psfs)
Cuplan = CuSPECTplan(Cumumap, Cupsfs, dy; T)

idim = (nx, ny, nz)
odim = (nx, nz, nview)

A = LinearMapAA(x -> Cuproject(x, Cuplan),
                y -> Cubackproject(y, Cuplan),
                (prod(odim), prod(idim));
                idim = idim,
                odim = odim,
                T = Float32)

I can do both

mul!(Cuviews, A, Cuimage)

and

Cuviews = Cuproject(Cuimage, Cuplan)

but when I run

Cuviews = A * Cuimage

Julia throws the following error:

Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
error(s::String) at error.jl:33
assertscalar(op::String) at indexing.jl:53
getindex(xs::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, I::Int64) at indexing.jl:86
copyto_unaliased!(deststyle::IndexLinear, dest::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, srcstyle::IndexLinear, src::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}) at abstractarray.jl:1018
copyto!(dest::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, src::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}) at abstractarray.jl:998
_unsafe_mul!(y::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, A::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#9#11"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#10#12"}}, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}) at functionmap.jl:114
mul! at LinearMaps.jl:163 [inlined]
_generic_mapvec_mul!(y::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, A::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#9#11"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#10#12"}}, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, α::Int64, β::Int64) at LinearMaps.jl:204
_unsafe_mul! at LinearMaps.jl:249 [inlined]
mul! at LinearMaps.jl:198 [inlined]
lm_mul! at multiply.jl:88 [inlined]
lmao_mul!(Y::Array{Float32, 3}, Lm::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#9#11"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#10#12"}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, α::Int64, β::Int64; idim::Tuple{Int64, Int64, Int64}, odim::Tuple{Int64, Int64, Int64}) at multiply.jl:153
(::LinearMapsAA.var"#lmao_mul!##kw")(::NamedTuple{(:idim, :odim), Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}}}, ::typeof(LinearMapsAA.lmao_mul!), Y::Array{Float32, 3}, Lm::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#9#11"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#10#12"}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, α::Int64, β::Int64) at multiply.jl:136
lmax_mul(A::LinearMapAO{Float32, 3, 3, LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#9#11"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#10#12"}}, NamedTuple{(), Tuple{}}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}) at multiply.jl:212
*(A::LinearMapAO{Float32, 3, 3, LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#9#11"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#10#12"}}, NamedTuple{(), Tuple{}}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}) at multiply.jl:191
top-level scope at Cuautodiff.jl:36
eval at boot.jl:373 [inlined]
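The traceback hints at the cause: `lmao_mul!` receives `Y::Array{Float32, 3}` (a CPU array) while `X` is a `CuArray`, and `copyto!` from a GPU array into a view of a CPU matrix falls back to element-wise iteration, which is exactly what `CUDA.allowscalar(false)` forbids. A minimal reduction of that failure mode (my sketch, based on the `copyto_unaliased!` frame above, not code from either package):

```julia
using CUDA
CUDA.allowscalar(false)

src = CUDA.rand(Float32, 8)    # GPU array
dest = zeros(Float32, 8, 1)    # CPU matrix, like the internal output buffer
# The generic fallback iterates `src` element by element on the CPU,
# so this should raise the same "Scalar indexing is disallowed" error:
copyto!(view(dest, :, 1), src)
```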
@JeffFessler
Owner

The error occurs at LinearMaps.jl:163, so we probably need to make an MWE to report to that repo.

@ZongyuLi-umich
Author

I did a simple test:

using CUDA
CUDA.allowscalar(false)
using LinearMapsAA
nx = 8; ny = nx
nz = 6
idim = (nx, ny, nz)
Cuimage = CuArray(rand(Float32, nx, ny, nz))
forw1 = x -> 2 * x
back1 = y -> 0.5 * y
B = LinearMapAA(forw1, back1,
               (prod(idim), prod(idim));
               idim = idim,
               odim = idim,
               T = Float32)

And running

B * Cuimage

throws an error:

Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
error(s::String) at error.jl:33
assertscalar(op::String) at indexing.jl:53
getindex(xs::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, I::Int64) at indexing.jl:86
copyto_unaliased!(deststyle::IndexLinear, dest::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, srcstyle::IndexLinear, src::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}) at abstractarray.jl:1018
copyto! at abstractarray.jl:998 [inlined]
_unsafe_mul!(y::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, A::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#29#30"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#31#32"}}, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}) at functionmap.jl:114
mul! at LinearMaps.jl:163 [inlined]
_generic_mapvec_mul!(y::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, A::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#29#30"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#31#32"}}, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, α::Int64, β::Int64) at LinearMaps.jl:204
_unsafe_mul! at LinearMaps.jl:249 [inlined]
mul! at LinearMaps.jl:198 [inlined]
lm_mul! at multiply.jl:88 [inlined]
lmao_mul!(Y::Array{Float32, 3}, Lm::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#29#30"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#31#32"}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, α::Int64, β::Int64; idim::Tuple{Int64, Int64, Int64}, odim::Tuple{Int64, Int64, Int64}) at multiply.jl:153
(::LinearMapsAA.var"#lmao_mul!##kw")(::NamedTuple{(:idim, :odim), Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}}}, ::typeof(LinearMapsAA.lmao_mul!), Y::Array{Float32, 3}, Lm::LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#29#30"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#31#32"}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, α::Int64, β::Int64) at multiply.jl:136
lmax_mul(A::LinearMapAO{Float32, 3, 3, LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#29#30"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#31#32"}}, NamedTuple{(), Tuple{}}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}) at multiply.jl:212
*(A::LinearMapAO{Float32, 3, 3, LinearMaps.FunctionMap{Float32, LinearMapsAA.var"#6#10"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#29#30"}, LinearMapsAA.var"#8#12"{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, var"#31#32"}}, NamedTuple{(), Tuple{}}}, X::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}) at multiply.jl:191
top-level scope at Cuautodiff.jl:65
eval at boot.jl:373 [inlined]

@JeffFessler
Owner

We need a test that uses LinearMaps.jl directly, without LinearMapsAA.jl.

@ZongyuLi-umich
Author

Yes, here is the test code:

using CUDA
using LinearMaps
using LinearMapsAA
nx = 8; ny = nx
nz = 6
idim = (nx, ny, nz)
Cuimage = CuArray(rand(Float32, idim))
forw1 = x -> 2 * x
back1 = y -> 0.5 * y
A = LinearMapAA(forw1, back1, (prod(idim), prod(idim));
               odim = idim, idim = idim)
B = LinearMap(forw1, back1, prod(idim))
C = LinearMapAA(forw1, back1, (prod(idim), prod(idim)))

Running

A * Cuimage

gives me the scalar indexing error.
But running

B * vec(Cuimage)
C * vec(Cuimage)

throws no error.
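So the failing path is specific to the `idim`/`odim` (LinearMapAO) case, where the package allocates the multidimensional output internally; the plain-vector maps `B` and `C` avoid that allocation. Until that path is fixed, a workaround grounded in the observation earlier in this thread (`mul!` works) is to preallocate the output on the GPU and use the in-place form. A sketch under those assumptions:

```julia
using CUDA, LinearMapsAA
using LinearAlgebra: mul!
CUDA.allowscalar(false)

idim = (8, 8, 6)
A = LinearMapAA(x -> 2 .* x, y -> 0.5 .* y,
                (prod(idim), prod(idim));
                idim = idim, odim = idim, T = Float32)

x = CUDA.rand(Float32, idim)
y = CuArray{Float32}(undef, idim)  # preallocate the output on the GPU
mul!(y, A, x)  # in-place path sidesteps the internal CPU allocation
```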

@JeffFessler
Owner

Hopefully fixed by #44.
I should add more tests to be sure, so leaving this open for now.
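The device-agnostic idiom for this kind of bug is to allocate the output via `similar`, which follows the array type of its argument, so a `CuArray` input yields a `CuArray` output. A small illustration of the idiom (my sketch, not necessarily the exact change in #44):

```julia
using CUDA

x_cpu = rand(Float32, 8, 8, 6)
x_gpu = CuArray(x_cpu)

# `similar` preserves the argument's array type, so downstream code
# stays on whichever device the input lives on:
similar(x_cpu, Float32, (8, 6, 7)) isa Array    # true
similar(x_gpu, Float32, (8, 6, 7)) isa CuArray  # true
```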

@JeffFessler
Owner

Closed by #46
