
Broadcasting and reshaped, transposed, CuArrays #228

Open
mcabbott opened this issue Jun 17, 2020 · 6 comments
Labels
bug (Something isn't working) · cuda array (Stuff about CuArray) · help wanted (Extra attention is needed)

Comments

@mcabbott
Contributor

Some broadcasting operations work fine on reshape(transpose(cu(...))) objects, but some fail:

using CUDA
CUDA.allowscalar(false)
mat = cu(rand(2,2))

reshape(transpose(mat),2,1,2) .+ mat # ok

exp.(reshape(transpose(mat), 2,1,2)) # ERROR: scalar getindex is disallowed
CUDA.exp.(reshape(transpose(mat), 2,1,2)) # FATAL ERROR: Symbol "__nv_expf" not found

Something like this appears to be the cause of mcabbott/TensorCast.jl#25, where the failure happens only when broadcasting two such objects; just one is fine:

C = cu(ones(10,2))
L = cu(ones(10,3))

reshape(C,1,2,10) .+ reshape(L', 3,1,10) # ok
reshape(C',1,2,10) .+ reshape(L, 3,1,10) # ok
reshape(C',1,2,10) .+ reshape(L', 3,1,10) # ERROR: scalar getindex is disallowed

This is with CUDA v0.1.0; I get the same errors on CuArrays v2.2.1, and on CuArrays v1.7.2 the only change is this:

julia> CuArrays.CUDAnative.exp.(reshape(transpose(mat), 2,1,2))
ERROR: LLVM error: Program used external function '___nv_expf' which could not be resolved!
@maleadt
Member

maleadt commented Jun 17, 2020

Dup of JuliaGPU/Adapt.jl#21. With how Adapt.jl currently works, it's much too complicated/expensive to extend our support for array wrappers to doubly-wrapped arrays (here, ReshapedArray of Transpose of CuArray).
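
(For illustration, not from the original report: the double wrapping is visible directly in the type, and peeling it by hand shows what Adapt.jl would have to do. Roughly, on the versions discussed here:)

using CUDA, LinearAlgebra

mat = cu(rand(2, 2))
w = reshape(transpose(mat), 2, 1, 2)

typeof(w)          # Base.ReshapedArray{Float32,3,LinearAlgebra.Transpose{Float32,CuArray{…}},…}
parent(w)          # the Transpose wrapper
parent(parent(w))  # the underlying CuArray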

@mcabbott
Contributor Author

OK, sorry I didn't search there.

Is there a workaround? Like some mechanism to tell it to keep unwrapping, even if this can't be triggered automatically?

I think what's happening in my second example is that one singly-wrapped CuArray pushes the whole broadcast to be done by CUDA, and then it's happy... is there or should there be a way to make this happen?

zed = similar(mat,1) .= 0
zed .+ CUDA.exp.(reshape(transpose(mat), 2,1,2)) # ok!

@maleadt
Member

maleadt commented Jun 22, 2020

Is there a workaround? Like some mechanism to tell it to keep unwrapping, even if this can't be triggered automatically?

Unwrapping would be materializing the wrapper using collect, and that's probably too expensive?
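
(A cheaper variant, sketched here rather than quoted from the thread: materialize only the inner transpose into a plain CuArray, assuming permutedims dispatches to the generic GPUArrays kernel on these versions, so that at most one wrapper layer remains:)

mat_t = permutedims(mat)       # plain CuArray holding the transposed data, at the cost of a copy
exp.(reshape(mat_t, 2, 1, 2))  # singly-wrapped at most, so the GPU broadcast style applies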

Anyway, you've demonstrated a workaround yourself: make sure there's an actual CuArray (or a singly-wrapped one) participating in the broadcast. The resulting broadcast style will then be the GPU one. When you do exp.(reshape(transpose(mat), 2,1,2)), there's nothing with a GPU array style in there, so it falls back to the default, indexing-based one. Changing exp to CUDAnative.exp doesn't fundamentally change the broadcast; it just tries to apply that GPU function to a broadcast that has already been determined to execute on the CPU (and hence crashes).
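
(To make that concrete — this is an illustration of the assumed behaviour on these versions, not output from the thread — querying the broadcast styles shows the doubly-wrapped arrays falling back to the CPU default:)

using Base.Broadcast: BroadcastStyle

BroadcastStyle(typeof(C))                      # CUDA.CuArrayStyle{2}()            -> GPU execution
BroadcastStyle(typeof(reshape(C', 1, 2, 10)))  # Base.Broadcast.DefaultArrayStyle{3}() -> CPU indexing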

An alternative workaround is to override the broadcast style:

julia> C = cu(ones(10,2));

julia> L = cu(ones(10,3));


julia> Meta.@lower reshape(C',1,2,10) .+ reshape(L', 3,1,10)
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope`
1 ─ %1 = var"'"(C)
│   %2 = reshape(%1, 1, 2, 10)
│   %3 = var"'"(L)
│   %4 = reshape(%3, 3, 1, 10)
│   %5 = Base.broadcasted(+, %2, %4)
│   %6 = Base.materialize(%5)
└──      return %6
))))

julia> a = reshape(C',1,2,10);

julia> b = reshape(L', 3,1,10);

julia> bc = Base.broadcasted(+, a, b);

julia> CUDA.allowscalar(false)

julia> Broadcast.materialize(bc)
ERROR: scalar getindex is disallowed
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] assertscalar(::String) at /home/tim/Julia/pkg/GPUArrays/src/host/indexing.jl:41
 [3] getindex at /home/tim/Julia/pkg/GPUArrays/src/host/indexing.jl:96 [inlined]
 [4] _getindex at ./abstractarray.jl:1066 [inlined]
 [5] getindex at ./abstractarray.jl:1043 [inlined]
 [6] getindex at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/adjtrans.jl:190 [inlined]
 [7] _unsafe_getindex_rs at ./reshapedarray.jl:249 [inlined]
 [8] _unsafe_getindex at ./reshapedarray.jl:246 [inlined]
 [9] getindex at ./reshapedarray.jl:234 [inlined]
 [10] _getindex at ./abstractarray.jl:1083 [inlined]
 [11] getindex at ./abstractarray.jl:1043 [inlined]
 [12] _broadcast_getindex at ./broadcast.jl:614 [inlined]
 [13] _getindex at ./broadcast.jl:644 [inlined]
 [14] _broadcast_getindex at ./broadcast.jl:620 [inlined]
 [15] getindex at ./broadcast.jl:575 [inlined]
 [16] macro expansion at ./broadcast.jl:932 [inlined]
 [17] macro expansion at ./simdloop.jl:77 [inlined]
 [18] copyto! at ./broadcast.jl:931 [inlined]
 [19] copyto! at ./broadcast.jl:886 [inlined]
 [20] copy at ./broadcast.jl:862 [inlined]
 [21] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{3},Nothing,typeof(+),Tuple{Base.ReshapedArray{Float32,3,LinearAlgebra.Adjoint{Float32,CuArray{Float32,2,Nothing}},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}},Base.ReshapedArray{Float32,3,LinearAlgebra.Adjoint{Float32,CuArray{Float32,2,Nothing}},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}}}) at ./broadcast.jl:837
 [22] top-level scope at REPL[51]:1

julia> bc = Base.broadcasted(CUDA.CuArrayStyle{2}(), +, a, b);

julia> Broadcast.materialize(bc);

Maybe with some tooling this could be made practical for you (Broadcast.@withstyle C reshape(C',1,2,10) .+ reshape(L', 3,1,10)?)
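
(For illustration only, such tooling might wrap the manual steps above in a small helper; gpu_broadcast is a made-up name here, not an existing CUDA.jl API:)

using CUDA
using Base.Broadcast: broadcasted, materialize

# Force the GPU broadcast style on a lazy broadcast, then materialize it.
function gpu_broadcast(f, args...)
    N = maximum(ndims, args)
    materialize(broadcasted(CUDA.CuArrayStyle{N}(), f, args...))
end

gpu_broadcast(+, reshape(C', 1, 2, 10), reshape(L', 3, 1, 10))  # same as the bc above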

@mcabbott
Contributor Author

Thanks for having a look; I haven't got back to this yet.

Re tooling, I was wondering whether cu shouldn't be overloaded so that cu.(arrays...) changes the broadcast. Something similar was done by LazyArrays, with lazy.(x .+ y) delaying materialization. (It started roughly here: JuliaLang/julia#19198 (comment) )
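
(Purely a sketch of that idea, not anything CUDA.jl implements: since cu.(x .+ y) fuses into broadcasted(cu, broadcasted(+, x, y)), an overload could re-tag the inner lazy broadcast with the GPU style instead of applying cu element-wise:)

using CUDA
import Base.Broadcast: broadcasted, Broadcasted, DefaultArrayStyle

# Hypothetical: when cu is broadcast over an inner CPU-styled broadcast,
# re-tag that broadcast with the CuArray style so it materializes on the GPU.
function broadcasted(::DefaultArrayStyle{N}, ::typeof(cu), bc::Broadcasted) where {N}
    Broadcasted{CUDA.CuArrayStyle{N}}(bc.f, bc.args, bc.axes)
end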

But I also wonder why broadcasting can't see this itself. It is easy to call parent on an array recursively until you get something that is === itself; this extracts the underlying storage type. I thought there was some similar recursion in broadcasting styles, depending on the style of the parent, although I am not sure. (I meant to dig a bit, but haven't yet.)
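
(A minimal sketch of that recursion, for reference:)

# Peel parent() until it is a fixed point; that exposes the storage type
# underneath any stack of wrappers.
unwrap(x) = parent(x) === x ? x : unwrap(parent(x))
storage_type(x) = typeof(unwrap(x))

storage_type(reshape(C', 1, 2, 10)) <: CuArray  # true for the examples above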

@maleadt
Member

maleadt commented Jul 2, 2020

Re tooling, I was wondering whether cu shouldn't be overloaded so that cu.(arrays...) changes the broadcast. Something similar was done by LazyArrays, with lazy.(x .+ y) delaying materialization. (It started roughly here: JuliaLang/julia#19198 (comment) )

That's an interesting thought! I'm wary of overloading cu, though; I have mainly been stripping it of its features to make code somewhat more explicit (it used to perform many more conversions, you could do stuff like cu[1,2], etc.), but for broadcast it might make sense.

@maleadt maleadt added bug Something isn't working help wanted Extra attention is needed cuda array Stuff about CuArray. labels Jul 2, 2020
@mcabbott
Contributor Author

mcabbott commented Jul 2, 2020

OK, right, I don't know if cu is the perfect choice, but it might be neat.

For the original issue here, mcabbott/TensorCast.jl#25, the ideal logic would be something that digs through wrappers, and returns an Array or a CuArray depending on what it sees.

But the other use is broadcasting over things like ranges or CartesianIndices, in which case it cannot guess; you would have to specify that you want a CuArray.
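
(As an illustration of that ambiguity, not from the thread:)

ys = exp.(1:10)                  # nothing says GPU here, so this is a plain Vector{Float64}
ys_gpu = exp.(cu(collect(1:10))) # explicit opt-in is needed to get a CuArray back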
