Assignment using logical indexing #131
A minimal example of this would be:

```julia
a = cu(rand(100))
cond = a .> 0.5
b = cu(rand(size(a[cond])...))
a[cond] .= b
```
This should be a small modification of JuliaGPU/CuArrays.jl#290
This is what I am currently doing:

```julia
function kernel_place!(a, b, c, x)
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    if i <= length(c) && c[i]   # <= so the last element is covered
        a[i] = b[x[i]]
    end
    return nothing
end

a = cu(rand(10))
cond = a .> 0.5
b = cu(rand(size(a[cond])...))
idx = cumsum(cond);
@cuda threads=12 kernel_place!(a, b, cond, idx)
```
Great, so now just wrap that into a
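The code span naming the wrapper was lost in extraction; as a minimal sketch, wrapping the kernel from the previous comment into a reusable helper could look like this (the name `assign_masked!` and the launch configuration are assumptions, not thread content):

```julia
using CUDA

# Hypothetical wrapper around kernel_place! from the previous comment.
function assign_masked!(a::CuArray, b::CuArray, cond::CuArray{Bool})
    idx = cumsum(cond)                   # idx[i] = position in b for the i-th selected element
    nthreads = 256
    nblocks = cld(length(a), nthreads)   # enough blocks to cover all of a
    @cuda threads=nthreads blocks=nblocks kernel_place!(a, b, cond, idx)
    return a
end
```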
I think it would probably be better to do
If

```julia
function kernel_place!(a, b, c)
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    if i <= length(c) && c[i]
        a[i] = b[i]
    end
    return nothing
end
```

But
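If I read the truncated comparison right, this variant indexes `b` at the same position as `a`, so `b` would have to be a full-length array rather than the compacted values used earlier; a usage sketch under that assumption:

```julia
a = cu(rand(10))
cond = a .> 0.5
b = cu(rand(10))    # full-length: b[i] is only read where cond[i] is true
@cuda threads=16 kernel_place!(a, b, cond)
```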
You can do
No actually
So should I do a PR for size of
Yes, please go ahead and make that PR with
+1 for support for logical indexing into CuArrays. The workaround is to use
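The workaround's code span was stripped during extraction; a sketch of the `findall`-based approach the rest of the thread converges on (assuming integer-index assignment works on your version):

```julia
using CUDA

a = CuArray(rand(Float32, 100))
cond = a .> 0.5f0
b = CuArray(rand(Float32, count(cond)))

a[findall(cond)] .= b    # integer indices instead of a Bool mask
```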
@avik-pal I would really find this PR useful. Is there a reason you didn't follow through? Mind if I pick it up if you don't have time?
@bjarthur Sure, take it up. It just slipped my mind.
@dpsanders It doesn't seem that
I'm trying to hack together a CUDA-specific
Suggestions would be much appreciated. Thanks!
Ah, interesting -- it seems vectorised assignment is different, as I guess it must be. Apologies for the misdirection.
Vectorized assignment calls broadcast, which should work on the GPU already. So you don't need to implement
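For concreteness, a sketch of what the vectorized assignment lowers to (this matches the `Meta.@lower` output shown later in the thread):

```julia
# `a[cond] .= b` is roughly:
dest = Base.dotview(a, cond)                      # a view selecting the masked elements
Base.materialize!(dest, Base.broadcasted(identity, b))
```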
@maleadt But it doesn't work. In a fresh Julia session without the
I'd really like to get this working. Any advice would be appreciated.
OK, this turned out a little more intricate than I expected. Let's start with the expansion of this broadcast:

```julia
julia> Meta.@lower a[i] .= 1
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope'
1 ─ %1 = Base.dotview(a, i)
│   %2 = Base.broadcasted(Base.identity, 1)
│   %3 = Base.materialize!(%1, %2)
└──      return %3
))))

julia> x1 = Base.dotview(a, i)
1-element view(::CuArray{Int64, 1}, [2]) with eltype Int64:
 2

julia> dump(x1)
SubArray{Int64, 1, CuArray{Int64, 1}, Tuple{Vector{Int64}}, false}
  parent: CuArray{Int64, 1}
    baseptr: CuPtr{Nothing} CuPtr{Nothing}(0x00007f83bac00000)
    offset: Int64 0
    dims: Tuple{Int64}
      1: Int64 3
    state: CUDA.ArrayState CUDA.ARRAY_MANAGED
    ctx: CuContext
      handle: Ptr{Nothing} @0x0000000003b838d0
  indices: Tuple{Vector{Int64}}
    1: Array{Int64}((1,)) [2]
  offset1: Int64 0
  stride1: Int64 0
```

The problem is in the SubArray returned by `dotview`: its indices are a plain CPU `Vector`, even though indexing with an integer `CuArray` keeps the indices on the GPU. Compare:

```julia
julia> dump(Base.unsafe_view(a, CuArray([1])))
SubArray{Int64, 1, CuArray{Int64, 1}, Tuple{CuArray{Int64, 1}}, false}
  parent: CuArray{Int64, 1}
    baseptr: CuPtr{Nothing} CuPtr{Nothing}(0x00007f83bac00000)
    offset: Int64 0
    dims: Tuple{Int64}
      1: Int64 3
    state: CUDA.ArrayState CUDA.ARRAY_MANAGED
    ctx: CuContext
      handle: Ptr{Nothing} @0x0000000003b838d0
  indices: Tuple{CuArray{Int64, 1}}
    1: CuArray{Int64, 1}
      baseptr: CuPtr{Nothing} CuPtr{Nothing}(0x00007f83bac01c00)
      offset: Int64 0
      dims: Tuple{Int64}
        1: Int64 1
      state: CUDA.ArrayState CUDA.ARRAY_MANAGED
      ctx: CuContext
        handle: Ptr{Nothing} @0x0000000003b838d0
  offset1: Int64 0
  stride1: Int64 0

julia> dump(Base.unsafe_view(a, Base.LogicalIndex(CuArray([true]))))
SubArray{Int64, 1, CuArray{Int64, 1}, Tuple{Vector{Int64}}, false}
  parent: CuArray{Int64, 1}
    baseptr: CuPtr{Nothing} CuPtr{Nothing}(0x00007f83bac00000)
    offset: Int64 0
    dims: Tuple{Int64}
      1: Int64 3
    state: CUDA.ArrayState CUDA.ARRAY_MANAGED
    ctx: CuContext
      handle: Ptr{Nothing} @0x0000000003b838d0
  indices: Tuple{Vector{Int64}}
    1: Array{Int64}((1,)) [1]
  offset1: Int64 0
  stride1: Int64 0
```

That function just calls `Base.ensure_indexable`, which collects the `LogicalIndex` into a CPU array. We can keep it on the GPU instead:

```julia
Base.ensure_indexable(I::Tuple{Base.LogicalIndex{<:Any,<:CuArray}, Vararg{Any}}) =
    (I[1], Base.ensure_indexable(Base.tail(I))...)
```

Things then crash when launching the kernel, because creating a LogicalIndex{CuDeviceArray} performs an operation on the device array, which is not a valid thing to do. We can hack around this using:

```julia
# extends Adapt.adapt_structure (requires `using Adapt`)
@eval function adapt_structure(to, A::Base.LogicalIndex{T}) where {T}
    # LogicalIndex's constructor performs a costly (and sometimes impossible) `count`,
    # so recreate the struct using low-level Expr(:new).
    mask = adapt(to, A.mask)
    $(Expr(:new, :(Base.LogicalIndex{T, typeof(mask)}), :mask, :(A.sum)))
end
```

Turns out SubArray does the same (how have we not run into this?), so:

```julia
@eval function adapt_structure(to, A::SubArray{T,N,<:Any,<:Any,L}) where {T,N,L}
    parent = adapt(to, A.parent)
    indices = adapt(to, A.indices)
    $(Expr(:new, :(SubArray{T, N, typeof(parent), typeof(indices), L}),
           :parent, :indices, :(A.offset1), :(A.stride1)))
end
```

This gets us to a kernel, but it fails to compile:

And that's basically because:

So our broadcast implementation can't handle a `LogicalIndex`. Instead, we can lower the mask to integer indices up front:

```julia
Base.ensure_indexable(I::Tuple{Base.LogicalIndex{<:Any,<:CuArray}, Vararg{Any}}) =
    (findall(I[1].mask), Base.ensure_indexable(Base.tail(I))...)
```

Et voila:

```julia
julia> a = CuArray([1,2,3])
3-element CuArray{Int64, 1}:
 1
 2
 3

julia> i = CuArray([false,true,false])
3-element CuArray{Bool, 1}:
 0
 1
 0

julia> a[i] .= 7
1-element view(::CuArray{Int64, 1}, [2]) with eltype Int64:
 7

julia> a
3-element CuArray{Int64, 1}:
 1
 7
 3
```

TL;DR: we need
@maleadt I think you might not understand that we're talking about logical indexing here with non-contiguous views. My attempt to make a new
The good news is that I now have a
I think this should be useful to others, but there is more work that needs to be done to make it work for more than one dimension. Let me know if you concur and I'll work up a PR.
Sure I do, and no: rewriting
i.e. there's a CPU array in there.
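To make that concrete, a small check (a sketch; this is what the earlier `dump` output showed on versions without the fixes above, and it may require scalar indexing to be allowed):

```julia
using CUDA

a = CuArray([1, 2, 3])
i = CuArray([false, true, false])
v = Base.dotview(a, i)    # the view that `a[i] .= ...` writes into
typeof(v.indices[1])      # Vector{Int64}: a CPU array inside the GPU view
```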
@maleadt Thanks for the detailed investigation! My last post was made nearly simultaneously, so I had not read yours yet. I'm concerned about the performance degradation from your suggestion to use a
True, it might make for a good optimization. It should depend on the other fixes I mentioned though, because now you're doing a memory copy to the CPU when doing
Turns out my bespoke
Thanks again so much for diving into this. I hope to see your solution above merged into the codebase at some point for the cases where my workarounds are not applicable.
I just noticed that @maleadt's solution above does not work for 2-D arrays :(
Oops, my last comment was not about assignment, but just about N-D logical indexing, which already has an issue filed.
This seems not fixed for me; I'm using CUDA.jl 2.6.2 with Julia 1.6 stable. According to the changelog this PR should be merged already.

```julia
a = CuArray([1,2,3,4,5])
```

This still won't compile, while the CPU version works:

```julia
a = [1,2,3,4,5]
```

The error message is this:
Is your feature request related to a problem? Please describe.
I am trying to assign to a CuArray in the following manner. It works on the CPU but fails to compile on the GPU.
Describe the solution you'd like

```julia
a[cond] .= b
# a    -> CuArray of size x
# cond -> Bool CuArray of size x
# b    -> CuArray of size size(a[cond])
```
Describe alternatives you've considered
Transferring to the CPU is not a viable alternative, as the transfer overhead is massive due to the size of `a` being ~100000.
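For reference, the CPU round-trip alternative being rejected here would look something like this sketch, which pays two full transfers:

```julia
a_h, cond_h, b_h = Array(a), Array(cond), Array(b)   # device -> host copies
a_h[cond_h] .= b_h                                   # assignment on the CPU
copyto!(a, a_h)                                      # host -> device copy
```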