`gather` on empty CUDA array by YichengDWu · Pull Request #413 · FluxML/NNlib.jl

YichengDWu · 2022-05-31T13:39:39Z

Close #411

mcabbott · 2022-05-31T14:00:34Z

It's possible that a shortcut like that should be added here, for the CPU case. But #411 is specifically about the GPU case, and I think it will need to be fixed in NNlibCUDA.jl.

One thing that deserves a little thought is that this should probably remain an error (and ideally be tested):

julia> NNlib.gather(rand(0, 32), [2, 17, 33])
ERROR: BoundsError: attempt to access 0×32 Matrix{Float64} at index [1:0, 33]

And while here, it would be good if empty cases like this could also be checked:

julia> NNlib.gather(rand(7, 32), Int[])
7×0 Matrix{Float64}
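As a rough illustration of the two cases above, here is a minimal CPU-only sketch. `gather_cols` is a hypothetical helper, not NNlib's `gather`. It only covers a matrix source with an integer vector index, gathering whole columns. It checks every index with `checkbounds` before allocating, so an empty source with an out-of-range index should still throw, while an empty index simply yields an empty result.

```julia
# Hypothetical helper for illustration only; not NNlib's API.
# Gathers columns: dst[:, k] == src[:, idx[k]].
function gather_cols(src::AbstractMatrix, idx::AbstractVector{<:Integer})
    # Validate first: even when size(src, 1) == 0, an index beyond
    # size(src, 2) still raises a BoundsError (mirroring the 0×32 example).
    for i in idx
        checkbounds(src, :, i)
    end
    # An empty idx gives a size(src, 1)×0 result without entering the copy loop.
    dst = similar(src, size(src, 1), length(idx))
    for (k, i) in enumerate(idx)
        dst[:, k] = view(src, :, i)
    end
    return dst
end
```

With this sketch, `gather_cols(rand(7, 32), Int[])` returns a 7×0 matrix, and `gather_cols(rand(0, 32), [2, 17, 33])` throws a `BoundsError` because 33 exceeds the 32 columns. The bounds check is done up front rather than inside the copy loop, so the zero-row case cannot skip it.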

YichengDWu · 2022-05-31T14:18:05Z

By the way, I found this shortcut useful when I have many expressions like `NN(vcat(u, x))`, where `x` can sometimes be empty.

YichengDWu · 2022-05-31T14:38:52Z

> I think will need to be fixed in NNlibCUDA.jl.

Exactly, I misread your comment. Sorry for the misguided attempt; let me try again.

YichengDWu · 2022-05-31T16:31:35Z

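Here is another strange behavior: when the index lives on the GPU, an out-of-bounds index is silently accepted instead of raising an error.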

julia> src = CuArray(rand(2,3))
2×3 CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}:
 0.18921   0.00229533  0.440581
 0.140795  0.638388    0.751325

julia> NNlib.gather(src,cu[1,4])
2×2 CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}:
 0.18921   0.0
 0.140795  0.0

julia> NNlib.gather(src,[1,4])
ERROR: BoundsError: attempt to access 2×3 CuArray{Float64, 2, CUDA.Mem.DeviceBuffer} at index [1:2, 4]
Stacktrace:
 [1] throw_boundserror(A::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, I::Tuple{Base.Slice{Base.OneTo{Int64}}, Int64})
   @ Base .\abstractarray.jl:691
 [2] checkbounds
   @ .\abstractarray.jl:656 [inlined]
 [3] view
   @ C:\Users\Luffy\.julia\packages\CUDA\qAl31\src\array.jl:617 [inlined]
 [4] _view(X::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, colons::Tuple{Colon}, k::Int64)
   @ NNlib C:\Users\Luffy\.julia\packages\NNlib\hydo3\src\scatter.jl:38
 [5] gather!(dst::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, src::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, idx::Vector{Int64})
   @ NNlib C:\Users\Luffy\.julia\packages\NNlib\hydo3\src\gather.jl:27
 [6] gather(src::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, idx::Vector{Int64})
   @ NNlib C:\Users\Luffy\.julia\packages\NNlib\hydo3\src\gather.jl:77
 [7] top-level scope
   @ REPL[155]:1
 [8] top-level scope
   @ C:\Users\Luffy\.julia\packages\CUDA\qAl31\src\initialization.jl:52

julia> @which NNlib.gather(src,[1,4])
gather(src::AbstractArray{Tsrc, Nsrc}, idx::AbstractArray{Tidx, Nidx}) where {Tsrc, Nsrc, Nidx, Tidx} in NNlib at C:\Users\Luffy\.julia\packages\NNlib\hydo3\src\gather.jl:70

julia> @which NNlib.gather(src,cu[1,4])
gather(src::AbstractArray{Tsrc, Nsrc}, idx::AbstractArray{Tidx, Nidx}) where {Tsrc, Nsrc, Nidx, Tidx} in NNlib at C:\Users\Luffy\.julia\packages\NNlib\hydo3\src\gather.jl:70

@mcabbott
Why does the bounds check succeed when the index is on the CPU but not when it is on the GPU? I know this could be fixed by adding a `checkbounds` call when `src` and `idx` both live on the GPU, but I don't understand why only one of them works here, given that `@which` shows both calls dispatch to the same `gather` method.
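A plausible reading of the `@which` output and the stacktrace (an assumption, not verified against the NNlibCUDA.jl source) is that both calls enter the same generic `gather`, but `gather!` then dispatches differently: with a CPU `Vector` index the generic loop calls `view`, which runs `checkbounds`, while with a `CuArray` index a GPU kernel reads the out-of-range column without checking. One way to make the two paths agree is a check built only from whole-array reductions over `idx`, so it avoids scalar indexing and could in principle run on either device before a kernel launch. `check_gather_bounds` below is hypothetical and only handles integer indices into the last dimension of `src`.

```julia
# Hypothetical helper for illustration only; not part of NNlib or NNlibCUDA.
# Uses whole-array reductions (minimum/maximum) instead of a scalar loop,
# so the same code could be applied to a GPU index array before a kernel launch.
function check_gather_bounds(src::AbstractArray, idx::AbstractArray{<:Integer})
    isempty(idx) && return nothing      # an empty index has nothing to check
    lo, hi = minimum(idx), maximum(idx) # two reductions over idx
    n = size(src, ndims(src))           # integer indices select along the last dimension
    if lo < 1 || hi > n
        throw(ArgumentError("gather index range $lo:$hi is outside 1:$n"))
    end
    return nothing
end
```

For a 2×3 `src` like the one above, an index containing 4 would be rejected before any gathering happens. The trade-off is two extra reductions per call, each returning a scalar to the host, which may matter for small, latency-sensitive gathers.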

YichengDWu · 2022-05-31T17:06:28Z

Do we even need to move the index to the GPU in any case? Everything seems to work fine if we keep it on the CPU.

Well, one benefit of keeping it on the GPU is speed:

julia> @btime NNlib.gather($a,$idx)
  400.234 ms (450005 allocations: 28.99 MiB)
10×50000 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 0.691634   0.873781   0.0642042  0.229622  0.0919693  0.88087     …  0.331066  0.401409    0.486825   0.0794942  0.0328875  0.538867
 0.382004   0.269381   0.652076   0.814351  0.0334432  0.949356       0.981492  0.486789    0.538543   0.0939153  0.0317709  0.738783
 0.64629    0.794243   0.704002   0.662857  0.938788   0.917456       0.11773   0.0184704   0.6812     0.699423   0.94094    0.298974
 0.693101   0.311339   0.281524   0.146332  0.459633   0.00642134     0.918656  0.225414    0.0749762  0.92406    0.272871   0.5222
 0.716304   0.708746   0.911626   0.912521  0.773224   0.549931       0.574009  0.00617868  0.715682   0.441322   0.0636071  0.310628
 0.550515   0.0741554  0.588083   0.106769  0.785537   0.391265    …  0.900756  0.357758    0.724709   0.727474   0.281789   0.758916
 0.0764654  0.534348   0.079201   0.758459  0.424882   0.173804       0.136813  0.982668    0.0272927  0.803712   0.524672   0.287456
 0.133258   0.908437   0.23821    0.859476  0.0171796  0.580579       0.637107  0.69076     0.0927079  0.7699     0.433433   0.787452
 0.847404   0.561789   0.846151   0.928472  0.327801   0.75679        0.107646  0.128363    0.685991   0.99785    0.818783   0.8978
 0.642568   0.383775   0.24313    0.823817  0.80557    0.814838       0.488513  0.706078    0.158599   0.904244   0.670385   0.977725

julia> @btime NNlib.gather($a,$idx_gpu)
  4.917 μs (10 allocations: 512 bytes)
10×50000 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 0.691634   0.873781   0.0642042  0.229622  0.0919693  0.88087     …  0.331066  0.401409    0.486825   0.0794942  0.0328875  0.538867
 0.382004   0.269381   0.652076   0.814351  0.0334432  0.949356       0.981492  0.486789    0.538543   0.0939153  0.0317709  0.738783
 0.64629    0.794243   0.704002   0.662857  0.938788   0.917456       0.11773   0.0184704   0.6812     0.699423   0.94094    0.298974
 0.693101   0.311339   0.281524   0.146332  0.459633   0.00642134     0.918656  0.225414    0.0749762  0.92406    0.272871   0.5222
 0.716304   0.708746   0.911626   0.912521  0.773224   0.549931       0.574009  0.00617868  0.715682   0.441322   0.0636071  0.310628
 0.550515   0.0741554  0.588083   0.106769  0.785537   0.391265    …  0.900756  0.357758    0.724709   0.727474   0.281789   0.758916
 0.0764654  0.534348   0.079201   0.758459  0.424882   0.173804       0.136813  0.982668    0.0272927  0.803712   0.524672   0.287456
 0.133258   0.908437   0.23821    0.859476  0.0171796  0.580579       0.637107  0.69076     0.0927079  0.7699     0.433433   0.787452
 0.847404   0.561789   0.846151   0.928472  0.327801   0.75679        0.107646  0.128363    0.685991   0.99785    0.818783   0.8978
 0.642568   0.383775   0.24313    0.823817  0.80557    0.814838       0.488513  0.706078    0.158599   0.904244   0.670385   0.977725

Commit 9df9c5c: gather on empty CUDA array

YichengDWu changed the title from "gather on empty CUDA array" to "`gather` on empty CUDA array" on May 31, 2022

CarloLucibello closed this May 31, 2022

CarloLucibello reopened this May 31, 2022

YichengDWu closed this May 31, 2022

lenianiva mentioned this pull request on Jan 12, 2026 in #668, "fix: gather and scatter on empty arrays" (merged).

YichengDWu wants to merge 1 commit into FluxML:master from YichengDWu:master

3 participants
