Conversation
|
It's possible that a shortcut like that should be added here, for the CPU case. But #411 is specifically about the GPU case, and I think it will need to be fixed in NNlibCUDA.jl.
One thing that deserves a little thought is that this should probably remain an error (and ideally be tested):
julia> NNlib.gather(rand(0, 32), [2, 17, 33])
ERROR: BoundsError: attempt to access 0×32 Matrix{Float64} at index [1:0, 33]
And while here, it would be good if empty cases like this could also be checked:
julia> NNlib.gather(rand(7, 32), Int[])
7×0 Matrix{Float64} |
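A rough sketch of how the two cases above could be pinned down in a test. `gather_ref` is a hypothetical stand-in, not NNlib's implementation: it assumes a matrix source and a vector of integer column indices, for which gathering should agree with plain bounds-checked indexing. In a real test the calls would go through `NNlib.gather` instead.

```julia
using Test

# Hypothetical reference (an assumption, not NNlib code): for a matrix `src`
# and integer column indices `idx`, gathering is plain bounds-checked indexing.
gather_ref(src::AbstractMatrix, idx::AbstractVector{<:Integer}) = src[:, idx]

@testset "gather edge cases (reference semantics)" begin
    # An out-of-bounds index must still throw, even if `src` has zero rows.
    @test_throws BoundsError gather_ref(rand(0, 32), [2, 17, 33])

    # An empty index vector gives a result with zero columns.
    @test size(gather_ref(rand(7, 32), Int[])) == (7, 0)

    # Ordinary case: columns 3 and 1 of a 2×3 matrix.
    src = reshape(collect(1.0:6.0), 2, 3)
    @test gather_ref(src, [3, 1]) == [5.0 1.0; 6.0 2.0]
end
```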
|
BTW, I found it useful when I have a bunch of things like |
Exactly, I misread your comment. Sorry for my stupid attempt. Let me try again. |
|
Here is another strange behavior:
julia> src = CuArray(rand(2,3))
2×3 CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}:
0.18921 0.00229533 0.440581
0.140795 0.638388 0.751325
julia> NNlib.gather(src,cu[1,4])
2×2 CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}:
0.18921 0.0
0.140795 0.0
julia> NNlib.gather(src,[1,4])
ERROR: BoundsError: attempt to access 2×3 CuArray{Float64, 2, CUDA.Mem.DeviceBuffer} at index [1:2, 4]
Stacktrace:
[1] throw_boundserror(A::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, I::Tuple{Base.Slice{Base.OneTo{Int64}}, Int64})
@ Base .\abstractarray.jl:691
[2] checkbounds
@ .\abstractarray.jl:656 [inlined]
[3] view
@ C:\Users\Luffy\.julia\packages\CUDA\qAl31\src\array.jl:617 [inlined]
[4] _view(X::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, colons::Tuple{Colon}, k::Int64)
@ NNlib C:\Users\Luffy\.julia\packages\NNlib\hydo3\src\scatter.jl:38
[5] gather!(dst::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, src::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, idx::Vector{Int64})
@ NNlib C:\Users\Luffy\.julia\packages\NNlib\hydo3\src\gather.jl:27
[6] gather(src::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, idx::Vector{Int64})
@ NNlib C:\Users\Luffy\.julia\packages\NNlib\hydo3\src\gather.jl:77
[7] top-level scope
@ REPL[155]:1
[8] top-level scope
@ C:\Users\Luffy\.julia\packages\CUDA\qAl31\src\initialization.jl:52
julia> @which NNlib.gather(src,[1,4])
gather(src::AbstractArray{Tsrc, Nsrc}, idx::AbstractArray{Tidx, Nidx}) where {Tsrc, Nsrc, Nidx, Tidx} in NNlib at C:\Users\Luffy\.julia\packages\NNlib\hydo3\src\gather.jl:70
julia> @which NNlib.gather(src,cu[1,4])
gather(src::AbstractArray{Tsrc, Nsrc}, idx::AbstractArray{Tidx, Nidx}) where {Tsrc, Nsrc, Nidx, Tidx} in NNlib at C:\Users\Luffy\.julia\packages\NNlib\hydo3\src\gather.jl:70
@mcabbott |
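One possible way to make both paths agree is to validate the indices up front, before any kernel is launched. The sketch below is only an illustration: `check_gather_bounds` is a hypothetical helper (not part of NNlib or NNlibCUDA), it assumes plain integer indices into the last dimension of `src`, and it is shown on CPU arrays. Whether the `extrema` reduction runs efficiently on a `CuArray` index is an assumption that would need checking.

```julia
# Hypothetical helper, not part of NNlib or NNlibCUDA: validate integer
# indices against the last dimension of `src` before gathering.
function check_gather_bounds(src::AbstractArray, idx::AbstractArray{<:Integer})
    isempty(idx) && return nothing        # nothing to check for empty indices
    lo, hi = extrema(idx)                 # one reduction over all indices
    valid = axes(src, ndims(src))         # valid range of the indexed dimension
    if lo < first(valid) || hi > last(valid)
        bad = lo < first(valid) ? lo : hi # report one offending index
        throw(BoundsError(src, bad))
    end
    return nothing
end

src = rand(2, 3)
check_gather_bounds(src, [1, 3])          # in range, returns nothing
try
    check_gather_bounds(src, [1, 4])      # index 4 exceeds the 3 columns
catch err
    @assert err isa BoundsError
end
```

The check costs one extra pass over the indices, which is the trade-off against silently returning zeros for an out-of-range index.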
|
Do we even need to move the index to the GPU in any case? Everything seems to work fine if we keep it on the CPU. So one benefit is the speed:
julia> @btime NNlib.gather($a,$idx)
400.234 ms (450005 allocations: 28.99 MiB)
10×50000 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
0.691634 0.873781 0.0642042 0.229622 0.0919693 0.88087 … 0.331066 0.401409 0.486825 0.0794942 0.0328875 0.538867
0.382004 0.269381 0.652076 0.814351 0.0334432 0.949356 0.981492 0.486789 0.538543 0.0939153 0.0317709 0.738783
0.64629 0.794243 0.704002 0.662857 0.938788 0.917456 0.11773 0.0184704 0.6812 0.699423 0.94094 0.298974
0.693101 0.311339 0.281524 0.146332 0.459633 0.00642134 0.918656 0.225414 0.0749762 0.92406 0.272871 0.5222
0.716304 0.708746 0.911626 0.912521 0.773224 0.549931 0.574009 0.00617868 0.715682 0.441322 0.0636071 0.310628
0.550515 0.0741554 0.588083 0.106769 0.785537 0.391265 … 0.900756 0.357758 0.724709 0.727474 0.281789 0.758916
0.0764654 0.534348 0.079201 0.758459 0.424882 0.173804 0.136813 0.982668 0.0272927 0.803712 0.524672 0.287456
0.133258 0.908437 0.23821 0.859476 0.0171796 0.580579 0.637107 0.69076 0.0927079 0.7699 0.433433 0.787452
0.847404 0.561789 0.846151 0.928472 0.327801 0.75679 0.107646 0.128363 0.685991 0.99785 0.818783 0.8978
0.642568 0.383775 0.24313 0.823817 0.80557 0.814838 0.488513 0.706078 0.158599 0.904244 0.670385 0.977725
julia> @btime NNlib.gather($a,$idx_gpu)
4.917 μs (10 allocations: 512 bytes)
10×50000 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
0.691634 0.873781 0.0642042 0.229622 0.0919693 0.88087 … 0.331066 0.401409 0.486825 0.0794942 0.0328875 0.538867
0.382004 0.269381 0.652076 0.814351 0.0334432 0.949356 0.981492 0.486789 0.538543 0.0939153 0.0317709 0.738783
0.64629 0.794243 0.704002 0.662857 0.938788 0.917456 0.11773 0.0184704 0.6812 0.699423 0.94094 0.298974
0.693101 0.311339 0.281524 0.146332 0.459633 0.00642134 0.918656 0.225414 0.0749762 0.92406 0.272871 0.5222
0.716304 0.708746 0.911626 0.912521 0.773224 0.549931 0.574009 0.00617868 0.715682 0.441322 0.0636071 0.310628
0.550515 0.0741554 0.588083 0.106769 0.785537 0.391265 … 0.900756 0.357758 0.724709 0.727474 0.281789 0.758916
0.0764654 0.534348 0.079201 0.758459 0.424882 0.173804 0.136813 0.982668 0.0272927 0.803712 0.524672 0.287456
0.133258 0.908437 0.23821 0.859476 0.0171796 0.580579 0.637107 0.69076 0.0927079 0.7699 0.433433 0.787452
0.847404 0.561789 0.846151 0.928472 0.327801 0.75679 0.107646 0.128363 0.685991 0.99785 0.818783 0.8978
0.642568 0.383775 0.24313 0.823817 0.80557 0.814838 0.488513 0.706078 0.158599 0.904244 0.670385 0.977725 |
Close #411