As discussed on the forum, I'd like to use a vector of ranges as indices to apply an operation over segments of another array, but it seems this is not currently supported.
Something like this:
using CUDA
CUDA.allowscalar(false)
# example: sum only a part of an array
rangesum(x, r::UnitRange) = sum(x[r])
# broadcast over the ranges (note <:UnitRange: Vector{UnitRange{Int64}} is not an AbstractVector{UnitRange})
rangesum(x, rr::AbstractVector{<:UnitRange}) = map(r -> rangesum(x, r), rr)
x = collect(1:100) |> cu
r = [1:10, 33:37, 50:80]
# this works fine if r is on cpu
rangesum(x, r) # results in a Vector{Int}
# but it fails if r is on the GPU
rangesum(x, cu(r))
The last line throws:
ERROR: InvalidIRError: compiling kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceVector{Int64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, var"#1#2"{CuDeviceVector{Int64, 1}}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{UnitRange{Int64}, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to print_to_string(xs...) in Base at strings/io.jl:124)
Stacktrace:
[1] string
@ ./strings/io.jl:174
[2] throw_checksize_error
@ ./multidimensional.jl:881
[3] _unsafe_getindex
@ ./multidimensional.jl:845
[4] _getindex
@ ./multidimensional.jl:832
[5] getindex
@ ./abstractarray.jl:1170
[6] rangesum
@ ./REPL[3]:2
[7] #1
@ ./REPL[4]:2
[8] _broadcast_getindex_evalf
@ ./broadcast.jl:648
[9] _broadcast_getindex
@ ./broadcast.jl:621
[10] getindex
@ ./broadcast.jl:575
[11] broadcast_kernel
@ ~/.julia/packages/GPUArrays/3sW6s/src/host/broadcast.jl:59
Reason: unsupported call through a literal pointer (call to )
Stacktrace:
[1] Array
@ ./boot.jl:448
[2] Array
@ ./boot.jl:457
[3] similar
@ ./abstractarray.jl:750
[4] similar
@ ./abstractarray.jl:740
[5] _unsafe_getindex
@ ./multidimensional.jl:844
[6] _getindex
@ ./multidimensional.jl:832
[7] getindex
@ ./abstractarray.jl:1170
[8] rangesum
@ ./REPL[3]:2
[9] #1
@ ./REPL[4]:2
[10] _broadcast_getindex_evalf
@ ./broadcast.jl:648
[11] _broadcast_getindex
@ ./broadcast.jl:621
[12] getindex
@ ./broadcast.jl:575
[13] broadcast_kernel
@ ~/.julia/packages/GPUArrays/3sW6s/src/host/broadcast.jl:59
Stacktrace:
[1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceVector{Int64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, var"#1#2"{CuDeviceVector{Int64, 1}}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{UnitRange{Int64}, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, args::LLVM.Module)
@ GPUCompiler ~/.julia/packages/GPUCompiler/9rK1I/src/validation.jl:111
[2] macro expansion
@ ~/.julia/packages/GPUCompiler/9rK1I/src/driver.jl:333 [inlined]
[3] macro expansion
@ ~/.julia/packages/TimerOutputs/SSeq1/src/TimerOutput.jl:252 [inlined]
[4] macro expansion
@ ~/.julia/packages/GPUCompiler/9rK1I/src/driver.jl:331 [inlined]
[5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler ~/.julia/packages/GPUCompiler/9rK1I/src/utils.jl:62
[6] cufunction_compile(job::GPUCompiler.CompilerJob)
@ CUDA ~/.julia/packages/CUDA/Xt3hr/src/compiler/execution.jl:326
[7] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/9rK1I/src/cache.jl:89
[8] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Int64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, var"#1#2"{CuDeviceVector{Int64, 1}}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{UnitRange{Int64}, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA ~/.julia/packages/CUDA/Xt3hr/src/compiler/execution.jl:297
[9] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Int64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, var"#1#2"{CuDeviceVector{Int64, 1}}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{UnitRange{Int64}, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}})
@ CUDA ~/.julia/packages/CUDA/Xt3hr/src/compiler/execution.jl:291
[10] macro expansion
@ ~/.julia/packages/CUDA/Xt3hr/src/compiler/execution.jl:102 [inlined]
[11] #launch_heuristic#234
@ ~/.julia/packages/CUDA/Xt3hr/src/gpuarrays.jl:17 [inlined]
[12] copyto!
@ ~/.julia/packages/GPUArrays/3sW6s/src/host/broadcast.jl:65 [inlined]
[13] copyto!
@ ./broadcast.jl:936 [inlined]
[14] copy
@ ~/.julia/packages/GPUArrays/3sW6s/src/host/broadcast.jl:47 [inlined]
[15] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Nothing, var"#1#2"{CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}}, Tuple{CuArray{UnitRange{Int64}, 1, CUDA.Mem.DeviceBuffer}}})
@ Base.Broadcast ./broadcast.jl:883
[16] map(::Function, ::CuArray{UnitRange{Int64}, 1, CUDA.Mem.DeviceBuffer})
@ GPUArrays ~/.julia/packages/GPUArrays/3sW6s/src/host/broadcast.jl:90
[17] rangesum(x::CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, rr::CuArray{UnitRange{Int64}, 1, CUDA.Mem.DeviceBuffer})
@ Main ./REPL[4]:2
[18] top-level scope
@ REPL[8]:2
[19] top-level scope
@ ~/.julia/packages/CUDA/Xt3hr/src/initialization.jl:52
I would like to perform this entire operation on the GPU, as part of a larger computation.
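In the meantime, here is a workaround sketch that I think should work (just my own approach, not an existing CUDA.jl API for this): since the failure comes from getindex(x, r) trying to allocate a CPU Array inside the kernel, the range sums can instead be expressed as differences of a prefix sum, which keeps everything on the GPU:
using CUDA
CUDA.allowscalar(false)
x = collect(1:100) |> cu
r = [1:10, 33:37, 50:80]
# prefix sums, computed once on the GPU
cx = cumsum(x)
# move the range endpoints (plain integers) to the GPU instead of the ranges themselves
firsts = cu(first.(r))
lasts = cu(last.(r))
# sum(x[a:b]) == cx[b] - cx[a-1]; the captured cx is adapted to a device array inside the kernel
rangesums = map((a, b) -> cx[b] - (a > 1 ? cx[a-1] : zero(eltype(cx))), firsts, lasts)
This only covers sum, though (and for floating-point element types the subtraction changes rounding), so a general way to broadcast over a CuArray of ranges would still be very useful.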