
Invalid IR for reductions along trivial dimension #542

Closed · PhilipVinc opened this issue Dec 16, 2019 · 12 comments · Fixed by #575

@PhilipVinc

Describe the bug
Performing reduction operations along some dimensions, where the destination array has fewer dimensions than the source, produces invalid IR. The same operations work with Base arrays.

Example:

julia> using CuArrays
julia> src = CuArrays.rand(10,10)
julia> dst_1 = similar(src, 10)
julia> dst_2 = similar(src, 10, 1)

julia> sum!(dst_2, src) # works
10×1 CuArray{Float32,2,Nothing}:
...

julia> sum!(dst_1, src) # fails

julia> sum!(collect(dst_2), collect(src)) # works
julia> sum!(collect(dst_1), collect(src)) # works

Both calls work when the arrays are plain Base arrays (the collect versions above).

To Reproduce
The Minimal Working Example (MWE) for this bug:

using CuArrays
src = CuArrays.rand(10,10)
dst_1 = similar(src, 10)

# I guess the error stems from Base.mapreduce, so...
Base.mapreducedim!(identity, +, dst_1, src)

Environment details (please complete this section)
Details on Julia:

julia> versioninfo()
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_DEPOT_PATH = :/opt/julia/global_depot/
  JULIA_LOAD_PATH = :/opt/julia/global_depot/environments/globalenv/


Julia packages:
 - CuArrays.jl: 1.5.0
 - CUDAnative.jl: 2.6.0
PhilipVinc added the bug label on Dec 16, 2019
@PhilipVinc
Author

If you can point me in the right direction, I would not mind fixing this.

@vchuravy
Member

AFAIK the mapreducedim implementation lives in https://github.com/JuliaGPU/GPUArrays.jl/blob/master/src/mapreduce.jl

You can also try to follow the dispatch chain with @which, but it should lead you there.
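
For reference, one way to walk that dispatch chain from the REPL (just a sketch; which methods you land in depends on the installed CuArrays/GPUArrays versions):

using CuArrays

src = CuArrays.rand(10, 10)
dst = similar(src, 10)

# Each @which prints the next method in the chain; following it should
# eventually end up in the mapreduce.jl linked above.
@which sum!(dst, src)
@which Base.mapreducedim!(identity, +, dst, src)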

@PhilipVinc
Author

I just realised I had forgotten to include the error, so here it is.

julia> # I guess the errors stems from Base.mapreduce, so...
       Base.mapreducedim!(identity, +, dst_1, src)
ERROR: InvalidIRError: compiling mapreducedim_kernel_serial(typeof(identity), typeof(+), CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Tuple{Nothing,Nothing}) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f_getfield)
Stacktrace:
 [1] getindex at tuple.jl:24
 [2] map at tuple.jl:162 (repeats 2 times)
 [3] mapreducedim_kernel_serial at /home/vicentinif/.julia/packages/CuArrays/ZYCpV/src/mapreduce.jl:5
Stacktrace:
 [1] check_ir(::CUDAnative.CompilerJob, ::LLVM.Module) at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/compiler/validation.jl:114
 [2] macro expansion at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/compiler/driver.jl:188 [inlined]
 [3] macro expansion at /opt/julia/global_depot/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:228 [inlined]
 [4] #codegen#156(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.codegen), ::Symbol, ::CUDAnative.CompilerJob) at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/compiler/driver.jl:186
 [5] #codegen at ./none:0 [inlined]
 [6] #compile#155(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.compile), ::Symbol, ::CUDAnative.CompilerJob) at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/compiler/driver.jl:47
 [7] #compile at ./none:0 [inlined]
 [8] #compile#154 at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/compiler/driver.jl:28 [inlined]
 [9] #compile at ./none:0 [inlined] (repeats 2 times)
 [10] macro expansion at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/execution.jl:403 [inlined]
 [11] #cufunction#202(::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction), ::typeof(CuArrays.mapreducedim_kernel_serial), ::Type{Tuple{typeof(identity),typeof(+),CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Nothing,Nothing}}}) at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/execution.jl:368
 [12] cufunction(::Function, ::Type) at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/execution.jl:368
 [13] _mapreducedim!(::Function, ::Function, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,2,Nothing}) at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/execution.jl:176
 [14] mapreducedim!(::Function, ::Function, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,2,Nothing}) at ./reducedim.jl:274
 [15] top-level scope at REPL[4]:2

@PhilipVinc
Author

So according to the above, the culprit is mapreducedim_kernel_serial from CuArrays/src/mapreduce.jl

function mapreducedim_kernel_serial(f, op, R, A, range)
    I = @cuindex R
    newrange = map((r, i) -> r === nothing ? i : r, range, I)
    for I′ in CartesianIndices(newrange)
        @inbounds R[I...] = op(R[I...], f(A[I′]))
    end
    return
end

and in particular this line

newrange = map((r, i) -> r === nothing ? i : r, range, I)

@PhilipVinc
Author

OK, wild guess here, since I have no idea how to properly debug device codegen...

@cuindex A essentially calls CuArrays.ind2sub_

So...

julia> I = CuArrays.ind2sub_(rand(10,10), 1)
(1, 1)

julia> I = CuArrays.ind2sub_(rand(10), 1)
(1,)

# and therefore...
julia> newrange = map((r, i) -> r === nothing ? i : r, (nothing, nothing), CuArrays.ind2sub_(rand(10,10), 1))
(1, 1)

julia> newrange = map((r, i) -> r === nothing ? i : r, (nothing, nothing), CuArrays.ind2sub_(rand(10), 1))
ERROR: BoundsError: attempt to access ()
  at index [1]
Stacktrace:
 [1] getindex(::Tuple, ::Int64) at ./tuple.jl:24
 [2] map at ./tuple.jl:162 [inlined] (repeats 2 times)
 [3] top-level scope at REPL[24]:1

In case you wonder where the (nothing, nothing) tuple comes from: it is the range argument in the signature of mapreducedim_kernel_serial shown in the error in my previous message. On the device, that out-of-bounds tuple getindex (the getindex at tuple.jl:24 in the stack trace) becomes a dynamic call into the Julia runtime, which is the unsupported jl_f_getfield call that CUDAnative rejects.

@maleadt Do you have any idea how to proceed?

@maleadt
Member

maleadt commented Dec 23, 2019

@maleadt Do you have any idea on how to proceed?

I didn't write this implementation, so I'm not sure how it works. But recent profiling has shown that even the parallel implementation is pretty damn slow, so we should just reimplement mapreduce from scratch. In the meantime, I suggest disabling the serial version and always using the parallel one, which seems to work here. Could you verify that this works for your use case, and open a PR?

@PhilipVinc
Author

Indeed, commenting out

        #if x_thr >= 8
            blk, thr = (Rlength - 1) ÷ y_thr + 1, (x_thr, y_thr, 1)
            parallel_kernel(parallel_kargs...; threads=thr, blocks=blk)
        #else
        #    # not enough work, fall back to serial reduction
        #    range = ifelse.(length.(axes(R)) .== 1, axes(A), nothing)
        #    blk, thr = cudims(R)
        #    @cuda(blocks=blk, threads=thr, mapreducedim_kernel_serial(f, op, R, A, range))
        #end
    end

in Base._mapreducedim! makes the error go away.
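
With that change applied locally (a sketch, assuming a patched checkout of CuArrays on the load path), the original MWE goes through and should match the Base result:

using CuArrays

src = CuArrays.rand(10, 10)
dst_1 = similar(src, 10)

sum!(dst_1, src)  # now hits the parallel kernel instead of the serial one

# sanity check against Base on the host
collect(dst_1) ≈ vec(sum(collect(src); dims = 2))  # should be true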

@PhilipVinc
Author

I can open a PR, but are you sure this won't lead to slowdowns elsewhere?

@maleadt
Member

maleadt commented Jan 7, 2020

When performance matters, the user will be using large arrays and not using the serial fallback, so I think this is a safe thing to do.
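
If you want to double-check that on a concrete case, something along these lines should do (a rough sketch; BenchmarkTools and the explicit synchronization are my additions, and the numbers will depend on the GPU):

using CuArrays, CUDAdrv, BenchmarkTools

for n in (8, 64, 1024)
    src = CuArrays.rand(n, n)
    dst = similar(src, n, 1)
    # synchronize() waits for the asynchronous kernel to finish before the timer stops
    t = @belapsed (sum!($dst, $src); CUDAdrv.synchronize())
    println("n = $n: ", round(t * 1e6; digits = 1), " μs")
end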

@PhilipVinc
Author

This is still not solved.

julia> a=CuArrays.rand(5,10)
5×10 CuArray{Float32,2,Nothing}:
 0.0421446  0.267358   0.630052   0.990972    0.152812   0.687188  0.15777
 0.73055    0.0208062  0.529826   0.879257     0.0119244  0.302241  0.567767
 0.939997   0.833147   0.812721   0.999592     0.172571   0.807677  0.950438
 0.843176   0.721778   0.0592983  0.206773     0.478356   0.407127  0.449898
 0.61159    0.608743   0.479727   0.43722      0.805054   0.751322  0.982895

julia> b=CuArrays.rand(5)
5-element CuArray{Float32,1,Nothing}:
 0.9893592
 0.069014914
 0.7996987
 0.6109471
 0.31840062

julia> mean!(b, a)
ERROR: InvalidIRError: compiling mapreducedim_kernel_serial(typeof(identity), typeof(Base.add_sum), CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Tuple{Nothing,Nothing}) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f_getfield)
Stacktrace:
 [1] getindex at tuple.jl:24
 [2] map at tuple.jl:162 (repeats 2 times)
 [3] mapreducedim_kernel_serial at /home/vicentinif/.julia/packages/CuArrays/A6GUx/src/mapreduce.jl:7
Stacktrace:
 [1] check_ir(::CUDAnative.CompilerJob, ::LLVM.Module) at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/compiler/validation.jl:116
 [2] macro expansion at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/compiler/driver.jl:193 [inlined]
 [3] macro expansion at /home/vicentinif/.julia/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:228 [inlined]
 [4] #codegen#156(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.codegen), ::Symbol, ::CUDAnative.CompilerJob) at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/compiler/driver.jl:191
 [5] #codegen at ./none:0 [inlined]
 [6] #compile#155(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.compile), ::Symbol, ::CUDAnative.CompilerJob) at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/compiler/driver.jl:52
 [7] #compile at ./none:0 [inlined]
 [8] #compile#154 at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/compiler/driver.jl:33 [inlined]
 [9] #compile at ./none:0 [inlined] (repeats 2 times)
 [10] macro expansion at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/execution.jl:393 [inlined]
 [11] #cufunction#200(::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction), ::typeof(CuArrays.mapreducedim_kernel_serial), ::Type{Tuple{typeof(identity),typeof(Base.add_sum),CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Nothing,Nothing}}}) at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/execution.jl:360
 [12] cufunction(::Function, ::Type) at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/execution.jl:360
 [13] _mapreducedim!(::Function, ::Function, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,2,Nothing}) at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/execution.jl:179
 [14] mapreducedim! at ./reducedim.jl:274 [inlined]
 [15] #sum!#599 at ./reducedim.jl:674 [inlined]
 [16] #sum! at ./none:0 [inlined]
 [17] #sum!#600 at ./reducedim.jl:676 [inlined]
 [18] #sum! at ./none:0 [inlined]
 [19] mean!(::CuArray{Float32,1,Nothing}, ::CuArray{Float32,2,Nothing}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/Statistics/src/Statistics.jl:126
 [20] top-level scope at REPL[26]:1

with

(neural_dev) pkg> st
    Status `~/neural_dev/Project.toml`
  [c5f51814] CUDAdrv v6.0.0
  [3a865a2d] CuArrays v1.7.3
  [587475ba] Flux v0.10.3
  [872c559c] NNlib v0.6.6
  [eb923273] NeuralQuantum v0.2.0 #master (https://github.com/PhilipVinc/NeuralQuantum.jl/)
  [e88e6eb3] Zygote v0.4.9

@maleadt
Member

maleadt commented Mar 9, 2020

It has been fixed; the fix is just not part of a release yet.
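
(For anyone hitting this before the next release: one way to pick up the fix is to track the master branch, dependency compatibility permitting. Hypothetical Pkg command, not tested here:)

# from the Pkg REPL (press ] at the julia> prompt)
pkg> add CuArrays#master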

@PhilipVinc
Author

Ah, sorry!
