Inlining-related performance regressions #6981

simonster · 2014-05-27T02:46:20Z

I benchmarked a simple loop that does nothing but sum x*conj(x) for values in a large vector, and the slowdown is very nearly a factor of 2.

This is inlined in Julia 0.2. It looks like the regression happened in e805fd6. I know that there were some tweaks to the inlining threshold in #6605 / 24011fc, but I don't see this getting inlined with that commit or master. cc @vtjnash

The text was updated successfully, but these errors were encountered:

vtjnash · 2014-05-27T03:54:43Z

basing this on the last time i checked (while working on tweaking that threshold), the issue is that it takes a lot of instructions (and a lot more julia box/unbox intrinsics) to describe the behavior of a complex operator

as such, i'm looking for suggestions (and evidence) on more advanced heuristics.

current thoughts include:

inlining next/done more aggressively
detecting loops and inlining more aggressively in predicted hotspots
inlining arithmetic functions more aggressively
inline functions taking lots of arguments more aggressively
incorporating information on whether the function uses only immutable types
inline functions returning new boxed values less aggressively

altering this should be relatively simple, if you want to propose some smarter heuristics, since it is handled through inline_worthy & it's call-sites

JeffBezanson · 2014-05-27T14:53:57Z

@simonster it would be good to post the exact code used, and/or add a perf test for this.

simonster · 2014-05-27T15:46:36Z

Here's the code I tested:

function f(a)
    b = 0.0+0.0im
    for x in a
        b += x*conj(x)
    end
    b
end
a = complex(rand(10^8), rand(10^8))
f(a)
@time f(a)

On my MacBook Pro, this takes ~0.18 seconds on Julia 0.2.1 and ~0.35 seconds for current master.

simonster · 2014-06-01T15:18:16Z

Another inlining-related regression case:

function f(a)
    x = 0
    for v in a
        x += v
    end
    x
end
a = randbool(1000000000)
f(a)

On my system this takes ~0.44 seconds in Julia 0.2.1 and ~2.6 seconds on current master. The culprit seems to be inlining of next for BitArrays. This could presumably be solved by the first item in @vtjnash's list above.

timholy · 2014-06-01T17:13:47Z

Since next also wasn't inlined in Julia 0.2.1, what do you think the difference is? Did randbool by any chance return an Array{Uint8,1} rather than BitArray{1}?

simonster · 2014-06-01T17:16:51Z

I believe that next was inlined in Julia 0.2.1:

julia> code_typed(f, (BitArray{1},))
1-element Array{Any,1}:
 :($(Expr(:lambda, {:a}, {{:x,:#s9,:#s8,:v,:_var0,:_var1},{{:a,BitArray{1},0},{:x,Int64,2},{:#s9,Int64,2},{:#s8,(Bool,Int64),18},{:v,Bool,18},{:_var0,Bool,2},{:_var1,Int64,2}},{}}, quote  # none, line 2:
        x = 0 # line 3:
        #s9 = 0
        1: 
        unless top(box)(Bool,top(not_int)(top(sle_int)(top(getfield)(a::BitArray{1},:len)::Int64,#s9::Int64)::Bool))::Bool goto 2
        _var0 = top(box)(Bool,top(not_int)(top(box)(Bool,top(and_int)(top(sle_int)(0,0)::Bool,===(top(box)(Uint64,top(and_int)(arrayref(top(getfield)(a::BitArray{1},:chunks)::Array{Uint64,1},top(box)(Int64,top(add_int)(top(box)(Int64,top(lshr_int)(#s9::Int64,top(box)($(Int32),top(trunc_int)($(Int32),6))::Int32))::Int64,1))::Int64)::Uint64,top(box)(Uint64,top(shl_int)(top(box)($(Uint64),1)::Uint64,top(box)($(Int32),top(trunc_int)($(Int32),top(box)(Int64,top(and_int)(#s9::Int64,63))::Int64))::Int32))::Uint64))::Uint64,top(box)($(Uint64),0)::Uint64)::Bool))::Bool))::Bool
        _var1 = top(box)(Int64,top(add_int)(#s9::Int64,1))::Int64
        v = _var0::Bool
        #s9 = _var1::Int64 # line 4:
        x = top(box)(Int64,top(add_int)(x::Int64,top(box)($(Int64),top(zext_int)($(Int64),v::Bool))::Int64))::Int64
        3: 
        goto 1
        2: 
        0:  # line 6:
        return x::Int64
    end)))

timholy · 2014-06-01T17:28:35Z

I see. Good catch. It seems small enough to be worthy of inlining even without taking into account that it happens to be called next.

JeffBezanson · 2014-06-01T17:48:39Z

Yes; it used to be that any "single expression" function was inlined. While clearly not sufficient, this was actually a surprisingly good heuristic.

timholy · 2014-06-01T18:00:33Z

Presumably an || isa_single_expr(ex) would give us the best of all worlds?

vtjnash · 2014-06-01T18:05:10Z

I had worked on tweaking the performance threshold, but forgot it wasn't on master yet: #6677

no, you don't want a heuristic that only considers SLOC. as part of my changes to inlining, I also significantly improved what is possible to inline into a single line, causing a very significant performance impact if you have it think a single expr is always worth inlining (the example code in the issue would have taken days to compile)

JeffBezanson · 2014-06-10T03:27:32Z

fixed by #7075

JeffBezanson added performance labels May 27, 2014

JeffBezanson added this to the 0.3 milestone May 27, 2014

simonster mentioned this issue Jun 1, 2014

Implement broadcasting for DataArrays/PooledDataArrays JuliaStats/DataArrays.jl#94

Merged

simonster changed the title ~~Complex multiplication not inlined~~ Inlining-related performance regressions Jun 2, 2014

JeffBezanson mentioned this issue Jun 9, 2014

inline inexact method matches, repeated #7075

Merged

JeffBezanson closed this as completed Jun 10, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inlining-related performance regressions #6981

Inlining-related performance regressions #6981

simonster commented May 27, 2014

vtjnash commented May 27, 2014

JeffBezanson commented May 27, 2014

simonster commented May 27, 2014

simonster commented Jun 1, 2014

timholy commented Jun 1, 2014

simonster commented Jun 1, 2014

timholy commented Jun 1, 2014

JeffBezanson commented Jun 1, 2014

timholy commented Jun 1, 2014

vtjnash commented Jun 1, 2014

JeffBezanson commented Jun 10, 2014

Inlining-related performance regressions #6981

Inlining-related performance regressions #6981

Comments

simonster commented May 27, 2014

vtjnash commented May 27, 2014

JeffBezanson commented May 27, 2014

simonster commented May 27, 2014

simonster commented Jun 1, 2014

timholy commented Jun 1, 2014

simonster commented Jun 1, 2014

timholy commented Jun 1, 2014

JeffBezanson commented Jun 1, 2014

timholy commented Jun 1, 2014

vtjnash commented Jun 1, 2014

JeffBezanson commented Jun 10, 2014