inline invoke #9642

Closed
wants to merge 14 commits into from

yuyichao
Contributor

@yuyichao yuyichao commented Jan 6, 2015

This is my attempt to improve issue #9608. I'm pretty sure there is something I'm still missing, though.

Benchmark with the following script:

#!/usr/bin/julia

function f(a::Any, b::Any)
    global c = a + b * 2
end

function f(a::Integer, b::Integer)
    global c = a + b * 3
end

function f(a::Int, b::Int)
    global c = a + b * 4
end

macro timing(ex)
    quote
        println($(Expr(:quote, ex)))
        gc()
        @time for i in 1:10000000
            $(esc(ex))
        end
    end
end

function call_any()
    f(1.2, 3.4)
end

function call_integer()
    f(Int32(1), Int32(2))
end

function call_int()
    f(1, 2)
end

const f_any = (@which f(1.2, 3.4)).func
const f_integer = (@which f(Int32(1), Int32(2))).func
const f_int = (@which f(1, 2)).func

function meth_any()
    f_any(1.2, 3.4)
end

function meth_integer()
    f_integer(Int32(1), Int32(2))
end

function meth_int()
    f_int(1, 2)
end

function invoke_any()
    invoke(f, (Float64, Float64), 1.2, 3.4)
end

function invoke_integer()
    invoke(f, (Int32, Int32), Int32(1), Int32(2))
end

function invoke_int()
    invoke(f, (Int, Int), 1, 2)
end

function invoke_any_int()
    invoke(f, (Any, Any), 1, 2)
end

function invoke_integer_int()
    invoke(f, (Integer, Integer), 1, 2)
end

@timing call_any()
@timing call_integer()
@timing call_int()
println()
@timing meth_any()
@timing meth_integer()
@timing meth_int()
println()
@timing invoke_any()
@timing invoke_integer()
@timing invoke_int()
println()
@timing invoke_any_int()
@timing invoke_integer_int()
println()

Before:

call_any()
elapsed time: 0.150808637 seconds (160000000 bytes allocated, 37.06% gc time)
call_integer()
elapsed time: 0.041882309 seconds (0 bytes allocated)
call_int()
elapsed time: 0.041811296 seconds (0 bytes allocated)

meth_any()
elapsed time: 0.679259604 seconds (320001472 bytes allocated, 16.23% gc time)
meth_integer()
elapsed time: 0.4619528 seconds (5424 bytes allocated)
meth_int()
elapsed time: 0.452719865 seconds (1472 bytes allocated)

invoke_any()
elapsed time: 2.057187642 seconds (320002880 bytes allocated, 5.51% gc time)
invoke_integer()
elapsed time: 1.650398796 seconds (3136 bytes allocated)
invoke_int()
elapsed time: 1.255498639 seconds (2624 bytes allocated)

invoke_any_int()
elapsed time: 1.695137055 seconds (2112 bytes allocated)
invoke_integer_int()
elapsed time: 1.562752127 seconds (2112 bytes allocated)

After:

call_any()
elapsed time: 0.140487996 seconds (160000000 bytes allocated, 35.90% gc time)
call_integer()
elapsed time: 0.04067296 seconds (0 bytes allocated)
call_int()
elapsed time: 0.041081744 seconds (0 bytes allocated)

meth_any()
elapsed time: 0.636057861 seconds (320001472 bytes allocated, 16.59% gc time)
meth_integer()
elapsed time: 0.364873117 seconds (5424 bytes allocated)
meth_int()
elapsed time: 0.389887527 seconds (1472 bytes allocated)

invoke_any()
elapsed time: 0.139580889 seconds (160000000 bytes allocated, 39.45% gc time)
invoke_integer()
elapsed time: 0.040794216 seconds (0 bytes allocated)
invoke_int()
elapsed time: 0.041527671 seconds (0 bytes allocated)

invoke_any_int()
elapsed time: 0.062740047 seconds (0 bytes allocated)
invoke_integer_int()
elapsed time: 0.063173429 seconds (0 bytes allocated)

@ViralBShah ViralBShah added the performance Must go faster label Jan 6, 2015
@JeffBezanson
Sponsor Member

Wow, this is impressive. Very good work. I'll review it in more detail later.

A related change that should help a lot, and is likely easier to implement than this, is to add a case to emit_known_call for jl_f_invoke in codegen.cpp. The code there should do the method lookup at compile time, and generate a direct call, as we do now for jl_apply_generic. This will eliminate most of the overhead for calls that can't be inlined.
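
To make the compile-time lookup idea concrete, here is a minimal sketch in current Julia (1.x) syntax, where the signature is written `Tuple{...}` rather than the `(T1, T2)` tuple form used elsewhere in this thread. It only illustrates that the method `invoke` selects is fully determined by the function and the signature, so it can be resolved once, ahead of the call. It is not the codegen.cpp change itself, and the names `h` and `m` are made up for the illustration.

```julia
# Three methods of increasing specificity (hypothetical example function `h`).
h(a::Any, b::Any) = a + b * 2
h(a::Integer, b::Integer) = a + b * 3
h(a::Int, b::Int) = a + b * 4

# Ordinary dispatch picks the most specific method for (Int, Int).
@assert h(1, 2) == 9

# invoke picks the method matching the given signature instead.
@assert invoke(h, Tuple{Integer,Integer}, 1, 2) == 7
@assert invoke(h, Tuple{Any,Any}, 1, 2) == 5

# The same lookup can be done once, up front: `which` returns the Method
# that a call with these argument types would dispatch to.
m = which(h, Tuple{Integer,Integer})
@assert m.sig == Tuple{typeof(h),Integer,Integer}
```

Because the looked-up method depends only on constants known at the call site, a compiler can in principle perform this resolution once and emit a direct call, which is the overhead reduction described above.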

@yuyichao
Contributor Author

yuyichao commented Jan 6, 2015

@JeffBezanson Yeah, I did it this way only because it was slightly easier for me. After trying to inline it, I tried replacing it with the anonymous function like this, but somehow that segfaults at compile time (of sysimg).

if meth.tvars == () && !meth.isstage
    return Expr(:call, meth.func, argexprs...)
end

I know this is totally a hack, but since I was testing with meth_* in the benchmark, it should be faster than calling invoke and it shouldn't crash the compiler.

@yuyichao
Contributor Author

yuyichao commented Jan 7, 2015

I guess I'm still not very familiar with the order of these passes: it still cannot inline invoke if it is wrapped.

yuyichao% cat invoke-inline.jl 
#!/usr/bin/julia -f

invoke_wrap(f::Function, ts::Tuple, args...) = invoke(f, ts, args...)

f(a, b) = a + b

g_invoke() = invoke(f, (Any, Any), 1, 2)
g_invoke_wrap() = invoke_wrap(f, (Any, Any), 1, 2)

println(@code_typed g_invoke())
println(@code_typed g_invoke_wrap())
yyc2:~/projects/explore/julia/invoke
yuyichao% ./invoke-inline.jl 
Any[:($(Expr(:lambda, Any[], Any[Any[:_var0,:_var1],Any[Any[:_var0,Int64,18],Any[:_var1,Int64,18]],Any[]], :(begin  # /home/yuyichao/projects/explore/julia/invoke/invoke-inline.jl, line 7:
        _var0 = 1
        _var1 = 2
        return (top(box))(Int64,(top(add_int))(_var0::Int64,_var1::Int64))
    end::Int64))))]
Any[:($(Expr(:lambda, Any[], Any[Any[:_var0,:_var1],Any[Any[:_var0,(Int64,Int64),0],Any[:_var1,Function,18]],Any[]], :(begin  # /home/yuyichao/projects/explore/julia/invoke/invoke-inline.jl, line 8:
        _var1 = f::ANY
        return invoke(_var1::F,(Any,Any),1,2)::ANY
    end::ANY))))]

atypes_l = Type[atypes...]
err_label = genlabel(sv)
after_err_label = genlabel(sv)
for i in 1:length(atypes_l)

Sponsor Member

when inlining arguments, you need to go in reverse order to preserve the order-of-execution (see https://github.com/yuyichao/julia/blob/inline-invoke/base/inference.jl#L2770-L2771 for an example)

Contributor Author

I think this is what I'm trying to do here. I've seen that part, but I didn't really understand why. Aren't the arguments evaluated in the order they appear in the code?

Also, the generated code seems to be correct here, which is why I didn't worry about it too much before sending the PR:

julia> @noinline function get_next()
           global counter
           counter = counter + 1
           return counter
       end
get_next (generic function with 1 method)

julia> f(a, b) = (a, b)
f (generic function with 1 method)

julia> g() = invoke(f, (Integer, Integer), get_next(), get_next())
g (generic function with 1 method)

julia> @code_typed g()
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[], Any[Any[:_var0,:_var1],Any[Any[:_var0,Any,18],Any[:_var1,Any,18]],Any[]], :(begin  # none, line 1:
        _var0 = get_next()::Any
        _var1 = get_next()::Any
        unless (isa)(_var0,Integer)::Bool goto 1
        unless (isa)(_var1,Integer)::Bool goto 1
        goto 2
        1: 
        (error)("invoke: argument type error")::Union()
        2: 
        return (top(tuple))(_var0::Integer,_var1::Integer)::(Integer,Integer)
    end::(Integer,Integer)))))

julia> g()
ERROR: counter not defined
 in get_next at ./no file:3
 in g at ./none:1

julia> counter = 0
0

julia> g()
(1,2)

Sponsor Member

i'm waiting for a build now, but what if you change the second argument to invoke to (Any, Integer)?

Sponsor Member

oh, i wasn't paying attention to the fact that you are always copying the argument to a temporary variable. there's no need to do that if the argument is effect_free / affect_free (the difference linguistically is subtle: effect-free means that it does not cause an effect on surrounding code (e.g. pure), whereas affect-free means it is not affected by surrounding code (e.g. immutable))

Contributor Author

The type check is done after all arguments are evaluated, so changing to (Any, Integer) does not affect the evaluation of the arguments at all (which is the same semantics as calling the invoke function).

(Actually, I think I might have missed the case where evaluating the second argument to invoke has a side effect.)
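
The ordering claim can be checked with a small sketch. It is written in current Julia (1.x) syntax, so the signature is `Tuple{Any,Integer}` rather than the `(Any, Integer)` form used in this thread, and `counter`, `next!` and `pair` are hypothetical stand-ins for the `get_next` and `f` definitions from the REPL session. The exception type raised by the failed check is deliberately not asserted, since it is an implementation detail.

```julia
# Hypothetical counter-based helper, mirroring get_next() from the thread.
const counter = Ref(0)
next!() = (counter[] += 1)

pair(a, b) = (a, b)

# Arguments are evaluated left to right before invoke runs its type check.
@assert invoke(pair, Tuple{Any,Integer}, next!(), next!()) == (1, 2)

# The second argument fails the Integer check, but both arguments have
# already been evaluated by then, so the counter still advances twice.
failed = try
    invoke(pair, Tuple{Any,Integer}, next!(), float(next!()))
    false
catch
    true
end
@assert failed
@assert counter[] == 4
```

An inlined version has to keep this behaviour: every argument expression runs exactly once, in source order, before any type check can fail.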

Contributor Author

Oops, forgot to paste the output:

julia> @noinline function get_next()
           global counter
           counter = counter + 1
           return counter
       end
get_next (generic function with 1 method)

julia> counter = 0
0

julia> f(a, b) = (a, b)
f (generic function with 1 method)

julia> g() = invoke(f, (Any, Integer), get_next(), get_next())
g (generic function with 1 method)

julia> @code_typed g()
1-element Array{Any,1}:
g() :($(Expr(:lambda, Any[], Any[Any[:_var0,:_var1],Any[Any[:_var0,Any,18],Any[:_var1,Any,18]],Any[]], :(begin  # none, line 1:
        _var0 = get_next()::Any
        _var1 = get_next()::Any
        unless (isa)(_var1,Integer)::Bool goto 1
        goto 2
        1: 
        (error)("invoke: argument type error")::Union()
        2: 
        return (top(tuple))(_var0::Any,_var1::Integer)::(Any,Integer)
    end::(Any,Integer)))))

julia> g()
(1,2)

@yuyichao
Contributor Author

yuyichao commented Jan 7, 2015

Actually, it seems that function inlining is not very smart in some cases either: when the function is assigned to a variable, it is not inlined even if the variable is a const.

yuyichao% cat invoke-inline.jl
#!/usr/bin/julia -f

invoke_wrap(f::Function, ts::Tuple, args...) = invoke(f, ts, args...)

f(a, b) = a + b

g_invoke() = invoke(f, (Any, Any), 1, 2)
g_invoke_wrap() = invoke_wrap(f, (Any, Any), 1, 2)
function g_invoke2()
    const tmp_f = f
    invoke(tmp_f, (Any, Any), 1, 2)
end
function g_call2()
    const tmp_f = f
    tmp_f(1, 2)
end

println(@code_typed g_invoke())
println(@code_typed g_invoke_wrap())
println(@code_typed g_invoke2())
println(@code_typed g_call2())
yyc2:~/projects/explore/julia/invoke
yuyichao% ./invoke-inline.jl
Any[:($(Expr(:lambda, Any[], Any[Any[:_var0,:_var1],Any[Any[:_var0,Int64,18],Any[:_var1,Int64,18]],Any[]], :(begin  # /home/yuyichao/projects/explore/julia/invoke/invoke-inline.jl, line 7:
        _var0 = 1
        _var1 = 2
        return (top(box))(Int64,(top(add_int))(_var0::Int64,_var1::Int64))
    end::Int64))))]
Any[:($(Expr(:lambda, Any[], Any[Any[:_var0,:_var1],Any[Any[:_var0,(Int64,Int64),0],Any[:_var1,Function,18]],Any[]], :(begin  # /home/yuyichao/projects/explore/julia/invoke/invoke-inline.jl, line 8:
        _var1 = f::ANY
        return invoke(_var1::F,(Any,Any),1,2)::ANY
    end::ANY))))]
Any[:($(Expr(:lambda, Any[], Any[Any[:tmp_f],Any[Any[:tmp_f,Function,18]],Any[]], :(begin  # /home/yuyichao/projects/explore/julia/invoke/invoke-inline.jl, line 10:
        const tmp_f::ANY
        tmp_f = f::ANY # line 11:
        return invoke(tmp_f::F,(Any,Any),1,2)::ANY
    end::ANY))))]
Any[:($(Expr(:lambda, Any[], Any[Any[:tmp_f],Any[Any[:tmp_f,Function,18]],Any[]], :(begin  # /home/yuyichao/projects/explore/julia/invoke/invoke-inline.jl, line 14:
        const tmp_f::ANY
        tmp_f = f::ANY # line 15:
        return (tmp_f::F)(1,2)::ANY
    end::ANY))))]

@yuyichao yuyichao force-pushed the inline-invoke branch 3 times, most recently from aae9fa1 to 33d8fe7 Compare January 7, 2015 13:13
@yuyichao
Contributor Author

yuyichao commented Jan 8, 2015

@JeffBezanson Hopefully the last commit addresses some of your suggestions. There is some copy-pasting, and I've also changed expr_type a little to support tuple types, although it adds TypeVar at the same time.

julia> @noinline f(a, b) = a + b
f (generic function with 1 method)

julia> g() = invoke(f, (Any, Any), 1, 2)
g (generic function with 1 method)

julia> @code_typed g()
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[], Any[Any[:_var0,:_var1],Any[Any[:_var0,Int64,18],Any[:_var1,Int64,18]],Any[]], :(begin  # none, line 1:
        return invoke(f,(Any,Any),1,2)::Int64
    end::Int64))))

julia> @code_llvm g();

define i64 @julia_g_42944() {
top:
  %0 = call i64 @julia_f_42945(i64 1, i64 2), !dbg !8
  ret i64 %0, !dbg !8
}

@yuyichao
Contributor Author

yuyichao commented Jan 8, 2015

Another issue is that the lookup is done several times during compilation (3 times for a non-inlineable invoke). Not sure what is the best way to solve this...

@yuyichao
Contributor Author

yuyichao commented Jan 9, 2015

P.S. any objection if I make a JL_GC_RETURN macro that does JL_GC_POP and return? It is how 80-90% of JL_GC_POP are used.... (yes I counted...)

@vtjnash
Sponsor Member

vtjnash commented Jan 9, 2015

i thought most would be value = compute(); JL_GC_POP; return value;, which is slightly more difficult to combine into a macro (since C doesn't have gensym). however, you might be able to write that as a static inline function inside julia.h (although then you might have issues declaring the return type in a general enough way).

regardless, I wouldn't object.

> Another issue is that the lookup is done several times during compilation (3 times for a non-inlineable invoke). Not sure what is the best way to solve this...

I think method lookup does this too. perhaps not the best, but perhaps unavoidable

@yuyichao
Contributor Author

yuyichao commented Jan 9, 2015

> i thought most would be value = compute(); JL_GC_POP; return value;,

I was thinking about just combining the pop and the return.

> which is slightly more difficult to combine into a macro (since C doesn't
> have gensym).

Using a prefix in this case is probably good enough. It won't be worse than
the JL_TRY etc. case. I'm not sure whether it is OK to use the GNU typeof
extension, though. Without it, it won't be possible to declare a variable with
the correct type in C.

> however, you might be able to write that as a static inline
> function inside julia.h (although then you might have issues declaring the
> return type in a general enough way).

That would be possible in C++, although in that case there's a much better way to
manage the GC stack.

> regardless, I wouldn't object.
>
> > Another issue is that the lookup is done several times during compilation
> > (3 times for a non-inlineable invoke). Not sure what is the best way to
> > solve this...
>
> I think method lookup does this too.

Yes, it does, and probably no less often than invoke, which is why I didn't bother
doing anything fancier for invoke either. However, IIRC, replacing the
really slow lookup in invoke_tfunc with the new C API actually gave a
noticeable improvement in compilation (type inference) time. I'm wondering
if it would be beneficial to improve this.

@yuyichao yuyichao force-pushed the inline-invoke branch 2 times, most recently from a9bbe0a to 18ed87a Compare January 9, 2015 15:25
@tkelman
Contributor

tkelman commented Jan 9, 2015

cc @vtjnash our CI is failing assertions on mac and segfaulting on Windows. Any ideas how to fix?

@yuyichao
Contributor Author

yuyichao commented Jan 9, 2015

@tkelman It might have something to do with this PR, but AFAIK it started happening after I rebased on the current master.

Also, is there a reason that expr_type does not support tuple constant values? Adding it seems to break current code, but I have no idea why.

@vtjnash
Sponsor Member

vtjnash commented Jan 9, 2015

We've seen those on master also. Best strategy is probably to replace that assert with jlbacktrace and dump a little more info about the environment

@yuyichao
Contributor Author

yuyichao commented Jan 9, 2015

I see. I checked the master CI just now and thought it was fine. Now I see this on the other PR I have, which really shouldn't have anything to do with it.

@vtjnash
Sponsor Member

vtjnash commented Jan 10, 2015

> Also is there a reason that expr_type does not support tuple constant value? Adding it seems to break current code but I have no idea why....

If I had to guess, it's probably because codegen assumes that if it comes across a type-tuple, it should be unboxed. but that isn't true of the current tuple implementation

@yuyichao yuyichao force-pushed the inline-invoke branch 2 times, most recently from d0a6cce to b250712 Compare January 10, 2015 13:34
@yuyichao
Contributor Author

Just noticed another issue. Since the inference of _apply does pass on the arguments, type inference of _apply(invoke, ...) fails, although the call can now be inlined (note the ::Any return type below):

julia> f(a, b) = a + b
f (generic function with 1 method)

julia> g(args...) = invoke(f, (Any, Any), args...)
g (generic function with 1 method)

julia> k() = g(1, 2)
k (generic function with 1 method)

julia> @code_typed k()
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[], Any[Any[:_var2,:_var3,:_var4],Any[Any[:_var2,(Int64,Int64),0],Any[:_var3,Int64,18],Any[:_var4,Int64,18]],Any[]], :(begin  # none, line 1:
        (Any,Any)
        _var3 = 1
        _var4 = 2
        return (top(box))(Int64,(top(add_int))(_var3::Int64,_var4::Int64))
    end::Any))))
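
For anyone who wants to re-check this case on a newer build, the following is a small sketch in current Julia (1.x) syntax, assuming `Tuple{Any,Any}` in place of the `(Any, Any)` form above. The names `add2`, `fwd` and `caller` are hypothetical stand-ins for `f`, `g` and `k`. It confirms the splatted call is correct and prints what inference concluded, without assuming a particular result.

```julia
add2(a, b) = a + b

# Forward a variable number of arguments through invoke.
fwd(args...) = invoke(add2, Tuple{Any,Any}, args...)
caller() = fwd(1, 2)

@assert caller() == 3

# Base.return_types reports what inference concluded for the call; whether
# it is a concrete type or Any depends on the compiler version.
inferred = Base.return_types(caller, Tuple{})
println(inferred)
```

A result of `Any` here would correspond to the inference failure described above, even when the body itself has been inlined.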

@dhoegh
Contributor

dhoegh commented Jan 11, 2015

@yuyichao will your pull-request also improve the following?

function test1(f, n) 
    for i=1:n 
        f(i) 
    end 
end 
function test2(n) 
    for i=1:n 
        g(i) 
    end 
end 
function test3(f, n) 
    for i=1:n 
        invoke(f, (Int,), i) 
    end 
end 
g(i) = i 

test1(g,10) 
@time test1(g,10_000_000) 
test2(10) 
@time test2(10_000_000) 
test3(g,10) 
@time test3(g,10_000_000) 

Output:

elapsed time: 0.38846745 seconds (320032108 bytes allocated, 35.61% gc time) 
elapsed time: 1.259e-6 seconds (80 bytes allocated) 
elapsed time: 0.814524146 seconds (319983728 bytes allocated, 17.09% gc time) 

It will obviously help test3, but will it also help test1? The related discussion is at https://groups.google.com/forum/m/?fromgroups#!topic/julia-users/10n-50dOYQA

@yuyichao
Contributor Author

I don't think it can improve any of these. The invoke inliner is not smarter than the generic function inliner, and I haven't changed anything with respect to the normal function inliner.

It would be easier to improve test3 if test1 were improved, though.

@yuyichao
Contributor Author

Closing this one in favor of #10964 (and possibly others), since this PR needs a major refactor to keep up with the current master.

Labels
performance Must go faster
6 participants