
Speculative inlining/devirtualization #10805

Closed
yuyichao opened this issue Apr 13, 2015 · 7 comments
Labels
performance Must go faster

Comments

@yuyichao
Contributor

AFAIK, Julia only optimizes/inlines a call when type inference can narrow the arguments down to a single type. However, in some cases this effectively throws away the hard work of type inference when it has narrowed the types down to a few (>1). In the following example, manually adding type assertions leads to much better generated code.

(It was while exploring this possibility that I discovered bug #9765.)

I thought this might not be too important, since (at least for numerical computation) people who care about performance should probably ensure type stability in Julia. However, I came across this blog post series and changed my mind. It seems this is a technique used in modern compilers and can have a non-negligible effect even on code that isn't so dumb.

I thought it shouldn't be hard to implement this and was going to code something up a few months ago. However, I also have a more general question about implementing something like this (also what I was trying to do in #9642) in Julia. It seems that for a non-inlined function call, method lookup is done at least three times: in type inference, in codegen, and in jl_apply_generic. Is there any mechanism/plan to keep this information around and avoid the repeated work?

julia> f(::Float64) = 3
f (generic function with 1 method)

julia> f(::Int) = 2
f (generic function with 2 methods)

julia> k1(x) = begin
       a = x > 0 ? 1 : 1.0
       f(a)
       end
k1 (generic function with 1 method)

julia> k2(x) = begin
       a = x > 0 ? 1 : 1.0
       isa(a, Int) ? f(a::Int) : f(a::Float64)
       end
k2 (generic function with 1 method)

julia> @code_typed k1(1)
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[:x], Any[Any[:a],Any[Any[:x,Int64,0],Any[:a,Any,2]],Any[],Any[]], :(begin  # none, line 1: # none, line 2:
        unless (top(slt_int))(0,x::Int64)::Bool goto 0
        a = 1
        goto 1
        0: 
        a = 1.0
        1:  # line 3:
        return f(a::Union(Float64,Int64))::Int64
    end::Int64))))

julia> @code_typed k2(1)
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[:x], Any[Any[:a],Any[Any[:x,Int64,0],Any[:a,Any,2]],Any[],Any[]], :(begin  # none, line 1: # none, line 2:
        unless (top(slt_int))(0,x::Int64)::Bool goto 0
        a = 1
        goto 1
        0: 
        a = 1.0
        1:  # line 3:
        unless isa(a::Union(Float64,Int64),Int)::Bool goto 2
        return 2
        2: 
        return 3
    end::Int64))))
@vtjnash
Sponsor Member

vtjnash commented Apr 13, 2015

Actually, most of the code for optimizing parts of this in the generated code already exists (from #7075). It had to be temporarily disabled due to an issue with the representation of tuple types, but I think that may be resolved by #10380 in the near future.

@yuyichao
Contributor Author

@vtjnash Now that #10380 is merged, is it possible to re-enable those optimizations? (Is it just a matter of setting inline_incompletematch_allowed to true?)

@vtjnash
Sponsor Member

vtjnash commented Apr 26, 2015

Sure, it seems to work for me (with some corrections in 965561a and d3843d0 that I'm preparing to push to master), except for #11015.

@carnaval
Contributor

@vtjnash Correct me if I'm wrong, but this handles the case |matches| == 1. It's good infrastructure to have when we want to implement what @yuyichao wants, which is inline method caching. The problem with that technique is that it will make #265 worse, so it will have to wait until that one is fixed, IMO.

What I'm proposing at https://github.com/JuliaLang/julialang.github.com/blob/master/gsoc/2015/index.md#project-specialized-call-site-method-caching is easier, since there is an additional layer of indirection, so we can update those small caches when method definitions are added. It should take care of the biggest chunk of the overhead, which is dynamic dispatch rather than the actual function call.
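To make the idea concrete, a call-site method cache of the kind described above could be sketched roughly as follows. This is a minimal illustration, not Julia's actual runtime machinery; all names here (CallSiteCache, cached_call, dispatch_slow!) are hypothetical:

```julia
# Hypothetical sketch of a per-call-site (monomorphic) method cache.
# A real runtime would store a pointer to the specialized method; here we
# just remember the generic function and the last seen argument type.
mutable struct CallSiteCache
    cached_type::Union{Type, Nothing}
    cached_target::Union{Function, Nothing}
end

# Slow path: do the full generic dispatch, then fill the cache.
function dispatch_slow!(cache::CallSiteCache, f, x)
    cache.cached_type = typeof(x)
    cache.cached_target = f
    return f(x)
end

# Fast path: if the argument type matches the cached type, skip the
# generic method-table lookup entirely.
function cached_call(cache::CallSiteCache, f, x)
    if cache.cached_type === typeof(x)
        return cache.cached_target(x)
    else
        return dispatch_slow!(cache, f, x)
    end
end

f(::Float64) = 3
f(::Int) = 2

cache = CallSiteCache(nothing, nothing)
cached_call(cache, f, 1)     # miss: slow path fills the cache with Int64
cached_call(cache, f, 5)     # hit: fast path, no generic lookup
cached_call(cache, f, 1.0)   # miss again: cache is refilled with Float64
```

The extra indirection mentioned in the comment corresponds to the mutable CallSiteCache cell: when a new method is defined, the runtime can simply reset these small caches instead of invalidating compiled code.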

@yuyichao
Contributor Author

@carnaval (Posted twice?)

Yeah, I realized that any additional inlining (including what I'm trying to do for invoke) will make #265 worse..... Although when can we really solve that one?

@carnaval
Contributor

(yeah wifi is flaky here)

We can solve it whenever someone does it :). The problem is that while the implementation itself is conceptually not hard (record every callsite "assumption" made by inference and whether the call was inlined; invert the graph; on method definition, invalidate code transitively, stopping where the assumptions are not violated and the method was not inlined), the two big problems I see are:

  • function pointers could be scattered in many places, still pointing to old code
  • it will make performance worse in general, especially with a naive implementation, so it's a bit unrewarding
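The invalidation scheme sketched in the parenthetical above (record callsite dependencies, invert the graph, invalidate transitively on method definition) can be illustrated with a toy dependency graph. This is purely illustrative; the names (backedges, record_dependency!, invalidate!) are made up and do not correspond to Julia internals:

```julia
# Toy model of transitive invalidation via back-edges: each compiled caller
# records an edge from every callee it made assumptions about back to itself.
const backedges = Dict{Symbol, Vector{Symbol}}()   # callee => dependent callers
const valid = Dict{Symbol, Bool}()                 # compiled-code validity flags

function record_dependency!(caller::Symbol, callee::Symbol)
    push!(get!(backedges, callee, Symbol[]), caller)
    valid[caller] = true
end

# When a new method is defined for `callee`, walk the inverted call graph
# and mark every transitively dependent compiled caller as stale.
function invalidate!(callee::Symbol)
    for caller in get(backedges, callee, Symbol[])
        if get(valid, caller, false)
            valid[caller] = false
            invalidate!(caller)   # callers of the caller are stale too
        end
    end
end

record_dependency!(:k1, :f)   # k1 inlined an assumption about f
record_dependency!(:g, :k1)   # g inlined an assumption about k1
invalidate!(:f)               # defining a new method of f invalidates k1 and g
```

A real implementation would additionally check whether the new definition actually violates the recorded assumptions before invalidating, which is the "stop when the assumptions are not violated" part of the comment above.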

@yuyichao yuyichao added the performance Must go faster label Jun 30, 2015
@Keno
Member

Keno commented Jul 3, 2018

We basically do this now.

@Keno Keno closed this as completed Jul 3, 2018