optimizer: inline abstract union-split callsite #44512

Merged · 1 commit merged into master on Mar 30, 2022

Conversation

@aviatesk commented Mar 8, 2022

Currently the optimizer handles an abstract callsite only when there is a
single dispatch candidate (which covers most cases), and so inlining and static dispatch
are prohibited when the callsite is union-split (in other words, union-split
happens only when all the dispatch candidates are concrete).

However, certain patterns of code (most notably our Julia-level compiler code)
inherently need to deal with abstract callsites.
The following example is taken from a `Core.Compiler` utility:

```julia
julia> @inline isType(@nospecialize t) = isa(t, DataType) && t.name === Type.body.name
isType (generic function with 1 method)

julia> code_typed((Any,)) do x # abstract, but no union-split, successful inlining
           isType(x)
       end |> only
CodeInfo(
1 ─ %1 = (x isa Main.DataType)::Bool
└──      goto #3 if not %1
2 ─ %3 = π (x, DataType)
│   %4 = Base.getfield(%3, :name)::Core.TypeName
│   %5 = Base.getfield(Type{T}, :name)::Core.TypeName
│   %6 = (%4 === %5)::Bool
└──      goto #4
3 ─      goto #4
4 ┄ %9 = φ (#2 => %6, #3 => false)::Bool
└──      return %9
) => Bool

julia> code_typed((Union{Type,Nothing},)) do x # abstract, union-split, unsuccessful inlining
           isType(x)
       end |> only
CodeInfo(
1 ─ %1 = (isa)(x, Nothing)::Bool
└──      goto #3 if not %1
2 ─      goto #4
3 ─ %4 = Main.isType(x)::Bool
└──      goto #4
4 ┄ %6 = φ (#2 => false, #3 => %4)::Bool
└──      return %6
) => Bool
```

(Note that this is a limitation of the inlining algorithm itself, so user-provided
hints such as a callsite inlining annotation don't help here.)
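
For instance, here is a quick way to check that claim (a sketch that assumes the `isType` definition above and Julia's callsite-`@inline` support; output omitted):
```julia
# Even with an explicit callsite annotation, the abstract union-split call is
# not inlined on master: the limitation lives in the optimizer itself, not in
# the inlining-cost heuristics that the annotation overrides.
code_typed((Union{Type,Nothing},)) do x
    @inline isType(x)  # callsite inlining annotation
end |> only
```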

This commit enables inlining and static dispatch for abstract union-split callsites.
The core idea is that we can simulate our dispatch semantics by
generating `isa` checks in order of the specificities of the dispatch candidates:

```julia
julia> code_typed((Union{Type,Nothing},)) do x # union-split, successful inlining
           isType(x)
       end |> only
CodeInfo(
1 ─ %1  = (isa)(x, Nothing)::Bool
└──       goto #3 if not %1
2 ─       goto #9
3 ─ %4  = (isa)(x, Type)::Bool
└──       goto #8 if not %4
4 ─ %6  = π (x, Type)
│   %7  = (%6 isa Main.DataType)::Bool
└──       goto #6 if not %7
5 ─ %9  = π (%6, DataType)
│   %10 = Base.getfield(%9, :name)::Core.TypeName
│   %11 = Base.getfield(Type{T}, :name)::Core.TypeName
│   %12 = (%10 === %11)::Bool
└──       goto #7
6 ─       goto #7
7 ┄ %15 = φ (#5 => %12, #6 => false)::Bool
└──       goto #9
8 ─       Core.throw(ErrorException("fatal error in type inference (type bound)"))::Union{}
└──       unreachable
9 ┄ %19 = φ (#2 => false, #7 => %15)::Bool
└──       return %19
) => Bool
```

Inlining/static dispatch of abstract union-split callsites will improve
performance in such situations (and so this commit should improve the
latency of our JIT compilation). In particular, it helps us avoid
excessive specializations of `Core.Compiler` code by statically resolving
`@nospecialize`d callsites; as a result, the number of precompiled
statements is reduced from 1956 (master) to 1901 (this commit).

As a side effect, the implementation of our inlining algorithm also gets much
simpler, since we no longer need the previous special handling for abstract callsites.

One possible drawback is increased code size.
This change does increase the size of the sysimage somewhat,
but I think these numbers are in an acceptable range:

> master

```
❯ du -sh usr/lib/julia/*
 17M    usr/lib/julia/corecompiler.ji
188M    usr/lib/julia/sys-o.a
164M    usr/lib/julia/sys.dylib
 23M    usr/lib/julia/sys.dylib.dSYM
101M    usr/lib/julia/sys.ji
```

> this commit

```
❯ du -sh usr/lib/julia/*
 17M    usr/lib/julia/corecompiler.ji
190M    usr/lib/julia/sys-o.a
166M    usr/lib/julia/sys.dylib
 23M    usr/lib/julia/sys.dylib.dSYM
102M    usr/lib/julia/sys.ji
```

@aviatesk added the compiler:optimizer and compiler:latency labels on Mar 8, 2022
@aviatesk commented Mar 8, 2022

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

@aviatesk commented Mar 8, 2022

OK, the "inference" benchmark result looks good. Can I ask for your reviews, @Keno and @vtjnash?

@nanosoldier runbenchmarks(!"scalar", vs=":master")

Comment on lines -1225 to -1232

```julia
if handled_all_cases && revisit_idx !== nothing
    # If there's only one case that's not a dispatchtuple, we can
    # still unionsplit by visiting all the other cases first.
    # This is useful for code like:
    #   foo(x::Int) = 1
    #   foo(@nospecialize(x::Any)) = 2
    # where only a small number of specific dispatchable
    # cases are split off from an ::Any typed fallback.
    (i, j) = revisit_idx
    match = infos[i].results[j]
    handled_all_cases &= handle_match!(match, argtypes, flag, state, cases, true)
```
@aviatesk

This PR subsumes the improvement of #44421, so I just deleted this special handling. The test case added in #44421 is untouched and still passes on this PR.
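
For reference, a minimal way to exercise that pattern (reusing the `foo` methods from the quoted comment; the expected `isa`-nest shape below is my reading of the new transformation, not verified output):
```julia
foo(x::Int) = 1
foo(@nospecialize(x::Any)) = 2

bar(x) = foo(x)

# With this PR the union-split caller should lower to a plain isa-nest,
# roughly `isa(x, Int) ? 1 : 2`, without needing the deleted revisit logic.
code_typed(bar, Tuple{Any})
```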

@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk force-pushed the avi/abstractinline branch 2 times, most recently from bf346fb to d76f2a4, on March 9, 2022 05:28
@aviatesk commented Mar 9, 2022

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@Keno commented Mar 14, 2022

Are we sure that this actually works in general? Method matching is very complicated, e.g. when matching against diagonal signatures, which I'm pretty sure the current isa-generating code will not handle properly. We should be able to split all cases that inference decided to union-split, but we'd have to remember which arguments it split on and then generate an appropriate isa nest for those. Unless I'm missing something.
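
(For illustration, a hypothetical diagonal-dispatch example, not taken from this thread, of how per-argument `isa` checks alone can mis-dispatch:)
```julia
# A diagonal signature requires both arguments to have the *same* concrete
# type T, which a flat per-argument isa-nest cannot express.
d(x::T, y::T) where {T<:Integer} = :same
d(@nospecialize(x), @nospecialize(y)) = :fallback

d(1, 2)     # :same     -- both Int, so T = Int applies
d(1, 0x02)  # :fallback -- isa(x, Integer) && isa(y, Integer) both hold,
            #              yet the diagonal method does not apply
```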

@Keno commented Mar 14, 2022

E.g., I just tried the simplest possible case I could think of here:

```julia
julia> f(x::Int, y::Int) = 1
f (generic function with 3 methods)

julia> f(x::Any, y::Int) = 2
f (generic function with 3 methods)

julia> f(x::Int, y::Any) = 3
f (generic function with 3 methods)

julia> g(x,y) = f(x,y)
g (generic function with 1 method)

julia> code_typed(g, Tuple{Any, Any})
ERROR: AssertionError: invalid order of dispatch candidate
Stacktrace:
  [1] handle_cases!(ir::Core.Compiler.IRCode, idx::Int64, stmt::Expr, atype::Any, cases::Vector{Core.Compiler.InliningCase}, fully_covered::Bool, todo::Vector{Pair{Int64, Any}}, params::Core.Compiler.OptimizationParams)
    @ Core.Compiler ./compiler/ssair/inlining.jl:1309
```

@aviatesk force-pushed the avi/abstractinline branch 2 times, most recently from 5cf29ed to b0eca9a, on March 16, 2022 03:55
@aviatesk

@nanosoldier runbenchmarks(!"scalar", vs=":master")

@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk

Thanks @Keno for pointing out that the assertion is not valid.
I've now concluded that we can just process the dispatch candidates in order without that assertion, since the conditions necessary for the isa-checks to simulate the actual dispatch semantics should already hold. In particular:

  • we bail out from these cases when there is any ambiguity
  • `ml_matches` has already sorted the matches in order of their specificity when there is no ambiguity
  • we exclude `case.sig::UnionAll` cases, so we don't need to account for the type-equality constraints from type variables

I also added this comment describing the transformation done by `ir_inline_unionsplit!`:

`ir_inline_unionsplit!`

The core idea of this function is to simulate the dispatch semantics by generating
(flat) `isa`-checks corresponding to the signatures of the union-split dispatch candidates,
and then inlining their bodies into each `isa`-conditional block.

This `isa`-based virtual dispatch requires some pre-conditions to hold in order to simulate
the actual semantics correctly.

The first one is that the dispatch candidates need to be processed in order of their specificity,
and the corresponding `isa`-checks should reflect the method specificities, since their
signatures are not necessarily concrete anymore.
Fortunately, `ml_matches` should have already sorted them in that way, except in cases where there is
any ambiguity, from which we already bail out at this point.

Another consideration is the type-equality constraint imposed by type variables: the `isa`-checks are
not enough to simulate the dispatch semantics in cases like the following.

Given a definition:
```julia
f(x::T, y::T) where T<:Integer = ...
```
transforming a callsite
```julia
(x::Any, y::Any)
```
into the optimized form
```julia
if isa(x, Integer) && isa(y, Integer)
    f(x::Integer, y::Integer)
else
    ... # fallback
end
```
would not enforce the constraint that `x` and `y` share the same type `T`.
But again, we should already bail out from such cases at this point, essentially by
excluding cases where `case.sig::UnionAll`.

In short, here we can process the dispatch candidates in order, assuming we haven't changed
their order somehow somewhere up to this point.
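
To make the first pre-condition concrete, here is a rough sketch of the ordering requirement (hypothetical `h`/`virtual_h`, not code from this PR):
```julia
# Two candidates of different specificity for an abstract (::Any) callsite.
h(x::Int) = :int                # more specific
h(@nospecialize x) = :fallback  # less specific

# The isa-based virtual dispatch the optimizer conceptually emits:
virtual_h(@nospecialize x) = isa(x, Int) ? :int : :fallback

# Checking the general candidate first (isa(x, Any) is always true) would
# shadow the Int method, which is why the candidates must stay sorted by
# specificity, as ml_matches provides.
```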


Now I think we can move this forward once we confirm that this PR doesn't come with any regressions, which might arise from the performance penalty of `isa`-checks against abstract types, as @vtjnash pointed out to me in a meeting.

@aviatesk

@nanosoldier runbenchmarks("linalg" || "simd" || "inference", vs=":master")

@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@Keno commented Mar 17, 2022

I'm still quite skeptical that this works, because in general, method matching cannot necessarily be compiled down to an isa nest. Maybe there's some argument here that I'm missing for why it should work here, but I'm skeptical. I guess we'll run PkgEval and see if that comes up with any good examples.

@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@vtjnash commented Mar 24, 2022

@nanosoldier runbenchmarks("comprehension", vs=":master")

@nanosoldier

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

@nanosoldier

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@aviatesk

@nanosoldier runtests(["BasicInterpolators", "CategoricalDistributions", "Diagonalizations", "Evolutionary", "GeometryPrimitives", "MLJSerialization", "OptimizationAlgorithms", "PointwiseKDEs", "PolyaGammaSamplers", "SimpleBufferStream", "SquidGame", "Caesar", "DroneSurveillance", "EclipsingBinaryStars", "PowerFlows", "SBMLToolkit", "Thebes"], vs = ":master")

@nanosoldier

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@nanosoldier

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@aviatesk

@nanosoldier runtests(["Diagonalizations", "Evolutionary", "PointwiseKDEs", "PolyaGammaSamplers", "SquidGame"], vs = ":master")

@nanosoldier

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@aviatesk

The only real failure is from SquidGame.jl. Posted a reproducer here.

@aviatesk

@nanosoldier runbenchmarks(!"scalar", vs=":master")

@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

Base automatically changed from avi/44705 to master March 30, 2022 00:27
@aviatesk

Going to merge this if CI runs successfully.

@timholy commented Apr 29, 2022

Probably worth cross-referencing #45051 (comment) from here.
