
SSAIR: improve inlining performance with in-place IR-inflation #45404

Merged: 3 commits from avi/inflate_ir into master on May 23, 2022

Conversation

aviatesk (Sponsor Member)

This commit improves the performance of a huge hot-spot within `inflate_ir`
by using its in-place version (`inflate_ir!`) and avoiding some
unnecessary allocations.
For `NativeInterpreter`, the `CodeInfo` IR passed to `inflate_ir` can come
from two places:

  1. the global cache: uncompressed from the compressed format
  2. the local cache: the inferred `CodeInfo` as-is, managed by `InferenceResult`

In case 1, the uncompressed `CodeInfo` is already a newly-allocated
object, so we can use the in-place version safely. It turns out
that this helps us avoid many unnecessary allocations.
The original non-destructive `inflate_ir` remains for testing and
interactive purposes.
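
To make the relationship between the two entry points concrete, here is a minimal sketch (a simplification assuming the non-destructive wrapper simply copies first; not the verbatim definition in `Core.Compiler`):

    # Hedged sketch: the copying wrapper delegates to the in-place version.
    # `copy(ci)` duplicates the CodeInfo so the caller's object stays intact;
    # `inflate_ir!` then converts that private copy into an IRCode.
    inflate_ir(ci::CodeInfo, mi::MethodInstance) = inflate_ir!(copy(ci), mi)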


@nanosoldier runbenchmarks("inference", vs=":master")

@aviatesk aviatesk requested a review from Keno May 21, 2022 05:01
@nanosoldier (Collaborator)

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

Keno (Member) commented May 21, 2022

This seems reasonable to me. When I profile inference I usually profile with Cthulhu, which doesn't go through the compression/decompression cycle, so I don't have a good sense of the usual performance impact; but the nanosoldier results look favorable, so I'm in favor. My only concern would be ending up with corrupted `CodeInfo`s in some of the generated-function cases. I've run into hard-to-debug bugs of that kind before, but certainly for the compression case it's a no-brainer to do the inflation destructively. Eventually we may even want to skip the decompression step and inline directly from the compressed representation as we go along, but that's obviously future work.

aviatesk (Author) commented May 21, 2022

> My only concern would be ending up with corrupted `CodeInfo`s in some of the generated-function cases.

For the uncompressed case, this PR makes a copy of that `CodeInfo`, so I think it is also safe. Maybe `CodeInstance.inferred` manages a `CodeInfo` directly for the generated-function case? (EDIT: that case happens for external `AbstractInterpreter`s whose `may_compress` returns false.) It would be handled in the same way anyway, though:

function InliningTodo(mi::MethodInstance, src::Union{CodeInfo, Vector{UInt8}}, effects::Effects)
    if !isa(src, CodeInfo)
        # Path 1: the global cache handed us compressed IR; decompression
        # allocates a fresh CodeInfo, so it is safe to mutate in place.
        src = ccall(:jl_uncompress_ir, Any, (Any, Ptr{Cvoid}, Any), mi.def, C_NULL, src::Vector{UInt8})::CodeInfo
    else
        # Path 2: this CodeInfo is owned by the cache, so copy before mutating.
        src = copy(src)
    end
    @timeit "inline IR inflation" begin
        ir = inflate_ir!(src, mi)::IRCode
        return InliningTodo(mi, ResolvedInliningSpec(ir, effects))
    end
end
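
For context, here is a minimal sketch of why path 1 always hands the inliner a freshly-allocated object (`jl_compress_ir`/`jl_uncompress_ir` are the real C entry points; the surrounding setup with `mi` and `src` is hypothetical):

    # Hypothetical setup: assume mi::MethodInstance and src::CodeInfo are in hand.
    compressed = ccall(:jl_compress_ir, Any, (Any, Any), mi.def, src)::Vector{UInt8}
    # Every decompression allocates a brand-new CodeInfo, so mutating the
    # result in place cannot corrupt the cached compressed representation.
    fresh1 = ccall(:jl_uncompress_ir, Any, (Any, Ptr{Cvoid}, Any), mi.def, C_NULL, compressed)::CodeInfo
    fresh2 = ccall(:jl_uncompress_ir, Any, (Any, Ptr{Cvoid}, Any), mi.def, C_NULL, compressed)::CodeInfo
    @assert fresh1 !== fresh2  # distinct allocations

External `AbstractInterpreter`s that opt out of compression (via `Core.Compiler.may_compress` returning false) keep their cached `CodeInfo` as-is, which is why that case takes the `copy(src)` branch above.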


@nanosoldier runtests(ALL, vs = ":master")

@nanosoldier (Collaborator)

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

aviatesk (Author)

@nanosoldier runtests(["AStarSearch", "ArchGDAL", "AugmentedGPLikelihoods", "AutocorrelationShell", "BigO", "BioStructures", "ClimateModels", "CollectiveSpins", "ConceptnetNumberbatch", "CovarianceMatrices", "CrypticCrosswords", "CryptoGroups", "DarkCurves", "ExactOptimalTransport", "FSimZoo", "Faker", "Ferrite", "GrayCoding", "Individual", "InfrastructureSystems", "Ipaper", "JSONSchema", "Karnak", "MDInclude", "MixedModelsPermutations", "NonconvexIpopt", "OhMyREPL", "PoissonRandom", "PolynomialBases", "PowerSimulations", "QuantumAlgebra", "SimpleGraphs", "SimpleTweaks", "SkipLists", "StringDistances", "Surrogates", "ToeplitzMatrices", "VertexFinder", "VoxelRayTracers", "BasicInterpolators", "BayesianQuadrature", "BitSAD", "BundlerIO", "CiteEXchange", "Evolutionary", "GraphMLDatasets", "JuliaCon", "Probably", "QuantumTomography", "ReadVTK", "Retriever", "SBML", "StochasticDelayDiffEq", "ImageTracking", "Tapestree"], vs = ":master")

@nanosoldier (Collaborator)

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

aviatesk (Author)

@nanosoldier runtests(["JSONSchema", "ToeplitzMatrices", "Tapestree"], vs = ":master")

@nanosoldier (Collaborator)

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

aviatesk (Author)

Just confirmed the JSONSchema test suite runs successfully on my local machine on this branch. Going to merge.

@aviatesk aviatesk merged commit 335a9d8 into master May 23, 2022
@aviatesk aviatesk deleted the avi/inflate_ir branch May 23, 2022 13:55
pchintalapudi pushed a commit that referenced this pull request May 25, 2022
aviatesk added a commit that referenced this pull request May 31, 2022