
SSAIR: improve inlining performance with in-place IR-inflation #45404

Merged: 3 commits from avi/inflate_ir into master on May 23, 2022

Conversation

aviatesk (Sponsor Member)

This commit improves the performance of a huge hot-spot within `inflate_ir`
by using its in-place version (`inflate_ir!`) and avoiding some
unnecessary allocations.
For `NativeInterpreter`, the `CodeInfo` IR passed to `inflate_ir` can come
from two places:

  1. the global cache: uncompressed from the compressed format
  2. the local cache: the inferred `CodeInfo` as-is, managed by `InferenceResult`

In case 1, the uncompressed `CodeInfo` is already a newly-allocated
object, so we can use the in-place version safely. It turns out
that this helps us avoid many unnecessary allocations.
The original non-destructive `inflate_ir` remains for testing and
interactive purposes.
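
To make the relationship between the two entry points concrete, here is a minimal sketch (a simplification assuming the non-destructive wrapper simply copies first; not the verbatim definition in `Core.Compiler`):

    # Hedged sketch: the copying wrapper delegates to the in-place version.
    # `copy(ci)` duplicates the CodeInfo so the caller's object stays intact;
    # `inflate_ir!` then converts that private copy into an IRCode.
    inflate_ir(ci::CodeInfo, mi::MethodInstance) = inflate_ir!(copy(ci), mi)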


@nanosoldier runbenchmarks("inference", vs=":master")

@aviatesk aviatesk requested a review from Keno May 21, 2022 05:01
@nanosoldier (Collaborator)

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

Keno (Member) commented May 21, 2022

This seems reasonable to me. When I profile inference I usually profile with Cthulhu, which doesn't go through the compression/decompression cycle, so I don't have a good sense of the usual performance impact; but the nanosoldier results look favorable, so I'm in favor. My only concern would be ending up with corrupted `CodeInfo`s in some of the generated-function cases. I've run into hard-to-debug bugs of that kind before, but certainly for the compression case it's a no-brainer to do the inflation destructively. Eventually we may even want to skip the decompression step and inline directly from the compressed representation as we go along, but that's obviously future work.

aviatesk (Author) commented May 21, 2022

> My only concern would be ending up with corrupted `CodeInfo`s in some of the generated-function cases.

For the uncompressed case, this PR makes a copy of that `CodeInfo`, so I think it is also safe. Maybe `CodeInstance.inferred` manages a `CodeInfo` directly for the generated-function case? (EDIT: that case happens for external `AbstractInterpreter`s whose `may_compress` returns false.) It would be handled in the same way anyway, though:

function InliningTodo(mi::MethodInstance, src::Union{CodeInfo, Vector{UInt8}}, effects::Effects)
    if !isa(src, CodeInfo)
        # Path 1: the global cache handed us compressed IR; decompression
        # allocates a fresh CodeInfo, so it is safe to mutate in place.
        src = ccall(:jl_uncompress_ir, Any, (Any, Ptr{Cvoid}, Any), mi.def, C_NULL, src::Vector{UInt8})::CodeInfo
    else
        # Path 2: this CodeInfo is owned by the cache, so copy before mutating.
        src = copy(src)
    end
    @timeit "inline IR inflation" begin
        ir = inflate_ir!(src, mi)::IRCode
        return InliningTodo(mi, ResolvedInliningSpec(ir, effects))
    end
end
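
For context, here is a minimal sketch of why path 1 always hands the inliner a freshly-allocated object (`jl_compress_ir`/`jl_uncompress_ir` are the real C entry points; the surrounding setup with `mi` and `src` is hypothetical):

    # Hypothetical setup: assume mi::MethodInstance and src::CodeInfo are in hand.
    compressed = ccall(:jl_compress_ir, Any, (Any, Any), mi.def, src)::Vector{UInt8}
    # Every decompression allocates a brand-new CodeInfo, so mutating the
    # result in place cannot corrupt the cached compressed representation.
    fresh1 = ccall(:jl_uncompress_ir, Any, (Any, Ptr{Cvoid}, Any), mi.def, C_NULL, compressed)::CodeInfo
    fresh2 = ccall(:jl_uncompress_ir, Any, (Any, Ptr{Cvoid}, Any), mi.def, C_NULL, compressed)::CodeInfo
    @assert fresh1 !== fresh2  # distinct allocations

External `AbstractInterpreter`s that opt out of compression (via `Core.Compiler.may_compress` returning false) keep their cached `CodeInfo` as-is, which is why that case takes the `copy(src)` branch above.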


@nanosoldier runtests(ALL, vs = ":master")

@nanosoldier (Collaborator)

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

aviatesk (Author)

@nanosoldier runtests(["AStarSearch", "ArchGDAL", "AugmentedGPLikelihoods", "AutocorrelationShell", "BigO", "BioStructures", "ClimateModels", "CollectiveSpins", "ConceptnetNumberbatch", "CovarianceMatrices", "CrypticCrosswords", "CryptoGroups", "DarkCurves", "ExactOptimalTransport", "FSimZoo", "Faker", "Ferrite", "GrayCoding", "Individual", "InfrastructureSystems", "Ipaper", "JSONSchema", "Karnak", "MDInclude", "MixedModelsPermutations", "NonconvexIpopt", "OhMyREPL", "PoissonRandom", "PolynomialBases", "PowerSimulations", "QuantumAlgebra", "SimpleGraphs", "SimpleTweaks", "SkipLists", "StringDistances", "Surrogates", "ToeplitzMatrices", "VertexFinder", "VoxelRayTracers", "BasicInterpolators", "BayesianQuadrature", "BitSAD", "BundlerIO", "CiteEXchange", "Evolutionary", "GraphMLDatasets", "JuliaCon", "Probably", "QuantumTomography", "ReadVTK", "Retriever", "SBML", "StochasticDelayDiffEq", "ImageTracking", "Tapestree"], vs = ":master")

@nanosoldier (Collaborator)

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

aviatesk (Author)

@nanosoldier runtests(["JSONSchema", "ToeplitzMatrices", "Tapestree"], vs = ":master")

@nanosoldier (Collaborator)

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

aviatesk (Author)

Just confirmed the JSONSchema test suite runs successfully on my local machine on this branch. Going to merge.

@aviatesk aviatesk merged commit 335a9d8 into master May 23, 2022
@aviatesk aviatesk deleted the avi/inflate_ir branch May 23, 2022 13:55
pchintalapudi pushed a commit that referenced this pull request May 25, 2022
aviatesk added a commit that referenced this pull request May 31, 2022