Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Understanding and reducing invalidation caused by CxxWrap #278

Open
fingolfin opened this issue Jan 6, 2021 · 4 comments
Open

Understanding and reducing invalidation caused by CxxWrap #278

fingolfin opened this issue Jan 6, 2021 · 4 comments

Comments

@fingolfin
Copy link
Contributor

So I run into https://discourse.julialang.org/t/new-tools-for-reducing-compiler-latency/52882 today and started reading https://julialang.org/blog/2021/01/precompile_tutorial/ as well as https://timholy.github.io/SnoopCompile.jl/stable/snoopr/#invalidations

Still got a lot to learn, but some basic tests already seem to indicate that CxxWrap has a lot room for improvement there -- note that this is not improvements that affect that load time of CxxWrap, but rather of packages using it. While that already got a lot better in Julia 1.6/1.7, there is still room to improvement.

Here's a sample run, listing 8226 invalidations:

$ julia-master
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.7.0-DEV.219 (2021-01-05)
 _/ |\__'_|_|_|\__'_|  |  Commit 399f8ba175* (0 days old master)
|__/                   |

julia> using SnoopCompileCore

julia> invalidations = @snoopr using CxxWrap;

julia> length(invalidations)
8226

julia> using SnoopCompile

julia> trees = invalidation_trees(invalidations);

julia> trees[end]
inserting convert(to_type::Type{Any}, x::CxxWrap.CxxWrapCore.SmartPointer{DerivedT}) where DerivedT in CxxWrap.CxxWrapCore at /Users/mhorn/.julia/packages/CxxWrap/ZOkSN/src/CxxWrap.jl:272 invalidated:
   backedges: 1: superseding convert(::Type{Any}, x) in Base at essentials.jl:204 with MethodInstance for convert(::Type{Any}, ::Any) (1229 children)

Turns out the top offender (which is at the end, and shown above) is a convert method for converting to Any, which was already removed by my PR #265 in September -- but apparently no CxxWrap release was made since then, meaning this performance improvement has not yet been released to the general public.

Again, fixing this may not affect CxxWrap's startup speed, but it does impact other code (see e.g. the issue linked in PR #265).

Here's another bad one:

julia> method_invalidations = trees[9]
inserting convert(::Type{T1}, p::CxxWrap.CxxWrapCore.SmartPointer{DerivedT}) where {BaseT, T1<:BaseT, DerivedT<:BaseT} in CxxWrap.CxxWrapCore at /Users/mhorn/.julia/packages/CxxWrap/ZOkSN/src/CxxWrap.jl:269 invalidated:
   mt_backedges:   1: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Base.IRShow.var"#33#35"}(::Any, ::Any) (0 children)
                   2: signature Tuple{typeof(convert), Type{Base.IRShow.var"#33#35"}, Any} triggered
...
                 379: signature Tuple{typeof(convert), Type{REPL.LineEdit.ModeState}, Any} triggered MethodInstance for setindex!(::IdDict{REPL.LineEdit.TextInterface, REPL.LineEdit.ModeState}, ::Any, ::Any) (66 children)
                 380: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for merge(::NamedTuple{(), Tuple{}}, ::Base.Iterators.Pairs) (255 children)

This is caused by the following method:

function Base.convert(::Type{T1}, p::SmartPointer{DerivedT}) where {BaseT,T1 <: BaseT, DerivedT <: BaseT}
  return cxxupcast(T1, p[])[]
end

If I am not mistaken, one possible choice for BaseT is Any, so the above code is effectively equivalent to this:

function Base.convert(::Type{S}, p::SmartPointer{T}) where {S, T}
  return cxxupcast(S, p[])[]
end

And so in particular, it also covers this special case, which seem highly problematic for the same reasons as discussed in issue #258, and beyond:

function Base.convert(::Type{Any}, p::SmartPointer{T}) where {T}
  return cxxupcast(Any, p[])[]
end
@timholy
Copy link

timholy commented Jan 28, 2021

If you're interested in fixing these, one thing I've noted is that many people seem to think the fix is in the "offending" package. Occasionally that's true (#265 was a good example and an excellent change), but most often the fix lies in the target. Basically, anything that has poor inferrability is a vulnerable target, and improving inferrability makes it less vulnerable.

@fingolfin
Copy link
Contributor Author

Unfortunately PR #265 was reverted in 36c7f3a to fix issue #301.

@fingolfin
Copy link
Contributor Author

CxxWrap still seems to cause (??) lots of invalidations, e.g. in OpenBLAS32_jll but also Nemo and Hecke. E.g. loading Hecke without CxxWrap in current Julia master takes 1.8 seconds for me:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0-DEV.60 (2022-11-28)
 _/ |\__'_|_|_|\__'_|  |  Commit ea23403f11* (0 days old master)
|__/                   |

julia> ENV["HECKE_PRINT_BANNER"]=false ; @time @time_imports using Hecke
      0.2 ms  Requires
     11.7 ms  RandomExtensions
      2.7 ms  GroupsCore
      8.9 ms  MacroTools
    207.2 ms  AbstractAlgebra
     17.8 ms  Preferences
      0.3 ms  JLLWrappers
      0.9 ms  GMP_jll
      0.5 ms  MPFR_jll
      2.9 ms  OpenBLAS32_jll 74.14% compilation time
     10.4 ms  FLINT_jll 97.09% compilation time
      0.6 ms  Arb_jll
      0.4 ms  Antic_jll
      0.4 ms  Calcium_jll
    416.5 ms  Nemo 9.88% compilation time
   1077.7 ms  Hecke 3.82% compilation time (1% recompilation)
  1.814945 seconds (5.17 M allocations: 346.394 MiB, 7.16% gc time, 5.60% compilation time: <1% of which was recompilation)

If I load CxxWrap it goes up to 3.3 seconds:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0-DEV.60 (2022-11-28)
 _/ |\__'_|_|_|\__'_|  |  Commit ea23403f11* (0 days old master)
|__/                   |

julia> ENV["HECKE_PRINT_BANNER"]=false ; using CxxWrap ; @time @time_imports using Hecke
      0.2 ms  Requires
     13.0 ms  RandomExtensions
      2.9 ms  GroupsCore
    210.0 ms  AbstractAlgebra
      0.9 ms  GMP_jll
      0.6 ms  MPFR_jll
    179.1 ms  OpenBLAS32_jll 116.09% compilation time (86% recompilation)
      6.3 ms  FLINT_jll 86.51% compilation time
      0.5 ms  Arb_jll
      0.3 ms  Antic_jll
      0.4 ms  Calcium_jll
    523.1 ms  Nemo 26.01% compilation time (82% recompilation)
   1201.0 ms  Hecke 5.79% compilation time (78% recompilation)
  3.289684 seconds (7.87 M allocations: 529.053 MiB, 4.65% gc time, 53.76% compilation time: 79% of which was recompilation)

Looking at invalidations, in my report above, we had 8226 from loading CxxWrap. Now with Julia 1.8.3, it is 20750 (!), with 1.10 it is down to "just" 14284:

julia> using SnoopCompileCore

julia> invalidations = @snoopr using CxxWrap;

julia> length(invalidations)
14286

julia> invalidations2 = @snoopr using Hecke;

julia> length(invalidations2)
51128

julia> using SnoopCompile

julia> trees = invalidation_trees(invalidations);

julia> trees2 = invalidation_trees(invalidations2);

julia> trees[end]
inserting convert(::Type{Any}, x::CxxWrap.CxxWrapCore.SmartPointer) @ CxxWrap.CxxWrapCore ~/.julia/packages/CxxWrap/IdOJa/src/CxxWrap.jl:281 invalidated:
   backedges: 1: superseding convert(::Type{Any}, x) @ Base Base.jl:63 with MethodInstance for convert(::Type{Any}, ::Any) (5455 children)
   8 mt_cache


julia> trees2[end]
inserting ==(a::Union{ConstCxxRef, CxxRef}, b) @ CxxWrap.CxxWrapCore ~/.julia/packages/CxxWrap/IdOJa/src/CxxWrap.jl:240 invalidated:
   mt_backedges:  1: signature ==(x, y) @ Base Base.jl:127 (formerly ==(x, y) @ Base Base.jl:127) triggered MethodInstance for RandomExtensions.make_argument(::Expr) (0 children)
                  2: signature ==(x, y) @ Base Base.jl:127 (formerly ==(x, y) @ Base Base.jl:127) triggered MethodInstance for RandomExtensions.make_argument(::Any) (0 children)
                  3: signature ==(x, y) @ Base Base.jl:127 (formerly ==(x, y) @ Base Base.jl:127) triggered MethodInstance for isequal(::Any, ::Symbol) (1 children)
                  4: signature ==(x, y) @ Base Base.jl:127 (formerly ==(x, y) @ Base Base.jl:127) triggered MethodInstance for AbstractAlgebra.PrettyPrinting.canonicalizePlus(::Expr) (0 children)
                  5: signature ==(x, y) @ Base Base.jl:127 (formerly ==(x, y) @ Base Base.jl:127) triggered MethodInstance for AbstractAlgebra.PrettyPrinting.canonicalize(::Expr) (578 children)
                  6: signature ==(x, y) @ Base Base.jl:127 (formerly ==(x, y) @ Base Base.jl:127) triggered MethodInstance for isequal(::Any, ::AlgAss{gfp_fmpz_elem}) (1 children)
                  7: signature ==(x, y) @ Base Base.jl:127 (formerly ==(x, y) @ Base Base.jl:127) triggered MethodInstance for AbstractAlgebra.PrettyPrinting.isaExprOp(::Expr, ::Symbol) (0 children)
                  8: signature ==(x, y) @ Base Base.jl:127 (formerly ==(x, y) @ Base Base.jl:127) triggered MethodInstance for !=(::Any, ::AlgAss{gfp_elem}) (2 children)
                  9: signature ==(x, y) @ Base Base.jl:127 (formerly ==(x, y) @ Base Base.jl:127) triggered MethodInstance for !=(::Any, ::AlgAss{gfp_fmpz_elem}) (2 children)
                 10: signature ==(x, y) @ Base Base.jl:127 (formerly ==(x, y) @ Base Base.jl:127) triggered MethodInstance for AbstractAlgebra.PrettyPrinting.isaExprOp(::Any, ::Symbol) (2 children)
                 11: signature ==(x, y) @ Base Base.jl:127 (formerly ==(x, y) @ Base Base.jl:127) triggered MethodInstance for isequal(::Any, ::AlgAss{gfp_elem}) (1 children)
                 12: signature ==(x, y) @ Base Base.jl:127 (formerly ==(x, y) @ Base Base.jl:127) triggered MethodInstance for AbstractAlgebra.PrettyPrinting.canonicalizePlusFinal!(::Expr) (0 children)
                 13: signature ==(x, y) @ Base Base.jl:127 (formerly ==(x, y) @ Base Base.jl:127) triggered MethodInstance for AbstractAlgebra.PrettyPrinting.canonicalizeTimes(::Expr) (0 children)
                 14: signature ==(x, y) @ Base Base.jl:127 (formerly ==(x, y) @ Base Base.jl:127) triggered MethodInstance for AbstractAlgebra.PrettyPrinting.isaExprOp(::Expr, ::Symbol, ::Int64) (1 children)
                 15: signature ==(x, y) @ Base Base.jl:127 (formerly ==(x, y) @ Base Base.jl:127) triggered MethodInstance for AbstractAlgebra.PrettyPrinting.canonicalizeMinus(::Expr) (0 children)

These main invalidations go away with this patch (down to 1957):

diff --git a/src/CxxWrap.jl b/src/CxxWrap.jl
index 5847ba8..4eb8ed6 100644
--- a/src/CxxWrap.jl
+++ b/src/CxxWrap.jl
@@ -272,13 +272,13 @@ Base.setindex!(x::CxxBaseRef, val, i::Int) = Base.setindex!(x[], val, i)
 
 Base.convert(::Type{RT}, p::SmartPointer{T}) where {T, RT <: CxxBaseRef{T}} = p[]
 Base.cconvert(::Type{RT}, p::SmartPointer{T}) where {T, RT <: CxxBaseRef{T}} = p[]
-function Base.convert(::Type{T1}, p::SmartPointer) where {T1}
-  return cxxupcast(T1, p[])[]
-end
+#function Base.convert(::Type{T1}, p::SmartPointer) where {T1}
+#  return cxxupcast(T1, p[])[]
+#end
 function Base.convert(to_type::Type{<:Ref{T1}}, p::SmartPointer) where {T1}
   return to_type(convert(T1,p))
 end
-Base.convert(::Type{Any}, x::SmartPointer) = x
+#Base.convert(::Type{Any}, x::SmartPointer) = x
 
 Base.unsafe_convert(to_type::Type{<:CxxBaseRef}, x) = to_type(x.cpp_object)
 

Now of course this can't just be applied as-is, as it likely breaks some existing code... but I can't help to wonder if it is really a good idea to have all these automagical conversion if we pay the price for them in invalidations?

@fingolfin
Copy link
Contributor Author

Just as a minor update, the number of invalidations reported by running the code in my initial comment on this issue is as follows right now:

Julia 1.8 Julia master (a40e24e1)
CxxWrap main 20819 13863
with #335 21265 13735
with #337 20695 13785
with #335 + #337 21269 13657
with #338 2695 1480
with #337 + #338 2543 1402
with all three 3541 1278

I've made all three PRs based on tests done on 1.10. Now that I "benchmarked" them also against 1.8, it is obvious that PR #335 has a negative impact there. I have not yet tried to find out what causes this. But at least #337 and #338 have a positive impact both on their own and together, and also in both Julia 1.8.4 and Julia master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants