I'm opening this to ask what changes the devs here are willing to accept in order to reduce latency. As a case study, let's consider `AbstractPlotting.draw_axis2d`, which has a mere 43 arguments. If I run the test suite and then do

```julia
using MethodAnalysis
mis = methodinstances(AbstractPlotting.draw_axis2d)
```

it gives me 8 inferred MethodInstances. I've edited the output and stashed it in this gist, turning it into a compile-time benchmark. Here's what I get:
```julia
julia> include("/tmp/draw_axis2d_demo.jl")
  0.000012 seconds (5 allocations: 976 bytes)   # the all-Any MethodInstance
  4.711192 seconds (11.49 M allocations: 681.750 MiB, 7.89% gc time, 100.00% compilation time)
  0.000014 seconds (5 allocations: 976 bytes)   # the mostly-Any MethodInstance
  0.142401 seconds (232.42 k allocations: 12.758 MiB, 99.99% compilation time)
  0.111768 seconds (138.38 k allocations: 6.993 MiB, 99.98% compilation time)
  0.181831 seconds (349.69 k allocations: 19.633 MiB, 99.99% compilation time)
  0.094713 seconds (176.63 k allocations: 9.740 MiB, 99.98% compilation time)
  0.084915 seconds (105.29 k allocations: 5.205 MiB, 99.97% compilation time)
true
```

Obviously, the first "real" one (not the all-Any) takes a lot of time because it's also compiling a bunch of dependent functions that can be partially reused on future compiles. But the noteworthy thing here is that the later "real" instances each take ~100ms. And this is just one method (and its callees), one that is not horrifically huge by Makie/AbstractPlotting/Layout standards.
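For reference, the gist's benchmark boils down to timing `precompile` calls for specific argument-type signatures. A minimal self-contained sketch of the same idea (using a toy function standing in for `draw_axis2d`, so it runs without Makie):

```julia
# Sketch of a compile-time benchmark: time `precompile` for specific signatures.
# `toy` is a stand-in for draw_axis2d; the pattern is the same.
toy(x, y) = sum(x) + sum(y)

# First call compiles the (Vector{Float64}, Vector{Float64}) specialization:
@time precompile(toy, (Vector{Float64}, Vector{Float64}))
# Timing the same signature again is nearly free, since the result is cached:
@time precompile(toy, (Vector{Float64}, Vector{Float64}))
# A new argument-type combination pays the compile cost all over again:
@time precompile(toy, (Vector{Int}, Vector{Int}))
```

Each distinct type combination triggers its own round of inference and codegen, which is why a 43-argument method can accumulate so many expensive MethodInstances.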
Let's try to get a more systematic view. This is complex, so let me just show you how to investigate this yourself. First, you need to be running the master branch of SnoopCompile. We're going to try this on a demo that matches the user experience, i.e., the first plot in a new session:
```julia
julia> using SnoopCompile, AbstractPlotting

julia> tinf = @snoopi_deep scatter(rand(10^4), rand(10^4))
Core.Compiler.Timings.Timing(InferenceFrameInfo for Core.Compiler.Timings.ROOT()) with 658 children

julia> tminfo = flatten_times(tinf);

julia> count_time = Dict{Method,Tuple{Int,Float64}}();

julia> for (t, info) in tminfo
           mi = info.mi
           if isa(mi.def, Method)
               m = mi.def::Method
               n, tm = get(count_time, m, (0, 0.0))
               count_time[m] = (n + 1, tm + t)
           end
       end

julia> sort!(collect(count_time); by = pr -> pr.second[2])
2131-element Vector{Pair{Method, Tuple{Int64, Float64}}}:
 < lots of output >
```

This is a list of `method => (number of instances, total time to compile *just* this method and not its callees)` pairs. (To emphasize, that's different from what we measured above with `draw_axis2d`, where the time was inclusive of the compile time for the callees.) You'll see one object, `ROOT`, which measures all time outside of inference (including codegen). But more than half the time is inference.
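To pull the worst offenders out of that sorted list, something like the following works. (Here `count_time` is mocked with plain strings standing in for `Method` objects and made-up timings, so the snippet runs without Makie; with the real data you'd use the `Dict{Method,Tuple{Int,Float64}}` built above.)

```julia
# Mocked stand-in for the real count_time dictionary: name => (ninstances, time)
count_time = Dict("setproperty!" => (300, 0.4),
                  "draw_axis2d"  => (8,   1.2),
                  "helper"       => (2,   0.01))

# Same sort as in the transcript: ascending by total self-inference time,
# so the dominant methods end up at the end of the vector.
sorted = sort!(collect(count_time); by = pr -> pr.second[2])

# Show the last (most expensive) few entries:
for (m, (n, t)) in last(sorted, 2)
    println(m, ": ", n, " instances, ", t, "s of inference")
end
```

With the real `count_time`, replacing `2` with a larger cutoff shows the full tail of dominant methods.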
I encourage you to run this yourself; there are some interesting takeaways. Here's a log plot of the time for each of these 2131 methods (measuring the self-inference time, in seconds):
You can see it's really dominated by a "few" bad players: there are 171 methods each accounting for more than 0.01s of total inference time across all of their argument-type combinations. Some of these are things like `setproperty!`, which have hundreds of instances (and are expected to), but quite a few of the dominant ones have relatively few MethodInstances (here I'm just showing the last, most dominant 171 methods):
Moreover, many of the methods that have too many instances to precompile directly might still get precompiled indirectly, via precompiling caller methods that have fewer instances (that works for as many calls as inference succeeds for, so YMMV).
There is a pretty natural solution, one I've discussed before: https://docs.julialang.org/en/v1/manual/style-guide/#Handle-excess-argument-diversity-in-the-caller. The idea is that if you have

```julia
foo(x, y, z, ...)
```

you can design it like this:

```julia
foo(x::TX, y::TY, z::TZ, ...) = # the "real" version, big and slow to compile
foo(x, y, z, ...) = foo(convert(TX, x)::TX, convert(TY, y)::TY, convert(TZ, z)::TZ, ...) # tiny and fast to compile
```

Obviously that's a pretty big change from the code base as it is right now. Certainly, the user's data might come in any number of variants, but a lot of AbstractPlotting's internals seem to consist of passing around data you've computed from it: things like axis limits, fonts, etc. These seem much more standardizable.
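As a concrete (toy) instance of this pattern, not taken from Makie: one concretely-typed "real" method that gets compiled once, plus a tiny diversity-absorbing method that converts and forwards.

```julia
# Illustrative only: a standardized internal type for something like axis limits.
struct Limits
    lo::Float64
    hi::Float64
end

# The "real" version: concrete argument types, so only one (potentially big)
# specialization ever needs to be compiled.
span(lo::Float64, hi::Float64) = Limits(lo, hi)

# The diversity-absorbing entry point: tiny and cheap to compile for any input
# types, because all it does is convert and forward.
span(lo, hi) = span(convert(Float64, lo)::Float64, convert(Float64, hi)::Float64)

span(1, 2f0)  # Int and Float32 both funnel into the single Float64 method
```

The `::Float64` assertions on the `convert` calls matter: they let the forwarding method infer cleanly even when the input types are unknown, so its compile cost stays near-zero regardless of how many argument-type combinations callers throw at it.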
Anyway, I've decided this is more than I could tackle solo, so I'm not going to implement this without buy-in from Makie's developers. But if there is interest I'm happy to help, especially to teach the tools so that you can run these diagnostics yourself. I'm slowly getting towards a big new release of SnoopCompile, but I've realized I need a real-world test bed (beyond what I got in JuliaLang/julia#38906, since getting changes made to Base is probably off the table for many packages) and Makie seems like a good candidate.

