
# Package development: improving engineering quality & latency

Tim Holy

Shuhei Kadowaki

JuliaCon 2021

# Part 1: background (a tutorial on Julia's inner workings)

In the first ~25min we'll introduce/review core features of Julia's design:

- methods, types, & dispatch
- runtime vs compiletime dispatch
- specialization and type inference
- MethodInstances
- invalidation, backedges, & recompilation
- "world-splitting"
- precompilation

Knowing something about these points is genuinely useful, but you do not have to master all of them to start improving packages.

We'll also introduce the [MethodAnalysis](https://github.com/timholy/MethodAnalysis.jl) package, which allows you to see a lot of this directly.

# Methods, types, & dispatch

```julia
myround(x::Integer) = x
myround(x::AbstractFloat) = round(x)
```

Key points:

- each definition is a separate method (use `methods(myround)` to see them all)
- methods differ in their type signatures and implementations (*implementation specialization*)
- Julia dispatches to the "most specific" matching method
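You can check these dispatch rules directly in a REPL. The minimal sketch below repeats the two definitions from above. The expected values follow from the rule that Julia picks the most specific matching method; note that `Bool <: Integer`, so `true` goes to the `Integer` method:

```julia
myround(x::Integer) = x
myround(x::AbstractFloat) = round(x)

length(methods(myround))  # 2: one entry per definition
myround(3)                # 3    (Integer method)
myround(2.6)              # 3.0  (AbstractFloat method)
myround(true)             # true (Bool <: Integer, so the Integer method applies)
```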

# Compiler specialization and type inference

Many methods are written with abstract signatures:

```julia
myround(x::AbstractFloat) = round(x)
```

The actual bit-level implementation differs for `Float32` and `Float64`: Julia *must* generate different code.

Consequently, Julia also (automatically) performs *compiler specialization* whenever the method is invoked with a new concrete type:

```julia
myround(x::Float32)    # a `MethodInstance`, not a `Method`
myround(x::Float64)    # another `MethodInstance` generated from the same `Method`
```
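To see that `Float32` and `Float64` arguments share one `Method` while producing separately compiled results, you can use `which` (from `InteractiveUtils`, which the REPL loads automatically). This is a small sketch rather than a view of the `MethodInstance` objects themselves; MethodAnalysis, introduced below, shows those directly:

```julia
using InteractiveUtils   # provides `which`; already loaded in the REPL

myround(x::Integer) = x
myround(x::AbstractFloat) = round(x)

# Both concrete float types dispatch to the *same* Method...
which(myround, (Float32,)) === which(myround, (Float64,))   # true

# ...but each specialization produces a result of its own concrete type
typeof(myround(1.5f0))   # Float32
typeof(myround(1.5))     # Float64
```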

In a multi-line function, each call inside the body may need its own specialization.

To support this need without requiring programmers to declare the type of each internal variable, Julia performs *type inference*. Effectively, it looks like this:

```julia
function mysum(list::Vector{Float32})
    s = zero(eltype(list::Vector{Float32})::Type{Float32})::Float32
    for val::Float32 in list::Vector{Float32}
        s = (s::Float32 + val::Float32)::Float32
    end
    return s::Float32
end
```

Julia calculates the types of all the intermediates from the types of the input arguments.
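You do not have to write these annotations yourself. The sketch below asks inference what it concludes for an unannotated `mysum`, using `Base.return_types`. That is an internal reflection helper, so its exact behavior may vary across Julia versions. `@code_warntype mysum(Float32[1, 2, 3])` shows the per-variable types instead:

```julia
# No type annotations: inference derives everything from the argument type
function mysum(list)
    s = zero(eltype(list))
    for val in list
        s += val
    end
    return s
end

Base.return_types(mysum, (Vector{Float32},))   # [Float32]
Base.return_types(mysum, (Vector{Int},))       # [Int64]
```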

# Runtime vs compiletime dispatch

A compiled function is a "blob" of native code living in a particular memory location.

Calling a function involves:
- preparing the arguments
- deciding *which* specific compiled blob to use. This is like looking up someone's phone number in the phone book. Julia literally scans through the method tables.

This decision can be made during *runtime* (when code is executing) or during *compiletime* (when Julia is compiling the function).

Schematic of a compiletime call in pseudo-Julia:
```julia
push!(execution_stack, args)
@goto compiled_blob_52383
```
(The blob will retrieve the argument values by [popping the execution stack](https://en.wikipedia.org/wiki/Call_stack).)

Schematic of a runtime call in pseudo-Julia:
```julia
# scan the method tables and their lists of compiled blobs for a match
# if the right blob hasn't been compiled yet, compile it now
blob = get_blob_for_argtypes(f, typesof(args))
# The rest looks the same as a compiletime call:
push!(execution_stack, args)
goto(blob)
```
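The pseudocode above is schematic. In real Julia code, you can see the difference by comparing inference results when the argument type is known versus hidden. In this sketch, `addone` and `call_addone` are hypothetical names. A `Ref{Any}` hides the stored value's type, so inference only knows `Any`. The `+` inside `addone` then cannot be resolved until runtime, and the result type degrades to `Any`:

```julia
addone(x) = x + 1
call_addone(r) = addone(r[])   # what gets called depends on the type of r[]

r_int = Ref(1)          # Base.RefValue{Int}: r[] is always an Int
r_any = Ref{Any}(1)     # Base.RefValue{Any}: r[] could hold any type

# Concrete type known: the calls can be resolved at compiletime
Base.return_types(call_addone, (typeof(r_int),))   # [Int64]
# Only `Any` known: `+` must be looked up at runtime, result type unknown
Base.return_types(call_addone, (typeof(r_any),))   # [Any]
```

Both versions return `2` when called. Only the cost of getting there differs.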

An intermediate case is [Union-splitting](https://julialang.org/blog/2018/08/union-splitting/), where Julia can determine that there are only a few possible argument types:
```julia
argtypes = typesof(args)
push!(execution_stack, args)
if argtypes === Tuple{Int64,Bool}
    @goto compiled_blob_52383
else # the only other option is Tuple{Float64,Bool}
    @goto compiled_blob_52951
end
```
Note that no call to `get_blob_for_argtypes` is needed. Union-splitting generalizes compiletime dispatch.
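A small runnable case where union-splitting applies is shown below. `intorfloat` and `addtwo` are hypothetical names. Because inference knows the intermediate is either an `Int` or a `Float64`, the result stays a small `Union` and does not widen to `Any`, which is what you would see with an unresolved runtime dispatch. `@code_typed addtwo(true)` shows the type-test branches, although the exact IR differs between Julia versions:

```julia
# Returns an Int or a Float64 depending on a runtime value
intorfloat(flag::Bool) = flag ? 1 : 1.0

# `+` is called on a Union{Int64, Float64}: Julia can split it into two static calls
addtwo(flag::Bool) = intorfloat(flag) + 2

Base.return_types(intorfloat, (Bool,))   # [Union{Float64, Int64}]
Base.return_types(addtwo, (Bool,))       # [Union{Float64, Int64}]
```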

# "World-splitting"

*Note: this term is not in common use. It is intended to be reminiscent of Union-splitting.*

Julia will exploit the "state of the world" when evaluating the possibility for compiletime dispatch.

Suppose: 
- you have an internal variable `x` but Julia can't infer a concrete type for it
- Julia's next compilation task is to call `f(x)`
- you have one or a few methods with concrete signatures (e.g., `f(x::Int)`)

Then Julia will hazard a guess that it will end up calling `f(x::Int)`:
```julia
push!(execution_stack, args)
if typesof(args) === Tuple{Int}
    @goto compiled_blob_39412
else
    # do the call by runtime dispatch
    ...
end
```
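Here is a runnable check of world-splitting. `ws` and `call_ws` are hypothetical names. A `Ref{Any}` plays the role of the un-inferrable `x`. Both methods that could possibly match return an `Int`, so inference concludes `Int` even though the argument type is unknown:

```julia
ws(x::Int) = 1
ws(x::String) = 2

# The container hides the element type: inference only knows `Any`
call_ws(r) = ws(r[])

# Julia checks the current world's (few) candidate methods and infers Int anyway
Base.return_types(call_ws, (Base.RefValue{Any},))   # [Int64]
```

This relies on the method table being small. Julia stops enumerating candidates above a small limit and falls back to plain runtime dispatch.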

Julia tries to resolve as many dispatches as possible at compiletime using type information, otherwise the dispatch is delayed to runtime.

Resolving the proper blob at compiletime removes that lookup from runtime, improving runtime performance.

Ballpark costs of runtime dispatch (depends on size of `f`'s method tables):
- single argument: 15-35ns
- two arguments: ~100ns
- ...


# MethodInstances

Compilation is *expensive*. You do not want to recompile the same method repeatedly for exactly the same types.

To eliminate recompilation (within a single Julia session), Julia *caches* the compiled code. These are `Core.MethodInstance`s (type-inferred code, mentioned above) and `Core.CodeInstance`s (native code). You can think of these caches as, e.g., `Dict(signature => methodinstance)`.
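The `Dict` analogy can be made concrete with a toy model. This is purely illustrative and is not Julia's real data structure. `toy_cache`, `ncompiles`, and `toy_lookup` are hypothetical names, and the "compiled code" is just a string. The key point is that the cache is keyed on the *signature* (function type plus argument types), not on argument values:

```julia
# Hypothetical toy model of a compilation cache: signature => "compiled code"
const toy_cache = Dict{Any,String}()
const ncompiles = Ref(0)

function toy_lookup(f, args...)
    sig = Tuple{typeof(f), map(typeof, args)...}
    get!(toy_cache, sig) do
        ncompiles[] += 1            # "compile" only on a cache miss
        "compiled code for $sig"
    end
end

toy_lookup(sin, 1.0)     # miss: "compiles" for Tuple{typeof(sin), Float64}
toy_lookup(sin, 2.0)     # hit: different value, same signature
toy_lookup(sin, 1.0f0)   # miss: new signature with Float32
ncompiles[]              # 2
```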

Julia makes it easy to perform introspection:

```julia
methods(searchsorted)
```

Using MethodAnalysis.jl, you can introspect further:

```julia
using MethodAnalysis
methodinstances(searchsorted)
```

```
3-element Vector{Core.MethodInstance}:
 MethodInstance for searchsorted(::Vector{String}, ::String)
 MethodInstance for searchsorted(::Vector{String}, ::String, ::Int64, ::Int64, ::Base.Order.ForwardOrdering)
 MethodInstance for searchsorted(::Vector{String}, ::String, ::Base.Order.ForwardOrdering)
```

```julia
searchsorted(1:8, 7)
```

```
7:7
```

```julia
methodinstances(searchsorted)
```

```
5-element Vector{Core.MethodInstance}:
 MethodInstance for searchsorted(::UnitRange{Int64}, ::Int64, ::Base.Order.ForwardOrdering)
 MethodInstance for searchsorted(::Vector{String}, ::String)
 MethodInstance for searchsorted(::UnitRange{Int64}, ::Int64)
 MethodInstance for searchsorted(::Vector{String}, ::String, ::Int64, ::Int64, ::Base.Order.ForwardOrdering)
 MethodInstance for searchsorted(::Vector{String}, ::String, ::Base.Order.ForwardOrdering)
```

# Invalidation & backedges

Julia allows you to redefine a method. This *invalidates* all code that depended on the earlier `MethodInstance`s (discussed below).

To keep track of what needs to be invalidated, each `MethodInstance` keeps track of all *compiletime* callers.

These are called *backedges*:

```julia
using InteractiveUtils             # provides @which (already loaded in the REPL)
mi = methodinstances(@which issorted([1, 3, 2]))[1]
println(mi)
direct_backedges(mi)               # also from MethodAnalysis.jl
```

```
MethodInstance for issorted(::Vector{Symbol})

2-element Vector{Core.MethodInstance}:
 MethodInstance for unique!(::Vector{Symbol})
 MethodInstance for Parsers._precompile_()
```

```julia
all_backedges(mi)
```

```
530-element Vector{Core.MethodInstance}:
 MethodInstance for (::Main.anonymous.var"#1#5"{Set{String}, String})(::String)
 MethodInstance for (::Core.var"#Type##kw")(::Any, ::Type{LibGit2.RemoteCallbacks})
 MethodInstance for Pkg.API.develop(::Pkg.Types.Context, ::Vector{Pkg.Types.PackageSpec})
 MethodInstance for (::Pkg.Operations.var"#download_artifacts##kw")(::NamedTuple{(:platform, :julia_version, :io), Tuple{Base.BinaryPlatforms.Platform, VersionNumber, Base.TTY}}, ::typeof(Pkg.Operations.download_artifacts), ::Pkg.Types.EnvCache)
 MethodInstance for Pkg.Types.registry_resolve!(::Vector{Pkg.Registry.RegistryInstance}, ::Vector{Pkg.Types.PackageSpec})
 MethodInstance for Pkg.Registry.verify_compressed_registry_toml(::String)
 MethodInstance for Base.CoreLogging.current_logger_for_env(::Base.CoreLogging.LogLevel, ::Nothing, ::Nothing)
 MethodInstance for Pkg.API.var"#precompile#221"(::Bool, ::Bool, ::Base.Pairs{Symbo
```

```julia
using AbstractTrees     # MethodAnalysis.jl defines AbstractTrees.jl methods for tree representation of backedges
print_tree(mi)
```

```
MethodInstance for issorted(::Vector{Symbol})
├─ MethodInstance for unique!(::Vector{Symbol})
│  ├─ MethodInstance for Base.CoreLogging.env_override_minlevel(::Symbol, ::Module)
│  │  └─ MethodInstance for Base.CoreLogging.current_logger_for_env(::LogLevel, ::Symbol, ::Module)
│  │     ├─ MethodInstance for Base.register_root_module(::Module)
│  │     │  └─ MethodInstance for Base._include_from_serialized(::String, ::Vector{Any})
│  │     │     ⋮
│  │     │
│  │     ├─ MethodInstance for Base.include_package_for_output(::PkgId, ::String, ::Vector{String}, ::Vector{String}, ::Vector{String}, ::Vector{Pair{PkgId, UInt64}}, ::Nothing)
│  │     ├─ MethodInstance for Base.include_package_for_output(::PkgId, ::String, ::Vector{String}, ::Vector{String}, ::Vector{String}, ::Vector{Pair{PkgId, UInt64}}, ::String)
│  │     ├─ MethodInstance for Base.Docs.doc!(::Module, ::Binding, ::DocStr, ::Any)
│  │     ├─ MethodInstance for Base.Docs.doc!(::Module, ::Binding, ::DocStr, ::Type{Union{}})
⋮
│  │     │  │
│  │     │  └─ MethodInstance for (::var"#download_verify##kw")(::NamedTuple{(:force, :verbose, :quiet_download), Tuple{Bool, Bool, Bool}}, ::typeof(download_verify), ::String, ::Nothing, ::String)
│  │     │     ⋮
│  │     │
│  │     ├─ MethodInstance for Pkg.PlatformEngines.get_server_dir(::AbstractString, ::SubString{String})
│  │     │  ├─ MethodInstance for Pkg.PlatformEngines.get_server_dir(::AbstractString)
│  │     │  │  ⋮
│  │     │  │
│  │     │  └─ MethodInstance for Pkg.PlatformEngines.get_metadata_headers(::AbstractString)
│  │     │     ⋮
│  │     │
│  │     ├─ MethodInstance for Pkg.PlatformEngines.var"#get_auth_header#9"(::Bool, ::typeof(get_auth_header), ::AbstractString)
│  │     │  └─ MethodInstance for (::var"#get_auth_header##kw")(::NamedTuple{(:verbose,), Tuple{Bool}}, ::typeof(get_auth_header), ::AbstractString)
│  │     │     ⋮
│  │     │
│  │     ├─ MethodInstance for Pkg.PlatformEngines.var"#download_verify#17"(::Bool, ::Bool, 
⋮
│  │     │  └─ MethodInstance for (::var"#download_verify##kw")(::NamedTuple{(:force, :verbose, :quiet_download), Tuple{Bool, Bool, Bool}}, ::typeof(download_verify), ::String, ::String, ::String)
│  │     │     ⋮
│  │     │
│  │     ├─ MethodInstance for Pkg.PlatformEngines.var"#download_verify_unpack#23"(::Nothing, ::Bool, ::Bool, ::Bool, ::Bool, ::IO, ::typeof(download_verify_unpack), ::String, ::String, ::String)
│  │     │  └─ MethodInstance for (::var"#download_verify_unpack##kw")(::NamedTuple{(:ignore_existence, :verbose, :quiet_download, :io), _A} where _A<:Tuple{Bool, Bool, Bool, Any}, ::typeof(download_verify_unpack), ::String, ::String, ::String)
│  │     │     ⋮
│  │     │
│  │     ├─ MethodInstance for Pkg.Artifacts.var"#download_artifact#19"(::Bool, ::Bool, ::IO, ::typeof(download_artifact), ::SHA1, ::String, ::String)
│  │     │  ├─ MethodInstance for (::var"#download_artifact##kw")(::NamedTuple{(:verbose, :quiet_download, :io), _A} where _A<:Tuple{Bool, Bool, 
⋮
│  │     ├─ MethodInstance for Pkg.PlatformEngines.var"#download_verify_unpack#23"(::Nothing, ::Bool, ::Bool, ::Bool, ::Bool, ::TTY, ::typeof(download_verify_unpack), ::String, ::String, ::String)
│  │     │  └─ MethodInstance for (::var"#download_verify_unpack##kw")(::NamedTuple{(:ignore_existence, :verbose, :quiet_download, :io), Tuple{Bool, Bool, Bool, TTY}}, ::typeof(download_verify_unpack), ::String, ::String, ::String)
│  │     │     ⋮
│  │     │
│  │     ├─ MethodInstance for Pkg.Artifacts.var"#download_artifact#19"(::Bool, ::Bool, ::TTY, ::typeof(download_artifact), ::SHA1, ::String, ::String)
│  │     │  └─ MethodInstance for (::var"#download_artifact##kw")(::NamedTuple{(:verbose, :quiet_download, :io), Tuple{Bool, Bool, TTY}}, ::typeof(download_artifact), ::SHA1, ::String, ::Union{Nothing, String})
│  │     │     ⋮
│  │     │
│  │     ├─ MethodInstance for Pkg.Operations.var"#install_archive#32"(::TTY, ::typeof(install_archive), ::Vector{Pair{String, Bool}}, ::SHA1,
```

It can get complicated quickly!
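The difference between `direct_backedges` and `all_backedges` is just a transitive walk over the caller graph. The toy model below is illustrative only. `toy_backedges` and `transitive_callers` are hypothetical names, and nodes are plain `Symbol`s rather than `MethodInstance`s. It collects every caller that would need invalidation if `:leaf` were redefined:

```julia
# Hypothetical toy backedge graph: callee => its direct compiletime callers
toy_backedges = Dict(
    :leaf => [:mid1, :mid2],
    :mid1 => [:top],
    :mid2 => [:top],
    :top  => Symbol[],
)

# Collect every transitive caller (the idea behind `all_backedges`)
function transitive_callers(graph, node, seen = Set{Symbol}())
    for caller in get(graph, node, Symbol[])
        if caller ∉ seen            # shared callers (like :top) are visited once
            push!(seen, caller)
            transitive_callers(graph, caller, seen)
        end
    end
    return seen
end

transitive_callers(toy_backedges, :leaf)   # Set([:mid1, :mid2, :top])
```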

# Recompilation

If you redefine a `Method`, Julia will iterate through the `MethodInstance`s and their backedges and *invalidate* them all.

(For the experts: it does this by capping their *world age* at one less than the current age, making them uncallable in the future.)

The next time you call one of these methods, it must be recompiled:

```julia
f(x) = x^2
t = ntuple(identity, 15)
tstart = time(); map(f, t); time() - tstart
```

```
0.05223703384399414
```

(When you're measuring compile times, `@time` is dangerous because its arguments typically get compiled before the timer starts.)

```julia
tstart = time(); map(f, t); time() - tstart   # fast on the second call!
```

```
0.0007140636444091797
```

```julia
f(x) = x^2            # redefinition
tstart = time(); map(f, t); time() - tstart   # slow again
```

```
0.042424917221069336
```
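Invalidation is also what keeps callers correct after a redefinition. In the sketch below, `sq` and `caller` are hypothetical names. Compiling `caller(::Int)` records a backedge from `sq(::Int)`, so redefining `sq` forces `caller` to be recompiled against the new definition. `Base.get_world_counter()` is an internal function and is used here only to show that the world age advances when a method is defined:

```julia
sq(x) = x^2
caller(x) = sq(x) + 1     # compiling caller(::Int) records a backedge from sq(::Int)
caller(3)                 # 10

w_before = Base.get_world_counter()
sq(x) = x^3               # redefinition: invalidates caller(::Int) through that backedge
w_after = Base.get_world_counter()

caller(3)                 # 28: caller was recompiled against the new sq
```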

## Invalidation in "world-splitting"

But now suppose that, after Julia has compiled code assuming `f(x::Int)` is the only candidate, you define a new method `f(x::String)`. Suddenly the world is different, but the old compiled code still relies on the old assumption.

Outcome: invalidation
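You can watch the assumption change with `Base.return_types`, an internal reflection helper. `wsf` and `call_wsf` are hypothetical names. With one method, inference concludes `Int`. After a second method is added, the same call can return a `String` too, so any code compiled under the old conclusion has to be invalidated:

```julia
wsf(x::Int) = 1
call_wsf(r) = wsf(r[])

call_wsf(Ref{Any}(1))     # compiles call_wsf(::Base.RefValue{Any})
before = Base.return_types(call_wsf, (Base.RefValue{Any},))   # [Int64]

# A new method changes the "world": the Int-only conclusion no longer holds,
# so the compiled call_wsf(::Base.RefValue{Any}) gets invalidated
wsf(x::String) = "two"
after = Base.return_types(call_wsf, (Base.RefValue{Any},))    # now a Union of Int64 and String
```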

Loading new code can invalidate old code. In egregious cases, it can invalidate *its own code for loading the next package*, forcing Julia to recompile the core loading machinery.

Fixing most such cases was one of several contributions to reducing latency in Julia 1.6.

# Precompilation

Backedges have a second role: *precompilation*

Compilation is slow, so we cache the results. Why not make them available to the next session too? => cache to disk

But most `MethodInstance`s depend on a lot of other `MethodInstance`s, and it would be pretty useless if we only saved the top-level calls. Solution: cache the things you depend on too! 

When you "precompile" a package, Julia stores the following for all `Method`s defined in the package, and for any needed `MethodInstance`s that weren't already available:
- lowered code
- type-inferred code

Loading cached `MethodInstance`s takes some time. We can't cache every possible `MethodInstance`, so the package developer has to specify which ones should be cached.



Julia has a `precompile(f, argtypes)` function. It forces type inference, but does not directly save anything to disk: what ends up in the cache is (roughly) whatever has been inferred by the time the package finishes precompiling.

Implication: executing code during package precompilation has similar consequences to calling `precompile`, as far as what ends up in the cache.
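A minimal sketch of both approaches inside a package module is shown below. `MyPkg` is a hypothetical package name. When this file is precompiled as a real package, both `myround(::Float64)` (from the directive) and `myround(::Float32)` (from the workload) have been inferred and become candidates for the cache. The exact caching rules differ between Julia versions:

```julia
module MyPkg   # hypothetical package module

myround(x::Integer) = x
myround(x::AbstractFloat) = round(x)

# Explicit directive: force inference of myround(::Float64); returns true on success
precompile(myround, (Float64,))

# Running a small workload at top level also forces inference, here for Float32
myround(1.5f0)

end # module
```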

# Main takeaways

- method lookup is slow
- knowing all types allows Julia to move lookup to compiletime
- Julia exploits this by specializing methods for specific argument types (many compiled blobs for a single method)
- compilation is slow, so caching is desirable
- Julia's dynamism necessitates cache invalidation and recompilation


## Looking ahead

Compilation presents an opportunity for detailed introspection and analysis.

Recent developments include packages that exploit this to analyze "code quality" and contributions to latency.

The rest of this workshop will teach you how to take advantage of this opportunity.