# Optimizing Performance (Single-Core)

At the heart of fast parallel code must be fast serial code. Parallelism can make good serial code faster, but it can also make bad code even worse. One can write terribly slow code in any language, including Julia. In this notebook we want to understand what makes Julia code slow and how to detect and avoid common pitfalls. This will lead to several concrete performance tips that help you speed up existing Julia code and write more efficient code in the first place.

By far the most common reasons for slow Julia code are

* **break-down of type inference** ("type instabilities")
* **(unnecessary) allocations**

Once you have those under control you might want to care about

* **memory access optimizations** (spatial and temporal locality)
* **SIMD** (single-instruction multiple data)
* etc.

## Type inference

* **Type stability**: A function `f` is type stable if for a given set of input argument types the return type is always the same. In particular, it means that the type of the output of `f` cannot vary depending on the **values** of the inputs.


* **Type instability**: The return type of a function `f` cannot be predicted from the types of the input arguments alone.

Instructive example: `f() = rand((1.23, 100, 1f0, "string"))`

More loosely speaking:
* type instability == `Any`'s appearing unexpectedly in `@code_warntype` output.

In [None]:
f() = rand((1.23, 100, 1f0, "string"))

In [None]:
f()

In [None]:
@code_warntype f()
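
To make the definitions above concrete, here is a minimal sketch that checks type stability programmatically with `@inferred` from the `Test` standard library. The functions `relu_unstable` and `relu_stable` are hypothetical names introduced only for this illustration; they are not part of the notebook.

```julia
using Test  # standard library; provides @inferred

# Hypothetical examples, for illustration only.
# The return type depends on the *value* of x: Float64 if x > 0, Int otherwise.
relu_unstable(x::Float64) = x > 0 ? x : 0

# The return type depends only on the *type* of x.
relu_stable(x::Float64) = x > 0 ? x : zero(x)

# @inferred returns the value if the actual return type matches the inferred one ...
y = @inferred relu_stable(-1.5)

# ... and throws an error if the two differ.
caught = try
    @inferred relu_unstable(-1.5)
    false
catch err
    err isa ErrorException
end
```

The only difference between the two functions is `0` versus `zero(x)`: the former is always an `Int`, the latter has the same type as the input.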

### Example: Global scope

Global variables are a typical cause of type instability. From the compiler's perspective, variables defined in global scope **can change their value and even their type(!) at any time**.

In [None]:
a_glob = 2.0
b_glob = 3.0

f() = 2 * a_glob + b_glob

In [None]:
f()

In [None]:
@code_llvm f()

In [None]:
@code_warntype f()

#### Fix 1: Make globals `const`ant

In [None]:
const a_glob_const = 2.0
const b_glob_const = 3.0

f() = 2 * a_glob_const + b_glob_const

f()

In [None]:
@code_warntype f() # type stable!

In [None]:
@code_llvm f()

This is fast. In fact, it's not just fast, but **as fast as it can be**! Julia has figured out the result of the calculation at compile time.

#### Fix 2: Write self-contained functions

In [None]:
f(a,b) = 2a+b

In [None]:
@code_warntype f(2.0,3.0)
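
The effect of both fixes can also be seen without reading `@code_warntype` output. The following sketch compares the inferred return types of three variants. All names (`scale_glob`, `SCALE_CONST`, `apply_*`) are hypothetical, and `Base.return_types` is an unexported reflection helper, so treat this as an illustration rather than a stable API.

```julia
# Hypothetical names, chosen to avoid clashing with the notebook's definitions.
scale_glob = 2.0             # non-constant global: its type may change at any time
const SCALE_CONST = 2.0      # constant global: the compiler can rely on its type

apply_glob(x)   = scale_glob * x
apply_const(x)  = SCALE_CONST * x
apply_arg(s, x) = s * x      # self-contained: everything comes in as an argument

# Inferred return types for Float64 inputs
rt_glob  = only(Base.return_types(apply_glob,  (Float64,)))
rt_const = only(Base.return_types(apply_const, (Float64,)))
rt_arg   = only(Base.return_types(apply_arg,   (Float64, Float64)))

println((rt_glob, rt_const, rt_arg))
```

Only the variant that reads the non-constant global should fail to infer a concrete return type.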

## Type inference: Avoid abstract field types

A common reason for type inference to break is **fields that are not concretely typed** in `struct`s.

### Example

In [None]:
using BenchmarkTools

In [None]:
struct MyType
    x::Number
    y
end

f(a::MyType) = a.x^2 + sqrt(a.x)

In [None]:
a = MyType(3.0, "test")

@code_warntype f(a);

In [None]:
@btime f($a);

In [None]:
typeof(a)

#### Fix 1: Concrete typing

In [None]:
struct MyTypeConcrete
    x::Float64
    y::String
end

f(b::MyTypeConcrete) = b.x^2 + sqrt(b.x)

In [None]:
b = MyTypeConcrete(3.0, "test")
@code_warntype f(b)

In [None]:
@btime f($b);

#### Fix 2: Type parameters

But what if we want our type to accept any kind of, say, `Number` and `AbstractString`?

In [None]:
struct MyTypeParametric{A<:Number, B<:AbstractString}
    x::A
    y::B
end

f(c::MyTypeParametric) = c.x^2 + sqrt(c.x)

In [None]:
c = MyTypeParametric(3.0, "test")

In [None]:
@code_warntype f(c)

From the type alone the compiler knows what the structure contains and can produce optimal code:

In [None]:
@btime f($c);

In [None]:
c = MyTypeParametric(Float32(3.0), SubString("test"))

In [None]:
@btime f($c);
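
Why the parametric version helps can be checked directly by asking for the field types. This is a small sketch with two hypothetical structs (`PointAbstract`, `PointParametric`) that mirror the abstract and parametric variants discussed above.

```julia
struct PointAbstract               # hypothetical type with an abstractly typed field
    x::Real
end

struct PointParametric{T<:Real}    # hypothetical parametric counterpart
    x::T
end

p_abs = PointAbstract(1.0)
p_par = PointParametric(1.0)

# typeof(p_abs) says nothing about what x actually holds ...
ft_abs = fieldtype(typeof(p_abs), :x)
# ... whereas typeof(p_par) carries the concrete field type as a parameter.
ft_par = fieldtype(typeof(p_par), :x)

println(ft_abs, " concrete? ", isconcretetype(ft_abs))
println(ft_par, " concrete? ", isconcretetype(ft_par))
```

With the parametric struct, a different field type simply yields a different concrete type (for example `PointParametric{Int}`), and the compiler specializes on each one separately.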

## Type inference: Avoid untyped containers

### Example

In [None]:
function f(n)
    numbers = []
    for i in 1:n
        push!(numbers, i)
    end
    sum(numbers)
end

@btime f(100);

In [None]:
@code_warntype f(100)

In [None]:
typeof([])

#### Fix: Provide element type

In [None]:
function f_fixed(n)
    numbers = Int[]
    for i in 1:n
        push!(numbers, i)
    end
    sum(numbers)
end

@btime f_fixed(100);

In [None]:
@code_warntype f_fixed(100)
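
Hard-coding `Int[]` works when the element type is known up front. When it depends on the input, one option is to derive it with `eltype`. The helper `positive_squares` below is a hypothetical example and assumes that `x^2` has the same type as `x`, which holds for the usual machine number types.

```julia
# Hypothetical helper: collect the squares of all positive entries of xs.
function positive_squares(xs)
    T = eltype(xs)       # element type of the input, e.g. Int or Float64
    out = T[]            # empty, concretely typed Vector{T}
    for x in xs
        if x > 0
            push!(out, x^2)
        end
    end
    return out
end

ints   = positive_squares([-1, 2, 3])
floats = positive_squares([1.5, -2.0])
println(typeof(ints), " ", typeof(floats))
```

If the input itself is an untyped container (element type `Any`), the output will be a `Vector{Any}` as well, so the fix has to start where the data is created.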

## Type inference: Avoid changing variable types

A local variable should keep the same type throughout a function.

### Example

In [None]:
function f()
    x = 1
    for i = 1:10
        x /= rand()
    end
    return x
end

In [None]:
@code_warntype f()

(On a side note: since the type can only vary between `Float64` and `Int64`, Julia can still produce reasonable code by [*union splitting*](https://julialang.org/blog/2018/08/union-splitting).)

#### Fix: Initialize with correct type

In [None]:
function f()
    x = 1.0
    for i = 1:10
        x /= rand()
    end
    return x
end

In [None]:
@code_warntype f()
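
In generic code the "correct type" is often not known when writing the function. A common approach is to initialize from the input with `zero` and `eltype`. The two accumulators below are hypothetical and only meant as a sketch; `Base.return_types` is an unexported reflection helper used here for illustration.

```julia
# Hypothetical accumulators, for illustration only.
function total_unstable(xs)
    s = 0                     # always an Int, regardless of eltype(xs)
    for x in xs
        s += x                # s may change type here, e.g. to Float64
    end
    return s
end

function total_stable(xs)
    s = zero(eltype(xs))      # additive identity of the element type
    for x in xs
        s += x
    end
    return s
end

rt_unstable = only(Base.return_types(total_unstable, (Vector{Float64},)))
rt_stable   = only(Base.return_types(total_stable,   (Vector{Float64},)))
println((rt_unstable, rt_stable))
```

For an empty `Vector{Float64}`, `total_unstable` returns the integer `0`, which is exactly why its return type cannot be a single concrete type.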

## Type inference: Isolate unavoidable type instabilities

Type instabilities can occur very naturally, for example when reading unknown user files or user input. Hence, not every instability can be avoided.

If that's the case, isolate your expensive computation from the instability by putting it in a separate *kernel function* (also known as introducing a *function barrier*).

In [None]:
data = [rand((2, 3.4, 5.6f0, "7.8")) for i in 1:100] # random heterogeneous input data

In [None]:
function expsin_all(data)
    result = zeros(length(data))
    for i in eachindex(data)
        x = data[i]
        if x isa AbstractString
            x = parse(Float64, x)
        end
        result[i] = exp(sin(x))
    end
    return result
end

In [None]:
@code_warntype expsin_all(data)

In [None]:
@btime expsin_all($data);

In [None]:
function expsin_all_barrier(data)
    result = zeros(length(data))
    for i in eachindex(data)
        x = data[i]
        result[i] = _expsin_kernel(x)
    end
    return result
end

# function barrier
_expsin_kernel(x::Number)::Float64 = exp(sin(x))
_expsin_kernel(x::AbstractString)::Float64 = exp(sin(parse(Float64, x)))

In [None]:
@btime expsin_all_barrier($data);

In [None]:
@code_warntype _expsin_kernel(data[1])

Note that the computational kernel function is fully type inferred.

In [None]:
@code_warntype expsin_all_barrier(data)
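
The same pattern applies when the instability comes from a type that is only chosen at run time rather than from heterogeneous data. The following minimal sketch is in the spirit of the function-barrier example in the Julia manual's performance tips; `fill_twos!` and `make_twos` are hypothetical names.

```julia
# Kernel: compiled separately for each concrete array type it is called with.
function fill_twos!(a)
    for i in eachindex(a)
        a[i] = 2
    end
    return a
end

# Outer function: the element type is decided at run time, so `a` is not
# concretely inferred here.
function make_twos(use_ints::Bool, n::Integer)
    T = use_ints ? Int : Float64
    a = Vector{T}(undef, n)
    return fill_twos!(a)      # function barrier: the loop runs in specialized code
end

println(make_twos(true, 3))
println(make_twos(false, 3))
```

The outer function stays type-unstable, but the cost is limited to the single call across the barrier instead of being paid in every loop iteration.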

## Side note: [JET.jl](https://github.com/aviatesk/JET.jl)

JET.jl is a **static** code analyzer, i.e. it doesn't execute the code!

Important macros:
* `@report_opt`: check for potential optimization problems ([optimization analysis](https://aviatesk.github.io/JET.jl/stable/optanalysis/))
* `@report_call`: check for potential (general) errors ([error analysis](https://aviatesk.github.io/JET.jl/stable/jetanalysis/))

In [None]:
using JET

In [None]:
@report_opt expsin_all(data)

# Core messages of this Notebook

* **Wrap code in self-contained functions** in performance-critical applications, i.e. avoid global scope.
* Write **type-stable code** (check with `@code_warntype` or `Cthulhu.jl`).
* **Types should always have concrete fields.** If you don't know them in advance, use type parameters.
* Isolate unavoidable type instabilities (function barrier).