# Tools to assess the performance of our functions
Main tools I use every day:
* [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl) to benchmark the speed and memory allocations of our kernels
* Profilers to find bottlenecks (e.g. [ProfileCanvas.jl](https://github.com/pfitzseb/ProfileCanvas.jl))
* `@code_warntype` to check whether our code is type stable
* [Cthulhu.jl](https://github.com/JuliaDebug/Cthulhu.jl) to descend interactively through the type-inferred code of a call and its callees

Others:
* [JET.jl](https://github.com/aviatesk/JET.jl) to detect type instabilities and potential errors by static analysis
* [Aqua.jl](https://github.com/JuliaTesting/Aqua.jl) to run automated quality checks on packages (e.g. method ambiguities)

# BenchmarkTools.jl

```julia
using BenchmarkTools
f(x, n) = x^n
@btime f(1.2, 3.2)
```

```
  0.791 ns (0 allocations: 0 bytes)
1.7921732359737277
```

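Be aware that the sub-nanosecond timing above is most likely an artifact: with literal arguments the compiler can evaluate (constant-fold) the call at compile time, so the benchmark measures almost nothing. To benchmark a function whose arguments are stored in (global) variables, interpolate them with the dollar symbol `$`; otherwise they are treated as untyped global variables and the timing includes the cost of accessing them: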

```julia
x, n = 1.2, 3.2
@btime f($x, $n)
```

```
  10.803 ns (0 allocations: 0 bytes)
1.7921732359737277
```
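A few more BenchmarkTools features help avoid misleading numbers; the block below is a sketch, not an exhaustive guide. Even interpolated values can sometimes be constant-folded, and the BenchmarkTools manual recommends hiding them behind a `Ref` that is dereferenced inside the benchmarked expression. The `setup` parameter creates fresh inputs outside the timed region, and `@belapsed` / `@ballocated` return the minimum time (in seconds) and the allocated memory (in bytes) as plain numbers, which is handy in scripts. The variable names `t_ref`, `trial` and `bytes` are just illustrative.

```julia
using BenchmarkTools

f(x, n) = x^n

# Wrapping the values in a Ref and dereferencing it inside the benchmarked
# expression hides them from the compiler, so the call cannot be constant-folded.
t_ref = @belapsed f($(Ref(1.2))[], $(Ref(3.2))[])   # minimum time, in seconds

# `setup` draws new random inputs for every sample, outside the timed region.
trial = @benchmark f(x, n) setup=(x = rand(); n = 3 * rand()) seconds=1

# Minimum memory allocated by one call, in bytes (none for this scalar function).
bytes = @ballocated f($(Ref(1.2))[], $(Ref(3.2))[])

println("time: ", t_ref * 1e9, " ns, allocated: ", bytes, " bytes")
display(trial)
```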

# Type instabilities
A type instability occurs when the compiler cannot predict (infer) at compile time the type of a variable from the types of the function arguments.
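As a minimal illustration (the functions `unstable` and `stable` are hypothetical), the return type of `unstable` depends on the *value* of its argument, not only on its type. `Base.return_types` reports what inference concludes for given argument types, without running the function:

```julia
# Returns an Int for positive inputs and a Float64 otherwise, so the compiler
# can only infer Union{Int64, Float64}: a type instability.
unstable(x::Int) = x > 0 ? x : 0.0

# Type-stable variant: both branches return a Float64.
stable(x::Int) = x > 0 ? float(x) : 0.0

# Base.return_types gives one inferred return type per matching method.
rt_unstable = only(Base.return_types(unstable, (Int,)))
rt_stable = only(Base.return_types(stable, (Int,)))

println("unstable: ", rt_unstable)
println("stable:   ", rt_stable)
```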

Consider a function that sums the elements of a container:

```julia
function foo(x)
    v = 0.0
    for i in eachindex(x)
        v += x[i]
    end
    v
end
```

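If we benchmark this function with a heterogeneous tuple (mixing `Int64`, `Float64` and `Float32` elements), we observe an unexpected behaviour: the call allocates memory, although `foo` creates no arrays.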

```julia
x = (1, 0.5, 34, 213.23f0)
@btime foo($x)
```

```
  63.903 ns (6 allocations: 224 bytes)
248.72999572753906
```

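The most direct way to detect whether, and where, a function has type instabilities is the `@code_warntype` macro from the `InteractiveUtils` standard library. It prints the inferred type of every variable and intermediate value; non-concrete types are shown in uppercase in the output below (in yellow or red in a color terminal). The culprit is `Base.getindex(x, i)::UNION{FLOAT32, FLOAT64, INT64}`: with an index `i` known only at runtime, `x[i]` can be an `Int64`, a `Float64` or a `Float32`. The `UNION{NOTHING, TUPLE{INT64, INT64}}` of the loop state `@_3` is the normal result of `iterate` and is harmless.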

```julia
using InteractiveUtils
@code_warntype foo(x)
```

```
MethodInstance for Main.var"##557".foo(::Tuple{Int64, Float64, Int64, Float32})
  from foo(x) @ Main.var"##557" ~/Desktop/Seminars/JuliaMainz23/basic_perftools.ipynb:1
Arguments
  #self#::Core.Const(Main.var"##557".foo)
  x::Tuple{Int64, Float64, Int64, Float32}
Locals
  @_3::UNION{NOTHING, TUPLE{INT64, INT64}}
  v::Float64
  i::Int64
Body::Float64
1 ─       (v = 0.0)
│   %2  = Main.var"##557".eachindex(x)::Core.Const(Base.OneTo(4))
│         (@_3 = Base.iterate(%2))
│   %4  = (@_3::Core.Const((1, 1)) === nothing)::Core.Const(false)
│   %5  = Base.not_int(%4)::Core.Const(true)
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Int64, Int64}
│         (i = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Int64
│   %10 = v::Float64
│   %11 = Base.getindex(x, i)::UNION{FLOAT32, FLOAT64, INT64}
│         (v = %10 + %11)
│         (@_3 = Base.iterate(%2, %9))
│   %14 = (@_3 === nothing)::Bool
│   %15 = Base.not_int(%14)::Bool
└──       goto #4 if not %15
3 ─       goto #2
4 ┄       return v
```
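`@code_warntype` is meant for interactive inspection. In a test suite one can use `@inferred` from the `Test` standard library, which throws an error when the type of the returned value differs from the inferred return type. A sketch of its main limitation: it only checks the *return* type, so `foo` passes (its `Body` is inferred as `Float64`) even though `x[i]` is unstable inside the loop, while calling `getindex` directly on the mixed tuple fails.

```julia
using Test

function foo(x)
    v = 0.0
    for i in eachindex(x)
        v += x[i]
    end
    v
end

x = (1, 0.5, 34, 213.23f0)

# Passes: the return type of foo is inferred as Float64, so the internal
# instability of x[i] goes unnoticed by @inferred.
@inferred foo(x)

# Fails as expected: with a runtime Int index, getindex on this tuple is inferred
# as Union{Float32, Float64, Int64}, but x[2] is a Float64.
@test_throws ErrorException @inferred getindex(x, 2)
```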

If the argument is a homogeneous container, for example a tuple of `Float64`s, `getindex` is inferred as `Float64` and the instability disappears:

```julia
x = (1.0, 0.5, 34.0, 213.23)
@code_warntype foo(x)
```

```
MethodInstance for Main.var"##557".foo(::NTuple{4, Float64})
  from foo(x) @ Main.var"##557" ~/Desktop/Seminars/JuliaMainz23/basic_perftools.ipynb:1
Arguments
  #self#::Core.Const(Main.var"##557".foo)
  x::NTuple{4, Float64}
Locals
  @_3::UNION{NOTHING, TUPLE{INT64, INT64}}
  v::Float64
  i::Int64
Body::Float64
1 ─       (v = 0.0)
│   %2  = Main.var"##557".eachindex(x)::Core.Const(Base.OneTo(4))
│         (@_3 = Base.iterate(%2))
│   %4  = (@_3::Core.Const((1, 1)) === nothing)::Core.Const(false)
│   %5  = Base.not_int(%4)::Core.Const(true)
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Int64, Int64}
│         (i = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Int64
│   %10 = v::Float64
│   %11 = Base.getindex(x, i)::Float64
│         (v = %10 + %11)
│         (@_3 = Base.iterate(%2, %9))
│   %14 = (@_3 === nothing)::Bool
│   %15 = Base.not_int(%14)::Bool
└──       goto #4 if not %15
3 ─       goto #2
4 ┄       return v
```
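When the data really arrive as a mixed tuple, one possible fix (a sketch, not the only option) is to convert everything to a common type before the hot loop, e.g. with `promote`, and then check that the call no longer allocates. `@allocated` is called inside a small helper, `measure` (a hypothetical name), so that untyped global variables do not add allocations of their own.

```julia
function foo(x)
    v = 0.0
    for i in eachindex(x)
        v += x[i]
    end
    v
end

x_mixed = (1, 0.5, 34, 213.23f0)

# promote converts all elements to a common type; here every element becomes a Float64.
x_homog = promote(x_mixed...)

# Hypothetical helper: measure the bytes allocated by one call inside a function,
# so that access to global variables does not distort the measurement.
measure(x) = @allocated foo(x)

measure(x_homog)               # first call: may include compilation
bytes_homog = measure(x_homog) # second call: only the run itself
println(typeof(x_homog), " => ", bytes_homog, " bytes allocated")
```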



# Profilers
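Profilers help you find bottlenecks in your code. Julia's built-in profiler is a sampling profiler: while the code runs, it periodically records the call stack, so the functions that appear in many samples are where most of the time is spent.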

```julia
f1(x, y, n1, n2, a, b, c) = x^n1 * y^(n2 - 1) * exp((a + b) / c)
function f1()
    A = zeros(64, 64)
    for i in eachindex(A)
        x, y, n1, n2, a, b, c = rand(7)
        A[i] = f1(x, y, n1, n2, a, b, c)
    end
    return A
end
@btime f1();
```

```
  236.042 μs (4098 allocations: 480.05 KiB)
```


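As you can see, the performance is not great: a single call makes 4098 allocations to fill one 64×64 matrix. Let's see where the bottleneck is with the profiler. We run `f1` 100 times so that the sampling profiler collects enough samples: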

```julia
using ProfileCanvas
ProfileCanvas.@profview for _ in 1:100 f1() end
```
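ProfileCanvas.jl draws an interactive flame graph of the samples collected by Julia's built-in profiler. Where no graphical display is available, the same data can be printed as text with the `Profile` standard library. A minimal sketch, re-defining `f1` so the block runs on its own; the report format depends on the Julia version.

```julia
using Profile

f1(x, y, n1, n2, a, b, c) = x^n1 * y^(n2 - 1) * exp((a + b) / c)
function f1()
    A = zeros(64, 64)
    for i in eachindex(A)
        x, y, n1, n2, a, b, c = rand(7)
        A[i] = f1(x, y, n1, n2, a, b, c)
    end
    return A
end

f1()                              # compile first, so compilation is not profiled
Profile.clear()                   # discard samples from earlier runs
@profile for _ in 1:100 f1() end  # collect stack samples while the loop runs

# Flat report: one line per function, sorted by the number of samples it appears in.
Profile.print(format = :flat, sortedby = :count)
```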

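Most of these allocations come from `rand(7)`, which creates a new 7-element `Vector` on the heap in every iteration (4096 iterations for a 64×64 matrix). In `f2` the seven numbers are generated as a tuple with `ntuple` instead; `Val(7)` makes the length known at compile time, so no heap allocation is needed:

```julia
function f2()
    A = zeros(64, 64)
    for i in eachindex(A)
        # NTuple{7, Float64}: no heap-allocated Vector per iteration
        x, y, n1, n2, a, b, c = ntuple(_ -> rand(), Val(7))
        A[i] = f1(x, y, n1, n2, a, b, c)
    end
    return A
end
@btime f2();
```

```
  134.958 μs (2 allocations: 32.05 KiB)
```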
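The difference can be isolated with a small experiment. The helpers `draw_vector`, `draw_tuple` and `count_bytes` below are hypothetical names; the first two generate seven random numbers in the two ways used by `f1` and `f2`, and only the `Vector` version should allocate. As before, `@allocated` is wrapped in a function so that global-scope effects do not pollute the measurement.

```julia
# Seven random numbers in a freshly allocated Vector{Float64} (as in f1).
draw_vector() = sum(rand(7))

# Seven random numbers in an NTuple{7, Float64} (as in f2); Val(7) makes the
# length a compile-time constant, so the tuple type is fully inferred.
draw_tuple() = sum(ntuple(_ -> rand(), Val(7)))

# Bytes allocated by one call of g; g is called here, so Julia specializes on it.
count_bytes(g) = @allocated g()

count_bytes(draw_vector); count_bytes(draw_tuple)   # warm up: exclude compilation
println("Vector: ", count_bytes(draw_vector), " bytes")
println("Tuple:  ", count_bytes(draw_tuple), " bytes")
```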


We can profile `f2` in the same way:

```julia
ProfileCanvas.@profview for _ in 1:100 f2() end
```

---

*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*