In [27]:
using BenchmarkTools
using StaticArrays
using Random

### Benchmarking Code

Julia is just-in-time (JIT) compiled, so the first time you run a function Julia must compile it before it can run, which adds a one-time penalty. We can use the `@time` macro to see this effect. (If you run the cell more than once, the effect will go away, since the code has already been compiled!)

In [2]:
x = rand(1000);
function sum_global()
    s = 0.0
    for i in x
        s += i
    end
    return s
end;

In [3]:
@time sum_global();
@time sum_global();

  0.010468 seconds (3.68 k allocations: 78.109 KiB, 98.42% compilation time)
  0.000129 seconds (3.49 k allocations: 70.156 KiB)


The `@time` macro is convenient, but it only executes the function once. The `BenchmarkTools.jl` library lets us measure the performance of a function over many trials. By default, the `@benchmark` macro collects up to 10,000 samples or runs for 5 seconds, whichever limit is reached first (both limits can be configured). There is also the `@btime` macro, which runs the same trials as `@benchmark` but prints only the minimum time and the allocations, in a format similar to `@time`.
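A minimal sketch of configuring a run, assuming `BenchmarkTools.jl` is installed: the limits can be passed as keyword arguments after the benchmarked expression. The input `data` is only an illustrative placeholder, and `samples=500 seconds=1` are arbitrary example values.

```julia
using BenchmarkTools

data = rand(1000)   # illustrative input

# Override the defaults for this run only: at most 500 samples or 1 second.
trial = @benchmark sum($data) samples=500 seconds=1

# A Trial stores every sample (in nanoseconds); minimum summarizes them.
println(minimum(trial))

# @belapsed returns the minimum time in seconds instead of printing it.
t_min = @belapsed sum($data)
```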

In [4]:
@benchmark sum_global()

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  74.100 μs …  2.955 ms  ┊ GC (min … max): 0.00% … 95.43%
 Time  (median):     77.100 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   79.154 μs ± 54.491 μs  ┊ GC (mean ± σ):  2.33% ±  3.51%

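If your function takes arguments, "interpolate" them with a `$` when using `@benchmark` or `@btime`. Interpolation evaluates the argument once, before benchmarking starts, and splices the resulting value into the benchmark, so the time and allocations needed to create the argument are not included in the measurement.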

In [16]:
# Parameter is interpolated: rand(6,6) is evaluated once, before timing
@btime inv($(rand(6,6)));
# No interpolation: every sample also times the call to rand(6,6)
@btime inv(rand(6,6));

  1.020 μs (4 allocations: 3.59 KiB)
  1.370 μs (5 allocations: 3.95 KiB)
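The same rule applies to variables. In the sketch below (assuming `BenchmarkTools.jl` is installed; the matrix `A` is an illustrative placeholder), `A` is a non-constant global, so without `$` every sample also includes looking up a global whose type the compiler cannot assume. `@belapsed` returns the minimum time in seconds rather than printing it.

```julia
using BenchmarkTools
using LinearAlgebra   # provides inv for matrices

A = rand(6, 6)   # non-constant global variable (illustrative)

# Without $, each sample also pays for the dynamic lookup of the global A.
t_global = @belapsed inv(A)

# With $, the value of A is spliced in once, so only inv itself is timed.
t_interp = @belapsed inv($A)

println("global: ", t_global, " s   interpolated: ", t_interp, " s")
```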


### Reducing Memory Allocations
One of the easiest ways to speed up your code is to remove unnecessary memory allocations. In Python, the end user cannot easily manage memory, and the interpreter or library (e.g. NumPy) is trusted to handle it.

Julia can also operate like Python, where we just trust the compiler to "do the right thing". This is good for prototyping, but can result in inefficient code. Below we will look at the basics of memory allocations and how to reduce them.

-----------------------------------------------
Let's start by allocating a small array of 10,000 Float64s (a 100×100 matrix). Each Float64 is 8 bytes (64 bits), so the total size should be 80,000 bytes, or 80 kB (78.125 KiB). Note that each allocation carries a slight overhead for internal bookkeeping (e.g. the array object itself, which records the dimensions and points to the data).

In [23]:
@time ones(Float64, 100, 100);

  0.000011 seconds (2 allocations: 78.172 KiB)
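The arithmetic can be checked directly. `sizeof` on an array reports only the bytes of element data, so it gives exactly the 80,000 bytes computed above and leaves out the small bookkeeping overhead that `@time` includes.

```julia
A = ones(Float64, 100, 100)

n_elems = length(A)    # 100 * 100 = 10_000 elements
n_bytes = sizeof(A)    # bytes of element data: 10_000 * 8
kib = n_bytes / 1024   # convert bytes to KiB

println(n_elems, " elements, ", n_bytes, " bytes, ", kib, " KiB")
# prints: 10000 elements, 80000 bytes, 78.125 KiB
```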


Julia's built-in arrays are `heap` allocated: their size is only known at run time, and one-dimensional arrays can even grow (e.g. with `push!`). Heap allocation is comparatively slow, because each allocation has to request memory and the garbage collector later has to reclaim it. The first strategy to mitigate this penalty is simply to allocate memory as rarely as possible. To check how often a function allocates, we can use the `BenchmarkTools.jl` library.

In [28]:
function allocates_alot()
    res = 0.0
    for i in 1:100
        x = rand(100,100) # ALLOCATION EVERY LOOP ITERATION
        res += sum(x)
    end
    return res
end

function allocates_once()
    res = 0.0
    storage = zeros(100,100)
    for i in 1:100
        # By convention, a trailing ! means the function mutates its argument.
        # Here rand! overwrites storage with new random values.
        rand!(storage)
        res += sum(storage)
    end
    return res
end


allocates_once (generic function with 1 method)

Using the `@btime` macro, we can measure the performance difference and the number of allocations of each function. The function that allocates a new array on every loop iteration is slower and allocates far more memory.

In [30]:
@btime allocates_alot();
@btime allocates_once();

  1.003 ms (200 allocations: 7.63 MiB)
  407.100 μs (2 allocations: 78.17 KiB)
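`allocates_once` keeps its buffer local to the function. A common extension of the same idea, sketched below with hypothetical function names (`scaled`, `scaled!`, `alloc_new`, `alloc_inplace`), is to let the caller allocate the output once and pass it to a mutating function that follows the `!` naming convention. `@allocated` reports the bytes allocated while evaluating an expression; the measurements are wrapped in functions so the globals `x` and `out` do not distort them.

```julia
# Hypothetical function names, used only for illustration.

# Allocating version: builds and returns a new array on every call.
scaled(x, a) = a .* x

# In-place version: by convention the trailing ! means `out` is overwritten.
function scaled!(out, x, a)
    out .= a .* x   # fused broadcast writes directly into `out`
    return out
end

# Measure inside functions so non-constant globals don't add overhead.
function alloc_new(x)
    scaled(x, 2.0)                    # warm-up call
    return @allocated scaled(x, 2.0)
end

function alloc_inplace(out, x)
    scaled!(out, x, 2.0)              # warm-up call
    return @allocated scaled!(out, x, 2.0)
end

x = rand(1000)
out = similar(x)    # allocate the output buffer once, outside any loop

b_new = alloc_new(x)
b_inplace = alloc_inplace(out, x)
println("allocating: ", b_new, " bytes   in-place: ", b_inplace, " bytes")
```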


A more subtle, yet common, source of allocations is array slicing (e.g. `arr[1:10]`). In Julia, a slice allocates a new array and copies the data into it by default. We can avoid this by asking Julia for an `array view`, which refers to the existing data in `arr` instead of copying it. A view can be created with the `view(...)` function or the `@views` macro.
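A view is not read-only: because it refers to the parent's memory instead of copying it, writing through a view also changes the original array. The sketch below (the array `a` is only an illustration) contrasts a slice with a view.

```julia
a = collect(1.0:6.0)   # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

s = a[1:3]             # slice: a new array holding a copy of the data
s[1] = 100.0           # modifies only the copy
println(a[1])          # 1.0, the parent is unchanged

v = view(a, 1:3)       # view: a SubArray that refers to a's memory
v[1] = 100.0           # writes through to the parent
println(a[1])          # 100.0

@views w = a[4:6]      # @views turns the slice into a view
println(w isa SubArray)   # true
```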

In [33]:
function slices_allocate(x)
    N = length(x) ÷ 2   # integer division; also works for odd lengths
    res = sum(x[1:N])
    return res
end

function view_slices(x)
    N = length(x) ÷ 2
    # The view function takes the array
    # and the indices you want a view of.
    res = sum(view(x, 1:N))
    return res
end

function view_macro_slices(x)
    N = length(x) ÷ 2
    # The @views macro tells Julia that all array slices in
    # this line should be array views.
    @views res = sum(x[1:N])
    return res
end

view_macro_slices (generic function with 1 method)

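Again we see a large speed-up, and the number of allocations drops! (The single 16-byte allocation that remains in the view versions most likely comes from passing the non-constant global `x` without `$` interpolation; benchmarking `view_slices($x)` instead should remove it.)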

In [36]:
x = rand(100000)
@btime slices_allocate(x);
@btime view_slices(x);
@btime view_macro_slices(x);

  37.000 μs (3 allocations: 390.69 KiB)
  5.333 μs (1 allocation: 16 bytes)
  5.317 μs (1 allocation: 16 bytes)


### StaticArrays
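Earlier we mentioned that Julia arrays are `heap` allocated. If you know the size of your array in advance, you can use the `StaticArrays.jl` library, whose fixed-size arrays are immutable values that the compiler can keep on the `stack` (or even in registers). This only pays off for small arrays: the StaticArrays.jl documentation suggests using a normal `Array` for anything larger than roughly 100 elements.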
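A minimal sketch, assuming `StaticArrays.jl` is installed: `SVector` and `@SMatrix` build fixed-size arrays whose size is part of the type. The values are illustrative. `isbits` checks whether a value is plain immutable data, which is what makes stack allocation possible; it does not by itself prove where the compiler placed the value.

```julia
using StaticArrays
using LinearAlgebra   # for norm

# The length is part of the type: v isa SVector{3, Float64}
v = SVector(1.0, 2.0, 3.0)
M = @SMatrix [2.0 0.0 0.0; 0.0 3.0 0.0; 0.0 0.0 4.0]

w = M * v      # the product is again an SVector; its size is known at compile time
n = norm(w)

# Static arrays are plain immutable bits; an ordinary Vector is a heap object.
println(isbits(v))                 # true
println(isbits([1.0, 2.0, 3.0]))   # false
println(w)
```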

### Type Stability
Julia will do its best to infer the data types of your variables. If the compiler cannot infer a concrete type for a variable, the code is type unstable: Julia has to check types and choose methods at run time, which slows the code down and often causes extra allocations.
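Two common sources of type instability are sketched below, with hypothetical function names: a variable that changes type inside a loop (`s = 0` is an `Int`, but adding `Float64`s turns it into a `Float64`), and a function that reads a non-constant global, as `sum_global` does with `x` earlier in this notebook. `Base.return_types` lists what the compiler can infer about a function's return value; in an interactive session, `@code_warntype sum_unstable(rand(10))` highlights the same problem. Declaring the global `const` or passing it as an argument fixes the second case.

```julia
# Hypothetical names, used only for illustration.

# Unstable: s starts as an Int and becomes a Float64 after the first addition.
function sum_unstable(x)
    s = 0
    for v in x
        s += v
    end
    return s
end

# Stable: s starts with the element type of x, so it never changes type.
function sum_stable(x)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    return s
end

# Non-constant global: the same situation as `x` in sum_global.
data = rand(1000)
function sum_of_global()
    s = 0.0
    for v in data
        s += v
    end
    return s
end

# Inferred return types for a Vector{Float64} input (or no input).
println(Base.return_types(sum_unstable, (Vector{Float64},)))   # inferred: Union{Float64, Int}
println(Base.return_types(sum_stable, (Vector{Float64},)))     # inferred: Float64
println(Base.return_types(sum_of_global, ()))                  # inferred: Any
```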