# Chapter 11: Performance Management
This notebook contains the sample source code explained in the book *Hands-On Julia Programming, Sambit Kumar Dash, 2021, bpb Publications. All Rights Reserved*.


In [1]:
using Pkg
pkg"activate ."
pkg"instantiate"

[32m[1m  Activating[22m[39m environment at `~/work/books/HOPJ/Chapter-11/Project.toml`


## 11.1 Introduction

Optimization is an art. No single optimization technique works across all domains.

### The Right Level of Optimization

Sometimes you need to decide on a trade-off between optimization and accuracy, and the right balance can be domain specific. In the example below, we compute the value of `sin θ` by approximating `sin δ` and `cos δ` for a small angle `δ=15°`. We use the formula
```sin (A+B) = sin A cos B + cos A sin B``` to step through the angles from 0° to 90° in six increments of 15°.

In [2]:
n = 6
δ = pi/180*(90/n)
sinδ, cosδ = δ, (1 - δ*δ/2) # <--- Taylor series approximation: sin δ ≈ δ, cos δ ≈ 1 - δ²/2

(0.2617993877991494, 0.9657305402739953)

In [3]:
m = Matrix{Float64}(undef, (n+1, 2))
m[1, 1], m[1, 2] = 0.0, 1.0
for i = 1:n
    m[i+1, 1] = m[i, 1]*cosδ + m[i, 2]*sinδ
    m[i+1, 2] = m[i, 2]*cosδ - m[i, 1]*sinδ
end
m

7×2 Matrix{Float64}:
 0.0        1.0
 0.261799   0.965731
 0.505655   0.864097
 0.714547   0.702104
 0.87387    0.490976
 0.97246    0.245371
 1.00337   -0.0176268

We compare the output obtained earlier with the computation from the Julia functions `sin` and `cos`. 

In [4]:
m1 = [[sin(i*δ) for i=0:n] [cos(i*δ) for i=0:n]] 

7×2 Matrix{Float64}:
 0.0       1.0
 0.258819  0.965926
 0.5       0.866025
 0.707107  0.707107
 0.866025  0.5
 0.965926  0.258819
 1.0       6.12323e-17

Comparing the mean and standard deviation of the differences between the two tables shows that the approximation is a reasonable one.

In [5]:
using Statistics
mean(m1-m), std(m1-m)

(0.0009570353971457324, 0.00784842483267987)
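To see how the trade-off moves with the step size, the same recurrence can be wrapped in a function and its worst-case error compared against `sin` for different values of `n`. This is a minimal sketch; the names `approx_sin_table` and `max_sin_error` are illustrative and do not come from the book.

```julia
# Build the sine column of the table with the same recurrence as above,
# using the small-angle approximations sin δ ≈ δ and cos δ ≈ 1 - δ²/2.
function approx_sin_table(n::Integer)
    δ = pi / 180 * (90 / n)
    sinδ, cosδ = δ, 1 - δ * δ / 2
    s, c = 0.0, 1.0                    # sin 0° and cos 0°
    sines = Float64[s]
    for _ in 1:n
        # Both right-hand sides use the previous s and c.
        s, c = s * cosδ + c * sinδ, c * cosδ - s * sinδ
        push!(sines, s)
    end
    return sines
end

# Largest absolute deviation from Base's sin over the n + 1 table entries.
function max_sin_error(n::Integer)
    δ = pi / 180 * (90 / n)
    exact = [sin(i * δ) for i in 0:n]
    return maximum(abs.(approx_sin_table(n) .- exact))
end

for n in (6, 18, 90)
    println("n = ", n, "  max |error| = ", max_sin_error(n))
end
```

A smaller step gives a smaller error at the cost of a longer table, which is the kind of domain-specific choice this section describes.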

### Resources

However many resources you have, there will always be a problem that demands far more than you can spend.

### The Choice of Algorithm

Always use the most effective algorithm for your problem. The most effective algorithm may not be the one with the best asymptotic performance. 

### Optimize Wisely

Optimizing every piece of code is a waste of time and energy. Judiciously choose the code that will give the best returns for the time you spend fixing it. Profiling is a great way to discover such code. 

### Julia vs. Competition

Julia outperforms the competition in many micro-benchmark tests performed. The performance report can be obtained from https://julialang.org/benchmarks/

## 11.2 Benchmarking

The `sin(x)` function assumes that `x` is in radians. Here we use a function `sindeg(x)` where `x` is in degrees. We also evaluate whether to use the `sin(x)` function from Julia Base or to approximate it with the table lookup scheme discussed above.

We use `BenchmarkTools` to evaluate the time and space performance of the code. If the package is not installed, you can install the package using the command: ``` julia> ]add BenchmarkTools ```. 

In [6]:
using BenchmarkTools

In [7]:
deg2rad(x) = pi/180.0*x
sindeg = sin ∘ deg2rad
cosdeg = cos ∘ deg2rad

cos ∘ deg2rad

`@benchmark` runs the code many times, collects timing samples, and reports statistics over them. The minimum time is usually the best estimate of the code's intrinsic cost, because it is the sample least affected by system overhead and noise.

In [8]:
@benchmark sindeg(52.0)

BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     25.044 ns (0.00% GC)
  median time:      34.075 ns (0.00% GC)
  mean time:        41.921 ns (1.23% GC)
  maximum time:     2.125 μs (96.15% GC)
  --------------
  samples:          10000
  evals/sample:     994

`@btime` reports its result in the style of `@time`: it runs the function over multiple samples and prints the minimum elapsed time and the allocations for a single call. We shall be using `@btime` in most of our examples.

In [9]:
@btime sindeg(52.0)

  24.716 ns (1 allocation: 16 bytes)


0.788010753606722
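The statistics printed by `@benchmark` can also be read programmatically, which helps when comparing two implementations in a script. The sketch below assumes `BenchmarkTools` is installed as described above; `to_radians_sketch`, `sindeg_sketch` and `angle_deg` are illustrative names, and the `samples` and `evals` values are arbitrary choices that keep the run short.

```julia
using BenchmarkTools
using Statistics

to_radians_sketch(x) = pi / 180.0 * x          # same formula as deg2rad above
sindeg_sketch(x) = sin(to_radians_sketch(x))

angle_deg = 52.0
# `$angle_deg` interpolates the value so the global lookup is not timed.
trial = @benchmark sindeg_sketch($angle_deg) samples=100 evals=10

fastest = minimum(trial)                       # estimate for the fastest sample
println("minimum time (ns): ", time(fastest))
println("median time (ns):  ", time(median(trial)))
println("allocations:       ", allocs(trial))
```

The exact numbers depend on the machine, so none are shown here.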

We create a lookup table of `sin` and `cos` values at 15-degree intervals from 0 to 90 degrees, which gives seven entries for six intervals. For values inside an interval, we use the `sin(A + δ)` expression with approximated values for `sin δ` and `cos δ`, as shown earlier.

In [10]:
struct MemoLookup
    n
    lookup
    function MemoLookup(n)   
        step = 90/n
        δ = pi/180*step
        lookup = [[sin(i*δ) for i=0:n] [cos(i*δ) for i=0:n]]
        new(n, lookup)
    end
end
step(m::MemoLookup) = 90/m.n

step (generic function with 1 method)

In [11]:
m = MemoLookup(6)
function sindegmemo(m::MemoLookup, x)
    s, lookup = step(m), m.lookup
    i = 1
    while x >= s
        x -= s
        i += 1
    end
    iszero(x) && return lookup[i, 1]
    x = deg2rad(x)
    sinx, cosx = x, 1 - x*x/2
    return lookup[i, 1]*cosx + lookup[i, 2]*sinx
end

sindegmemo (generic function with 1 method)

In [12]:
@benchmark sindegmemo($m, 52) 

BenchmarkTools.Trial: 
  memory estimate:  208 bytes
  allocs estimate:  13
  --------------
  minimum time:     337.192 ns (0.00% GC)
  median time:      424.865 ns (0.00% GC)
  mean time:        531.337 ns (1.51% GC)
  maximum time:     13.200 μs (91.72% GC)
  --------------
  samples:          10000
  evals/sample:     208

This code is slower than the `sindeg` call benchmarked earlier. It makes 13 allocations and needs 208 bytes of additional memory. What can explain such allocations? Frequent small memory allocations can be a significant performance overhead.

## 11.3 Code Generation Tools

We will look at the typed intermediate code and the LLVM code that Julia generates to understand how the code is processed. These tools are some of the best ways to debug and fix poorly performing code.

### Type Stability

We discussed type stability earlier and know that types that cannot be inferred lead to inefficient code. Let's check whether the code above has any such types.

In [13]:
@code_warntype sindegmemo(m, 52)

Variables
  #self#::Core.Const(sindegmemo)
  m::MemoLookup
  x@_3::Int64
  cosx::Any
  sinx::Any
  i::Int64
  lookup::Any
  s::Any
  x@_9::Any

Body::Any
1 ─       (x@_9 = x@_3)
│         Core.NewvarNode(:(cosx))
│         Core.NewvarNode(:(sinx))
│   %4  = Main.step(m)::Any
│   %5  = Base.getproperty(m, :lookup)::Any
│         (s = %4)
│         (lookup = %5)
└──       (i = 1)
2 ┄ %9  = (x@_9 >= s)::Any
└──       goto #4 if not %9
3 ─       (x@_9 = x@_9 - s)
│         (i = i + 1)
└──       goto #2
4 ─ %14 = Main.iszero(x@_9)::Any
└──       goto #6 if not %14
5 ─ %16 = Base.getindex(lookup, i, 1)

Many variables are inferred as `Any`. This means each such value is boxed: it is allocated on the heap and reached through a pointer, instead of being handled as a plain `Float64` during the computation. We will add a type to the declaration of the parameter `x`.

In [14]:
function sindegmemo(m::MemoLookup, x::Float64)
    s, lookup = step(m), m.lookup
    i = 1
    while x >= s
        x -= s
        i += 1
    end
    iszero(x) && return lookup[i, 1]
    x = deg2rad(x)
    sinx, cosx = x, 1 - x*x/2
    return lookup[i, 1]*cosx + lookup[i, 2]*sinx
end

sindegmemo (generic function with 2 methods)

In [15]:
@code_warntype sindegmemo(m, 52.0)

Variables
  #self#::Core.Const(sindegmemo)
  m::MemoLookup
  x@_3::Float64
  cosx::Any
  sinx::Any
  i::Int64
  lookup::Any
  s::Any
  x@_9::Any

Body::Any
1 ─       (x@_9 = x@_3)
│         Core.NewvarNode(:(cosx))
│         Core.NewvarNode(:(sinx))
│   %4  = Main.step(m)::Any
│   %5  = Base.getproperty(m, :lookup)::Any
│         (s = %4)
│         (lookup = %5)
└──       (i = 1)
2 ┄ %9  = (x@_9 >= s)::Any
└──       goto #4 if not %9
3 ─       (x@_9 = x@_9 - s)
│         (i = i + 1)
└──       goto #2
4 ─ %14 = Main.iszero(x@_9)::Any
└──       goto #6 if not %14
5 ─ %16 = Base.getindex(lookup, i, 1)

Annotating `x` changes little: most variables are still inferred as `Any`. They trace back to the untyped fields `n` and `lookup` of the `MemoLookup` type.
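The effect of an untyped field can be seen in isolation with a small experiment. The sketch below uses two hypothetical miniature types, `LooseTable` and `TightTable`, and asks the compiler what it infers for the same accessor function. `Base.return_types` is an unexported reflection helper, so treat its use here as a convenience for exploration rather than a stable API.

```julia
# Hypothetical miniature versions of a lookup type, for illustration only.
struct LooseTable            # the field defaults to type Any
    values
end

struct TightTable            # the field has a concrete type
    values::Vector{Float64}
end

first_value(t) = t.values[1]

# Base.return_types reports what the compiler infers for each argument type.
loose = Base.return_types(first_value, (LooseTable,))
tight = Base.return_types(first_value, (TightTable,))
println("LooseTable -> ", loose)
println("TightTable -> ", tight)
println("field types: ", fieldtypes(LooseTable), " vs ", fieldtypes(TightTable))
```

With the untyped field the compiler cannot know what `t.values[1]` will produce, so everything computed from it is inferred as `Any` as well.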

In [16]:
struct MemoLookup2
    n::Int
    lookup::Matrix{Float64}
    function MemoLookup2(n)
        step = 90/n
        δ = pi/180*step
        lookup = [[sin(i*δ) for i=0:n] [cos(i*δ) for i=0:n]]
        new(n, lookup)
    end
end
step(m::MemoLookup2) = 90/m.n

step (generic function with 2 methods)

In [17]:
m2 = MemoLookup2(6)
function sindegmemo(m::MemoLookup2, x::Float64)
    s, lookup = step(m), m.lookup
    i = 1
    while x >= s
        x -= s
        i += 1
    end
    iszero(x) && return lookup[i, 1]
    x = deg2rad(x)
    sinx, cosx = x, 1 - x*x/2
    return lookup[i, 1]*cosx + lookup[i, 2]*sinx
end

sindegmemo (generic function with 3 methods)

By giving the fields of `MemoLookup2` concrete types, we remove all the type ambiguities.

In [18]:
@code_warntype sindegmemo(m2, 52.0)

Variables
  #self#::Core.Const(sindegmemo)
  m::MemoLookup2
  x@_3::Float64
  cosx::Float64
  sinx::Float64
  i::Int64
  lookup::Matrix{Float64}
  s::Float64
  x@_9::Float64

Body::Float64
1 ─       (x@_9 = x@_3)
│         Core.NewvarNode(:(cosx))
│         Core.NewvarNode(:(sinx))
│   %4  = Main.step(m)::Float64
│   %5  = Base.getproperty(m, :lookup)::Matrix{Float64}
│         (s = %4)
│         (lookup = %5)
└──       (i = 1)
2 ┄ %9  = (x@_9 >= s)::Bool
└──       goto #4 if not %9
3 ─       (x@_9 = x@_9 - s)
│         (i = i + 1)
└──       goto #2
4 ─ %14 = Main.iszero(x@_9)::Bool
└──       goto #6 if not %14
5 ─ %16 = Base.getindex(lookup, i, 1)::Float64
└──       retu

The execution performance has improved and there are no additional allocations.

In [19]:
@btime sindegmemo($m2, 52.0)  

  16.225 ns (0 allocations: 0 bytes)


0.788218944092369
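Reading `@code_warntype` output by eye does not scale to a test suite. One option is the `@inferred` macro from the `Test` standard library, which throws an error when the actual return type differs from the inferred one. The two functions below are hypothetical examples written for this sketch.

```julia
using Test

# Type-stable: the result is always a Float64.
half(x::Float64) = x / 2

# Type-unstable on purpose: returns an Int or a Float64 depending on the value.
int_or_float(x::Float64) = x > 0 ? 1 : 0.5

# Passes silently and returns the value of the call.
@inferred half(52.0)

# The call returns an Int, but the inferred type is a Union, so @inferred throws.
@test_throws ErrorException @inferred int_or_float(52.0)
```

A check like `@inferred sindegmemo(m2, 52.0)` could guard the lookup function against regressions in the same way.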

In [20]:
@btime sindeg(52.0)

  25.171 ns (1 allocation: 16 bytes)


0.788010753606722

In [21]:
@code_warntype sindeg(52.0)

Variables
  c::Core.Const(sin ∘ deg2rad)
  x::Tuple{Float64}

Body::Float64
1 ─ %1 = Base.getproperty(c, :outer)::Core.Const(sin)
│   %2 = Base.getproperty(c, :inner)::Core.Const(deg2rad)
│   %3 = Core._apply_iterate(Base.iterate, %2, x)::Float64
│   %4 = (%1)(%3)::Float64
└──      return %4


`sindeg` is a variable bound to a composed function (a `ComposedFunction` object) rather than a named function. Since the composed function accepts a variable number of arguments, the argument (52.0 in this case) has to be packed into a tuple on every call.

In [22]:
typeof(sindeg)

ComposedFunction{typeof(sin), typeof(deg2rad)}
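The structure of the composed object can be inspected directly. This short sketch uses `to_radians` as a stand-in for the `deg2rad` helper defined earlier and assumes Julia 1.6 or later, where `ComposedFunction` is available with the `outer` and `inner` fields that appear in the `@code_warntype` output.

```julia
to_radians(x) = pi / 180.0 * x          # stand-in for deg2rad defined above

composed = sin ∘ to_radians             # an object that stores both functions
direct(x) = sin(to_radians(x))          # an ordinary named function

println(typeof(composed))
println(composed.outer === sin, " ", composed.inner === to_radians)
println(composed(30.0) == direct(30.0)) # both perform the same arithmetic
```

Both forms compute the same value; they differ only in how the call is dispatched.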

We will rewrite it as a simple function call.

In [23]:
sindeg2(x) = sin(deg2rad(x))
@btime sindeg2(30)

  1.882 ns (0 allocations: 0 bytes)


0.49999999999999994

The equivalent built-in Julia function `sind(x)` has nearly the same performance, so we can treat it as a reasonable benchmark for our code.

In [24]:
@btime sind(30)

  1.816 ns (0 allocations: 0 bytes)


0.5

`sindegmemo(m2, 52.0)`, at about 16 ns in the run above, is still well short of that benchmark.

In [25]:
struct MemoLookupP{N}
    lookup::Matrix{Float64}
    function MemoLookupP{N}() where N
        δ = pi/2N
        lookup = [[sin(i*δ) for i=0:N] [cos(i*δ) for i=0:N]]
        new{N}(lookup)
    end
end
step(m::MemoLookupP{N}) where N = 90.0/N
const m3 = MemoLookupP{6}()

MemoLookupP{6}([0.0 1.0; 0.25881904510252074 0.9659258262890683; … ; 0.9659258262890682 0.25881904510252096; 1.0 6.123233995736766e-17])
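Placing the interval count in the type, as `MemoLookupP{N}` does, makes `N` a compile-time constant for every method that receives the object. The sketch below shows the same idea with a hypothetical `StepTable{N}` type that stores only the sine column.

```julia
# Hypothetical type: the number of intervals N is part of the type itself.
struct StepTable{N}
    sines::Vector{Float64}
    StepTable{N}() where {N} = new{N}([sin(i * pi / (2N)) for i in 0:N])
end

# N is read from the type, so no field access is needed to compute the step.
step_degrees(::StepTable{N}) where {N} = 90.0 / N

t6 = StepTable{6}()
t9 = StepTable{9}()
println(step_degrees(t6), " ", step_degrees(t9))
println(typeof(t6) == typeof(t9))       # a different N gives a different type
```

The cost of this design is that each distinct `N` produces a separate type and a separately compiled set of methods, so it suits a small number of fixed table sizes.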

In [26]:
function sindegmemo(m::MemoLookupP, x::Float64)
    s, lookup = step(m), m.lookup
    i = 1
    while x >= s
        x -= s
        i += 1
    end
    x, sini = deg2rad(x), lookup[i, 1]
    iszero(x) && return sini
    sinx, cosx, cosi = x, 1 - x*x/2, lookup[i, 2]
    return sini*cosx + cosi*sinx
end

sindegmemo (generic function with 4 methods)

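Making the interval count a type parameter and binding the table to a `const` improves the timing significantly: the run below reports about 1.8 ns, down from about 16 ns, which is close to the `sind(x)` performance. Note that the input also changes from 52.0 to 60.0.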

In [27]:
@btime sindegmemo(m3, 60.0)

  1.825 ns (0 allocations: 0 bytes)


0.8660254037844386

In [28]:
@code_llvm sindegmemo(m3, 60.0)

;  @ In[26]:1 within `sindegmemo'
define double @julia_sindegmemo_3317([1 x {}*]* nocapture nonnull readonly align 8 dereferenceable(8) %0, double %1) {
top:
;  @ In[26]:2 within `sindegmemo'
; ┌ @ Base.jl:33 within `getproperty'
   %2 = getelementptr inbounds [1 x {}*], [1 x {}*]* %0, i64 0, i64 0
   %3 = load atomic {}*, {}** %2 unordered, align 8
; └
;  @ In[26]:4 w

Look at the code generated for bounds checking in the output above. Is there a way to avoid it? The `@inbounds` macro skips the bounds checks on the array accesses it annotates.

In [29]:
function sindegmemo(m::MemoLookupP, x::Float64)
    s, lookup = step(m), m.lookup
    i = 1
    while x >= s
        x -= s
        i += 1
    end
    x, sini = deg2rad(x), @inbounds lookup[i, 1]
    iszero(x) && return sini
    sinx, cosx, cosi = x, 1 - x*x/2, @inbounds lookup[i, 2]
    return sini*cosx + cosi*sinx
end

sindegmemo (generic function with 4 methods)

In [30]:
@btime sindegmemo($m3, 60.0)

  1.815 ns (0 allocations: 0 bytes)


0.8660254037844386

In [31]:
@btime sindegmemo($m3, 22.0)

  1.819 ns (0 allocations: 0 bytes)


0.3748975477461437
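`@inbounds` is safe only when the index is known to be valid. In `sindegmemo`, an input of 105 or more advances `i` past the seven rows of the table; with bounds checks that raises a `BoundsError`, while with `@inbounds` the behaviour is undefined. One way to keep the speed without the risk is to validate the index once and skip only the per-access check. The helper `sine_entry` below is a hypothetical sketch of that pattern, not part of the book's code.

```julia
# Hypothetical helper: validate the row index once, then skip the per-access check.
function sine_entry(table::Matrix{Float64}, i::Int)
    nrows = size(table, 1)
    (1 <= i <= nrows && size(table, 2) >= 1) ||
        throw(ArgumentError("row $i is outside 1:$nrows"))
    return @inbounds table[i, 1]
end

# Same layout as the lookup tables above: sines in column 1, cosines in column 2.
table = [[sin(k * pi / 12) for k in 0:6] [cos(k * pi / 12) for k in 0:6]]
println(sine_entry(table, 3))
```

The explicit check costs one comparison per call, which is usually cheaper than debugging a silent out-of-range read.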

## 11.4 Profiling

Profiling is used to find poorly performing code in a system.

In [32]:
using Profile

In [33]:
function profile_test(n)
    for i = 1:n
        A = randn(100,100,20)
        m = maximum(A)
        Am = mapslices(sum, A; dims=2)
        B = A[:,:,5]
        Bsort = mapslices(sort, B; dims=1)
        b = rand(100)
        C = B.*b
    end
end

profile_test (generic function with 1 method)

In [34]:
profile_test(1)
@profile profile_test(100)

In [35]:
Profile.print()

Overhead ╎ [+additional indent] Count File:Line; Function
   ╎329 @Base/task.jl:406; (::IJulia.var"#15#18")()
   ╎ 329 @IJulia/src/eventloop.jl:8; eventloop(socket::ZMQ.Socket)
   ╎  329 @Base/essentials.jl:706; invokelatest
   ╎   329 @Base/essentials.jl:708; #invokelatest#2
   ╎    329 .../execute_request.jl:67; execute_request(socket::ZMQ.So...
   ╎     329 .../SoftGlobalScope.jl:65; softscope_include_string(m::Mo...
   ╎    ╎ 329 @Base/loading.jl:1094; include_string(mapexpr::type...
  2╎    ╎  329 @Base/boot.jl:360; eval
   ╎    ╎   79  In[33]:3; profile_test(n::Int64)
   ╎    ╎    79  ...dom/src/normal.jl:229; randn
   ╎    ╎     79  ...om/src/normal.jl:223; randn
   ╎    ╎    ╎ 7   @Base/boot.jl:464; Array
  7╎    ╎    ╎  7   @Base/boot.jl:452; Array
   ╎    ╎    ╎ 15  ...om/src/normal.jl:209; randn!(rng::Random.Mersen...
   ╎    ╎    ╎  15  ...m/src/Random.jl:266; rand!
   ╎    ╎    ╎   15  ...dom/src/RNGs.jl:589; rand!
   ╎    ╎    ╎    15  ...dom/src/RNGs.jl:583; _rand!
   ╎ 

In [36]:
Profile.clear()
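The default tree output can be long. The sketch below shows one way to get a shorter report with the keyword arguments of `Profile.print`: a flat listing sorted by sample count, with rarely sampled lines filtered out. The function `busy_work` and the threshold `mincount=5` are arbitrary choices made for this illustration.

```julia
using Profile

# Hypothetical workload: enough sorting and summing to collect some samples.
function busy_work(n)
    total = 0.0
    for _ in 1:n
        total += sum(sort(rand(10_000)))
    end
    return total
end

busy_work(1)                 # warm-up call so compilation is not profiled
Profile.clear()              # discard samples from earlier runs
@profile busy_work(200)

# Flat listing, sorted by sample count, hiding lines with fewer than 5 samples.
Profile.print(format=:flat, sortedby=:count, mincount=5)
```

The sample counts vary from run to run, so the report is best read for relative weight rather than exact numbers.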

In [37]:
using ProfileView


In [38]:
@profview profile_test(100)

Gtk.GtkWindowLeaf(name="", ..., title="Profile", ..., default-width=800, default-height=600, ...)

## 11.5 Guidance for High Performance Code

In Chapter 9, we used some general functional programming patterns to address some standard programming challenges. Here we look at patterns that help address performance bottlenecks in code.

### Memoization

Using a cache can reduce computation, because the results of previous computations can be reused.

In [39]:
using BenchmarkTools
fib(n) = n < 3 ? 1 : fib(n-1) + fib(n-2)

fib (generic function with 1 method)

In [40]:
function fib(n)
    if n < 3
        return (result=1, calls=1)
    else
        r1, c1 = fib(n-2)
        r2, c2 = fib(n-1)
        return (result=r1+r2, calls=1+c1+c2)
    end
end

fib (generic function with 1 method)

In [41]:
for i = 1:10
    r, c = fib(i)
    println("n:", i, "\tresult: ", r, "\tcalls: ", c)
end

n:1	result: 1	calls: 1
n:2	result: 1	calls: 1
n:3	result: 2	calls: 3
n:4	result: 3	calls: 5
n:5	result: 5	calls: 9
n:6	result: 8	calls: 15
n:7	result: 13	calls: 25
n:8	result: 21	calls: 41
n:9	result: 34	calls: 67
n:10	result: 55	calls: 109


In [42]:
@btime fib(20)

  31.367 μs (0 allocations: 0 bytes)


(result = 6765, calls = 13529)
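The call counts in the table follow the recurrence `calls(n) = 1 + calls(n-1) + calls(n-2)` with `calls(1) = calls(2) = 1`, which works out to `calls(n) = 2*fib(n) - 1`. The number of calls therefore grows as fast as the Fibonacci numbers themselves. The sketch below checks this relation for small `n`, using a separate name, `fib_counted`, so that it does not replace the notebook's `fib`.

```julia
# Same counting scheme as the cell above, under a separate (illustrative) name.
function fib_counted(n)
    n < 3 && return (result = 1, calls = 1)
    r1, c1 = fib_counted(n - 2)
    r2, c2 = fib_counted(n - 1)
    return (result = r1 + r2, calls = 1 + c1 + c2)
end

# Check the closed form calls == 2 * result - 1 for n = 1, ..., 20.
for n in 1:20
    r, c = fib_counted(n)
    @assert c == 2r - 1
end
println(fib_counted(20))
```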

In [43]:
const mem = Dict()

Dict{Any, Any}()

In [44]:
function fib(n)
    haskey(mem, n) && return mem[n]
    println("Calling fib: ", n)
    res = n < 3 ? 1 : fib(n-1) + fib(n-2)
    mem[n] = res
    return res
end
        

fib (generic function with 1 method)

In [45]:
fib(5)

Calling fib: 5
Calling fib: 4
Calling fib: 3
Calling fib: 2
Calling fib: 1


5

In [46]:
@btime fib(20)

Calling fib: 20
Calling fib: 19
Calling fib: 18
Calling fib: 17
Calling fib: 16
Calling fib: 15
Calling fib: 14
Calling fib: 13
Calling fib: 12
Calling fib: 11
Calling fib: 10
Calling fib: 9
Calling fib: 8
Calling fib: 7
Calling fib: 6
  22.028 ns (0 allocations: 0 bytes)


6765

In [47]:
fib(n) = n < 3 ? 1 : fib(n-1) + fib(n-2)

fib (generic function with 1 method)

In [48]:
function memoize(f)
    memo = Dict()
    (args...; kwargs...) -> begin
        x = (args, kwargs)
        haskey(memo, x) && return memo[x]
        v = f(args...; kwargs...)
        memo[x] = v
        return v
    end
end

memoize (generic function with 1 method)

In [49]:
fib! = memoize(fib)

#29 (generic function with 1 method)

In [50]:
@btime fib!(40)

  36.380 ns (0 allocations: 0 bytes)


102334155

In [51]:
@btime fib(40)

  376.974 ms (0 allocations: 0 bytes)


102334155
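Note that `memoize(fib)` caches only the calls made through `fib!`. The recursive calls inside `fib` still go to the uncached function, so the first `fib!(40)` pays the full recursive cost and only repeated calls are fast. A variant in which every level of the recursion consults the cache could look like the sketch below; `fib_cached` and its typed `Dict{Int,Int}` cache are assumptions made for this illustration.

```julia
# Illustrative variant (not the book's code): the cache is passed down the
# recursion, so every level checks it before computing.
function fib_cached(n::Int, cache::Dict{Int,Int} = Dict{Int,Int}())
    n < 3 && return 1
    haskey(cache, n) && return cache[n]
    value = fib_cached(n - 1, cache) + fib_cached(n - 2, cache)
    cache[n] = value
    return value
end

println(fib_cached(40))
```

Each value is computed once per top-level call, so the work grows linearly with `n`. The default argument creates a fresh cache on every top-level call; pass a shared `Dict{Int,Int}` to reuse results across calls.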

### Global Variables

In Julia versions before 1.8, global variables cannot have a declared type. The type of a non-constant global can change at any time, so code that uses one cannot be specialized, and performance can suffer significantly.

In [52]:
GLOBAL_VAR = 3

3

In [53]:
function add_to_global(x) 
    x + GLOBAL_VAR
end

add_to_global (generic function with 1 method)

In [54]:
@btime add_to_global(10)

  22.364 ns (0 allocations: 0 bytes)


13

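Contrast this with a simple addition operation. Global variables are inefficient because their type is not fixed, so the compiler cannot generate specialized code for them. A reported time well below a nanosecond, as in the result that follows, usually means the compiler reduced the expression to a constant, so read it as a lower bound rather than a real measurement.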

In [55]:
@btime 10 + $GLOBAL_VAR

  0.030 ns (0 allocations: 0 bytes)


13

When we use a `const` instead of a variable, we see a significant performance gain.

In [56]:
const GLOBAL_CONST = 20
add_to_global_const(x) = x + GLOBAL_CONST
@btime add_to_global_const(3)

  0.031 ns (0 allocations: 0 bytes)


23

In the generated LLVM code, the constant 20 is inlined directly into the `add` instruction.

In [57]:
@code_llvm add_to_global_const(3)

;  @ In[56]:2 within `add_to_global_const'
define i64 @julia_add_to_global_const_5858(i64 signext %0) {
top:
; ┌ @ int.jl:87 within `+'
   %1 = add i64 %0, 20
; └
  ret i64 %1
}


In [58]:
const GLOBAL_REF = Ref(10)

Base.RefValue{Int64}(10)

In [59]:
add_to_global_ref(x) = x + GLOBAL_REF[]

add_to_global_ref (generic function with 1 method)

The generated code has two parts: it loads the value from memory into a register, and then performs the addition in the register.

In [60]:
@code_llvm add_to_global_ref(3)

;  @ In[59]:1 within `add_to_global_ref'
define i64 @julia_add_to_global_ref_5934(i64 signext %0) {
top:
; ┌ @ refvalue.jl:56 within `getindex'
; │┌ @ Base.jl:33 within `getproperty'
    %1 = load i64, i64* inttoptr (i64 140067213850240 to i64*), align 128
; └└
; ┌ @ int.jl:87 within `+'
   %2 = add i64 %1, %0
; └
  ret i64 %2
}


In [62]:
@btime add_to_global_ref(3)

  2.170 ns (0 allocations: 0 bytes)


13
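Unlike a plain `const`, the value held in a `const` `Ref` can still be changed at run time, and functions that read it see the new value on their next call. The sketch below illustrates this with a hypothetical `SCALE_REF`.

```julia
const SCALE_REF = Ref(10)            # constant binding to a mutable, typed box
scale(x) = x * SCALE_REF[]

println(scale(3))
SCALE_REF[] = 2                      # changes the stored value, not the binding
println(scale(3))
println(Base.return_types(scale, (Int,)))
```

The binding and the element type stay fixed, so the function remains type stable while the stored value changes.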

You can easily wrap a `Ref` inside an accessor pattern as well. 

In [63]:
let _x = Ref(5)
    global X() = _x[]
    global X(y) = (_x[] = y)
end

X (generic function with 2 methods)

In [64]:
@btime X()

  1.883 ns (0 allocations: 0 bytes)


5

In [65]:
@btime X(10)

  1.882 ns (0 allocations: 0 bytes)


10

## 11.6 Conclusion

## Exercises