# Performance

This notebook measures the performance of `Simulate.jl` functionality in order to compile the [performance section](https://pbayer.github.io/Simulate.jl/dev/performance/) of the documentation.

In [1]:
using Simulate, BenchmarkTools, Random
res = Dict(); # results dictionary

## Event-based simulations

The following is a modification of the [channel example](https://pbayer.github.io/Simulate.jl/dev/approach/#Event-based-modeling-1). We simulate events 

1. taking something from a common channel or waiting if there is nothing, 
2. then taking a delay, doing a calculation and
3. returning three times to the first step.

As the calculation we take the following sum, the Leibniz series for $\pi$:

$$4 \sum_{k=1}^{n} \frac{(-1)^{k+1}}{2 k - 1}$$

This gives a slow approximation to $\pi$. The benchmark creates long queues of timed and conditional events and measures how fast they are handled.
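To get a feeling for how slowly this sum converges, the following sketch evaluates it directly, without any simulation machinery. The function name `leibniz_pi` is made up for this illustration and is not part of `Simulate.jl`.

```julia
# Partial sum 4 * Σ_{k=1}^{n} (-1)^(k+1) / (2k - 1), commonly known as the
# Leibniz series for π. `leibniz_pi` is a hypothetical helper for illustration.
function leibniz_pi(n::Int)
    s = 0.0
    for k in 1:n
        s += (-1)^(k + 1) / (2k - 1)
    end
    return 4s
end

# Print the approximation and its absolute error for a few term counts.
for n in (10, 250, 10_000)
    approx = leibniz_pi(n)
    println("n = ", n, ": ", approx, "  (error ", abs(pi - approx), ")")
end
```

The error shrinks only about as fast as $1/n$, so with the 250 terms used in the benchmark it is still roughly 0.004.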

### Function calls as events

The first implementation is based on events with `SimFunction`s. 

In [2]:
function take(id::Int64, qpi::Vector{Float64}, step::Int64)
    if isready(ch)
        take!(ch)                                            # take something from common channel
        event!(SF(put, id, qpi, step), after, rand())    # timed event after some time
    else
        event!(SF(take, id, qpi, step), SF(isready, ch)) # conditional event until channel is ready
    end
end

function put(id::Int64, qpi::Vector{Float64}, step::Int64)
    put!(ch, 1)
    qpi[1] += (-1)^(id+1)/(2id -1)      # Leibniz series term (slow approximation to pi)
    step > 3 || take(id, qpi, step+1)
end

function setup(n::Int)                     # set up the simulation
    reset!(𝐶)
    Random.seed!(123)
    global ch = Channel{Int64}(32)  # create a channel
    global qpi = [0.0]
    si = shuffle(1:n)
    for i in 1:n
        take(si[i], qpi, 1)
    end
    for i in 1:min(n, 32)
        put!(ch, 1) # put first tokens into channel 1
    end
end

setup (generic function with 1 method)

If we set up 250 summation elements, we get 1000 timed events (each element passes through `put` four times) and 1438 sample steps with conditional events.
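The number of timed events follows from the recursion in `put`: every element runs through `put` at steps 1 to 4, and each call to `put` is triggered by one timed event. The following sketch replays only that step counting, without clock or channel; `count_puts` is a hypothetical helper and not part of the benchmark.

```julia
# Count how often `put` runs for one element, mirroring the guard
# `step > 3 || take(id, qpi, step+1)` from the model above.
# `count_puts` is a hypothetical helper for illustration only.
function count_puts(step::Int = 1)
    calls = 1                                    # this invocation of `put`
    step > 3 || (calls += count_puts(step + 1))  # back to `take`, then `put` again
    return calls
end

per_element = count_puts()
println(per_element, " puts per element, ", 250 * per_element, " timed events for 250 elements")
```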

In [4]:
@time setup(250)
println(@time run!(𝐶, 500))
println("result=", qpi[1])

  0.000588 seconds (2.05 k allocations: 64.031 KiB)
  0.182265 seconds (1.59 M allocations: 34.813 MiB, 4.82% gc time)
run! finished with 1000 clock events, 1438 sample steps, simulation time: 500.0
result=3.1375926695894556


In [5]:
t = run(@benchmarkable run!(𝐶, 500) setup=setup(250) evals=1 seconds=15.0 samples=50)

BenchmarkTools.Trial: 
  memory estimate:  34.80 MiB
  allocs estimate:  1586345
  --------------
  minimum time:     170.198 ms (0.00% GC)
  median time:      175.226 ms (1.65% GC)
  mean time:        175.521 ms (1.33% GC)
  maximum time:     180.847 ms (1.46% GC)
  --------------
  samples:          50
  evals/sample:     1

In [6]:
res["Event based with SimFunctions"] = minimum(t).time * 1e-6 # ms 

170.19763999999998

### Expressions as events

The second implementation does the same but with expressions, which are `eval`uated in global scope at runtime. This gives a one-time warning about being slow:

In [7]:
function take(id::Int64, qpi::Vector{Float64}, step::Int64)
    if isready(ch)
        take!(ch)                                            # take something from common channel
        event!(:(put($id, qpi, $step)), after, rand())   # timed event after some time
    else
        event!(:(take($id, qpi, $step)), :(isready(ch))) # conditional event until channel is ready
    end
end

function put(id::Int64, qpi::Vector{Float64}, step::Int64)
    put!(ch, 1)
    qpi[1] += (-1)^(id+1)/(2id -1)      # Leibniz series term (slow approximation to pi)
    step > 3 || take(id, qpi, step+1)
end

put (generic function with 1 method)

In [8]:
@time setup(250)
println(@time run!(𝐶, 500))
println("result=", sum(qpi))

  0.119963 seconds (233.55 k allocations: 12.248 MiB, 9.18% gc time)


└ @ Simulate /Users/paul/.julia/packages/Simulate/nLVtr/src/clock.jl:291


 11.029089 seconds (6.73 M allocations: 384.549 MiB, 0.51% gc time)
run! finished with 1000 clock events, 1438 sample steps, simulation time: 500.0
result=3.1375926695894556


In [9]:
t = run(@benchmarkable run!(𝐶, 500) setup=setup(250) evals=1 seconds=15.0 samples=50)

BenchmarkTools.Trial: 
  memory estimate:  382.13 MiB
  allocs estimate:  6681246
  --------------
  minimum time:     10.963 s (0.40% GC)
  median time:      10.976 s (0.38% GC)
  mean time:        10.976 s (0.38% GC)
  maximum time:     10.989 s (0.36% GC)
  --------------
  samples:          2
  evals/sample:     1

In [11]:
res["Event based with Expressions"] = minimum(t).time * 1e-6 # ms
res

Dict{Any,Any} with 2 entries:
  "Event based with Expressions"  => 10962.8
  "Event based with SimFunctions" => 170.198

In [12]:
res["Event based with Expressions"]/res["Event based with SimFunctions"]

64.41237809172912

This takes about 64 times longer. It shows that `eval`uating Julia expressions in global scope at runtime is very expensive and should be avoided if performance matters.
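The cost difference can be reproduced without `Simulate.jl`. The sketch below runs the same small update once through `Core.eval` on an expression (which is what `eval` does in a module's global scope) and once through a closure. The names `COUNTER`, `bump!`, `run_exprs` and `run_closures` are assumptions made for this illustration, and the measured times depend on the machine and the Julia version.

```julia
const COUNTER = Ref(0)            # shared state, updated by both variants
bump!(k::Int) = (COUNTER[] += k)  # stands in for an event action

# Variant 1: every "event" is an expression, evaluated in global scope at runtime.
function run_exprs(ex::Expr, n::Int)
    for _ in 1:n
        Core.eval(@__MODULE__, ex)
    end
end

# Variant 2: every "event" is a callable that is simply invoked.
function run_closures(f, n::Int)
    for _ in 1:n
        f()
    end
end

ex = :(bump!(1))
f = () -> bump!(1)

run_exprs(ex, 10); run_closures(f, 10)   # warm up so compilation is not timed
t_expr = @elapsed run_exprs(ex, 10_000)
t_func = @elapsed run_closures(f, 10_000)
println("eval: ", t_expr, " s, closure: ", t_func, " s")
```

Both variants perform the same updates; only the dispatch mechanism differs, which is the part the benchmark above isolates.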

### Involving a global variable

The third implementation works with `SimFunction`s like the first but involves a global variable `A`:

In [13]:
function take(id::Int64, qpi::Vector{Float64}, step::Int64)
    if isready(ch)
        take!(ch)                                       # take something from common channel
        event!(SF(put, id, qpi, step), after, rand())    # timed event after some time
    else
        event!(SF(take, id, qpi, step), SF(isready, ch)) # conditional event until channel is ready
    end
end

function put(id::Int64, qpi::Vector{Float64}, step::Int64)
    put!(ch, 1)
    global A += (-1)^(id+1)/(2id -1)      # Leibniz series term (slow approximation to pi)
    step > 3 || take(id, qpi, step+1)
end

function setup(n::Int)                     # set up the simulation
    reset!(𝐶)
    Random.seed!(123)
    global ch = Channel{Int64}(32)  # create a channel
    global A = 0
    si = shuffle(1:n)
    for i in 1:n
        take(si[i], qpi, 1)
    end
    for i in 1:min(n, 32)
        put!(ch, 1) # put first tokens into channel 1
    end
end

setup (generic function with 1 method)

In [14]:
ch = Channel{Int64}(32)
@code_warntype put(1, qpi, 1)

Variables
  #self#::Core.Compiler.Const(put, false)
  id::Int64
  qpi::Array{Float64,1}
  step::Int64

Body::Any
1 ─       Main.put!(Main.ch, 1)
│         nothing
│   %3  = (id + 1)::Int64
│   %4  = ((-1) ^ %3)::Int64
│   %5  = (2 * id)::Int64
│   %6  = (%5 - 1)::Int64
│   %7  = (%4 / %6)::Float64
│   %8  = (Main.A + %7)::Any
│         (Main.A = %8)
│   %10 = (step > 3)::Bool
└──       goto #3 if not %10
2 ─       return %10
3 ─ %13 = (step + 1)::Int64
│   %14 = Main.take(id, qpi, %13)::Any
└──       return %14


In [15]:
@time setup(250)
println(@time run!(𝐶, 500))
println("result=", A)

  0.039028 seconds (89.01 k allocations: 4.505 MiB)
  0.188030 seconds (1.59 M allocations: 35.104 MiB, 4.06% gc time)
run! finished with 1000 clock events, 1438 sample steps, simulation time: 500.0
result=3.1375926695894556


In [16]:
t = run(@benchmarkable run!(𝐶, 500) setup=setup(250) evals=1 seconds=10.0 samples=30)

BenchmarkTools.Trial: 
  memory estimate:  34.83 MiB
  allocs estimate:  1588345
  --------------
  minimum time:     172.230 ms (0.00% GC)
  median time:      176.953 ms (1.79% GC)
  mean time:        176.710 ms (1.42% GC)
  maximum time:     181.112 ms (1.74% GC)
  --------------
  samples:          30
  evals/sample:     1

In [17]:
res["Event based with functions and a global variable"] = minimum(t).time * 1e-6 # ms
res

Dict{Any,Any} with 3 entries:
  "Event based with Expressions"                     => 10962.8
  "Event based with SimFunctions"                    => 170.198
  "Event based with functions and a global variable" => 172.23

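`@code_warntype` shows that the compiler cannot infer the type of `A` (the sum `Main.A + %7` is typed `Any`). Even so, this version runs only marginally slower than the first one (172.2 ms versus 170.2 ms minimum time).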
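Where a global accumulator does turn out to matter, a common remedy in Julia is to make the binding constant and mutate its contents, for example through a `Ref`. The sketch below contrasts that with an untyped global and asks the compiler for the inferred return types. The names `ACC`, `B`, `add_term!` and `add_term_global!` are hypothetical and not taken from the benchmark.

```julia
const ACC = Ref(0.0)   # constant binding with mutable Float64 content
B = 0.0                # untyped, non-constant global

# Accumulate one series term through the constant Ref.
function add_term!(id::Int)
    ACC[] += (-1)^(id + 1) / (2id - 1)
    return ACC[]
end

# Accumulate one series term through the untyped global.
function add_term_global!(id::Int)
    global B += (-1)^(id + 1) / (2id - 1)
    return B
end

# Ask the compiler which return type it infers for each variant.
println(Base.return_types(add_term!, (Int,)))
println(Base.return_types(add_term_global!, (Int,)))
```

One would expect `Float64` for the first function and `Any` for the second, since the type of an untyped global can change at any time and so cannot be relied on during inference.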