## C01 Analyzing Performance
### Timing Julia Functions
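- `@time`: prints the elapsed time and allocations of an expression; **it returns the value of the expression, not the time**
- `@timev`: verbose version of `@time` that also prints GC and allocation details; **it returns the value of the expression, not the time**
- `@elapsed`: **returns the elapsed time in seconds** and discards the value of the expression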

<mark style="background: #DB4D6D;">**Note**: the first time an expression is executed, the measured time also includes the time spent compiling it.</mark>
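
A minimal sketch of the compilation effect described in the note. `square_sum` is a hypothetical helper introduced only for this illustration; the exact numbers depend on the machine and Julia version.

```julia
# Hypothetical helper, defined only for this illustration.
square_sum(v) = sum(x -> x^2, v)

data = rand(1000)

t_first  = @elapsed square_sum(data)  # first call: includes compilation of square_sum
t_second = @elapsed square_sum(data)  # second call: reuses the compiled code

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

The first timing is usually much larger than the second, which is why the cells below are best read as "second run" measurements.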


In [6]:
@time sqrt.(rand(1000));

  0.000006 seconds (2 allocations: 15.875 KiB)


In [15]:
@time for i in 1:1000
    x = sin.(rand(1000))
end

  0.011463 seconds (2.00 k allocations: 15.503 MiB)


In [34]:
@timev sqrt.(rand(1000));

  0.000005 seconds (2 allocations: 15.875 KiB)
elapsed time (ns):  5208
gc time (ns):       0
bytes allocated:    16256
pool allocs:        0
non-pool GC allocs: 2
minor collections:  0
full collections:   0


In [38]:
a = @time sqrt.(rand(1000));
sum(@time sqrt.(rand(1000)));
println("\na = " * string(a))

  0.000006 seconds (2 allocations: 15.875 KiB)
  0.000005 seconds (2 allocations: 15.875 KiB)

a = [0.22928406642301435, 0.6355696173000532, 0.7507026474740647, 0.4083889107037258, 0.9078622457560449, 0.5986201416729272, 0.9913453779874813, 0.6909087295791443, 0.3349598781092458, 0.4059871584897557, 0.6376673215695967, 0.30864910243070925, 0.3555903895197032, 0.7562427086712816, 0.7020433480815743, 0.795625096415106, 0.8387906036459296, 0.7811308943494811, 0.9485721002343293, 0.7418204693059206, 0.48528316491694107, 0.8483360526747398, 0.9391562499724064, 0.4798687951642979, 0.5220655287300909, 0.9714791628953767, 0.9914092730328488, 0.8313618969477511, 0.9171728318656901, 0.3725181688459916, 0.46252928615414174, 0.7489396869964122, 0.3429257794939777, 0.7846844866797308, 0.9237726611775696, 0.7550797088174407, 0.6043206201765189, 0.9542891586066216, 0.7718259458135409, 0.6563237998547797, 0.8587371502427545, 0.7852884300094256, 0.7531773991891628, 0.7215169131540982, 0.851541793727049

In [46]:
a = @elapsed sqrt.(rand(1000))
println("a = " * string(a))

a = 6.083e-6


In [49]:
using Test
@test @elapsed(sqrt.(rand(1000))) <= 1e-5

[32m[1mTest Passed[22m[39m

### The Julia Profiler
> The profiler is used to identify bottlenecks in code

- `Profile`
  - `Profile.clear()`: clears the profile data stored in memory.
  - `@profile`: runs an expression while collecting profile samples.
  - `Profile.print()`: prints the collected profile data.
  - `Profile.init(delay=::Number)`: sets the *sampling interval* in seconds. The default on Linux is `1ms` (`delay=.001`); `.01` is `10ms`.
- The result of Profile
  - The count reported for each line is the number of samples in which that line was on the call stack. One sample is taken per sampling interval, so a larger count means that part of the code took longer to run.
  - For long-running programs, set a larger interval so that fewer samples are collected.
- `TimerOutputs`: **for long-running programs**
  - `const to = TimerOutput();`: defines a global `TimerOutput` and binds it to `to`
  - `@timeit to::TimerOutput "name"::String expr`: times `expr` and records the result in `to` under the label `"name"`
  - `print_timer(to::TimerOutput)`: prints the table of recorded timings

In [54]:
using Profile

In [56]:
using Statistics
function randmsq()
    x = rand(10000,1000)
    y = mean(x .^2, dims=1)
    return y
end

randmsq (generic function with 1 method)

In [79]:
randmsq();
Profile.clear()
Profile.init(delay = .001)
@profile randmsq();
Profile.print()

Overhead ╎ [+additional indent] Count File:Line; Function
   ╎16  @Base/client.jl:552; _start()
   ╎ 16  @Base/client.jl:318; exec_options(opts::Base.JLOptions)
   ╎  16  @Base/Base.jl:495; include(mod::Module, _path::Str...
   ╎   16  @Base/loading.jl:2130; _include(mapexpr::Function, mo...
   ╎    16  @Base/loading.jl:2070; include_string(mapexpr::typeo...
   ╎     16  @Base/boot.jl:385; eval
   ╎    ╎ 16  ...tebook/notebook.jl:32; top-level scope
   ╎    ╎  16  .../serve_notebook.jl:75; kwcall(::@NamedTuple{crashre...
   ╎    ╎   16  ...serve_notebook.jl:139; serve_notebook(pipename::St...
   ╎    ╎    16  ...NRPC/src/typed.jl:67; dispatch_msg(x::VSCodeServe...
   ╎    ╎     16  ...serve_notebook.jl:13; notebook_runcell_request(co...
   ╎    ╎    ╎ 16  ...rver/src/repl.jl:274; withpath(f::VSCodeServer....
   ╎    ╎    ╎  16  ...erve_notebook.jl:19; (::VSCodeServer.var"#208#...
   ╎    ╎    ╎   16  ...e/essentials.jl:884; invokelatest
   ╎    ╎    ╎    16  ...e/essentials.jl:887; #in
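
The tree output above is dominated by notebook and REPL frames before it reaches user code. A flat listing sorted by sample count can be easier to scan. This is a sketch, not the only way to read a profile: `randmsq_demo` is a hypothetical copy of `randmsq`, and the `mincount = 2` cutoff is an arbitrary assumption.

```julia
using Profile, Statistics

# Same computation as `randmsq`, under a hypothetical name so the block is self-contained.
function randmsq_demo()
    x = rand(10000, 1000)
    return mean(x .^ 2, dims = 1)
end

randmsq_demo()           # warm-up so compilation is not profiled
Profile.clear()          # discard samples from earlier runs
@profile randmsq_demo()

# Flat listing sorted by sample count; `mincount` hides lines with fewer samples.
Profile.print(format = :flat, sortedby = :count, mincount = 2)
```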

In [81]:
using TimerOutputs
const to = TimerOutput();

function randmsq_timed()
    @timeit to "randmsq" begin
        x = @timeit to "rand" rand(10000,1000)
        y = @timeit to "mean" mean(x .^2, dims=1)
    end
end

randmsq_timed (generic function with 1 method)

In [233]:
randmsq_timed();
print_timer(to)

[0m[1m ────────────────────────────────────────────────────────────────────[22m
[0m[1m                   [22m         Time                    Allocations      
                   ───────────────────────   ────────────────────────
 Tot / % measured:      1302s /   0.0%           3.69GiB /  48.4%    

 Section   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────
 randmsq       12    395ms  100.0%  33.0ms   1.79GiB  100.0%   153MiB
   rand        12    251ms   63.4%  20.9ms    916MiB   50.0%  76.3MiB
   mean        12    145ms   36.6%  12.1ms    916MiB   50.0%  76.3MiB
[0m[1m ────────────────────────────────────────────────────────────────────[22m


In [244]:
const to_myself = TimerOutput();
function testmy()
    @timeit to_myself "forloop" for i in 1:1000
        x = sin.(rand(1000))
    end
    @timeit to_myself "parallel_for" begin
        Threads.@threads for i in 1:1000
            x = sin.(rand(1000))
        end
    end
end



testmy (generic function with 1 method)

In [245]:
testmy();
print_timer(to_myself)

[0m[1m ─────────────────────────────────────────────────────────────────────────[22m
[0m[1m                        [22m         Time                    Allocations      
                        ───────────────────────   ────────────────────────
    Tot / % measured:        835ms /   4.5%           35.0MiB /  91.6%    

 Section        ncalls     time    %tot     avg     alloc    %tot      avg
 ─────────────────────────────────────────────────────────────────────────
 parallel_for        1   23.6ms   62.6%  23.6ms   16.5MiB   51.6%  16.5MiB
 forloop             1   14.1ms   37.4%  14.1ms   15.5MiB   48.4%  15.5MiB
[0m[1m ─────────────────────────────────────────────────────────────────────────[22m
