# Table of Contents
 <p><div class="lev1 toc-item"><a href="#Julia:-Functions,-Type-System,-Multiple-Dispatch,-JIT,-and-Profiling" data-toc-modified-id="Julia:-Functions,-Type-System,-Multiple-Dispatch,-JIT,-and-Profiling-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Julia: Functions, Type System, Multiple Dispatch, JIT, and Profiling</a></div><div class="lev2 toc-item"><a href="#Control-flow-and-loops" data-toc-modified-id="Control-flow-and-loops-11"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Control flow and loops</a></div><div class="lev2 toc-item"><a href="#Functions" data-toc-modified-id="Functions-12"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Functions</a></div><div class="lev2 toc-item"><a href="#Type-system" data-toc-modified-id="Type-system-13"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Type system</a></div><div class="lev2 toc-item"><a href="#Multiple-dispatch" data-toc-modified-id="Multiple-dispatch-14"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Multiple dispatch</a></div><div class="lev2 toc-item"><a href="#Just-in-time-compilation-(JIT)" data-toc-modified-id="Just-in-time-compilation-(JIT)-15"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Just-in-time compilation (JIT)</a></div><div class="lev2 toc-item"><a href="#Profiling-Julia-code" data-toc-modified-id="Profiling-Julia-code-16"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Profiling Julia code</a></div><div class="lev2 toc-item"><a href="#Memory-profiling" data-toc-modified-id="Memory-profiling-17"><span class="toc-item-num">1.7&nbsp;&nbsp;</span>Memory profiling</a></div><div class="lev2 toc-item"><a href="#Type-stability" data-toc-modified-id="Type-stability-18"><span class="toc-item-num">1.8&nbsp;&nbsp;</span>Type stability</a></div>

# Julia: Functions, Type System, Multiple Dispatch, JIT, and Profiling

In this lecture, we try to understand why Julia is fast. 

Machine information

In [None]:
versioninfo()

## Control flow and loops

Building blocks of a function:

* if-elseif-else-end
```julia
if condition1
    # do something
elseif condition2
    # do something
else
    # do something
end
```

* `for` loop
```julia
for i in 1:10
    println(i)
end
```

* Nested `for` loop:
```julia
for i in 1:10
    for j in 1:5
        println(i * j)
    end
end
```
Same as
```julia
for i in 1:10, j in 1:5
    println(i * j)
end
```

* Break loop:
```julia
for i in 1:10
    # do something
    if condition1
        break # skip remaining loop
    end
end
```

* Exit iteration:  
```julia
for i in 1:10
    # do something
    if condition1
        continue # skip to next iteration
    end
    # do something
end
```
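
As a concrete illustration of `break` and `continue`, here is a minimal sketch that collects the odd numbers whose square stays below a bound. The function name `odd_small_squares` and its arguments are made up for this example.

```julia
# Hypothetical helper: collect odd i in 1:n with i^2 < bound
function odd_small_squares(n, bound)
    out = Int[]
    for i in 1:n
        if i^2 >= bound
            break      # leave the loop entirely
        end
        if iseven(i)
            continue   # skip to the next iteration
        end
        push!(out, i)
    end
    return out
end

odd_small_squares(10, 50)
```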

## Functions 

* Function definition
```julia
function func(req1, req2; key1=dflt1, key2=dflt2)
    # do stuff
    return out1, out2, out3
end
```
**Required arguments** are positional and separated by commas.  
**Optional arguments** (here, the keyword arguments after the semicolon) need a default value in the signature.  
The **semicolon** is not required in a function call.  
The **return** statement is optional; without it, the function returns the value of its last expression.  
Multiple outputs can be returned as a **tuple**, e.g., `return out1, out2, out3`.  
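
A small runnable sketch of these rules follows. The function `summarize` and its keywords `scale` and `shift` are hypothetical names chosen for illustration only.

```julia
# Hypothetical example: one required (positional) argument, two keyword arguments
function summarize(x; scale=1.0, shift=0.0)
    y = scale .* x .+ shift
    return minimum(y), maximum(y)   # two outputs, returned as a tuple
end

lo, hi = summarize([1.0, 2.0, 3.0])                # both keywords take their defaults
lo2, hi2 = summarize([1.0, 2.0, 3.0]; scale=2.0)   # semicolon before the keyword
lo3, hi3 = summarize([1.0, 2.0, 3.0], shift=1.0)   # a comma works at the call site too
```

The tuple on the right-hand side is destructured directly into the variables on the left.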

* A function name ending with `!` indicates that the function mutates at least one of its arguments, typically the first.
```julia
sort!(x) # vs sort(x)
```

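* In Julia, all arguments to functions are **passed by reference** (more precisely, by sharing: no copy is made), in contrast to R and Matlab. A function can therefore modify a mutable argument, such as an array, in place.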
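
The following sketch shows what this means in practice: mutating an array inside a function is visible to the caller, whereas rebinding the local name is not. Both function names are hypothetical.

```julia
# Hypothetical functions illustrating argument passing
function zero_first!(v)
    v[1] = 0        # mutates the caller's array
    return v
end

function rebind(v)
    v = [0, 0, 0]   # rebinds the local name only; the caller's array is untouched
    return v
end

a = [1, 2, 3]
zero_first!(a)      # a is modified in place
b = [1, 2, 3]
rebind(b)           # b keeps its original contents
```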

* Anonymous functions, e.g., `x -> x^2`, are commonly used in collection functions and list comprehensions.
```julia
map(x -> x^2, y) # square each element in y
```

* Functions can be nested:
```julia
function outerfunction()
    # do some outer stuff
    function innerfunction()
        # do inner stuff
        # can access prior outer definitions
    end
    # do more outer stuff
end
```
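
A concrete case of a nested function that accesses an outer definition is a counter closure. This is only a sketch; `make_counter` and `increment` are hypothetical names.

```julia
# Hypothetical example: the inner function captures the outer variable n
function make_counter()
    n = 0
    function increment()
        n += 1       # updates the variable of the enclosing function
        return n
    end
    return increment
end

counter = make_counter()
counter()
counter()
```

Each call to `make_counter` creates a fresh `n`, so separate counters do not interfere with each other.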

* Functions can be vectorized using the **dot syntax**:

In [None]:
function myfunc(x)
    return sin(x^2)
end

x = randn(5, 3)
myfunc.(x)

Multiple dot operations are fused into a single loop:

In [None]:
myfunc.(x .+ 1)

In [None]:
using BenchmarkTools

# allocate new array for z
@benchmark z = myfunc.(x .+ 1) # sin((x + 1)^2)

In [None]:
# pre-allocate array of same type and size as x
z = similar(x)
# use same z
@benchmark z .= myfunc.(x .+ 1) # sin((x + 1)^2), written into z in place
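
To make the fusion concrete, the sketch below compares the fused in-place broadcast with an explicit loop. The loop only approximates what the broadcast machinery does and is not its actual lowering; `f` repeats the body of `myfunc` so that the block is self-contained.

```julia
f(x) = sin(x^2)       # same body as myfunc above

x = randn(5, 3)
z = similar(x)
z .= f.(x .+ 1)       # fused: one pass over x, no temporary array for x .+ 1

# Roughly equivalent explicit loop
w = similar(x)
for i in eachindex(x)
    w[i] = f(x[i] + 1)
end
```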

* **Collection function** (think of this as the `apply` family in R).

    Apply a function to each element of a collection:
```julia
map(f, coll) # or
map(coll) do elem
    # do stuff with elem
    # the value of the last expression (or an explicit return) is the result
end
```

In [None]:
map(x -> sin(x^2), x)

In [None]:
map(x) do elem
    elem = elem^2
    return sin(elem)
end

In [None]:
# Mapreduce
mapreduce(x -> sin(x^2), +, x)

In [None]:
# same as
sum(x -> sin(x^2), x)
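
Spelled out as a loop, the map-then-reduce pattern looks roughly like the sketch below. The helper name `sum_sin_sq` is hypothetical, and the results are compared with `≈` because the built-in reductions may add the terms in a different order.

```julia
# Hypothetical helper spelling out mapreduce(x -> sin(x^2), +, x) as a loop
function sum_sin_sq(x)
    total = zero(eltype(x))
    for elem in x
        total += sin(elem^2)   # map step, then reduce with +
    end
    return total
end

x = randn(5, 3)
sum_sin_sq(x) ≈ mapreduce(v -> sin(v^2), +, x)
```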

* List **comprehension**

In [None]:
[sin(2i + j) for i in 1:5, j in 1:3] # similar to Python

## Type system

* When thinking about types, think about sets.

* Everything is a subtype of the abstract type `Any`.

* An abstract type defines a set of types
    - Consider types in Julia that are a `Number`:

<img src="tree.png" width="600" align="center"/>

* You can explore type hierarchy with `typeof()`, `supertype()`, and `subtypes()`.

In [None]:
typeof(1.0), typeof(1)

In [None]:
supertype(Float64)

In [None]:
subtypes(AbstractFloat)

In [None]:
# Is Float64 a subtype of AbstractFloat?
Float64 <: AbstractFloat
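
Repeated calls to `supertype()` walk from a concrete type up to `Any`. The helper `supertype_chain` below is a hypothetical convenience function, not part of Julia.

```julia
# Hypothetical helper: list the chain of supertypes from T up to Any
function supertype_chain(T::Type)
    chain = Type[T]
    while T !== Any
        T = supertype(T)
        push!(chain, T)
    end
    return chain
end

supertype_chain(Float64)   # Float64, AbstractFloat, Real, Number, Any
supertype_chain(Int64)     # Int64, Signed, Integer, Real, Number, Any
```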

In [None]:
# On a 64-bit machine, Int == Int64
Int == Int64

In [None]:
convert(Float64, 1) # same as Float64(1)

In [None]:
x = randn(Float32, 5) # vector of 5 single precision numbers

In [None]:
convert(Vector{Float64}, x) # same as Float64.(x)

In [None]:
convert(Int, 1.0) # exact conversion

In [None]:
convert(Int, 1.5) # throws InexactError; use round(Int, 1.5) instead

In [None]:
round(Int, 1.5)
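
The difference between exact conversion and rounding can be wrapped in a small helper. This is a sketch under the assumption that falling back to rounding is acceptable; `to_int` is a hypothetical name.

```julia
# Hypothetical helper: convert exactly when possible, otherwise round
function to_int(x::Real)
    try
        return convert(Int, x)     # succeeds only if x is an exact integer value
    catch err
        err isa InexactError || rethrow()
        return round(Int, x)       # explicit rounding
    end
end

to_int(1.0)
to_int(2.7)
```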

## Multiple dispatch

* Multiple dispatch lies at the core of Julia's design. It allows built-in and user-defined functions to be overloaded for different combinations of argument types.

* Let's consider a simple "doubling" function:

In [None]:
g(x) = x + x

In [None]:
g(1.5)

This definition is too broad, since some things can't be added:

In [None]:
g("hello world")

* This definition is correct but too restrictive, since any `Number` can be added.

In [None]:
g(x::Float64) = x + x

* This will automatically work on the entire type tree above!

In [None]:
g(x::Number) = x + x

This is a lot nicer than 
```julia
function g(x)
    if isa(x, Number)
        return x + x
    else
        throw(ArgumentError("x should be a number"))
    end
end
```
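
Dispatch considers the types of all arguments, not just the first. A minimal sketch with a hypothetical two-argument function `combine`:

```julia
# Hypothetical function with one method per combination of argument types
combine(x::Number, y::Number) = x + y
combine(x::AbstractString, y::AbstractString) = string(x, y)
combine(x::AbstractString, y::Number) = string(x, y)   # mixed case

combine(1, 2.5)        # uses the (Number, Number) method
combine("a", "b")      # uses the (AbstractString, AbstractString) method
combine("n = ", 3)     # uses the (AbstractString, Number) method
# combine(3, "n") throws a MethodError: no (Number, AbstractString) method is defined
```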

* The `methods(func)` function displays all methods defined for `func`.

In [None]:
methods(g)

* The `@which func(x)` macro tells which method is used for the argument `x`.

In [None]:
x = 1
typeof(x)

In [None]:
g(x)

In [None]:
@which g(x)

In [None]:
x = randn(5)
@which g(x)

In [None]:
g(x)

## Just-in-time compilation (JIT)

The following figures and some examples are taken from Arch D. Robison's slides [Introduction to Writing High Performance Julia](https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxibG9uem9uaWNzfGd4OjMwZjI2YTYzNDNmY2UzMmE).

| <img src="./julia_toolchain.png" alt="Julia toolchain" style="width: 400px;"/> | <img src="./julia_introspect.png" alt="Julia toolchain" style="width: 500px;"/> |
|----------------------------------|------------------------------------|
|||

* `Julia`'s efficiency results from its ability to infer the types of **all** variables within a function and then call LLVM to generate optimized machine code at run time.

In [None]:
workspace() # clear previous definition of g
g(x::Number) = x + x

This function will work on **any** `Number` type that has a method for `+`.

In [None]:
@show g(2)
@show g(2.0);

This is the lowered form of the [abstract syntax tree (AST)](https://en.wikipedia.org/wiki/Abstract_syntax_tree).

In [None]:
@code_lowered g(2)

Type inference:

In [None]:
@code_warntype g(2)

In [None]:
@code_warntype g(2.0)

Peek at the compiled **LLVM bitcode** with `@code_llvm`

In [None]:
@code_llvm g(2)

In [None]:
@code_llvm g(2.0)

We annotated `x` only with the abstract type `Number`, yet different LLVM code is generated for each concrete argument type!

* In R or Python, `g(2)` and `g(2.0)` would use the same code for both.
 
* In Julia, `g(2)` and `g(2.0)` dispatch to optimized code for `Int64` and `Float64`, respectively.

* For integer input `x`, the LLVM compiler is smart enough to know that `x + x` amounts to shifting `x` left by 1 bit, which is faster than addition.
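
The specialization can also be checked programmatically by capturing the output of `code_llvm` as text. In the sketch below, `twice` repeats the body of `g`, and `llvm_text` is a hypothetical helper; the exact LLVM text varies across Julia versions and machines.

```julia
using InteractiveUtils   # provides code_llvm outside the REPL

twice(x::Number) = x + x   # same body as g above

# Hypothetical helper: LLVM code generated for argument type T, as a string
function llvm_text(f, T)
    io = IOBuffer()
    code_llvm(io, f, (T,))
    return String(take!(io))
end

ir_int = llvm_text(twice, Int64)
ir_float = llvm_text(twice, Float64)
ir_int == ir_float   # false: each concrete argument type gets its own code
```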
 
The lowest level is the **assembly code**, which is machine dependent.

In [None]:
@code_native g(2)

In [None]:
@code_native g(2.0)

## Profiling Julia code

Julia has several built-in tools for profiling. Let's go through an example function `tally`, which sums all elements in a vector.

In [None]:
function tally(x)
    s = 0
    for v in x
        s += v
    end
    s
end

The `@time` macro outputs run time and heap allocation.

In [None]:
srand(123)
a = rand(10000)
@time tally(a) # first run: include compile time

In [None]:
@time tally(a)
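
When the numbers are needed as values rather than printed output, `@elapsed` and `@allocated` return the run time in seconds and the allocated bytes, respectively. The sketch below repeats the summation loop under the hypothetical name `running_sum` to stay self-contained; the measured values depend on the machine and Julia version.

```julia
# Same summation loop as tally above
function running_sum(x)
    s = 0
    for v in x
        s += v
    end
    return s
end

a = rand(10_000)
t_first = @elapsed running_sum(a)    # includes JIT compilation for Vector{Float64}
t_second = @elapsed running_sum(a)   # reuses the compiled code
bytes = @allocated running_sum(a)    # heap bytes allocated by the call
```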

For more robust benchmarking, the [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl) package is highly recommended.

In [None]:
using BenchmarkTools

@benchmark tally(a)

We see the memory allocation (468.75 KiB, average 10.73% GC) is suspiciously high.

The `Profile` module gives line-by-line profile results.

In [None]:
srand(123)
a = rand(10_000_000) # larger problem
Profile.clear()
@profile tally(a)
Profile.print(format=:flat)

One can use the [`ProfileView`](https://github.com/timholy/ProfileView.jl) package for better visualization of profile data:

```julia
using ProfileView

ProfileView.view()
```

In [None]:
@code_warntype tally(a)

## Memory profiling

Detailed memory profiling requires a detour. First let's write a script [`bar.jl`](./bar.jl), which contains the workload function `tally` and a wrapper for profiling.

In [None]:
;cat bar.jl

Next, in the terminal, we run the script with the `--track-allocation=user` option.

In [None]:
;julia --track-allocation=user bar.jl

The profiler outputs a file `bar.jl.mem`.

In [None]:
;cat bar.jl.mem

We see that line 4 is allocating a suspicious amount of heap memory.

## Type stability

The key to writing performant Julia code is to be [**type stable**](https://docs.julialang.org/en/stable/manual/performance-tips/#Write-"type-stable"-functions-1), such that `Julia` is able to infer types of all variables and output of a function from the types of input arguments. 

Is the `tally` function type stable? How can we diagnose and fix it?

In [None]:
@code_warntype tally(rand(100))

In this case, Julia fails to infer the type of the reduction variable `s`, which has to be **boxed** in heap memory at run time.

<img src="https://www.codeproject.com/KB/dotnet/6importentStepsDotNet/14.jpg" width="400" align="center"/>

<img src="https://i-msdn.sec.s-msft.com/dynimg/IC97798.jpeg" width="300" align="center"/>

This is the generated LLVM bitcode, which is unusually long and contains lots of _box_:

In [None]:
@code_llvm tally(rand(100))

What's the fix?

In [None]:
function tally2(x)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    s
end

In [None]:
@benchmark tally2(a)

Much shorter LLVM bitcode:

In [None]:
@code_llvm tally2(a)
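
Type stability can also be checked programmatically by asking the compiler for the inferred return type. The sketch below repeats both versions under the hypothetical names `sum_unstable` and `sum_stable`; `Base.return_types` is an unexported introspection helper, so treat its use here as a diagnostic aid rather than a stable API.

```julia
# Unstable: s starts as an Int and may become a Float64 inside the loop
function sum_unstable(x)
    s = 0
    for v in x
        s += v
    end
    return s
end

# Stable: s has the element type of x from the start
function sum_stable(x)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    return s
end

rt_unstable = first(Base.return_types(sum_unstable, (Vector{Float64},)))
rt_stable = first(Base.return_types(sum_stable, (Vector{Float64},)))
isconcretetype(rt_unstable), isconcretetype(rt_stable)
```

A concrete inferred return type indicates a type-stable function; a `Union` or abstract type is the same warning sign that `@code_warntype` highlights.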

Let's get a further performance boost with `@simd`:

In [None]:
function tally3(x)
    s = zero(eltype(x))
    @simd for v in x
        s += v
    end
    s
end

In [None]:
@benchmark tally3(a)
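
One caveat: `@simd` allows the compiler to reorder the floating-point additions, so the result may differ from the plain loop in the last few bits. The sketch below, with the hypothetical names `sum_plain` and `sum_simd`, compares the two with a tolerance instead of `==`.

```julia
function sum_plain(x)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    return s
end

function sum_simd(x)
    s = zero(eltype(x))
    @simd for v in x
        s += v
    end
    return s
end

a = rand(10_000)
isapprox(sum_plain(a), sum_simd(a))   # compare with a tolerance, not with ==
```

For integer input the reordering is harmless, since integer addition is associative.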