# Intro to Julia

This notebook is an introduction to the Julia language and its commonly used IJulia/Jupyter notebook interface. It is based on:
- material developed by Miles Lubin and Sebastien Martin of the MIT Operations Research Center
- Arthur Delarue's COS 2018 notebook (https://github.com/PhilChodrow/cos_2018/tree/master/6_julia_and_jump)
- online open-source material (http://ucidatascienceinitiative.github.io/IntroToJulia/Html/WhyJulia)
- Christopher Rackauckas' fantastic blogs (https://www.stochasticlifestyle.com/7-julia-gotchas-handle/)

## Why Julia?

A **high-level** language:

Easy to use and learn, with a similar syntax to Python/Matlab.
It is possible to do complicated computations quickly.
For example, Solving $Ax = b$ with $A = \begin{pmatrix}
 1 & 2 &  3\\ 
 2 & 1 & 2\\ 
 3 & 2 & 1
\end{pmatrix}$ and $b = \begin{pmatrix}
 1 \\ 
 1 \\ 
 1 
\end{pmatrix}$ is as simple as:

In [2]:
A = [1 2 3
     2 1 2
     3 2 1]

b = [1,1,1]
A \ b

3-element Vector{Float64}:
 0.25
 0.0
 0.25000000000000006

A **dynamic** language:

Julia is, like Python, Matlab or R, a dynamic language: you can interact with the language without the need to compile your code. Static or compiled languages, like C or Fortran, are more complicated to use but generally faster, and thus used when there is a need for time-efficient computations.

Two-language approach: use high level languages for research and scripting, then translate the final result into a static language for performance.

A **high-performance** language:

Julia is fast. Thanks to multiple dispatch, a strong type system, and just-in-time compilation, it can reach performance comparable to C and Fortran.


![figures/Julia-benchmarks.png](figures/Julia-benchmarks.png)

A language for **technical computing**:

- Julia has a lot of built in functions for scientific computing.
- A growing number of packages, mostly written in Julia itself.
- More and more users in Finance, Biology, Optimization.
- Can run C and Python code seamlessly (using Scikit for Machine Learning...)

In this notebook we will try to cover some gotchas and good workflow practices to get the most out of Julia. 

## Warm-up exercise

Complete `find_entering_var` below, which returns the minimum reduced cost and index of the entering variable (with the minimum reduced cost), inside an iteration of the simplex method. 

If no variable has negative reduced cost, we will simply return zeros for `min_rc` and `min_idx`. If multiple variables have the lowest reduced cost, we will return the last of these.

Remember the vector of reduced costs is given by:
$$
rc = c_N - A'\pi
$$
and the $i^{th}$ reduced cost is
$$
rc_i = c_i - A_i' \pi
$$
where $A_i$ is the $i^{th}$ column of $A$.

You might find it useful to use the function `dot` from the `LinearAlgebra` library.

In [3]:
using LinearAlgebra
dot([1, 2], [3, 4])

11

In [4]:
function find_entering_var(A::Matrix{Float64}, c::Vector{Float64}, pi::Vector{Float64}, var_status::Vector{Int})
    min_rc = 0
    min_idx = 0
    for k in eachindex(var_status)
        # only check nonbasic variables
        if iszero(var_status[k])
            rc = c[k] - dot(A[:, k], pi)
            if rc < min_rc
                min_rc = rc
                min_idx = k
            end
        end
    end
    return (min_rc, min_idx)
end

find_entering_var (generic function with 1 method)

In [5]:
# test your function by running this cell

using Random
function make_data(T::Type)
Random.seed!(1)
    basic_idxs = [2, 4, 6]
    A = T[3 2 1 2 1 0 0; 1 1 1 1 0 1 0; 4 3 3 4 0 0 1]
    B = A[:, basic_idxs]
    B_inv = inv(B) # note this would never happen inside the algorithm, we always have B_inv available
    b = T[225, 117, 420]
    c = -T[19, 13, 12, 17, 0, 0, 0]
    c_b = c[basic_idxs]
    x_b = B_inv * b
    var_status = [0, 1, 0, 2, 0, 3]
    pi = B_inv' * c_b
    return (A, b, c, B_inv, pi, var_status, basic_idxs)
end
(A, b, c, B_inv, pi, var_status, basic_idxs) = make_data(Float64)

find_entering_var(A, c, pi, var_status) # should be (-1.5, 1)

(-1.5, 1)

## A bit more on why Julia is fast

In [6]:
# When we write a function, it can have many "methods"
+(1, 2)

3

In [7]:
methods(+)

In [8]:
function my_function(x)
    println("Default output")
end

function my_function(x::Int) # only called when x is an integer
    println("You gave me an integer!")
end

methods(my_function)

In [9]:
my_function(1.0)
my_function(1)
my_function("ORC")

Default output
You gave me an integer!
Default output


In [10]:
# you can check which method will be dispatched to with @which
@which +(1, 2)

We could have made `find_entering_var` more general e.g. we could have a simplex running in rational numbers, arbitrary precision, etc.!

In [11]:
# E.g. 
1 // 2 # fraction in Julia
typeof(1 // 2)

Rational{Int64}

In [13]:
a = BigFloat(1.24)

1.2399999999999999911182158029987476766109466552734375

In [15]:
function find_entering_var(A::Matrix{T}, c::Vector{T}, pi::Vector{T}, var_status::Vector{Int}) where {T <: Real}
    min_rc = 0
    min_idx = 0
    for k in eachindex(var_status)
        # only check nonbasic variables
        if iszero(var_status[k])         
            rc = c[k] - dot(A[:, k], pi)
            if rc < min_rc
                min_rc = rc
                min_idx = k
            end
        end
    end
    return (min_rc, min_idx)
end

find_entering_var (generic function with 2 methods)

In [16]:
# let's generate some rational data
(A, b, c, B_inv, pi, var_status, basic_idxs) = make_data(Rational{Int})
@show A
@show b
@show c
;

A = Rational{Int64}[3//1 2//1 1//1 2//1 1//1 0//1 0//1; 1//1 1//1 1//1 1//1 0//1 1//1 0//1; 4//1 3//1 3//1 4//1 0//1 0//1 1//1]
b = Rational{Int64}[225//1, 117//1, 420//1]
c = Rational{Int64}[-19//1, -13//1, -12//1, -17//1, 0//1, 0//1, 0//1]


In [17]:
# test that this "just works" by running this cell
(min_rc, min_idx) = find_entering_var(A, c, pi, var_status) # should be (-3//2, 1)

(-3//2, 1)

If we have type-stability (more later), Julia can infer the types of all variables in our functions based on the types of its arguments. Using this information the appropriate methods are precompiled. 

Implication: If the types of all variables are fixed, our code runs "like" that of a static language, every time we run a method, some precompiled code is executed for the types we are using. 

Let's make this more concrete.

In [18]:
@code_llvm 2 * 5

[90m;  @ int.jl:88 within `*'[39m
[90m; Function Attrs: uwtable[39m
[95mdefine[39m [36mi64[39m [93m@"julia_[39m[0m*[0m_2472"[33m([39m[36mi64[39m [0m%0[0m, [36mi64[39m [0m%1[33m)[39m [0m#0 [33m{[39m
[91mtop:[39m
  [0m%2 [0m= [96m[1mmul[22m[39m [36mi64[39m [0m%1[0m, [0m%0
  [96m[1mret[22m[39m [36mi64[39m [0m%2
[33m}[39m


In [19]:
@code_llvm 2.0 * 5.0

[90m;  @ float.jl:332 within `*'[39m
[90m; Function Attrs: uwtable[39m
[95mdefine[39m [36mdouble[39m [93m@"julia_[39m[0m*[0m_2497"[33m([39m[36mdouble[39m [0m%0[0m, [36mdouble[39m [0m%1[33m)[39m [0m#0 [33m{[39m
[91mtop:[39m
  [0m%2 [0m= [96m[1mfmul[22m[39m [36mdouble[39m [0m%0[0m, [0m%1
  [96m[1mret[22m[39m [36mdouble[39m [0m%2
[33m}[39m


In [20]:
function count_up(n)
    count = 0.0
    for i in 1:n
        count += sin(1.0) + cos(1.0) + tan(1.0)
    end
    return count
end
println("First use: slower")
@time count_up(100) 
println("Second use: compiled and optimized automatically")
@time count_up(100);
println("Third use: compiled and optimized automatically")
@time count_up(100);

First use: slower
  0.000003 seconds
Second use: compiled and optimized automatically
  0.000002 seconds


Aside: always run twice when using @time

### What is type stability?

In [22]:
@code_warntype find_entering_var(A, c, pi, var_status)

Variables
  #self#[36m::Core.Const(find_entering_var)[39m
  A[36m::Matrix{Rational{Int64}}[39m
  c[36m::Vector{Rational{Int64}}[39m
  pi[36m::Vector{Rational{Int64}}[39m
  var_status[36m::Vector{Int64}[39m
  @_6[33m[1m::Union{Nothing, Tuple{Int64, Int64}}[22m[39m
  min_idx[36m::Int64[39m
  min_rc[91m[1m::Union{Rational{Int64}, Int64}[22m[39m
  k[36m::Int64[39m
  rc[36m::Rational{Int64}[39m

Body[91m[1m::Tuple{Union{Rational{Int64}, Int64}, Int64}[22m[39m
[90m1 ─[39m       (min_rc = 0)
[90m│  [39m       (min_idx = 0)
[90m│  [39m %3  = Main.eachindex(var_status)[36m::Base.OneTo{Int64}[39m
[90m│  [39m       (@_6 = Base.iterate(%3))
[90m│  [39m %5  = (@_6 === nothing)[36m::Bool[39m
[90m│  [39m %6  = Base.not_int(%5)[36m::Bool[39m
[90m└──[39m       goto #7 if not %6
[90m2 ┄[39m       Core.NewvarNode(:(rc))
[90m│  [39m %9  = @_6::Tuple{Int64, Int64}[36m::Tuple{Int64, Int64}[39m
[90m│  [39m       (k = Core.getfield(%9, 1))
[90m│  [39m 

We see that `A`, `c`, and `pi` are rational, but the type of `min_rc` is not necessarily fixed throughout our function.

Let's try to fix our function to get rid of the red.

In [23]:
function find_entering_var(A::Matrix{T}, c::Vector{T}, pi::Vector{T}, var_status::Vector{Int}) where {T <: Real}
    min_rc = zero(T) # <----------------
    min_idx = 0
    for k in eachindex(var_status)
        # only check nonbasic variables
        if iszero(var_status[k])
            rc = c[k] - dot(A[:, k], pi)
            if rc < min_rc
                min_rc = rc
                min_idx = k
            end
        end
    end
    return (min_rc, min_idx)
end

@code_warntype find_entering_var(A, c, pi, var_status)

Variables
  #self#[36m::Core.Const(find_entering_var)[39m
  A[36m::Matrix{Rational{Int64}}[39m
  c[36m::Vector{Rational{Int64}}[39m
  pi[36m::Vector{Rational{Int64}}[39m
  var_status[36m::Vector{Int64}[39m
  @_6[33m[1m::Union{Nothing, Tuple{Int64, Int64}}[22m[39m
  min_idx[36m::Int64[39m
  min_rc[36m::Rational{Int64}[39m
  k[36m::Int64[39m
  rc[36m::Rational{Int64}[39m

Body[36m::Tuple{Rational{Int64}, Int64}[39m
[90m1 ─[39m       (min_rc = Main.zero($(Expr(:static_parameter, 1))))
[90m│  [39m       (min_idx = 0)
[90m│  [39m %3  = Main.eachindex(var_status)[36m::Base.OneTo{Int64}[39m
[90m│  [39m       (@_6 = Base.iterate(%3))
[90m│  [39m %5  = (@_6 === nothing)[36m::Bool[39m
[90m│  [39m %6  = Base.not_int(%5)[36m::Bool[39m
[90m└──[39m       goto #7 if not %6
[90m2 ┄[39m       Core.NewvarNode(:(rc))
[90m│  [39m %9  = @_6::Tuple{Int64, Int64}[36m::Tuple{Int64, Int64}[39m
[90m│  [39m       (k = Core.getfield(%9, 1))
[90m│  [39m %11 = C

Other useful functions include `eltype()` or `one()`.

## Exercise
Complete the function `find_leaving_var` to return `(min_ratio, min_idx)`. I.e. the minimum and the minimizer of:
$$
\min_{k: e_k' B^{-1} A_i > 0} \frac{e_k' B^{-1}b}{e_k' B^{-1} A_i}
$$
If $ e_k' B^{-1} A_i \leq 0 $ for all $k$, return (0, Inf). Assume you are provided the vectors `B_inv_A_i = B \ A_i`, and `x_b = B \ b`, as well as a list of basic indices as input.

Test for correctness and type stability by running the box below. 

In [24]:
# hint:
@show typeof(Inf)
@show typeof(Float64(Inf))
@show typeof(Rational{Int}(Inf))

typeof(Inf) = Float64
typeof(Float64(Inf)) = Float64
typeof(Rational{Int}(Inf)) = Rational{Int64}


Rational{Int64}

In [25]:
function find_leaving_var(x_b::Vector{T}, B_inv_A_i::Vector{T}, basic_idxs::Vector{Int}) where {T <: Real}
    min_ratio = T(Inf)
    min_idx = 0
    for k in eachindex(B_inv_A_i)
        if B_inv_A_i[k] > 0
            ratio = x_b[k] / B_inv_A_i[k]
            if ratio < min_ratio
                min_ratio = ratio
                min_idx = k
            end
        end
    end
    return (min_ratio, min_idx)
end

find_leaving_var (generic function with 1 method)

In [26]:
# use our data and entering variable from before
(A, b, c, B_inv, pi, var_status, basic_idxs) = make_data(Float64)
(_, entering_idx) = find_entering_var(A, c, pi, var_status)
x_b = B_inv * b
B_inv_A_i = B_inv * A[:, entering_idx]


(min_ratio, leaving_idx) = find_leaving_var(x_b, B_inv_A_i, basic_idxs) # should be (14.999999999999993, 1)

(14.999999999999993, 1)

In [27]:
@code_warntype find_leaving_var(x_b, B_inv_A_i, basic_idxs)

Variables
  #self#[36m::Core.Const(find_leaving_var)[39m
  x_b[36m::Vector{Float64}[39m
  B_inv_A_i[36m::Vector{Float64}[39m
  basic_idxs[36m::Vector{Int64}[39m
  @_5[33m[1m::Union{Nothing, Tuple{Int64, Int64}}[22m[39m
  min_idx[36m::Int64[39m
  min_ratio[36m::Float64[39m
  k[36m::Int64[39m
  ratio[36m::Float64[39m

Body[36m::Tuple{Float64, Int64}[39m
[90m1 ─[39m       (min_ratio = ($(Expr(:static_parameter, 1)))(Main.Inf))
[90m│  [39m       (min_idx = 0)
[90m│  [39m %3  = Main.eachindex(B_inv_A_i)[36m::Base.OneTo{Int64}[39m
[90m│  [39m       (@_5 = Base.iterate(%3))
[90m│  [39m %5  = (@_5 === nothing)[36m::Bool[39m
[90m│  [39m %6  = Base.not_int(%5)[36m::Bool[39m
[90m└──[39m       goto #7 if not %6
[90m2 ┄[39m       Core.NewvarNode(:(ratio))
[90m│  [39m %9  = @_5::Tuple{Int64, Int64}[36m::Tuple{Int64, Int64}[39m
[90m│  [39m       (k = Core.getfield(%9, 1))
[90m│  [39m %11 = Core.getfield(%9, 2)[36m::Int64[39m
[90m│  [39m %12 = Ba

### Using the REPL

Working in the REPL doesn't allow type specification:

In [28]:
a = 3
f() = a
@code_warntype f()

Variables
  #self#[36m::Core.Const(f)[39m

Body[91m[1m::Any[22m[39m
[90m1 ─[39m     return Main.a


In [29]:
# wrap code inside functions/modules when possible
function foo()
    a = 3
    f() = a
    return f()
end
@code_warntype foo()

Variables
  #self#[36m::Core.Const(foo)[39m
  f[36m::var"#f#1"{Int64}[39m
  a[36m::Int64[39m

Body[36m::Int64[39m
[90m1 ─[39m      (a = 3)
[90m│  [39m %2 = Main.:(var"#f#1")[36m::Core.Const(var"#f#1")[39m
[90m│  [39m %3 = Core.typeof(a::Core.Const(3))[36m::Core.Const(Int64)[39m
[90m│  [39m %4 = Core.apply_type(%2, %3)[36m::Core.Const(var"#f#1"{Int64})[39m
[90m│  [39m      (f = %new(%4, a::Core.Const(3)))
[90m│  [39m %6 = (f::Core.Const(var"#f#1"{Int64}(3)))()[36m::Core.Const(3)[39m
[90m└──[39m      return %6


In [30]:
const my_const = 3
f() = my_const
@code_warntype f()

Variables
  #self#[36m::Core.Const(f)[39m

Body[36m::Int64[39m
[90m1 ─[39m     return Main.my_const


## Matrices and views

In [31]:
A = [3 5; 8 12]
@show A
B = A
B[1, 1] = 1
@show A
@show B

A = [3 5; 8 12]
A = [1 5; 8 12]
B = [1 5; 8 12]


2×2 Matrix{Int64}:
 1   5
 8  12

In [36]:
using BenchmarkTools
A = randn(100, 100); # random 100x100 matrix with elements from N(0,1)
@benchmark B = A # set the value of B to point to the same location as A

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.400 ns (0.00% GC)
  median time:      1.400 ns (0.00% GC)
  mean time:        1.500 ns (0.00% GC)
  maximum time:     59.300 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

In [35]:
B = A[1:2, 44:67]
@show B[1, 1]
@show A[1, 44]
B[1, 1] = 55
@show B[1, 1]
@show A[1, 44]

B[1, 1] = 0.16312268571920324
A[1, 44] = 0.16312268571920324
B[1, 1] = 55.0
A[1, 44] = 0.16312268571920324


0.16312268571920324

In [38]:
@benchmark B = A[1:2, 44:67]

BenchmarkTools.Trial: 
  memory estimate:  496 bytes
  allocs estimate:  1
  --------------
  minimum time:     120.678 ns (0.00% GC)
  median time:      149.289 ns (0.00% GC)
  mean time:        202.459 ns (3.82% GC)
  maximum time:     3.780 μs (87.17% GC)
  --------------
  samples:          10000
  evals/sample:     914

If we want to operate on `A[1:2, 44:67]`, using `A[1:2, 44:67]` creates a copy of the indexed elements in `A`. To avoid expensive copying, we can use the convenient `@views` macro.

In [40]:
@views B = A[1:2, 44:67]
@show B[1, 1]
@show A[1, 44]
B[1, 1] = 55
@show B[1, 1]
@show A[1, 44]

B[1, 1] = -0.4352571634371558
A[1, 44] = -0.4352571634371558
B[1, 1] = 55.0
A[1, 44] = 55.0


55.0

### Operations on arrays

In [41]:
f(x) = 2x

f (generic function with 2 methods)

In [42]:
using Random
Random.seed!(1)
n = 500
my_vec = randn(n);

In [46]:
@benchmark vec2 = f(my_vec)

BenchmarkTools.Trial: 
  memory estimate:  4.06 KiB
  allocs estimate:  1
  --------------
  minimum time:     390.090 ns (0.00% GC)
  median time:      642.793 ns (0.00% GC)
  mean time:        1.148 μs (21.48% GC)
  maximum time:     44.353 μs (93.86% GC)
  --------------
  samples:          10000
  evals/sample:     222

In [47]:
@benchmark vec2 = f.(my_vec)

BenchmarkTools.Trial: 
  memory estimate:  4.12 KiB
  allocs estimate:  4
  --------------
  minimum time:     675.862 ns (0.00% GC)
  median time:      934.483 ns (0.00% GC)
  mean time:        1.324 μs (16.42% GC)
  maximum time:     50.381 μs (93.62% GC)
  --------------
  samples:          10000
  evals/sample:     145

In [45]:
@benchmark vec2 .= f.(my_vec)

BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  2
  --------------
  minimum time:     332.917 ns (0.00% GC)
  median time:      372.917 ns (0.00% GC)
  mean time:        442.257 ns (0.13% GC)
  maximum time:     6.270 μs (93.49% GC)
  --------------
  samples:          10000
  evals/sample:     240

In [48]:
@benchmark @. vec2 = f(my_vec)

BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  2
  --------------
  minimum time:     316.228 ns (0.00% GC)
  median time:      348.684 ns (0.00% GC)
  mean time:        383.807 ns (0.17% GC)
  maximum time:     6.810 μs (93.73% GC)
  --------------
  samples:          10000
  evals/sample:     228

In [49]:
(A, b, c, B_inv, pi, var_status, basic_idxs) = make_data(Float64)
@benchmark find_entering_var(A, c, pi, var_status)

BenchmarkTools.Trial: 
  memory estimate:  384 bytes
  allocs estimate:  5
  --------------
  minimum time:     209.057 ns (0.00% GC)
  median time:      235.472 ns (0.00% GC)
  mean time:        307.554 ns (5.56% GC)
  maximum time:     7.222 μs (95.20% GC)
  --------------
  samples:          10000
  evals/sample:     530

In [50]:
function find_entering_var2(A::Matrix{Float64}, c::Vector{Float64}, pi::Vector{Float64}, var_status::Vector{Int})
    min_rc = 0.0
    min_idx = 0
    for k in eachindex(var_status)
        # only check nonbasic variables
        if iszero(var_status[k])
            @views rc = c[k] - dot(A[:, k], pi)
            if rc < min_rc
                min_rc = rc
                min_idx = k
            end
        end
    end
    return (min_rc, min_idx)
end
@benchmark find_entering_var2($A, $c, $pi, $var_status)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     58.410 ns (0.00% GC)
  median time:      62.283 ns (0.00% GC)
  mean time:        76.597 ns (0.00% GC)
  maximum time:     303.262 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     981