# <center>Julia Workshop </center> 

#### <center> _Alan Crawford, June 2016_</center>

# Roadmap

1. Introduction to Julia: what is Julia and why should I use it?
2. Getting started and setting up Julia
3. The basics of Julia
4. Working with packages 
5. Developing your own code
6. Julia for Economists

# 1. Introduction to Julia

* What is Julia?
* Is Julia for me?


## What is Julia?

[Julia](http://julialang.org/) is a (relatively) new programming language aimed at the scientific computing community. It is a fast, open source, high productivity programming language. 

From the authors of Julia - [Why we created Julia](http://julialang.org/blog/2012/02/why-we-created-julia)

> _"We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell."_

## Is Julia for me?

For an interesting [overview](http://quant-econ.net/about_lectures.html) see the excellent [quant-econ.net](http://quant-econ.net/) project - a great learning resource for economists! Here are some of the key points...

### Advantages of Julia

[+] Open source and runs on all major platforms
* Easy to install and run code on different platforms
* No enforced migration to new versions (unlike, e.g., MATLAB)
* Libraries are open source and easy to share, access and adapt through [GitHub](https://github.com/)

[+] Julia features enable fast and flexible coding

* [Multiple dispatch](http://quant-econ.net/jl/types_methods.html)
* [Julia types](https://en.wikibooks.org/wiki/Introducing_Julia/Types)
* [Metaprogramming](https://en.wikibooks.org/wiki/Introducing_Julia/Metaprogramming)

[+] Great visualisation, data management, statistical and optimisation toolboxes actively maintained by the Julia community

[+] Easy to call other popular open source languages:

* [RCall](https://github.com/JuliaStats/RCall.jl): call R from Julia
* [PyCall](https://github.com/stevengj/PyCall.jl): call Python from Julia
* [ccall](http://docs.julialang.org/en/release-0.4/manual/calling-c-and-fortran-code/?highlight=ccall): call C and Fortran (and C++ code exposed through a C interface) from Julia
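As a rough illustration of how light the C interface is, the sketch below calls the C standard library's `strlen` directly with `ccall`. It assumes a recent Julia release on Linux or macOS, where libc symbols are visible to the Julia process; the helper name `c_strlen` is made up for this example.

```julia
# Hypothetical helper: ask C's strlen for the length of a Julia string.
# ccall takes: the C function name, the return type, a tuple of argument
# types, and then the arguments themselves.
c_strlen(s::String) = ccall(:strlen, Csize_t, (Cstring,), s)

n = c_strlen("Julia")
println("strlen reports ", n, " bytes")
```

No wrapper file or compilation step is needed: the call is written inline in ordinary Julia code.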

### Disadvantages of Julia

[-] Julia is a developing language, hence there are fewer mature 'pure' Julia libraries than in Python and R

[-] Backwards compatibility can be an issue, but there are already [Julia packages to mitigate this](https://github.com/JuliaLang/Compat.jl)

[-] Because Julia is so young, it is a less externally marketable skill than, say, R, MATLAB and Python... at the moment.

[-] The [documentation](http://docs.julialang.org/en/release-0.4/) is extensive, but not all that user friendly... but there are lots of useful resources out there to get you started

## The '2 language problem'?

Existing code development approaches suffer from the so-called '2 language problem': 

> Develop in a dynamically typed language (Python, R, MATLAB) and (partially) execute in a statically typed language (C/C++ or Fortran). Typically, the low level statically typed code does the 'heavy lifting' and is spliced into the dynamic language using a wrapper.

* Issues with this existing approach:
    * If the exact low level code doesn't already exist, you still need to be able to code in a low level language as well as your dynamic language of choice
    * Duplicated effort to translate a 'toy model' in the dynamic language into a statically typed language
    * Wrapper code can be cumbersome and unreliable, and expected performance gains do not always materialise
    * It can get even more complicated to parallelise such code

* Is Julia the answer?
    - Dynamically typed, high productivity language: *minimise development time*
    - Near C-speed performance: *low runtime*
    - If low level code exists, Julia has easy to use, light-weight and reliable wrappers
    - Only ever need to code it up once - no translation needed
    - Julia code is relatively easy to parallelize

... why not give it a try and see!


# 2. Getting started with Julia

* Download Julia
* Setting up your Julia Environment
* What tools will help me be productive with Julia?

## Downloading Julia

- Go to the [Julia downloads page](http://julialang.org/downloads/) and download the command line binary matching your operating system
   
- 2 download options:

    1. The current release: this is the latest 'stable' release of Julia 
    2. Nightly builds: This is the bleeding edge release of Julia - use with care...

Most likely you will want option 1 - at least to start out with.  
    
- If you have any installation problems check out [Platform specific download instructions](http://julialang.org/downloads/platform.html)

## Setting up your Julia Environment

### Juno

Many of you will be used to working with Integrated Development Environments (IDEs) such as those of Stata and MATLAB, or RStudio. If this is your preferred way to work, you may like to try out [Juno](http://junolab.org/).

![alt text](http://junolab.org/images/screenshot.png "Juno IDE")


### Terminal + Text Editor

Perhaps the most flexible way of working is to use a text editor together with the terminal.

#### The Terminal

Open Julia either at the command line or by double clicking the icon. You should see a Julia terminal:

![alt](JuliaTermShot.png)

#### Text Editor

Text editor choices include Vim, Emacs, LightTable, Notepad++, Atom...

My personal preference:

* [Sublime Text](https://www.sublimetext.com/): Cross platform text editor. Once downloaded, open Sublime Text. Then:
    1. Install `Package Control` in Sublime Text
    2. Press `Cmd+Shift+P` to open the Command Palette and select `Package Control: Install Package`
    3. In the `Package Control: Install Package` window type:
        * `Julia` then select the Julia package for syntax highlighting.
        * `SendTextPlus` and install the package. This allows you to highlight text in Sublime Text and press `Cmd+Enter` to send it to the Terminal. If the Terminal is running the Julia REPL, this runs the Julia code.

* When you open .jl files in Sublime Text you will have Julia highlighting and an interactive development environment

![alt](SublimeShot.png)

* Other .... 

### Jupyter

Another good way to communicate Julia code is using [Jupyter Notebook](http://jupyter.org/). 

## Other tools for use with Julia

* Julia packages are hosted on GitHub. Knowing a little Git and GitHub will help a lot - especially when developing your own packages/code

* You will find it very useful to know how to use the UNIX shell.
    * [Software Carpentry](https://www.ucl.ac.uk/isd/services/research-it/training/courses/software-carpentry-workshop) workshop at UCL
    * Use Google! 
    
* Related to the UNIX shell, it is also useful to get to know the basics of its text editor, [VI](http://ex-vi.sourceforge.net/)
    * Clunky, but convenient - especially when working on an HPC cluster

# 3. The basics of Julia

Once you have your Julia environment set up, you might like to familiarise yourself with the basic building blocks of coding in Julia.

* Learning the basics: Online resources and books
* Julia vs MATLAB: Additional features and operational differences
* The big difference: Julia Types

## Learning the basics: online resources and books

Here are two resources that provide a really nice introduction to the basics of Julia for beginners. 

**Quant-Econ.net** has an excellent introduction to Julia:
- [Julia Essentials](http://quant-econ.net/jl/julia_essentials.html)
- [Vector, Arrays and Matrices](http://quant-econ.net/jl/julia_arrays.html)
- [Types, Methods and Performance](http://quant-econ.net/jl/types_methods.html)

**'Introducing Julia'** [Wiki Book](https://en.wikibooks.org/wiki/Introducing_Julia)

You may also find [Getting started with Julia](https://www.packtpub.com/application-development/getting-started-julia) a useful resource. 

## Julia vs MATLAB

Julia looks like MATLAB... However, there are some additional features  

* Strings
* Dictionaries
* Comprehensions
* Metaprogramming

and important operational differences....

* Vectorisation is not necessary (or even desirable) to write performant code - use loops. 
* Assignment and mutation

### Additional Features:

#### Strings

MATLAB has always been a bit clunky with strings; Julia, however, works very nicely with them.

In [1]:
day = 2
year = 2016
date = "Today: $day June $year" # Strings and interpolation

"Today: 2 June 2016"

#### Dictionaries

Dictionaries can be useful ways to store and recall information. 

In [10]:
Customer = Dict("Name"=>"John Smith", "Product"=>"iPhone", "Price"=>600, "Network"=>"O2", "ContractMonths"=>12)

Dict{ASCIIString,Any} with 5 entries:
  "Product"        => "iPhone"
  "Network"        => "O2"
  "ContractMonths" => 12
  "Price"          => 600
  "Name"           => "John Smith"

In [11]:
Customer["Name"]

"John Smith"

As we will see later on, a vector of dictionary entries like this looks a lot like a Dataset in Stata or a DataFrame in R.
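To make the dataset analogy concrete, here is a minimal sketch that treats a vector of dictionaries as rows and pulls out a "column" and a filtered subset with comprehensions. It assumes a recent Julia release; the second customer and all values other than John Smith's are invented purely for illustration.

```julia
# Each Dict is one "row"; the keys mirror the Customer example above.
# The second entry is made-up illustrative data.
customers = [
    Dict("Name" => "John Smith", "Product" => "iPhone", "Price" => 600),
    Dict("Name" => "Jane Doe",   "Product" => "Galaxy", "Price" => 550),
]

# Extract a "column" by looking up the same key in every row
prices = [c["Price"] for c in customers]

# Select "rows" that satisfy a condition
cheap = [c["Name"] for c in customers if c["Price"] < 580]

println(prices)
println(cheap)
```

For real data work a DataFrame is the better tool, but the same row-and-key way of thinking carries over.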

#### Comprehensions

Comprehensions are an elegant way to combine formulas and loops to define multidimensional Arrays (and Arrays of Arrays).

In [19]:
[3i*j.^2 for i in 1:5, j in 1:5] # Create a Matrix using a Comprehension

5x5 Array{Int64,2}:
  3  12   27   48   75
  6  24   54   96  150
  9  36   81  144  225
 12  48  108  192  300
 15  60  135  240  375

In [20]:
[i*j*k for i in 1:3, j in 1:4, k in 1:2] # Create a 3D Array using a comprehension

3x4x2 Array{Int64,3}:
[:, :, 1] =
 1  2  3   4
 2  4  6   8
 3  6  9  12

[:, :, 2] =
 2   4   6   8
 4   8  12  16
 6  12  18  24

In [22]:
myelement(i,j,k) = 3i*j + k # this is a one-line function definition
[myelement(i,j,k) for i in 1:3, j in 1:4, k in 1:2] # Use the function to define the elements of a comprehension

3x4x2 Array{Int64,3}:
[:, :, 1] =
  4   7  10  13
  7  13  19  25
 10  19  28  37

[:, :, 2] =
  5   8  11  14
  8  14  20  26
 11  20  29  38

#### Metaprogramming

Julia works in two stages: (1) it parses the Julia code and (2) it executes the parsed Julia code. 

[Metaprogramming](http://docs.julialang.org/en/latest/manual/metaprogramming/) allows the Julia user to intervene after stage (1). 

* What is the big deal? Julia code can generate Julia code at stage (1) which is executed at stage (2).
* Why is this good? You can write succinct and clear code that avoids repeating lines of code

The most common examples of metaprogramming you will encounter are **macros**. Macros in Julia are prefixed with `@` and appear at the start of the line of code. 

For example, there is a macro that times code execution:

In [32]:
@time for n in 1:100 
    x = [i*j*n for i in 1:10, j in 1:10] 
end 

  0.000111 seconds (100 allocations: 87.500 KB)


There are many macros available and you can write your own. Here are some existing metaprogramming tools you might find useful:

* [Base.Cartesian](http://julia.readthedocs.io/en/latest/devdocs/cartesian/?highlight=base.cartesian#module-Base.Cartesian) for multi-dimensional code
* [Generated functions](http://docs.julialang.org/en/latest/manual/metaprogramming/#generated-functions) let the body of a function be generated from the types of its arguments

However, it is worth noting that [Julia's flexible iterators](http://julialang.org/blog/2016/02/iteration) may provide an alternative and more elegant way to achieve similar results. 
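Writing your own macro is less daunting than it sounds. The sketch below, assuming a recent Julia release, defines a hypothetical macro `@labelled` that receives an expression, turns its source text into a label when the macro is expanded, and returns new code that evaluates the expression later. The macro name is invented for this example.

```julia
# Hypothetical macro: pairs the source text of an expression with its value.
macro labelled(ex)
    label = string(ex)   # runs when the macro is expanded: expression -> text
    # The returned expression is the code that actually runs at stage (2).
    # esc(ex) makes `ex` refer to variables in the caller's scope.
    return :(string($label, " = ", $(esc(ex))))
end

x = 3
msg = @labelled x + 1
println(msg)
```

The macro never sees the value `3`; it only sees the expression `x + 1` and generates code around it, which is exactly the "code that writes code" idea described above.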

### Operational differences

#### No need to vectorize

One major criticism of MATLAB has been the need to vectorise code to improve performance. 

* Vectorization is not intuitive, slows development time
* Debugging vectorized code is not fun
* Translation within MATLAB

... although loops have got better over time.

However, Julia loves loops. In fact it prefers them! 

* Intuitive coding, speed up development time
* Julia has lots of tools to make coding in loops easier 
    * Arrays are iterable
    * Dictionaries are iterable
    * Metaprogramming can be used to write loops whose dimension is not known in advance
    * Dynamic building of arrays using `push!` and `append!` is very useful in practice
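The points above can be sketched in a few lines. This is an illustrative example assuming a recent Julia release; the function names `sumsq`, `evens_upto` and `total_count`, and the `stock` dictionary, are made up here.

```julia
# Sum of squares with a plain loop: no vectorisation needed
function sumsq(x)
    total = zero(eltype(x))
    for xi in x                 # arrays are iterable
        total += xi^2
    end
    return total
end

# Build an array dynamically with push!
function evens_upto(n)
    out = Int[]                 # start with an empty Vector{Int}
    for i in 1:n
        if iseven(i)
            push!(out, i)       # grow the array one element at a time
        end
    end
    return out
end

# Dictionaries iterate over (key, value) pairs
function total_count(d)
    total = 0
    for (item, n) in d
        total += n
    end
    return total
end

stock = Dict("apples" => 3, "pears" => 5)
println(sumsq([1.0, 2.0, 3.0]))
println(evens_upto(7))
println(total_count(stock))
```

Wrapping each loop in a function is deliberate: Julia compiles functions, so loops inside them run at full speed.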


#### A gotcha: Assignment and mutation

Although Julia has MATLAB-like syntax there are some important conceptual differences. In particular, the concepts of assignment and mutation in Julia can lead to radically different output despite having similar input. 

Failing to recognise that unintentional mutation has happened is an easy mistake to make especially for a MATLAB convert... 

So what is assignment and mutation? The best way to illustrate is by example...

In [13]:
# Suppose we intend to make a copy of the vector, a. In MATLAB we might do something like this:
a = collect(1.:4.) # collect turns the range 1.0:4.0 into a Vector{Float64}
a1 = a

4-element Array{Float64,1}:
 1.0
 2.0
 3.0
 4.0

In [14]:
a[4] = 5. # Now suppose we change or 'mutate' the contents of the 4th element of a, a[4], to 5.

5.0

In [15]:
# Now look at a1
a1

4-element Array{Float64,1}:
 1.0
 2.0
 3.0
 5.0

We can see `a1[4]=5.0`, even though we changed `a[4] = 5.0`, not `a1`. What happened?!? Answer: nothing.

This is actually very sensible. Unlike MATLAB, Julia does NOT make a new vector `a1`. Instead, it creates a reference that points to the original vector `a` in memory. So if the contents of `a` change or 'mutate', then the contents of `a1` must also change since both variables are bound to the same object in memory. 

Why is this good? 
* Prevents rapid expansion of superfluous objects in memory - collecting up the garbage takes time and should be avoided! 
* Clear distinction between the objects, `a` and `a1` and their contents. This can actually help write efficient Julia code.

But can I make a copy if I want to? Yes...

In [16]:
a2 = deepcopy(a) # deepcopy() creates an entirely independent copy of a. 
                 # It creates new contents and assigns a new variable binding `a2` to them

4-element Array{Float64,1}:
 1.0
 2.0
 3.0
 5.0

In [17]:
a[3] = 100 # Let's do a new test by mutating, a[3] = 100.

100

In [18]:
# Let's see what happened to the vectors [a, a1, a2]
hcat(a,a1,a2)

4x3 Array{Float64,2}:
   1.0    1.0  1.0
   2.0    2.0  2.0
 100.0  100.0  3.0
   5.0    5.0  5.0

As expected, `a1` reflects the change to `a[3]` but `a2` remains unchanged. 
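For a flat vector of numbers like `a`, the lighter `copy` would have done the same job as `deepcopy`. The difference only shows up with nested containers, as in this small sketch (assuming a recent Julia release; the variable names are illustrative).

```julia
nested  = [[1, 2], [3, 4]]   # an array of arrays
shallow = copy(nested)       # new outer array, but the same inner arrays
deep    = deepcopy(nested)   # outer and inner arrays are all duplicated

nested[1][1] = 99            # mutate one of the inner arrays

println(shallow[1][1])       # the shallow copy shares this inner array
println(deep[1][1])          # the deep copy is unaffected
```

`copy` is usually enough for arrays of numbers; reach for `deepcopy` when the elements are themselves mutable objects.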

##### So what is assignment and what is mutation?

The best explanation is in this [post](http://stackoverflow.com/questions/33002572/creating-copies-in-julia-with-operator) ... here are the key points:

**Assignment** looks like `x = ...` – what's left of the `=` is an identifier, i.e. a variable name. Assignment changes which object the variable `x` refers to (this is called a variable binding). It does not mutate any objects at all.

**Mutation**. There are two typical ways to mutate something in Julia: 
* `x.f = ...` – what's left of the `=` is a field access expression;
* `x[i] = ...` – what's left of the `=` is an indexing expression. 
    
Key takeaways:
* Understanding assignment and mutation can help you write very fast Julia code (i.e. re-using variable bindings and mutating the contents can help minimise costly garbage collection)!
* However, beware of inadvertent mutation through shared bindings when coding, and always run tests to make sure your code does what you expect it to!
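The same distinction matters inside functions. In the sketch below (assuming a recent Julia release; `double!` and `doubled` are hypothetical names), one function mutates the caller's array while the other only rebinds a local name.

```julia
# Mutates its argument in place (the trailing ! is a naming convention)
function double!(v)
    for i in eachindex(v)
        v[i] *= 2            # indexing expression on the left: mutation
    end
    return v
end

# Rebinds the local name only; the caller's array is untouched
function doubled(v)
    v = 2 .* v               # identifier on the left: assignment to a new array
    return v
end

a = [1.0, 2.0, 3.0]
b = doubled(a)               # b is a new array
a_after_doubled = copy(a)    # snapshot: a has not changed yet
double!(a)                   # now a itself has been doubled

println(a_after_doubled)
println(a)
```

The in-place version allocates no new array, which is where the garbage collection savings mentioned above come from.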

## Julia Types

A major difference between Julia and other languages is its type system. It is helpful to think of two sets of types:

* Common or Primitive data types
* User defined types

### Primitive data types

Below is a diagram showing Julia's primitive type hierarchy

![alt](JuliaTypes.png)

In addition to the above you will also likely use:

* Strings (e.g. AbstractString, ASCIIString) - we already saw these above
* Bool types (i.e. true, false) - useful for setting options in functions

Any user of statically typed languages will be very familiar with primitive data types - these languages will not compile unless every object's type is specified. This can make developing code slow and not so much fun to debug! However, this is also one reason they are so fast...

In contrast, languages such as R, MATLAB and Python take care of types in the backend:

[+] High productivity - easy to develop in

[-] Process of checking types and turning program inputs into machine code significantly slows them down...

##### Is Julia really faster?

![alt+text](http://istc-bigdata.org/wp-content/uploads/2013/12/ISTC_Big_Data_Blog_Julia-Language-Benchmark-JChen-1024x558.png "Speed benchmarks")
__Source: [ISTC-Bigdata.org](http://istc-bigdata.org/wp-content/uploads/2013/12/ISTC_Big_Data_Blog_Julia-Language-Benchmark-JChen-1024x558.png)__ 

#### Developing in Julia: When to specify types 

To develop in Julia you do not *NEED* to specify primitive types.

* You can (and should) develop in Julia just as you would in MATLAB, R or Python without specifying types.
* Enables high-productivity development

Only once the code is finalised might you add types to make it [type stable](http://docs.julialang.org/en/release-0.4/manual/performance-tips/#write-type-stable-functions). This helps Julia compile fast machine code.

We will see an example in a few slides...
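Before that, a minimal sketch of what "type stable" means in practice, assuming a recent Julia release. Both hypothetical functions below carry the same argument annotation; what differs is whether the type of the result depends only on the type of the input.

```julia
# Type-unstable: returns the Int literal 0 on one branch and a Float64 on the other
pos_unstable(x::Float64) = x > 0 ? x : 0

# Type-stable: zero(x) has the same type as x, so the result is always a Float64
pos_stable(x::Float64) = x > 0 ? x : zero(x)

println(typeof(pos_unstable(-1.0)))
println(typeof(pos_stable(-1.0)))
```

When the return type can be predicted from the argument types alone, the compiler can generate specialised machine code for everything that uses the result.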

### User defined Types

A key part of Julia is that users can define their own types. This is perhaps best illustrated by an example.

Below we define a user type called `mypoly` with three fields: `basis`, `coef`, and `value`.

In [44]:
type mypoly
    basis
    coef
    value
end

Now let's make an instance of `mypoly`:

In [45]:
B = 5*rand(10) # Basis Vector
θ = rand(10)   # Coefficients
P = mypoly(B,θ,dot(B,θ)) # Make an instance of mypoly

mypoly([3.976971899565107,0.10260209767688155,0.7076203275488768,0.9233391418248682,4.3411191250057914,3.435688209021418,1.0806314554976437,4.907022692550345,1.1973525029889964,1.018414809995568],[0.6367916538215455,0.767010373695862,0.6784982748622173,0.4533470498135985,0.0010983604335135233,0.8003197421950976,0.8434286620164058,0.8454246458745913,0.39215733194269564,0.7599911355264124],12.56781910466461)

To access the fields of an instance, use `P.fieldname`:

In [48]:
P.basis

10-element Array{Float64,1}:
 3.97697 
 0.102602
 0.70762 
 0.923339
 4.34112 
 3.43569 
 1.08063 
 4.90702 
 1.19735 
 1.01841 

In [50]:
P.coef

10-element Array{Float64,1}:
 0.636792  
 0.76701   
 0.678498  
 0.453347  
 0.00109836
 0.80032   
 0.843429  
 0.845425  
 0.392157  
 0.759991  

As defined above, the fields of `mypoly` can hold values of any type. However, if we want to help Julia compile efficient code we might give it more information by making a new **type stable** version of `mypoly` called `poly2`. 


```julia
type poly2
    basis :: Vector{Float64}
    coef  :: Vector{Float64}
    value :: Float64
end
```

where `::` can be taken to mean 'is an instance of'. 

Above I suggested that making code type stable could speed the code up... Let's test:

In [121]:
# Define a new more tightly parameterised type
type poly2
    basis :: Vector{Float64}
    coef  :: Vector{Float64}
    value :: Float64
end

# Function to wrap a loop that constructs N instances of a D-dimensional mypoly
function calcpoly(D::Int64,N::Int64)
    B = rand(D)
    θ = rand(D)
    for n = 1:N
        fill!(B,rand())
        fill!(θ,rand())
        mypoly(B,θ,dot(B,θ))
    end
end

# Function to wrap a loop that constructs N instances of a D-dimensional poly2
function calcpoly2(D::Int64,N::Int64)
    B = rand(D)
    θ = rand(D)
    for n = 1:N
        fill!(B,rand())
        fill!(θ,rand())
        poly2(B,θ,dot(B,θ))
    end
end

# Before benchmarking ALWAYS compile the code...
calcpoly(10,1)
calcpoly2(10,1)

In [141]:
Pkg.add("BenchmarkTools") # Download a package that is more accurate than the @time macro; more on packages in a few slides

In [143]:
using BenchmarkTools # Load BenchmarkTools package into scope
@benchmark calcpoly(10,1000)

     Time per evaluation: 58.92 μs [56.56 μs, 61.28 μs]
Proportion of time in GC: 3.18% [2.04%, 4.33%]
        Memory allocated: 47.19 kb
   Number of allocations: 2004 allocations
       Number of samples: 4501
   Number of evaluations: 146001
         R² of OLS model: 0.833
 Time spent benchmarking: 10.72 s


In [144]:
@benchmark calcpoly2(10,1000)

     Time per evaluation: 51.60 μs [50.05 μs, 53.15 μs]
Proportion of time in GC: 1.94% [1.09%, 2.78%]
        Memory allocated: 31.56 kb
   Number of allocations: 1004 allocations
       Number of samples: 4601
   Number of evaluations: 160601
         R² of OLS model: 0.897
 Time spent benchmarking: 10.54 s


#### What if specifying types leads to overly restrictive code, but I still want speed?

Answer: [Multiple dispatch](http://docs.julialang.org/en/release-0.4/manual/methods/)! 

You can write multiple versions of type stable functions and Julia will figure out which version of the function it should 'dispatch' at runtime. 
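As a small illustration, assuming a recent Julia release, here is one hypothetical function `halve` with three methods. Julia picks the method from the type of the argument, and each method is type stable on its own.

```julia
# Three methods of one (made-up) function, specialised on the argument type
halve(x::Int)     = div(x, 2)                  # integer division keeps the result an Int
halve(x::Float64) = x / 2                      # floating-point division
halve(x::Vector)  = x[1:div(length(x), 2)]     # first half of a vector

println(halve(7))
println(halve(7.0))
println(halve([1, 2, 3, 4]))
println(length(methods(halve)))                # number of methods defined for halve
```

Callers just write `halve(x)`; adding support for a new type later means adding a method, not editing the existing ones.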

# 4. Working with Julia: Packages

Julia uses packages (also called libraries) that provide the user with lots of great functionality without re-inventing the wheel. 

Like R, these packages can be installed, removed and loaded from within Julia.

If the package is on the [official Julia package list](http://pkg.julialang.org) you can use `Pkg.add("PackageName")`. Otherwise you can use `Pkg.clone("GithubURL")` or fork it directly from its GitHub page.

In [52]:
Pkg.add("GLM") # Add Generalized linear modelling package
Pkg.add("RDatasets") #  RDatasets

Now let's use the GLM package to run a regression, using an example from the [GLM GitHub page](https://github.com/JuliaStats/GLM.jl)

In [56]:
using GLM, RDatasets # Load the packages: Note GLM loads DataFrames for use with functions exported by GLM
LifeCycleSavings = dataset("datasets", "LifeCycleSavings") # Load the dataset from RDatasets

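| Row | Country | SR | Pop15 | Pop75 | DPI | DDPI |
|-----|---------|----|-------|-------|-----|------|
| 1 | Australia | 11.43 | 29.35 | 2.87 | 2329.68 | 2.87 |
| 2 | Austria | 12.07 | 23.32 | 4.41 | 1507.99 | 3.93 |
| 3 | Belgium | 13.17 | 23.8 | 4.43 | 2108.47 | 3.82 |
| 4 | Bolivia | 5.75 | 41.89 | 1.67 | 189.13 | 0.22 |
| 5 | Brazil | 12.88 | 42.19 | 0.83 | 728.47 | 4.56 |
| 6 | Canada | 8.79 | 31.72 | 2.85 | 2982.88 | 2.43 |
| 7 | Chile | 0.6 | 39.74 | 1.34 | 662.86 | 2.67 |
| 8 | China | 11.9 | 44.75 | 0.67 | 289.52 | 6.51 |
| 9 | Colombia | 4.98 | 46.64 | 1.06 | 276.65 | 3.08 |
| 10 | Costa Rica | 10.78 | 47.64 | 1.14 | 471.24 | 2.8 |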


In [57]:
# Let's do the regression
fm = SR ~ Pop15 + Pop75 + DPI + DDPI
fm2 = fit(LinearModel, fm, LifeCycleSavings)

DataFrames.DataFrameRegressionModel{GLM.LinearModel{GLM.DensePredQR{Float64}},Float64}

Formula: SR ~ 1 + Pop15 + Pop75 + DPI + DDPI

Coefficients:
                 Estimate   Std.Error   t value Pr(>|t|)
(Intercept)       28.5661     7.35452   3.88416   0.0003
Pop15           -0.461193    0.144642  -3.18851   0.0026
Pop75             -1.6915      1.0836    -1.561   0.1255
DPI          -0.000336902 0.000931107 -0.361829   0.7192
DDPI             0.409695    0.196197   2.08818   0.0425


#### Managing packages

The Julia documentation has a nice section on [Package management](http://docs.julialang.org/en/release-0.4/manual/packages/). Here are some highlights:

To check which packages you have installed:

```julia
Pkg.installed()
```

You may wish to check which versions of packages are installed:

```julia
Pkg.status()
```

You can also update packages to the most recent version:

```julia
Pkg.update()
```

If you want to remove a package, say GLM:

```julia
Pkg.rm("GLM")
```

If you are having a problem using a package ask the [community](https://groups.google.com/forum/#!forum/julia-users) and/or search the database.

# 5. Developing your own code and packages

To develop your own code you will need to familiarise yourself with [modules](http://docs.julialang.org/en/release-0.4/manual/modules/?highlight=modules). 

Here is a very simple example of a module that does Sparse Grid Integration called [SparseGridsHW](https://github.com/alancrawford/SparseGridsHW)

```julia
module SparseGridsHW

import Base.show # import `show` from Base so that methods can be added to it

include("sparse_grid_hw.jl") # Include the Julia code that does the integration and defines the nwspgr function

export nwspgr # Specifies which functions are available to the user after loading the module

end
```

In general, you may find it useful to draw on the examples of mature, existing packages in Julia (e.g. DataFrames.jl, GLM.jl, etc.). In particular, see how they:

* set up the package folder (i.e. source code in `src`, test code in `test`, and a README.md)
* write source code
* test code
* create documentation
* use Github
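To give a feel for how source and test code fit together, here is a toy sketch written for a recent Julia release, where the `Test` standard library provides `@testset` and `@test` (on the 0.4 release the testing macros live in `Base.Test` instead). The module `TinyStats` and its function `demean` are invented for this example; in a real package the two halves would live in `src/` and `test/runtests.jl`.

```julia
# --- would live in src/TinyStats.jl (hypothetical package) ---
module TinyStats

export demean

# Subtract the mean from every element of x
demean(x) = x .- sum(x) / length(x)

end # module

# --- would live in test/runtests.jl ---
using .TinyStats   # the leading dot loads a module defined in this session
using Test

@testset "demean" begin
    @test demean([1.0, 2.0, 3.0]) == [-1.0, 0.0, 1.0]
    @test sum(demean([2.0, 4.0])) == 0.0
end
```

Keeping tests next to the code from the start makes it much easier to catch the kind of inadvertent mutation discussed earlier.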

# 6. Julia for Economists

- Specific projects in Julia that are very useful for economics 
    - [Plots](http://plots.readthedocs.io/en/latest/): Data visualization in Julia
    - [Julia Stats](http://juliastats.github.io/): Data Frames, Regression, R-like analysis
    - [Julia Opt](http://www.juliaopt.org/): Optimisation in Julia (incl. an AMPL-like nonlinear optimisation language called JuMP)
    - [Mopt](http://moptjl.readthedocs.io/en/latest/): Likelihood free simulated method of moments in Julia
    - ... many [others](http://pkg.julialang.org/) to explore
