# A Short Tour of Julia

- Calling Python and R code with ```PyCall.jl``` and ```RCall.jl```
- Benchmarking with ```BenchmarkTools.jl```
- Plotting: ```Plots.jl``` and other libraries
- Managing data with ```DataFrames.jl```
- Neural networks with ```Flux.jl```

In [None]:
# Install all the packages used in this tutorial
using Pkg
Pkg.activate(".") # activate (creating it if needed) a project environment in the current directory, usually the notebook's folder
Pkg.add(["DataFrames", "Plots", "Statistics", "PyCall", "RCall", "Flux", "BenchmarkTools"])

In [None]:
using DataFrames
using Plots
using Statistics

## Calling Python code from Julia

To call Python code (including ```numpy``` and any other library) from Julia, we can use the ```PyCall``` package:

In [None]:
using PyCall

We can now import Python functions (from libraries or builtins) and use them like standard Julia functions.

We will see:

- how to import a builtin function
- how to import a Python package
- how to use inline Python code
- type conversions and zero-copy arrays

Let us import the builtin Python functions ```sum``` and ```map``` using the ```pybuiltin``` function:

In [None]:
pysum = pybuiltin("sum")
pymap = pybuiltin("map")

We can now use them as normal Julia functions:

In [None]:
pysum([1,2,3,4,5])

In [None]:
pymap_obj = pymap(x -> x^2, 1:10)

Notice that Python's ```map``` returns a lazy map object: we can turn it into a Julia array with ```collect``` or iterate over it in a ```for``` loop:

In [None]:
for i ∈ pymap_obj
    println(i)
end

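Arguments and return values are converted automatically between Python and Julia types; this works for most common types, such as numbers, strings, tuples, arrays, and dictionaries.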

### Importing a Python Module

A Python module (e.g., ```math```, ```matplotlib```, ```numpy```) can be imported with the ```pyimport``` function:

In [None]:
pymath = pyimport("math")

Now we can access all the functions (and objects) in the Python ```math``` module through the variable ```pymath```:

In [None]:
pymath.cos(2*pymath.pi)

### Inline Python code

We can also write Python code directly, enclosing it in ```py" "``` or ```py""" """```:

In [None]:
x = py"[ i + 5 for i in range(0,10)]"

As we can see, the Python code is executed and the result is automatically converted to the corresponding Julia type (here, a vector of integers).

We can avoid automatic type conversion by using ```py" "o``` or ```py""" """o```, which return a ```PyObject```:

In [None]:
py"3 + 5"o

We can also define entire Python functions:

In [None]:
py"
def a_python_function(func, x, key=None):
    if key is not None:
        return func(x, key)
    else:
        return func(x, x)
"

The function can then be called directly, including with keyword arguments:

In [None]:
py"a_python_function"((x,y) -> x^2 + y^2, 4)

In [None]:
py"a_python_function"((x,y) -> x^2 + y^2, 4, key=2)

By assigning ```py"a_python_function"``` to a Julia variable, we can call it exactly like a Julia function:

In [None]:
a_python_function = py"a_python_function"

In [None]:
a_python_function(+, 4, key=12)
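For comparison, here is a sketch of the same function written natively in Julia (the name ```a_julia_function``` is only for illustration). Keyword arguments are declared after a semicolon, and Julia's ```nothing``` plays the role of Python's ```None```:

```julia
# Native Julia counterpart of a_python_function (illustrative name).
# Keyword arguments follow the `;`, and `nothing` replaces Python's `None`.
function a_julia_function(func, x; key = nothing)
    if key !== nothing
        return func(x, key)
    else
        return func(x, x)
    end
end

a_julia_function((x, y) -> x^2 + y^2, 4)           # 4^2 + 4^2 = 32
a_julia_function((x, y) -> x^2 + y^2, 4, key = 2)  # 4^2 + 2^2 = 20
a_julia_function(+, 4, key = 12)                   # 4 + 12 = 16
```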

### Python/Numpy Arrays

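Passing Julia arrays to Python functions expecting NumPy arrays (```ndarray```) works as expected, even though the two languages lay out multidimensional arrays differently in memory:

- Julia arrays are stored in _column-major order_
- NumPy arrays are stored in _row-major order_ by default

By default, ```PyCall``` passes Julia arrays to Python without copying them, but converting a NumPy array returned by Python into a Julia ```Array``` **does** make a copy. The ```PyArray``` type avoids this: it wraps a NumPy array so that it can be used as a Julia array _without copying_ the data.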

In [None]:
py"import numpy as np

xs = np.zeros([10,10])"

In [None]:
PyArray(py"xs"o) # No copy!
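The difference in memory layout can be observed in plain Julia, without ```PyCall```: ```vec``` returns the elements in the order in which they are stored, and ```strides``` gives the step in memory along each dimension. The row-major order that NumPy uses by default corresponds to traversing the transposed matrix. The variable name ```A_demo``` is just for illustration:

```julia
A_demo = [1 2 3;
          4 5 6]            # 2×3 matrix

vec(A_demo)                 # [1, 4, 2, 5, 3, 6]: each column is contiguous in memory
strides(A_demo)             # (1, 2): the next row is 1 element away, the next column 2

# Row-major (C-order) sequence of the same values, as NumPy stores them by default
vec(permutedims(A_demo))    # [1, 2, 3, 4, 5, 6]
```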

### Extras

- It is also possible to call Julia code from Python (see ```PyCall.jl``` documentation)
- If you want to use Matplotlib for plotting, the ```PyPlot.jl``` package provides a wrapper built on ```PyCall.jl```

## Calling R from Julia

R code can be called easily from Julia using the ```RCall.jl``` package, for example when you want to use an R-only package or the excellent ```ggplot2``` library.

In [None]:
using RCall

There are multiple ways to move objects to and from R and to evaluate R code.

- The ```@rput``` and ```@rget``` macros
- Importing libraries with ```@rimport``` and ```@rlibrary```
- Enclosing R code in ```R" "```

To make a variable visible to R, we use the ```@rput``` macro followed by the name of the variable, which will have the same name in the R environment:

In [None]:
x = [1,2,3,4,5]

@rput x

We can now use the variable ```x``` in R code written inside ```R" "```:

In [None]:
R"y = max(x)"

We can retrieve the variable ```y``` from the R environment with the ```@rget``` macro:

In [None]:
@rget y

println(typeof(y))
println(y)

Notice that we can use string interpolation inside ```R" "```:

In [None]:
R"z <- $([1,2,3,4] .+ 9)"

@rget z

Functions can also be moved between the two languages, from Julia to R...

In [None]:
function function_for_R(x)
    sum(x)
end

@rput function_for_R

R"w <- function_for_R(c(1,7,8,9))"

@rget w

... and from R to Julia

In [None]:
R"r_function <- function(x, y) x + y"

@rget r_function

r_function(3, 4)

It is also possible to convert R dataframes to Julia dataframes:

In [None]:
R"""
n <- c(2, 3, 4)
s <- c("a", "b", "c")
df <- data.frame(s, n)
"""

@rget df

In [None]:
typeof(df)

## Managing Data with DataFrames

DataFrames in Julia (provided by ```DataFrames.jl```) are similar to data frames in R and to pandas DataFrames in Python.

Let us start by creating an _empty_ dataframe; we will fill it with data shortly.

In [None]:
df = DataFrame()

An empty dataframe is quite boring, so let us generate some data:
- $x$ positions from $1$ to $10$
- a first random $y$ coordinate (uniform in $[0,1)$)
- a second random $y$ coordinate (standard normal, $\mathcal{N}(0,1)$)

In [None]:
data = hcat(collect(1:10), rand(10, 1), randn(10, 1))
df = DataFrame(data, :auto) # :auto generates the column names x1, x2, x3

We can rename the columns of the dataframe by passing a vector of ```String```s or ```Symbol```s to the ```rename!``` function (the ```!``` signals that the function modifies its argument):

In [None]:
col_names = [:x, :y₁, :y₂] # not called `names`, which would shadow the Base function `names`
rename!(df, col_names)

We could also have given the names when creating the dataframe:

In [None]:
df = DataFrame(data, [:x, :y₁, :y₂])

We can access the different columns of the dataframe by name

In [None]:
df.y₁ # An array of Float64

In [None]:
df[:, :x] # This looks more similar to array/dictionary access 

In [None]:
df."x" # We can even use strings...

We can add a new column by simply assigning a vector of suitable length ($10$ elements in this case)

In [None]:
df.x₂ = 10*rand(10)
df

We might want to rename and reorder the columns. We can do this with the ```rename!``` and ```select!``` functions:

In [None]:
rename!(df, :x => :x₁)
select!(df, r"x", :) # group all columns matching the regexp "x" before all the rest (:)

Let us get some statistics on this data via the ```describe``` function

In [None]:
describe(df)

We can select which statistics to compute by passing additional arguments to the ```describe``` function.

Custom statistics have the form _function to apply_ ```=>``` _name of the statistic_:

In [None]:
describe(df, mean => :mean, median => :median, sum => :sum, prod => :product)

If we need a matrix instead of a dataframe, we can simply use the ```Matrix``` constructor:

In [None]:
Matrix(df)

Before moving on, let us make our first scatter plot.

- ```scatter``` creates a new plot
- ```scatter!``` adds to the existing plot

In [None]:
scatter(df.x₁, df.y₁, label="data 1")
scatter!(df.x₂, df.y₂, label="data 2")

### A few notes on plotting in Julia

There are multiple packages that can be used for plotting in Julia:

- ```Plots.jl```: the "main" Julia plotting library, with multiple backends (including JavaScript-based ones such as Plotly)
- ```PyPlot.jl```: a wrapper for Python's Matplotlib
- ```Gadfly.jl```: a promising package inspired by R's ```ggplot2```

### Manipulation of Dataframes

In [None]:
df = DataFrame()

for i ∈ 1:10^5
    elem::Vector{Float64} = []
    while sum(elem) ≤ 1
        push!(elem, rand())
    end
    push!(df, (id = i, length = length(elem), elements = elem))
end

df

Maybe we want to add the sum of all the elements in each list as an additional column in our dataframe.

Notice that ```ByRow``` indicates that the function is applied to each row of the column separately, not to the entire column:

In [None]:
transform!(df, :elements => ByRow(sum))
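In plain Julia terms, applying ```ByRow(sum)``` to a column of vectors computes one sum per entry, like broadcasting ```sum``` over the column; without ```ByRow```, the function would receive the whole column at once. A small made-up column (the ```_demo``` names are only illustrative) shows the difference:

```julia
# A made-up column whose entries are vectors, like df.elements
elements_demo = [[0.4, 0.7], [0.2, 0.5, 0.6], [0.9, 0.3]]

# Row by row (what ByRow(sum) does): one result per entry
row_sums = sum.(elements_demo)     # ≈ [1.1, 1.3, 1.2]

# Whole column: the function receives the entire vector of vectors,
# so e.g. `length` returns a single value (the number of rows)
n_rows = length(elements_demo)     # 3
```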

```elements_sum``` (the automatically generated name) is not a good name. Let us delete the column and create it again with a different name (using ```rename!``` would be simpler, but this shows how to name the output column directly):

In [None]:
select!(df, :id, :length, :elements)
transform!(df, :elements => ByRow(sum) => :sum)

Let us find the average length of the lists of elements:

In [None]:
mean(df.length)

It is possible to prove that the expected length is the constant $e$:

In [None]:
MathConstants.e
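A sketch of why this holds: let $U_1, U_2, \ldots$ be the independent uniform numbers in $[0,1)$ produced by ```rand()```, and let $N$ be the number of draws needed for their sum to exceed $1$, which is exactly the ```length``` column. The event $N > n$ means that the first $n$ draws sum to at most $1$, and the probability of that is the volume of the $n$-dimensional simplex, $1/n!$ (for $n = 0$ the empty sum is $0$, with probability $1 = 1/0!$). Since $N$ takes non-negative integer values:

$$
\mathbb{E}[N] = \sum_{n=0}^{\infty} \Pr(N > n) = \sum_{n=0}^{\infty} \Pr(U_1 + \cdots + U_n \le 1) = \sum_{n=0}^{\infty} \frac{1}{n!} = e
$$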

We could also have used ```combine``` to _combine_ all the elements of a column into a single value:

In [None]:
combine(df, :length => mean)

Let us explore how we can group the different rows of the dataframe using the ```groupby``` function.

In [None]:
grouped_df = groupby(df, :length, sort=true) |> x -> combine(x, :length => length => :num_elems)
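To see what ```groupby``` followed by ```combine``` computes here, the same per-value counting can be sketched in plain Julia with a ```Dict``` (the function name ```count_by``` is hypothetical):

```julia
# Count how many times each value occurs, returned as value => count pairs
# sorted by value (like groupby(df, :length, sort=true) followed by combine).
function count_by(values)
    counts = Dict{eltype(values),Int}()
    for v in values
        counts[v] = get(counts, v, 0) + 1   # start from 0 the first time a value is seen
    end
    return sort!(collect(counts))           # pairs are sorted by key first
end

count_by([2, 3, 2, 4, 2])   # [2 => 3, 3 => 1, 4 => 1]
```

Applied to ```df.length```, it should give the same counts as ```grouped_df```.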

In [None]:
histogram(df.length, yaxis = :log, bar_width = 0.75, title = "number of sequences", key=false)

## Building a Neural Network

We are going to build a simple neural network from scratch, then we are going to use the facilities provided by ```Flux.jl``` to help us build and train neural networks.

In [None]:
using Flux

### Automatic gradient computation

Let us define a function of which we want to compute the derivative:

In [None]:
f(x) = 3x^3 + 2x^2 + 5

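We can compute the derivative with the ```gradient``` function (which Flux re-exports from ```Zygote.jl```); it returns a tuple with one gradient per argument, hence the ```[1]```: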

In [None]:
derivative_f(x) = gradient(f, x)[1]

Let us plot both $f$ and its derivative:

In [None]:
x_vals = -5:0.01:5

plot(x_vals, f.(x_vals), label="f(x)")
plot!(x_vals, derivative_f.(x_vals), label="f'(x)")

The exact derivative is $f'(x) = 9x^2 + 4x$, and a good automatic differentiation engine generates code equivalent to it. We can inspect the generated LLVM code:

In [None]:
@code_llvm derivative_f(3.0) # we expect 9x^2 + 4x

For an introduction to automatic differentiation, the [Wikipedia page](https://en.wikipedia.org/wiki/Automatic_differentiation) provides a good overview.

For ```Zygote.jl```, the automatic differentiation framework used by Flux, see [Zygote.jl](https://github.com/FluxML/Zygote.jl) and the paper describing how it performs automatic differentiation on [arXiv](https://arxiv.org/abs/1810.07951).
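Zygote works in _reverse mode_, by transforming the code of the function. For intuition only, the simpler _forward mode_ can be sketched with dual numbers: each value carries its derivative along, and every arithmetic operation applies the corresponding differentiation rule. This is a different technique from Zygote's; the type name ```MyDual``` is hypothetical, and only ```+``` and ```*``` are implemented, which is enough for polynomials such as $f$:

```julia
# Forward-mode AD with dual numbers: a teaching sketch, not how Zygote works.
struct MyDual <: Number
    val::Float64   # value of the expression
    der::Float64   # derivative of the expression with respect to x
end

# Sum rule and product rule
Base.:+(a::MyDual, b::MyDual) = MyDual(a.val + b.val, a.der + b.der)
Base.:*(a::MyDual, b::MyDual) = MyDual(a.val * b.val, a.der * b.val + a.val * b.der)

# Plain numbers are constants: their derivative is zero
Base.convert(::Type{MyDual}, x::Real) = MyDual(x, 0.0)
Base.promote_rule(::Type{MyDual}, ::Type{<:Real}) = MyDual

f_demo(x) = 3x^3 + 2x^2 + 5   # same polynomial as f above (x^n uses repeated *)

# Seed the derivative dx/dx = 1 and evaluate at x = 2
d_demo = f_demo(MyDual(2.0, 1.0))
d_demo.val   # f(2)  = 37.0
d_demo.der   # f'(2) = 9*4 + 4*2 = 44.0
```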

### Neural Networks from scratch

For a general introduction to machine learning, a quick read is [The hundred-page machine learning book](http://themlbook.com/wiki/doku.php), where all chapters are available online. For a more in-depth course on neural networks and deep learning, we refer to the [Deep Learning course](https://atcold.github.io/pytorch-Deep-Learning/) by Yann LeCun and Alfredo Canziani.

Let us build a simple fully connected layer (i.e., a simple linear function) with two inputs and one output:

In [None]:
W = rand(1, 2) .- 0.5;
b = rand(1) .- 0.5;

The output of this linear function is $Wx + b$:

In [None]:
simple_layer(x) = W*x .+ b

We measure the error between the expected outputs $y = (y_1, \ldots, y_n)$ and the outputs $\hat{y} = (\hat{y}_1, \ldots, \hat{y}_n)$ given by the layer with the mean squared error $\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$:

In [None]:
function loss(x, y)
    ŷ = simple_layer(x)
    mean((y .- ŷ).^2)
end

We can compute the gradient with the ```gradient``` function made available by Flux, choosing the parameters to differentiate with respect to via ```Flux.params```:

In [None]:
d_simple(x, y) = gradient(() -> loss(x,y), Flux.params(W, b))

println(d_simple([2, 3], 4)[W])
println(d_simple([2, 3], 4)[b])

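How can we train this simple neural network/linear function? By using gradient descent: at each step, we move the parameters in the direction opposite to the gradient of the loss $L$, scaled by the learning rate $\eta$, i.e., $W \leftarrow W - \eta \nabla_W L$ and $b \leftarrow b - \eta \nabla_b L$. Notice that here we use ```global```, as in Python, to modify a variable in the global scope: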

In [None]:
function train!(X, Y; η=0.1)
    grad = d_simple(X,Y)
    W̃ = grad[W]
    b̃ = grad[b]
    global W = W - η*W̃
    global b = b - η*b̃
end

Let us generate a very simple training set of $100$ random points in $[0,1)^2$, where the target value is a linear function of the first component, $10x_1 + 3$, plus Gaussian noise with standard deviation $0.05$:

In [None]:
simple_X = rand(2, 100)
simple_y = [10*i + randn()*0.05 + 3 for i ∈ simple_X[1,:]];

Let us visualize the data in 3D:

In [None]:
function show_data()
    scatter3d(simple_X[1,:], simple_X[2,:], simple_y, label="target")
    simple_ŷ = simple_layer(simple_X)
    scatter3d!(simple_X[1,:], simple_X[2,:], reshape(simple_ŷ, (100,)), label="predicted")
end

show_data()

We can now train the network for a few epochs, printing the loss before and after training:

In [None]:
println("Loss before training: $(loss(simple_X, simple_y))")
for _ in 1:100
    train!(simple_X, simple_y)
end
println("Loss after training: $(loss(simple_X, simple_y))")

In [None]:
show_data()
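For comparison, the gradients that Flux computes for us can also be derived by hand for this model: with residuals $r_i = \hat{y}_i - y_i$, the gradient of the mean squared error is $\frac{2}{n}\sum_i r_i x_i^\top$ with respect to $W$ and $\frac{2}{n}\sum_i r_i$ with respect to $b$. The following is a minimal sketch under these formulas; the helper names, the learning rate $0.3$, the $2,000$ steps, and the $200$ samples are illustrative choices, not taken from the text above. Updating the parameters in place also removes the need for ```global```:

```julia
using Random, Statistics

# Mean squared error of the affine model ŷ = W*X .+ b (columns of X are samples)
mse_manual(W, b, X, y) = mean((vec(W * X .+ b) .- y) .^ 2)

# Plain gradient descent with hand-derived gradients (hypothetical helper):
#   ∂L/∂W = (2/n) Σᵢ rᵢ xᵢᵀ,   ∂L/∂b = (2/n) Σᵢ rᵢ,   with rᵢ = ŷᵢ - yᵢ
function fit_affine!(W, b, X, y; η = 0.3, steps = 2000)
    n = size(X, 2)
    for _ in 1:steps
        r = vec(W * X .+ b) .- y      # residuals, one per sample
        ∇W = (2 / n) .* (X * r)'      # 1×2 row, same shape as W
        ∇b = (2 / n) * sum(r)         # scalar, since b has a single entry
        W .-= η .* ∇W                 # in-place updates: no `global` needed
        b .-= η * ∇b
    end
    return W, b
end

Random.seed!(1)
X_manual = rand(2, 200)
y_manual = [10x + 0.05 * randn() + 3 for x in X_manual[1, :]]
W_manual = rand(1, 2) .- 0.5
b_manual = rand(1) .- 0.5

println("Loss before: ", mse_manual(W_manual, b_manual, X_manual, y_manual))
fit_affine!(W_manual, b_manual, X_manual, y_manual)
println("Loss after: ", mse_manual(W_manual, b_manual, X_manual, y_manual))
```

With these settings, the fitted parameters should end up close to $W \approx (10, 0)$ and $b \approx 3$, the coefficients used to generate the data.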

### MNIST

Let us download the MNIST dataset, whose training set contains $60,000$ images of handwritten digits as $28 \times 28$ greyscale images.

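Flux downloads the MNIST dataset into a directory inside ```~/.julia/```. If you do not have access to it, you can run ```@eval(Flux.Data.MNIST, dir = ".")``` to download the dataset into the current directory or, if you have already downloaded it manually there, to let Flux find it. Note that ```Flux.Data.MNIST``` belongs to older Flux releases; newer releases removed the bundled datasets in favor of the ```MLDatasets.jl``` package.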

In [None]:
images = Flux.Data.MNIST.images()
labels = Flux.Data.MNIST.labels();

Let us see one of the images:

In [None]:
images[1]

The first image has label: 

In [None]:
labels[1]

Some standard preprocessing:

- encode the labels as one-hot vectors of $10$ elements
- use only $1,000$ images instead of $60,000$ for this example
- convert the images to arrays of ```Float64``` and reshape the input to $28 \times 28 \times \textit{num channels} \times \textit{num samples}$
- prepare the minibatches

In [None]:
n_images = 1000

Y = Flux.onehotbatch(labels[1:n_images], 0:9);

X = Float64.(reshape(hcat(images[1:n_images]...), (28, 28, 1, n_images)))

batches = Flux.Data.DataLoader((X, Y), batchsize=32);
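To see what ```Flux.onehotbatch``` and ```Flux.onecold``` compute, here is a dependency-free sketch. The helper names are hypothetical, and Flux returns a memory-efficient one-hot matrix type rather than the dense ```BitMatrix``` used here:

```julia
# One column per label, with a single `true` in the row of its class
function onehot_matrix(labels, classes)
    M = falses(length(classes), length(labels))
    for (j, l) in enumerate(labels)
        i = findfirst(==(l), classes)
        i === nothing && throw(ArgumentError("label $l is not one of the classes"))
        M[i, j] = true
    end
    return M
end

# Inverse operation (like Flux.onecold): the class of the largest entry in each column
onecold_labels(M, classes) = [classes[argmax(col)] for col in eachcol(M)]

Y_onehot_demo = onehot_matrix([5, 0, 4], 0:9)   # 10×3 matrix
onecold_labels(Y_onehot_demo, 0:9)              # [5, 0, 4]
```

Because ```argmax``` also works on probabilities, the same ```onecold``` idea turns the model's softmax output into predicted digits, which is what the accuracy computation below relies on.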

We can now build our model as a convolutional neural network with:

- convolutional layers with a $3 \times 3$ kernel, a padding of $1$ in all directions, and _input channels_ ```=>``` _output channels_
- max pooling layers
- a dense layer with $288$ inputs and $10$ outputs, followed by a softmax layer

In [None]:
model = Chain(
    Conv((3,3), 1 => 16, relu, pad=(1,1)),
    MaxPool((2,2)),
    Conv((3,3), 16 => 32, relu, pad=(1,1)),
    MaxPool((2,2)),
    Conv((3,3), 32 => 32, relu, pad=(1,1)),
    MaxPool((2,2)),
    Flux.flatten,
    Dense(288, 10, identity),
    softmax)
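Where does the $288$ of the dense layer come from? A $3 \times 3$ convolution with padding $1$ and stride $1$ keeps the spatial size, and each $2 \times 2$ max pooling halves it (rounding down), so $28 \to 14 \to 7 \to 3$; since the last convolution has $32$ output channels, the flattened vector has $3 \cdot 3 \cdot 32 = 288$ entries. The same arithmetic as a small sketch (the helper names are hypothetical):

```julia
# Spatial size after a stride-1 convolution with kernel k and padding pad
conv_out(n; k = 3, pad = 1) = n + 2pad - k + 1
# Spatial size after non-overlapping max pooling with window p (remainder dropped)
pool_out(n; p = 2) = n ÷ p

# Three (conv -> pool) stages, as in the model above
function flattened_size(n, channels)
    for _ in 1:3
        n = pool_out(conv_out(n))
    end
    return n * n * channels
end

flattened_size(28, 32)   # 3 * 3 * 32 = 288
```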

We define the loss as the _cross-entropy_ between the predicted probabilities and the one-hot labels. Notice that the model is now included in the definition of the loss function, which replaces the previous ```loss```.

In [None]:
function loss(x, y)
    ŷ = model(x)
    Flux.Losses.crossentropy(ŷ, y) # the prediction comes first, then the target
end
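For reference, here is a plain-Julia sketch of the quantities involved: a column-wise softmax, as computed by the last layer of the model, and the mean cross-entropy $-\frac{1}{n}\sum_i \sum_k y_{ki} \log \hat{y}_{ki}$ over the samples. The helper names and the small $\epsilon$ guarding against $\log 0$ are assumptions of this sketch; Flux's implementation differs in details:

```julia
using Statistics

# Column-wise softmax: turns each column of scores into probabilities
function softmax_cols(z::AbstractMatrix)
    ex = exp.(z .- maximum(z; dims = 1))   # subtracting the column max avoids overflow
    return ex ./ sum(ex; dims = 1)
end

# Mean over samples (columns) of -Σₖ yₖ log(ŷₖ); ϵ guards against log(0)
cross_entropy(ŷ, y; ϵ = eps(eltype(ŷ))) = mean(-sum(y .* log.(ŷ .+ ϵ); dims = 1))

scores_demo = [2.0 0.0;
               1.0 0.0;
               0.0 0.0]          # 3 classes × 2 samples
targets_demo = [1 0;
                0 1;
                0 0]             # one-hot labels: class 1, class 2
probs_demo = softmax_cols(scores_demo)
cross_entropy(probs_demo, targets_demo)
```

The second sample has equal scores, so its predicted probabilities are all $1/3$ and it contributes $\log 3$ to the sum.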

In addition to the loss function, we are interested in the accuracy of the prediction:

In [None]:
function accuracy(x, y)
    mean(Flux.onecold(model(x)) .== Flux.onecold(y))
end

Next, we choose an optimizer (e.g., ADAM or ADAGrad):

In [None]:
optim = Flux.ADAM()

We also define a callback function to be called at most once every ```n_seconds``` during training to print the current value of the loss

In [None]:
n_seconds = 5
cb = Flux.throttle(() -> println("Current loss: $(loss(X, Y))"), n_seconds)

How good is our untrained network?

In [None]:
accuracy(X, Y)

We can now train the network. Flux also provides the macro ```Flux.@epochs num_epochs code``` to repeat ```code``` for ```num_epochs``` epochs, but here we write the loop explicitly:

In [None]:
for i ∈ 1:5
    println("Epoch $i")
    Flux.train!(loss, Flux.params(model), batches, optim, cb=cb)
end

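The accuracy should now be higher, although with only $1,000$ images and $5$ epochs the training phase was very short. Note also that we are measuring the accuracy on the same images used for training, which tends to overestimate the accuracy on unseen data: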

In [None]:
accuracy(X, Y)

## Benchmarking

We have seen a few ways of exploring how much time a certain operation requires in Julia, using the ```@time``` or the ```@timed``` macros.

Let us start by benchmarking the function ```my_sum``` on a vector of $10^6$ random elements:

In [None]:
function my_sum(v)
    s = 0.0 # zero(eltype(v)) would be better: it uses the zero of the vector's element type
    for x ∈ v
        s += x
    end
    s
end
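Why would ```zero(eltype(v))``` be better? With ```s = 0.0```, the result is always a ```Float64```, even for a vector of integers; with ```s = 0```, a vector of floats would make ```s``` change type from ```Int``` to ```Float64``` inside the loop, which the compiler handles less efficiently. Starting from the zero of the element type avoids both problems, as this variant (with a hypothetical name) shows:

```julia
function my_sum_generic(v)
    s = zero(eltype(v))   # 0 for Vector{Int}, 0.0 for Vector{Float64}, 0.0f0 for Vector{Float32}
    for x ∈ v
        s += x            # s keeps the same type throughout the loop
    end
    s
end

my_sum_generic([1, 2, 3])        # 6 (an Int)
my_sum_generic([1.5, 2.5])       # 4.0
my_sum_generic(Float32[1, 2])    # 3.0f0 (a Float32)
```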

In [None]:
rand_vec = rand(10^6);

Let us time the function using the ```@time``` macro:

In [None]:
@time my_sum(rand_vec)

In [None]:
@time my_sum(rand_vec)

Since Julia is JIT-compiled, the first call includes the compilation time and might not be representative of subsequent calls. Furthermore, a single measurement is noisy: we need many executions to get a meaningful result!
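A minimal sketch of the idea behind a proper benchmark: call the function once to trigger compilation, then time many runs and summarize them. The helper ```time_runs``` is hypothetical and far simpler than what ```BenchmarkTools.jl``` does (for example, it does not tune the number of samples or report allocations):

```julia
using Statistics

# Hypothetical helper: warm up `f`, then time `n` calls of it
function time_runs(f; n = 100)
    f()                                     # warm-up call: includes JIT compilation
    times = [@elapsed(f()) for _ in 1:n]    # wall-clock seconds of each run
    return minimum(times), median(times)
end

v_timing_demo = rand(10^6)
tmin_demo, tmed_demo = time_runs(() -> sum(v_timing_demo); n = 20)
println("minimum: $(tmin_demo) s, median: $(tmed_demo) s")
```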

We can use the ```BenchmarkTools.jl``` package instead:

In [None]:
using BenchmarkTools

The ```@benchmark``` macro runs the code many times and reports statistics (such as the minimum, median, and mean time) together with the memory allocations:

In [None]:
@benchmark my_sum($rand_vec) # interpolate the global variable with $ so that accessing it is not part of the measurement

Let us compare our implementation with Python's builtin ```sum```. First, we convert the array to a Python object once, outside the benchmark, so that the type conversion overhead is not measured:

In [None]:
py_rand_vec = PyVector(rand_vec);

In [None]:
@benchmark pysum($py_rand_vec) # pysum was defined in the PyCall section

Notice that there is still some overhead, since we are calling Python code from Julia; most of the time, however, is likely spent in Python's own interpreted loop over the elements.

But what about the Julia native sum implementation?

In [None]:
@benchmark sum($rand_vec)
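Base's ```sum``` is typically faster than ```my_sum``` because it uses pairwise summation with inner loops that the compiler can vectorize (SIMD), while our loop must add the elements strictly in order. A minimal sketch of allowing vectorization in our own loop with ```@simd``` (the function name is hypothetical); since ```@simd``` permits reordering the additions, the result may differ from ```my_sum``` in the last bits:

```julia
function my_sum_simd(v)
    s = zero(eltype(v))
    # @simd lets the compiler reorder the additions and use vector instructions;
    # @inbounds skips bounds checks, which is safe because i comes from eachindex(v)
    @inbounds @simd for i in eachindex(v)
        s += v[i]
    end
    return s
end

v_simd_demo = rand(10^6)
my_sum_simd(v_simd_demo) ≈ sum(v_simd_demo)   # true, up to rounding differences
```

It can be benchmarked like the other versions, e.g. ```@benchmark my_sum_simd($rand_vec)```.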

## The End

In [None]:
scatter(randn(3,2000) .+ [-3,0,3], randn(3, 2000) .+ [-3,2,-3], 
        c=palette(:default)[2:4], key=:none, grid=false, showaxis=false,
        ticks=false, size=(600,600), markerstrokewidth=0)