# Machine Learning in Julia

## A neural network in ~10 lines of Julia code

Let's define a fully connected two-layer network.

![](imgs/network.png)

In [None]:
dense(W, b, σ = identity) = x -> σ.(W * x .+ b)

chain(layers...) = foldl(∘, reverse(layers))

network = chain(
    dense(randn(5,3), randn(5), tanh), # 3 input neurons  -> 5 hidden neurons
    dense(randn(2,5), randn(2)))       # 5 hidden neurons -> 2 output neurons

In [None]:
x = rand(3); # some input

In [None]:
network(x)

Let's unravel the compact definition of our neural network a bit:

*Dense layer:*
```julia
dense(W, b, σ = identity) = x -> σ.(W * x .+ b)
```
`σ` is the activation function, `W` is the weight matrix, `x` is the input to the layer, and `b` is the bias vector. The dots in `σ.(...)` and `.+` broadcast the activation and the addition element-wise.
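To make the layer definition concrete, here is a small sketch with hand-picked numbers. The values of `W`, `b`, and `x` and the helper `relu` are illustrative assumptions, not part of the network defined above.

```julia
dense(W, b, σ = identity) = x -> σ.(W * x .+ b)

# Illustrative values (assumed): 2 inputs -> 3 outputs
W = [1.0  2.0;
     3.0  4.0;
     0.0 -1.0]
b = [0.5, 0.5, 0.5]
x = [1.0, -1.0]

relu(z) = max(z, 0.0)             # hypothetical activation for this example

linear_layer = dense(W, b)        # σ defaults to identity
relu_layer   = dense(W, b, relu)  # activation applied element-wise

linear_out = linear_layer(x)      # W*x = [-1, -1, 1], plus b gives [-0.5, -0.5, 1.5]
relu_out   = relu_layer(x)        # negative entries are clamped to 0
```

Note that `dense` returns a closure: `W`, `b`, and `σ` are captured when the layer is built, and only `x` is supplied when the layer is called.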

*Chaining layers:*
```julia
chain(layers...) = foldl(∘, reverse(layers))
```
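The `layers...` syntax means that `chain` accepts an arbitrary number of layers (functions) as arguments. Because `f ∘ g` applies `g` first, the layers are reversed (`reverse`) before being composed (`∘`) via `foldl`, so that the first layer passed to `chain` is the first one applied to the input.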

In [None]:
f(x) = x^2
g(x) = 2x

In [None]:
(f ∘ g)(3) # == f(g(3))

In [None]:
f(g(3))

In [None]:
foldl(∘, [f,g])(3)
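The role of `reverse` in `chain` can be checked with two toy functions. The names `add_one` and `double` below are made up for this sketch.

```julia
chain(layers...) = foldl(∘, reverse(layers))

add_one(x) = x + 1   # hypothetical first "layer"
double(x)  = 2x      # hypothetical second "layer"

# With `reverse`, the first argument is applied first:
pipeline = chain(add_one, double)                # double ∘ add_one
with_reverse = pipeline(3)                       # double(add_one(3))

# Without `reverse`, the order of application flips:
without_reverse = foldl(∘, (add_one, double))(3) # add_one(double(3))
```

Here `with_reverse` is 8 and `without_reverse` is 7, which is why `chain` reverses its arguments before folding.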

Our neural network `network` is now just a piece of code, a function to be specific. We learned how to use automatic differentiation (AD) to differentiate functions/code. So let's do it! **Let's (fake) train our neural network.**

In [None]:
using Zygote # reverse mode AD

# let's take `sum` as our cost function for now

dnetwork = gradient(model -> sum(model(x)), network)[1]

Training just amounts to updating the weights `W` and biases `b` according to the gradients.

In [None]:
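network.outer.W # weights of the output layer (on Julia < 1.6 the field is `f` instead of `outer`)

In [None]:
dnetwork.outer.W # weight gradients of the output layer

In [None]:
η = 0.01 # learning rate

network.outer.W .-= η * dnetwork.outer.W # Gradient descent!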
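A single update step becomes training once it is repeated. The following is a minimal, dependency-free sketch of such a loop for one linear dense layer. It makes two assumptions that differ from the cells above: the cost is the sum of squared outputs (plain `sum` has no minimum, so repeated descent on it would never settle), and the gradients are derived by hand instead of being computed by Zygote. The function name `train_linear!` and all numbers are illustrative.

```julia
# Cost: sum of squared outputs of a linear layer, r = W*x + b
cost(W, b, x) = sum(abs2, W * x .+ b)

# Repeated gradient descent with hand-derived gradients:
#   ∂cost/∂W = 2 r xᵀ,  ∂cost/∂b = 2 r
function train_linear!(W, b, x; η = 0.01, steps = 100)
    for _ in 1:steps
        r = W * x .+ b
        W .-= η .* (2 .* r * x')   # in-place weight update
        b .-= η .* (2 .* r)        # in-place bias update
    end
    return W, b
end

# Illustrative starting values (assumed)
W = [0.5 -0.3  0.8;
     0.1  0.4 -0.6]
b = [0.2, -0.1]
x = [0.3, 0.7, 0.5]

initial_cost = cost(W, b, x)
train_linear!(W, b, x)
final_cost = cost(W, b, x)   # smaller than initial_cost
```

With an AD package, the two hand-written gradient lines would be replaced by a call such as `gradient(...)`, while the update rule itself stays the same.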

## Flux - A pure-Julia machine learning library

Web page: https://fluxml.ai/, Examples: [Model zoo](https://github.com/FluxML/model-zoo/)

<img src="https://fluxml.ai/logo.png" width=300>

<img src="imgs/flux.png" width=800>

In [None]:
using Flux

In [None]:
m = Chain(
    Dense(3, 5, tanh),
    Dense(5, 2),
    softmax # normalize output neurons
)
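As a rough illustration of what the `softmax` step does to the output neurons, here is a hand-written version. The name `my_softmax` is hypothetical; Flux ships its own implementation, and this sketch only mirrors the standard formula.

```julia
# Standard softmax: exponentiate, then normalize so the entries sum to 1.
# Subtracting the maximum first avoids overflow and does not change the result.
function my_softmax(z)
    shifted = exp.(z .- maximum(z))
    return shifted ./ sum(shifted)
end

p = my_softmax([1.0, 2.0, 3.0])
total = sum(p)   # ≈ 1
```

The outputs are positive, keep the ordering of the inputs, and can be read as probabilities over the output neurons.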

In [None]:
data, labels = rand(3, 100), fill(0.5, 2, 100); # fake data

In [None]:
loss(x, y) = Flux.mse(m(x), y) # mean squared error (already a scalar)
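For reference, the mean squared error is just the average of the squared element-wise differences between prediction and target. The helper `my_mse` below is a hypothetical stand-in written for illustration; it is not Flux's own code.

```julia
using Statistics: mean

# Mean of squared differences between prediction ŷ and target y
my_mse(ŷ, y) = mean(abs2, ŷ .- y)

example_loss = my_mse([1.0, 2.0], [1.0, 4.0])   # (0^2 + 2^2) / 2
```

Since the result is already a single number, no further reduction is needed before handing it to an optimizer.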

In [None]:
opt = Descent(0.01) # optimizer, i.e. gradient descent

In [None]:
Flux.train!(loss, Flux.params(m), [(data, labels)], opt) # one pass over the data (implicit-parameter API of older Flux releases)
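Newer Flux releases favor an explicit style in which the model is passed to the loss function and the optimizer state is created with `Flux.setup`. The sketch below assumes a recent Flux version (roughly 0.14 or later); the variable names `model` and `opt_state` are illustrative, and the data is generated as `Float32` to match Flux's default parameter type.

```julia
using Flux

model = Chain(
    Dense(3 => 5, tanh),
    Dense(5 => 2),
    softmax,
)

# Fake data, as above, but in Float32
data, labels = rand(Float32, 3, 100), fill(0.5f0, 2, 100)

# Optimizer state is tied to the model's parameters
opt_state = Flux.setup(Descent(0.01), model)

# The loss receives the model explicitly as its first argument
Flux.train!((m, x, y) -> Flux.mse(m(x), y), model, [(data, labels)], opt_state)

prediction = model(rand(Float32, 3))   # two outputs that sum to 1
```

Which of the two styles works depends on the installed Flux version, so it is worth checking the version before copying either snippet.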

In [None]:
m(rand(3)) # trained model