# Machine Learning in Julia: Flux.jl

<img src="https://fluxml.ai/logo.png" width=800>

<img src="flux.png" width=900>

Web page: https://fluxml.ai/

Examples: [Model zoo](https://github.com/FluxML/model-zoo/)

# A single neuron

In [1]:
using Flux


In [2]:
model(W,b,x) = σ.(W * x + b)

model (generic function with 1 method)

In [4]:
# single neuron: 5 inputs, 1 output
W = randn(1, 5) # weights
b = zeros(1)    # biases
x = rand(5)     # input

5-element Array{Float64,1}:
 0.6634757123055008
 0.7218375978945015
 0.5357182307706128
 0.8515426328531592
 0.7280812668372678

In [5]:
model(W, b, x)

1-element Array{Float64,1}:
 0.5796260667956458

In [6]:
loss(W, b, x) = Flux.mse(model(W,b,x), 0.5)

loss (generic function with 1 method)

In [7]:
loss(W,b,x)

0.006340310513344641
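
As a quick check on the value above, and assuming `Flux.mse` averages the squared differences over the outputs (here there is only one), the loss reduces to the squared distance between the neuron's output and the target 0.5:

$$
L = \bigl(\sigma(Wx + b) - 0.5\bigr)^2 \approx (0.5796 - 0.5)^2 \approx 0.00634
$$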

In [8]:
import Flux.Tracker: gradient # AD

gradient(loss, W, b, x)

([0.02574506397348128 0.028009699212171026 … 0.03304268589240641 0.02825197432997781] (tracked), [0.038803325420941455] (tracked), [0.027414767802176537, 0.05198128817237491, -0.08451593238424214, 0.004415870233551073, -0.0023754666311185824] (tracked))

In [9]:
typeof(gradient(loss, W, b, x))

Tuple{TrackedArray{…,Array{Float64,2}},TrackedArray{…,Array{Float64,1}},TrackedArray{…,Array{Float64,1}}}

In [11]:
gradient(loss, W, b, x)[2]

Tracked 1-element Array{Float64,1}:
 0.038803325420941455
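
The gradients above can be reproduced by hand with the chain rule. This is a sketch under the assumption that the loss is the squared error $L = (y - 0.5)^2$ with $y = \sigma(z)$ and $z = Wx + b$, using $\sigma'(z) = y\,(1 - y)$:

$$
\frac{\partial L}{\partial b} = \frac{\partial L}{\partial z} = 2\,(y - 0.5)\,y\,(1 - y),
\qquad
\frac{\partial L}{\partial W} = \frac{\partial L}{\partial z}\,x^{\top},
\qquad
\frac{\partial L}{\partial x} = W^{\top}\,\frac{\partial L}{\partial z}
$$

With $y \approx 0.5796$ this gives $\partial L / \partial b \approx 2 \cdot 0.0796 \cdot 0.5796 \cdot 0.4204 \approx 0.0388$, in agreement with the tracked bias gradient. Multiplying by $x_1 \approx 0.6635$ gives about $0.0257$, the first entry of the gradient with respect to `W`.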

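A realistic neural network can have hundreds of parameters or more, so passing each one to `gradient` by hand does not scale. Instead, we mark the arrays as tracked parameters with `param` and let Flux record their gradients during backpropagation.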

In [None]:
η = 0.1 # learning rate (step size)
Wnew = W .- η .* gradient(loss, W, b, x)[1] # gradient descent step: move W against its gradient

In [12]:
using Flux.Tracker: param, back!, grad

W = param(randn(1, 5))
b = param(zeros(1))
x = rand(5)

y = loss(W, b, x)

back!(y) # Automatic differentiation (backpropagation)

grad(W), grad(b)

([0.025582989185281266 0.0626209842125887 … 0.0347646810506487 0.06870707949907927], [0.08959860500608491])

We can now use these gradients to update our parameters.

In [13]:
using Flux.Tracker: update!

η = 0.1
for p in (W, b)
  update!(p, -η * grad(p)) # gradient descent
end
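
To see what repeated updates of this kind do, here is a minimal plain-Julia sketch of gradient descent for one sigmoid neuron, without Flux. The gradients are written out by hand from the chain rule. The helper names (`logistic`, `neuron`, `sqloss`, `neuron_gradients`, `descend`) and the starting values are illustrative assumptions, not part of Flux.

```julia
# Plain-Julia sketch of gradient descent for one sigmoid neuron (no Flux needed).
logistic(z) = 1 / (1 + exp(-z))

neuron(W, b, x) = logistic.(W * x .+ b)                 # same form as `model` above
sqloss(W, b, x, t) = sum((neuron(W, b, x) .- t) .^ 2)   # squared error for a single output

# Hand-derived gradients of the squared error (chain rule).
function neuron_gradients(W, b, x, t)
    y = neuron(W, b, x)
    δ = 2 .* (y .- t) .* y .* (1 .- y)   # ∂loss/∂z with z = W*x + b
    return δ * x', δ                      # ∂loss/∂W (1×5 matrix), ∂loss/∂b (1-element vector)
end

# Run `steps` gradient descent updates and record the loss after each one.
function descend(W, b, x, t; η = 0.1, steps = 20)
    W, b = copy(W), copy(b)               # leave the caller's arrays untouched
    losses = [sqloss(W, b, x, t)]
    for _ in 1:steps
        ∇W, ∇b = neuron_gradients(W, b, x, t)
        W .-= η .* ∇W                     # step against the gradient
        b .-= η .* ∇b
        push!(losses, sqloss(W, b, x, t))
    end
    return W, b, losses
end

W0 = [0.1 -0.2 0.3 0.4 -0.5]              # hypothetical starting weights (1×5)
b0 = zeros(1)
x0 = [0.66, 0.72, 0.54, 0.85, 0.73]       # hypothetical input
Wfit, bfit, losses = descend(W0, b0, x0, 0.5)
println(first(losses), " -> ", last(losses))
```

The recorded losses shrink from step to step because the step size is small relative to the curvature of this loss. The single `update!` loop above performs one such step using the gradients that Flux computed.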

Flux also ships ready-made optimizers, ranging from plain [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) to adaptive methods such as ADAM, which is used below.

# A small neural network

In [14]:
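m = Chain(
    Dense(10, 5), # 10 inputs -> 5 hidden units (no activation given, so the layer is linear)
    Dense(5, 2),  # 5 hidden units -> 2 outputs
    softmax       # normalize the outputs so that they sum to 1
)

opt = ADAM(0.01) # ADAM optimizer with learning rate 0.01

data, labels = rand(10, 100), fill(0.5, 2, 100) # 100 random samples, constant targets

loss(x, y) = Flux.mse(m(x), y) # mse already returns a scalar, so no extra sum is needed

Flux.train!(loss, params(m), [(data, labels)], opt) # one update step on a single batch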

In [15]:
m(rand(10)) # model output for a new random input, after one training step

Tracked 2-element Array{Float32,1}:
 0.41269642f0
 0.5873035f0 
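
The two outputs above sum to 1 because of the final `softmax`. As a rough illustration of what the chain computes in its forward pass, here is a plain-Julia sketch without Flux. The helpers `dense`, `softmax_vec` and `chain` and the fixed weights are hypothetical stand-ins for illustration. Flux initializes its weights randomly, so the numbers will differ from the output above.

```julia
# Plain-Julia sketch of the forward pass of Chain(Dense(10, 5), Dense(5, 2), softmax).
dense(W, b) = x -> W * x .+ b                 # affine layer without an activation

function softmax_vec(z)
    e = exp.(z .- maximum(z))                 # subtract the maximum for numerical stability
    return e ./ sum(e)                        # positive entries that sum to 1
end

# Apply the layers from left to right.
chain(layers...) = x -> foldl((acc, layer) -> layer(acc), layers; init = x)

# Hypothetical fixed weights, chosen only to make the example deterministic.
Wa, ba = fill(0.1, 5, 10), zeros(5)
Wb, bb = [0.2 * i - 0.1 * j for i in 1:2, j in 1:5], zeros(2)

net = chain(dense(Wa, ba), dense(Wb, bb), softmax_vec)
out = net([i / 10 for i in 1:10])             # a fixed 10-element input
println(out, "  sum = ", sum(out))
```

Since neither `Dense` layer in the model above is given an activation function, the two layers together are equivalent to a single affine map followed by `softmax`.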