# Machine Learning in Julia: Flux.jl

<img src="https://fluxml.ai/logo.png" width=800>

<img src="flux.png" width=900>

Web page: https://fluxml.ai/

Examples: [Model zoo](https://github.com/FluxML/model-zoo/)

# A single neuron

In [1]:
using Flux



In [2]:
model(W,b,x) = σ.(W * x + b)

model (generic function with 1 method)

In [3]:
# a single neuron: 5 inputs, 1 output
W = randn(1, 5) # weights
b = zeros(1)    # biases
x = rand(5)     # input

5-element Array{Float64,1}:
 0.7819245376842503 
 0.980184717102657  
 0.7236527383209654 
 0.02382960312602722
 0.8761142686824372 

In [4]:
model(W, b, x)

1-element Array{Float64,1}:
 0.8169639496693879
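The neuron is just a weighted sum followed by the logistic sigmoid σ(z) = 1/(1 + e^(−z)). The sketch below writes it out in plain Julia with fixed, made-up weights. `sigm` and `neuron` are our own illustrative names standing in for Flux's `σ` and the `model` function above; they are not part of Flux.

```julia
# Plain-Julia sketch of one neuron: y = σ(W*x + b).
# `sigm` is a hand-written logistic sigmoid standing in for Flux's σ.
sigm(z) = 1 / (1 + exp(-z))

# The matrix-vector product gives one weighted sum per output row
neuron(W, b, x) = sigm.(W * x .+ b)

W_demo = [0.2 -0.1 0.4 0.0 0.3]     # 1×5 weights: five inputs, one output
b_demo = [0.1]                      # one bias per output
x_demo = [1.0, 2.0, 0.5, 3.0, 1.0]

# The same weighted sum written out term by term: 0.5 + 0.1 = 0.6
z = sum(W_demo[1, j] * x_demo[j] for j in 1:5) + b_demo[1]
neuron(W_demo, b_demo, x_demo)      # ≈ [0.6457], i.e. [sigm(0.6)]
```

With all-zero weights and bias the weighted sum is 0, so the neuron outputs σ(0) = 0.5 regardless of the input.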

In [5]:
loss(W, b, x) = Flux.mse(model(W,b,x), 0.5)

loss (generic function with 1 method)

In [6]:
loss(W,b,x)

0.10046614539001826
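With a scalar target of 0.5 and a one-element output, the mean squared error averages over a single term, so the loss is simply the squared error of that output. Checking the printed value by hand:

$$
\mathcal{L}(W, b, x) = \operatorname{mse}\bigl(\sigma(Wx + b),\ 0.5\bigr) = \bigl(\sigma(Wx + b) - 0.5\bigr)^2 = (0.81696 - 0.5)^2 \approx 0.10047
$$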

In [7]:
import Flux.Tracker: gradient # automatic differentiation (AD)

gradient(loss, W, b, x)

([0.0741215062335684 0.09291531870062766 … 0.002258895828848555 0.0830500976728352] (tracked), [0.09479368233293566] (tracked), [0.07965331417551473, -0.020279912799503228, 0.06121391073207397, 0.1817911102407536, 0.05794740373498214] (tracked))
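`gradient` returns one array per argument: ∂L/∂W, ∂L/∂b and ∂L/∂x. For this tiny model the chain rule gives them in closed form. With ŷ = σ(Wx + b) and δ = 2(ŷ − 0.5)·ŷ(1 − ŷ), we get ∂L/∂W = δ xᵀ, ∂L/∂b = δ and ∂L/∂x = δ Wᵀ. With the numbers printed above, δ ≈ 2 · 0.317 · 0.817 · 0.183 ≈ 0.0948, which is the `b` gradient shown. The plain-Julia sketch below computes these gradients and checks them against central finite differences. The helper names (`neuron_loss`, `neuron_grads`, `numgrad`) are our own, not Flux API.

```julia
# Sketch: closed-form gradients of loss(W, b, x) = (σ(W*x + b) - 0.5)^2
# for one neuron, checked against central finite differences.
sigm(z) = 1 / (1 + exp(-z))

# Squared error of the single output against the fixed target 0.5
neuron_loss(W, b, x) = (sigm((W * x .+ b)[1]) - 0.5)^2

function neuron_grads(W, b, x)
    ŷ = sigm((W * x .+ b)[1])
    δ = 2 * (ŷ - 0.5) * ŷ * (1 - ŷ)              # dL/dz, using σ'(z) = ŷ(1 - ŷ)
    return δ * permutedims(x), [δ], δ * vec(W)   # (∂L/∂W, ∂L/∂b, ∂L/∂x)
end

# Central difference: nudge each entry of A by ±h and re-evaluate f()
function numgrad(f, A; h = 1e-6)
    g = similar(A, Float64)
    for i in eachindex(A)
        old = A[i]
        A[i] = old + h; fp = f()
        A[i] = old - h; fm = f()
        A[i] = old                     # restore the original value
        g[i] = (fp - fm) / (2h)
    end
    return g
end

W_chk, b_chk, x_chk = randn(1, 5), zeros(1), rand(5)
gW, gb, gx = neuron_grads(W_chk, b_chk, x_chk)
isapprox(gW, numgrad(() -> neuron_loss(W_chk, b_chk, x_chk), W_chk); atol = 1e-6)   # true
```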

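A neural network can have thousands or millions of parameters, so passing each one to `gradient` as a separate argument does not scale. Instead, we wrap the trainable arrays with `param`, run the forward pass, and call `back!` to backpropagate. Each parameter then stores its own gradient, which `grad` reads out.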

In [8]:
using Flux.Tracker: param, back!, grad

W = param(randn(1, 5))
b = param(zeros(1))
x = rand(5)

y = loss(W, b, x)

back!(y) # Automatic differentiation (backpropagation)

grad(W), grad(b)

([0.022757736607291752 0.009379708248295008 … 0.009674592257967303 0.0668275073246203], [0.08683589788802573])

We can now use these gradients to update our parameters.

In [9]:
using Flux.Tracker: update!

η = 0.1
for p in (W, b)
  update!(p, -η * grad(p)) # gradient descent
end
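The update above is plain gradient descent, p ← p − η ∇p. As a Flux-free sketch, the loop below repeats that step many times for a single neuron. It uses the chain-rule gradient δ = 2(ŷ − 0.5)·ŷ(1 − ŷ) in place of Tracker, with made-up starting weights; `descend!` and the other helpers are hypothetical names for illustration.

```julia
# Sketch: repeated gradient-descent steps on one neuron, using
# hand-derived gradients instead of Flux's Tracker.
sigm(z) = 1 / (1 + exp(-z))
neuron_loss(W, b, x) = (sigm((W * x .+ b)[1]) - 0.5)^2

function neuron_grads(W, b, x)
    ŷ = sigm((W * x .+ b)[1])
    δ = 2 * (ŷ - 0.5) * ŷ * (1 - ŷ)
    return δ * permutedims(x), [δ]     # (∂L/∂W, ∂L/∂b)
end

function descend!(W, b, x; η = 0.1, steps = 100)
    for _ in 1:steps
        gW, gb = neuron_grads(W, b, x)
        W .-= η .* gW    # p ← p - η∇p, like update!(p, -η * grad(p))
        b .-= η .* gb
    end
    return W, b
end

W_gd, b_gd, x_gd = [1.0 1.0 1.0 1.0 1.0], [0.0], fill(0.5, 5)
loss_before = neuron_loss(W_gd, b_gd, x_gd)
descend!(W_gd, b_gd, x_gd)
loss_after = neuron_loss(W_gd, b_gd, x_gd)   # smaller than loss_before
```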

In practice you rarely write this loop by hand: Flux ships ready-made optimizers such as plain [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) (`Descent`), `Momentum`, and `ADAM`, which is used below.
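ADAM keeps running averages of the gradient and of its elementwise square, and scales each parameter's step by their ratio. Below is a plain-Julia sketch of the standard Adam update rule with the commonly used defaults β₁ = 0.9, β₂ = 0.999 and a small ϵ. It is not Flux's implementation, and `AdamState` and `adam_step!` are our own hypothetical names.

```julia
# Sketch of the standard Adam update rule (not Flux's own implementation).
mutable struct AdamState
    m::Vector{Float64}   # running mean of the gradients (first moment)
    v::Vector{Float64}   # running mean of the squared gradients (second moment)
    t::Int               # step counter, used for bias correction
end
AdamState(n::Integer) = AdamState(zeros(n), zeros(n), 0)

function adam_step!(p, g, s::AdamState; η = 0.01, β1 = 0.9, β2 = 0.999, ϵ = 1e-8)
    s.t += 1
    s.m .= β1 .* s.m .+ (1 - β1) .* g
    s.v .= β2 .* s.v .+ (1 - β2) .* g .^ 2
    m̂ = s.m ./ (1 - β1^s.t)           # undo the bias towards the zero initialisation
    v̂ = s.v ./ (1 - β2^s.t)
    p .-= η .* m̂ ./ (sqrt.(v̂) .+ ϵ)   # per-parameter scaled step
    return p
end

# Usage: minimise f(p) = sum((p .- 3).^2), whose gradient is 2(p .- 3)
p_adam = zeros(2)
adam_st = AdamState(length(p_adam))
for _ in 1:3000
    adam_step!(p_adam, 2 .* (p_adam .- 3), adam_st)
end
p_adam   # close to [3.0, 3.0]
```

Because each coordinate is divided by its own √v̂, the step size is roughly η in every direction early on, independent of the raw gradient scale.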

# A small neural network

In [10]:
m = Chain(
    Dense(10, 5),  # 10 inputs -> 5 hidden units
    Dense(5, 2),   # 5 hidden units -> 2 outputs
    softmax        # normalize the 2 outputs into probabilities that sum to 1
)

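opt = ADAM(0.01) # ADAM optimizer with learning rate 0.01

data, labels = rand(10, 100), fill(0.5, 2, 100) # 100 random samples (one per column), all with target [0.5, 0.5]

loss(x, y) = Flux.mse(m(x), y) # mean squared error over the whole batch (already a scalar)

# train! makes one pass over the data; with a single (data, labels) batch this is one optimizer step
Flux.train!(loss, params(m), [(data,labels)], opt)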

In [11]:
m(rand(10)) # model output after the single training step

Tracked 2-element Array{Float32,1}:
 0.4469751f0
 0.5530249f0
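`Dense(in, out)` stores an `out × in` weight matrix and a bias vector and computes `W*x .+ b` followed by an activation, which defaults to the identity here. `Chain` feeds each layer's output into the next, and `softmax` rescales the final outputs into positive values that sum to 1 (0.4469751 + 0.5530249 = 1 above). A plain-Julia sketch of the same pipeline for a single input vector follows. `dense_layer` and `softmax_sketch` are hypothetical helper names, and `randn` weights stand in for Flux's own initialisation.

```julia
# Sketch of Chain(Dense(10, 5), Dense(5, 2), softmax) in plain Julia.
# dense_layer and softmax_sketch are hypothetical helpers, not Flux API.
dense_layer(W, b, act = identity) = x -> act.(W * x .+ b)

function softmax_sketch(z)
    e = exp.(z .- maximum(z))   # subtract the max for numerical stability
    return e ./ sum(e)          # positive entries that sum to 1
end

layer1 = dense_layer(randn(5, 10), zeros(5))   # 10 inputs -> 5 outputs
layer2 = dense_layer(randn(2, 5), zeros(2))    # 5 inputs -> 2 outputs

# Chain runs layers left to right; ∘ composes right to left
model_sketch = softmax_sketch ∘ layer2 ∘ layer1

out = model_sketch(rand(10))   # 2-element vector of probabilities
```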