# Machine Learning in Julia

## A neural network in 10 lines of Julia code

Let's define a fully connected two-layer network.

In [58]:
dense(W, b, σ = identity) = x -> σ.(W * x .+ b)

chain(f...) = foldl(∘, reverse(f))

m = chain(
    dense(randn(5,10), randn(5), tanh),
    dense(randn(2,5), randn(2)))

#56 (generic function with 1 method)
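
To make the argument order of `chain` and the behaviour of `dense` concrete, here is a minimal sketch in plain Julia. The toy functions `double` and `inc` and the small weight matrix are hypothetical, chosen only so the results are easy to check by hand.

```julia
# Definitions repeated from the cell above so the snippet runs on its own.
dense(W, b, σ = identity) = x -> σ.(W * x .+ b)
chain(f...) = foldl(∘, reverse(f))

# Hypothetical toy functions, for illustration only.
double(x) = 2x
inc(x) = x + 1

# reverse(f) puts the last argument outermost, so the first argument runs first.
y = chain(double, inc)(3)          # same as inc(double(3))

# A dense layer with the default identity activation is just an affine map.
layer = dense([1.0 2.0; 3.0 4.0], [0.5, -0.5])
z = layer([1.0, 1.0])              # W * x .+ b
```

So the layers passed to `chain` are applied left to right, which is the order in which they are written in the model definition.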

In [59]:
x = rand(10); # some input

In [60]:
m(x)

2-element Array{Float64,1}:
  1.2959237046559549
 -2.1264288488618406

Let's train it!

In [61]:
using Zygote # source-to-source reverse-mode AD

dm, = gradient(m) do model
    sum(model(x))
end

((f = (W = [-0.9158690901621099 0.5787514427051449 … -0.9474818206096963 0.9477195924001385; -0.9158690901621099 0.5787514427051449 … -0.9474818206096963 0.9477195924001385], b = [1.0, 1.0], σ = nothing), g = (W = [-0.018255865107548706 -0.08876125935234815 … -0.059507558406487034 -0.09030907543436559; -0.26707159346895404 -1.2985202746563114 … -0.870557399139401 -1.3211638308496765; … ; 0.03127982336645934 0.15208463132091696 … 0.10196098103039244 0.15473667839532546; 0.022852276174800748 0.111109322970635 … 0.07449020636282105 0.11304684389148333], b = [-0.09317009308937145, -1.363018683499156, 0.0609648030757468, 0.15963878116447336, 0.11662820063398797], σ = nothing)),)

In [62]:
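m.f.W # weights of the last layer (on Julia ≥ 1.6 the composed function's fields are named outer/inner instead of f/g)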

2×5 Array{Float64,2}:
  0.222052  -1.27388    1.93939   0.691362  1.0537  
 -0.800088  -0.775628  -0.778323  0.869467  0.091647

In [63]:
dm.f.W

2×5 Array{Float64,2}:
 -0.915869  0.578751  0.973392  -0.947482  0.94772
 -0.915869  0.578751  0.973392  -0.947482  0.94772
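
The two identical rows are no accident. For the last layer, the output is `W * h .+ b` with `h` the hidden activation, so the gradient of the summed output with respect to `b` is a vector of ones and every row of the gradient with respect to `W` equals `h`. The following is a minimal sketch that checks this with central finite differences instead of Zygote. It draws fresh random weights, so the numbers differ from the cells above, and the names `W1`, `b1`, `W2`, `b2`, `x0`, `h`, and `fd_grad` are illustrative assumptions.

```julia
using Random
rng = MersenneTwister(0)

# Freshly drawn parameters with the same shapes as the network above.
W1, b1 = randn(rng, 5, 10), randn(rng, 5)
W2, b2 = randn(rng, 2, 5), randn(rng, 2)
x0 = rand(rng, 10)

h = tanh.(W1 * x0 .+ b1)             # hidden activation
objective(W, b) = sum(W * h .+ b)    # scalar output as a function of the last layer

# Central finite-difference gradient of f with respect to the array A.
function fd_grad(f, A; ϵ = 1e-6)
    G = similar(A)
    for i in eachindex(A)
        Ap = copy(A); Ap[i] += ϵ
        Am = copy(A); Am[i] -= ϵ
        G[i] = (f(Ap) - f(Am)) / (2ϵ)
    end
    return G
end

dW2 = fd_grad(W -> objective(W, b2), W2)
db2 = fd_grad(b -> objective(W2, b), b2)
```

Up to rounding error, `db2` is all ones and both rows of `dW2` equal `h`, matching the shape of the Zygote result.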

In [64]:
η = 0.01

m.f.W .-= η * dm.f.W # Gradient descent!

2×5 Array{Float64,2}:
  0.23121   -1.27967    1.92966   0.700837  1.04423  
 -0.790929  -0.781415  -0.788057  0.878941  0.0821698
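
One update only nudges the weights; training repeats the step many times. As a rough sketch of such a loop, the example below uses a hypothetical one-layer linear model with a squared-error objective and hand-derived gradients, so it needs no AD package. The function `descend`, its keyword arguments, and the target values are assumptions made for illustration, not part of the notebook.

```julia
using Random

# Hypothetical one-layer linear model y = W*x + b with a squared-error objective.
# With residual r = W*x + b - target, the gradients are 2*r*x' (for W) and 2*r (for b).
function descend(; steps = 100, η = 0.01, rng = MersenneTwister(1))
    W, b = randn(rng, 2, 5), randn(rng, 2)
    x, target = rand(rng, 5), [0.5, 0.5]
    history = Float64[]
    for _ in 1:steps
        r = W * x .+ b .- target
        W .-= η .* (2 .* r * x')   # same in-place update pattern as above
        b .-= η .* (2 .* r)
        push!(history, sum(abs2, W * x .+ b .- target))
    end
    return history
end

history = descend()
```

`history` records the objective after each step; with a step size this small it shrinks from one iteration to the next.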

## Flux - The ML library that doesn't make you tensor

Web page: https://fluxml.ai/, Examples: [Model zoo](https://github.com/FluxML/model-zoo/)

<img src="https://fluxml.ai/logo.png" width=300>

<img src="flux.png" width=800>

In [65]:
using Flux

In [66]:
m = Chain(
    Dense(10, 5),
    Dense(5, 2),
    softmax # normalize output neurons
)

Chain(Dense(10, 5), Dense(5, 2), softmax)
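
The final `softmax` turns the two raw outputs into positive numbers that sum to one. A minimal sketch of the computation, written as an illustrative reimplementation named `my_softmax` (an assumed name, not Flux's source code):

```julia
# Illustrative reimplementation, not Flux's source code.
function my_softmax(z)
    e = exp.(z .- maximum(z))  # subtracting the maximum avoids overflow in exp
    return e ./ sum(e)
end

p = my_softmax([1.0, 2.0, 3.0])
q = my_softmax([0.0, 0.0])     # equal inputs give equal probabilities
```

Larger inputs map to larger probabilities, and the ordering of the inputs is preserved.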

In [67]:
data, labels = rand(10, 100), fill(0.5, 2, 100); # fake data

In [68]:
loss(x, y) = Flux.mse(m(x), y) # mean squared error (already a scalar)

loss (generic function with 1 method)
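
Mean squared error averages the squared element-wise differences between predictions and targets into a single number. A small sketch with a hand-written `my_mse` (an assumed name for an illustrative reimplementation, not Flux's source code) and made-up predictions:

```julia
# Illustrative reimplementation of mean squared error, not Flux's source code.
my_mse(pred, target) = sum(abs2, pred .- target) / length(target)

pred = [0.2 0.9; 0.8 0.1]    # hypothetical predictions, one column per sample
target = fill(0.5, 2, 2)     # targets shaped like `labels` above
err = my_mse(pred, target)
```

The result is one scalar for the whole batch, which is what the optimiser needs to differentiate.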

In [69]:
opt = Descent(0.01) # or ADAM

Descent(0.01)

In [70]:
Flux.train!(loss, params(m), [(data,labels)], opt)

In [71]:
m(rand(10)) # model after a single training step

2-element Array{Float32,1}:
 0.22444387
 0.7755561 