# Single neuron using Flux.jl

## Read in and process data

In [None]:
using TextParse
using DataFrames
cols, colnames = TextParse.csvread("Apple_Golden_1.dat",'\t')
apples = DataFrame(Dict(name=>col for (name, col) in zip(colnames, cols)))
cols, colnames = TextParse.csvread("bananas.dat",'\t')
bananas = DataFrame(Dict(name=>col for (name, col) in zip(colnames, cols)))

In [None]:
col1 = 4  # red
col2 = 3  # green

# one [red, green] feature vector per data point
x_apples  = [ [apples[i, col1], apples[i, col2]] for i in 1:size(apples, 1) ]
x_bananas = [ [bananas[i, col1], bananas[i, col2]] for i in 1:size(bananas, 1) ]

xs = vcat(x_apples, x_bananas)

# labels: 0 for apples, 1 for bananas
ys = vcat( zeros(size(x_apples, 1)), ones(size(x_bananas, 1)) );

The input data is in `xs` and the labels are in `ys`.

## Using Flux.jl

In [None]:
using Flux

The function $\sigma$ that we have been using is predefined by Flux:

In [None]:
σ

In [None]:
methods(σ)

In [None]:
?σ

We can make a neuron in a simple way:

In [None]:
model = Dense(2, 1, σ)

In [None]:
typeof(model)

We have made an object of type `Dense`, defined by `Flux`. This represents a "dense neural network layer" (see later).
Inside the object live the parameters that we will modify during the learning process:

In [None]:
model.W

In [None]:
model.b

The fact that `W` and `b` are of size $1 \times 2$ and $1$, respectively, comes from the `(2, 1)` pair in the call to the `Dense` constructor when we created `model`: the layer takes 2 inputs and produces 1 output. Both are displayed as "tracked" arrays. A tracked array is a special type provided by `Flux.jl` that is able to calculate ("track") derivatives via reverse-mode automatic differentiation, usually called **backpropagation** in the context of neural networks. Reverse mode is more efficient here than the forward-mode automatic differentiation that we used previously via the `ForwardDiff.jl` package, because the loss is a single number that depends on many parameters.
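
For reference, the computation that this single neuron performs on an input $x$ can be written out as follows. The symbol $\hat{y}$ for the output is our notation, not something defined by `Flux.jl`:

$$\hat{y} = \sigma(W x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}},$$

where $x$ has 2 components, $W$ is the $1 \times 2$ weight matrix, and $b$ is the bias vector of length $1$.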

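## Tracking derivatives with `param`

We first write the neuron and its loss function using ordinary Julia arrays: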

In [None]:
W = rand(1, 2)
b = rand(1)

predict(x) = σ.(W*x + b)
loss(x, y) = sum(abs2, (predict(x) .- y) )

x, y = rand(2), rand(1) 
loss(x, y) 

We will now see how `Flux.jl` facilitates the type of calculations that we have been doing.
To do so, we use the `param` function to define objects that contain both the values of
the parameters `W` and `b` *and* their derivatives. These are the derivatives of the loss
function with respect to `W` and `b`, which we previously calculated using `ForwardDiff`.

Let's start, as usual, by setting up some random initial values for the parameters:

In [None]:
W_data = rand(1, 2)  
b_data = rand(1)

W_data, b_data

We now set up `Flux.jl` objects that contain these values *and* their derivatives, and that allow us to propagate
this information around:

In [None]:
W = param( W_data )
b = param( b_data )

predict(x) = σ.(W*x + b)
loss(x, y) = sum( (predict(x) .- y).^2 )

x, y = rand(2), rand(1) 
l = loss(x, y) 

In [None]:
fieldnames(typeof(W))  # recent Julia versions require a type here, not an instance

We see that the data is indeed inside the object:

In [None]:
W.data  # the random initial values

Initially, the derivatives are zero:

In [None]:
W.grad

Having set up the structure, we can now propagate the derivative information backwards 
from the `loss` function to all of the objects that are used to calculate it:

In [None]:
using Flux.Tracker

back!(l)   # backpropagate derivatives of the loss function

In [None]:
W.grad

In [None]:
b.grad
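
As a sanity check on what `back!` computes, the derivatives for a single data point can be worked out by hand with the chain rule. Writing $z = Wx + b$ and $L = (\sigma(z) - y)^2$, we get $\partial L / \partial b = 2(\sigma(z) - y)\,\sigma(z)(1 - \sigma(z))$ and $\partial L / \partial W_{1j} = (\partial L / \partial b)\, x_j$. The following sketch uses plain Julia only (no Flux). The names `sigmoid`, `neuron_loss`, `manual_gradients`, `W0`, `b0`, `x0` and `y0` are ours, chosen so as not to clash with the variables in this notebook. It compares the hand-derived formula against a finite-difference approximation:

```julia
# Logistic function, written out so that this block does not depend on Flux
sigmoid(z) = 1 / (1 + exp(-z))

# Loss of a single neuron on one data point, using plain arrays
neuron_loss(W, b, x, y) = sum(abs2, sigmoid.(W * x .+ b) .- y)

# Hand-derived gradients of neuron_loss with respect to W (1×2) and b (length 1)
function manual_gradients(W, b, x, y)
    s = sigmoid.(W * x .+ b)              # neuron output, a length-1 vector
    δ = 2 .* (s .- y) .* s .* (1 .- s)    # ∂L/∂b
    return δ * x', δ                      # ∂L/∂W = δ xᵀ and ∂L/∂b = δ
end

# Arbitrary illustrative values
W0, b0 = [0.3 -0.2], [0.1]
x0, y0 = [0.5, 0.6], [1.0]

gW, gb = manual_gradients(W0, b0, x0, y0)

# Central finite difference in b as an independent check
h = 1e-6
fd_b = (neuron_loss(W0, b0 .+ h, x0, y0) - neuron_loss(W0, b0 .- h, x0, y0)) / (2h)

gb[1], fd_b   # the two values should agree to several digits
```

If the same values were wrapped with `param` and passed through `loss` and `back!`, we would expect `W.grad` and `b.grad` to match `gW` and `gb`, provided that no earlier gradients had been accumulated.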

We can now use this structure to do stochastic gradient descent, just as we did in the previous notebook.
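
As a reminder, one step of stochastic gradient descent picks a single data point $(x, y)$ at random and moves the parameters a small distance against the gradient of the loss $L$ at that point. In our notation, with $\eta$ denoting the learning rate (step size):

$$W \leftarrow W - \eta \, \frac{\partial L(x, y)}{\partial W}, \qquad b \leftarrow b - \eta \, \frac{\partial L(x, y)}{\partial b}.$$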

**Exercise:** Implement this.

In [None]:
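function stochastic_gradient_descent(loss, xs, ys, W, b, N=1000)

    η = 0.01  # learning rate

    for i in 1:N

        # clear derivatives accumulated by earlier calls to back!
        W.grad .= 0
        b.grad .= 0

        k = rand(1:length(xs))  # choose a random data point

        xx = xs[k]
        yy = ys[k]

        l = loss(xx, yy)
        back!(l)  # accumulates derivatives into W.grad and b.grad

        # update in place: the fields of a tracked array cannot be reassigned
        W.data .-= η .* W.grad
        b.data .-= η .* b.grad

    end

    return W, b

end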
    

In [None]:
b

In [None]:
ys

In [None]:
W_final, b_final = stochastic_gradient_descent(loss, xs, ys, W, b)

In [None]:
W_final

In [None]:
b_final

In [None]:
using Plots; gr()

In [None]:
scatter(first.(x_apples), last.(x_apples), m=:cross)
scatter!(first.(x_bananas), last.(x_bananas))

Let's draw the function that the network has learned, together with the data:

In [None]:
heatmap(0.4:0.01:0.7, 0.4:0.01:0.7, (x,y)->predict([x, y]).data[1])

scatter!(first.(x_apples), last.(x_apples), m=:cross)
scatter!(first.(x_bananas), last.(x_bananas))

TODO: Animation of learning process

## Automation with Flux.jl

We will need to repeat the above process for a lot of different systems.
Fortunately, Flux.jl provides us with tools to automate this.

Firstly, we create the model:

In [None]:
using Flux

In [None]:
model = Dense(2, 1, σ)

In [None]:
model.W

In [None]:
model.b

We can use the `model` object just like a function to apply it to data:

In [None]:
model(rand(2))

Flux has various loss functions built in. Here we use the mean squared error, `Flux.mse`:

In [None]:
loss(x, y) = Flux.mse(model(x), y)
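
To make explicit what the mean squared error computes, here is a minimal sketch in plain Julia. The name `my_mse` is ours and is not part of Flux; the actual implementation of `Flux.mse` may differ in detail, although the value it returns should be the same:

```julia
# Mean squared error: the average of the squared differences
# between predictions ŷ and targets y
my_mse(ŷ, y) = sum(abs2, ŷ .- y) / length(ŷ)

my_mse([1.0, 3.0], [0.0, 1.0])   # (1^2 + 2^2) / 2 = 2.5
```

For our neuron the output has a single component, so the mean squared error coincides with the sum-of-squares loss that we defined by hand earlier.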

In [None]:
data = zip(xs, ys)

In [None]:
collect(data)

In [None]:
# give the list of parameters that will be modified, and the learning rate
opt = SGD([model.W, model.b], 0.01)

In [None]:
for i in 1:100
    # each call to train! makes one pass over all of the data
    Flux.train!(loss, data, opt)
end

In [None]:
model.W

In [None]:
model.b

In [None]:
params(model)