# Multiple layers with Flux.jl

We now reach our first serious **neural network** by adding another layer to get a network looking something like this:

In [None]:
include("draw_neural_net.jl")

In [None]:
plot()
draw_layer(1, 1, 2, 4, 0.2)
draw_layer(2, 1, 4, 3, 0.2)
plot!()

We will continue to use two input values per data point and classify into three types of fruit, so the network has three output neurons. We have added a single "hidden layer" in between and put 4 neurons in it; this number is an arbitrary choice.

## Read in and process data

In [None]:
using Flux
using Flux: onehot

In [None]:
using TextParse
using DataFrames
cols, colnames = TextParse.csvread("Apple_Golden_1.dat",'\t')
apples_1 = DataFrame(Dict(name=>col for (name, col) in zip(colnames, cols)))
cols, colnames = TextParse.csvread("Apple_Golden_2.dat",'\t')
apples_2 = DataFrame(Dict(name=>col for (name, col) in zip(colnames, cols)))
cols, colnames = TextParse.csvread("Apple_Golden_3.dat",'\t')
apples_3 = DataFrame(Dict(name=>col for (name, col) in zip(colnames, cols)))
cols, colnames = TextParse.csvread("Banana.dat",'\t')
bananas = DataFrame(Dict(name=>col for (name, col) in zip(colnames, cols)))
cols, colnames = TextParse.csvread("Grape_White.dat",'\t')
grapes_1 = DataFrame(Dict(name=>col for (name, col) in zip(colnames, cols)))
cols, colnames = TextParse.csvread("Grape_White_2.dat",'\t')
grapes_2 = DataFrame(Dict(name=>col for (name, col) in zip(colnames, cols)))

apples = vcat(apples_1, apples_2, apples_3)
grapes = vcat(grapes_1, grapes_2);

In [None]:
col1 = 4 #red
col2 = 1 #blue

x_apples  = [ [apples_1[i, col1], apples_1[i, col2]] for i in 1:size(apples_1)[1] ]
append!(x_apples, [ [apples_2[i, col1], apples_2[i, col2]] for i in 1:size(apples_2)[1] ])
append!(x_apples, [ [apples_3[i, col1], apples_3[i, col2]] for i in 1:size(apples_3)[1] ])

x_bananas = [ [bananas[i, col1], bananas[i, col2]] for i in 1:size(bananas)[1] ]

x_grapes = [ [grapes_1[i, col1], grapes_1[i, col2]] for i in 1:size(grapes_1)[1] ]
append!(x_grapes, [ [grapes_2[i, col1], grapes_2[i, col2]] for i in 1:size(grapes_2)[1] ])

xs = vcat(x_apples, x_bananas, x_grapes);

We now wish to classify the three types of fruit, so we will use *one-hot vectors* as output. Effectively, the first neuron learns whether (1) or not (0) the data corresponds to an apple, the second whether (1) or not (0) it corresponds to a banana, and the third does the same for grapes:

In [None]:
labels = [ones(length(x_apples)); 2*ones(length(x_bananas)); 3*ones(length(x_grapes))];

ys = [onehot(label, 1:3) for label in labels];  # onehotbatch(labels, 1:3)

The input data is in `xs` and the one-hot vectors are in `ys`.
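
To make the idea of a one-hot vector concrete, here is a minimal sketch in plain Julia. The helper `my_onehot` is a hypothetical name used only for illustration; it is not Flux's `onehot`, which returns its own vector type, but it should produce the same pattern of zeros and ones.

```julia
# Hand-rolled one-hot encoding (hypothetical helper, not part of Flux):
# the entry for the matching class is true (1), all others are false (0).
my_onehot(label, classes) = [label == c for c in classes]

apple  = my_onehot(1, 1:3)   # true, false, false
banana = my_onehot(2, 1:3)   # false, true, false
grape  = my_onehot(3, 1:3)   # false, false, true
```

Since Julia treats `1.0 == 1` as true, the floating-point labels built with `ones` above match the integer classes `1:3` in the same way.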

## Multiple layers in Flux

Let's tell Flux what structure we want the network to have:

In [None]:
inputs = 2
outputs = 3
hidden = 4

layer1 = Dense(inputs, hidden, σ)
layer2 = Dense(hidden, outputs, σ)

To make the `model`, we use Flux's `Chain` function:

In [None]:
model = Chain(layer1, layer2)

We see that `model` understands that it has layers, and that each layer has a weight matrix `W` and a bias vector `b`:

In [None]:
model.layers

In [None]:
params(model)
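
As a rough picture of what the chained layers compute, the following sketch rebuilds the same structure in plain Julia, without Flux. The names `sigmoid`, `W1`, `b1`, `W2`, `b2`, `dense1`, `dense2` and `network` are assumptions made for this illustration, and the random weights are stand-ins rather than the values Flux initialises.

```julia
# Logistic sigmoid, playing the role of the activation σ used above
sigmoid(z) = 1 / (1 + exp(-z))

# Hypothetical stand-in parameters with the same shapes as the two layers
W1, b1 = randn(4, 2), zeros(4)   # layer 1: 2 inputs -> 4 hidden neurons
W2, b2 = randn(3, 4), zeros(3)   # layer 2: 4 hidden neurons -> 3 outputs

# Each layer is "matrix times input, plus bias, then σ element-wise"
dense1(x) = sigmoid.(W1 * x .+ b1)
dense2(h) = sigmoid.(W2 * h .+ b2)

# Chaining layers means composing them: the output of one feeds the next
network(x) = dense2(dense1(x))

output = network([0.5, 0.3])     # a vector of 3 numbers, each between 0 and 1

# Total number of adjustable parameters: 8 + 4 + 12 + 3 = 27
n_params = length(W1) + length(b1) + length(W2) + length(b2)
```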

In [None]:
# mean squared error between the network output and the one-hot target
loss(x, y) = Flux.mse(model(x), y)
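
The mean squared error can be written out by hand to see what the loss measures. This is a sketch in plain Julia; `my_mse` is a hypothetical name, and the prediction below is made up for illustration rather than taken from the trained network.

```julia
# Mean squared error: average of the squared differences, entry by entry
my_mse(yhat, y) = sum((yhat .- y) .^ 2) / length(y)

prediction = [0.8, 0.1, 0.1]   # made-up network output
target     = [1, 0, 0]         # one-hot vector for "apple"

# (0.2^2 + 0.1^2 + 0.1^2) / 3, which is approximately 0.02
example_loss = my_mse(prediction, target)
```

The loss is zero only when the output matches the one-hot target exactly, and grows as the output moves away from it.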

In [None]:
# pair each input vector with its one-hot target
data = zip(xs, ys)

Flux provides several optimizers besides `SGD`. A popular one is `ADAM`:

In [None]:
# the first argument is the list of parameters that the optimizer will modify;
# the second is the learning rate
opt = ADAM(params(model), 0.02)
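
For reference, the usual textbook form of the Adam update is sketched below; Flux's implementation may differ in detail. The notation is an assumption of this sketch: $\theta$ stands for any one parameter, $g_t$ for the gradient of the loss with respect to it at step $t$, $\eta$ for the learning rate (0.02 above), and $\beta_1$, $\beta_2$, $\epsilon$ for the optimizer's remaining hyperparameters, with $m_0 = v_0 = 0$.

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat m_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t} \\
\theta_t &= \theta_{t-1} - \eta\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
\end{aligned}
$$

Compared with plain gradient descent, the step uses a running average of recent gradients and is rescaled by a running average of their squares.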

In [None]:
# each call to train! makes one pass over the whole data set
for i in 1:100
    Flux.train!(loss, data, opt)
end

In [None]:
params(model)

What does this neural network represent? It is simply a more complicated function with two inputs and three outputs, i.e. a function $f: \mathbb{R}^2 \to \mathbb{R}^3$.
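
Written out for the two layers defined above, and using the assumed notation $W_1, b_1$ for the first layer's parameters and $W_2, b_2$ for the second's, the function is

$$
f(x) = \sigma\bigl(W_2\, \sigma(W_1 x + b_1) + b_2\bigr),
\qquad W_1 \in \mathbb{R}^{4 \times 2},\; b_1 \in \mathbb{R}^{4},\; W_2 \in \mathbb{R}^{3 \times 4},\; b_2 \in \mathbb{R}^{3},
$$

where $\sigma$ is applied to each entry of its argument separately.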

Let's look at each component:

In [None]:
coords = 0:0.01:0.8

heatmap(coords, coords, (x,y)->model([x,y]).data[1])
contour!(coords, coords, (x,y)->model([x,y]).data[1], levels=[0.5, 0.501], lw=3)


scatter!(first.(x_apples), last.(x_apples), m=:cross, label="apples")
scatter!(first.(x_bananas), last.(x_bananas), m=:circle, label="bananas")
scatter!(first.(x_grapes), last.(x_grapes), m=:square, label="grapes")

xlims!(0.4, 0.8)
ylims!(0.1, 0.5)


We see that the first component, which is supposed to separate apples from non-apples, has learned a region with a much more complicated shape than the one side of a hyperplane: the separating boundary has been bent round. It does not yet encompass all of the apple data, but it is definitely progress. It is also possible that the network needs to train for longer.

In [None]:
coords = 0:0.01:1

#contour(coords, coords, (x,y)->model([x,y]).data[2], levels=[0.5, 0.501], lw=3)
heatmap(coords, coords, (x,y)->model([x,y]).data[2])


scatter!(first.(x_apples), last.(x_apples), m=:cross, label="apples")
scatter!(first.(x_bananas), last.(x_bananas), m=:circle, label="bananas")
scatter!(first.(x_grapes), last.(x_grapes), m=:square, label="grapes")

xlims!(0.4, 0.8)
ylims!(0.1, 0.5)


The second component is reasonably successful at separating out the bananas. Since there are some apples mixed in with them, we cannot expect it to do much better.

In [None]:
coords = 0:0.01:1

heatmap(coords, coords, (x,y)->model([x,y]).data[3])
contour!(coords, coords, (x,y)->model([x,y]).data[3], levels=[0.5, 0.501], lw=3)


scatter!(first.(x_apples), last.(x_apples), m=:cross, label="apples")
scatter!(first.(x_bananas), last.(x_bananas), m=:circle, label="bananas")
scatter!(first.(x_grapes), last.(x_grapes), m=:square, label="grapes")

xlims!(0.4, 0.8)
ylims!(0.1, 0.5)

The third component separates grapes from the rest pretty successfully.

We see that adding an intermediate layer allows the network to start to deform the separating surfaces that it is learning.