# Single neural network layer using Flux.jl

## Read in and process data

In [1]:
using Flux
using Flux: onehot

In [None]:
using TextParse
using DataFrames

# read a tab-separated .dat file into a DataFrame
function load_fruit(filename)
    cols, colnames = TextParse.csvread(filename, '\t')
    return DataFrame(Dict(name=>col for (name, col) in zip(colnames, cols)))
end

apples_1 = load_fruit("Apple_Golden_1.dat")
apples_2 = load_fruit("Apple_Golden_2.dat")
apples_3 = load_fruit("Apple_Golden_3.dat")
bananas  = load_fruit("Banana.dat")
grapes_1 = load_fruit("Grape_White.dat")
grapes_2 = load_fruit("Grape_White_2.dat")

apples = vcat(apples_1, apples_2, apples_3)
grapes = vcat(grapes_1, grapes_2);

In [None]:
bananas

In [None]:
col1 = 4  # red
col2 = 1  # blue

# turn each row of a data frame into a 2-element feature vector [red, blue]
features(df) = [ [df[i, col1], df[i, col2]] for i in 1:size(df, 1) ]

x_apples  = features(apples)
x_bananas = features(bananas)
x_grapes  = features(grapes)

xs = vcat(x_apples, x_bananas, x_grapes);

We now wish to classify the three types of fruit, so we will use *one-hot vectors* as the outputs. Effectively, the first neuron learns whether (1) or not (0) the data point is an apple, the second whether (1) or not (0) it is a banana, and the third whether (1) or not (0) it is a grape:

In [None]:
# class labels: 1 = apple, 2 = banana, 3 = grape
labels = [fill(1, length(x_apples)); fill(2, length(x_bananas)); fill(3, length(x_grapes))];

ys = [onehot(label, 1:3) for label in labels]  # or onehotbatch(labels, 1:3) for a single 3×N matrix

In [None]:
onehot(1, 1:3)
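To make the encoding concrete, here is a minimal plain-Julia sketch of a one-hot encoder. `my_onehot` is a hypothetical helper written only for illustration: Flux's own `onehot` returns its specialised `OneHotVector` type rather than a `Vector{Bool}`, but the idea (a single `true` at the position of the class) is the same.

```julia
# Hypothetical helper (not Flux's API): one-hot vector for `label` among `classes`
function my_onehot(label, classes)
    i = findfirst(isequal(label), classes)   # position of the label, or nothing
    i === nothing && error("label $label not in $classes")
    v = zeros(Bool, length(classes))
    v[i] = true
    return v
end

my_onehot(2, 1:3)   # banana → [false, true, false]
```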

The input data is in `xs` and the one-hot vectors are in `ys`.

## Single layer in Flux

Each data point has two input features (the red and blue values), so the network has 2 input neurons and 3 output neurons, one per type of fruit:

In [None]:
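using Plots                     # plot, contour! and scatter! are used below
include("draw_neural_net.jl")   # local helper file that defines draw_layer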

In [None]:
plot()
draw_layer(1, 1, 2, 3, 0.2)
plot!()

In [None]:
model = Dense(2, 3, σ)

In [None]:
model.W

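Each of the 6 lines in the figure denotes the weight with which a neuron on the right (an output) takes in the output of a neuron on the left (an input). These weights are collected in the 3×2 **matrix** `W`: row $i$ holds the weights of output neuron $i$, and column $j$ the weights attached to input $j$. The shape may seem "backwards" (3×2 rather than 2×3), but it is exactly what is needed to multiply an input vector of length 2 and produce one value per output neuron: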

In [None]:
x = rand(2)
model.W * x
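To see why the 3×2 shape is the right one, here is a small plain-Julia check with a made-up weight matrix (`W_toy` and `x_toy` are illustrative names, not part of the model): each entry of the product is the dot product of one row of the matrix with the input.

```julia
# Made-up 3×2 weight matrix: row i holds the two weights feeding output neuron i
W_toy = [1.0 2.0;
         3.0 4.0;
         5.0 6.0]
x_toy = [1.0, 1.0]      # one input point with 2 features

size(W_toy)             # (3, 2): 3 outputs × 2 inputs
z_toy = W_toy * x_toy   # [3.0, 7.0, 11.0], one entry per output neuron
```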

The whole `model` object represents a layer of three sigmoidal neurons, and calling it on `x` gives their three outputs:

In [None]:
model(x)

In [None]:
σ.(model.W*x + model.b)

Note that here we have used Julia's **broadcasting** syntax: the dot in `σ.(...)` applies the function $\sigma$ to each element of the vector `W*x + b` in turn. This elementwise application is usually left implicit in the machine-learning literature, but it is much clearer to make it explicit, as Julia allows us (in fact, basically forces us) to do.
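The same computation can be written with no packages at all. The sketch below defines hypothetical helpers `my_sigmoid` and `my_dense` (named so as not to clash with Flux's `σ`) and applies a made-up 3×2 layer to one input; the essential ingredient is the broadcasting dot.

```julia
# Logistic sigmoid for a single number
my_sigmoid(z) = 1 / (1 + exp(-z))

# Hypothetical dense layer: W is (outputs × inputs), b has one entry per output.
# `my_sigmoid.(...)` broadcasts the scalar function over every entry of W*x .+ b.
my_dense(W, b, x) = my_sigmoid.(W * x .+ b)

W_toy = [ 1.0 -1.0;
          0.0  2.0;
         -3.0  0.0]
b_toy = [0.0, -2.0, 3.0]
x_toy = [0.5, 0.5]

out = my_dense(W_toy, b_toy, x_toy)   # first entry is σ(0) = 0.5
```

Without the dot, `my_sigmoid(W_toy * x_toy .+ b_toy)` fails with a `MethodError`, because `exp` is not defined for vectors.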

In [None]:
loss(x, y) = Flux.mse(model(x), y)
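For reference, `Flux.mse` is the mean of the squared differences between the prediction and the target. A plain-Julia sketch of the same formula (the helper `my_mse` and the numbers are made up for illustration):

```julia
# Mean squared error: average over entries of (ŷᵢ - yᵢ)²
my_mse(ŷ, y) = sum((ŷ .- y) .^ 2) / length(y)

ŷ_toy = [0.5, 0.5, 0.0]    # toy network output for one data point
y_toy = [1.0, 0.0, 0.0]    # one-hot target for class 1 (apple)
my_mse(ŷ_toy, y_toy)       # (0.25 + 0.25 + 0.0) / 3 ≈ 0.1667
```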

In [None]:
data = zip(xs, ys)

In [None]:
collect(data)

In [None]:
params(model)

In [None]:
opt = SGD(params(model), 0.01)  # the parameters to be updated, and the learning rate

In [None]:
for i in 1:100                    # 100 passes (epochs) over the training data
    Flux.train!(loss, data, opt)  # one pass: an SGD update after each (x, y) pair
end
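`Flux.train!` hides the gradient computation. As a rough plain-Julia sketch of what these passes amount to for a single sigmoid layer with mean-squared-error loss, here is per-sample gradient descent with the gradients derived by hand via the chain rule. The helpers `my_sigmoid`, `sgd_step` and `total_loss`, the toy data and the learning rate 0.5 are all made up for illustration; Flux obtains the same gradients by automatic differentiation rather than from hand-written formulas.

```julia
my_sigmoid(z) = 1 / (1 + exp(-z))

# One gradient-descent step for a single sigmoid layer on ONE data point (x, y),
# with loss L = sum((ŷ .- y).^2) / length(y) and ŷ = my_sigmoid.(W*x .+ b).
# Returns the updated (W, b).
function sgd_step(W, b, x, y, η)
    ŷ = my_sigmoid.(W * x .+ b)
    # dL/dz per neuron: dL/dŷ = 2(ŷ - y)/n times dŷ/dz = ŷ(1 - ŷ)
    δ = 2 .* (ŷ .- y) ./ length(y) .* ŷ .* (1 .- ŷ)
    ∇W = δ * x'   # outer product, same 3×2 shape as W
    ∇b = δ
    return W .- η .* ∇W, b .- η .* ∇b
end

# Made-up toy data: two 2-feature points with one-hot targets
xs_toy = [[0.2, 0.8], [0.9, 0.1]]
ys_toy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# total loss over the toy data set
total_loss(W, b) = sum(sum((my_sigmoid.(W * x .+ b) .- y) .^ 2) / length(y)
                       for (x, y) in zip(xs_toy, ys_toy))

W_sgd, b_sgd = zeros(3, 2), zeros(3)
loss_before = total_loss(W_sgd, b_sgd)
for epoch in 1:100                      # 100 passes over the data
    for (x, y) in zip(xs_toy, ys_toy)   # update after every data point
        global W_sgd, b_sgd
        W_sgd, b_sgd = sgd_step(W_sgd, b_sgd, x, y, 0.5)
    end
end
loss_after = total_loss(W_sgd, b_sgd)   # smaller than loss_before
```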

In [None]:
model.W

In [None]:
model.b

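Let's visualize the hyperplanes (in this two-dimensional input space, straight lines) that the neurons have learnt. Since $\sigma(0) = 0.5$, output neuron $i$ gives exactly 0.5 where $W_{i1} x_1 + W_{i2} x_2 + b_i = 0$, so the 0.5 contour of each output is that neuron's decision boundary.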
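The boundary can also be computed directly by solving $W_{i1} x_1 + W_{i2} x_2 + b_i = 0$ for $x_2$. A plain-Julia sketch with made-up weights (`boundary_x2`, `W_line` and `b_line` are illustrative names, not part of Flux); it assumes $W_{i2} \neq 0$, otherwise the boundary is the vertical line $x_1 = -b_i / W_{i1}$.

```julia
my_sigmoid(z) = 1 / (1 + exp(-z))

# x2 on the decision boundary of output neuron i, for a given x1
# (valid only when W[i, 2] != 0)
boundary_x2(W, b, i, x1) = -(W[i, 1] * x1 + b[i]) / W[i, 2]

# Made-up weights, just to check the formula
W_line = [2.0 -4.0;
          1.0  1.0;
          0.5  2.0]
b_line = [1.0, -1.0, 0.0]

x1 = 0.3
x2 = boundary_x2(W_line, b_line, 1, x1)                 # ≈ 0.4
my_sigmoid(W_line[1, 1] * x1 + W_line[1, 2] * x2 + b_line[1])   # ≈ 0.5
```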

In [None]:
plot(xlabel="red", ylabel="blue")

# the 0.5 contour of each output neuron is its decision boundary
contour!(0:0.01:1, 0:0.01:1, (x,y)->model([x,y]).data[1], levels=[0.5, 0.501], c=:blue)   # apple neuron
contour!(0:0.01:1, 0:0.01:1, (x,y)->model([x,y]).data[2], levels=[0.5, 0.501], c=:red)    # banana neuron
contour!(0:0.01:1, 0:0.01:1, (x,y)->model([x,y]).data[3], levels=[0.5, 0.501], c=:green)  # grape neuron

# the data points: red value on the horizontal axis, blue value on the vertical axis
scatter!(first.(x_apples), last.(x_apples), m=:cross, label="apples")
scatter!(first.(x_bananas), last.(x_bananas), m=:circle, label="bananas")
scatter!(first.(x_grapes), last.(x_grapes), m=:square, label="grapes")

We see that the result is not very good: this network is so simple that it is *not capable of learning a good representation of the data*. Each output neuron can only split the input plane with a single straight line (its 0.5 contour), so the class of functions this model can represent is not rich enough for this data.

Note that two of the hyperplanes have been learnt correctly: the one that separates bananas from the rest, and the one that separates grapes from the rest. The third has not, and it is intuitively clear why: given this data, there *is no way* to separate apples from non-apples with a *single* hyperplane.
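The underlying limitation can be checked on a one-dimensional toy configuration in which one class lies between two groups of another (made-up points, not the fruit data). In 1-D a "hyperplane" is a threshold $t$, and a single sigmoid neuron can only predict the class on one side of it. The hypothetical helper below tries every relevant threshold in both orientations and finds none that isolates the middle class.

```julia
# Toy 1-D data (made up): class A sits between two groups of class B
points = [0.1, 0.2, 0.5, 0.6, 0.9, 1.0]
is_A   = [false, false, true, true, false, false]

# A single unit σ(w*x + b) predicts A either for x > t or for x < t, with t = -b/w.
# Only the ordering of the points matters, so it suffices to try thresholds
# below all points, between neighbouring points, and above all points.
function separable_by_one_threshold(points, is_A)
    s = sort(points)
    candidates = [s[1] - 1; (s[1:end-1] .+ s[2:end]) ./ 2; s[end] + 1]
    for t in candidates
        (points .> t) == is_A && return true
        (points .< t) == is_A && return true
    end
    return false
end

separable_by_one_threshold(points, is_A)   # false: no single threshold isolates A
```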