# Single neural network layer using Flux.jl

## Read in and process data

In [29]:
using Flux
using Flux: onehot

In [59]:
using CSV

apples_1 = CSV.read("Apple_Golden_1.dat", delim='\t')
apples_2 = CSV.read("Apple_Golden_2.dat", delim='\t')
apples_3 = CSV.read("Apple_Golden_3.dat", delim='\t')
bananas = CSV.read("Banana.dat", delim='\t')
grapes_1 = CSV.read("Grape_White.dat", delim='\t')
grapes_2 = CSV.read("Grape_White_2.dat", delim='\t');

apples = vcat(apples_1, apples_2, apples_3)
grapes = vcat(grapes_1, grapes_2);

In [3]:
col1 = :red
col2 = :blue

x_apples  = [ [apples_1[i, col1], apples_1[i, col2]] for i in 1:size(apples_1)[1] ]
append!(x_apples, [ [apples_2[i, col1], apples_2[i, col2]] for i in 1:size(apples_2)[1] ])
append!(x_apples, [ [apples_3[i, col1], apples_3[i, col2]] for i in 1:size(apples_3)[1] ])

x_bananas = [ [bananas[i, col1], bananas[i, col2]] for i in 1:size(bananas)[1] ]

x_grapes = [ [grapes_1[i, col1], grapes_1[i, col2]] for i in 1:size(grapes_1)[1] ]
append!(x_grapes, [ [grapes_2[i, col1], grapes_2[i, col2]] for i in 1:size(grapes_2)[1] ])

xs = vcat(x_apples, x_bananas, x_grapes);

We now wish to classify the three types of fruit, so we use *one-hot vectors* as the output. Effectively, the first output neuron learns whether the data corresponds to an apple (1) or not (0), the second whether it corresponds to a banana (1) or not (0), and the third does the same for grapes:

In [35]:
labels = [ones(length(x_apples)); 2*ones(length(x_bananas)); 3*ones(length(x_grapes))];

ys = [onehot(label, 1:3) for label in labels]  # alternatively, Flux.onehotbatch(labels, 1:3) gives a single one-hot matrix

2935-element Array{Flux.OneHotVector,1}:
 Bool[true, false, false]
 Bool[true, false, false]
 Bool[true, false, false]
 Bool[true, false, false]
 Bool[true, false, false]
 Bool[true, false, false]
 Bool[true, false, false]
 Bool[true, false, false]
 Bool[true, false, false]
 Bool[true, false, false]
 Bool[true, false, false]
 Bool[true, false, false]
 Bool[true, false, false]
 ⋮                       
 Bool[false, false, true]
 Bool[false, false, true]
 Bool[false, false, true]
 Bool[false, false, true]
 Bool[false, false, true]
 Bool[false, false, true]
 Bool[false, false, true]
 Bool[false, false, true]
 Bool[false, false, true]
 Bool[false, false, true]
 Bool[false, false, true]
 Bool[false, false, true]

In [32]:
onehot(1, 1:3)

3-element Flux.OneHotVector:
  true
 false
 false

The input data is in `xs` and the one-hot vectors are in `ys`.
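To make the encoding concrete, here is a minimal sketch of one-hot encoding written by hand in plain Julia. The helper `my_onehot` is a hypothetical name introduced for illustration; it is not how Flux implements `onehot`, only a function that yields the same pattern of `true`/`false` entries.

```julia
# Hypothetical helper (not Flux's implementation): entry i is true
# exactly when `label` equals the i-th class.
my_onehot(label, classes) = [label == c for c in classes]

v = my_onehot(2, 1:3)        # [false, true, false]
w = my_onehot(3.0, 1:3)      # floating-point labels also match, since 3.0 == 3

# Recover the class index from a one-hot vector: position of the `true` entry
decoded = findfirst(v)
```

The second call illustrates why the floating-point `labels` built with `ones(...)` above can still be matched against the integer range `1:3`.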

## Single layer in Flux

Each data point has two input values (here, the `red` and `blue` columns), and there are three classes. The network therefore has 2 input neurons and 3 output neurons:

In [9]:
include("draw_neural_net.jl")

draw_layer (generic function with 1 method)

In [11]:
plot()
draw_layer(1, 1, 2, 3, 0.2)
plot!()

In [12]:
model = Dense(2, 3, σ)

Dense(2, 3, NNlib.σ)

In [13]:
model.W

Tracked 3×2 Array{Float64,2}:
  0.870099   0.791934
 -0.333706  -1.05544 
  0.530088  -0.189723

Each of the 6 lines in the figure denotes a weight that a neuron on the right applies to the output of a neuron on the left. These weights are collected in the **matrix** `W`. Note that its shape, 3×2 (outputs × inputs), may seem "backwards"; it is chosen so that `W` can multiply vectors of length 2 (the input size):

In [16]:
x = rand(2)
model.W * x

Tracked 3-element Array{Float64,1}:
  1.06859 
 -0.779776
  0.320209
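The shape convention can be checked without Flux. The following sketch uses a plain matrix with the values printed for `model.W` above and an arbitrary input vector (the input is an assumption chosen for illustration, not the random `x` used in the notebook).

```julia
# Weight values copied from the `model.W` display above (as printed)
W = [ 0.870099   0.791934;
     -0.333706  -1.05544;
      0.530088  -0.189723]

x = [0.5, 0.25]      # arbitrary input of length 2, assumed for illustration

dims = size(W)       # (3, 2): one row per output neuron, one column per input
z = W * x            # vector of length 3, one entry per output neuron

# Entry 1 of the product is the weighted sum computed by output neuron 1
z1_by_hand = W[1, 1] * x[1] + W[1, 2] * x[2]
```

Row `i` of `W` therefore holds the two weights drawn in the figure as the lines arriving at output neuron `i`.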

The whole `model` object represents a set of three sigmoidal neurons:

In [17]:
model(x)

Tracked 3-element Array{Float64,1}:
 0.744329
 0.314368
 0.579375

In [19]:
σ.(model.W*x + model.b)

Tracked 3-element Array{Float64,1}:
 0.744329
 0.314368
 0.579375

Note that here we have used Julia's **broadcasting** capability, in which the function $\sigma$ is applied to each element of the vector `W * x + b` in turn. This elementwise application of the function is implicit in most of the literature on machine learning, but it is much clearer to make it explicit, as Julia allows us (in fact, basically forces us) to do.
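As a small illustration of the dot syntax, the sketch below defines a scalar sigmoid by hand (the name `sigmoid` is a stand-in introduced here, assumed to match the usual logistic function that `σ` denotes) and applies it elementwise in two equivalent ways.

```julia
# Scalar logistic function, written for a single number
sigmoid(z) = 1 / (1 + exp(-z))

z = [0.0, 1.0, -1.0]

a = sigmoid.(z)                    # broadcast: apply to each element of z
b = [sigmoid(zi) for zi in z]      # the same thing as an explicit comprehension
```

The dot in `sigmoid.(z)` is what turns a function of one number into a function applied to every element of the vector.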

In [20]:
loss(x, y) = Flux.mse(model(x), y)

loss (generic function with 1 method)
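For intuition about what this loss measures, here is a hand-written sketch, assuming `Flux.mse` computes the mean of the squared elementwise differences. The name `my_mse` and the example vectors are hypothetical and introduced only for illustration.

```julia
# Hypothetical stand-in for a mean squared error: average of squared differences
my_mse(yhat, y) = sum((yhat .- y) .^ 2) / length(y)

target = [true, false, false]          # one-hot vector for "apple"

perfect = my_mse([1.0, 0.0, 0.0], target)   # prediction equals the target
unsure  = my_mse([0.5, 0.5, 0.0], target)   # (0.25 + 0.25 + 0) / 3
```

A prediction that matches the one-hot target exactly gives a loss of zero, and the loss grows as the outputs move away from the target.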

In [36]:
data = zip(xs, ys)

Base.Iterators.Zip2{Array{Array{Float64,1},1},Array{Flux.OneHotVector,1}}(Array{Float64,1}[[0.708703, 0.341998], [0.648376, 0.284163], [0.647237, 0.282579], [0.647963, 0.283689], [0.647653, 0.2846], [0.648491, 0.28597], [0.647974, 0.285646], [0.649307, 0.287323], [0.648141, 0.286103], [0.64984, 0.288396]  …  [0.722939, 0.414788], [0.721564, 0.413433], [0.723195, 0.413965], [0.722358, 0.413741], [0.723049, 0.416157], [0.722233, 0.414729], [0.722148, 0.41648], [0.721761, 0.416422], [0.722839, 0.417423], [0.722266, 0.417273]], Flux.OneHotVector[Bool[true, false, false], Bool[true, false, false], Bool[true, false, false], Bool[true, false, false], Bool[true, false, false], Bool[true, false, false], Bool[true, false, false], Bool[true, false, false], Bool[true, false, false], Bool[true, false, false]  …  Bool[false, false, true], Bool[false, false, true], Bool[false, false, true], Bool[false, false, true], Bool[false, false, true], Bool[false, false, true], Bool[false, false, true], Bool[false, false, true], Bool[false, false, true], Bool[false, false, true]])

In [37]:
collect(data)

2935-element Array{Tuple{Array{Float64,1},Flux.OneHotVector},1}:
 ([0.708703, 0.341998], Bool[true, false, false])
 ([0.648376, 0.284163], Bool[true, false, false])
 ([0.647237, 0.282579], Bool[true, false, false])
 ([0.647963, 0.283689], Bool[true, false, false])
 ([0.647653, 0.2846], Bool[true, false, false])  
 ([0.648491, 0.28597], Bool[true, false, false]) 
 ([0.647974, 0.285646], Bool[true, false, false])
 ([0.649307, 0.287323], Bool[true, false, false])
 ([0.648141, 0.286103], Bool[true, false, false])
 ([0.64984, 0.288396], Bool[true, false, false]) 
 ([0.648446, 0.287733], Bool[true, false, false])
 ([0.709808, 0.322328], Bool[true, false, false])
 ([0.650164, 0.290677], Bool[true, false, false])
 ⋮                                               
 ([0.722031, 0.412562], Bool[false, false, true])
 ([0.730362, 0.422273], Bool[false, false, true])
 ([0.722939, 0.414788], Bool[false, false, true])
 ([0.721564, 0.413433], Bool[false, false, true])
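 ([0.723195, 0.413965], Bool[false, false, true])
 ([0.722358, 0.413741], Bool[false, false, true])
 ([0.723049, 0.416157], Bool[false, false, true])
 ([0.722233, 0.414729], Bool[false, false, true])
 ([0.722148, 0.41648], Bool[false, false, true]) 
 ([0.721761, 0.416422], Bool[false, false, true])
 ([0.722839, 0.417423], Bool[false, false, true])
 ([0.722266, 0.417273], Bool[false, false, true])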

In [38]:
params(model)

2-element Array{Any,1}:
 param([0.870099 0.791934; -0.333706 -1.05544; 0.530088 -0.189723])
 param([0.0, 0.0, 0.0])                                            

In [39]:
# stochastic gradient descent with learning rate 0.01;
# params(model) gives the list of parameters that will be modified
opt = SGD(params(model), 0.01)

(::#71) (generic function with 1 method)

In [54]:
# each call to train! makes one pass over `data`, so this runs 100 passes
for i in 1:100
    Flux.train!(loss, data, opt)
end
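As a rough sketch of what each training step does, assume that `SGD` here is plain stochastic gradient descent with learning rate $\eta = 0.01$ (no momentum or decay; this is an assumption about the optimizer, not something shown in the notebook). Then each pair $(x, y)$ in `data` updates the parameters as

$$
W \leftarrow W - \eta \, \frac{\partial\, \mathrm{loss}(x, y)}{\partial W},
\qquad
b \leftarrow b - \eta \, \frac{\partial\, \mathrm{loss}(x, y)}{\partial b}.
$$

Since `train!` visits every element of `data` once per call, the 100 calls above would perform $100 \times 2935 = 293{,}500$ such updates.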

In [55]:
model.W

Tracked 3×2 Array{Float64,2}:
  7.39953   -5.39071
 -4.46358  -10.173  
 -2.78041   14.7777 

In [42]:
model.b

Tracked 3-element Array{Float64,1}:
 -2.55731
  2.7284 
 -2.25216

Let's visualize the hyperplanes that the neurons have learnt by drawing, for each output neuron, the contour on which its output is 0.5.

In [56]:
plot()

contour!(0:0.01:1, 0:0.01:1, (x,y)->model([x,y]).data[1], levels=[0.5, 0.501], c=:blue)
contour!(0:0.01:1, 0:0.01:1, (x,y)->model([x,y]).data[2], levels=[0.5,0.501], c=:red)
contour!(0:0.01:1, 0:0.01:1, (x,y)->model([x,y]).data[3], levels=[0.5,0.501], c=:green)

scatter!(first.(x_apples), last.(x_apples), m=:cross, label="apples")
scatter!(first.(x_bananas), last.(x_bananas), m=:circle, label="bananas")
scatter!(first.(x_grapes), last.(x_grapes), m=:square, label="grapes")

We see that the result is not very good: this network is so simple that it is *not capable of learning a good representation of the data*. The reason is that the class of functions it can model is not rich enough: each output neuron can only split the plane with a single hyperplane (here, a straight line).

Note that two of the hyperplanes have been learnt correctly: the one that separates bananas from the rest and the one that separates grapes from the rest. The third has not, and it is intuitively clear why: there *is no way* to separate apples from non-apples with a *single* hyperplane, given this data.
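Why each neuron can only draw one straight line can be checked directly. A sigmoidal neuron outputs exactly 0.5 when its weighted sum $w_1 x_1 + w_2 x_2 + b$ is zero, and that equation describes a line in the plane. The sketch below uses hypothetical weights and a hypothetical bias (not the trained values shown above) and a hand-written `sigmoid` standing in for `σ`.

```julia
sigmoid(z) = 1 / (1 + exp(-z))

# Hypothetical parameters of a single output neuron, chosen for illustration
w = [2.0, -4.0]
b = 1.0

neuron(x) = sigmoid(w[1] * x[1] + w[2] * x[2] + b)

# Solve w1*x1 + w2*x2 + b = 0 for x2: the decision boundary as a function of x1
boundary_x2(x1) = -(w[1] * x1 + b) / w[2]

# Every point on that line gives an output of 0.5
outputs = [neuron([x1, boundary_x2(x1)]) for x1 in (0.0, 0.5, 1.0)]
```

On one side of the line the output is above 0.5 and on the other side it is below, which is why a class that cannot be cut off from the others by a single line cannot be learnt by one such neuron.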