# Multilayered Perceptron: The Start of Deep Learning
In this notebook we implement a multilayered perceptron to classify species of flower from the measurements in the [iris data set](https://en.wikipedia.org/wiki/Iris_flower_data_set). Our task is to predict the species of a flower from its sepal length and width and its petal length and width.

You will need to add the following packages:
 * CSV [documentation](https://juliadata.github.io/CSV.jl/stable/)
 * DataFrames [documentation](https://dataframes.juliadata.org/stable/)

The data set covers three species of iris:

* Setosa
<img src="setosa.jpg" alt="Drawing" style="width: 150px; height: 150px"/>

* Versicolor
<img src="versicolor.jpg" alt="Drawing" style="width: 150px;"/>

* Virginica
<img src="virginica.jpg" alt="Drawing" style="width: 150px;"/>


In [1]:
using CSV, DataFrames

# Provided you have a valid .csv file saved in your current working directory,
# you can load it as a DataFrame with the following syntax.
# Recent CSV.jl releases require the sink type (DataFrame) as the second argument.
iris = CSV.read("iris_data.csv", DataFrame)
println(iris)

150×5 DataFrames.DataFrame
│ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species    │
│     │ Float64     │ Float64    │ Float64     │ Float64    │ String     │
├─────┼─────────────┼────────────┼─────────────┼────────────┼────────────┤
│ 1   │ 5.1         │ 3.5        │ 1.4         │ 0.2        │ setosa     │
│ 2   │ 4.9         │ 3.0        │ 1.4         │ 0.2        │ setosa     │
│ 3   │ 4.7         │ 3.2        │ 1.3         │ 0.2        │ setosa     │
│ 4   │ 4.6         │ 3.1        │ 1.5         │ 0.2        │ setosa     │
│ 5   │ 5.0         │ 3.6        │ 1.4         │ 0.2        │ setosa     │
│ 6   │ 5.4         │ 3.9        │ 1.7         │ 0.4        │ setosa     │
│ 7   │ 4.6         │ 3.4        │ 1.4         │ 0.3        │ setosa     │
│ 8   │ 5.0         │ 3.4        │ 1.5         │ 0.2        │ setosa     │
│ 9   │ 4.4         │ 2.9        │ 1.4         │ 0.2        │ setosa     │
│ 10  │ 4.9         │ 3

We next construct the data matrices $X$ and $Y$. $X$ is a $4\times150$ matrix in which each column holds the four measurements for one flower. $Y$ is a $3\times150$ matrix whose $i$th column is the one-hot encoding of the label for the $i$th flower.

In [2]:
X = zeros(4, 150)
Y = zeros(3, 150)

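for i = 1:150
    # Copy the four measurements for flower i into column i of X
    for j = 1:4
        X[j, i] = iris[i, j]
    end
    # One-hot encode the species label into column i of Y
    if iris[i, 5] == "setosa"
        Y[1, i] = 1.0
    elseif iris[i, 5] == "versicolor"
        Y[2, i] = 1.0
    else
        Y[3, i] = 1.0
    end
end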
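
As a quick check on the encoding above, the following minimal sketch builds one-hot columns for a few toy labels. The helper name `onehot` and the `classes` vector are illustrative assumptions and are not part of the notebook; the labels are typed in by hand rather than read from the CSV file.

```julia
# Hypothetical helper: one-hot encode a label against a fixed class order.
classes = ["setosa", "versicolor", "virginica"]
onehot(label, classes) = [label == c ? 1.0 : 0.0 for c in classes]

# Toy labels for illustration only
labels = ["setosa", "virginica", "versicolor"]

# 3×3 matrix with one column per flower
Ytoy = hcat([onehot(l, classes) for l in labels]...)

# Every column should contain exactly one 1.0
@assert all(sum(Ytoy, dims=1) .== 1.0)
```

Unlike the loop above, which treats every label other than setosa and versicolor as virginica, this helper would return an all-zero column for an unknown label.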

## Building the Network Architecture 
For our purposes, we will build a multilayered perceptron with $4$ input nodes, $2$ hidden layers (of $5$ nodes each in the code below), and $3$ output nodes.

<img src="multilayerPerceptron.jpg" alt="Drawing" style="width: 450px;"/>

Each node in our network has two phases: preactivation and postactivation. The preactivation value is a weighted linear combination of the postactivation values in the previous layer. The postactivation value is obtained by passing the preactivation value through an activation function, applied elementwise. For our activation function, we will use the sigmoid function:


* Sigmoid Function
$$
\sigma(s) = \frac{1}{1+e^{-s}}.
$$
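
Written out layer by layer, the two phases described above take the following form. The superscript notation is an assumed convention chosen to match the `W[ℓ]` and `b[ℓ]` arrays used in the code, with $a^{(1)} = x$ as the input:

$$
z^{(\ell+1)} = W^{(\ell)} a^{(\ell)} + b^{(\ell)}, \qquad a^{(\ell+1)} = \sigma\left(z^{(\ell+1)}\right), \qquad \ell = 1, 2, 3.
$$

The sigmoid also has a convenient derivative, which the backward pass reuses:

$$
\sigma'(s) = \frac{e^{-s}}{\left(1+e^{-s}\right)^{2}} = \sigma(s)\,\bigl(1 - \sigma(s)\bigr).
$$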


In [3]:
# Define sigmoid function and its derivative
σ(s) = 1/(1+exp(-s))
dσ(s) = σ(s)*(1 - σ(s))

# Define softmax function
softmax(a, i) = exp(a[i])/(sum(exp(a[j]) for j = 1:length(a)))

# Define cross-entropy loss function
L(O, y) = -sum(y[i]*log(O[i]) for i = 1:length(y))

# Define Hadamard Product
hadamard(x,y) = [x[i]*y[i] for i = 1:length(x)];
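
One way to gain confidence in these definitions is to compare `dσ` against a central finite difference and to confirm that the softmax outputs sum to one. This is only a minimal sketch: the step size `h`, the test point `s0`, and the test vector `v` are arbitrary choices made for illustration.

```julia
# Same definitions as in the cell above, repeated so the check is self-contained
σ(s) = 1/(1+exp(-s))
dσ(s) = σ(s)*(1 - σ(s))
softmax(a, i) = exp(a[i])/(sum(exp(a[j]) for j = 1:length(a)))

# Central finite difference approximation of σ'(s0)
h = 1e-6
s0 = 0.3
fd = (σ(s0 + h) - σ(s0 - h)) / (2h)
@assert abs(fd - dσ(s0)) < 1e-8

# Softmax outputs form a probability vector
v = [0.0, 1.0, 0.0]
p = [softmax(v, i) for i = 1:length(v)]
@assert abs(sum(p) - 1.0) < 1e-12
```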

In [4]:
function forward_propagation(x, y, W, b)
    a1 = copy(x)
    z2 = W[1]*a1 + b[1]
    a2 = σ.(z2)
    
    z3 = W[2]*a2 + b[2]
    a3 = σ.(z3)
    
    z4 = W[3]*a3 + b[3]
    a4 = σ.(z4)
    
    a = [a1, a2, a3, a4]
    z = [[0.0], z2, z3, z4]
    O = [softmax(a4, i) for i = 1:length(a4)]
    loss = L(O, y)
    return a, z, O, loss
end

forward_propagation (generic function with 1 method)

In [5]:
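function backpropagation(x, y, W, b)
    a, z, O, loss = forward_propagation(x, y, W, b)
    # Output-layer error. Note that a[4] - y is the gradient for sigmoid outputs
    # under a per-class cross-entropy loss, not the exact gradient of L(O, y).
    δ4 = a[4] - y
    # Propagate the error backwards through the two hidden layers
    δ3 = hadamard(W[3]'*δ4, dσ.(z[3]))
    δ2 = hadamard(W[2]'*δ3, dσ.(z[2]))
    δ = [[0.0], δ2, δ3, δ4]
    return a, δ
end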

function ∇L(x, y, W, b)

    a, δ = backpropagation(x, y, W, b)
    
    db1 = copy(δ[2])
    db2 = copy(δ[3])
    db3 = copy(δ[4])
    
    dW1 = δ[2]*a[1]'
    dW2 = δ[3]*a[2]'
    dW3 = δ[4]*a[3]'
    return [db1, db2, db3], [dW1, dW2, dW3]
end


function gradient_descent!(x, y, W, b, α)
    db, dW = ∇L(x, y, W, b)
    for i = 1:length(W)
        W[i] -= α*dW[i]
        b[i] -= α*db[i]
    end
end

gradient_descent! (generic function with 1 method)
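
In equation form, `backpropagation` and `∇L` above compute the following quantities. The notation is an assumption chosen to mirror the code: $\delta^{(\ell)}$ corresponds to `δ[ℓ]`, $dW^{(\ell)}$ and $db^{(\ell)}$ correspond to `dW[ℓ]` and `db[ℓ]`, and $\odot$ is the Hadamard product.

$$
\delta^{(4)} = a^{(4)} - y, \qquad \delta^{(\ell)} = \left(W^{(\ell)}\right)^{T}\delta^{(\ell+1)} \odot \sigma'\left(z^{(\ell)}\right), \quad \ell = 3, 2,
$$

$$
dW^{(\ell)} = \delta^{(\ell+1)}\left(a^{(\ell)}\right)^{T}, \qquad db^{(\ell)} = \delta^{(\ell+1)}, \qquad \ell = 1, 2, 3.
$$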

In [6]:
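function mini_batch_∇L(train_data, train_label, W, b, m)
    # Sample from every column of train_data rather than a hard-coded range
    n = size(train_data, 2)

    # Start the running sums with the first random sample
    i = rand(1:n)
    a, δ = backpropagation(train_data[:,i], train_label[:,i], W, b)

    db1 = copy(δ[2])
    db2 = copy(δ[3])
    db3 = copy(δ[4])

    dW1 = δ[2]*a[1]'
    dW2 = δ[3]*a[2]'
    dW3 = δ[4]*a[3]'

    # Add the remaining m - 1 samples so that exactly m samples are averaged
    for _ in 2:m
        j = rand(1:n)
        a, δ = backpropagation(train_data[:,j], train_label[:,j], W, b)

        db1 += δ[2]
        db2 += δ[3]
        db3 += δ[4]

        dW1 += δ[2]*a[1]'
        dW2 += δ[3]*a[2]'
        dW3 += δ[4]*a[3]'
    end

    # Average the accumulated gradients over the mini-batch
    return [db1/m, db2/m, db3/m], [dW1/m, dW2/m, dW3/m]
end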

mini_batch_∇L (generic function with 1 method)

In [7]:
function stochastic_gradient_descent!(train_data, train_label, W, b, α, m)
    db , dW = mini_batch_∇L(train_data, train_label, W, b, m)
    for i = 1:length(W)
        W[i] -= α*dW[i]
        b[i] -= α*db[i]
    end
end

stochastic_gradient_descent! (generic function with 1 method)

In [8]:
# Initialize weight matrices 
W1 = randn(5, 4)
W2 = randn(5, 5)
W3 = randn(3, 5)
W = [W1, W2, W3]

# Initialize bias 
b1 = -1*ones(5)
b2 = -1*ones(5)
b3 = -1*ones(3)
b = [b1, b2, b3]

3-element Array{Array{Float64,1},1}:
 [-1.0, -1.0, -1.0, -1.0, -1.0]
 [-1.0, -1.0, -1.0, -1.0, -1.0]
 [-1.0, -1.0, -1.0]            

In [9]:
function make_prediction(i)
    output = forward_propagation(X[:,i], Y[:,i], W, b)[3]
    println("      setosa       |     versicolor       |     virginica")
    println("----------------------------------------------------------------")
    println(output[1]," | ", output[2], "  |  ", output[3])
end       

make_prediction (generic function with 1 method)

In [14]:
for _ in 1:100000
    stochastic_gradient_descent!(X, Y, W, b, 0.38, 23)
end

In [18]:
make_prediction(110)

      setosa       |     versicolor       |     virginica
----------------------------------------------------------------
0.21194320265478428 | 0.5761144827027491  |  0.21194231464246657
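
To turn the three printed scores into a class label, one option is to take the index of the largest entry. The sketch below does this for the scores printed above; the `species` vector and the `predict_label` helper are hypothetical names introduced for illustration.

```julia
# Class order matches the rows of Y: setosa, versicolor, virginica
species = ["setosa", "versicolor", "virginica"]

# Hypothetical helper: pick the species with the highest score
predict_label(output) = species[argmax(output)]

# Scores copied from the printed output above
output = [0.21194320265478428, 0.5761144827027491, 0.21194231464246657]
println(predict_label(output))   # prints versicolor
```

Because `forward_propagation` applies σ to the last layer before the softmax, the softmax inputs all lie in $(0, 1)$, so no score can exceed $e/(e+2) \approx 0.576$. The largest score printed here is very close to that ceiling.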
