## Neural Networks From Scratch

Here we will build a simple feed-forward artificial neural network from scratch.

First we will go through it step by step, and then we will wrap it up in functions.

In [2]:
#first of all we will need some packages
import numpy as np
import pandas as pd
import math
import random

In [3]:
#let's create a small numpy array of input data, this is the same as our deforestation data
Input = np.array([10,1,4])

In [4]:
#take a look at it
Input

array([10,  1,  4])

In [5]:
#Ok for dimensional reasons we want our Input as a column vector of shape (3, 1), so we reshape it

In [6]:
Input = Input.reshape((Input.shape[0],1))

In [7]:
Input

array([[10],
       [ 1],
       [ 4]])

From the lecture we had a simple structure:

3 Input : 4 Hidden Layer : 1 Output

In [8]:
#So let's generate an array of weights for our hidden layer of size 4: 3 inputs x 4 hidden units = 12 weights
# the weights need to be generated randomly, here we draw them from a normal distribution with mean 0 and standard deviation 3
mean = 0
stddev = 3
Hidden = np.array([random.gauss(mean,stddev) for _ in range(12)])

In [9]:
Hidden

array([ 3.20681229,  5.35992724,  1.51177548, -1.17093425,  4.54976766,
       -2.22912093,  0.1594277 ,  1.37598663,  1.25677934, -1.30370169,
       -0.52900196, -1.88578714])

In [10]:
#We want to make sure that we have the correct dimensions here
Hidden = Hidden.reshape(len(Input),4)

In [11]:
Hidden

array([[ 3.20681229,  5.35992724,  1.51177548, -1.17093425],
       [ 4.54976766, -2.22912093,  0.1594277 ,  1.37598663],
       [ 1.25677934, -1.30370169, -0.52900196, -1.88578714]])
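Because `random.gauss` is not seeded, the weights printed above will differ on every run. A minimal sketch of a reproducible alternative, assuming NumPy's `default_rng` generator; the seed value 0 and the names `rng`, `HiddenSeeded` and `HiddenAgain` are illustrative and not part of the original notebook:

```python
import numpy as np

# Hypothetical seed, chosen only so that repeated runs give the same weights.
rng = np.random.default_rng(0)

# Draw the weight matrix directly in its final shape (3 inputs, 4 hidden units),
# with the same mean and standard deviation as above.
HiddenSeeded = rng.normal(loc=0, scale=3, size=(3, 4))

# A second generator with the same seed reproduces the same matrix.
HiddenAgain = np.random.default_rng(0).normal(loc=0, scale=3, size=(3, 4))
print(HiddenSeeded.shape, np.array_equal(HiddenSeeded, HiddenAgain))
```

Drawing the matrix in its final shape also removes the need for a separate reshape step.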

In [12]:
#Let's also generate some random biases, one per hidden unit
BiasHidden = np.array([random.gauss(mean,stddev) for _ in range(4)])

In [13]:
BiasHidden

array([-1.20959402,  2.32626214,  5.0973965 ,  1.39541992])

In [14]:
#Ok so we have our input and our hidden layer, now we want to work out the first layer
#which means we need to do our first matrix multiplication: (1 x 3) input times (3 x 4) weights

Layer1 = np.dot(Input.T,Hidden)

In [15]:
#And here are our outputs for layer 1 after multiplication
Layer1

array([[ 41.64500794,  46.15534474,  13.16117471, -17.87650441]])

In [16]:
#Remember to add the biases
Layer1 = Layer1+BiasHidden

In [17]:
Layer1

array([[ 40.43541392,  48.48160687,  18.25857121, -16.48108449]])
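To make the shapes in this step concrete, here is a small sketch that rebuilds `Layer1` from the values printed above (rounded to 8 decimals, so the result only matches to about 6 decimals). The variable `first` is an illustrative name for the first hidden unit written out by hand:

```python
import numpy as np

# Values copied from the printed outputs above.
Input = np.array([[10], [1], [4]])
Hidden = np.array([[3.20681229, 5.35992724, 1.51177548, -1.17093425],
                   [4.54976766, -2.22912093, 0.1594277, 1.37598663],
                   [1.25677934, -1.30370169, -0.52900196, -1.88578714]])
BiasHidden = np.array([-1.20959402, 2.32626214, 5.0973965, 1.39541992])

# (1, 3) @ (3, 4) -> (1, 4): one pre-activation value per hidden unit.
# The (4,) bias broadcasts across the single row.
Layer1 = Input.T @ Hidden + BiasHidden
print(Layer1.shape)

# The first hidden unit is the weighted sum of the three inputs plus its bias.
first = 10 * 3.20681229 + 1 * 4.54976766 + 4 * 1.25677934 - 1.20959402
print(np.isclose(Layer1[0, 0], first))
```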

In [18]:
#Now we have to apply our activation function. We covered a few activation functions in the lecture and specifically for the hidden layer
#we looked at ReLU and tanh, so we will define them here as functions

#ReLU is simple as it just turns any negative values to 0
def ReLU(x):
    return np.maximum(x,0)

In [19]:
ReLU(Layer1)

array([[40.43541392, 48.48160687, 18.25857121,  0.        ]])

In [20]:
#tanh is a bit more complicated, taking the form (e^x - e^-x) / (e^x + e^-x)
def tanh(x):
    ex = np.exp(x)
    enx = np.exp(-x)
    return (ex - enx) / (ex + enx)

In [21]:
tanh(Layer1)

array([[ 1.,  1.,  1., -1.]])

In [22]:
#Tanh is actually already inside numpy
np.tanh(Layer1)

array([[ 1.,  1.,  1., -1.]])
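One practical reason to prefer `np.tanh` is numerical range. The sketch below, with the illustrative name `tanh_naive` for the hand-written formula, shows that the formula breaks down once `np.exp` overflows (for float64 this happens at inputs of roughly 710 and above), while `np.tanh` still returns 1:

```python
import numpy as np

def tanh_naive(x):
    # Same formula as the hand-written version above.
    ex = np.exp(x)
    enx = np.exp(-x)
    return (ex - enx) / (ex + enx)

big = np.array([1000.0])

# np.exp(1000) overflows to inf, so the formula evaluates inf / inf, which is nan.
# errstate only silences the warnings; it does not change the result.
with np.errstate(over="ignore", invalid="ignore"):
    naive = tanh_naive(big)

print(naive, np.tanh(big))
```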

In [23]:
#let's use tanh as our activation function
Layer1out = tanh(Layer1)

In [24]:
Layer1out

array([[ 1.,  1.,  1., -1.]])

In [25]:
#next we will need the weights of our output layer, we are trying to collapse 4 into 1 so we need 4 weights
OutWeights = np.array([random.gauss(mean,stddev) for _ in range(4)])

In [26]:
OutWeights

array([ 3.67928807, -3.13259297,  6.1587629 ,  0.79561749])

In [27]:
Out = np.dot(Layer1out,OutWeights)

In [28]:
Out

array([5.90984051])

In [29]:
#We need a bias for the final one
BiasOut = np.array([random.gauss(mean,stddev) for _ in range(1)])

In [30]:
BiasOut

array([-0.09625819])

In [31]:
Out += BiasOut

In [32]:
#Now we need our final activation function the sigmoid function
def Sigmoid(x):
    return 1 / (1 + np.exp(-x))

In [33]:
OutPrediction = Sigmoid(Out)

In [34]:
OutPrediction

array([0.99702218])

In [35]:
#Ok we have done a complete forward pass through our neural network and the model is predicting this area will be deforested with a probability of about 0.997
#as this area was deforested the target is 1, so let's calculate the error. For a target of 1 the log loss is -log(prediction)
def LogLoss(x):
    return -np.log(x)

In [36]:
Error = LogLoss(OutPrediction)

In [37]:
Error

array([0.00298226])
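`LogLoss` as written only covers areas that were deforested (target 1). A minimal sketch of the general binary log loss, using a hypothetical helper name `BinaryLogLoss` that is not part of the original notebook:

```python
import numpy as np

def BinaryLogLoss(p, y):
    # Hypothetical helper: log loss for a prediction p and a target y of 0 or 1.
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

p = np.array([0.99702218])  # the prediction printed above

# For a deforested area (y = 1) this reduces to -log(p), as in LogLoss.
loss_deforested = BinaryLogLoss(p, 1)
# For an area that was not deforested (y = 0) the same confident prediction
# is penalised much more heavily.
loss_not_deforested = BinaryLogLoss(p, 0)
print(loss_deforested, loss_not_deforested)
```

The loss is small when the prediction agrees with the target and grows without bound as a confident prediction turns out to be wrong.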

In [38]:
#Now we need to begin our backpropagation with this error. The first step needs the derivative of the sigmoid function,
# dsigmoid(x) = sigmoid(x) * (1 - sigmoid(x)), evaluated at the value before the activation
def SigmoidDerivative(x):
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)

In [39]:
#we also need the derivative of our log loss -log(x), which is -1/x
def DLogLoss(x):
    return 1/(-x)

In [40]:
#our error composite is the gradient of the loss with respect to Out: DLogLoss at the prediction * SigmoidDerivative at Out
#for a target of 1 this simplifies to OutPrediction - 1
Errorcomp = DLogLoss(OutPrediction) * SigmoidDerivative(Out)

In [41]:
Errorcomp

array([-0.00297782])

In [42]:
#now we back propagate this through the output weights
grads1 = np.dot(Errorcomp, Layer1out)

In [43]:
grads1

array([-0.00297782, -0.00297782, -0.00297782,  0.00297782])

In [44]:
#the gradient for the output bias is just the error composite, because Out changes one for one with BiasOut
bias1 = Errorcomp

In [45]:
bias1

array([-0.00297782])

In [46]:
#next we need to pass the error back to the hidden layer
#for this we will need the derivative of the tanh activation function that we used
def DTanh(x):
    return 1 - np.tanh(x)**2

In [47]:
#DTanh takes the values before the activation, so we pass Layer1 rather than Layer1out
#tanh is saturated here (every hidden output is within rounding of 1 or -1) so these derivatives are all very close to 0
DLayer1 = DTanh(Layer1)

In [49]:
#now we multiply this by the error coming back through the output weights
Dstep1 = DLayer1 * (Errorcomp * OutWeights)

In [51]:
#this step combines the error and the derivative of layer 1, we now multiply it by the input to get the gradients of the hidden weights
grads2 = np.dot(Input,Dstep1)

In [311]:
#the gradient for the hidden biases is Dstep1 itself, one value per hidden unit
bias2 = Dstep1[0]

In [230]:
#now we have the gradients for our weights and biases we take a gradient descent step by subtracting them
Hidden -= grads2
OutWeights -= grads1
BiasHidden -= bias2
BiasOut -= bias1

And that is that: we have updated all the weights and biases in one full pass. If we run it through again it would hopefully perform better.
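A common way to check a hand-written backward pass is to compare it with finite differences. The sketch below is not the notebook's run: it assumes the same 3 : 4 : 1 architecture (tanh hidden layer, sigmoid output, log loss with target 1) and the standard chain-rule gradients, and it uses illustrative seeded weights with a scaled-down input so that tanh is not saturated. The names `forward`, `loss`, `numeric_grad`, `W1`, `b1`, `W2` and `b2` are assumptions made for the example.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    h = np.tanh(x.T @ W1 + b1)              # hidden activations, shape (1, 4)
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))    # predicted probability, shape (1,)
    return h, p

def loss(x, W1, b1, W2, b2):
    # Log loss for a target of 1.
    return -np.log(forward(x, W1, b1, W2, b2)[1][0])

def numeric_grad(f, w, eps=1e-6):
    # Central finite differences: nudge one entry of w at a time, then restore it.
    g = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + eps
        up = f()
        w[idx] = old - eps
        down = f()
        w[idx] = old
        g[idx] = (up - down) / (2 * eps)
    return g

# Illustrative values only: small weights and a scaled input keep tanh unsaturated.
rng = np.random.default_rng(0)
x = np.array([[1.0], [0.1], [0.4]])
W1, b1 = rng.normal(0, 0.5, (3, 4)), rng.normal(0, 0.5, 4)
W2, b2 = rng.normal(0, 0.5, 4), rng.normal(0, 0.5, 1)

# Analytic gradients from the chain rule.
h, p = forward(x, W1, b1, W2, b2)
delta_out = p - 1                            # dLoss/dOut for a target of 1
delta_hidden = delta_out * W2 * (1 - h**2)   # dLoss/dLayer1, shape (1, 4)
analytic = {"W1": x @ delta_hidden, "b1": delta_hidden[0],
            "W2": delta_out * h[0], "b2": delta_out}

f = lambda: loss(x, W1, b1, W2, b2)
params = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
max_diff = max(np.max(np.abs(numeric_grad(f, params[k]) - analytic[k])) for k in params)
print(max_diff)
```

If the analytic and numerical gradients agree to several decimal places, the backward pass is very likely implemented correctly; a large difference usually points to a derivative evaluated at the wrong value or a sign error.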

Now we should turn it into a nice function.

In [64]:
#let's start with a function that creates the network
def InitialiseNetwork(Inputsize,HiddenSize,OutSize):
    mean = 0
    stddev = 3
    Hidden = np.array([random.gauss(mean,stddev) for _ in range(Inputsize * HiddenSize)])
    Hidden = Hidden.reshape(Inputsize,HiddenSize)
    BiasHidden = np.array([random.gauss(mean,stddev) for _ in range(HiddenSize)])
    OutWeights = np.array([random.gauss(mean,stddev) for _ in range(HiddenSize * OutSize)])
    OutWeights = OutWeights.reshape(OutSize,HiddenSize)
    BiasOut = np.array([random.gauss(mean,stddev) for _ in range(OutSize)])
    return dict(Hidden = Hidden, BiasHidden = BiasHidden,OutWeights = OutWeights,BiasOut = BiasOut )
    
                          

In [56]:
Net1 = InitialiseNetwork(3,4,1)

In [57]:
Net1

{'Hidden': array([[-1.68627158, -0.36213892,  0.58467549,  5.25495914],
        [ 2.30930193, -4.125814  , -0.81786148,  6.35209179],
        [-0.61522387, -1.08612155,  1.05474095, -0.67232774]]),
 'BiasHidden': array([ 1.78557522,  2.12804168, -2.39568901,  4.68914505]),
 'OutWeights': array([[ 0.5392547 ,  1.28473714, -0.56769683,  5.80142289]]),
 'BiasOut': array([0.17311207])}

In [58]:
def ForwardPass(Input,Net):
    Forward1 = tanh((np.dot(Input.T,Net['Hidden'])) + Net['BiasHidden'])
    Forward2 = Sigmoid(np.dot(Forward1,Net['OutWeights'].T)+Net['BiasOut'])
    return dict(Forward1 = Forward1, Forward2 = Forward2)
    

In [59]:
FW1 = ForwardPass(Input,Net1)

In [60]:
FW1

{'Forward1': array([[-1.        , -1.        ,  0.99999776,  1.        ]]),
 'Forward2': array([[0.97295531]])}

In [61]:
def BackwardsPass(Input,Forward,Net):
    #log loss for a target of 1
    Error = LogLoss(Forward['Forward2'])
    #gradient of the loss with respect to the output before the sigmoid: DLogLoss * dsigmoid = prediction - 1
    Errorcomp = Forward['Forward2'] - 1
    #gradients for the output weights and bias
    grads1 = Errorcomp * Forward['Forward1']
    bias1 = Errorcomp[0]
    #pass the error back through the output weights and tanh, where dtanh = 1 - tanh^2
    Dstep1 = Errorcomp * Net['OutWeights'] * (1 - Forward['Forward1']**2)
    grads2 = np.dot(Input,Dstep1)
    bias2 = Dstep1[0]
    #gradient descent step: subtract the gradients. Net is updated in place and also returned
    Net['Hidden'] -= grads2
    Net['BiasHidden'] -= bias2
    Net['OutWeights'] -= grads1
    Net['BiasOut'] -= bias1
    return Net, Error

In [62]:
Net2,Err = BackwardsPass(Input,FW1,Net1)

There we go. If we run this over and over on the same data it will improve.

Try that now using a for loop.

The next thing to do is to make the neural network a class. All the functions are there for you; all you need to do is make a class that works with any size.
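As a starting point, here is one possible shape for such a class together with a training loop. It is a sketch under assumptions, not a reference solution: the class name `SimpleNetwork`, its method names, the learning rate `lr`, the seed and the smaller weight scale are all illustrative, and it only supports a single sigmoid output, so extending it to several outputs is left open.

```python
import numpy as np

class SimpleNetwork:
    """Hypothetical one-hidden-layer network: tanh hidden units, one sigmoid output."""

    def __init__(self, input_size, hidden_size, seed=0, scale=0.5):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, scale, (input_size, hidden_size))
        self.b1 = np.zeros(hidden_size)
        self.W2 = rng.normal(0, scale, hidden_size)
        self.b2 = 0.0

    def forward(self, x):
        # x has shape (input_size,); returns hidden activations and the prediction.
        h = np.tanh(x @ self.W1 + self.b1)
        p = 1 / (1 + np.exp(-(h @ self.W2 + self.b2)))
        return h, p

    def train_step(self, x, y, lr=0.1):
        h, p = self.forward(x)
        loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
        # Both error terms are computed before any weight is changed.
        delta_out = p - y                                  # dLoss/dOut
        delta_hidden = delta_out * self.W2 * (1 - h**2)    # dLoss/dLayer1
        self.W2 -= lr * delta_out * h
        self.b2 -= lr * delta_out
        self.W1 -= lr * np.outer(x, delta_hidden)
        self.b1 -= lr * delta_hidden
        return loss

net = SimpleNetwork(input_size=3, hidden_size=4)
x = np.array([10, 1, 4]) / 10.0   # scaled down to keep tanh out of saturation
losses = [net.train_step(x, y=1) for _ in range(50)]
print(losses[0], losses[-1])
```

The learning rate keeps each step small, and scaling the input is a deliberate choice here: with the raw values and a standard deviation of 3 the hidden units saturate and their gradients are almost zero.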

## Task

As a final task for this practical we will build a neural network to classify some real-world data. You should have two data sets called ForestClass, one for training and one for testing. These are real-world datasets extracted from satellite images. The class variable is the type of forest that the data is from, and the other variables are all bandwidth measurements from bands 1 to 9 of the satellite images. Build a neural network that will predict the type of forest. For an easy version, just try to classify between the two most common types; if you want more of a challenge, you can try to classify between all of them, for which you will need the softmax function:
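Before training on real measurements it usually helps to standardise the inputs and turn the class names into numbers. The sketch below uses made-up data in place of the real files: the array `bands`, the single-letter `labels` and all other names are hypothetical stand-ins, not the actual ForestClass columns.

```python
import numpy as np

# Made-up stand-in for the real data: 6 samples with 9 band measurements each,
# and hypothetical single-letter class labels.
rng = np.random.default_rng(0)
bands = rng.normal(loc=50, scale=10, size=(6, 9))
labels = np.array(["s", "d", "s", "h", "d", "s"])

# Standardise each band to zero mean and unit variance so that large raw values
# do not push the tanh hidden units straight into saturation.
mean = bands.mean(axis=0)
std = bands.std(axis=0)
bands_scaled = (bands - mean) / std

# Map class names to integer indices, then to one-hot rows for a softmax output.
classes, class_index = np.unique(labels, return_inverse=True)
one_hot = np.eye(len(classes))[class_index]
print(classes, class_index, one_hot.shape)
```

When scaling the testing set, reuse the `mean` and `std` computed from the training set so that both sets go through the same transformation.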

In [1]:
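#softmax is very much like the sigmoid function but for multiple classes
def softmax(x):
    #the softmax is defined as e^(x - max(x)) / sum(e^(x - max(x))), subtracting the max avoids overflow
    #x holds one score per class along its last axis, so a (num_examples x num_classes) array is handled row by row
    e_x = np.exp(x - np.max(x, axis = -1, keepdims = True))
    #it will return an array that is the softmax of an array, the highest value in the softmax is your predicted class
    return e_x / e_x.sum(axis = -1, keepdims = True)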
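A short sketch of how a softmax behaves on a small batch of scores. `softmax_rows` is an illustrative, self-contained row-wise version (one example per row), and the score values are made up:

```python
import numpy as np

def softmax_rows(x):
    # Row-wise softmax: each row holds the scores for one example.
    e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e_x / e_x.sum(axis=-1, keepdims=True)

# The second row would overflow np.exp without the max shift.
scores = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])
probs = softmax_rows(scores)
print(probs)
print(probs.sum(axis=1), probs.argmax(axis=1))
```

Each row sums to 1, and both rows give the same probabilities because softmax only depends on the differences between the scores.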

In [2]:
#the softmax derivative is more complicated as it is a full jacobian matrix, but you will also need it
def softmax_dif(s):
    #s must be the 1-D softmax output for a single example, so we flatten a (1, n) or (n, 1) array first
    s = np.asarray(s).reshape(-1)
    #the diagonal entries are s[i] * (1 - s[i]) and the off-diagonal entries are -s[i] * s[j]
    return np.diag(s) - np.outer(s, s)
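The Jacobian formula can be checked numerically, and chaining it with the cross-entropy gradient shows why the combined error term is so simple. This is a sketch with illustrative names (`softmax_1d`, `softmax_jacobian`) and made-up scores for a single example:

```python
import numpy as np

def softmax_1d(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def softmax_jacobian(s):
    # J[i, j] = s[i] * (1 - s[i]) if i == j, else -s[i] * s[j]
    return np.diag(s) - np.outer(s, s)

x = np.array([0.5, -1.0, 2.0])
s = softmax_1d(x)
J = softmax_jacobian(s)

# Finite-difference estimate of the Jacobian, one input at a time.
eps = 1e-6
J_num = np.zeros((3, 3))
for j in range(3):
    step = np.zeros(3)
    step[j] = eps
    J_num[:, j] = (softmax_1d(x + step) - softmax_1d(x - step)) / (2 * eps)

# Chaining the Jacobian with the cross-entropy gradient -y / s gives s - y.
y = np.array([0.0, 0.0, 1.0])   # one-hot target for this example
grad_x = J.T @ (-y / s)
print(np.max(np.abs(J - J_num)), grad_x, s - y)
```

Because the product collapses to `s - y`, the full Jacobian rarely needs to be built explicitly when softmax is paired with a cross-entropy loss.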

Good Luck

In [None]:
# the easiest thing is to use this function to go straight to the errorcomp
def delta_cross_entropy(X,y):
    """
    X is the input to the softmax layer, i.e. the raw output scores (num_examples x num_classes);
    softmax is applied to it inside this function.
    y is an array of integer class labels, one per example, e.g. [0, 2, 1, 0].
    Note that y is not a one-hot encoded vector.
    It can be computed as y.argmax(axis=1) from one-hot encoded vectors of labels if required.
    """
    m = y.shape[0]
    grad = softmax(X)
    grad[range(m),y] -= 1
    grad = grad/m
    return grad
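To see that this shortcut really is the gradient of the mean cross-entropy with respect to the scores, it can be compared with finite differences. The sketch below is self-contained and uses illustrative names (`softmax_rows`, `mean_cross_entropy`, `delta_cross_entropy_rows`) and made-up scores and labels:

```python
import numpy as np

def softmax_rows(x):
    e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e_x / e_x.sum(axis=1, keepdims=True)

def mean_cross_entropy(scores, y):
    # Average of -log(probability assigned to the correct class).
    p = softmax_rows(scores)
    return -np.mean(np.log(p[np.arange(len(y)), y]))

def delta_cross_entropy_rows(scores, y):
    # Same steps as delta_cross_entropy above, with a row-wise softmax.
    m = y.shape[0]
    grad = softmax_rows(scores)
    grad[np.arange(m), y] -= 1
    return grad / m

scores = np.array([[2.0, 1.0, 0.1],
                   [0.3, 0.2, 3.0]])
y = np.array([0, 2])            # integer class labels, not one-hot
analytic = delta_cross_entropy_rows(scores, y)

# Central finite differences on every score.
eps = 1e-6
numeric = np.zeros_like(scores)
for idx in np.ndindex(scores.shape):
    up, down = scores.copy(), scores.copy()
    up[idx] += eps
    down[idx] -= eps
    numeric[idx] = (mean_cross_entropy(up, y) - mean_cross_entropy(down, y)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))
```

The result can be passed straight back into the output weights in place of the error composite used for the sigmoid network.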