<center><h2>COE49412 Neural Networks and Deep Learning</h2></center>

<center><h3>Lab 6 - Implementing Back Propagation</h3></center>
<br>
<b> Objectives: </b>

* To implement a neural network in Python.
* To do a single iteration of a forward pass.

<b> Refer to your previous lecture notes and notebook (Implementing BP) to solve the exercises </b>

<b> Submit: </b>
* Jupyter notebook with the solution

<b> Due Date: Wednesday, 25th March 2020, 11:59pm </b><br><br>
<b><i>Please enter your Student ID & Name below:</i></b>

In [1]:
## Student ID: b00069350
## Student Name: Danayal Khan

<hr>

In [2]:
## For the following exercises, implement the neural networks following the algorithms discussed in the Implementing BP lecture.
## https://medium.com/@a.mirzaei69/implement-a-neural-network-from-scratch-with-python-numpy-backpropagation-e82b70caa9bb
    

**Exercise 1**

Consider the simple 3-layer neural network in Figure 1. Implement this neural network and print the predicted values of <b> $z_{i}$ and $a_{i}$ </b> after the first forward pass. <br>


<center><img src="imgs/Lab5_Ex1.png" style="width:400px;"></center><caption><center><b>Figure 1</b>: 3-layer neural network.</center></caption><br>

**Note:**

- z= self.weights[i].dot(a) + self.biases[i]
- a = activation_function(z_s[-1])
- For this exercise, use the <b> ReLU activation</b> function: $f(z)=max(0,z)$

- Assume the actual output value $y = 2$ for the given input $x_{1}$ = 4

- Use the **Loss** function as: $Loss = (\hat{y}-y)^{2}$, where $\hat{y}$ is the network's predicted output

- The initial <b>weights</b> are given as: <br>
w1 = $\begin{bmatrix}0.65 \end{bmatrix}$
<br>
w2 = $\begin{bmatrix}0.75 \end{bmatrix}$


In [3]:
## Implementing backpropagation
## Source: 
## https://medium.com/@a.mirzaei69/implement-a-neural-network-from-scratch-with-python-numpy-backpropagation-e82b70caa9bb
## comments added to the original code

import numpy as np

class NeuralNetwork(object):
    
    # The constructor takes 
    # 1. layers representing the number of nodes
    # 2. activations representing the activation functions to choose in
    #    each layer. 
    
    def __init__(self, layers = [2 , 10, 1], activations=['sigmoid', 'sigmoid']):
        
        # check to make sure that no. of layers is one more than 
        # no. of activation functions because the input layer 
        # has no activation function.
        
        assert(len(layers) == len(activations)+1)
        
        # define the local variables layers and activations
        self.layers = layers
        self.activations = activations
        
        # initialize weights and biases as two lists to hold
        # weights and biases for each layer
        
        self.weights = []
        self.biases = []
        
        # create random weights and biases
        # for each of the layers.
        
        for i in range(len(layers)-1):
            self.weights.append(np.random.randn(layers[i+1], layers[i]))
            self.biases.append(np.random.randn(layers[i+1], 1))
    
    # do feedforward for x
    # where x is an input
    
    # return a list of a's and z's as expected.
    # a's and z's are layer by layer
    
    def feedforward(self, x):
        # make a copy of x
        a = np.copy(x)
        
        # this variable will contain all the z's
        z_s = []
        
        # this variable will contain all the a's
        # the output of the input layer is simply x which is the
        # input. So we initialize the a_s to contain a. 
        a_s = [a]
        
        # for each layer do
        for i in range(len(self.weights)):
            
            # retrieve the appropriate activation function
            activation_function = self.getActivationFunction(self.activations[i])
            
            # create z_s by z = w.a + b for each layer
            z_s.append(self.weights[i].dot(a) + self.biases[i])
            
            # compute a = f(z) for this layer.
            # z_s[-1] is the z that was just appended above, so
            # the activation function is applied only to the
            # pre-activation of the current layer.
            
            a = activation_function(z_s[-1])
            
            # keep track of the new activation or a_s 
            a_s.append(a)
            
        # return both z_s and a_s
        # we will have z_s and a_s for each layer
        return (z_s, a_s)

    
    # takes the y -- the actual answer, a's and z's and 
    # calculates the dLoss/dw and dLoss/db
    
    def backpropagation(self,y, z_s, a_s):
        
        # initialize list of dLoss/dw and dLoss/db
        dw = []  # dLoss/dw
        db = []  # dLoss/db
        
        # create an empty list of deltas, one for each layer of weights
        deltas = [None] * len(self.weights)  # delta = dLoss/dz  known as error for each layer
        
        
        # start from the back and insert the last layer error
        # based on the square loss function. Note -1 is used to 
        # fill things from the back of the list 
        # also note that we need to use the derivative function 
        # for the activation function.
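        # note that the factor 2 from differentiating the square loss is
        # dropped (it is absorbed into the learning rate), and that
        # (y - a) points opposite to the gradient, which is why train()
        # adds lr*dw instead of subtracting it.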
        
        # again note this is for the last layer only!
        
        deltas[-1] = ((y-a_s[-1])*(self.getDerivitiveActivationFunction(self.activations[-1]))(z_s[-1]))
        
        
        # Perform BackPropagation
        
        # for the rest of the deltas, go in reverse order
        for i in reversed(range(len(deltas)-1)):
            deltas[i] = self.weights[i+1].T.dot(deltas[i+1])*(self.getDerivitiveActivationFunction(self.activations[i])(z_s[i]))        
        
        
        #a= [print(d.shape) for d in deltas]
        
        # now we need to update the weights based on the calculated
        # deltas
        
        # now we will determine the batch size from the second dimension
        # of the shape of y (samples are stored as columns). We simply want
        # to see how many samples there are;
        # for example there may be 10 y's, one for each x.
        
        batch_size = y.shape[1]
        
        # determine the two derivatives by taking 
        # the average according to batch sizes 
        
        db = [d.dot(np.ones((batch_size,1)))/float(batch_size) for d in deltas]
        dw = [d.dot(a_s[i].T)/float(batch_size) for i,d in enumerate(deltas)]
        
        # return the derivatives with respect to the weight matrices and biases
        return dw, db

    
    # Now we will write the main training function that uses
    # feedforward and backpropagation many times (called epochs)
    # lr (learning rate) is the eta in our equations.
    
    def train(self, x, y, batch_size=10, epochs=100, lr = 0.01):
    
    # update weights and biases based on the output
    # for the number of epochs
    
        for e in range(epochs): 
            i=0
            
            # Do the training in batches
            # each batch is a subset of the original 
            # data 
            
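            # samples are stored as columns (see feedforward and
            # backpropagation), so batches are taken along axis 1
            while(i<y.shape[1]):
                
                # extract a batch of columns
                x_batch = x[:, i:i+batch_size]
                y_batch = y[:, i:i+batch_size]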
                
                # update i for the next batches
                i = i+batch_size
                
                # do the feedforward for the batch and update the weights
                # based on the average loss for each weight for the whole
                # batch.
                
                z_s, a_s = self.feedforward(x_batch)
                
                # do the back propagation 
                dw, db = self.backpropagation(y_batch, z_s, a_s)
                
                
                # update the weights for each pair of weights and dw
                # and biases and db
                
                self.weights = [w+lr*dweight for w,dweight in  zip(self.weights, dw)]
                self.biases = [w+lr*dbias for w,dbias in  zip(self.biases, db)]
                
                # print the L2 norm of the batch error, using the
                # outputs computed before this weight update
                print("loss = ", np.linalg.norm(a_s[-1]-y_batch) )
    
    
    # This function is used to return an activation function 
    # depending on its name
    
    @staticmethod
    def getActivationFunction(name):
        if(name == 'sigmoid'):
            return lambda x : np.exp(x)/(1+np.exp(x))
        elif(name == 'linear'):
            return lambda x : x
        elif(name == 'relu'):
            def relu(x):
                y = np.copy(x)
                y[y<0] = 0
                return y
            return relu
        else:
            print('Unknown activation function. linear is used')
            return lambda x: x
    
    # This function returns the derivative of a function depending
    # on its name.
    
    @staticmethod
    def getDerivitiveActivationFunction(name):
        if(name == 'sigmoid'):
            sig = lambda x : np.exp(x)/(1+np.exp(x))
            return lambda x :sig(x)*(1-sig(x)) 
        elif(name == 'linear'):
            return lambda x: 1
        elif(name == 'relu'):
            def relu_diff(x):
                y = np.copy(x)
                y[y>=0] = 1
                y[y<0] = 0
                return y
            return relu_diff
        else:
            print('Unknown activation function. linear is used')
            return lambda x: 1







<hr>
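The sign convention in `backpropagation` can be checked numerically. The sketch below is not part of the lab code: it rebuilds the 1-1-1 ReLU network of Exercise 1 with plain floats, assumes zero biases (the exercise does not give any), and uses hypothetical helper names (`forward`, `loss`, `relu_diff`). It compares the class-style `dw = delta * a` with a central finite-difference estimate of $dLoss/dw$ for $Loss = (a_2 - y)^2$. Under these assumptions the two should differ by a factor of $-2$, which is consistent with `train` adding `lr*dw`.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_diff(z):
    # Same convention as the class: derivative is 1 for z >= 0, else 0
    return np.where(z >= 0, 1.0, 0.0)

def forward(w1, w2, x):
    # Zero biases are an assumption of this sketch
    z1 = w1 * x
    a1 = relu(z1)
    z2 = w2 * a1
    a2 = relu(z2)
    return z1, a1, z2, a2

def loss(w1, w2, x, y):
    return (forward(w1, w2, x)[3] - y) ** 2

# Weights, input and target from Exercise 1
w1, w2, x, y = 0.65, 0.75, 4.0, 2.0
z1, a1, z2, a2 = forward(w1, w2, x)

# Deltas and weight derivatives in the style of backpropagation()
delta2 = (y - a2) * relu_diff(z2)
delta1 = w2 * delta2 * relu_diff(z1)
dw2 = delta2 * a1
dw1 = delta1 * x

# Central finite differences of the loss
eps = 1e-6
num_dw2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
num_dw1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)

print("dw1 =", dw1, " numerical dLoss/dw1 =", num_dw1)
print("dw2 =", dw2, " numerical dLoss/dw2 =", num_dw2)

# One update with the rule w <- w + lr*dw should lower the loss
lr = 0.01
new_loss = loss(w1 + lr * dw1, w2 + lr * dw2, x, y)
print("loss before =", loss(w1, w2, x, y), " loss after =", new_loss)
```

Because the deltas already carry the minus sign, adding `lr*dw` moves the weights downhill on the loss; the missing factor of 2 only rescales the effective learning rate.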

In [4]:

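# Let us try feedforward on one simple input

import matplotlib.pyplot as plt
    
# 1 input, 1 hidden node, 1 output; ReLU on both layers
nn = NeuralNetwork([1, 1, 1],activations=['relu', 'relu'])

# single input x1 = 4
x = np.array([4])
## Let us do one feedforward
z, a = nn.feedforward(x)

# NOTE: the weights and biases are randomly initialized here,
# not the w1 = 0.65 and w2 = 0.75 given in the exercise
print("weights=", nn.weights)  ## two 1 x 1 matrices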

print("z_s=",z)
print("a_s=",a)

weights= [array([[0.05883504]]), array([[-0.20340329]])]
z_s= [array([[0.89535767]]), array([[0.29381479]])]
a_s= [array([4]), array([[0.89535767]]), array([[0.29381479]])]
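
The run above uses randomly initialized weights and biases, so its numbers do not correspond to the $w1$ and $w2$ given in the exercise. As a hedged sketch of the intended forward pass, the block below plugs the given weights in directly with NumPy instead of going through `NeuralNetwork`. It assumes zero biases (the exercise does not specify any) and reads the loss as the squared difference between the network output and $y$; the variable names are illustrative only.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Weights given in Exercise 1
W1 = np.array([[0.65]])
W2 = np.array([[0.75]])
# Assumption: zero biases, since the exercise does not state any
b1 = np.zeros((1, 1))
b2 = np.zeros((1, 1))

x = np.array([[4.0]])   # input as a 1 x 1 column vector
y = 2.0                 # target output

z1 = W1.dot(x) + b1
a1 = relu(z1)
z2 = W2.dot(a1) + b2
a2 = relu(z2)
loss = (a2 - y) ** 2

print("z_s =", [z1, z2])
print("a_s =", [x, a1, a2])
print("loss =", loss)
```

Under these assumptions the pass gives $z_1 = a_1 = 2.6$, $z_2 = a_2 \approx 1.95$ and a loss of about $0.0025$. The same values could be loaded into the class by assigning lists of arrays to `nn.weights` and `nn.biases` before calling `feedforward`.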


**Exercise 2**

Consider the neural network in Figure 2. Using a similar approach as Exercise 1, implement this neural network and print the predicted values of <b> $z_{i}$ and $a_{i}$ </b> after the first forward pass.

<center><img src="imgs/Lab5_Ex2.png" style="width:400px;"></center><caption><center><b>Figure 2</b>: Neural network with 1 input, 1 hidden layer with 3 nodes, and 1 output.<br></center></caption><br>

**Note:**
- z= self.weights[i].dot(a) + self.biases[i]
- a = activation_function(z_s[-1])
- For this exercise, use the <b> ReLU activation</b> function: $f(z)=max(0,z)$

- Assume the actual output value $y = 0.25$ for the given input $x_{1} = 0.5$

- Use the **Loss** function as: $Loss = (\hat{y}-y)^{2}$, where $\hat{y}$ is the network's predicted output

- The initial <b>weights</b> are given as: <br>
w1 = $\begin{bmatrix}0.15 & 0.45 & -0.35 \end{bmatrix}$
<br>
w2 = $\begin{bmatrix}0.23 & 0.65 & -0.15 \end{bmatrix}$




In [5]:
### ENTER YOUR CODE HERE ###


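# Let us try feedforward on one simple input

import matplotlib.pyplot as plt
    
# 1 input, 3 hidden nodes, 1 output; ReLU on both layers
nn = NeuralNetwork([1, 3, 1],activations=['relu', 'relu'])

# single input x1 = 0.5
# NOTE: x has shape (1,), so weights[0].dot(x) has shape (3,) and adding
# the (3, 1) bias broadcasts z to 3 x 3; a column vector such as
# np.array([[0.5]]) would keep z at 3 x 1.
x = np.array([0.5])
## Let us do one feedforward
z, a = nn.feedforward(x)

# NOTE: the weights and biases are randomly initialized here,
# not the w1 and w2 given in the exercise
print("weights=", nn.weights)  ## shapes 3 x 1 and 1 x 3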

print("z_s=",z)
print("a_s=",a)



weights= [array([[ 0.37686796],
       [ 1.04729499],
       [-1.19144963]]), array([[-0.97121173,  1.15703367,  0.78519048]])]
z_s= [array([[-1.44664164, -1.11142813, -2.23080044],
       [-0.19102765,  0.14418586, -0.97518645],
       [-3.11100477, -2.77579125, -3.89516356]]), array([[0.96427358, 1.13110147, 0.96427358]])]
a_s= [array([0.5]), array([[0.        , 0.        , 0.        ],
       [0.        , 0.14418586, 0.        ],
       [0.        , 0.        , 0.        ]]), array([[0.96427358, 1.13110147, 0.96427358]])]
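
The 3 x 3 arrays in `z_s` above come from NumPy broadcasting: `x` has shape `(1,)`, so `weights[0].dot(x)` has shape `(3,)`, and adding the `(3, 1)` bias expands the result to `(3, 3)`. A sketch that avoids this passes the input as a column vector and uses the weights given in the exercise. Zero biases and the orientation of the weights (w1 arranged as 3 x 1, w2 kept as 1 x 3) are assumptions of this sketch, and the variable names are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Given weights, arranged as (nodes in next layer, nodes in this layer)
W1 = np.array([[0.15], [0.45], [-0.35]])   # 3 x 1
W2 = np.array([[0.23, 0.65, -0.15]])       # 1 x 3
# Assumption: zero biases, since the exercise does not state any
b1 = np.zeros((3, 1))
b2 = np.zeros((1, 1))

# Shape check: a 1-D input broadcasts against the (3, 1) bias
x_flat = np.array([0.5])
broadcast_shape = (W1.dot(x_flat) + b1).shape
print("shape with 1-D input:", broadcast_shape)

# A column-vector input keeps one column per sample
x = np.array([[0.5]])
y = 0.25

z1 = W1.dot(x) + b1
a1 = relu(z1)
z2 = W2.dot(a1) + b2
a2 = relu(z2)
loss = (a2 - y) ** 2

print("z_s =", [z1, z2])
print("a_s =", [x, a1, a2])
print("loss =", loss)
```

Under these assumptions the hidden layer gives $z_1 = [0.075, 0.225, -0.175]$, ReLU zeroes the third entry, and the output is $z_2 = a_2 \approx 0.1635$ with a loss of about $0.0075$.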


<hr>

**Exercise 3**

Consider the neural network in Figure 3. Using a similar approach as the previous exercises, implement this neural network and print the predicted values of <b> $z_{i}$ and $a_{i}$ </b> after the first forward pass. 

<center><img src="imgs/Lab5_Ex3.png" style="width:500px;"></center><caption><center><b>Figure 3</b>: Neural network with 4 inputs, 2 hidden layers, and 1 output.<br></center></caption><br>


**Note:**
- z= self.weights[i].dot(a) + self.biases[i]
- a = activation_function(z_s[-1])
- For this exercise, use the <b> ReLU activation</b> function: $f(z)=max(0,z)$

- Assume the actual output value $y = 7$ for the given inputs $x_{1} = 2$, $x_{2} = 3$, $x_{3} = 1$, $x_{4} = 1$

- Use the **Loss** function as: $Loss = (\hat{y}-y)^{2}$, where $\hat{y}$ is the network's predicted output

- The initial <b>weights</b> are given as: 

w1 = $\begin{bmatrix}
0.15 & 0.45 \\
-0.65 & -0.35 \\
0.35 & 0.25 \\
-0.55 & -0.75\end{bmatrix}$


w2 = $\begin{bmatrix}
0.23 & 0.15 & 0.45 \\ 
-0.001 & -0.9 & 0.25 \\
\end{bmatrix}$


w3 = $\begin{bmatrix}0.1 \\  0.2 \\ -0.1 \end{bmatrix}$


In [7]:
### ENTER YOUR CODE HERE ###


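# Let us try feedforward on one simple input

import matplotlib.pyplot as plt
    
# 4 inputs, hidden layers of 2 and 3 nodes, 1 output; ReLU on every layer
nn = NeuralNetwork([4, 2, 3, 1],activations=['relu', 'relu', 'relu'])

# inputs x1 = 2, x2 = 3, x3 = 1, x4 = 1
# NOTE: x has shape (4,), so weights[0].dot(x) has shape (2,) and adding
# the (2, 1) bias broadcasts z to 2 x 2; a column vector such as
# np.array([[2], [3], [1], [1]]) would keep z at 2 x 1.
x = np.array([2,3,1,1])
## Let us do one feedforward
z, a = nn.feedforward(x)

# NOTE: the weights and biases are randomly initialized here,
# not the w1, w2 and w3 given in the exercise
print("weights=", nn.weights)  ## shapes 2 x 4, 3 x 2 and 1 x 3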

print("z_s=",z)
print("a_s=",a)




weights= [array([[ 1.26990974,  2.05154436,  0.95100371,  0.68794692],
       [ 0.74603863, -1.40429693,  0.19958755,  0.45326056]]), array([[ 0.17739952, -0.02436658],
       [ 1.54964989, -1.05501852],
       [ 0.38061867, -1.90555293]]), array([[-0.22035439, -0.78576227,  1.49995501]])]
z_s= [array([[ 9.2417968 , -3.15957182],
       [10.68536311, -1.71600551]]), array([[  0.77709602,  -0.60202852],
       [  3.68296401,   0.63467055],
       [-17.54040742,  -0.69648283]]), array([[-3.47249543, -0.90602493]])]
a_s= [array([2, 3, 1, 1]), array([[ 9.2417968 ,  0.        ],
       [10.68536311,  0.        ]]), array([[0.77709602, 0.        ],
       [3.68296401, 0.63467055],
       [0.        , 0.        ]]), array([[0., 0.]])]
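
As in Exercise 2, the two-column arrays above arise because the 1-D input broadcasts against the column-shaped biases. The sketch below loops over the layers in the same way as `feedforward`, but with the given weights and a 4 x 1 column input. It assumes zero biases and that each given matrix is laid out as (inputs x outputs), so each one is transposed to match the class's (outputs x inputs) convention; `forward_pass` is a hypothetical helper, not part of the lab code.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward_pass(weights, biases, x):
    """Return lists of z's and a's, layer by layer (ReLU on every layer)."""
    a = x
    z_s, a_s = [], [a]
    for W, b in zip(weights, biases):
        z = W.dot(a) + b
        a = relu(z)
        z_s.append(z)
        a_s.append(a)
    return z_s, a_s

# Given matrices, assumed to be laid out as (inputs x outputs)
w1 = np.array([[0.15, 0.45], [-0.65, -0.35], [0.35, 0.25], [-0.55, -0.75]])
w2 = np.array([[0.23, 0.15, 0.45], [-0.001, -0.9, 0.25]])
w3 = np.array([[0.1], [0.2], [-0.1]])

# Transpose to (outputs x inputs): shapes 2 x 4, 3 x 2 and 1 x 3
weights = [w1.T, w2.T, w3.T]
# Assumption: zero biases, since the exercise does not state any
biases = [np.zeros((W.shape[0], 1)) for W in weights]

x = np.array([[2.0], [3.0], [1.0], [1.0]])   # 4 x 1 column vector
y = 7.0

z_s, a_s = forward_pass(weights, biases, x)
loss = (a_s[-1] - y) ** 2

print("z_s =", z_s)
print("a_s =", a_s)
print("loss =", loss)
```

Under these assumptions both first-layer pre-activations are negative ($z = [-1.85, -0.65]$), so ReLU zeroes the first hidden layer, every later $z$ and $a$ is 0, and the loss is $49$. A different weight orientation or nonzero biases would change this result.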


<hr>