# Implementing A Multi-layered Perceptron Using Numpy

This coursework requires you to write your own implementation of the backpropagation algorithm for training your own neural network. You are required to do this assignment in the python  (python version 3) programming language, using only numpy and/or scipy.

The goal of this assignment is to label images of 10 handwritten digits of “zero”, “one”,...,  “nine”. The images are 28 by 28 in size (mnist dataset), which will be represented as a vector x of  dimension 784 by listing all the pixel values in raster scan order. The labels t are 0,1,2,...,9 corresponding  to 10 classes as written in the image. There are 60000 training cases, containing 6000 examples of each of  10 classes. 

The way you choose to design your code for this homework will affect how much time you  spend coding. We recommend that you look through all of the problems before attempting the first  problem. A good foundation will make the rest of these problems easier. Use object oriented principles to code layers, activation function etc as classes. 

---



In [None]:
import numpy as np
import pandas as pd 
import os

np.random.seed(42)

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Problem 1

Here you must read an input file. Each line contains 785 numbers (comma delimited): the first number in each row denotes the class label: 0 corresponds to digit 0, 1 corresponds to digit 1, etc. The rest of the values are the 784 pixel values between 0 and 255 correspondig to black and white images.  As a warm up  question, load the data.

For this problem you must write a function that takes a file path as an argument which contains  this data. Your function must return two values (x and y) that contains the data from the file as  described. Specifically, the first return value (x) must be a matrix where the rows are individual  examples of images, and the columns are individual pixels (n x 784 matrix). The second return value  must be a list/array of real numbers representing the labels of the examples (rows) in x. 

eg: 

1.0,0.0,1.0,0.0,....0.0,0.25,0.0,0.0 

... 

1.0,0.0,1.0,0.0,...,1.0,0.0,0.0,0.96776 

x = [ 

[1.0,0.0,1.0,0.0,....0.0,0.25,0.0,0.0] 

... 

[1.0,0.0,1.0,0.0,...,1.0,0.0,0.0,0.96776] 

]

y = [5,...,2]

In [None]:
class load_data:

  def __init__(self,path):
    self.path = path
    self.read_file()

  '''
  Function for reading the data 
  '''
  def read_file(self):
  #loading the file using loadtxt function in numpy 
    data = np.loadtxt(self.path,delimiter=",",skiprows=1)
    self.Y = data[:, 0].astype(int)          # <slice> = <array>[start_row:end_row, start_col:end_col]
    self.X = data[:, 1:]
    
  '''
  Funtion which return data outside class when called 
  '''
  def get_data(self):
    return self.X,self.Y

In [None]:
path = "/content/drive/MyDrive/data_set/mnist_train.csv"      #path where the file is stored 

In [None]:
mnist = load_data(path)                    # making object 

In [None]:
X,Y=mnist.get_data()         #return X and y 

In [None]:
print(X,X.shape)

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]] (60000, 784)


In [None]:
print(Y,Y.shape)

[5 0 4 ... 5 6 8] (60000,)


## Problem 2 

Implement the backpropagation algorithm in a zero hidden layer neural network (weights between input and output nodes). The output layer should be a softmax output over 10 classes corresponding to 10 classes of handwritten digits (e.g. An architecture: 784 > 10). Your backprop code should minimize the cross-entropy function for multi-class classification problems (categorical  cross entropy). 

 

where j is the class label

This step should be done with a full step of gradient descent, not stochastic gradient descent or rmsprop. For this  problem you must write a function that takes as an input a matrix of x values, a list of y values (as  returned from problem 1), a weight matrix, and a learning rate and performs a single step of  backpropagation. You will need to do both a forward step with the inputs, and then a backward prop to  get the gradients. Return the updated weight matrix and bias in the same format as it was passed.

The list of weight matrices will be a list with 1 entry where the only entry is a matrix in the  format where the rows represent all of the outgoing weights for a neuron in the input layer and the  columns represent the weights for the incoming neurons. A specific row column index will give you the  weight for a neuron to neuron connection. 

The list of bias vectors will be in the form where each entry in the list is a vector with the same  length as the first set of weights. (e.g. For an architecture of 784 > 10, there will be a single element list  with a vector of size 10).

In [None]:
def loss(y_actual,y_pred):
    '''
    Calculating cost 
    '''
    cost = -np.sum(y_actual*np.log(y_pred))       #calculating the cost      categorical cross entropy 
    return cost/y_actual.shape[0]

In [None]:
# #softmax function 

def softmax(x, derivative=False):
    exp_x = np.exp(x)  
    softmax = exp_x / np.sum(exp_x, axis=1, keepdims=True)
    if derivative:
        return softmax * (1 - softmax)
    return softmax

In [None]:
def one_hot_encoded(x):
    '''
    one hot encoding 
    '''
    n_values = 10     #number of categories (for mnist categories = 10)
    y=np.eye(n_values)[x]     
    return y

In [None]:
def neural_network(X,Y,layer_nodes,epochs,learning_rate,momentum):
  
  weights = np.random.rand(layer_nodes[0],layer_nodes[len(layer_nodes)-1])*0.01  #  matrix containing weights between ouput and input layer 
  bias = np.random.rand(1,layer_nodes[len(layer_nodes)-1])*0.01
  Vdw = np.zeros_like(weights) # initializing the momentum term
  Vdb = np.zeros_like(bias) # initializing the momentum term
  y = one_hot_encoded(Y)
  
  x = X/255.0

  for i in range(epochs):
    
    #forward prop
    Z  = np.add((np.matmul(x,weights)),bias)  
    y_pred = A = softmax(Z)
    
    #cost calulation

    cost =  loss(y,y_pred)

    #back_prop
    #updatweing weights 
    delta = (y_pred - y)
    grad_w = np.matmul(x.T, delta)/y.shape[0]
    Vdw = (momentum)*Vdw + (1- momentum)*grad_w                         
    weights = weights -  learning_rate*Vdw

    #updating bias 
    grad_B = np.sum(delta, axis=0, keepdims=True) / y.shape[0]
    Vdb = (momentum)*Vdb + (1- momentum)*grad_B
    bias = bias - learning_rate*Vdb

    if i%10 == 0:
      print("Epochs :",i)
      y_temp = np.argmax(y_pred, axis = 1)
      accuracy = (Y == y_temp).sum() / len(Y)
      print(f"Accuracy: {accuracy*100} Cost:{cost}")
    elif i == epochs-1:
      print('training completed')

  return weights,bias 

In [None]:
weight,bias=neural_network(X,Y,[784,10],100,0.05,0)

Epochs : 0
Accuracy: 10.783333333333333 Cost:2.305870774569131
Epochs : 10
Accuracy: 74.88166666666667 Cost:1.8577564722962943
Epochs : 20
Accuracy: 77.71666666666667 Cost:1.5518255795233806
Epochs : 30
Accuracy: 79.46 Cost:1.3405131577699605
Epochs : 40
Accuracy: 80.68833333333333 Cost:1.1906338986975964
Epochs : 50
Accuracy: 81.62166666666667 Cost:1.0804939637095197
Epochs : 60
Accuracy: 82.31333333333333 Cost:0.9966955091681919
Epochs : 70
Accuracy: 82.88333333333333 Cost:0.9309556117369875
Epochs : 80
Accuracy: 83.32166666666667 Cost:0.8780235337240258
Epochs : 90
Accuracy: 83.74166666666667 Cost:0.8344619646845146
training completed


In [None]:
weight.shape,weight

((784, 10),
 array([[0.0037454 , 0.00950714, 0.00731994, ..., 0.00866176, 0.00601115,
         0.00708073],
        [0.00020584, 0.0096991 , 0.00832443, ..., 0.00524756, 0.00431945,
         0.00291229],
        [0.00611853, 0.00139494, 0.00292145, ..., 0.00514234, 0.00592415,
         0.0004645 ],
        ...,
        [0.0022309 , 0.00056516, 0.00103395, ..., 0.00940537, 0.00905944,
         0.00566806],
        [0.00354424, 0.00381147, 0.00411852, ..., 0.00335946, 0.00560694,
         0.00095057],
        [0.00200469, 0.00413466, 0.00206203, ..., 0.00691864, 0.00383069,
         0.00869099]]))

In [None]:
bias.shape,bias 

((1, 10),
 array([[-0.03061242,  0.08054823, -0.01777461, -0.01511964,  0.02659754,
          0.03242076, -0.00094546,  0.03535134, -0.06060231, -0.00553034]]))

## Problem 3 

Extend your code from problem 2 to support a single layer neural network with n hidden units (e.g. An  architecture: 784 > 10 > 10). These hidden units should be using sigmoid activations. 

For this problem you must write a function that takes as an input a matrix of x values, a list of y  values (as returned from problem 1), list of weight matrices, a list of bias vectors, and a learning rate and performs a single step of backpropagation. You will need to do both a forward step with the inputs to get the outputs, and then a backward prop to get the gradients. Return the  updated weight matrix and bias in the same format as it was passed.

The list of weight matrices is a list with 2 entries where each entry in the list contains a single weight matrix as previously defined in problem 2. For a network with shape 784 > 10 > 10 the passed list  of weight matrices would look like this: [matrix with shape 784x10, matrix with shape 10x10]. Note:  though a hidden layer of size 10 is used as an example here, your code must be able to support a hidden  layer of dimension n.

The list of bias vectors will be in the form where each entry in the list is a vector with the same  length as the first set of weights. (e.g. For an architecture of 784 > 10 > 10, there will be a two element  list with an vector of size 10 and a vector of size 10) 

In [None]:
#relu function 
def relu(x, derivative=False):
    if derivative:
        return np.where(x <= 0, 0, 1)
    else:
        return np.maximum(0, x)

In [None]:
#tanh function 
def tanh(x,derivative=False):
    t=(np.exp(x)-np.exp(-x))/(np.exp(x)+np.exp(-x)) 
    if derivative:
         return (1-t**2)
    return t

In [None]:
def sigmoid(x,derivative =False):
    sig = 1 / (1 + np.exp(-x))
    if derivative:
        return sig * (1 - sig)
    return sig

In [None]:
def Activation(input,function_type,derivative=False):

  if function_type == "sigmoid":
    if derivative:
      sigmoid(input,True)
    return sigmoid(input,False)

  if function_type == "relu":
    if derivative:
      relu(input,True)
    return relu(input,False)

  if function_type == "tanh":
    if derivative:
      tanh(input,True)
    return tanh(input,False)


In [None]:
#checking the data values 

lst = [None]*10
for i in range(len(lst)):
   count = 0
   for j in Y:
     if i == j:
       count+=1
       lst[i] = count

print(lst)

[5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949]


In [None]:
def neural_network(X,Y,layer_nodes,epochs,activation_type,learning_rate,momentum,initialization):
  weights = [None]*(len(layer_nodes)-1)
  bias = [None]*(len(layer_nodes)-1)
  Z = [None]*(len(layer_nodes)-1)
  A = [None]*(len(layer_nodes)-1)
  Vdw = [None]*(len(layer_nodes)-1)
  Vdb = [None]*(len(layer_nodes)-1)
  
  
  for i in range(len(layer_nodes)-1):
     if initialization == "xavier":
            # Xavier initialization 
            weights[i] = np.random.randn(layer_nodes[i], layer_nodes[i+1]) * np.sqrt(1/layer_nodes[i])
     elif initialization == "he":
            # He initialization 
            weights[i] = np.random.randn(layer_nodes[i], layer_nodes[i+1]) * np.sqrt(2/layer_nodes[i])
     bias[i] = np.random.rand(1,layer_nodes[i+1])*0.01
     Vdw[i] = np.zeros_like(weights[i]) # initializing the momentum term
     Vdb[i] = np.zeros_like(bias[i]) # initializing the momentum term

  y = one_hot_encoded(Y)
  # training 
  x = X/255.0
  for i in range(epochs):
      
    
    inputs = x
    #forward prop
    for j in range(len(layer_nodes)-2):
       Z[j]  = np.add(np.matmul(inputs,weights[j]),bias[j])
       A[j] = Activation(Z[j],activation_type)
       inputs =  A[j]
      #  print("ssa",Z[j][:2],A[j][:2])
    Z[-1] = np.add(np.matmul(inputs,weights[-1]),bias[-1])
    y_pred = A[-1] = softmax(Z[-1])
    
    
    #cost calulation
    cost =  loss(y,y_pred)
  
    #back_prop
    
    delta = [None] * (len(layer_nodes) - 1)
    delta[-1] = y_pred - y
    # print("delta -1",delta[-1])
    for j in range(len(layer_nodes) - 2, 0, -1):
            delta[j-1] = (np.matmul(delta[j], weights[j].T) * Activation(Z[j-1], activation_type, True))

        # weight and bias updates
    for j in reversed(range(len(layer_nodes)-1)):

            dW = np.matmul(A[j-1].T, delta[j])/y.shape[0]  if j != 0 else np.matmul(x.T, delta[j])/y.shape[0] 
            db = np.sum(delta[j], axis=0, keepdims=True)/y.shape[0] 

            Vdw[j] = momentum*Vdw[j] + (1-momentum)*dW
            Vdb[j] = momentum*Vdb[j] + (1-momentum)*db

            weights[j] = weights[j] - learning_rate * Vdw[j]
            bias[j] = bias[j] - learning_rate * Vdb[j]

    if i%10 == 0:
      print("Epochs :",i)
      y_temp = np.argmax(y_pred, axis = 1)
      accuracy = (Y == y_temp).sum() / len(Y)
      print(f"Accuracy: {accuracy*100} Cost:{cost}")
    elif i == epochs-1:
      print('training completed')

  return weights,bias 

In [None]:
weight_hidden_1,bias_hidden_1 = neural_network(X,Y,[784,10,10],100,"sigmoid",0.05,0,"xavier")

Epochs : 0
Accuracy: 9.871666666666666 Cost:2.501220155359159
Epochs : 10
Accuracy: 9.871666666666666 Cost:2.333733816528802
Epochs : 20
Accuracy: 10.523333333333333 Cost:2.2626854951222986
Epochs : 30
Accuracy: 13.113333333333333 Cost:2.2098079089282057
Epochs : 40
Accuracy: 16.53333333333333 Cost:2.1653831166839224
Epochs : 50
Accuracy: 22.925 Cost:2.129570187872095
Epochs : 60
Accuracy: 29.031666666666666 Cost:2.102033487702629
Epochs : 70
Accuracy: 31.838333333333335 Cost:2.0809716131614295
Epochs : 80
Accuracy: 33.68333333333333 Cost:2.064429273310835
Epochs : 90
Accuracy: 35.22833333333333 Cost:2.051235606757686
training completed


In [None]:
for i in weight_hidden_1:
 print("shape",i.shape,i)

shape (784, 10) [[ 0.03867857 -0.03844219 -0.02406393 ... -0.01701978 -0.01067215
  -0.01279444]
 [-0.01972362 -0.0391837   0.08185397 ...  0.00461384 -0.00225978
  -0.02356245]
 [ 0.03513758  0.00449595 -0.01701969 ... -0.04273806 -0.0094843
   0.04893221]
 ...
 [-0.00800861 -0.00752981  0.07983669 ... -0.04757592 -0.0289131
   0.06792547]
 [-0.01864408  0.06854576 -0.0444058  ...  0.01609894 -0.0148684
  -0.00476128]
 [-0.0113951  -0.0326662  -0.03171296 ... -0.02758959  0.02641997
   0.05304345]]
shape (10, 10) [[ 0.26076448 -0.08006469  0.31616795  0.31060359 -0.24993848 -0.20563951
   0.43160543 -0.22737842 -0.00585578 -0.34983297]
 [-0.33010044  0.05204221 -0.29499973 -0.22014834  0.18863332 -0.30901265
  -0.30771959 -0.15798602 -0.05709704  0.3151051 ]
 [ 0.38593846 -0.19185248  0.69187401 -0.28337538  0.39048068 -0.24849363
  -0.30336313 -0.17322111 -0.06199649  0.17345506]
 [ 0.07355608 -0.30922649 -0.34820993 -0.17062544  0.02584715 -0.08015863
  -0.38784719  0.21931706  0.69

In [None]:
for i in bias_hidden_1:
 print("shape",i.shape,i)

shape (1, 10) [[ 0.03689095  0.05795016 -0.02511896  0.0157375   0.12489849  0.02142296
  -0.00228715  0.02151841  0.02528147  0.03963077]]
shape (1, 10) [[-0.32478797  0.10274106 -0.00376753  0.21034772  0.11782068 -0.05593662
  -0.06920574  0.20041549 -0.07946515 -0.05747278]]


## Problem 4 

Extend your code from problem 3 (use cross entropy error) and implement a multi-layer neural  network, starting with a simple architecture containing any number of hidden units in each layer (e.g. With  architecture: 784 > 10 > 10 > 10). These hidden units should be using sigmoid activations. 

For this problem you must write a function that takes as an input a matrix of x values, a list of y  values (as returned from problem 1), list of weight matrices, a list of bias vectors, and a learning rate and  performs a single step of backpropagation. You will need to do both a forward step with the inputs to  get the outputs, and then a backward prop to get the gradients. Return the updated weight matrix and  bias in the same format as it was passed. 

The list of weight matrices is a list with k entries where each entry in the list contains a single  weight matrix as previously defined in problem 2. For a network with shape 784 > 10 > 10 > 10 the  passed list of weight matrices would look like this: [matrix with shape 784x10, matrix with shape 10x10,  matrix with shape 10x10]. Note: though a hidden layer of size 10 is used as an example here, your code  must be able to support a hidden layer of dimension n. 

The list of bias vectors will be in the form where each entry in the list is a vector with the same  length as the first set of weights. (e.g. For an architecture of 784 > 10 > 10, there will be a two element  list with an vector of size 10 and a vector of size 10) 

In [None]:
weight_hidden_2_sigmoid,bias_hidden_2_sigmoid = neural_network(X,Y,[784,10,10,10],100,"sigmoid",0.25,0,"he")

Epochs : 0
Accuracy: 10.503333333333334 Cost:2.8010808720754703
Epochs : 10
Accuracy: 19.525000000000002 Cost:2.2967122016264527
Epochs : 20
Accuracy: 26.185000000000002 Cost:2.2145720657945023
Epochs : 30
Accuracy: 28.431666666666665 Cost:2.187854502871245
Epochs : 40
Accuracy: 27.058333333333334 Cost:2.1904211559567113
Epochs : 50
Accuracy: 28.396666666666665 Cost:2.1919251938225917
Epochs : 60
Accuracy: 27.99 Cost:2.1910532053721847
Epochs : 70
Accuracy: 27.66 Cost:2.189410100014983
Epochs : 80
Accuracy: 27.016666666666666 Cost:2.187746309595193
Epochs : 90
Accuracy: 26.218333333333334 Cost:2.1860472358553187
training completed


In [None]:
for i in weight_hidden_2_sigmoid:
  print("shape",i.shape,i)

shape (784, 10) [[ 0.06450809 -0.00300931 -0.01067949 ... -0.02219036 -0.08800531
   0.01185121]
 [ 0.02460519 -0.00313512 -0.06421988 ... -0.05588706 -0.08918705
  -0.00385846]
 [-0.05941697 -0.09461307 -0.01200104 ... -0.05864206  0.05051496
  -0.0177138 ]
 ...
 [ 0.02418423  0.01104036 -0.05067011 ... -0.02472395 -0.01266672
  -0.00710641]
 [-0.06233428  0.00131802 -0.03048509 ... -0.00871182  0.0316206
   0.05538582]
 [ 0.01901513 -0.00961751 -0.05101269 ...  0.02330854  0.01144925
  -0.03684521]]
shape (10, 10) [[ 6.15387878e-02  6.42474682e-01 -4.60129764e-01  1.07805939e+00
  -3.92333841e-01 -2.52568915e-01 -1.89582184e-01 -2.94343325e-01
  -2.45857431e-02 -1.30054656e-01]
 [ 8.73335484e-01  8.60864329e-01 -6.48558692e-02  1.00825068e+00
  -5.80607227e-01  1.20999601e-01 -5.30162268e-01 -2.23088902e-01
  -4.95202576e-01 -6.69527461e-01]
 [-4.50494810e-04 -1.10452777e+00  2.99513265e-01 -4.37619788e-02
   3.76978540e-01  6.97673637e-01 -1.52853353e+00  9.02103686e-01
   9.6566907

In [None]:
for i in bias_hidden_2_sigmoid:
  print("shape",i.shape,i)

shape (1, 10) [[0.21577613 0.59704772 2.0038554  1.99248291 1.30604777 0.53564834
  1.40188012 0.52966428 0.81184694 0.70863123]]
shape (1, 10) [[ 0.12031284  0.3117412  -0.22507724  0.04197843  0.00116065  0.28282829
   0.09068033 -0.05572946  0.1958926   0.00733357]]
shape (1, 10) [[ 0.12688225 -0.2474263   0.14795507  0.68343244 -0.28381852 -0.27390201
  -0.10141667  0.11308086  0.20373667 -0.33165242]]


## Problem 5 

Extend your code from problem 4 to implement different activations functions which will be  passed as a parameter. In this problem all activations (except the final layer which should remain a  softmax) must be changed to the passed activation function. 

Tanh Activation Function 

In [None]:
weight_hidden_2_tanh,bias_hidden_2_tanh = neural_network(X,Y,[784,10,10,10],100,"tanh",0.2,0,"xavier")

Epochs : 0
Accuracy: 7.091666666666667 Cost:2.563581695518555
Epochs : 10
Accuracy: 7.363333333333333 Cost:2.447037352199792
Epochs : 20
Accuracy: 9.921666666666667 Cost:2.4222354513541147
Epochs : 30
Accuracy: 13.985 Cost:2.3217714891228014
Epochs : 40
Accuracy: 11.683333333333334 Cost:2.2805871027887723
Epochs : 50
Accuracy: 15.939999999999998 Cost:2.2529892305473296
Epochs : 60
Accuracy: 21.906666666666666 Cost:2.2059732793706788
Epochs : 70
Accuracy: 14.655000000000001 Cost:2.177787434209266
Epochs : 80
Accuracy: 15.486666666666668 Cost:2.229000105150733
Epochs : 90
Accuracy: 20.235 Cost:2.1780416784674084
training completed


In [None]:
for i in weight_hidden_2_tanh:
  print("shape",i.shape,i)

shape (784, 10) [[-0.0316801  -0.02221189 -0.02684236 ...  0.01356259  0.09089857
   0.06792709]
 [ 0.02977478  0.0382282  -0.00670879 ...  0.06046427 -0.0311955
  -0.00999775]
 [ 0.00795794 -0.04667469 -0.17241928 ... -0.05858568 -0.02140373
   0.05016318]
 ...
 [-0.08449199 -0.03421773  0.15887473 ...  0.09140153 -0.02354644
   0.0189133 ]
 [ 0.01714805 -0.00845479  0.06478213 ... -0.09518251  0.03037768
   0.02960935]
 [ 0.05711142  0.0394226  -0.01606055 ... -0.02964405  0.01599197
  -0.01757267]]
shape (10, 10) [[-9.00034923e-02 -5.51387281e-01 -1.03142279e-01  7.23509164e-01
  -1.86963351e-01 -8.61467463e-01 -3.41908401e-01  9.36307606e-01
  -5.66905682e-01 -7.69176064e-01]
 [ 2.65600477e-01 -3.30920447e-01 -7.61746753e-01  3.03596043e-01
   1.81984788e-01 -1.62186006e-01 -7.91926646e-01  5.97094261e-01
  -3.25800095e-01  7.75046651e-01]
 [ 2.12190548e-01  2.93558205e-01  3.20590963e-01 -2.08881746e-01
   1.55131806e-01 -2.69819379e-01 -1.51594865e-02  1.15995342e+00
   1.4183984

In [None]:
for i in bias_hidden_2_tanh:
  print("shape",i.shape,i)

shape (1, 10) [[ 1.05797892  0.31442825 -0.04317452  0.5029422   0.54482656  0.95860374
   0.85314825  0.38274671  0.67048055  0.66138384]]
shape (1, 10) [[-0.09431307 -0.40636118 -0.20781697  0.47979199 -0.01375025 -0.42810299
  -0.36898373  0.16457903  0.00175803 -0.40470605]]
shape (1, 10) [[-0.14738836 -0.10321176  0.07756606 -0.03278073  0.04718353 -0.10865822
   0.08305507 -0.12282424  0.37758306 -0.02727477]]


Relu Activation funtion 

In [None]:
weight_hidden_2_relu,bias_hidden_2_relu = neural_network(X,Y,[784,10,10,10],100,"relu",0.07,0,"he")

Epochs : 0
Accuracy: 9.211666666666666 Cost:2.3900241360560646
Epochs : 10
Accuracy: 11.200000000000001 Cost:2.283593404475552
Epochs : 20
Accuracy: 12.653333333333332 Cost:2.2359520211441457
Epochs : 30
Accuracy: 14.045 Cost:2.191623841049514
Epochs : 40
Accuracy: 18.165 Cost:2.120736273643124
Epochs : 50
Accuracy: 25.856666666666666 Cost:2.004846117548963
Epochs : 60
Accuracy: 18.066666666666666 Cost:2.2593881604295354
Epochs : 70
Accuracy: 23.915 Cost:2.1724979489016336
Epochs : 80
Accuracy: 20.766666666666666 Cost:2.03342904915184
Epochs : 90
Accuracy: 23.696666666666665 Cost:1.9564053648431396
training completed


In [None]:
for i in weight_hidden_2_relu:
  print("shape",i.shape,i)

shape (784, 10) [[ 0.11014994 -0.08798442  0.07074688 ... -0.00776997  0.04224745
   0.09881605]
 [-0.02917801  0.0926358  -0.06326423 ...  0.05967225  0.05450347
   0.0558639 ]
 [-0.03460228  0.02129302 -0.06436965 ... -0.04655502  0.0184495
   0.06107269]
 ...
 [-0.05315242 -0.03499379  0.00555147 ... -0.06370007 -0.00568046
   0.0300406 ]
 [-0.04049744  0.01755503 -0.10735557 ...  0.09232494  0.00036822
   0.04159052]
 [-0.1088265   0.00064372  0.13488089 ... -0.06315799 -0.04385682
   0.00940506]]
shape (10, 10) [[ 0.74051483 -0.56431134  0.04533769 -0.41186784 -0.42471929 -0.0071292
   0.03172553 -0.29514267 -0.20501289  0.14846359]
 [ 0.15953619 -0.28042935 -0.31373176  0.68238562  0.04343728 -0.57095574
  -0.06350648 -0.0903493   0.48172023 -0.15723955]
 [-0.06794285 -0.04383926 -0.0641771  -0.59872548  0.36744074  1.03836389
   0.24276419 -0.05938632 -0.21128829  0.59187557]
 [-0.13703374  0.78126465 -0.26537997  0.23531288 -0.66564231 -0.6244749
   0.0421782   0.0167708   0.07

In [None]:
for i in bias_hidden_2_relu:
  print("shape",i.shape,i)

shape (1, 10) [[ 2.43874310e-02  3.32056129e-04 -2.51431905e-01  1.26702317e-03
   8.63554917e-02  1.02591591e-02 -7.00512092e-02  2.40831291e-03
  -3.44567712e-01  6.34948147e-03]]
shape (1, 10) [[-0.16537831  0.00765883  0.00796901  0.01651085 -0.00342235 -0.06923109
  -0.1775626  -0.01839671 -0.07396728 -0.08916667]]
shape (1, 10) [[-0.11689366  0.17307672 -0.37140419 -0.07983347 -0.00309773 -0.06206476
  -0.06523411  0.31066559  0.06486062  0.20414297]]


## Problem 6

Extend your code from problem 5 to implement momentum with your gradient descent. The  momentum value will be passed as a parameter. Your function should perform “epoch” number of  epochs and return the resulting weights.

In [None]:
momentum = 0.9
weight_hidden_2_momentum,bias_hidden_2_momentum = neural_network(X,Y,[784,10,10,10],100,"tanh",0.2,momentum,"he")

Epochs : 0
Accuracy: 10.81 Cost:2.4246472234479715
Epochs : 10
Accuracy: 13.306666666666667 Cost:2.4927126755629065
Epochs : 20
Accuracy: 14.278333333333334 Cost:2.288821166113774
Epochs : 30
Accuracy: 15.976666666666667 Cost:2.3576963981269734
Epochs : 40
Accuracy: 12.833333333333332 Cost:2.3448994290027936
Epochs : 50
Accuracy: 5.65 Cost:2.377629138795824
Epochs : 60
Accuracy: 6.7683333333333335 Cost:2.4845010720505387
Epochs : 70
Accuracy: 15.501666666666667 Cost:2.3034441653071953
Epochs : 80
Accuracy: 21.758333333333333 Cost:2.188605570292917
Epochs : 90
Accuracy: 22.356666666666666 Cost:2.204896953666272
training completed


In [None]:
for i in weight_hidden_2_momentum:
  print("shape",i.shape,i)

shape (784, 10) [[ 0.07434923 -0.00397005 -0.05620124 ... -0.01584039  0.03258238
  -0.00689419]
 [-0.0277622   0.00218493 -0.08066819 ...  0.04090915 -0.02733304
   0.01310768]
 [ 0.0025471   0.11238279 -0.04423048 ...  0.04401599  0.03775988
  -0.04514117]
 ...
 [ 0.11361575  0.07326121 -0.00461883 ... -0.00972068 -0.04098391
   0.0657481 ]
 [-0.01793086  0.05181419  0.01345883 ...  0.0009351   0.02380599
  -0.04219216]
 [ 0.03310634  0.01766649  0.01995829 ...  0.00435305  0.08472738
  -0.11835821]]
shape (10, 10) [[-1.88133831  0.52613907 -0.85058873 -0.65210226 -0.31525091  0.27969625
   0.63221791  0.66507093  1.56658925 -0.79434336]
 [ 0.66702685 -0.13597491  0.31891837 -0.28267301 -0.44971518  0.86355978
  -0.38455559  0.28133585  0.10778709 -0.2053669 ]
 [ 0.49916463  0.42290074  0.05428675 -0.41172472 -0.4219439  -0.60028569
  -0.00796702  0.15058193 -0.37507776 -0.25988558]
 [ 0.26527469 -0.02717921  0.03447945 -0.4028924   0.56623128  0.33107743
  -0.03129915  0.13523664  0

In [None]:
for i in bias_hidden_2_momentum:
  print("shape",i.shape,i)

shape (1, 10) [[ 2.69555846 -0.36332662 -0.20858925 -0.2983627  -0.24838776  0.04400455
  -0.31607882  1.36211371  0.5939772   0.52202047]]
shape (1, 10) [[-0.64371937 -0.29753769 -0.29608504  0.14639424  0.10229798  0.22196268
  -0.26687856 -0.34373464  0.20185435 -0.44806351]]
shape (1, 10) [[-0.19101454  0.35858567 -0.10727037  0.17603227  0.31504488  0.10945887
   0.03300546 -0.29126452 -0.16604493 -0.16986692]]


# d

In [None]:
class MLP:

  def __init__(self,X,Y,layer_nodes,activation_type):
    self.X = X
    self.Y = Y
    self.layer_nodes = layer_nodes
    self.activation_type = activation_type 

  def loss(self,y_actual,y_pred):
    '''
    Calculating cost 
    '''
    cost = -np.sum(y_actual*np.log(y_pred))          #calculating the cost      categorical cross entropy 
    return cost/y_actual.shape[0]

  # #softmax function 

  def softmax(self,x, derivative=False):
    exp_x = np.exp(x)  
    softmax = exp_x / np.sum(exp_x, axis=1, keepdims=True)
    if derivative:
        return softmax * (1 - softmax)
    return softmax


  def one_hot_encoded(self,x):
    '''
    one hot encoding 
    '''
    n_values = 10     #number of categories (for mnist categories = 10)
    y=np.eye(n_values)[x]     
    return y

  def relu(self,x,derivative=False):
    '''
    Relu Activation
    '''
    if derivative:
        return np.where(x <= 0, 0, 1)
    else:
        return np.maximum(0, x)
  
  #tanh function 
  def tanh(self,x,derivative=False):
    '''
    tanh Activation 
    '''
    t=np.tanh(x)
    if derivative:
         return (1-t**2)
    return t
  
  def sigmoid(self,x,derivative =False):
    sig = 1 / (1 + np.exp(-x))
    if derivative:
        return sig * (1 - sig)
    return sig


  def Activation(self,input,function_type,derivative=False):
    '''
    Activation call
    '''
    if function_type == "sigmoid":
      if derivative:
        self.sigmoid(input,True)
      return self.sigmoid(input,False)

    if function_type == "relu":
      if derivative:
        self.relu(input,True)
      return self.relu(input,False)

    if function_type == "tanh":
      if derivative:
        self.tanh(input,True)
      return self.tanh(input,False)


  def train(self,epochs,learning_rate,momentum,initialization):
     
     weights = [None]*(len(self.layer_nodes)-1)
     bias = [None]*(len(self.layer_nodes)-1)
     Z = [None]*(len(self.layer_nodes)-1)
     A = [None]*(len(self.layer_nodes)-1)
     Vdw = [None]*(len(self.layer_nodes)-1)
     Vdb = [None]*(len(self.layer_nodes)-1)
     
     for i in range(len(self.layer_nodes)-1):
         if initialization == "xavier":
            # Xavier initialization 
            weights[i] = np.random.randn(self.layer_nodes[i], self.layer_nodes[i+1]) * np.sqrt(1/self.layer_nodes[i])
         elif initialization == "he":
            # He initialization 
            weights[i] = np.random.randn(self.layer_nodes[i], self.layer_nodes[i+1]) * np.sqrt(2/self.layer_nodes[i])
         bias[i] = np.random.rand(1,self.layer_nodes[i+1])
         Vdw[i] = np.zeros_like(weights[i]) # initializing the momentum term
         Vdb[i] = np.zeros_like(bias[i]) # initializing the momentum term

     y = one_hot_encoded(self.Y)
     x = self.X/255.0
     for i in range(epochs):
          '''forward pass '''
          inputs = x
          for j in range(len(self.layer_nodes)-2):
             Z[j]  = np.add(np.matmul(inputs,weights[j]),bias[j])
             A[j] = self.Activation(Z[j],self.activation_type,False)
             inputs =  A[j]
          Z[-1] = np.add(np.matmul(inputs,weights[-1]),bias[-1])
          y_pred = A[-1] = self.softmax(Z[-1],False)

          cost =  self.loss(y,y_pred)     #calculating cost 
          '''backward pass '''
          delta = [None] * (len(self.layer_nodes) - 1)
          delta[-1] = y_pred - y
          # print("delta -1",delta[-1])
          for j in range(len(self.layer_nodes) - 2, 0, -1):
              delta[j-1] = (np.matmul(delta[j], weights[j].T) * Activation(Z[j-1], self.activation_type, True))

          # weight and bias updates
          for j in reversed(range(len(self.layer_nodes)-1)):

            dW = np.matmul(A[j-1].T, delta[j])/y.shape[0]  if j != 0 else np.matmul(x.T, delta[j])/y.shape[0] 
            db = np.sum(delta[j], axis=0, keepdims=True)/y.shape[0] 

            Vdw[j] = momentum*Vdw[j] + (1-momentum)*dW
            Vdb[j] = momentum*Vdb[j] + (1-momentum)*db

            weights[j] = weights[j] - learning_rate * Vdw[j]
            bias[j] = bias[j] - learning_rate * Vdb[j]


          if i%10 == 0:
             print("Epochs :",i)
             y_temp = np.argmax(y_pred, axis = 1)
             accuracy = (Y == y_temp).sum() / len(Y)
             print(f"Accuracy: {accuracy*100} Cost:{cost}")
          elif i == epochs-1:
             print('training completed')

     return weights,bias 

In [None]:
model_mlp = MLP(X,Y,[784,10,10,10],"sigmoid")
weights,bias=model_mlp.train(100,0.07,0.9,"he")

Epochs : 0
Accuracy: 9.871666666666666 Cost:2.4424202280116467
Epochs : 10
Accuracy: 9.871666666666666 Cost:2.3580660601014842
Epochs : 20
Accuracy: 11.553333333333333 Cost:2.2871553369285187
Epochs : 30
Accuracy: 19.121666666666666 Cost:2.246555429933913
Epochs : 40
Accuracy: 25.948333333333334 Cost:2.2204032389317767
Epochs : 50
Accuracy: 30.196666666666665 Cost:2.208334139943309
Epochs : 60
Accuracy: 28.878333333333334 Cost:2.2061389494614705
Epochs : 70
Accuracy: 29.301666666666666 Cost:2.206511210298669
Epochs : 80
Accuracy: 29.883333333333333 Cost:2.2058793975704427
Epochs : 90
Accuracy: 29.645 Cost:2.2044861344471376
training completed
