***Neural network on MNIST***

## Single Hidden layer

**epochs = 1500**

110 neurons, lr=0.1
Training accuracy :  99.92044550517105
Test accuracy :  85.0

120 neurons, lr=0.2
Training accuracy :  100.0
Test accuracy :  89.62962962962962

200 neurons, lr=0.3
Training accuracy :  100.0
Test accuracy :  90.0
Elapsed time: 58.53140044212341 on GPU

As can be seen from the results, 200 neurons with learning rate of 0.3 was the best choice. It can be observed that model is overfitted to the data hence resulting in greater difference in test and train accuracy.

Model gave new results every time for the same combination of neuron and learning rate which is because of the random arrangement of dataset


In [10]:
#Necessary imports
import time
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_digits

#Loading dataset - MNIST
X = load_digits()

#Converts directly to one_hot_encoding
labels = pd.get_dummies(X.target)

#Splitting into test train
X_train, X_test, y_train, y_test = train_test_split(X.data, labels, test_size=0.3, random_state=20)

#Defining the functions used in neural network

#Activation function
def sigmoid(x):
    return 1/(1 + np.exp(-x))

#Calculating slope aka derivative
def calcSlope(x):
    return x * (1 - x)

#Getting probabilities of all classes i.e. digits(0-9)
def softmax(x):
    exps = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exps/np.sum(exps, axis=1, keepdims=True)

#Loss function
def cross_entropy(y_pred, y):
    return (y_pred - y)/y.shape[0]

#Calculating error per prediction
def error(y_pred, y):
    logp = - np.log(y_pred[np.arange(y.shape[0]), y.argmax(axis=1)])
    return np.sum(logp)/y.shape[0]

class neuralNetMNIST:

    #Initializing values of weights and biases

    def __init__(self, x, y):
        
        self.x = x

        #Tried different combinations of neurons such as 120, 200 
        neurons = 200

        #Tried different combinations of learning rate such as 0.2, 0.3
        self.lr = 0.3
        ip_dim = x.shape[1]
        op_dim = y.shape[1]

        #3 layers

        #Input layer
        self.w1 = np.random.randn(ip_dim, neurons)
        self.b1 = np.zeros((1, neurons))
        
        #Hidden layer
        self.w2 = np.random.randn(neurons, neurons)
        self.b2 = np.zeros((1, neurons))
        
        #Output layer
        self.w3 = np.random.randn(neurons, op_dim)
        self.b3 = np.zeros((1, op_dim))
        self.y = y

    def forwardProp(self):

        #Activation of first layer
        z1 = np.dot(self.x, self.w1) + self.b1 # z = wx + b
        self.a1 = sigmoid(z1) # a = sigmoid(z)

        #Activation of first layer is passed as input to second layer(hidden layer)
        z2 = np.dot(self.a1, self.w2) + self.b2
        self.a2 = sigmoid(z2)

        #Activation of hidden layer is passed to output layer and the final label is predicted by softmax
        z3 = np.dot(self.a2, self.w3) + self.b3
        self.a3 = softmax(z3)
        
    def backprop(self):
        loss = error(self.a3, self.y) #Calculation of error of predicted label

        #Calculating the weight update factor using chain rule
        
        #Output layer
        a3_delta = cross_entropy(self.a3, self.y) 
        z2_delta = np.dot(a3_delta, self.w3.T)

        #hidden layer
        a2_delta = z2_delta * calcSlope(self.a2) 
        z1_delta = np.dot(a2_delta, self.w2.T)

        #Input layer
        a1_delta = z1_delta * calcSlope(self.a1) # w1

        #Weights and biases adjustments
        
        #Output layer
        self.w3 -= self.lr * np.dot(self.a2.T, a3_delta)
        self.b3 -= self.lr * np.sum(a3_delta, axis=0, keepdims=True)

        #Hidden layer
        self.w2 -= self.lr * np.dot(self.a1.T, a2_delta)
        self.b2 -= self.lr * np.sum(a2_delta, axis=0)

        #Input layer
        self.w1 -= self.lr * np.dot(self.x.T, a1_delta)
        self.b1 -= self.lr * np.sum(a1_delta, axis=0)

    def predict(self, image):
      #Gives prediction on trained network
        self.x = image
        self.forwardProp()
        return self.a3.argmax()#prediction of last layer

    def fit(self, epochs):
        for i in range(epochs):
          self.forwardProp()    
          self.backprop()    

    def evaluate(self, x, y):
        acc = 0
        for data,label in zip(x, y):
            s = model.predict(data)
            if s == np.argmax(label):
                acc +=1
        return acc/len(x)*100
      
#Initialization  
model = neuralNetMNIST(X_train, np.array(y_train))

start_time = time.time()

#Training the network
model.fit(1500)

elapsed_time = time.time() - start_time
	
#Evaluation on dataset  
print("Training accuracy : ", model.evaluate(X_train, np.array(y_train)))
print("Test accuracy : ", model.evaluate(X_test, np.array(y_test)))
print(elapsed_time)

Training accuracy :  100.0
Test accuracy :  90.0
58.53140044212341


## Double Hidden layer

**epochs = 1500**

110 neurons, lr=0.1
Training accuracy :  98.64757358790771
Test accuracy :  90.18518518518519

120 neurons, lr=0.2
Training accuracy :  100.0
Test accuracy :  89.07407407407408

200 neurons, lr=0.3
Training accuracy :  100.0
Test accuracy :  89.81481481481481
Elapsed time: 93.31503057479858 on GPU

Increasing the number of layers had a very slight effect on the accuracy as compared to a single hidden layer neural network

As can be seen from the results, 110 neurons with learning rate of 0.1 was the best choice. It can be observed that model is overfitted to the data hence resulting in greater difference in test and train accuracy.

Time taken is increased due to an addition of layer

In [12]:
#Necessary imports
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_digits

#Loading dataset - MNIST
X = load_digits()

#Converts directly to one_hot_encoding
labels = pd.get_dummies(X.target)

#Splitting into test train
X_train, X_test, y_train, y_test = train_test_split(X.data, labels, test_size=0.3, random_state=20)

#Defining the functions used in neural network

#Activation function
def sigmoid(x):
    return 1/(1 + np.exp(-x))

#Calculating slope aka derivative
def calcSlope(x):
    return x * (1 - x)

#Getting probabilities of all classes i.e. digits(0-9)
def softmax(x):
    exps = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exps/np.sum(exps, axis=1, keepdims=True)

#Loss function
def cross_entropy(y_pred, y):
    return (y_pred - y)/y.shape[0]

#Calculating error per prediction
def error(y_pred, y):
    logp = - np.log(y_pred[np.arange(y.shape[0]), y.argmax(axis=1)])
    return np.sum(logp)/y.shape[0]

class neuralNetMNIST:

    #Initializing values of weights and biases

    def __init__(self, x, y):
        
        self.x = x
 
        neurons = 200

        self.lr = 0.3
        ip_dim = x.shape[1]
        op_dim = y.shape[1]

        #3 layers

        #Input layer
        self.w1 = np.random.randn(ip_dim, neurons)
        self.b1 = np.zeros((1, neurons))
        
        #Hidden layer
        self.w2 = np.random.randn(neurons, neurons)
        self.b2 = np.zeros((1, neurons))
        
        #Hidden layer 2
        self.w22 = np.random.randn(neurons, neurons)
        self.b22 = np.zeros((1, neurons))
        
        #Output layer
        self.w3 = np.random.randn(neurons, op_dim)
        self.b3 = np.zeros((1, op_dim))
        self.y = y

    def forwardProp(self):

        #Activation of first layer
        z1 = np.dot(self.x, self.w1) + self.b1 # z = wx + b
        self.a1 = sigmoid(z1) # a = sigmoid(z)

        #Activation of first layer is passed as input to second layer(hidden layer)
        z2 = np.dot(self.a1, self.w2) + self.b2
        self.a2 = sigmoid(z2)

        #Activation of first layer is passed as input to second layer(hidden layer)
        z22 = np.dot(self.a2, self.w22) + self.b22
        self.a22 = sigmoid(z22)

        #Activation of hidden layer is passed to output layer and the final label is predicted by softmax
        z3 = np.dot(self.a22, self.w3) + self.b3
        self.a3 = softmax(z3)
        
    def backprop(self):
        loss = error(self.a3, self.y) #Calculation of error of predicted label

        #Calculating the weight update factor using chain rule
        
        #Output layer
        a3_delta = cross_entropy(self.a3, self.y) 
        z22_delta = np.dot(a3_delta, self.w3.T)

        #hidden layer 2
        a22_delta = z22_delta * calcSlope(self.a22) 
        z2_delta = np.dot(a22_delta, self.w22.T)

        #hidden layer 1
        a2_delta = z22_delta * calcSlope(self.a2) 
        z1_delta = np.dot(a2_delta, self.w2.T)

        #Input layer
        a1_delta = z1_delta * calcSlope(self.a1) # w1

        #Weights and biases adjustments
        
        #Output layer
        self.w3 -= self.lr * np.dot(self.a22.T, a3_delta)
        self.b3 -= self.lr * np.sum(a3_delta, axis=0, keepdims=True)

        #Hidden layer 2
        self.w22 -= self.lr * np.dot(self.a2.T, a22_delta)
        self.b22 -= self.lr * np.sum(a22_delta, axis=0)

        #Hidden layer 1
        self.w2 -= self.lr * np.dot(self.a1.T, a2_delta)
        self.b2 -= self.lr * np.sum(a2_delta, axis=0)

        #Input layer
        self.w1 -= self.lr * np.dot(self.x.T, a1_delta)
        self.b1 -= self.lr * np.sum(a1_delta, axis=0)

    def predict(self, image):
      #Gives prediction on trained network
        self.x = image
        self.forwardProp()
        return self.a3.argmax()#prediction of last layer

    def fit(self, epochs):
        for i in range(epochs):
          self.forwardProp()    
          self.backprop()    

    def evaluate(self, x, y):
        acc = 0
        for data,label in zip(x, y):
            s = model.predict(data)
            if s == np.argmax(label):
                acc +=1
        return acc/len(x)*100
      
#Initialization  
model = neuralNetMNIST(X_train, np.array(y_train))

start_time = time.time()

#Training the network
model.fit(1500)

elapsed_time = time.time() - start_time
	
#Evaluation on dataset  
print("Training accuracy : ", model.evaluate(X_train, np.array(y_train)))
print("Test accuracy : ", model.evaluate(X_test, np.array(y_test)))
print(elapsed_time)

Training accuracy :  100.0
Test accuracy :  89.81481481481481
93.31503057479858


## Removing sigmoid function from a single hidden layer network

**epochs = 1500**

200 neurons, lr=0.3
Training accuracy :  10.182975338106603
Test accuracy :  9.25925925925926
Time elapsed: 31.159974098205566 on GPU

Accuracies dropped alot

Due to removal of sigmoid function, the time taken was reduced as compared to the model which used sigmoid activation function


In [13]:
#Necessary imports
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_digits

#Loading dataset - MNIST
X = load_digits()

#Converts directly to one_hot_encoding
labels = pd.get_dummies(X.target)

#Splitting into test train
X_train, X_test, y_train, y_test = train_test_split(X.data, labels, test_size=0.3, random_state=20)

#Defining the functions used in neural network

#Activation function
def sigmoid(x):
    return 1/(1 + np.exp(-x))

#Calculating slope aka derivative
def calcSlope(x):
    return x * (1 - x)

#Getting probabilities of all classes i.e. digits(0-9)
def softmax(x):
    exps = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exps/np.sum(exps, axis=1, keepdims=True)

#Loss function
def cross_entropy(y_pred, y):
    return (y_pred - y)/y.shape[0]

#Calculating error per prediction
def error(y_pred, y):
    logp = - np.log(y_pred[np.arange(y.shape[0]), y.argmax(axis=1)])
    return np.sum(logp)/y.shape[0]

class neuralNetMNIST:

    #Initializing values of weights and biases

    def __init__(self, x, y):
        
        self.x = x
 
        neurons = 200

        self.lr = 0.3
        ip_dim = x.shape[1]
        op_dim = y.shape[1]

        #3 layers

        #Input layer
        self.w1 = np.random.randn(ip_dim, neurons)
        self.b1 = np.zeros((1, neurons))
        
        #Hidden layer
        self.w2 = np.random.randn(neurons, neurons)
        self.b2 = np.zeros((1, neurons))
        
        #Output layer
        self.w3 = np.random.randn(neurons, op_dim)
        self.b3 = np.zeros((1, op_dim))
        self.y = y

    def forwardProp(self):

        #Activation of first layer
        self.a1 = np.dot(self.x, self.w1) + self.b1 # z = wx + b
        # self.a1 = sigmoid(z1) # a = sigmoid(z)

        #Activation of first layer is passed as input to second layer(hidden layer)
        self.a2 = np.dot(self.a1, self.w2) + self.b2
        # self.a2 = sigmoid(z2)

        #Activation of hidden layer is passed to output layer and the final label is predicted by softmax
        z3 = np.dot(self.a2, self.w3) + self.b3
        self.a3 = softmax(z3)
        
    def backprop(self):
        loss = error(self.a3, self.y) #Calculation of error of predicted label

        #Calculating the weight update factor using chain rule
        
        #Output layer
        a3_delta = cross_entropy(self.a3, self.y) 
        z2_delta = np.dot(a3_delta, self.w3.T)

        #hidden layer
        a2_delta = z2_delta * calcSlope(self.a2) 
        z1_delta = np.dot(a2_delta, self.w2.T)

        #Input layer
        a1_delta = z1_delta * calcSlope(self.a1) 

        #Weights and biases adjustments
        
        #Output layer
        self.w3 -= self.lr * np.dot(self.a2.T, a3_delta)
        self.b3 -= self.lr * np.sum(a3_delta, axis=0, keepdims=True)

        #Hidden layer
        self.w2 -= self.lr * np.dot(self.a1.T, a2_delta)
        self.b2 -= self.lr * np.sum(a2_delta, axis=0)

        #Input layer
        self.w1 -= self.lr * np.dot(self.x.T, a1_delta)
        self.b1 -= self.lr * np.sum(a1_delta, axis=0)

    def predict(self, image):
      #Gives prediction on trained network
        self.x = image
        self.forwardProp()
        return self.a3.argmax()#prediction of last layer

    def fit(self, epochs):
        for i in range(epochs):
          self.forwardProp()    
          self.backprop()    

    def evaluate(self, x, y):
        acc = 0
        for data,label in zip(x, y):
            s = model.predict(data)
            if s == np.argmax(label):
                acc +=1
        return acc/len(x)*100
      
#Initialization  
model = neuralNetMNIST(X_train, np.array(y_train))

start_time = time.time()

#Training the network
model.fit(1500)

elapsed_time = time.time() - start_time
	
#Evaluation on dataset  
print("Training accuracy : ", model.evaluate(X_train, np.array(y_train)))
print("Test accuracy : ", model.evaluate(X_test, np.array(y_test)))
print(elapsed_time)



Training accuracy :  10.182975338106603
Test accuracy :  9.25925925925926
31.159974098205566


### Self Practice Neural networks on small self made dataset

First try

In [4]:
from numpy import exp, array, random, dot

#1 neuron
#Definining a class for model
class sequential():

    def __init__(self):

        #initialing the seed
        random.seed(1)
        self.weights = 2 * random.random((3, 1)) - 1 #Initializing weights

    def sigmoid(self, x):
        return 1 / (1 + exp(-x))#Sigmoid function

    def calcSlope(self, x):
        return x * (1 - x) #Slope 

    def predict(self, X):
        return self.sigmoid(dot(X, self.weights)) #Activation

    def predict_class(self, X):
        if(self.predict(X)>0.5):
          return 1
        else:
          return 0  
    
    def train(self, X, y, epochs):

        for i in range(epochs): #Iterations aka epochs

            y_pred = self.predict(X) # label predicted by network, (single neuron)

            error = y - y_pred # Calculating error

            weight_update = dot(X.T, error * self.calcSlope(y_pred)) 

            # Updating weights on the basis of slope, 
            # each weight gets updated on the basis of its contribution in error

            self.weights += weight_update

#Dataset
X = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
y = array([[0, 1, 1, 0]]).T


print ('\n Input:')
print(X)

print ('\n Actual Output:')
print(y)

model = sequential() #Instantiating model

print ("\nWeights before training: ")
print (model.weights)

#Training on 10000 epochs
model.train(X, y, 5000)

print ("\nWeights after training: ")
print (model.weights)

print ('\n\n*************************\nPrediction:')

print ("\n\nTesting on [1, 0, 0]:")
print (model.predict_class(array([1, 0, 0]))) 

print ("\n\nTesting on [0, 0, 1]:")
print (model.predict_class(array([0, 0, 1]))) 


 Input:
[[0 0 1]
 [1 1 1]
 [1 0 1]
 [0 1 1]]

 Actual Output:
[[0]
 [1]
 [1]
 [0]]

Weights before training: 
[[-0.16595599]
 [ 0.44064899]
 [-0.99977125]]

Weights after training: 
[[ 8.95950703]
 [-0.20975775]
 [-4.27128529]]


*************************
Prediction:


Testing on [1, 0, 0]:
1


Testing on [0, 0, 1]:
0


Second Try

In [0]:
import numpy as np

X=np.array([ 
             [1,0,1,0],
             [1,0,1,1],
             [0,1,0,1]
           ])

y=np.array([[1],[1],[0]])


In [6]:
class sequential2():

    def __init__(self):
      # initializing the variables
        # self.epoch=5000 
        self.lr=0.1 
#2 neurons
        # initializing weight and bias
        self.w1=np.random.uniform(size=(X.shape[1],3))
        self.b1=np.random.uniform(size=(1,3))
        self.w2=np.random.uniform(size=(3,1))
        self.b2=np.random.uniform(size=(1,1))

        self.z1=0
        self.z2=0
        self.a1=0
        self.a2=0
        print('\nWeights before training:\nwh:\n{} \nwout: \n{} \n'.format(self.w1,self.w2))



    def sigmoid (self,x):
        return 1/(1 + np.exp(-x))

    def calc_slope(self,x):
        return x * (1 - x)

    def predict(self,X):
        #Forward prop
        #First layer
        self.z1=np.dot(X,self.w1)
        self.z1=self.z1 + self.b1
        self.a1 = self.sigmoid(self.z1)#activation

        #Next layer
        self.z2=np.dot(self.a1,self.w2)
        self.z2= self.z2+ self.b2
        self.a2 =self.sigmoid(self.z2)
        return self.a2

    def predict_class(self, X):
        if(self.predict(X)>0.5):
          return 1
        else:
          return 0  

    def train(self,X, epoch):

      # training the model
        for i in range(epoch):

            #forward propagation
            self.predict(X)

            #Backpropagation
            
            #calculating error
            E = y-self.a2

            #calculating slopes
            l2 = self.calc_slope(self.a2)
            l1 = self.calc_slope(self.a1)

            d2 = E * l2
            e1 = d2.dot(self.w2.T)
            d1 = e1 * l1

            #Updation of weights and biases with respect to error and learning rate
            # T = transpose
            self.w2 += self.a1.T.dot(d2) *self.lr
            self.b2 += np.sum(d2, axis=0,keepdims=True) *self.lr
            self.w1 += X.T.dot(d1) *self.lr
            self.b1 += np.sum(d1, axis=0,keepdims=True) *self.lr

        print('\nWeights after training:\nwh: \n{}\nwout: \n{}'.format(self.w1,self.w2))

print ('\n Input:')
print(X)

print ('\n Actual Output:')
print(y)

model = sequential2()
model.train(X,5000)
print ('\n\n*************************\nPrediction:')

print(model.predict_class([1,0,1,0]))
print(model.predict_class([1,0,1,1]))
print(model.predict_class([0,1,0,1]))


 Input:
[[1 0 1 0]
 [1 0 1 1]
 [0 1 0 1]]

 Actual Output:
[[1]
 [1]
 [0]]

Weights before training:
wh:
[[0.30233257 0.14675589 0.09233859]
 [0.18626021 0.34556073 0.39676747]
 [0.53881673 0.41919451 0.6852195 ]
 [0.20445225 0.87811744 0.02738759]] 
wout: 
[[0.14038694]
 [0.19810149]
 [0.80074457]] 


Weights after training:
wh: 
[[ 0.12834561 -1.41338192  1.6931045 ]
 [ 0.41635588  1.39998964 -1.86176229]
 [ 0.36482977 -1.1409433   2.28598541]
 [ 0.32695637  0.86934366 -1.0568335 ]]
wout: 
[[-0.38489259]
 [-3.07912606]
 [ 4.84661501]]


*************************
Prediction:
1
1
0


Try 3 (XOR function)

In [7]:
import numpy as np
def sigmoid (x):
    return 1/(1 + np.exp(-x))
 
#Input data
X = np.array([[0,0], [0,1], [1,0], [1,1]])
y = np.array([ [0],   [1],   [1],   [0]])

epochs = 50000 #was giving poor accuracy on smaller epochs
LR = .3 # learning rate

#Weight initialization 
w1 = np.random.uniform(size=(2, 3))
w2 = np.random.uniform(size=(3, 1))

print ('\n Input:')
print(X)

print ('\n Actual Output:')
print(y)

print('\nWeights before training:\nwh:\n{} \nwout: \n{} \n'.format(w1,w2))
# 2 neurons
for i in range(epochs):
 
    # Forward propagation
    a = sigmoid(np.dot(X, w1))
    y_pred = np.dot(a, w2)
    
    # Calculate error
    e = y - y_pred
  
    # Backward Propagation 
    #Weight updation
    d2 = e * LR
    w2 += a.T.dot(d2)
    d1 = d2.dot(w2.T) * sigmoid(a)
    w1 += X.T.dot(d1)


print('\nWeights after training:\nwh: \n{}\nwout: \n{}'.format(w1,w2))



 Input:
[[0 0]
 [0 1]
 [1 0]
 [1 1]]

 Actual Output:
[[0]
 [1]
 [1]
 [0]]

Weights before training:
wh:
[[0.31342418 0.69232262 0.87638915]
 [0.89460666 0.08504421 0.03905478]] 
wout: 
[[0.16983042]
 [0.8781425 ]
 [0.09834683]] 


Weights after training:
wh: 
[[ 3.03000179e-01 -4.24453501e+02  4.25631797e+02]
 [ 1.18593982e+00  4.24350419e+02 -4.25734879e+02]]
wout: 
[[3.60822483e-15]
 [5.12865101e-01]
 [5.12865101e-01]]


In [8]:

a = sigmoid(np.dot([1,1], w1))
np.dot(a, w2)# hence 0

array([0.4864549])