
An MNIST classifier written in Python with NumPy, using a plain fully connected neural network rather than a CNN.

Things to do:
* The code should do mini-batch gradient descent with an appropriate learning rate.
* The code should do dropout: try various dropout rates and pick the one that works best.
* The code should initialize the random weights of the network properly.
* The code should do basic image augmentations with the Keras libraries to supplement the training data (not the testing data). This is new relative to the deck.
* The code should use 3 or more layers for training (not 2 as in the example); you have to tune and pick the number of layers and the number of neurons in each layer.
* The code should continue to use ReLU activation layers in the right places, as in the Python code.
* The code should normalize (scale) the input before training, as discussed in class.
* The code should use an appropriate learning rate (try out a few to find which one works); you can use adaptive learning rates, such as a different learning rate per epoch or per mini-batch.
* The code should provide appropriate metrics and visualizations, including testing and training accuracy, and plot the results and the confusion matrix (this is important).
* The code should display the most common errors.

Importing the modules we need

In [1]:
import numpy as np
import keras
import matplotlib.pyplot as plt
from tqdm import trange
from keras.preprocessing.image import ImageDataGenerator


Using TensorFlow backend.


Let's create a Layer class

In [0]:
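class Layer:
  def __init__(self, ips, ops=10):
    # zero-initialized parameters; subclasses override this with their own setup
    self.wgts = np.zeros(shape=(ips, ops))
    self.bs = np.zeros(shape=(ops,))

  def forward(self, ip):
    # affine transform: ip @ wgts + bs
    op = np.matmul(ip, self.wgts) + self.bs
    return op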

Creating the DenseLayer class; the learning rate can be passed in as an argument


In [0]:
class DenseLayer(Layer):
    def __init__(self, ips, ops, learning_rate=0.1):
        # using normal distribution and initializing small random weights
        self.wgts = np.random.randn(ips, ops)*0.01
        self.bs = np.zeros(ops)
        self.learning_rate = learning_rate
    def forward(self,input):
        return np.matmul(input, self.wgts) + self.bs
    def backward(self,ip,grad_op):
        grad_ip = np.dot(grad_op,np.transpose(self.wgts))
        grad_wgts = np.transpose(np.dot(np.transpose(grad_op),ip))
        grad_bs = np.sum(grad_op, axis = 0)
        
        # gradient descent step on the current mini-batch
        self.bs = self.bs - self.learning_rate * grad_bs
        self.wgts = self.wgts - self.learning_rate * grad_wgts
        return grad_ip
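
The to-do list asks for proper weight initialization, and the dense layer above uses a fixed scale of 0.01. One common alternative for ReLU networks is He initialization, which scales the weights by sqrt(2 / fan_in). The sketch below is only an illustration: `he_init` is a hypothetical helper name, not part of the notebook, and whether it helps this particular network has not been measured here.

```python
import numpy as np

def he_init(ips, ops, rng=None):
    """Return an (ips, ops) weight matrix drawn from N(0, 2 / ips)."""
    rng = np.random.default_rng() if rng is None else rng
    # standard normal samples scaled so the variance is 2 / fan_in
    return rng.standard_normal((ips, ops)) * np.sqrt(2.0 / ips)

# Example: weights for a 784 -> 100 layer (seeded only to make the demo repeatable)
wgts = he_init(784, 100, rng=np.random.default_rng(0))
print(wgts.shape, float(wgts.std()))
```

To try it, the line that sets `self.wgts` in `DenseLayer.__init__` could call a helper like this instead of multiplying by 0.01.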

    

Creating the ReLU activation layer class

In [0]:
class ReLuLayer(Layer):
    def __init__(self):
        pass

    def backward(self, ip, grad_op):
        relu_g = ip > 0
        return grad_op*relu_g 
    
    def forward(self, ip):
        return np.maximum(0,ip)
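
The to-do list also calls for dropout. A dropout layer can follow the same forward/backward interface as the ReLU layer. The sketch below uses inverted dropout (survivors are scaled by 1 / keep probability during training, so nothing needs rescaling at prediction time). `DropoutLayer`, its `rate` argument and its `training` flag are hypothetical names introduced for illustration. It assumes each layer's `forward` is called exactly once per batch, because the mask sampled in `forward` is reused in `backward`.

```python
import numpy as np

class DropoutLayer:
    """Inverted dropout with forward(ip) and backward(ip, grad_op) methods."""
    def __init__(self, rate=0.5, rng=None):
        assert 0.0 <= rate < 1.0
        self.rate = rate
        self.training = True   # set to False before predicting
        self.mask = None
        self.rng = np.random.default_rng() if rng is None else rng

    def forward(self, ip):
        if not self.training or self.rate == 0.0:
            # no units dropped: the layer is the identity
            self.mask = np.ones_like(ip, dtype=float)
            return ip
        keep = 1.0 - self.rate
        # zero each unit with probability `rate`, scale the survivors by 1 / keep
        self.mask = (self.rng.random(ip.shape) < keep) / keep
        return ip * self.mask

    def backward(self, ip, grad_op):
        # gradients only flow through the units that were kept
        return grad_op * self.mask

# Usage sketch: drop half of the activations of a small batch
layer = DropoutLayer(rate=0.5, rng=np.random.default_rng(0))
acts = np.ones((4, 1000))
dropped = layer.forward(acts)                      # entries are 0.0 or 2.0
grads = layer.backward(acts, np.ones_like(acts))
layer.training = False
unchanged = layer.forward(acts)                    # identity at prediction time
```

The dropout rate itself would still need to be tuned by comparing validation accuracy for a few values.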

    

Let's define the loss function (softmax cross-entropy computed from the logits) and its gradient

In [0]:
def softcentropy(lgs,ref_ans):
    log_ans = lgs[np.arange(len(lgs)),ref_ans]
    
    finalEnt = - log_ans + np.log(np.sum(np.exp(lgs),axis=-1))
    
    return finalEnt

In [0]:
def grad_softcentropy(lgs,ref_ans):
    oneans = np.zeros_like(lgs)
    oneans[np.arange(len(lgs)),ref_ans] = 1 
    smax = np.exp(lgs) / np.exp(lgs).sum(axis=-1,keepdims=True) 
    return (- oneans + smax) / lgs.shape[0]
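
Both functions above call `np.exp` directly on the logits, which can overflow when a logit is large. Subtracting each row's maximum before exponentiating leaves the softmax unchanged and avoids that. The following is a sketch of the same loss and gradient with this shift; the `stable_` function names are hypothetical and not part of the notebook.

```python
import numpy as np

def stable_softcentropy(lgs, ref_ans):
    # shift each row so its largest logit is 0; softmax is invariant to this
    shifted = lgs - lgs.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # negative log-probability of the correct class for every sample
    return -log_probs[np.arange(len(lgs)), ref_ans]

def stable_grad_softcentropy(lgs, ref_ans):
    shifted = lgs - lgs.max(axis=-1, keepdims=True)
    smax = np.exp(shifted)
    smax /= smax.sum(axis=-1, keepdims=True)
    # softmax minus one-hot target, averaged over the batch
    smax[np.arange(len(lgs)), ref_ans] -= 1.0
    return smax / lgs.shape[0]

# The first row would overflow with a plain np.exp(1000.0)
lgs = np.array([[1000.0, 0.0, -1000.0], [0.5, 0.2, 0.1]])
ref_ans = np.array([0, 2])
print(stable_softcentropy(lgs, ref_ans))
print(stable_grad_softcentropy(lgs, ref_ans))
```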

Importing the dataset

Normalizing the inputs and creating a validation set

In [0]:

def dataLoad(flatten=False):
    #loading dataset from the keras library
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    # normalizing X
    x_train = x_train.astype(float) / 255.
    x_test = x_test.astype(float) / 255.
    # creating validation set
    x_train, x_val = x_train[:-10000], x_train[-10000:]
    y_train, y_val = y_train[:-10000], y_train[-10000:]
    if flatten:
        x_train = x_train.reshape([x_train.shape[0], -1])
        x_val = x_val.reshape([x_val.shape[0], -1])
        x_test = x_test.reshape([x_test.shape[0], -1])
    return x_train, y_train, x_val, y_val, x_test, y_test

Loading the dataset

In [8]:
x_train, y_train, x_val, y_val, x_test, y_test = dataLoad(flatten=True)    

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


Let's do data augmentation


In [26]:
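# Augment the MNIST training images only (not the validation or test data).
# ImageDataGenerator expects a 4-D array: (samples, height, width, channels).
x_train_images = x_train.reshape(-1, 28, 28, 1)
# Inputs are already scaled to [0, 1] in dataLoad, so no featurewise normalization or fit() is needed.
datamodder = ImageDataGenerator(
    rotation_range=20,        # random rotations of up to 20 degrees
    horizontal_flip=False)    # mirrored digits are not valid digits

# Draw one augmented batch and flatten it back to the (batch, 784) layout the network expects.
aug_x, aug_y = next(datamodder.flow(x_train_images, y_train, batch_size=32))
aug_x = aug_x.reshape(len(aug_x), -1)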




Creating a network of four dense layers: three hidden layers, each followed by a ReLU activation, and a linear output layer. The learning rate is set per layer (0.1 by default, 0.2 for the second hidden layer)

In [0]:
#creating an empty list to hold our layers.
neuralnet = []

#first hidden layer (takes the flattened input)
neuralnet.append(DenseLayer(x_train.shape[1],100))
neuralnet.append(ReLuLayer())
#second hidden layer
neuralnet.append(DenseLayer(100,200,learning_rate=0.2))
neuralnet.append(ReLuLayer())
#third hidden layer
neuralnet.append(DenseLayer(200,200,learning_rate=0.1))
neuralnet.append(ReLuLayer())
#final output layer (one logit per digit)
neuralnet.append(DenseLayer(200,10))

The forward pass takes the network created above and an input batch, calls forward on each layer in turn, and returns the list of every layer's output

In [0]:
def forward(neuralnet, x):
    ip = x
    acts = []
    for layer in neuralnet:
        # run each layer once and feed its output to the next layer
        ip = layer.forward(ip)
        acts.append(ip)
    assert len(acts) == len(neuralnet)
    return acts


Computing the network's predictions

In [0]:
def predict(neuralnet,x):
    lgs = forward(neuralnet,x)[-1]
    return lgs.argmax(axis=-1)

Now we train the network one mini-batch (x, y) at a time

In [0]:
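def train(nn,x,y):
    
    # getting layer activations 
    layer_acts = forward(nn,x)
    # layer_ips[i] is the input to layer i: x for the first layer, then the previous activation
    layer_ips = [x] + layer_acts
    lgs = layer_acts[-1]
    
    # compute the loss and its gradient with respect to the logits
    loss = softcentropy(lgs,y)
    l_grad = grad_softcentropy(lgs,y)
    # backpropagate through every layer from last to first, including layer 0
    for i in reversed(range(len(nn))):
        l_grad = nn[i].backward(layer_ips[i], l_grad)
    return np.mean(loss)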

Mini Batch Gradient Descent

In [0]:
def miniBatchMachineGun(ips, goals, sizeBatch, shuffle=False):
    assert len(ips) == len(goals)
    if shuffle:
        indexes = np.random.permutation(len(ips))
    for first_idx in trange(0, len(ips) - sizeBatch + 1, sizeBatch):
        if shuffle:
            ext = indexes[first_idx:first_idx + sizeBatch]
        else:
            ext = slice(first_idx, first_idx + sizeBatch)
        yield ips[ext], goals[ext]
        

Training the network and displaying the results


In [28]:
#two lists to hold the values of each cycle
training_data_logger = []

validation_data_logger = []

for eachCycle in range(5):

    #we are calling minibatch function that gives us batches of training data.

    for batchX,batchY in miniBatchMachineGun(x_train,y_train,sizeBatch=32,shuffle=True):

        train(neuralnet,batchX,batchY)
    
    #appending each cycle results accuracies to the lists
    training_data_logger.append(np.mean(predict(neuralnet,x_train)==y_train))

    validation_data_logger.append(np.mean(predict(neuralnet,x_val)==y_val))
    
    #printing the data results
    print("\n Round",eachCycle)
    print("\n Train accuracy:",training_data_logger[-1])
    print("\n Val accuracy:",validation_data_logger[-1])

100%|██████████| 1562/1562 [00:04<00:00, 353.81it/s]
  2%|▏         | 36/1562 [00:00<00:04, 355.15it/s]


 Round 0

 Train accuracy: 0.98984

 Val accuracy: 0.9551


100%|██████████| 1562/1562 [00:04<00:00, 359.48it/s]
  2%|▏         | 38/1562 [00:00<00:04, 369.10it/s]


 Round 1

 Train accuracy: 0.99352

 Val accuracy: 0.9577


100%|██████████| 1562/1562 [00:04<00:00, 365.92it/s]
  2%|▏         | 35/1562 [00:00<00:04, 346.38it/s]


 Round 2

 Train accuracy: 0.99296

 Val accuracy: 0.9565


100%|██████████| 1562/1562 [00:04<00:00, 360.88it/s]
  2%|▏         | 38/1562 [00:00<00:04, 372.88it/s]


 Round 3

 Train accuracy: 0.99136

 Val accuracy: 0.9514


100%|██████████| 1562/1562 [00:04<00:00, 362.34it/s]



 Round 4

 Train accuracy: 0.99188

 Val accuracy: 0.9534
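
In the output above the validation accuracy moves between 0.9514 and 0.9577 without a clear upward trend. One thing to try, which the to-do list also mentions, is a learning rate that changes per epoch. The sketch below shows exponential decay applied to each layer's `learning_rate` attribute. `exp_decay`, the `decay=0.5` factor and `StubLayer` (a stand-in for `DenseLayer`) are hypothetical and untuned; this is not something the notebook ran.

```python
def exp_decay(base_rate, epoch, decay=0.5):
    """Learning rate for a given epoch: base_rate * decay**epoch."""
    return base_rate * decay ** epoch

class StubLayer:
    """Stand-in for DenseLayer: only the learning_rate attribute matters here."""
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

layers = [StubLayer(0.1), StubLayer(0.2), StubLayer(0.1)]
# remember each layer's starting rate so the decay is applied to it every epoch
base_rates = [layer.learning_rate for layer in layers]

for epoch in range(5):
    for layer, base in zip(layers, base_rates):
        layer.learning_rate = exp_decay(base, epoch)
    # ... one pass over the mini-batches would go here ...
    print(epoch, [layer.learning_rate for layer in layers])
```

In the real network only the dense layers carry a `learning_rate`, so the loop would need to skip the ReLU layers, for example with `hasattr(layer, "learning_rate")`.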


Finally, let's compute the confusion matrix on the test set

In [33]:
from sklearn.metrics import confusion_matrix

y_pred = predict(neuralnet,x_test)
y_pred

array([7, 2, 1, ..., 4, 5, 6])

In [34]:
y_pred.shape

(10000,)

Confusion matrix (rows are the true digits, columns are the predicted digits)

In [37]:
confusion_matrix(y_test,y_pred)

array([[ 954,    0,    1,    2,    0,    3,   15,    1,    4,    0],
       [   0, 1120,    3,    2,    1,    0,    3,    2,    4,    0],
       [   7,    2,  979,   10,    4,    1,    6,    3,   19,    1],
       [   1,    0,   12,  961,    0,   14,    3,    3,   13,    3],
       [   2,    2,    4,    0,  937,    0,   20,    1,    2,   14],
       [   6,    0,    2,   26,    4,  828,   10,    0,   10,    6],
       [   8,    3,    1,    0,    8,    4,  931,    0,    3,    0],
       [   3,    8,   22,    5,    4,    1,    0,  966,    4,   15],
       [   4,    0,    8,    6,    5,    3,   14,    4,  929,    1],
       [   7,    4,    1,    8,   26,    7,    1,   14,   12,  929]])
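
The to-do list asks for the most common errors, and these can be read off the confusion matrix by ranking its off-diagonal entries. The sketch below copies the matrix printed above into an array and lists the largest confusions; `top_confusions` is a hypothetical helper name, not part of the notebook.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true digit, columns: predicted digit)
cm = np.array([
    [954,    0,   1,   2,   0,   3,  15,   1,   4,   0],
    [  0, 1120,   3,   2,   1,   0,   3,   2,   4,   0],
    [  7,    2, 979,  10,   4,   1,   6,   3,  19,   1],
    [  1,    0,  12, 961,   0,  14,   3,   3,  13,   3],
    [  2,    2,   4,   0, 937,   0,  20,   1,   2,  14],
    [  6,    0,   2,  26,   4, 828,  10,   0,  10,   6],
    [  8,    3,   1,   0,   8,   4, 931,   0,   3,   0],
    [  3,    8,  22,   5,   4,   1,   0, 966,   4,  15],
    [  4,    0,   8,   6,   5,   3,  14,   4, 929,   1],
    [  7,    4,   1,   8,  26,   7,   1,  14,  12, 929],
])

def top_confusions(cm, k=5):
    """Return the k largest off-diagonal entries as (true, predicted, count)."""
    errors = cm.copy()
    np.fill_diagonal(errors, 0)                       # ignore correct predictions
    order = np.argsort(errors, axis=None)[::-1][:k]   # flat indices, largest first
    rows, cols = np.unravel_index(order, errors.shape)
    return [(int(t), int(p), int(errors[t, p])) for t, p in zip(rows, cols)]

for true_digit, predicted, count in top_confusions(cm):
    print(f"true {true_digit} predicted as {predicted}: {count} times")
```

For this matrix the two largest confusions are tied at 26 each: a true 5 predicted as 3 and a true 9 predicted as 4.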

Accuracy score on the test set

In [38]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

0.9534