Our objective is to build a neural network that classifies the MNIST dataset. The network consists of an input layer with 784 nodes (one per image pixel), a hidden layer with 128 nodes, and an output layer with 10 nodes (one per digit class). Its structure is outlined below, where $X$ is the input, $A^{[0]}$ is the input layer, $Z^{[1]}$ is the unactivated layer 1, $A^{[1]}$ is the activated layer 1, and so forth. The weights and biases are denoted by $W$ and $b$ respectively:

<div align="center">

$A^{[0]}=X$

$Z^{[1]}=W^{[1]}A^{[0]}+b^{[1]}$

$A^{[1]}=\text{ReLU}(Z^{[1]})$

$Z^{[2]}=W^{[2]}A^{[1]}+b^{[2]}$

$A^{[2]}=\text{softmax}(Z^{[2]})$

$Loss=\text{cross-entropy-loss}(A^{[2]})$
</div>
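
To make the dimensions concrete, here is a minimal sketch of the forward pass above on random data. It assumes a column-per-sample layout, i.e. $X$ has shape (784, m). The batch size `m = 5`, the random inputs, and the helper name `forward` are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    """Forward pass for the 784-128-10 network; X has shape (784, m)."""
    A0 = X
    Z1 = W1 @ A0 + b1                                  # (128, m)
    A1 = np.maximum(0, Z1)                             # ReLU
    Z2 = W2 @ A1 + b2                                  # (10, m)
    expZ = np.exp(Z2 - Z2.max(axis=0, keepdims=True))  # shift for numerical stability
    A2 = expZ / expZ.sum(axis=0, keepdims=True)        # softmax over the 10 classes
    return Z1, A1, Z2, A2

rng = np.random.default_rng(0)
m = 5  # illustrative batch size
X = rng.random((784, m))
W1, b1 = rng.standard_normal((128, 784)) * 0.01, np.zeros((128, 1))
W2, b2 = rng.standard_normal((10, 128)) * 0.01, np.zeros((10, 1))

Z1, A1, Z2, A2 = forward(X, W1, b1, W2, b2)
print(Z1.shape, A1.shape, Z2.shape, A2.shape)
```

Each column of `A2` is a probability distribution over the 10 digit classes for one input image.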

You are free to create any function inside or outside the class and to modify parameters as needed.

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
mnist = tf.keras.datasets.mnist
import matplotlib.pyplot as plt

Now you will implement the activation function (ReLU) and the softmax function.

In [5]:
def ReLU(Z):
    return np.maximum(0, Z)

def softmax(Z):
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / expZ.sum(axis=0, keepdims=True)
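
A quick sanity check can confirm that the two functions behave as intended. The sketch below redefines them so that it runs on its own, and the logits are made up. It checks that each softmax column sums to 1 and that subtracting the column maximum keeps the exponentials finite for large logits.

```python
import numpy as np

def ReLU(Z):
    return np.maximum(0, Z)

def softmax(Z):
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / expZ.sum(axis=0, keepdims=True)

# made-up logits: 3 classes (rows), 2 samples (columns)
Z = np.array([[1.0, 1000.0],
              [2.0, 1001.0],
              [-3.0, 999.0]])

probs = softmax(Z)
relu_out = ReLU(Z[:, 0])

print(relu_out)                   # the negative entry is clipped to 0
print(probs.sum(axis=0))          # each column should sum to 1
print(np.isfinite(probs).all())   # no overflow despite logits near 1000
```

Without the `np.max` shift, `np.exp(1000.0)` would overflow to `inf` and the second column would contain `nan`.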

Now comes the important part.

Here you will implement the NN class, the model you will train on the data and later use for prediction.

You have been given the init function; you have to implement all the other functions yourself, in any way you like. You may even skip some of them if you don't need them in the final implementation of the class.

The description of each function is given in the comments.

In [6]:
class NN:
    def __init__(self, input_size=784, hidden_size=128, output_size=10, learning_rate=0.01):
        # Initialized basic stats of NN
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.learning_rate = learning_rate

        # Initialized weights and biases
        self.W1 = np.random.randn(hidden_size, input_size) * 0.01
        self.b1 = np.zeros((hidden_size, 1))
        self.W2 = np.random.randn(output_size, hidden_size) * 0.01
        self.b2 = np.zeros((output_size, 1))

        # Initialized activations and gradients
        self.A0 = None
        self.Z1 = None
        self.A1 = None
        self.Z2 = None
        self.A2 = None
        self.dW2 = None
        self.db2 = None
        self.dW1 = None
        self.db1 = None


    # do the forward pass and evaluate the values of A0, Z1, A1, Z2, A2
    def forward_propagation(self, X):
        self.A0 = X
        self.Z1 = np.dot(self.W1, self.A0) + self.b1
        self.A1 = ReLU(self.Z1)
        self.Z2 = np.dot(self.W2, self.A1) + self.b2
        self.A2 = softmax(self.Z2)
        return self.A2

    # convert the input y, into a one hot encoded array.
    '''
    one hot encoding is:
    you have an array with values [2, 5, 6] and you know the max value can be 8, then one hot encoded array will be:
    [[0,0,1,0,0,0,0,0,0], [0,0,0,0,0,1,0,0,0], [0,0,0,0,0,0,1,0,0]]
    Note that the index 2, 5, 6 have values 1 and all others have values 0
    '''
    def one_hot(self, y):
        one_hot_y = np.eye(self.output_size)[y.reshape(-1)]
        return one_hot_y.T

    # calculate the derivative of the loss function with respect to W2, b2, W1, b1 in dW2, db2, dW1, db1 respectively
    def backward_propagation(self, X, y):
        m = X.shape[1]
        one_hot_y = self.one_hot(y)
        self.dZ2 = self.A2 - one_hot_y
        self.dW2 = np.dot(self.dZ2, self.A1.T) / m
        self.db2 = np.sum(self.dZ2, axis=1, keepdims=True) / m
        self.dA1 = np.dot(self.W2.T, self.dZ2)
        self.dZ1 = self.dA1 * (self.Z1 > 0)
        self.dW1 = np.dot(self.dZ1, self.A0.T) / m
        self.db1 = np.sum(self.dZ1, axis=1, keepdims=True) / m

    # update the parameters W1, W2, b1, b2
    def update_params(self):
        self.W1 -= self.learning_rate * self.dW1
        self.b1 -= self.learning_rate * self.db1
        self.W2 -= self.learning_rate * self.dW2
        self.b2 -= self.learning_rate * self.db2

    # get the predictions for the dataset
    def get_predictions(self, X):
        A2 = self.forward_propagation(X)
        return np.argmax(A2, axis=0)

    # get the accuracy of the model on the dataset
    def get_accuracy(self, X, y):
        predictions = self.get_predictions(X)
        return np.mean(predictions == y)

    # run gradient descent on the model to get the values of the parameters
    def gradient_descent(self, X, y, iters=1000):
        for i in range(iters):
            self.forward_propagation(X)
            self.backward_propagation(X, y)
            self.update_params()
            if i % 100 == 0:
                print(f"Iteration {i}: Loss = {self.cross_entropy_loss(X, y)}")

    # evaluate loss using cross-entropy-loss formula.
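    def cross_entropy_loss(self, X, y):
        m = X.shape[1]
        # probability assigned to the true class of each sample (uses A2 from the latest forward pass)
        true_class_probs = self.A2[y, np.arange(m)]
        # the small epsilon guards against log(0)
        loss = -np.sum(np.log(true_class_probs + 1e-12)) / m
        return loss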

    # Let me help a bit hehe :)
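    def show_predictions(self, X, y, num_samples=10):
        # X has shape (784, m): one flattened image per column
        random_indices = np.random.randint(0, X.shape[1], size=num_samples)
        # run the forward pass once instead of once per sample
        predictions = np.argmax(self.forward_propagation(X), axis=0)

        for index in random_indices:
            sample_image = X[:, index].reshape((28, 28))
            plt.imshow(sample_image, cmap='gray')
            plt.title(f"Actual: {y[index]}, Predicted: {predictions[index]}")
            plt.show()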
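
The expression `dZ2 = A2 - one_hot_y` used in `backward_propagation` can be checked numerically. The sketch below is a standalone check and not part of the class. It compares that expression against central finite differences of the cross-entropy summed over samples, since the division by `m` is applied later in `dW2` and `db2`. The helper name `summed_loss`, the random logits, and the labels are illustrative assumptions.

```python
import numpy as np

def softmax(Z):
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / expZ.sum(axis=0, keepdims=True)

def summed_loss(Z, y):
    """Cross-entropy summed over samples; Z is (classes, m), y holds integer labels."""
    A = softmax(Z)
    return -np.sum(np.log(A[y, np.arange(Z.shape[1])]))

rng = np.random.default_rng(0)
Z = rng.standard_normal((10, 4))   # made-up logits for 4 samples
y = np.array([3, 0, 9, 3])         # made-up labels

one_hot_y = np.eye(10)[y].T        # (10, 4), same layout as NN.one_hot
analytic = softmax(Z) - one_hot_y  # same form as dZ2 in backward_propagation

# central finite differences, perturbing one logit at a time
numeric = np.zeros_like(Z)
eps = 1e-6
for i in range(Z.shape[0]):
    for j in range(Z.shape[1]):
        Zp, Zm = Z.copy(), Z.copy()
        Zp[i, j] += eps
        Zm[i, j] -= eps
        numeric[i, j] = (summed_loss(Zp, y) - summed_loss(Zm, y)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

A very small `max_diff` suggests the analytic gradient matches the loss. The same idea could be extended to `dW1` and `dW2`, at a higher computational cost.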

Now we load the dataset, which comes already split into training and testing data, and normalize it.

In [7]:
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

# statistics are computed over samples and image rows, giving one value per image column (shape (1, 1, 28))
miu = np.mean(X_train, axis=(0, 1), keepdims=True)
stds = np.std(X_train, axis=(0, 1), keepdims=True)

# normalize both splits with the training statistics so they share the same scale
X_normal_train = (X_train - miu) / (stds + 1e-7)
X_normal_test = (X_test - miu) / (stds + 1e-7)

X_normal_train = X_normal_train.reshape((60000, -1))
X_normal_test = X_normal_test.reshape((10000, -1))
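
Note that the reshape above yields arrays of shape (samples, 784), while `forward_propagation` computes `np.dot(self.W1, self.A0)` with `W1` of shape (128, 784), so the input it receives needs shape (784, samples). The sketch below illustrates the two layouts on a small made-up array (6 fake images instead of 60000); the variable names are illustrative.

```python
import numpy as np

# 6 fake 28x28 images standing in for the MNIST arrays
images = np.arange(6 * 28 * 28, dtype=float).reshape(6, 28, 28)

flat = images.reshape((images.shape[0], -1))  # (6, 784): one image per row
columns = flat.T                              # (784, 6): one image per column

W1 = np.zeros((128, 784))
print(flat.shape, columns.shape)
print(np.dot(W1, columns).shape)              # (128, 6)

try:
    np.dot(W1, flat)                          # (128, 784) x (6, 784) is not aligned
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)
```

Transposing does not copy or reorder the pixel values within an image: column `k` of `columns` reshapes back to image `k`.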

Now you will train the model on the X_normal_train and Y_train dataset, then print the accuracy of the model on the X_normal_test and Y_test dataset.

In [11]:
Y_train_oh = np.eye(10)[Y_train].T
Y_test_oh = np.eye(10)[Y_test].T

model = NN()

# the NN class expects inputs of shape (784, samples), so transpose the (samples, 784) arrays
model.gradient_descent(X_normal_train.T, Y_train, iters=1000)

train_accuracy = model.get_accuracy(X_normal_train.T, Y_train)
test_accuracy = model.get_accuracy(X_normal_test.T, Y_test)

print("Train Accuracy : " + str(train_accuracy*100))
print("Test Accuracy : " + str(test_accuracy*100))