Importing NumPy and the DatasetGenerator class, which contains custom functions for handling MNIST.

In [1]:
import numpy as np
from dataset_generator import DatasetGenerator

ReLU activation and gradient function. The activation takes an array and replaces every negative value with 0, leaving positive values unchanged. The gradient converts the array into a mask of the same shape where each value above 0 becomes 1 and everything else (including 0 itself) becomes 0. The function returns None unless exactly one of activation or gradient is set.

In [2]:
def get_relu(func, activation=False, gradient=False):
    if activation and not gradient:
        relu = np.maximum(0, func)
        return relu
    elif not activation and gradient:
        gradient = np.where(func > 0, 1, 0)
        return gradient
    else:
        return None
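
As a quick check of the behaviour described above, the sketch below applies the same two NumPy expressions to a small array. The input values are made up purely for illustration.

```python
import numpy as np

# Hypothetical pre-activation values, chosen only for illustration
z = np.array([[-2.0, 0.0, 3.5]])

relu_out = np.maximum(0, z)        # negatives are clamped to 0, positives pass through
relu_grad = np.where(z > 0, 1, 0)  # 1 where z > 0, otherwise 0 (including z == 0)

print(relu_out)
print(relu_grad)
```

Note that the gradient at exactly 0 is taken to be 0 here, which matches the strict comparison used in get_relu.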

Sigmoid activation and gradient function. The activation applies the mathematical function 1 / (1 + e^(-x)) to each element of the array, that is, 1 divided by the sum of 1 and the exponential of the negated element. The gradient reuses the sigmoid output s and multiplies it by (1 - s), which is the derivative of the sigmoid.

In [3]:
def get_sigmoid(func, activation=False, gradient=False):
    sigmoid = 1 / (1 + np.exp(-func))
    if activation and not gradient:
        return sigmoid
    elif not activation and gradient:
        gradient = sigmoid * (1 - sigmoid)
        return gradient
    else:
        return None
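
One way to sanity-check the derivative s * (1 - s) is to compare it with a central finite difference. The sketch below does this on a few made-up inputs; the helper name sigmoid is local to this example and is not part of the notebook's code.

```python
import numpy as np

def sigmoid(z):
    # Same formula as in get_sigmoid: 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))

# Hypothetical inputs, chosen only for illustration
z = np.array([-1.0, 0.0, 1.0])

s = sigmoid(z)
grad = s * (1 - s)  # analytic derivative

# Central finite difference as an independent estimate of the derivative
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

print(np.allclose(grad, numeric, atol=1e-6))
```

At z = 0 the sigmoid is 0.5 and its derivative peaks at 0.25, so gradients shrink as inputs move away from 0 in either direction.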

Only the Softmax gradient is implemented here; I have had trouble implementing the activation code. The gradient takes the model's predicted labels and the truth labels, and uses the number of samples in the predictions (the first dimension of model_labels) as the batch size. It then calculates the difference between the predictions and the truth labels and divides it by that batch size. This is the form the gradient takes when softmax is combined with cross-entropy loss.

In [4]:
def get_softmax(func=None, model_labels=None, truth_labels=None, activation=False, gradient=False):
    batch_size = model_labels.shape[0]
    gradient = (model_labels - truth_labels) / batch_size
    return gradient
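
For reference, a softmax activation is commonly written as below. This is only a minimal sketch and not the notebook's own implementation: the function name softmax and the example logits are assumptions made for illustration. Subtracting the row maximum before exponentiating is a standard trick to avoid overflow and does not change the result.

```python
import numpy as np

def softmax(logits):
    # Hypothetical helper: row-wise softmax over a (batch, classes) array.
    # Shifting by the row maximum keeps np.exp from overflowing.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

# Made-up logits; the second row would overflow without the shift
logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])
probs = softmax(logits)

print(probs.sum(axis=1))  # each row sums to 1
```

With outputs produced this way, (probs - truth_labels) / batch_size is the gradient of the mean cross-entropy loss with respect to the logits.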

Initialisation of the neural network with hyperparameters. The DatasetGenerator class supplies the sizes of the input and output layers, and the full layer architecture is then built by placing the user's chosen hidden layers (for example a first and second hidden layer) between them.

The biases are created as one array of zeros per layer transition, with shape (1, n), where n is the number of neurons in the next layer, so each element is the bias of one neuron in that layer.

The weights are created as a matrix of random values drawn from a standard normal distribution, with one row per neuron in the current layer and one column per neuron in the next layer. Each matrix is then scaled by the square root of 2 divided by the size of the current layer in the iteration (He initialisation).

In [5]:
class NeuralNetwork:
    def __init__(self, hidden_layers, activation):
        input_layers, output_layers = DatasetGenerator().get_layers()

        # Build a new list so the caller's hidden_layers list is not mutated
        self.layers = [input_layers, *hidden_layers, output_layers]
        self.num_layers = len(self.layers)
        self.biases = [np.zeros((1, self.layers[i + 1])) for i in range(self.num_layers - 1)]
        self.activation = activation
        self.activations = []
        self.vect_transfer_list = []

        self.weights = []
        for layer in range(self.num_layers - 1):
            random_matrix = np.random.randn(self.layers[layer], self.layers[layer + 1])
            scale = np.sqrt(2 / self.layers[layer])
            self.weights.append(random_matrix * scale)
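
The shapes and scaling described above can be checked in isolation. The sketch below assumes layer sizes of 784, 128, 64 and 10; these numbers are placeholders, since the real input and output sizes come from DatasetGenerator.get_layers() and the hidden sizes are the user's choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed layer sizes for illustration only (input, two hidden, output)
layers = [784, 128, 64, 10]

# One weight matrix per transition, scaled by sqrt(2 / fan_in)
weights = [
    rng.standard_normal((n_in, n_out)) * np.sqrt(2 / n_in)
    for n_in, n_out in zip(layers[:-1], layers[1:])
]
# One zero bias row per transition, sized to the next layer
biases = [np.zeros((1, n_out)) for n_out in layers[1:]]

for w, b in zip(weights, biases):
    print(w.shape, b.shape, round(float(w.std()), 3))
```

The printed standard deviation of each matrix should sit close to sqrt(2 / fan_in), so layers with more inputs start with smaller weights.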

Forward pass. For each layer, the current outputs are multiplied (dot product) by the layer's weights and the layer's bias is added. This vector transformation is stored in a list of transformations; then, depending on the choice of activation (sigmoid or relu), the result is "activated" and appended to the list of activated outputs, which also holds the input data as its first entry. The activated output of the final layer is returned.

In [6]:
    def forward(self, data):
        self.activations = []
        self.vect_transfer_list = []

        outputs = data
        self.activations.append(outputs)

        for layer in range(self.num_layers - 1):
            i = np.dot(outputs, self.weights[layer])
            vect_transfer = i + self.biases[layer]
            self.vect_transfer_list.append(vect_transfer)

            if self.activation == 'sigmoid':
                outputs = get_sigmoid(vect_transfer, activation=True)
            elif self.activation == 'relu':
                outputs = get_relu(vect_transfer, activation=True)

            self.activations.append(outputs)

        return outputs
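
To make a single step of the loop concrete, the sketch below runs one layer by hand on a batch of two samples. All values are made up for illustration; the point is the shapes and how the (1, 2) bias row is broadcast across the batch.

```python
import numpy as np

# Hypothetical batch: 2 samples with 3 features each
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
# Hypothetical weights: 3 inputs feeding 2 neurons
w = np.array([[1.0, -1.0],
              [0.0,  1.0],
              [1.0,  0.0]])
# Bias row of shape (1, 2), broadcast over both samples
b = np.array([[0.5, -2.0]])

z = np.dot(x, w) + b   # vector transformation, shape (2, 2)
a = np.maximum(0, z)   # ReLU activation of that transformation

print(z)
print(a)
```

The second neuron ends up negative for both samples before activation, so ReLU sets its output to 0 while the first neuron passes through unchanged.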

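Backward pass calculates the gradients from the output layer back to the input layer (reverse order of layers). The initial delta, the error between the predictions and the truth labels, is calculated using the softmax gradient. At each layer the delta is multiplied by the activation gradient of that layer's vector transformation, and the result gives the weight and bias gradients, both divided by the batch size. The delta is then propagated to the previous layer through the transpose of the current layer's weights. Once all gradients are collected, the weights and biases of every layer are adjusted by the learning rate times their gradient to reduce the loss on subsequent epochs.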

In [7]:
    def backward(self, data, truth_labels, model_labels, learning_rate):
        batch_size = data.shape[0]
        gradients_weights = []
        gradients_biases = []

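        # Select the gradient function matching the forward-pass activation
        if self.activation == 'sigmoid':
            activation_gradient = get_sigmoid
        elif self.activation == 'relu':
            activation_gradient = get_relu
        else:
            raise ValueError(f"Unsupported activation: {self.activation!r}")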

        delta = get_softmax(model_labels=model_labels, truth_labels=truth_labels, gradient=True)

        for layer in range(self.num_layers - 1, 0, -1):
            output = self.activations[layer - 1]
            vect_transfer = self.vect_transfer_list[layer - 1]

            gradients_vect_transform = activation_gradient(vect_transfer, gradient=True)
            gradients_vect_transform = gradients_vect_transform * delta

            gradients_weight = np.dot(output.T, gradients_vect_transform)
            gradients_weight = gradients_weight / batch_size

            gradients_bias = np.sum(gradients_vect_transform, axis=0, keepdims=True)
            gradients_bias = gradients_bias / batch_size

            delta = np.dot(gradients_vect_transform, self.weights[layer - 1].T)

            gradients_weights.append(gradients_weight)
            gradients_biases.append(gradients_bias)

        gradients_weights.reverse()
        gradients_biases.reverse()

        for layer in range(self.num_layers - 1):
            self.weights[layer] -= learning_rate * gradients_weights[layer]
            self.biases[layer] -= learning_rate * gradients_biases[layer]
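
A common way to gain confidence in backpropagation code is a numerical gradient check. The sketch below is a simplified, standalone illustration and not the notebook's network: it assumes a single linear layer followed by a softmax output, with random made-up data, and compares the analytic weight gradient x.T @ delta against central finite differences of the cross-entropy loss. The helper names softmax and loss_fn are hypothetical.

```python
import numpy as np

def softmax(z):
    # Hypothetical helper: row-wise softmax with the usual max shift
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_fn(w, x, y):
    # Mean cross-entropy of a single linear layer with softmax output
    p = softmax(x @ w)
    return -np.sum(y * np.log(p + 1e-10)) / x.shape[0]

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))   # 4 samples, 3 features (made up)
y = np.eye(2)[[0, 1, 1, 0]]       # one-hot truth labels for 2 classes
w = rng.standard_normal((3, 2))

# Analytic gradient: the delta is divided by the batch size once
delta = (softmax(x @ w) - y) / x.shape[0]
analytic = x.T @ delta

# Numerical gradient via central differences, one weight at a time
numeric = np.zeros_like(w)
eps = 1e-6
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i, j] += eps
        w_minus[i, j] -= eps
        numeric[i, j] = (loss_fn(w_plus, x, y) - loss_fn(w_minus, x, y)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

If the two gradients agree to within a small tolerance, the analytic expression matches the loss being minimised; a large difference usually points to a missing or duplicated scaling factor, or a mismatch between the output activation and the loss.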

The loss calculation and prediction functions both run the data subset through the network:

    calculate_loss multiplies the truth labels by the log of the predicted labels (model), with 1e-10 added to the predictions so the log never reaches negative infinity, then divides the negative sum by the number of samples in the data subset (this is an interpretation of cross-entropy loss)

    predict runs the data through the neural network and picks, for each sample, the label with the highest score

In [8]:
    def calculate_loss(self, data, truth_labels):
        model_labels = self.forward(data)
        loss = -np.sum(truth_labels * np.log(model_labels + 1e-10)) / data.shape[0]
        return loss

    def predict(self, data):
        model_label = self.forward(data)
        return np.argmax(model_label, axis=1)
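
The two calculations above can be tried on a tiny hand-written example. The predicted scores and one-hot truth labels below are made up for illustration; the accuracy line is an extra step that is not part of the class.

```python
import numpy as np

# Hypothetical model outputs for 2 samples and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
# Hypothetical one-hot truth labels (classes 0 and 2)
truth = np.array([[1, 0, 0],
                  [0, 0, 1]])

# Cross-entropy: only the score assigned to the true class contributes
loss = -np.sum(truth * np.log(probs + 1e-10)) / probs.shape[0]

# Prediction: index of the highest score in each row
predictions = np.argmax(probs, axis=1)

# Extra step for illustration: fraction of predictions matching the truth
accuracy = np.mean(predictions == np.argmax(truth, axis=1))

print(loss, predictions, accuracy)
```

The first sample is classified correctly and contributes a small loss, while the second puts only 0.1 on the true class, which dominates the total.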