In [1]:
import numpy as np
np.random.seed(0)

# 1. Introduction

This notebook builds a working neural network in NumPy and discusses how it works. Variable names are shown in bold when referred to in markdown cells.

## 1.1. Setup

The first thing we need to do is create the data set. For that, we need to decide how we take the input and how we give the output. In this case, I have decided to take the input as a string of 5 digits separated by spaces. The first two digits form the first binary number and the third and fourth digits form the second binary number. The last digit decides whether we want an XOR (0) or an XNOR (1). The output is an array of two digits, which is our binary result.

I'm going to use NumPy arrays to represent inputs and outputs. Each input is a NumPy array of shape (5,) and each output/label array has shape (2,).

For this, we'll need a function to compute the XOR and another to compute the XNOR. Once we have those, we generate the dataset with a function that returns our training set __train__ and the corresponding __labels__.
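
As a small illustration of the input encoding described above, the sketch below parses one input string into an array of shape (5,) and splits it into its parts. The helper name `parse_input` is hypothetical and not part of the notebook.

```python
import numpy as np

def parse_input(text):
    """Parse a string like '1 0 0 1 0' into an integer array of shape (5,)."""
    values = np.array([int(tok) for tok in text.split()])
    if values.shape != (5,) or not np.isin(values, (0, 1)).all():
        raise ValueError("expected exactly five space-separated bits")
    return values

x = parse_input("1 0 0 1 0")
# First binary number, second binary number, and the XOR/XNOR selector
first, second, choice = x[0:2], x[2:4], x[4]
print(first, second, choice)
```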

In [2]:
# Both functions take a sequence of 4 bits: arr[0:2] is the first binary number, arr[2:4] the second.

def xor(arr):
    res = [0, 0]
    
    if arr[0] != arr[2]:
        res[0] = 1
    
    if arr[1] != arr[3]:
        res[1] = 1
    
    return res

def xnor(arr):
    res = [0, 0]
    
    if arr[0] == arr[2]:
        res[0] = 1
    
    if arr[1] == arr[3]:
        res[1] = 1
        
    return res
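
The same bitwise logic can also be written without explicit `if` statements. The following is a minimal sketch using NumPy's `bitwise_xor`; the names `xor_vec` and `xnor_vec` are hypothetical alternatives, not functions used elsewhere in this notebook.

```python
import numpy as np

def xor_vec(arr):
    """Bitwise XOR of the two 2-bit numbers stored in arr[0:2] and arr[2:4]."""
    arr = np.asarray(arr)
    return np.bitwise_xor(arr[0:2], arr[2:4])

def xnor_vec(arr):
    """XNOR is the complement of XOR on each bit."""
    return 1 - xor_vec(arr)

print(xor_vec([1, 0, 0, 0]), xnor_vec([1, 0, 0, 0]))
```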

In [3]:
"""
Now we create a function to map all possible values of the inputs to all their respective labels 
and create two numpy arrays shaped (32, 5) and (32, 2) respectively called train and labels.
"""

def createTrain():
    train = np.zeros((32, 5))
    labels = np.zeros((32, 2))
    count = 0

    # Enumerate all 2**5 = 32 combinations of the four input bits and the choice bit
    for a1 in range(2):
        for a2 in range(2):
            for a3 in range(2):
                for a4 in range(2):
                    for choice in range(2):
                        bits = [a1, a2, a3, a4]
                        train[count] = bits + [choice]
                        # choice 0 selects XOR, choice 1 selects XNOR
                        # (compare with ==, not `is`, which tests object identity)
                        labels[count] = xor(bits) if choice == 0 else xnor(bits)
                        count += 1

    return (train.astype(int), labels.astype(int))

In [4]:
train, labels = createTrain()
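
As a cross-check on the shapes and ordering, the same 32 rows can be generated compactly. This is a sketch under the assumption that rows are ordered like the nested loops (last bit varying fastest); the names `rows`, `xor_bits` and `targets` are illustrative only.

```python
import itertools
import numpy as np

# Every (a1, a2, a3, a4, choice) combination, last element varying fastest
rows = np.array(list(itertools.product([0, 1], repeat=5)))

# Bitwise XOR of the first two bits with the third and fourth bits
xor_bits = rows[:, 0:2] ^ rows[:, 2:4]
# choice == 0 keeps the XOR result, choice == 1 flips it to XNOR
targets = np.where(rows[:, 4:5] == 0, xor_bits, 1 - xor_bits)

print(rows.shape, targets.shape)
```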

## 1.2. Structure of Neural Network

Now we need to decide how our neural network is going to look. In this case, I'm going to use a simple feed-forward neural network with 2 hidden layers. The input layer has dimension 5, both hidden layers have dimension 8, and the output layer has dimension 2. The structure defined below uses a sigmoid activation on every layer; ReLU is also implemented and can be selected for the hidden layers, but the last layer should stay a sigmoid so that the final values lie between 0 and 1.

For clarity, we'll declare a variable called __nn_structure__ to define these layers.

In [5]:
nn_structure = [{"input_dim" : 5, "output_dim" : 8, "activation" : 'sigmoid'},
                {"input_dim" : 8, "output_dim" : 8, "activation" : 'sigmoid'},
                {"input_dim" : 8, "output_dim" : 2, "activation" : 'sigmoid'}]
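
A quick sanity check on a structure like this is that each layer's output dimension matches the next layer's input dimension. The sketch below repeats the structure under the illustrative name `layers` so that it runs on its own, and also counts the trainable parameters (weights plus biases).

```python
layers = [{"input_dim": 5, "output_dim": 8, "activation": "sigmoid"},
          {"input_dim": 8, "output_dim": 8, "activation": "sigmoid"},
          {"input_dim": 8, "output_dim": 2, "activation": "sigmoid"}]

# Consecutive layers must agree on the size of the vector passed between them
for prev, nxt in zip(layers, layers[1:]):
    assert prev["output_dim"] == nxt["input_dim"], "layer dimensions do not chain"

# Each layer has an (output_dim x input_dim) weight matrix and output_dim biases
n_params = sum(l["input_dim"] * l["output_dim"] + l["output_dim"] for l in layers)
print(n_params)
```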

## 1.3. Activation Functions

Let's define the activation functions and their derivatives, which will be needed for the backward pass.

In [13]:
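def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def softmax(Z):
    # Subtract the maximum for numerical stability, then normalise over all entries
    expZ = np.exp(Z - np.max(Z))
    return expZ / np.sum(expZ)

def relu_backward(dA, Z):
    # Work on a copy so the caller's gradient array is not modified in place
    dZ = np.array(dA, copy=True)
    dZ[Z < 0] = 0
    return dZ

def sigmoid_backward(dA, Z):
    s = sigmoid(Z)
    return dA * s * (1 - s)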
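
One way to gain confidence in a derivative like this is to compare it with a finite-difference estimate. The following is a minimal sketch of such a check for the sigmoid; it redefines the two functions so that it runs on its own, and the step size `eps` is an arbitrary choice.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def sigmoid_backward(dA, Z):
    s = sigmoid(Z)
    return dA * s * (1 - s)

Z = np.linspace(-3, 3, 7).reshape(-1, 1)
eps = 1e-6

# Central finite difference approximates d sigmoid / dZ
numeric = (sigmoid(Z + eps) - sigmoid(Z - eps)) / (2 * eps)
# With dA = 1 the backward function returns the plain derivative
analytic = sigmoid_backward(np.ones_like(Z), Z)

max_err = np.max(np.abs(numeric - analytic))
print(max_err)
```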


# 2. Setup Neural Network

Now that we have set up our dataset and know what our neural network is going to look like, let's create a model class. The class needs to train on the given dataset and then be able to take an input and predict the output.

For this we'll need the following functions:
1. \_\_init\_\_ : This will store the structure, create some useful variables and initialise the weights and biases randomly
2. forward : This will only go through one layer
3. full_forward : This will apply forward on all the layers
4. loss : This will calculate the cross-entropy loss for each entry (MSEloss is a mean-squared-error alternative)
5. backward : This will go back one layer and compute what needs to be updated
6. full_backward : This will apply backward on all layers
7. update_parameters : This will update all our parameters by gradient descent
8. get_accuracy : This will calculate the accuracy of our predictions by comparing them with the respective labels
9. train : This will run the forward pass, backward pass and update for every training example in each epoch
10. \_\_call\_\_ : This will take an input string and return the thresholded prediction

Our weights, biases and pre-activation neuron values will be represented by __W__, __b__ and __Z__ followed by the index of the layer they belong to (starting from layer 1). The activated neuron values will be represented by __A__ followed by the index of the layer (starting from layer 0), so __A0__ is the input. The final __A__ returned by the forward pass is our output.
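
In this notation, the per-layer computations can be sketched as follows for a single input column. The superscript $[l]$ stands for the layer index appended to the variable names, $g$ for that layer's activation and $\odot$ for element-wise multiplication; these symbols are introduced here for illustration only.

$$
\begin{aligned}
Z^{[l]} &= W^{[l]} A^{[l-1]} + b^{[l]}, & A^{[l]} &= g\left(Z^{[l]}\right) \\
dZ^{[l]} &= dA^{[l]} \odot g'\left(Z^{[l]}\right), & dW^{[l]} &= dZ^{[l]} \left(A^{[l-1]}\right)^{T} \\
db^{[l]} &= dZ^{[l]}, & dA^{[l-1]} &= \left(W^{[l]}\right)^{T} dZ^{[l]}
\end{aligned}
$$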

In [7]:
class NumpyNeuralNetwork():
    def __init__(self, nn_structure):
        self.structure = nn_structure
        self.parameters = {}
        self.memory = {}
        self.update = {}
        self.cost_history = []
        self.accuracy_history = []
        
        for i, layer in enumerate(self.structure):
            layer_index = i + 1

            input_dim = layer["input_dim"]
            output_dim = layer["output_dim"]

            self.parameters['W' + str(layer_index)] = np.random.randn(output_dim, input_dim) * 0.1
            self.parameters['b' + str(layer_index)] = np.random.randn(output_dim, 1) * 0.1
    
    # A_ represents the neuron values of the current layer. A represents the neuron values of the next layer
    def forward(self, A_, W, b, activation):
        Z = np.dot(W, A_) + b

        # Compare strings with ==; `is` tests object identity and is not reliable here
        if activation == "relu":
            act = relu
        elif activation == "sigmoid":
            act = sigmoid
        else:
            raise Exception("Activation not found")
        
        A = act(Z)
        
        return (A, Z)
    
    # In the following function, X is our input as a NumPy column vector of shape (5, 1)
    def full_forward(self, X):
        A = X

        for i, layer in enumerate(self.structure):
            layer_index = i + 1

            A_ = A

            act = layer["activation"]

            W = self.parameters['W' + str(layer_index)]
            b = self.parameters['b' + str(layer_index)]

            A, Z = self.forward(A_, W, b, act)
            
            self.memory['A' + str(i)] = A_
            self.memory['Z' + str(layer_index)] = Z

        return A
    
    # Next, we'll calculate the cross entropy loss for our parameters in the next function
    def loss(self, A, Y):
        # Binary cross-entropy summed element-wise over the output units (note the leading minus sign)
        cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
        return np.squeeze(cost)
    
    def MSEloss(self, A, Y):
        cost = 1 / 2 * np.sum((A - Y)**2)
        return np.squeeze(cost)

    # Next, we'll compute the backward pass for one layer by applying the chain rule
    def backward(self, dA, W, b, Z, A_, activation):
        if activation == "relu":
            act = relu_backward
        elif activation == "sigmoid":
            act = sigmoid_backward
        else:
            raise Exception("Activation not found")

        dZ = act(dA, Z)
        dW = np.dot(dZ, A_.T)
        db = np.sum(dZ, axis=1, keepdims=True)
        dA_ = np.dot(W.T, dZ)

        return dA_, dW, db
    
    # Same as the full_forward function, we make a corresponding full_backward function
    def full_backward(self, A, labels):
        labels = labels.reshape(A.shape)

        #dA_ = - (A - labels)
        dA_ = - (np.divide(labels, A) - np.divide(1 - labels, 1 - A))

        for i_, layer in reversed(list(enumerate(self.structure))):
            i = i_ + 1
            act = layer["activation"]

            dA = dA_

            A_ = self.memory["A" + str(i_)]
            Z = self.memory["Z" + str(i)]

            W = self.parameters["W" + str(i)]
            b = self.parameters["b" + str(i)]

            dA_, dW, db = self.backward(dA, W, b, Z, A_, act)

            self.update["dW" + str(i)] = dW
            self.update["db" + str(i)] = db
    
    # Now that we have calculated the gradients for all the parameters, we update them
    def update_parameters(self, learning_rate):
        for i, layer in enumerate(self.structure):
            layer_index = i + 1

            self.parameters['W' + str(layer_index)] -= learning_rate * self.update['dW' + str(layer_index)]
            self.parameters['b' + str(layer_index)] -= learning_rate * self.update['db' + str(layer_index)]
     
    # We could use a function to calculate the accuracy of the model too
    def get_accuracy(self, A, labels, x=0.5):
        A_ = np.array(A, copy=True)
        A_[A_ < x] = 0
        A_[A_ >= x] = 1

        return (A_ == labels).mean()
    
    # Let's make a function to train the model
    def train(self, train, labels, num_epochs, lr=0.1):
        self.cost_history, self.accuracy_history = [], []
        for _ in range(num_epochs):
            for i in range(len(train)):
                X = train[i].reshape((-1, 1))
                Y = labels[i].reshape((-1, 1))

                A = self.full_forward(X)

                self.full_backward(A, Y)
                
                self.update_parameters(lr)

                accuracy = self.get_accuracy(A, Y, 0.5)
                self.accuracy_history.append(accuracy)

            # Note: this records the MSE of the last training example of the epoch only
            cost = self.MSEloss(A, Y)
            print(cost)
            self.cost_history.append(cost)
    
    # Lastly, let's make the model callable so that we can pass any input string, e.g. '1 0 0 1 0'
    def __call__(self, X):
        X = np.array([int(x) for x in X.split(' ')]).reshape((-1, 1))
        
        A = X
        
        for i, layer in enumerate(self.structure):
            layer_index = i + 1

            A_ = A

            act = layer["activation"]

            W = self.parameters['W' + str(layer_index)]
            b = self.parameters['b' + str(layer_index)]

            A, Z = self.forward(A_, W, b, act)
        
        print(A)
        A[A > 0.5] = 1
        A[A <= 0.5] = 0

        return A
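
The backward pass starts from the gradient of the cross-entropy loss with respect to the output. For a sigmoid output layer, pushing that gradient through the sigmoid derivative should simplify to the output minus the label. The sketch below checks this numerically on made-up values of `Z` and `Y`; it is an illustration, not part of the class.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

# Made-up pre-activations and labels for a 2-unit output layer
Z = np.array([[0.3], [-1.2]])
Y = np.array([[1.0], [0.0]])
A = sigmoid(Z)

# Gradient of the cross-entropy loss with respect to A
dA = -(Y / A - (1 - Y) / (1 - A))
# Chain rule through the sigmoid: dZ = dA * sigmoid'(Z)
dZ = dA * A * (1 - A)

print(np.allclose(dZ, A - Y))
```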

In [8]:
model = NumpyNeuralNetwork(nn_structure)

In [9]:
model.train(train, labels, 3, lr=0.2)

0.33144915212718484
0.3239250221836148
0.3174885989523412
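
The values printed above come from `MSEloss` on a single example, while the gradients are derived from the cross-entropy loss. A cost that matches the gradients could be tracked with something like the following sketch; the function name `binary_cross_entropy` and the clipping constant `eps` are assumptions made for illustration, and the arrays are made-up values.

```python
import numpy as np

def binary_cross_entropy(A, Y, eps=1e-12):
    """Mean binary cross-entropy over all entries; eps guards against log(0)."""
    A = np.clip(A, eps, 1 - eps)
    return float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))

Y = np.array([[1, 0], [0, 1]])
confident = np.array([[0.9, 0.1], [0.1, 0.9]])
unsure = np.array([[0.5, 0.5], [0.5, 0.5]])

# Confident, correct predictions should give a lower loss than 50/50 guesses
print(binary_cross_entropy(confident, Y), binary_cross_entropy(unsure, Y))
```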


In [10]:
acc = np.array(model.accuracy_history)

In [11]:
acc[acc==0.5] = 0
acc

array([0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0.,
       0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0.,
       0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0.,
       0., 1., 0., 1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0.,
       1., 0., 1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 1.,
       0., 1., 0., 0., 0., 0., 0., 1., 0., 1., 0.])

In [12]:
acc.mean()

0.25
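
Setting the 0.5 entries to 0 above turns the per-example accuracy into an exact-match score (both output bits must be right). The same quantity can be computed directly from predictions and labels, as in this sketch; `exact_match_accuracy` is a hypothetical helper and the arrays are made-up values rather than model output.

```python
import numpy as np

def exact_match_accuracy(probs, labels, threshold=0.5):
    """Fraction of rows whose thresholded predictions match every label bit."""
    preds = (np.asarray(probs) >= threshold).astype(int)
    return float(np.all(preds == np.asarray(labels), axis=1).mean())

probs = np.array([[0.9, 0.2], [0.6, 0.7], [0.1, 0.4], [0.8, 0.9]])
labels = np.array([[1, 0], [1, 0], [0, 0], [0, 1]])

# Rows 0 and 2 match on both bits; rows 1 and 3 each get one bit wrong
print(exact_match_accuracy(probs, labels))
```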