<h2>Creating a Neural Network without using ML libraries</h2>

<p>
This notebook goes over creating a simple neural network using the classical approach: Stochastic Gradient Descent (SGD) with logistic (sigmoid) activation functions and a quadratic (mean squared error) cost function.
</p>
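<p>
For reference, the quadratic cost used in the book, averaged over the $n$ training inputs $x$, is
$$C(w, b) = \frac{1}{2n}\sum_{x} \lVert y(x) - a \rVert^{2}$$
where $y(x)$ is the desired output for input $x$ and $a$ is the network's output for that input. For a single example the cost is $\frac{1}{2}\lVert y - a \rVert^{2}$, whose derivative with respect to the output activation $a$ is $a - y$; this is the <code>cost_derivative</code> that <code>backpropagate</code> uses later on.
</p>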
<p>
This will be based on the free online book <a href="http://neuralnetworksanddeeplearning.com/chap1.html">Neural Networks And Deep Learning</a>. It is a decent theoretical and practical introduction to deep learning with neural networks for beginners in this topic.
</p>
<p>
This is mainly an exercise to brush up on some of the roots of neural networks. Sometimes we get so used to using ML libraries that we get rusty on the core implementations. I'll be including theory and code explanations as part of this notebook.
</p>

In [2]:
# NumPy for the vector/matrix math, random for shuffling mini-batches
import numpy as np
import random

<p>The sigmoid function below, $\sigma(z) = \frac{1}{1 + e^{-z}}$, is the activation function that each sigmoid neuron applies to its input. In other words, it is the function applied to the input <code>z</code> to give a continuous output between 0 and 1. Its derivative, $\sigma'(z) = \sigma(z)\left(1 - \sigma(z)\right)$, is also defined and will be used later for backpropagation.</p>
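<p>The two formulas can be sanity-checked on their own. This minimal sketch defines them as plain functions rather than methods (an illustrative choice, not how the class below is organised) and compares the closed-form derivative with a central finite difference; the step <code>eps = 1e-6</code> and the <code>1e-8</code> tolerance are arbitrary choices.</p>

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Closed form sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Compare the closed form against a central finite difference on a grid of z
z = np.linspace(-5.0, 5.0, 11)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
max_err = np.max(np.abs(numeric - sigmoid_derivative(z)))

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25
print(max_err < 1e-8)           # True
```

The derivative peaks at 0.25 when $z = 0$, which is one reason sigmoid networks can learn slowly: the gradient is scaled by at most a quarter at every layer.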

In [3]:
class Network():
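    def __init__(self, layers):
        # layers[i] is the number of neurons in layer i (layers[0] = inputs)
        self.layers = layers
        self.num_layers = len(layers)
        # One (n, 1) bias column vector per non-input layer
        self.biases = [np.random.randn(x, 1) for x in layers[1:]]
        # weights[i] has shape (layers[i+1], layers[i])
        self.weights = [np.random.randn(x, y)
                        for y, x in zip(layers[:-1], layers[1:])]

    def sigmoid(self, z):
        # Logistic function, applied element-wise to z
        return 1.0/(1.0+np.exp(-z))

    def sigmoid_derivative(self, z):
        # Call through self: there is no module-level sigmoid() function
        return self.sigmoid(z)*(1-self.sigmoid(z))
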
net = Network([2, 3, 1]) # Example of what biases and weights look like

print("biases:")
print(net.biases)
print("weights:")
print(net.weights)

biases:
[array([[-1.09105595],
       [-0.39226673],
       [-0.03119774]]), array([[1.30886869]])]
weights:
[array([[-3.82196634,  0.74251693],
       [ 1.76050347,  0.34314825],
       [ 0.18219338,  1.81397955]]), array([[ 0.45351881, -1.13342251,  0.34959664]])]


<p>
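Our network is first defined by a list called <code>layers</code> containing the number of neurons in each layer. The <code>biases</code> hold one bias per neuron for every layer after the input layer: <code>biases[0]</code> holds the initial biases for the second (hidden) layer, since the first layer is the input layer and doesn't have biases, and <code>biases[1]</code> holds those for the output layer. The <code>weights</code> hold the connections between consecutive layers: <code>weights[0]</code> has shape (3, 2), so <code>weights[0][j][k]</code> is the weight from neuron k of the input layer to neuron j of the hidden layer. All of these start as random draws from a standard normal distribution.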
</p>
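<p>The shapes are easier to see than the random values. This short sketch repeats the same list comprehensions used in <code>__init__</code> for the example sizes <code>[2, 3, 1]</code> and prints only the array shapes.</p>

```python
import numpy as np

layers = [2, 3, 1]  # same example sizes as Network([2, 3, 1])

# One bias column vector per non-input layer
biases = [np.random.randn(x, 1) for x in layers[1:]]
# weights[i][j, k] connects neuron k in layer i to neuron j in layer i + 1
weights = [np.random.randn(x, y) for y, x in zip(layers[:-1], layers[1:])]

print([b.shape for b in biases])   # [(3, 1), (1, 1)]
print([w.shape for w in weights])  # [(3, 2), (1, 3)]
```

Storing each weight matrix as (next layer, previous layer) means a single <code>np.dot(w, a)</code> computes the weighted input of every neuron in the next layer at once.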

In [4]:
class Network(Network):
    def feed_forward(self, x):
        # x is an (n, 1) column vector; apply a = sigmoid(w.a + b) layer by layer
        for bias_layer, weight_layer in zip(self.biases, self.weights):
            x = self.sigmoid(np.dot(weight_layer, x) + bias_layer)
        return x

<p>
The method <code>feed_forward</code> takes the network's input <code>x</code> and feeds it through the network to produce an output. For each layer it multiplies the previous layer's activations by the weight matrix, adds the bias vector, and applies the sigmoid activation function:

$$a^{l} = \sigma\left(w^{l} a^{l-1} + b^{l}\right)$$

where $a^{0} = x$ is the input and the output of the last layer is the network's output.
</p>
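<p>As a worked example of one step of that formula, the sketch below pushes a column vector through a single 2 to 2 layer. The weights, biases and input are hand-picked, hypothetical values chosen so the arithmetic is easy to follow by hand.</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters for one layer with 2 inputs and 2 neurons
w = np.array([[1.0, -1.0],
              [0.5,  0.5]])         # shape (2, 2)
b = np.array([[0.5], [-1.0]])       # shape (2, 1)
a_prev = np.array([[2.0], [1.0]])   # previous activations, shape (2, 1)

# Row 0: 1*2 - 1*1 + 0.5 = 1.5 ; row 1: 0.5*2 + 0.5*1 - 1 = 0.5
z = np.dot(w, a_prev) + b
a = sigmoid(z)

print(z.ravel())  # [1.5 0.5]
print(a.shape)    # (2, 1)
```

Each row of <code>w</code> produces one neuron's weighted input, so the output stays a column vector that can be fed straight into the next layer.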

In [7]:
class Network(Network):
    # Stochastic gradient descent: shuffle, split into mini-batches, apply GD.
    # training is a list of (x, y) tuples
    def SGD(self, training, learning_rate, epochs, batch_size):
        for i in range(epochs):
            random.shuffle(training) # Shuffling makes each slice a random batch
            batches = []
            # Split the shuffled data into consecutive slices of length batch_size
            for b in range(0, len(training), batch_size):
                batches.append(training[b:b+batch_size])
            for batch in batches:
                self.GD(batch, learning_rate) # Apply GD on each batch
            print("epoch {0}/{1}".format(i + 1, epochs)) # Report the finished epoch
    
    # Applies one gradient descent step using a single mini-batch
    def GD(self, batch, learning_rate):
        # Zero-filled structures to accumulate the gradients over the batch
        change_w = [np.zeros(w.shape) for w in self.weights]
        change_b = [np.zeros(b.shape) for b in self.biases]
        for x, y in batch:
            # Backpropagation gives this example's gradient for w and b
            chg_w_delta, chg_b_delta = self.backpropagate(x, y)
            # Add it to the running totals
            change_w = [nw+nwd for nw, nwd in zip(change_w, chg_w_delta)]
            change_b = [nb+nbd for nb, nbd in zip(change_b, chg_b_delta)]
        # Step against the average gradient of the mini-batch
        effect = learning_rate/len(batch)
        self.weights = [w-effect*nw for w, nw in zip(self.weights,change_w)]
        self.biases = [b-effect*nb for b, nb in zip(self.biases,change_b)]

    # Applies backpropagation to a single training example (x, y),
    # working from the output layer back to the first hidden layer
    def backpropagate(self, x, y):
        # Zero-filled gradients with the same shapes as weights and biases
        change_w = [np.zeros(w.shape) for w in self.weights]
        change_b = [np.zeros(b.shape) for b in self.biases]
        # Feed forward, storing every weighted input z and activation a
        activations, a = ([x], x)
        zs = [] # Stores all z vectors
        for w, b in zip(self.weights, self.biases):
            z = np.dot(w, a) + b
            zs.append(z)
            a = self.sigmoid(z)
            activations.append(a)
        # Output layer error; for the quadratic cost dC/da = a - y
        cost_derivative = activations[-1] - y
        delta = cost_derivative * self.sigmoid_derivative(zs[-1])
        change_w[-1] = np.dot(delta, activations[-2].transpose())
        change_b[-1] = delta
        # Propagate the error back through the remaining layers (l counts from the end)
        for l in range(2, self.num_layers):
            z = zs[-l]
            layer_weights = self.weights[-l+1].transpose()
            layer_activations = activations[-l-1].transpose()
            delta = np.dot(layer_weights, delta) * self.sigmoid_derivative(z)
            change_w[-l] = np.dot(delta, layer_activations)
            change_b[-l] = delta
        return (change_w, change_b)
    
    # Returns how many test examples are classified correctly: the prediction
    # is the output neuron with the highest activation
    def results(self, test):
        results = [(np.argmax(self.feed_forward(x)), y) for (x, y) in test]
        # y may be an integer label or a one-hot vector (e.g. from to_categorical)
        return sum(int(x == (y if np.ndim(y) == 0 else np.argmax(y)))
                   for (x, y) in results)

<p>
Above, the method <code>SGD</code> shuffles the training data and splits it into mini-batches of a user-defined size. For each mini-batch, <code>GD</code> creates zero-filled structures, sums the gradients of every example in the batch into them, and then subtracts <code>learning_rate/len(batch)</code> times that sum from the weights and biases, i.e. it takes a step against the average gradient. The method <code>backpropagate</code> takes a single training input and its correct response, runs the input forward through the network, and then propagates the error backwards to get the gradient of the cost with respect to every weight and bias.
</p>
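<p>A common way to confirm a backpropagation implementation is a finite-difference gradient check. The sketch below is a standalone, function-based version of the same algorithm (the names <code>backpropagate</code>, <code>cost</code> and the small random test network are illustrative assumptions, not part of the class above). It uses the per-example cost $\frac{1}{2}\lVert a - y \rVert^{2}$, nudges every weight and bias by ±<code>eps</code>, and compares the resulting slope with the analytic gradient. Two hidden layers are used so the backward loop runs more than once; the check only passes when that loop reads <code>zs[-l]</code> and writes into slot <code>-l</code> of the gradient lists.</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backpropagate(weights, biases, x, y):
    """Gradients of C = 0.5 * ||a - y||^2 for one example (x, y)."""
    grad_w = [np.zeros(w.shape) for w in weights]
    grad_b = [np.zeros(b.shape) for b in biases]
    # Forward pass, keeping every z and activation
    activations, zs, a = [x], [], x
    for w, b in zip(weights, biases):
        z = np.dot(w, a) + b
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)
    # Output layer error
    delta = (activations[-1] - y) * sigmoid_derivative(zs[-1])
    grad_w[-1] = np.dot(delta, activations[-2].T)
    grad_b[-1] = delta
    # Hidden layers, indexed from the end with l = 2, 3, ...
    for l in range(2, len(weights) + 1):
        delta = np.dot(weights[-l + 1].T, delta) * sigmoid_derivative(zs[-l])
        grad_w[-l] = np.dot(delta, activations[-l - 1].T)
        grad_b[-l] = delta
    return grad_w, grad_b

def cost(weights, biases, x, y):
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(np.dot(w, a) + b)
    return 0.5 * np.sum((a - y) ** 2)

rng = np.random.default_rng(0)
layers = [2, 3, 4, 1]  # two hidden layers so the backward loop runs twice
weights = [rng.standard_normal((n, m)) for m, n in zip(layers[:-1], layers[1:])]
biases = [rng.standard_normal((n, 1)) for n in layers[1:]]
x = rng.standard_normal((2, 1))
y = np.array([[1.0]])

grad_w, grad_b = backpropagate(weights, biases, x, y)

# Central finite difference on every weight and bias, compared to backprop
eps = 1e-6
max_err = 0.0
for params, grads in ((weights, grad_w), (biases, grad_b)):
    for p, g in zip(params, grads):
        for idx in np.ndindex(p.shape):
            original = p[idx]
            p[idx] = original + eps
            c_plus = cost(weights, biases, x, y)
            p[idx] = original - eps
            c_minus = cost(weights, biases, x, y)
            p[idx] = original  # restore the parameter
            numeric = (c_plus - c_minus) / (2 * eps)
            max_err = max(max_err, abs(numeric - g[idx]))

print(max_err < 1e-7)  # True
```

The check is far too slow for training (two forward passes per parameter), so it is only meant for debugging on a tiny network.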
<p>
The last method, <code>results</code>, feeds each test input through the network and takes the output neuron with the highest activation as the network's prediction. If the prediction matches the correct label it counts as correct; otherwise it's wrong. The method returns the number of correct predictions.
</p>
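<p>The counting rule can be tried without a trained network. In this sketch the output activations are made-up numbers and <code>count_correct</code> is a hypothetical helper that mirrors the argmax comparison; it also accepts one-hot labels, since a loader such as <code>to_categorical</code> produces those instead of integer labels.</p>

```python
import numpy as np

def count_correct(outputs, labels):
    """Count predictions whose argmax matches the label.

    `labels` may be integer class indices or one-hot vectors; one-hot
    labels are converted back to an index with argmax.
    """
    correct = 0
    for out, y in zip(outputs, labels):
        predicted = int(np.argmax(out))
        label = int(y) if np.ndim(y) == 0 else int(np.argmax(y))
        correct += int(predicted == label)
    return correct

# Made-up 3-class output activations (column vectors) for illustration
outputs = [np.array([[0.1], [0.8], [0.1]]),   # predicts class 1
           np.array([[0.6], [0.3], [0.1]]),   # predicts class 0
           np.array([[0.2], [0.2], [0.9]])]   # predicts class 2
# Integer labels and a one-hot label mixed only to show both forms work
labels = [1, 2, np.array([[0.0], [0.0], [1.0]])]

print(count_correct(outputs, labels))  # 2
```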

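<p>Next, let's load the MNIST dataset for training. I will use Keras's <code>mnist</code> dataset module to load it, purely so we don't need to download a local copy for this example. Keras itself needs a backend such as TensorFlow to be installed; without one, the import fails as shown in the output below.</p>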

In [8]:
import keras
from keras.datasets import mnist
from keras import backend as K

def loadMNIST():
    # input image dimensions and number of digit classes
    rows, cols = 28, 28
    num_classes = 10
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    if K.image_data_format() == 'channels_first':
        x_train = x_train.reshape(x_train.shape[0], 1, rows, cols)
        x_test = x_test.reshape(x_test.shape[0], 1, rows, cols)
        input_shape = (1, rows, cols)
    else:
        x_train = x_train.reshape(x_train.shape[0], rows, cols, 1)
        x_test = x_test.reshape(x_test.shape[0], rows, cols, 1)
        input_shape = (rows, cols, 1)

    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    print('x_train shape:', x_train.shape)

    # convert class vectors to binary class matrices
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)
    return (x_train, y_train, x_test, y_test, input_shape)

Using TensorFlow backend.


ModuleNotFoundError: No module named 'tensorflow'
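<p>Note that <code>feed_forward</code> and <code>SGD</code> above expect a list of <code>(x, y)</code> tuples where <code>x</code> is a (784, 1) column vector, not the (28, 28, 1) image arrays that <code>loadMNIST</code> returns for convolutional layers. The sketch below shows one way to do that conversion; the helper <code>to_network_format</code> is hypothetical, and random pixels stand in for the raw uint8 images from <code>mnist.load_data()</code> so it runs without Keras.</p>

```python
import numpy as np

def to_network_format(images, labels, num_classes=10):
    """Turn (N, 28, 28[, 1]) uint8 images and integer labels into a list of
    (x, y) pairs with x of shape (784, 1) in [0, 1] and y a one-hot (10, 1)."""
    pairs = []
    for img, label in zip(images, labels):
        x = np.reshape(img, (-1, 1)).astype(np.float32) / 255.0
        y = np.zeros((num_classes, 1))
        y[int(label)] = 1.0
        pairs.append((x, y))
    return pairs

# Synthetic stand-in for mnist.load_data() output (random pixels)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)
fake_labels = np.array([3, 0, 7, 1, 9])

training = to_network_format(fake_images, fake_labels)
x0, y0 = training[0]
print(x0.shape, y0.shape)  # (784, 1) (10, 1)
print(int(np.argmax(y0)))  # 3
```

Returning a plain list keeps <code>random.shuffle</code> in <code>SGD</code> safe to use. For test data, keeping the integer labels (rather than one-hot vectors) is enough for the argmax comparison in <code>results</code>.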