# Neural Networks from Scratch

This notebook builds a neural network from scratch and uses it to recognize handwritten digits.

## Perceptrons

A perceptron is a type of artificial neuron that is no longer used in modern neural networks, which use sigmoid neurons instead. A perceptron takes several binary inputs and produces a single binary output. To compute that output, each input is assigned a weight (w), a real number expressing the importance of that input to the output. The output is 1 if the weighted sum of the inputs is greater than a threshold, and 0 otherwise.
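
As a minimal sketch of the rule above, the snippet below computes a perceptron's output in plain Python. The function name `perceptron_output` and the example inputs, weights, and threshold are illustrative assumptions, not values from this notebook.

```python
def perceptron_output(x, w, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w_j * x_j for w_j, x_j in zip(w, x))
    return 1 if weighted_sum > threshold else 0

# Hypothetical example: three binary inputs with hand-picked weights.
w = [6.0, 2.0, 2.0]
threshold = 5.0

print(perceptron_output([1, 0, 1], w, threshold))  # weighted sum 8.0 > 5.0, prints 1
print(perceptron_output([0, 1, 1], w, threshold))  # weighted sum 4.0 <= 5.0, prints 0
```

With these weights the first input dominates: whenever it is 1 the perceptron fires, whatever the other two inputs are.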

## Sigmoid Neurons

The goal is to be able to change the weights and biases by a small amount so that the output of the network also changes by only a small amount. This isn't possible with perceptrons: a small change in a weight or bias can cause a perceptron's output to flip, for example from 0 to 1, and that flip can completely change the behaviour of the rest of the network. The sigmoid neuron overcomes this problem. It is similar to the perceptron, but it accepts real-valued inputs (for example, any value between 0 and 1) rather than only 0s and 1s, and its output is not a thresholded value. Instead, the weighted sum of the inputs plus the bias, z, is passed through the sigmoid function (the bias plays the role of the perceptron's negated threshold), which is defined by:

\begin{equation}
\sigma(z) = \frac{1}{1 + e^{-z}}
\end{equation}

This means that if z is a large positive number, the output of the sigmoid neuron is close to 1, and if z is very negative, the output is close to 0. For z of modest size, the output lies strictly between 0 and 1, which is where the sigmoid neuron differs from the perceptron.
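
To make the contrast concrete, the sketch below compares a perceptron-style step output with the sigmoid output when z changes slightly around 0. It assumes the perceptron rule is rewritten as "output 1 when z > 0", with the threshold folded into z. The helper name `step` and the sample values are illustrative only.

```python
import math

def sigmoid(z):
    """The sigmoid function: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def step(z):
    """Perceptron-style output (assumed form): 1 if z is positive, otherwise 0."""
    return 1 if z > 0 else 0

# Behaviour for very negative, zero, and large positive z.
for z in (-10.0, 0.0, 10.0):
    print(z, sigmoid(z))

# A tiny change in z around 0 flips the step output,
# but moves the sigmoid output only slightly.
z_before, z_after = -0.01, 0.01
print(step(z_before), step(z_after))
print(sigmoid(z_before), sigmoid(z_after))
```

The step output jumps from 0 to 1, while the two sigmoid values both stay near 0.5. This smoothness is what makes small, gradual weight updates possible.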

In [10]:
import random
import numpy as np

class Network(object):

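    def __init__(self, sizes):
        # sizes lists the number of neurons in each layer, e.g. [784, 30, 10].
        self.num_layers = len(sizes)
        self.sizes = sizes
        # Biases and weights are initialised from a standard normal distribution.
        # The input layer has no biases; each weight matrix has shape (next layer, previous layer).
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        # Return the network's output for the input column vector a.
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a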

    def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):
        # Train with mini-batch stochastic gradient descent; eta is the learning rate.
        # training_data is a list of (x, y) tuples and is shuffled in place every epoch.
        if test_data:
            n_test = len(test_data)
        n = len(training_data)
        for j in range(epochs):
            random.shuffle(training_data)
            mini_batches = [training_data[k:k+mini_batch_size] for k in range(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                # Report how many test inputs are classified correctly after this epoch.
                print("Epoch {0}: {1} / {2}".format(j, self.evaluate(test_data), n_test))
            else:
                print("Epoch {0} complete".format(j))

    def update_mini_batch(self, mini_batch, eta):
        # Sum the gradients of the cost over every (x, y) example in the mini-batch.
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        # Step against the average gradient, scaled by the learning rate eta.
        self.weights = [w-(eta/len(mini_batch))*nw for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb for b, nb in zip(self.biases, nabla_b)]

    def backprop(self, x, y):
        # Return (nabla_b, nabla_w), the gradient of the cost for a single example (x, y).
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # Forward pass: store every weighted input z and every activation, layer by layer.
        activation = x
        activations = [x]
        zs = []
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # Backward pass: start with the error in the output layer.
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Propagate the error backwards; l = 2 is the second-to-last layer, and so on.
        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

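    def evaluate(self, test_data):
        # Count the test inputs for which the most active output neuron matches the label y.
        test_results = [(np.argmax(self.feedforward(x)), y) for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)

    def cost_derivative(self, output_activations, y):
        # Derivative of the quadratic cost 0.5 * ||a - y||^2 with respect to the output activations a.
        return (output_activations-y)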

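def sigmoid(z):
    # The sigmoid function, applied elementwise when z is a NumPy array.
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid function.
    return sigmoid(z)*(1-sigmoid(z))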
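
The mini-batch step inside `SGD` can be tried in isolation. The following sketch is a standalone illustration, not part of the `Network` class: `make_mini_batches` and its `seed` parameter are hypothetical names, and unlike `SGD` it shuffles a copy of the data instead of shuffling the original list in place.

```python
import random

def make_mini_batches(data, mini_batch_size, seed=None):
    """Shuffle a copy of data and split it into consecutive mini-batches."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    # Same slicing pattern as in SGD: one slice per starting index k.
    return [shuffled[k:k + mini_batch_size]
            for k in range(0, len(shuffled), mini_batch_size)]

batches = make_mini_batches(range(10), mini_batch_size=4, seed=0)
print([len(b) for b in batches])  # prints [4, 4, 2]
```

When the data size is not a multiple of the mini-batch size, the last mini-batch is simply smaller. `update_mini_batch` handles this because it divides by `len(mini_batch)`.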

In [11]:
from tensorflow.keras.datasets import mnist

def load_data():
    # Keras provides MNIST as 60,000 training images and 10,000 test images of 28x28 pixels.
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    # Hold out the last 10,000 training images as a validation set.
    split_index = 50000
    x_train, x_val = x_train[:split_index], x_train[split_index:]
    y_train, y_val = y_train[:split_index], y_train[split_index:]

    # Flatten each image to 784 values and scale the pixels from 0..255 to 0..1.
    training_data = (x_train.reshape(50000, 784).astype('float32') / 255, y_train.astype('int32'))
    validation_data = (x_val.reshape(10000, 784).astype('float32') / 255, y_val.astype('int32'))
    test_data = (x_test.reshape(10000, 784).astype('float32') / 255, y_test.astype('int32'))

    return (training_data, validation_data, test_data)

def load_data_wrapper():
    # Convert the arrays from load_data() into lists of (x, y) tuples for Network.
    tr_d, va_d, te_d = load_data()

    # Training pairs: x is a (784, 1) column vector, y is a (10, 1) one-hot vector.
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    training_results = [vectorized_result(y) for y in tr_d[1]]
    training_data = list(zip(training_inputs, training_results))

    # Validation and test pairs keep y as the integer digit label.
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = list(zip(validation_inputs, va_d[1]))

    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = list(zip(test_inputs, te_d[1]))

    return (training_data, validation_data, test_data)

def vectorized_result(j):
    # Return a (10, 1) column vector with 1.0 at index j and 0.0 elsewhere.
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e
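
The sketch below shows how the training targets and the predictions relate, assuming 10 digit classes as in the code above. `vectorized_result` turns a label into a one-hot column vector, and `np.argmax` (used in `evaluate`) turns an output vector back into a digit. The `output` values are made up for illustration and do not come from a trained network.

```python
import numpy as np

def vectorized_result(j):
    """Copied from the cell above so that this snippet runs on its own."""
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

# Encode the label 3 as a one-hot column vector, then decode it again.
target = vectorized_result(3)
print(target.shape)            # prints (10, 1)
print(int(np.argmax(target)))  # prints 3

# Hypothetical network output: the predicted digit is the index of the largest activation.
output = np.full((10, 1), 0.05)
output[7] = 0.9
print(int(np.argmax(output)))  # prints 7
```

This is why the training data stores `y` as a vector, which `cost_derivative` compares against the output activations, while the validation and test data keep `y` as a plain integer for comparison with `np.argmax`.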

In [12]:
# 784 input neurons (one per pixel), 30 hidden neurons, 10 output neurons (one per digit).
net = Network([784, 30, 10])

In [13]:
training_data, validation_data, test_data = load_data_wrapper()

In [14]:
# Train for 30 epochs with a mini-batch size of 10 and a learning rate of 3.0.
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)

Epoch 0: 9025 / 10000
Epoch 1: 9244 / 10000
Epoch 2: 9279 / 10000
Epoch 3: 9333 / 10000
Epoch 4: 9361 / 10000
Epoch 5: 9345 / 10000
Epoch 6: 9410 / 10000
Epoch 7: 9403 / 10000
Epoch 8: 9445 / 10000
Epoch 9: 9405 / 10000
Epoch 10: 9440 / 10000
Epoch 11: 9432 / 10000
Epoch 12: 9472 / 10000
Epoch 13: 9475 / 10000
Epoch 14: 9487 / 10000
Epoch 15: 9464 / 10000
Epoch 16: 9470 / 10000
Epoch 17: 9494 / 10000
Epoch 18: 9492 / 10000
Epoch 19: 9478 / 10000
Epoch 20: 9474 / 10000
Epoch 21: 9502 / 10000
Epoch 22: 9492 / 10000
Epoch 23: 9510 / 10000
Epoch 24: 9490 / 10000
Epoch 25: 9482 / 10000
Epoch 26: 9509 / 10000
Epoch 27: 9503 / 10000
Epoch 28: 9503 / 10000
Epoch 29: 9513 / 10000
