# Neural Network Module
A module that implements the stochastic gradient descent learning
algorithm for a feedforward neural network. Gradients are calculated
using backpropagation. The aim is not to create an optimized NN module with all the bells and whistles, but a simple implementation of the method that will help me understand its inner workings.

In [3]:
# Libraries
import numpy as np
import random

Define a class `Network` which will be used to represent a neural network.

It is initialized with `sizes`, a list giving the number of neurons in each layer. E.g., `[2,3,1]` means a three-layer network with 2 neurons in the first layer, 3 neurons in the second layer, and a single neuron in the third layer.

Biases and weights are initialized from a standard Gaussian distribution (mean 0, standard deviation 1). Better starting points will be incorporated in the code later. Since the first layer (the input layer) has no biases or weights, none are assigned for `sizes[0]`.

In [5]:
class Network(object):
    
    def __init__(self, sizes):
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y,1) for y in sizes[1:]]
        self.weights = [np.random.randn(y,x)
                       for x,y in zip(sizes[:-1],sizes[1:])]

In [16]:
net = Network([2,3,1])
net.biases

[array([[0.04229294],
        [1.45833354],
        [0.17911392]]),
 array([[-1.00109272]])]

The above arrays show the randomly generated biases. The first array holds the biases of the 3 neurons in the second (hidden) layer. The second array holds the bias of the single neuron in the output layer.
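The shapes of these arrays follow directly from `sizes`. As a quick check, this sketch rebuilds the same list comprehensions outside the class (the variable names `bias_shapes` and `weight_shapes` are only for illustration) and prints the shape of each array:

```python
import numpy as np

sizes = [2, 3, 1]

# Same construction as in Network.__init__
biases = [np.random.randn(y, 1) for y in sizes[1:]]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

bias_shapes = [b.shape for b in biases]
weight_shapes = [w.shape for w in weights]
print(bias_shapes)    # [(3, 1), (1, 1)]
print(weight_shapes)  # [(3, 2), (1, 3)]
```

Each weight matrix has one row per neuron in the receiving layer and one column per neuron in the sending layer, so `np.dot(w, a)` maps a layer's activations onto the next layer.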

In the next step, I add the sigmoid activation function and the `feedforward` method to the `Network` class. Each layer computes its activation $a'$ from the previous layer's activation $a$:

$a' = \sigma((w \cdot a) + b)$
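To make the formula concrete, here is a minimal sketch of a single layer step from 2 inputs to 3 neurons. The numbers are made up and chosen so that every weighted input is zero, which makes the result easy to check by hand since $\sigma(0) = 0.5$.

```python
import numpy as np

def sigmoid(z):
    # Logistic function, applied element-wise
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only
w = np.array([[1.0, -1.0],
              [0.0,  0.0],
              [2.0,  1.0]])           # shape (3, 2)
a = np.array([[1.0], [1.0]])          # shape (2, 1)
b = np.array([[0.0], [0.0], [-3.0]])  # shape (3, 1)

# w.a = [0, 0, 3]; adding b gives [0, 0, 0]
a_next = sigmoid(np.dot(w, a) + b)
print(a_next.ravel())  # [0.5 0.5 0.5]
```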

In [20]:
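    @staticmethod
    def sigmoid(z):
        # Logistic function, applied element-wise to a NumPy array
        return 1.0/(1.0+np.exp(-z))

    def feedforward(self, a):
        # Propagate the input a (a column vector) through every layer
        for b,w in zip(self.biases, self.weights):
            a = self.sigmoid(np.dot(w,a)+b)
        return a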

Now I implement the stochastic gradient descent method. 

In [22]:
    def SGD(self, train_dat, epochs, batch_size, eta, test_dat=None):
        # eta is the learning rate; test_dat is optional
        train_dat = list(train_dat)
        n = len(train_dat)

        if test_dat:
            test_dat = list(test_dat)
            n_test = len(test_dat)

        for j in range(epochs):
            # Reshuffle every epoch so the mini batches differ between epochs
            random.shuffle(train_dat)
            mini_batches = [
                train_dat[k:k+batch_size] for k in range(0, n, batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_dat:
                print("Epoch {} : {}/{}".format(j, self.evaluate(test_dat), n_test))
            else:
                print("Epoch {} complete".format(j))

For each epoch, the `SGD` method shuffles the training data and splits it into mini batches of size `batch_size`. For each mini batch, a single step of gradient descent is applied, which updates the network's weights and biases (this is done by the `update_mini_batch()` method, written in the next chunk). `test_dat` is an optional argument: if it is supplied, `SGD` evaluates the network on the test data after each epoch.
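The shuffling and slicing can be tried on their own. In this sketch the training data is a stand-in list of integers rather than real examples, and the seed is fixed only to make the run repeatable:

```python
import random

train_dat = list(range(10))   # stand-in for 10 training examples
batch_size = 4
n = len(train_dat)

random.seed(0)                # only for repeatability
random.shuffle(train_dat)
mini_batches = [train_dat[k:k+batch_size] for k in range(0, n, batch_size)]

batch_lengths = [len(mb) for mb in mini_batches]
print(batch_lengths)  # [4, 4, 2]
```

Every example lands in exactly one mini batch, and the last batch is shorter when `batch_size` does not divide `n`.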
 
Let's define the `update_mini_batch()` method.

In [None]:
    def update_mini_batch(self,mini_batch,eta):
        
        
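The method body is not written yet. As a rough sketch of the update it has to perform, the standalone function below applies one averaged gradient step, $w \rightarrow w - \frac{\eta}{m} \sum \nabla w$ and likewise for $b$, where $m$ is the mini batch size. Several things here are assumptions and not part of the notebook: the name `mini_batch_step`, that each training example is an `(x, y)` pair, and the hypothetical callable `grad_fn`, which stands in for the backpropagation step and returns per-example gradients. The final method may well look different.

```python
import numpy as np

def mini_batch_step(weights, biases, mini_batch, eta, grad_fn):
    """Return updated (weights, biases) after one averaged gradient step.

    grad_fn is a hypothetical callable: grad_fn(x, y) -> (grad_b, grad_w),
    two lists of arrays shaped like `biases` and `weights`.
    """
    sum_b = [np.zeros(b.shape) for b in biases]
    sum_w = [np.zeros(w.shape) for w in weights]
    # Accumulate the gradient of every example in the mini batch
    for x, y in mini_batch:
        grad_b, grad_w = grad_fn(x, y)
        sum_b = [sb + gb for sb, gb in zip(sum_b, grad_b)]
        sum_w = [sw + gw for sw, gw in zip(sum_w, grad_w)]
    # Step against the average gradient, scaled by the learning rate
    scale = eta / len(mini_batch)
    new_w = [w - scale * sw for w, sw in zip(weights, sum_w)]
    new_b = [b - scale * sb for b, sb in zip(biases, sum_b)]
    return new_w, new_b

# Toy check: every gradient entry is 1, so each parameter drops by eta
weights = [np.ones((3, 2))]
biases = [np.ones((3, 1))]
fake_grad = lambda x, y: ([np.ones((3, 1))], [np.ones((3, 2))])
batch = [(None, None)] * 4
new_w, new_b = mini_batch_step(weights, biases, batch, eta=0.5, grad_fn=fake_grad)
print(new_w[0][0, 0])  # 0.5
```

Inside the class, the same logic would read from and write to `self.weights` and `self.biases`, with the gradients coming from backpropagation.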