# Recognizing Handwritten Digits using Neural Networks

http://neuralnetworksanddeeplearning.com/chap1.html

This is a program that will learn how to **_recognize handwritten digits_** using **_stochastic gradient descent_** and the **_MNIST training data_** found [here](https://github.com/mnielsen/neural-networks-and-deep-learning/archive/master.zip).

### The Neural Network

The centerpiece is a 'Network' class, which we use to represent a neural network.

In [1]:
import numpy as np

class Network(object):

    def __init__(self, sizes):
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

**`sizes`** contains the number of neurons in the respective layers. 

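**`biases`** and **`weights`** are initialized randomly with `np.random.randn`, which draws from a Gaussian distribution with mean 0 and standard deviation 1. This gives our stochastic gradient descent algorithm a starting place. Note that the first (input) layer has no biases, since biases are only used when computing the outputs of later layers.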

To create a **`Network`** object with 784 neurons in the first layer, 15 neurons in the second layer and 10 neurons in the output layer, we'd write:

In [2]:
net = Network([784, 15, 10])

![img](http://neuralnetworksanddeeplearning.com/images/tikz12.png)
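
As a quick sanity check, here is a minimal sketch of the array shapes this initialization produces for the `[784, 15, 10]` example. It repeats the same list comprehensions outside the class; the names `bias_shapes` and `weight_shapes` are only for illustration.

```python
import numpy as np

# Layer sizes from the example: 784 inputs, 15 hidden neurons, 10 outputs.
sizes = [784, 15, 10]

# Same initialization scheme as Network.__init__, written out standalone.
biases = [np.random.randn(y, 1) for y in sizes[1:]]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

bias_shapes = [b.shape for b in biases]
weight_shapes = [w.shape for w in weights]
print(bias_shapes)    # [(15, 1), (10, 1)]
print(weight_shapes)  # [(15, 784), (10, 15)]
```

Each weight matrix has one row per neuron in the layer it feeds into and one column per neuron in the layer it reads from.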

### Computing the output of our Network

With this in mind, it's easy to write code to begin computing the output of the **`Network`** class. Let **`σ`** be the **`sigmoid function`**:

$$\sigma(z) \equiv \frac{1}{1+e^{-z}}$$


In [3]:
def sigmoid(z):
    return 1.0/(1.0+np.exp(-z))
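
To get a feel for the function, the following sketch evaluates the sigmoid on a small, made-up array. Because `np.exp` works elementwise, the same function handles scalars and whole vectors of weighted inputs.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs: a large negative value, zero, and a large positive value.
z = np.array([-5.0, 0.0, 5.0])
out = sigmoid(z)
print(out)  # values close to 0, exactly 0.5, and close to 1
```

The output is squashed into the interval (0, 1), and the function is symmetric about 0.5: `sigmoid(-z) + sigmoid(z)` equals 1.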

In [4]:
net.weights[1][1]

array([-0.228924  ,  1.04213608,  0.32398541, -0.77347495, -1.06165523,
        0.88446953,  0.15387377,  0.84646595, -0.25830617, -1.05593415,
       -0.55326975,  0.35402689,  0.3477687 ,  0.67332864,  1.39453236])

`net.weights[1]` denotes the weights connecting the second and third layers of the network. Let's call that matrix **`w`** for now. Let **`a`** be the vector of activations of the second layer of neurons and let **`b`** be the vector of biases of the third layer. Then the vector of activations of the third layer, **`a′`**, is given by the **`feedforward`** equation:

$$a' = \sigma(wa+b)$$

In [5]:
def feedforward(self, a):
    """Return the output of the network if "a" is input."""
    for b, w in zip(self.biases, self.weights):
        a = sigmoid(np.dot(w, a)+b)
    return a
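
The loop can be tried out on its own with a minimal sketch like the one below. It uses a standalone variant of the method that takes `biases` and `weights` as arguments, and a hypothetical 3-4-2 network with random parameters; the sizes, the seed and the input values are assumptions chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a, biases, weights):
    """Standalone variant: apply a' = sigmoid(w a + b) layer by layer."""
    for b, w in zip(biases, weights):
        a = sigmoid(np.dot(w, a) + b)
    return a

# Hypothetical small network: 3 inputs, 4 hidden neurons, 2 outputs.
sizes = [3, 4, 2]
rng = np.random.default_rng(0)
biases = [rng.standard_normal((y, 1)) for y in sizes[1:]]
weights = [rng.standard_normal((y, x)) for x, y in zip(sizes[:-1], sizes[1:])]

x = np.array([[0.5], [-1.0], [2.0]])  # input as a column vector, shape (3, 1)
output = feedforward(x, biases, weights)
print(output.shape)  # (2, 1)
```

The input is passed as a column vector of shape `(n, 1)`. A flat array of shape `(n,)` would broadcast against the `(n, 1)` bias vectors and produce a matrix rather than a vector of activations.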

The `feedforward` method only computes an output; for the network to learn, we need a way to adjust its weights and biases. Let's do this using **[stochastic gradient descent](http://alexminnaar.com/deep-learning-basics-neural-networks-backpropagation-and-stochastic-gradient-descent.html)**.

The idea is to use gradient descent to find the weights w<sub>k</sub> and biases b<sub>l</sub> which minimize the cost function, so that the output from the network approximates y(x) for all training inputs x. In other words, our "position" now has components w<sub>k</sub> and b<sub>l</sub>, and the gradient vector ∇C has corresponding components ∂C/∂w<sub>k</sub> and ∂C/∂b<sub>l</sub>. Writing out the gradient descent update rule in terms of components, for a mini-batch of m training examples X<sub>j</sub> and a learning rate η, we have

$$w_k \rightarrow w_k' = w_k-\frac{\eta}{m}
  \sum_j \frac{\partial C_{X_j}}{\partial w_k}$$

$$b_l \rightarrow b_l' = b_l-\frac{\eta}{m}
  \sum_j \frac{\partial C_{X_j}}{\partial b_l}$$
  
By repeatedly applying this update rule we can "roll down the hill", and hopefully find a minimum of the cost function. In other words, this is a rule which can be used to learn in a neural network.
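
To see the update rule at work without a full network, here is a minimal sketch on a toy problem with a single parameter `w`. The per-example cost C<sub>x</sub>(w) = ½(w − x)² is an assumption made purely for illustration (it is not the network's cost function); its gradient is w − x, and the averaged cost is smallest at the mean of the examples.

```python
import numpy as np

# Toy per-example cost (an assumption for illustration): C_x(w) = 0.5 * (w - x)**2,
# so dC_x/dw = w - x and the average cost is minimized at the mean of the x values.
def grad_C(w, x):
    return w - x

mini_batch = np.array([1.0, 2.0, 3.0, 4.0])  # the m = 4 "training examples" X_j
m = len(mini_batch)
eta = 0.5   # learning rate
w = 0.0     # starting position

for step in range(50):
    # w -> w - (eta / m) * sum_j dC_{X_j}/dw
    w = w - (eta / m) * sum(grad_C(w, x) for x in mini_batch)

print(w)  # approaches the batch mean, 2.5
```

With these values each step halves the distance to the minimum, which is the "rolling down the hill" behaviour in one dimension. In the real network the same rule is applied to every weight and bias, with the gradients supplied by backpropagation.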

In [7]:
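import random

def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):
    """Train the network using mini-batch stochastic gradient descent.
    "training_data" is a list of (x, y) tuples and "eta" is the
    learning rate."""
    if test_data: n_test = len(test_data)
    n = len(training_data)
    for j in range(epochs):
        # Shuffle so that each epoch is split into different mini-batches.
        random.shuffle(training_data)
        mini_batches = [training_data[k:k+mini_batch_size]
                        for k in range(0, n, mini_batch_size)]
        # Apply one gradient descent step per mini-batch.
        for mini_batch in mini_batches:
            self.update_mini_batch(mini_batch, eta)
        if test_data:
            print("Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test))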
        el