## Download MNIST data and loader

In [0]:
!wget https://github.com/HariharasudhanAS/HandcraftedNets/raw/master/data/mnist.pkl.gz

--2019-10-21 05:17:53--  https://github.com/HariharasudhanAS/HandcraftedNets/raw/master/data/mnist.pkl.gz
Resolving github.com (github.com)... 192.30.255.112
Connecting to github.com (github.com)|192.30.255.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/HariharasudhanAS/HandcraftedNets/master/data/mnist.pkl.gz [following]
--2019-10-21 05:17:54--  https://raw.githubusercontent.com/HariharasudhanAS/HandcraftedNets/master/data/mnist.pkl.gz
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17051982 (16M) [application/octet-stream]
Saving to: ‘mnist.pkl.gz’


2019-10-21 05:17:54 (145 MB/s) - ‘mnist.pkl.gz’ saved [17051982/17051982]



In [0]:
!wget https://raw.githubusercontent.com/HariharasudhanAS/HandcraftedNets/master/mnist_loader.py

--2019-10-21 05:23:26--  https://raw.githubusercontent.com/HariharasudhanAS/HandcraftedNets/master/mnist_loader.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3485 (3.4K) [text/plain]
Saving to: ‘mnist_loader.py’


2019-10-21 05:23:26 (75.0 MB/s) - ‘mnist_loader.py’ saved [3485/3485]



## Theory

Representation of a single neuron

![Neuron](http://neuralnetworksanddeeplearning.com/images/tikz9.png)

If x1, x2, x3 are the inputs, the neuron has one weight per input (w1, w2, w3) and a bias b.

The output is a = σ(z), where z = w⋅x + b is the weighted input and σ is the sigmoid function. Written out for three inputs:


```
z = w1*x1 + w2*x2 + w3*x3 + b
```
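
As a concrete illustration, here is a minimal sketch of one neuron with three inputs in plain Python. The input, weight and bias values are made up for the example, and the name `neuron_output` is hypothetical; it is not part of the notebook's class.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(x, w, b):
    """Weighted input z = w.x + b, followed by the sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Illustrative values only (assumed, not taken from the notebook).
x = [1.0, 0.0, 1.0]
w = [0.5, -0.5, 0.5]
b = -1.0

a = neuron_output(x, w, b)  # z = 0.5 + 0.0 + 0.5 - 1.0 = 0.0
print(a)                    # sigmoid(0.0) = 0.5
```

With these values the weighted input is exactly 0, so the neuron sits at the midpoint of the sigmoid curve.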

 
<img src="https://hvidberrrg.github.io/deep_learning/activation_functions/assets/sigmoid_function.png" alt="Sigmoid Fn" width="300" height="200"> 
 

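The sigmoid function σ(z) = 1 / (1 + e^(−z)) squashes any real number into the range (0, 1). It introduces non-linearity into the network; without a non-linear activation, stacked layers would reduce to a single linear (affine) map.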

<img src="http://neuralnetworksanddeeplearning.com/images/tikz11.png" alt="Multi-layer NN" width="400" height="200"> 
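
To make the layered picture concrete, the sketch below lists the array shapes for a network with layer sizes `[784, 30, 10]` (the sizes used at the end of this notebook). It follows the same convention as the `Network` class: one bias column vector per non-input layer and one weight matrix per pair of adjacent layers. The variable names `bias_shapes` and `weight_shapes` are introduced here for illustration only.

```python
import numpy as np

sizes = [784, 30, 10]  # input, hidden and output layer sizes

# One bias column vector per non-input layer.
bias_shapes = [(y, 1) for y in sizes[1:]]
# One weight matrix per pair of adjacent layers: rows = neurons in the
# next layer, columns = neurons in the previous layer.
weight_shapes = [(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

print(bias_shapes)    # [(30, 1), (10, 1)]
print(weight_shapes)  # [(30, 784), (10, 30)]

# Shape check: a (30, 784) matrix times a (784, 1) input gives (30, 1),
# which matches the shape of the first bias vector.
w = np.zeros(weight_shapes[0])
x = np.zeros((sizes[0], 1))
print(np.dot(w, x).shape)  # (30, 1)
```

Storing weights as (next layer, previous layer) means each layer's weighted input is a plain matrix-vector product, with no transpose needed in the forward direction.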

Other resources:

[3Blue1Brown YouTube series](https://www.3blue1brown.com/neural-networks)


[Nielsen's online NN book](http://neuralnetworksanddeeplearning.com)

[cs231n backprop tutorial](http://cs231n.github.io/optimization-2/)


## Neural Network forward and backward pass

![Forward and backward pass](https://i.imgur.com/h2z7NDB.png)
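
The equations below are a sketch read off the `backprop` and `update_mini_batch` methods in the implementation section. The notation is an assumption made here: superscript l is the layer index, L is the last layer, m is the mini-batch size, ⊙ is the element-wise product, and the cost is taken to be quadratic, which is consistent with `cost_derivative` returning a − y.

$$
\begin{aligned}
z^{l} &= w^{l} a^{l-1} + b^{l}, \qquad a^{l} = \sigma(z^{l}) && \text{(forward pass)} \\
\delta^{L} &= (a^{L} - y) \odot \sigma'(z^{L}) && \text{(output-layer error)} \\
\delta^{l} &= \left( (w^{l+1})^{T} \delta^{l+1} \right) \odot \sigma'(z^{l}) && \text{(backward pass)} \\
\frac{\partial C}{\partial b^{l}} &= \delta^{l}, \qquad \frac{\partial C}{\partial w^{l}} = \delta^{l} (a^{l-1})^{T} && \text{(gradients)} \\
w^{l} &\leftarrow w^{l} - \frac{\eta}{m} \sum_{x} \frac{\partial C_{x}}{\partial w^{l}}, \qquad b^{l} \leftarrow b^{l} - \frac{\eta}{m} \sum_{x} \frac{\partial C_{x}}{\partial b^{l}} && \text{(mini-batch update)}
\end{aligned}
$$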

## Implementation

In [0]:
import numpy as np
import random

Complete the following class by filling in the `feedforward` method (the line marked `'''Fill here'''`).

In [0]:
class Network(object):

    def __init__(self,sizes):
        # number of layers
        self.num_layers = len(sizes)
        # size of each layer as a list
        self.sizes = sizes
        # creates biases for all layers except the first layer
        self.biases = [np.random.randn(y,1) for y in sizes[1:]]
        # one weight matrix per pair of adjacent layers, with shape (y, x)
        self.weights = [np.random.randn(y,x) for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        
        for b, w in zip(self.biases, self.weights):
            a = '''Fill here'''
        
        return a

    def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):
        
        if test_data:
            n_test = len(test_data)
        
        n = len(training_data)
        
        for j in range(epochs):
            # Reshuffle every epoch so the mini-batches differ between epochs.
            random.shuffle(training_data)
            mini_batches = [training_data[k:k+mini_batch_size] for k in range(0, n, mini_batch_size)]
            
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            
            if test_data:
                print("Epoch {0} : {1} / {2}".format(j, self.evaluate(test_data), n_test))
            else:
                print("Epoch {0} complete".format(j))

    def update_mini_batch(self, mini_batch, eta):
        
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        
        for x,y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x,y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        
        self.weights = [w-(eta/len(mini_batch))*nw for w, nw in zip(self.weights, nabla_w)]
        
        self.biases = [b-(eta/len(mini_batch))*nb for b,nb in zip(self.biases, nabla_b)]

    def backprop(self,x,y):
        
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]

        activation = x
        activations = [x]  # activations of every layer, starting with the input
        zs = []            # weighted inputs z of every layer
        
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        delta = self.cost_derivative(activations[-1],y) * sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        
        # Walk backwards through the layers: l = 2 is the second-to-last layer.
        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l + 1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l - 1].transpose())
        
        return (nabla_b, nabla_w)

    def evaluate(self, test_data):
        
        test_results = [(np.argmax(self.feedforward(x)), y) for (x, y) in test_data]
        
        return sum(int(x == y) for (x, y) in test_results)

    def cost_derivative(self, output_activations, y):
        
        return (output_activations-y)


# Helper functions: the sigmoid activation and its derivative
def sigmoid(z):
    
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    
    return sigmoid(z)*(1-sigmoid(z))
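
One way to gain confidence in `sigmoid_prime` is to compare it with a finite-difference estimate of the slope of `sigmoid`. The sketch below redefines both helpers so that it runs on its own; the test points and the step size `h` are arbitrary choices made for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

z = np.array([-2.0, 0.0, 3.0])  # arbitrary test points
h = 1e-5                        # finite-difference step (assumed)

# Central-difference approximation of the derivative.
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid_prime(z)

print(np.max(np.abs(numeric - analytic)))  # should be very small
print(sigmoid_prime(0.0))                  # 0.25, the steepest point of the curve
```

The slope peaks at 0.25 and shrinks towards zero for large |z|, which is why strongly saturated neurons learn slowly under this activation.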

In [0]:
import mnist_loader

training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

net = Network([784, 30, 10])  # architecture: 784 inputs, 30 hidden neurons, 10 outputs

net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
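
The call above trains for 30 epochs with mini-batches of 10 examples and a learning rate `eta` of 3.0. The sketch below isolates the shuffle-and-slice step that `SGD` performs each epoch, using a list of 25 integers as a stand-in for the (x, y) training pairs; the stand-in data and the fixed seed are assumptions made so the example is small and repeatable.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

training_data = list(range(25))  # stand-in for 25 (x, y) training pairs
mini_batch_size = 10
n = len(training_data)

random.shuffle(training_data)
mini_batches = [training_data[k:k + mini_batch_size]
                for k in range(0, n, mini_batch_size)]

print([len(mb) for mb in mini_batches])  # [10, 10, 5]
```

When the data size is not a multiple of the batch size, the last mini-batch is shorter. Since `update_mini_batch` divides by `len(mini_batch)`, the gradient is still averaged over the actual number of examples in each batch.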