# Neural Network from Scratch
Fully connected deep neural net (DNN) from scratch.<br>
**Credits:** [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/)

## Helper Functions

We will create two helper functions: `sigmoid` and `sigmoid_dt`. The sigmoid (logistic) function acts as the activation function, whereas `sigmoid_dt`, its derivative, is used in the gradient calculation during backpropagation.
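
The formula implemented by `sigmoid_dt` follows from differentiating the logistic function with the chain rule and rewriting the result in terms of $\sigma(z)$ itself:

$$
\sigma(z) = \frac{1}{1+e^{-z}}, \qquad
\sigma'(z) = \frac{e^{-z}}{\left(1+e^{-z}\right)^2}
= \frac{1}{1+e^{-z}} \cdot \frac{e^{-z}}{1+e^{-z}}
= \sigma(z)\,\bigl(1-\sigma(z)\bigr)
$$

This is why the derivative can be computed from a single evaluation of `sigmoid`.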

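```python
import random
import numpy as np
```

```python
# helper functions

def sigmoid(z):
    """Logistic activation, applied elementwise: maps any real input into (0, 1)."""
    return 1.0/(1.0 + np.exp(-z))

def sigmoid_dt(z):
    """Derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)  # evaluate sigmoid once and reuse it
    return s*(1 - s)
```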

## Neural Network
Here we create the `Network` class. The constructor takes `sizes` as a parameter, a list giving the number of neurons in each layer, starting with the input layer. The weights and biases of every layer after the input layer are initialized randomly, with `np.random.randn()` drawing samples from the standard normal distribution (mean 0, variance 1). The input layer has no biases.
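
To make the shapes concrete, the sketch below repeats the constructor's initialization for a hypothetical `sizes = [784, 30, 10]`, chosen only for illustration. Entry `(j, k)` of a weight matrix connects neuron `k` in one layer to neuron `j` in the next, which is why each matrix has shape `(next_layer, previous_layer)` and `np.dot(w, a)` works on column vectors.

```python
import numpy as np

# Hypothetical layer sizes for illustration: 784 inputs, 30 hidden neurons, 10 outputs.
sizes = [784, 30, 10]

# Same scheme as Network.__init__: one bias column vector per non-input layer,
# and one weight matrix per pair of adjacent layers.
biases = [np.random.randn(y, 1) for y in sizes[1:]]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

for i, (b, w) in enumerate(zip(biases, weights), start=1):
    print(f"layer {i}: weights {w.shape}, biases {b.shape}")
# layer 1: weights (30, 784), biases (30, 1)
# layer 2: weights (10, 30), biases (10, 1)
```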

`feedforward` takes an activation vector `a` (the network input, as a column vector), loops through the biases and weights of each layer, and computes that layer's activations as `sigmoid(np.dot(w, a) + b)`. After the last layer it returns the output-layer activations, i.e. the network's predictions.
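
A small worked example may help. The sketch below copies the feedforward loop into a standalone function and runs it on a hypothetical 2-2-1 network with hand-picked weights and zero biases, so each step can be checked by hand: the hidden layer computes `sigmoid([0, 1])`, and the output is `sigmoid(2*sigmoid(0) - 2*sigmoid(1))`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(biases, weights, a):
    # Apply a = sigmoid(w @ a + b) once per layer, as Network.feedforward does.
    for b, w in zip(biases, weights):
        a = sigmoid(np.dot(w, a) + b)
    return a

# Hypothetical 2-2-1 network with hand-picked parameters.
weights = [np.array([[1.0, -1.0],
                     [0.5,  0.5]]),
           np.array([[2.0, -2.0]])]
biases = [np.zeros((2, 1)),
          np.zeros((1, 1))]

x = np.array([[1.0], [1.0]])  # input as a (2, 1) column vector
out = feedforward(biases, weights, x)
print(out.shape)  # (1, 1)
```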

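`SGD`, the stochastic gradient descent function, moves down the slope of the cost function one step at a time towards a (local) minimum. Instead of computing the gradient over the whole training set, stochastic gradient descent estimates it from a small mini-batch of data points and updates the model after each mini-batch. The function takes four parameters: the training data, the number of epochs, the mini-batch size, and the learning rate `eta`, plus an optional `test_data` argument. Each epoch shuffles the training data, splits it into mini-batches, and passes each mini-batch to `update_mini_batch`, which applies one gradient-descent step. If `test_data` is given, the network is evaluated and the result printed after every epoch.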
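
The mini-batch split inside `SGD` is plain list slicing. In the sketch below, integers stand in for the `(x, y)` training pairs (an assumption for illustration). When the number of samples is not a multiple of `mini_batch_size`, the last mini-batch is simply smaller.

```python
import random

training_data = list(range(10))  # stand-ins for (x, y) pairs
mini_batch_size = 4

random.seed(0)  # fixed seed only so the example is reproducible
random.shuffle(training_data)
mini_batches = [training_data[k:k + mini_batch_size]
                for k in range(0, len(training_data), mini_batch_size)]

print([len(mb) for mb in mini_batches])  # [4, 4, 2]
```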

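`backprop` is the backpropagation function. For a single training example `(x, y)` it runs a forward pass, storing each layer's weighted inputs `z` and activations, and then propagates the error backward from the output layer, computing each layer's error from the error of the layer after it. It returns the gradients of the cost with respect to every bias and weight; it does not change the parameters itself. `update_mini_batch` sums these gradients over a mini-batch and moves the weights and biases by the averaged gradient scaled by the learning rate `eta`. `backprop` relies on a helper method, `cost_derivative`, which returns the derivative of the cost with respect to the output activations, `output_activations - y`, and is used to compute the error in the output layer.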
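
A common way to check a backpropagation implementation is a finite-difference gradient check. The sketch below is an illustration, not part of the original notebook: it copies the `backprop` logic into a standalone function, assumes the quadratic cost $C = \frac{1}{2}\lVert a - y \rVert^2$ (whose derivative with respect to the output activations is exactly `output_activations - y`), and compares the analytic gradient of the first weight matrix with central differences on a hypothetical 3-4-2 network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_dt(z):
    s = sigmoid(z)
    return s * (1 - s)

def backprop(biases, weights, x, y):
    """Standalone copy of Network.backprop for the quadratic cost."""
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]
    activation, activations, zs = x, [x], []
    for b, w in zip(biases, weights):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # Output error: cost derivative (a - y) times sigma'(z).
    delta = (activations[-1] - y) * sigmoid_dt(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].T)
    num_layers = len(weights) + 1
    for layer in range(2, num_layers):
        # Error of this layer from the error of the next layer.
        delta = np.dot(weights[-layer + 1].T, delta) * sigmoid_dt(zs[-layer])
        nabla_b[-layer] = delta
        nabla_w[-layer] = np.dot(delta, activations[-layer - 1].T)
    return nabla_b, nabla_w

def cost(biases, weights, x, y):
    """Assumed quadratic cost 0.5 * ||a - y||^2 for one example."""
    a = x
    for b, w in zip(biases, weights):
        a = sigmoid(np.dot(w, a) + b)
    return 0.5 * np.sum((a - y) ** 2)

rng = np.random.default_rng(0)
sizes = [3, 4, 2]  # hypothetical small network
biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
x = rng.standard_normal((3, 1))
y = np.array([[1.0], [0.0]])

_, nabla_w = backprop(biases, weights, x, y)

# Central-difference estimate of dC/dw for every entry of the first weight matrix.
eps = 1e-6
numeric = np.zeros_like(weights[0])
for idx in np.ndindex(*weights[0].shape):
    original = weights[0][idx]
    weights[0][idx] = original + eps
    c_plus = cost(biases, weights, x, y)
    weights[0][idx] = original - eps
    c_minus = cost(biases, weights, x, y)
    weights[0][idx] = original  # restore the parameter
    numeric[idx] = (c_plus - c_minus) / (2 * eps)

max_diff = np.max(np.abs(numeric - nabla_w[0]))
print(f"max difference: {max_diff:.2e}")
```

A tiny maximum difference indicates that the analytic gradients agree with the numerical estimate; a large one usually points to an indexing or transpose mistake in the backward pass.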

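`evaluate` takes the test data as input, treats the index of the output neuron with the highest activation as the predicted class, compares it with the expected label, and returns the number of correct predictions.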
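
The counting logic can be seen on hand-made data. In the sketch below the output vectors and labels are hypothetical. Because the prediction is an `np.argmax` index, this comparison assumes the test labels are plain integers rather than one-hot vectors.

```python
import numpy as np

# Hypothetical network outputs (3x1 column vectors) paired with integer labels.
outputs = [np.array([[0.1], [0.8], [0.1]]),
           np.array([[0.6], [0.3], [0.1]]),
           np.array([[0.2], [0.2], [0.6]])]
labels = [1, 2, 2]

# Same logic as Network.evaluate: predicted class = index of the largest activation.
test_results = [(int(np.argmax(a)), y) for a, y in zip(outputs, labels)]
correct = sum(int(pred == y) for pred, y in test_results)
print(correct)  # 2
```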

```python
class Network:
    def __init__(self, sizes):
        # sizes = number of neurons in each layer, input layer first
        self.num_layers = len(sizes)
        self.sizes = sizes
        # one (y, 1) bias vector per non-input layer
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        # one (y, x) weight matrix per pair of adjacent layers
        self.weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        """Return the network's output for the input column vector a."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a) + b)
        return a

    def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):
        """Train the network with mini-batch stochastic gradient descent."""
        training_data = list(training_data)
        samples = len(training_data)

        if test_data:
            test_data = list(test_data)
            n_test = len(test_data)

        for j in range(epochs):
            random.shuffle(training_data)
            # split the shuffled data into consecutive mini-batches
            mini_batches = [training_data[k:k+mini_batch_size]
                            for k in range(0, samples, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print(f"Epoch {j}: {self.evaluate(test_data)} / {n_test}")
            else:
                print(f"Epoch {j} complete")

    def cost_derivative(self, output_activations, y):
        """Derivative of the cost w.r.t. the output activations: a - y."""
        return output_activations - y

    def backprop(self, x, y):
        """Return (nabla_b, nabla_w), the cost gradients for one example (x, y)."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x]  # stores activations layer by layer
        zs = []  # stores z vectors layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation) + b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)

        # backward pass: error in the output layer
        delta = self.cost_derivative(activations[-1], y) * sigmoid_dt(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())

        # propagate the error backward through the hidden layers
        for _layer in range(2, self.num_layers):
            z = zs[-_layer]
            sp = sigmoid_dt(z)
            delta = np.dot(self.weights[-_layer+1].transpose(), delta) * sp
            nabla_b[-_layer] = delta
            nabla_w[-_layer] = np.dot(delta, activations[-_layer-1].transpose())
        return (nabla_b, nabla_w)

    def update_mini_batch(self, mini_batch, eta):
        """Apply one gradient-descent step using the gradients summed over mini_batch."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        # step size eta, averaged over the mini-batch
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

    def evaluate(self, test_data):
        """Return the number of test inputs whose predicted class equals the label."""
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)
```
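
The class expects data in a specific shape: inputs as `(n, 1)` column vectors, training targets as column vectors matching the output layer (since `cost_derivative` subtracts `y` from the output activations), and test labels as plain integers (since `evaluate` compares them with an `np.argmax` index). The sketch below builds a hypothetical toy dataset in that format; the classification rule, the split sizes, and the `one_hot` helper are all assumptions for illustration.

```python
import numpy as np

def one_hot(label, num_classes):
    """Hypothetical helper: integer label -> (num_classes, 1) column vector."""
    v = np.zeros((num_classes, 1))
    v[label] = 1.0
    return v

rng = np.random.default_rng(42)
# Toy problem (assumption): 2-D points, class 1 if x0 + x1 > 0, else class 0.
points = rng.standard_normal((200, 2))
labels = (points.sum(axis=1) > 0).astype(int)

# Training pairs: column-vector input, one-hot column-vector target.
training_data = [(p.reshape(2, 1), one_hot(lab, 2))
                 for p, lab in zip(points[:150], labels[:150])]
# Test pairs: column-vector input, plain integer label.
test_data = [(p.reshape(2, 1), int(lab))
             for p, lab in zip(points[150:], labels[150:])]

x, y = training_data[0]
print(x.shape, y.shape)  # (2, 1) (2, 1)
```

With the `Network` class above, training could then be started with something like `net = Network([2, 4, 2])` followed by `net.SGD(training_data, 10, 10, 3.0, test_data=test_data)`; these hyperparameters are arbitrary illustrative values, not tuned settings.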