# New net framework

Here's how we could define new neural nets:

**Neural nets are a series of layers that pass information forwards based on what they receive as <em>input</em> and pass information backwards based on what they receive as the <em>loss</em>.**

Let's code this up. I'm indebted to Anders for getting me started on this code:

<div class="visual">
    <img src="img/andersbll.png">
</div>

[Anders' GitHub](https://github.com/andersbll)

## Basic neural net framework

First, a helper function to set up the layers of the neural net:

In [31]:
def setup_layers(hidden_neurons, outputs):
    layers = []
    for i in range(len(hidden_neurons)):
        layer = FullyConnected(neurons=hidden_neurons[i], activation_function=sigmoid)
        layers.append(layer)

    output_layer = FullyConnected(neurons=outputs, activation_function=sigmoid)
    layers.append(output_layer)
    return layers

### Basic neural net framework

Now, a simple framework for running observations through a neural net:

In [32]:
class NeuralNetwork(object):
    def __init__(self, hidden_neurons, outputs, loss_function):
        self.hidden_neurons = hidden_neurons
        self.outputs = outputs
        self.loss_function = loss_function
        self.layers_setup = False

    def forwardpass(self, X):
        """ Calculate an output Y for the given input X. """
        # If it is our first time doing a forward pass, set up the
        # layers of the network:
        if not self.layers_setup:
            self.layers = setup_layers(self.hidden_neurons, self.outputs)
            self.layers_setup = True

        X_next = X
        for layer in self.layers:
            X_next = layer.fprop(X_next)
        prediction = X_next
        return prediction
    
    def loss(self, prediction, Y):
        """ Calculate the gradient of the loss with respect to the
        prediction, ready to be sent backwards through the net. """
        return self.loss_function(prediction, Y, bprop=True)

    def backpropogate(self, loss):
        """ Backpropagate the loss gradient through the net, from the
        last layer to the first. """
        loss_next = loss
        for layer in reversed(self.layers):
            loss_next = layer.bprop(loss_next)
        return loss_next
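
To see the order in which the framework visits the layers, here is a minimal sketch using stand-in layers that only log their calls. `RecordingLayer` is hypothetical and exists only for this illustration; it does no math.

```python
class RecordingLayer(object):
    """Hypothetical stand-in layer that logs calls instead of doing math."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def fprop(self, x):
        self.log.append("fprop " + self.name)
        return x

    def bprop(self, grad):
        self.log.append("bprop " + self.name)
        return grad


log = []
layers = [RecordingLayer("hidden", log), RecordingLayer("output", log)]

# Forward pass: first layer to last, as in `forwardpass`.
x = 1.0
for layer in layers:
    x = layer.fprop(x)

# Backward pass: last layer to first, as in `backpropogate`.
grad = 1.0
for layer in reversed(layers):
    grad = layer.bprop(grad)

print(log)
```

The log shows the input flowing forwards through "hidden" and then "output", and the gradient flowing backwards through "output" and then "hidden".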

## Basic layer definition

Next, let's define what a "layer" should be.

In [33]:
class Layer(object):

    def fprop(self, input):
        """ Calculate layer output for given input (forward propagation). """
        raise NotImplementedError()

    def bprop(self, output_grad):
        """ Calculate input gradient. """
        raise NotImplementedError()
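
To make this contract concrete, here is a minimal sketch of a layer that satisfies it. The `Scale` layer and its `factor` parameter are hypothetical, made up for this illustration; the layer has no weights to learn, so `bprop` only has to pass the gradient back.

```python
import numpy as np


class Layer(object):
    """Same interface as the `Layer` base class above."""

    def fprop(self, input):
        raise NotImplementedError()

    def bprop(self, output_grad):
        raise NotImplementedError()


class Scale(Layer):
    """Hypothetical layer that multiplies its input by a constant."""

    def __init__(self, factor):
        self.factor = factor

    def fprop(self, input):
        # Forward pass: y = factor * x
        return self.factor * input

    def bprop(self, output_grad):
        # Backward pass (chain rule): dl/dx = dl/dy * dy/dx = output_grad * factor
        return self.factor * output_grad


layer = Scale(factor=2.0)
x = np.array([[1.0, -3.0]])
y = layer.fprop(x)                      # input doubled
grad_x = layer.bprop(np.ones_like(x))   # incoming gradient doubled
```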

## Fully connected layer

We know that our fully connected layer must have three components:

* `__init__` to set it up
* `fprop` that will take in a layer input and send it forward to the next layer appropriately
* `bprop` that will take in a loss from the following layer and send it backwards through the network appropriately.

### Fully connected layer

In addition, during the forward pass and backpropagation, we use the following abbreviations:

* LI = "Layer Input"
* AI = "Activation Input"
* AO = "Activation Output"
* LG = "Layer Gradient": the gradient that a layer receives from the layer that follows it.
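
As a sketch of the math a fully connected layer has to carry out, written with these abbreviations. The symbols here are assumptions for the illustration rather than names from the code: $\sigma$ is the activation function, $l$ is the loss, $W$ is the layer's weight matrix, and $\odot$ is elementwise multiplication.

$$
\begin{aligned}
AI &= LI \cdot W \\
AO &= \sigma(AI) \\
\frac{\partial l}{\partial AI} &= LG \odot \sigma'(AI) \\
\frac{\partial l}{\partial W} &= LI^{T} \cdot \frac{\partial l}{\partial AI} \\
\frac{\partial l}{\partial LI} &= \frac{\partial l}{\partial AI} \cdot W^{T}
\end{aligned}
$$

The first two lines are the forward pass. The last three are the backward pass: $\frac{\partial l}{\partial W}$ is what the layer uses to update its own weights, and $\frac{\partial l}{\partial LI}$ is what it hands to the layer before it as that layer's LG.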

In [34]:
import numpy as np

class FullyConnected(Layer):
    
    def __init__(self, neurons, activation_function):
        self.n_neurons = neurons
        self.activation_function = activation_function        
        self.iterations = 0
        self.weights_initialized = False

    def fprop(self, layer_input):
        self.LI = layer_input
        
        if not self.weights_initialized:
            self.W = np.random.normal(size=(self.LI.shape[1], self.n_neurons))
            self.weights_initialized = True
        
        self.AI = np.dot(self.LI, self.W)
        return self.activation_function(self.AI, bprop=False)
    
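    def bprop(self, layer_gradient):
        # Gradient of the loss with respect to the activation input (AI).
        dAOdAI = self.activation_function(self.AI, bprop=True)
        dLGdAI = layer_gradient * dAOdAI
        dAIdW = self.LI.T

        # Gradient to pass to the previous layer, computed with the
        # weights that were used in the forward pass.
        output_grad = np.dot(dLGdAI, self.W.T)

        # Gradient descent step on the weights.
        weight_update = np.dot(dAIdW, dLGdAI)
        self.W = self.W - weight_update

        self.iterations += 1
        return output_grad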
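
One way to gain confidence in a `bprop` like this is a gradient check: compare the analytic weight gradient against a numerical estimate. The sketch below is a standalone illustration, not part of the framework. It assumes a single sigmoid layer with a summed squared-error loss, and the helper `loss_value`, the array sizes, and the random seed are all made up for the example.

```python
import numpy as np


def sigmoid(x, bprop=False):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s) if bprop else s


def loss_value(LI, W, Y):
    # Summed squared-error loss of a single sigmoid layer (hypothetical helper).
    AO = sigmoid(np.dot(LI, W))
    return np.sum(0.5 * (Y - AO) ** 2)


rng = np.random.RandomState(0)
LI = rng.normal(size=(4, 3))   # 4 observations, 3 inputs
W = rng.normal(size=(3, 2))    # 3 inputs, 2 neurons
Y = rng.uniform(size=(4, 2))   # targets

# Analytic gradient, following the same steps as `bprop`.
AI = np.dot(LI, W)
LG = -1.0 * (Y - sigmoid(AI))            # gradient of the loss w.r.t. AO
dLGdAI = LG * sigmoid(AI, bprop=True)
analytic = np.dot(LI.T, dLGdAI)

# Numerical gradient by central differences, one weight at a time.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        diff = loss_value(LI, W_plus, Y) - loss_value(LI, W_minus, Y)
        numeric[i, j] = diff / (2 * eps)

max_abs_diff = np.max(np.abs(analytic - numeric))
print("Largest difference between the two gradients:", max_abs_diff)
```

If the two gradients agree to several decimal places, the chain-rule steps are very likely right; a large gap usually points at a missing transpose or a missing derivative term.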

## Activation functions

We'll need to redefine our activation and loss functions so that each takes a `bprop` option, which returns the function's derivative instead of its value:

In [35]:
def sigmoid(x, bprop=False):
    if bprop:
        return sigmoid(x) * (1-sigmoid(x))
    else:
        return 1.0/(1.0+np.exp(-x))

In [36]:
def mean_square_error(prediction, Y, bprop=False):
    if bprop:
        return -1.0 * (Y - prediction)
    else:
        return 0.5 * (Y - prediction) ** 2
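
As a quick illustration of the two modes, here is a small sketch that evaluates both functions at hand-picked points. The functions are copied from the cells above so the snippet runs on its own; the input values are arbitrary choices for the example.

```python
import numpy as np


def sigmoid(x, bprop=False):
    if bprop:
        return sigmoid(x) * (1 - sigmoid(x))
    else:
        return 1.0 / (1.0 + np.exp(-x))


def mean_square_error(prediction, Y, bprop=False):
    if bprop:
        return -1.0 * (Y - prediction)
    else:
        return 0.5 * (Y - prediction) ** 2


activation = sigmoid(0.0)                              # 0.5
activation_slope = sigmoid(0.0, bprop=True)            # 0.25
loss = mean_square_error(0.8, 1.0)                     # approximately 0.02
loss_slope = mean_square_error(0.8, 1.0, bprop=True)   # approximately -0.2
```

The negative slope of the loss says that raising the prediction from 0.8 towards the target of 1.0 would lower the loss.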

## Defining the net

In [37]:
nn_mnist = NeuralNetwork(
    hidden_neurons=[50],
    outputs=10,
    loss_function=mean_square_error)

## Training

In [38]:
from neural_net import *

In [89]:
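import builtins

def train(net, X_train, Y_train, epochs=5, print=True):
    X_train, Y_train = shuffle_data(X_train, Y_train)

    for i in range(epochs):
        one_epoch(net, X_train, Y_train)
        # The `print` flag shadows the built-in inside this function,
        # so report progress through `builtins.print`.
        if print:
            builtins.print("Done with epoch", i+1)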

In [40]:
if train_all:
    train(nn_mnist, X_train, Y_train, epochs=5)

## Does it work?

In [41]:
if train_all:
    nn_mnist_accuracy = accuracy(nn_mnist, X_test, Y_test)

Yes...kind of.

## Deep Learning Illustration

We can, in fact, use this same framework to do deep learning. Let's define a neural net with two hidden layers.

In [42]:
nn_mnist_2 = NeuralNetwork(
    hidden_neurons=[75, 25],
    outputs=10,
    loss_function=mean_square_error)

In [43]:
if train_all:
    train(nn_mnist_2, X_train, Y_train, epochs=3)

In [44]:
if train_all:
    nn_mnist_2_accuracy = accuracy(nn_mnist_2, X_test, Y_test)

Again, it only "kind of" works.

# Deep Learning Tricks

Now we get to the fun part: tuning our deep learning models using the many tricks researchers have discovered to increase their performance.

In this talk, we'll get through as many of these as we can:

* Learning rate tuning
* Learning rate decay
* Varying learning rates by layer
* Learning rate momentum

* Dropout
* Dropconnect

* Weight initializations
* Different activation functions



# Learning rate tuning

<img src="img/bengio.png">

"The learning rate is the single most important hyperparameter and one should always make sure it is tuned."

-[Yoshua Bengio](http://www.iro.umontreal.ca/~bengioy/yoshua_en/)

## Learning rate definition

The learning rate is just a number that we multiply the weight update by during each iteration. So if the learning rate is $\alpha$, the weight update equation for a weight matrix $W$ becomes:

$$ W = W - \alpha \cdot \frac{\partial l}{\partial W} $$
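
To get a feel for what $\alpha$ does, here is a minimal sketch that applies this update rule to a toy one-dimensional loss, $l(w) = w^2$, instead of a neural net. The function `gradient_descent`, the starting point, and the three learning rates are made up for the illustration.

```python
def gradient_descent(alpha, w0=1.0, steps=10):
    """Minimize the toy loss l(w) = w**2 with a fixed learning rate alpha."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w          # dl/dw for l(w) = w**2
        w = w - alpha * grad    # the update rule above
    return w


for alpha in [0.1, 0.5, 1.1]:
    print("alpha =", alpha, "-> w after 10 steps:", gradient_descent(alpha))
```

On this toy loss each step multiplies $w$ by $(1 - 2\alpha)$. So a rate of 0.1 shrinks $w$ slowly towards the minimum at 0, a rate of 0.5 lands on it in one step, and a rate of 1.1 overshoots so that $|w|$ grows with every step.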

## Coding this up

We'll subclass `FullyConnected` and override its `bprop` function so that the weight update is multiplied by the learning rate:

In [90]:
class FullyConnectedLR(FullyConnected):
    
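    def bprop(self, layer_gradient):
        # Gradient of the loss with respect to the activation input (AI).
        dAOdAI = self.activation_function(self.AI, bprop=True)
        dLGdAI = layer_gradient * dAOdAI
        dAIdW = self.LI.T

        # Gradient to pass to the previous layer, computed with the
        # weights that were used in the forward pass.
        output_grad = np.dot(dLGdAI, self.W.T)

        # Gradient descent step, scaled by the learning rate.
        weight_update = np.dot(dAIdW, dLGdAI)
        self.W = self.W - self.learning_rate * weight_update

        self.iterations += 1
        return output_grad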

### Coding this up

We'll define a new `setup_layers` function that gives each layer a learning rate:

In [91]:
def setup_layers(hidden_neurons, outputs, learning_rate=1.0):
    layers = []
    for neurons in hidden_neurons:
        layer = FullyConnectedLR(neurons=neurons, activation_function=sigmoid)
        layer.learning_rate = learning_rate
        layers.append(layer)

    output_layer = FullyConnectedLR(neurons=outputs, activation_function=sigmoid)
    output_layer.learning_rate = learning_rate
    layers.append(output_layer)
    return layers

### Coding this up

We subclass `NeuralNetwork`, changing `__init__` to take a learning rate and `forwardpass` to call the new `setup_layers` function:

In [92]:
class NeuralNetworkLR(NeuralNetwork):
    def __init__(self, hidden_neurons, outputs, loss_function, learning_rate):
        NeuralNetwork.__init__(self, hidden_neurons, outputs, loss_function)
        self.learning_rate = learning_rate

    def forwardpass(self, X):
        """ Calculate an output Y for the given input X. """
        
        if not self.layers_setup:
            self.layers = setup_layers(self.hidden_neurons, 
                                       self.outputs, 
                                       self.learning_rate)
            self.layers_setup = True

        X_next = X
        for layer in self.layers:
            X_next = layer.fprop(X_next)
        prediction = X_next
        return prediction

## API

In [101]:
def net_accuracy(net, X_test, Y_test):
    P = net.forwardpass(X_test)
    preds = [np.argmax(x) for x in P]
    actuals = [np.argmax(x) for x in Y_test]

    accuracy = sum(np.array(preds) == np.array(actuals)) * 1.0 / len(preds)
    # Round after scaling to a percentage to avoid floating-point noise in the output.
    print("Neural Net MNIST Classification Accuracy:", round(accuracy * 100, 1), "percent")
    return accuracy
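
The accuracy calculation relies on the labels being one-hot encoded: the predicted digit is the position of the largest output, and the actual digit is the position of the 1 in the label. Here is a small self-contained sketch of that idea on made-up data with three classes; `one_hot_accuracy` is a hypothetical helper that uses NumPy's `axis` argument instead of list comprehensions.

```python
import numpy as np


def one_hot_accuracy(P, Y):
    """Fraction of rows where the largest prediction matches the one-hot label."""
    preds = np.argmax(P, axis=1)     # predicted class for each row
    actuals = np.argmax(Y, axis=1)   # position of the 1 in each label
    return np.mean(preds == actuals)


# Made-up predictions and labels for three observations.
P = np.array([[0.1, 0.7, 0.2],
              [0.8, 0.1, 0.1],
              [0.3, 0.3, 0.4]])
Y = np.array([[0, 1, 0],
              [0, 1, 0],
              [0, 0, 1]])

acc = one_hot_accuracy(P, Y)
```

Here the first and third rows are classified correctly and the second is not, so the accuracy is two out of three.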

In [102]:
def accuracy_net_lr(learning_rate):
    nn_mnist_lr = NeuralNetworkLR(
        hidden_neurons=[75, 25],
        outputs=10,
        loss_function=mean_square_error,
        learning_rate=learning_rate)

    train(nn_mnist_lr, X_train, Y_train, epochs=2, print=False)

    return net_accuracy(nn_mnist_lr, X_test, Y_test)

In [103]:
learning_rates = np.arange(0.1, 1.6, 0.1)
accuracies = [accuracy_net_lr(lr) for lr in learning_rates]

Neural Net MNIST Classification Accuracy: 90.6 percent
Neural Net MNIST Classification Accuracy: 92.5 percent
Neural Net MNIST Classification Accuracy: 93.2 percent
Neural Net MNIST Classification Accuracy: 92.3 percent
Neural Net MNIST Classification Accuracy: 92.0 percent


KeyboardInterrupt: 