# 1c. Training procedures

Now that I've got some backpropagation code, I can start training my net and check that it's actually learning. So the order of the day is:

1. Fix net initialization process to use random weights and biases.
2. Implement a general training procedure that takes a dataset, some training parameters (batch size, etc.) and the neural net to be trained, and applies the backpropagation code from the previous journal to train the net.
3. Implement a general testing procedure to check how the training went.
4. Try this all out on the iris dataset.

## 1. Initialization

Some sources I should have read previously on weight initialization, both from Jason Brownlee / Machine Learning Mastery:
* [Why initialize a neural network with random weights?](https://machinelearningmastery.com/why-initialize-a-neural-network-with-random-weights/)
* [Weight Initialization for Deep Learning Neural Networks](https://machinelearningmastery.com/weight-initialization-for-deep-learning-neural-networks/)

Some standard methods to implement:
* Weights pulled randomly and uniformly from within a small fixed range (e.g. -0.3 to 0.3 or 0 to 1)
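* Xavier and normalized Xavier - pull from a uniform distribution whose range is determined by the number of inputs to the current neuron (and, for the normalized version, also the number of neurons in the current layer); supposedly good for sigmoid and tanh neurons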
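
For reference, the two Xavier ranges as I understand them from the second article above can be written as follows. Here $n$ is the number of inputs to the neuron and $m$ is the number of neurons in its layer; these two symbols are labels chosen just for this note and are not used elsewhere in the journal.

$$w \sim U\left[-\frac{1}{\sqrt{n}},\ \frac{1}{\sqrt{n}}\right] \quad \text{(Xavier)}$$

$$w \sim U\left[-\frac{\sqrt{6}}{\sqrt{n + m}},\ \frac{\sqrt{6}}{\sqrt{n + m}}\right] \quad \text{(normalized Xavier)}$$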

One very silly thing about my implementation: since I'm using a graph model rather than a set of matrices, and all the neuron links are bidirectional to make feedforward and backpropagation calculations easier, every weight is represented in two places. Weight $w^l_{jk}$ shows up in the upstream weights for neuron $j$ in layer $l$ _and_ in the downstream weights for neuron $k$ in layer $l - 1$. So I have to make sure that when I pick a random value for $w^l_{jk}$, it gets updated in both places. This will also be true when I'm adjusting weights during training. To make it a bit easier to follow, I'm going to define a getter and setter for weight $w^l_{jk}$ and add them to the `NeuralNet` class.

In [15]:
# Set path as needed depending on how you're trying to run this
import sys
sys.path.append('/home/dgmorrison/Projects/neural/src/')
from nn import NeuralNet

def get_weight(self, l, j, k):
    if l <= 0 or l >= len(self.layers):  # Note layer 0 has no weights
        return None
    layer = self.layers[l]
    prev_layer = self.layers[l - 1]
    if j < 0 or j >= len(layer) or k < 0 or k >= len(prev_layer):
        return None
    neuron_l_j = layer[j]
    neuron_prev_k = prev_layer[k]
    downstream_w = neuron_prev_k.downstream_weights[j]
    upstream_w = neuron_l_j.upstream_weights[k]
    assert downstream_w == upstream_w
    return downstream_w

def set_weight(self, l, j, k, val):
    if l <= 0 or l >= len(self.layers):  # Note layer 0 has no weights
        return None
    layer = self.layers[l]
    prev_layer = self.layers[l - 1]
    if j < 0 or j >= len(layer) or k < 0 or k >= len(prev_layer):  # index 0 is valid
        return None
    neuron_l_j = layer[j]
    neuron_prev_k = prev_layer[k]
    neuron_prev_k.downstream_weights[j] = val
    neuron_l_j.upstream_weights[k] = val
    
NeuralNet.get_weight = get_weight
NeuralNet.set_weight = set_weight

In [37]:
import random

# Weight-picker functions for the initialization strategies listed above
# uniform is second-order to make general initialization consistent (parametrized by lower and upper bounds)
# I miss haskell
def uniform(alpha, beta):
    def f(**kwargs):
        return random.uniform(alpha, beta)
    return f

# xavier looks at the actual net structure
def xavier(**kwargs):
    layer = kwargs["layer"]
    j = kwargs["j"]
    neuron = layer[j]
    n_inputs = len(neuron.upstream_weights)
    alpha = 1 / (n_inputs ** 0.5)
    return random.uniform(-alpha, alpha)

def norm_xavier(**kwargs):
    layer = kwargs["layer"]
    j = kwargs["j"]
    neuron = layer[j]
    n_inputs = len(neuron.upstream_weights)  # number of inputs to this neuron
    layer_size = len(layer)  # number of neurons in the current layer
    root_six = 6 ** 0.5
    alpha = root_six / ((n_inputs + layer_size) ** 0.5)
    return random.uniform(-alpha, alpha)

# General initializer
def initialize_weights(net, f):
    for l in range(1, len(net.layers)):
        prev_layer = net.layers[l-1]
        layer = net.layers[l]
        for j in range(len(layer)):
            for k in range(len(prev_layer)):
                w = f(net=net, prev_layer=prev_layer, layer=layer, l=l, j=j, k=k)
                net.set_weight(l, j, k, w)
                

In [39]:
iris_net = NeuralNet()
iris_net.build_from_spec([("identity", 4), ("sigmoid", 5), ("sigmoid", 3)])
initialize_weights(iris_net, norm_xavier)
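
As a quick sanity check on the ranges involved, here is a minimal sketch that recomputes the normalized Xavier bound for each weight layer of the 4-5-3 net above without touching `NeuralNet`. The helper `norm_xavier_bound` and the `layer_sizes` list are hypothetical names made up for this illustration; the formula is meant to mirror `norm_xavier` (number of inputs plus size of the current layer).

```python
import math
import random

def norm_xavier_bound(n_inputs, layer_size):
    """Half-width of the normalized Xavier range: sqrt(6) / sqrt(n_inputs + layer_size)."""
    return math.sqrt(6) / math.sqrt(n_inputs + layer_size)

# Layer sizes from the iris spec above: 4 inputs, 5 hidden, 3 outputs
# (hypothetical stand-in for the real net structure)
layer_sizes = [4, 5, 3]

bounds = []
for l in range(1, len(layer_sizes)):
    bound = norm_xavier_bound(layer_sizes[l - 1], layer_sizes[l])
    bounds.append(bound)
    # Draw one sample per weight in this layer and confirm each stays in range
    n_weights = layer_sizes[l - 1] * layer_sizes[l]
    samples = [random.uniform(-bound, bound) for _ in range(n_weights)]
    assert all(-bound <= w <= bound for w in samples)
    print(f"layer {l}: {n_weights} weights in [-{bound:.4f}, {bound:.4f}]")
```

For this net the bounds come out to roughly 0.8165 for the hidden layer and 0.8660 for the output layer, so every weight read back through `get_weight` after initialization should fall inside those ranges.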