# Lecture 7: How to train your network

Neural networks must be trained with (loads of) labelled data before they can produce more than just random noise. Once we have compiled our training data, the established training algorithm works as follows:

- Feed-forward a member of the training set
- Calculate the loss, i.e. the difference between the target output and the actual output
- Back-propagate the loss into the network to determine each weight's contribution to the total loss
- Apply an optimization algorithm (e.g. gradient descent) to modify the network weights
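
The four steps above can be sketched for the smallest possible case: a single sigmoid neuron with one weight. This is an illustrative toy example only; the names `sigmoid`, `x`, `w`, and the chosen values are assumptions and not part of the lecture's network class.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation function (an assumed choice for this toy example)."""
    return 1.0 / (1.0 + np.exp(-z))

x, target = 1.5, 0.99   # one hypothetical training sample
w = 0.1                 # initial weight (arbitrary)
lRate = 0.5             # learning rate (arbitrary)

losses = []
for step in range(50):
    out = sigmoid(w * x)                 # 1. feed-forward
    err = target - out                   # 2. difference between target and actual output
    losses.append(0.5 * err**2)          #    squared-error loss
    grad = -err * out * (1.0 - out) * x  # 3. back-propagate: derivative of the loss w.r.t. w
    w -= lRate * grad                    # 4. gradient-descent step

print(losses[0], losses[-1])  # the loss should shrink over the iterations
```

The update `w -= lRate * grad` has the same form as the weight updates in a full network, only with a single weight instead of two weight matrices.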

In our custom neural network, the task is carried out by the method ``train`` that takes a training image and a target vector as input:

```Python
    def train(self, imgArr, target):
        # Transform the image to a vector
        inputs = imgArr.flatten()
        # Move signal into hidden layer
        hiddenInputs = np.dot(self.wih, inputs)
        # Apply the activation function
        hiddenOutputs = self.actFunc(hiddenInputs)
        # Move signal into output layer
        outputs = np.dot(self.who, hiddenOutputs)
        # Apply the activation function
        prediction = self.actFunc(outputs)
        # output layer error is (target-actual)
        outputErrors = target - prediction
        # for the hidden layer error, we use the right pseudo-inverse of who
        whoT = self.who.T
        whoSq = np.dot(self.who,whoT)
        whoInv = np.linalg.inv(whoSq)
        whoInv = np.dot(whoT,whoInv)
        hiddenErrors = np.dot(whoInv, outputErrors)
        # update the weights between the hidden and output layer
        err = outputErrors*prediction*(1.-prediction)
        self.who += self.lRate * np.dot(err[:,np.newaxis], hiddenOutputs[np.newaxis,:])
        # update the weights between the input and the hidden layer
        err = hiddenErrors * hiddenOutputs * (1.-hiddenOutputs)
        self.wih += self.lRate * np.dot(err[:,np.newaxis], inputs[np.newaxis,:])
```
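
The hidden-layer error in `train` is obtained by multiplying the output error with `whoT · (who · whoT)⁻¹`, which is the right pseudo-inverse of `who`. The following sketch checks this on a random matrix that merely stands in for `who` (an assumption, since the real weights are not shown here) and compares the result with NumPy's `np.linalg.pinv`.

```python
import numpy as np

rng = np.random.default_rng(0)
who = rng.normal(0.0, 0.1, size=(10, 100))  # random stand-in for self.who

# Same construction as in train(): whoT * (who * whoT)^-1
whoT = who.T
whoInv = np.dot(whoT, np.linalg.inv(np.dot(who, whoT)))

# NumPy's Moore-Penrose pseudo-inverse for comparison
whoPinv = np.linalg.pinv(who)
print(whoInv.shape)                  # (100, 10)
print(np.allclose(whoInv, whoPinv))  # both constructions should agree

# Map some output errors back into the hidden layer ...
outputErrors = rng.normal(size=10)
hiddenErrors = np.dot(whoInv, outputErrors)
# ... and check that who maps them forward onto the output errors again
print(np.allclose(np.dot(who, hiddenErrors), outputErrors))
```

Many textbook formulations of back-propagation use the plain transpose `who.T` at this step instead; the sketch above follows the construction used in `train`.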

Let's feed a training image into the network. For that, we first need to create an instance of the network class:

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from network import neuralNetwork

iNodes = 784 # The images are 28x28 pixels
hNodes = 100 # An educated guess
oNodes = 10 # Ten digits

lRate = 0.6

testNet = neuralNetwork(iNodes, hNodes, oNodes, lRate) # Create an instance of the network

print(testNet)

Input nodes: 784 
Hidden nodes: 100 
Output nodes: 10 
Learning rate: 0.6 
wih matrix shape: (100, 784) 
who matrix shape: (10, 100)
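
The printed shapes determine how the signal travels through `train`. As a rough sketch, with random matrices standing in for `wih` and `who` and a logistic sigmoid assumed as the activation function (the lecture's `actFunc` is not shown here), the shapes can be traced as follows:

```python
import numpy as np

rng = np.random.default_rng(1)
wih = rng.normal(0.0, 0.1, size=(100, 784))  # stand-in: input -> hidden weights
who = rng.normal(0.0, 0.1, size=(10, 100))   # stand-in: hidden -> output weights

def actFunc(z):
    """Assumed activation function: the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

imgArr = rng.random((28, 28))                     # fake 28x28 image
inputs = imgArr.flatten()                         # 784 input values
hiddenOutputs = actFunc(np.dot(wih, inputs))      # 100 hidden activations
prediction = actFunc(np.dot(who, hiddenOutputs))  # 10 output activations

print(inputs.shape, hiddenOutputs.shape, prediction.shape)
# → (784,) (100,) (10,)
```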


Now we can load the test images and their labels back into our notebook:

In [2]:
data = np.load("mnistDataTest.npy")
labels = np.load("mnistLabelsTest.npy")
n = len(labels)

print(data.shape, data.dtype)
print(labels.shape, labels.dtype)
print(data[0])

(10000, 28, 28) float64
(10000,) int32
[[0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [output truncated]

To tell the network which output we expect, we need to prepare the target vector. As we have ten output nodes, our target vector needs ten entries. In an ideal world, a perfect network would produce a perfect output consisting only of zeros and a single one at the position given by the label. For instance, if we feed in an image of the digit 2, we would love to see an output like

``[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]``

but because all digits share certain similarities (certain pixels are black in almost every image) and a sigmoid-shaped activation function can never output exactly 0 or 1, we introduce a tiny error margin by setting the target vector to

``[0.01, 0.01, 0.99, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]``:

In [3]:
idx = 0 # Index of some image (to avoid magic numbers ...)
lbl = labels[idx] # Get the label of the first image

target = np.zeros(10, dtype='float') + 0.01 # Set the target vector
target[lbl] = 0.99
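
Since the same target vector has to be built for every training image, it may be convenient to wrap the two lines above in a small helper. The function `makeTarget` and its parameter names are hypothetical and not part of the lecture's code; this is just one possible sketch.

```python
import numpy as np

def makeTarget(label, nOutputs=10, low=0.01, high=0.99):
    """Return a target vector that is `low` everywhere except `high` at index `label`."""
    target = np.full(nOutputs, low, dtype=float)
    target[label] = high
    return target

print(makeTarget(2))  # target vector for an image of the digit 2
```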

Before feeding images into the network, we need to make sure that our data is as economical as possible. A neural network treats every value larger than 0 as signal that must be learned, and depending on the image format, 255 may be white and 0 black. Here, the data has already been optimised for signal economy, but at times it may be necessary to invert the images:

```python
for i in range(n):
    data[i,:,:] = 1.-data[i,:,:]   
print(data[0,:,:])
```
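
If raw images were stored as 8-bit grey values with a white background, a preprocessing step along the following lines could bring them into the value range used here. This is a hypothetical sketch: the array `raw` and the helper `prepareImages` are made up for illustration, and the arrays loaded above do not need this step.

```python
import numpy as np

def prepareImages(raw, invert=True):
    """Scale 8-bit grey values to [0, 1] and optionally invert them."""
    scaled = raw.astype(float) / 255.0       # 0..255 -> 0.0..1.0
    return 1.0 - scaled if invert else scaled

# One tiny made-up 2x2 "image": white (255) background, black (0) and grey (128) pixels
raw = np.array([[[255, 255], [0, 128]]], dtype=np.uint8)
clean = prepareImages(raw)
print(clean)  # the white background becomes 0, the black pixel becomes 1
```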

All that's left is a single line of code to feed in an image and improve the network:

In [4]:
testNet.train(data[idx,:,:], target)

Of course, a single image does not suffice. Instead, we need to train the network with thousands of images of handwritten digits and their respective labels to get reasonable results - a task that has been outsourced to the problem sheet :-)