# Elementary Neural Networks

In the introduction notebook we clarified precisely what we mean by a neural network and defined some key terms such as *node, neuron, activation function*. In this notebook we will begin to build more complicated and useful neural networks. The first neural network is a historical landmark which provided the basis not only for a useful network topology, but also for a working mathematical model of memory: the Hopfield Network. We will then cover the multilayer perceptron in more detail. The key learning outcomes of this notebook are: understanding how neural networks work in practice; developing a working model of memory; understanding the blueprint model for modern neural networks.

## 1.0 Hopfield and Tank: Associative Memory

The Hopfield and Tank network was a landmark development in biology and Deep Learning: for the first time it gave a working model of associative (non-addressable) memory, as well as a generic method to train a neural network to classify objects which was, in some sense, provably reliable. The network can be formulated in a very biologically minded fashion, which is helpful because it allows us not only to develop useful computational tools, but also to draw biological insight.

The task the network aims to solve is: "Given a set of classifiable input data, map the data to their classification labels". We will assume each data point is encoded in a vector $v_i \in \mathbb{R}^n$ and its classification label in a vector $u_i \in \mathbb{R}^d$. Given the associations $v_i \rightarrow u_i$, we aim to construct a matrix of weights $W$ and a vector of biases $b$ such that, for a given activation function $f$, we have $f(W v_i + b) = u_i$.

Let's simplify our problem a little bit. We know that the vector $u$ contains class labels, and a natural way to encode this is to let $d$ be the number of classes: we can then associate every standard unit vector with a class, e.g. (1,0,0) => apple, (0,1,0) => banana, (0,0,1) => pear. The simplest way to obtain a unit vector from the output of our weights and biases is to take the coordinate with the maximum value (convince yourself that this choice is the same for every strictly increasing activation function applied coordinate by coordinate). We can further reduce the problem by assuming that $b = \vec{0}$. This allows us to think about the problem just in terms of the matrix $W$.
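The encoding and decoding just described can be sketched in a few lines. This is only an illustration: the helper names `one_hot` and `decode`, and the example vector `z`, are made up for this sketch and are not part of any library.

```python
import numpy as np

# Class names taken from the fruit example in the text.
classes = ["apple", "banana", "pear"]

def one_hot(index, d):
    """Return the unit vector in R^d with a 1 at position `index`."""
    u = np.zeros(d)
    u[index] = 1.0
    return u

def decode(output):
    """Map a raw network output to a class name via its largest coordinate."""
    return classes[int(np.argmax(output))]

# A made-up pre-activation vector W @ v, used only for illustration.
z = np.array([0.2, 1.7, -0.4])

# Strictly increasing activations preserve the ordering of the coordinates,
# so the argmax is the same before and after applying them elementwise.
sigmoid = 1.0 / (1.0 + np.exp(-z))
print(decode(z), decode(np.tanh(z)), decode(sigmoid))  # banana banana banana
```

Note that an activation which is only non-decreasing (one with flat regions) can create ties between coordinates, which is why the sketch sticks to strictly increasing functions.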

### Training
Now, we know that the brain uses the Hebbian rule to learn, which is often colloquially summarised as "neurones that fire together, wire together". What this means is that neurones with strongly correlated activity patterns will have high connection weights. What does this mean for us? Well, we definitely want our network to identify the labelled data, and each labelled data vector must be strongly correlated with itself. The Hebb rule then dictates that for a single data vector $v_i$ we should examine its autocorrelation $v_i v_i^T$. Fortunately, this is a matrix! If we assume that all the data are independent, then the natural thing to do is to sum these matrices up. Thus, we arrive at:

$$ W = \sum_{i=1}^{|\text{data}|} v_i v_i^T $$ 
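As a minimal sketch of this formula, the weight matrix can be built by accumulating outer products. The function name `hebbian_weights` and the two example patterns are assumptions made for illustration; in particular, the text does not fix an encoding, and the $\pm 1$ entries used here are just one common choice.

```python
import numpy as np

def hebbian_weights(patterns):
    """Sum the outer products v_i v_i^T over all stored patterns."""
    patterns = np.asarray(patterns, dtype=float)
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for v in patterns:
        W += np.outer(v, v)  # autocorrelation matrix of one pattern
    return W

# Two illustrative patterns with entries +1/-1 (an assumed encoding).
patterns = np.array([
    [1, -1, 1, -1],
    [1, 1, -1, -1],
])

W = hebbian_weights(patterns)
print(W)
```

The loop is equivalent to the single matrix product `patterns.T @ patterns`, and the result is always symmetric. Many formulations of the Hopfield network additionally set the diagonal of $W$ to zero; the sketch follows the formula above and leaves it in place.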

In [None]:
# Insert Hopfield and Tank here

We now need to put the class labels in the right place and for this we use our labelled training data. This is nothing more than a simple hashing routine.
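One possible reading of this step, offered only as a sketch: store each training pattern as a key in a hash table (a Python `dict`) whose value is the class label. The helper `build_label_table`, the patterns, and the labels below are all hypothetical.

```python
import numpy as np

def build_label_table(patterns, labels):
    """Hash each stored pattern (converted to a tuple of ints) to its label."""
    return {
        tuple(int(x) for x in pattern): label
        for pattern, label in zip(patterns, labels)
    }

# Illustrative labelled training data (assumed +1/-1 encoding).
patterns = np.array([
    [1, -1, 1, -1],
    [1, 1, -1, -1],
])
labels = ["apple", "banana"]

table = build_label_table(patterns, labels)
print(table[(1, -1, 1, -1)])  # apple
```

NumPy arrays are not hashable, which is why each pattern is converted to a tuple before being used as a key.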

In [None]:
# Insert hashing routine

### Regression

We now have trained our first neural network! How do we use it? 
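One common way to use a network of this kind, sketched here under assumptions the text does not state, is to present a (possibly corrupted) pattern and repeatedly replace it with the sign of $W$ times the current state until nothing changes. The function `recall`, the synchronous update, the $\pm 1$ encoding, and the example patterns are all choices made for this illustration.

```python
import numpy as np

def recall(W, probe, max_steps=10):
    """Iterate state <- sign(W @ state) until the state stops changing."""
    state = np.asarray(probe, dtype=float).copy()
    for _ in range(max_steps):
        # Threshold at zero; ties are mapped to +1 by convention.
        new_state = np.where(W @ state >= 0, 1.0, -1.0)
        if np.array_equal(new_state, state):
            break
        state = new_state
    return state

# Two illustrative stored patterns (assumed +1/-1 encoding).
patterns = np.array([
    [1, -1, 1, -1, 1, -1],
    [1, 1, 1, -1, -1, -1],
], dtype=float)

# Hebbian weights: the sum of outer products v_i v_i^T.
W = patterns.T @ patterns

# Corrupt the first stored pattern by flipping one entry, then recall it.
probe = patterns[0].copy()
probe[0] = -1.0
restored = recall(W, probe)
print(restored)
```

In this small example the corrupted probe settles back onto the first stored pattern. Recovery is not guaranteed in general: with too many or too similar patterns the network can settle on a different state, and synchronous updates can oscillate, which is why the loop has a step limit.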

## 2.0 Multilayer Perceptrons: Forming the modern blueprint

In the previous notebook we encountered a challenge with the perceptron: it could not model the XOR gate. This was a substantial blow to early mathematical neuroscience, as models of the brain needed to be able to model logic (early thinking around AI assumed that the brain operated on primitive logical operations, like a computer; it doesn't). We will now demonstrate that we can solve our XOR problem with a multi-layered neural network. First, try this as an exercise!

In [None]:
# Insert XOR here
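Once you have attempted the exercise, here is one possible hand-wired solution to compare against. It is a sketch, not the only answer: the weights are chosen by hand rather than learned, the activation is assumed to be a hard step function, and the names `step` and `xor_net` are made up for this example. The hidden layer computes OR and AND of the inputs, and the output fires when OR is on but AND is off.

```python
import numpy as np

def step(z):
    """Hard threshold activation: 1 where z > 0, otherwise 0."""
    return np.where(np.asarray(z) > 0, 1.0, 0.0)

# Hidden layer: the first unit computes OR, the second computes AND.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])

# Output unit: fires when OR is on and AND is off, which is XOR.
W2 = np.array([1.0, -1.0])
b2 = -0.5

def xor_net(x):
    """Two-layer perceptron with hand-picked weights that computes XOR."""
    hidden = step(W1 @ x + b1)
    return step(W2 @ hidden + b2)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, int(xor_net(np.array(x, dtype=float))))
```

The single perceptron fails because XOR is not linearly separable. The hidden layer gets around this by mapping the inputs to the (OR, AND) plane, where a single line does separate the two classes.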