## 3. Getting Started with Neural Networks

### Anatomy of a Neural Network

Some key concepts of neural networks are:

- <b>layers</b> stacked together to form a <b>network</b>
- the <b>input data</b> and corresponding <b>targets / responses</b>
- the <b>loss function</b> which measures the performance of the model
- the <b>optimiser</b> which determines how the learning proceeds

The diagram below illustrates the relationship between these elements.

<img src="img1.png" width="600"/>

The network is made up of layers stacked together. During training, input data is fed through the network to produce predictions, and the loss function compares those predictions against the targets to produce a loss score. The optimiser then uses this score to adjust the model's weights so that the loss on the training data decreases.
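The loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not how Keras implements training: the "network" is a single linear layer, the loss is mean squared error, and the optimiser is plain full-batch gradient descent. The toy data, `true_w` and `learning_rate` are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): 100 samples, 3 features, targets from a known linear rule.
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)        # weights of the single linear layer
learning_rate = 0.1    # optimiser setting (assumed value)


def mse(predictions, targets):
    """Loss function: mean squared error between predictions and targets."""
    return float(np.mean((predictions - targets) ** 2))


losses = []
for step in range(50):
    predictions = X @ w                             # layer: forward pass
    losses.append(mse(predictions, y))              # loss function: score the predictions
    grad = 2.0 * X.T @ (predictions - y) / len(y)   # gradient of the loss w.r.t. the weights
    w = w - learning_rate * grad                    # optimiser: gradient-descent update

print(f"loss before: {losses[0]:.4f}, after: {losses[-1]:.6f}")
```

Each pass through the loop touches every element in the diagram: data goes in, the layer predicts, the loss scores the prediction, and the optimiser updates the weights.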

In [1]:
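# MNIST handwritten-digit dataset bundled with Keras
from keras.datasets import mnist

# Building blocks for defining a network
from keras import models
from keras import layers
# Helper for one-hot encoding integer class labels
from keras.utils import to_categorical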

Using TensorFlow backend.


#### Layer
The layer is the fundamental data structure of a neural network. It is a data-processing module that takes one or more tensors as input and outputs one or more tensors. Some layers are stateless, but most have a state: the layer's <b>weights</b>, which are learnt by stochastic gradient descent (SGD).

Different layers are appropriate for different tensor formats:
- Simple vector data, stored as 2D tensors of shape (samples, features), is usually processed by <b>densely connected / fully connected / dense</b> layers
- Sequence data, stored as 3D tensors of shape (samples, timesteps, features), is typically processed by <b>recurrent layers</b> such as an LSTM layer
- Image data, stored as 4D tensors, is usually processed by 2D <b>convolution layers</b>
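To make the tensor ranks concrete, the sketch below builds one empty batch of each kind with NumPy. The sizes are arbitrary illustrative values, and the image layout (samples, height, width, channels) assumes the channels-last convention.

```python
import numpy as np

# All sizes are arbitrary, chosen only to show the number of axes.
vector_batch = np.zeros((64, 784))        # 2D: (samples, features)
sequence_batch = np.zeros((64, 20, 16))   # 3D: (samples, timesteps, features)
image_batch = np.zeros((64, 28, 28, 1))   # 4D: (samples, height, width, channels)

for name, batch in [("vectors", vector_batch),
                    ("sequences", sequence_batch),
                    ("images", image_batch)]:
    print(f"{name}: rank {batch.ndim}, shape {batch.shape}")
```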

Layers can only be connected if they are mutually compatible. Specifically, each layer only accepts input tensors of a certain shape and returns output tensors of a certain shape.

In [3]:
# Dense layer with 32 output units that expects input vectors of length 784
l1 = layers.Dense(32, input_shape=(784,))

In the example above, we created a Dense layer that only accepts 2D input tensors whose feature dimension is 784. The batch dimension (axis 0) is left unspecified, so any batch size is accepted. The layer returns a tensor whose feature dimension has been transformed to 32.

Consequently, a downstream layer connected to this one must accept 32-dimensional vectors as input.
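The shape constraint can be seen by writing the affine part of a dense layer as a matrix product. The sketch below is a NumPy illustration under that assumption, not the Keras implementation: the `dense` helper, the random weights and the batch size of 5 are all hypothetical, and the activation function is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(inputs, weights, bias):
    """Affine part of a dense layer: inputs @ weights + bias (activation omitted)."""
    return inputs @ weights + bias


batch = rng.normal(size=(5, 784))   # 5 samples of 784 features; batch size is arbitrary

w1, b1 = rng.normal(size=(784, 32)), np.zeros(32)   # first layer: 784 -> 32
w2, b2 = rng.normal(size=(32, 10)), np.zeros(10)    # second layer must accept 32 features

hidden = dense(batch, w1, b1)
output = dense(hidden, w2, b2)
print(hidden.shape, output.shape)   # (5, 32) (5, 10)

# A layer expecting 784 features cannot consume the 32-feature output.
mismatch_rejected = False
try:
    dense(hidden, rng.normal(size=(784, 10)), np.zeros(10))
except ValueError:
    mismatch_rejected = True
print("incompatible layer rejected:", mismatch_rejected)
```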

#### Model: A network of layers
A deep-learning model is a directed acyclic graph of layers. The most common instance is a linear stack of layers that maps a single input to a single output, but other topologies exist, such as two-branch networks, multihead networks and Inception blocks.
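As a rough illustration of a non-linear topology, the NumPy sketch below wires up a two-branch network: two inputs are processed by separate layers, merged, and mapped to one output. The `dense_relu` helper, the layer sizes and the random weights are assumptions made for the example. A real model would learn the weights rather than draw them at random.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense_relu(inputs, weights):
    """Dense layer without bias, followed by a ReLU activation."""
    return np.maximum(inputs @ weights, 0.0)


# Two separate inputs for the same 4 samples (hypothetical feature sizes).
input_a = rng.normal(size=(4, 8))
input_b = rng.normal(size=(4, 6))

# Each branch has its own layer.
branch_a = dense_relu(input_a, rng.normal(size=(8, 16)))
branch_b = dense_relu(input_b, rng.normal(size=(6, 16)))

# The branches join by concatenating along the feature axis.
merged = np.concatenate([branch_a, branch_b], axis=1)

# A final layer maps the merged features to a single output per sample.
output = merged @ rng.normal(size=(32, 1))
print(merged.shape, output.shape)   # (4, 32) (4, 1)
```

The graph is still directed and acyclic: data flows from the two inputs to the output without looping back.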

The topology of a network defines a hypothesis space. By choosing a network topology, we constrain the space of possibilities to a specific series of tensor operations, mapping input data to output data.

#### Loss Function & Optimiser
The loss function defines the quantity to be minimised during training. It measures how well the model performs on the task at hand.

The optimiser determines how the network is updated based on the loss function. It implements a variant of SGD.
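To show what "a variant of SGD" can mean, the sketch below compares a plain gradient-descent update with a momentum update on a toy one-parameter loss, (w - 3)^2. The loss, the learning rate and the momentum value are assumptions chosen for illustration. The gradient here is exact rather than estimated from a random mini-batch, so only the update rules are being shown.

```python
def gradient(w):
    """Gradient of the toy loss (w - 3)^2 with respect to w."""
    return 2.0 * (w - 3.0)


def sgd_step(w, grad, learning_rate=0.1):
    """Plain update: move against the gradient."""
    return w - learning_rate * grad


def momentum_step(w, velocity, grad, learning_rate=0.1, momentum=0.9):
    """Momentum update: the step also depends on the previous step."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity


w_plain = 0.0
w_mom, velocity = 0.0, 0.0
for _ in range(200):
    w_plain = sgd_step(w_plain, gradient(w_plain))
    w_mom, velocity = momentum_step(w_mom, velocity, gradient(w_mom))

print(f"plain: {w_plain:.4f}, momentum: {w_mom:.4f}")   # both approach the minimum at w = 3
```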

A neural network with multiple outputs may have multiple loss functions, one per output. However, gradient descent must be based on a single scalar loss value, so for multiloss networks all losses are combined, by averaging, into a single scalar.
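A minimal sketch of combining per-output losses into one scalar is given below. The `combine_losses` helper and the example loss values are hypothetical. The optional per-loss weights are an assumption added for illustration and are not described in the text above.

```python
def combine_losses(losses, weights=None):
    """Combine per-output loss values into one scalar via a (weighted) average."""
    if weights is None:
        weights = [1.0] * len(losses)   # equal weights give a plain average
    if len(weights) != len(losses):
        raise ValueError("need exactly one weight per loss")
    weighted_sum = sum(w * l for w, l in zip(weights, losses))
    return weighted_sum / sum(weights)


# Hypothetical loss values from a network with two outputs.
regression_loss = 4.0
classification_loss = 0.5

combined = combine_losses([regression_loss, classification_loss])
weighted = combine_losses([regression_loss, classification_loss], weights=[0.25, 1.0])
print(combined, weighted)
```

Weighting is useful when the losses live on very different scales, since otherwise the largest one would dominate the gradient.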