Before we continue where we left off last week (fine-tuning a pre-existing model), I want to give some background on image tasks in deep learning. 

Last week's model was of a different type, but today we will be looking at a fundamental deep learning model used in handwritten
character recognition. You may have heard of the MNIST dataset; this is what we will train our model on.

# So what is a Convolutional Neural Network (CNN)?
CNNs are a type of neural network suited to processing grid-like data, such as images (2D grids of pixels). They do so by making use of the convolution operation.
My way of thinking of convolution is as a filter: a small kernel slides over the input and responds strongly to particular local patterns (such as edges), so the network focuses on what matters in the image and ignores the rest.
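To make this concrete, here is a minimal NumPy sketch of a 2D convolution with no padding and a stride of 1. Like most deep learning libraries (PyTorch's `Conv2d` included), it does not flip the kernel, so strictly speaking it is a cross-correlation. The function name `conv2d` and the toy image and kernel are illustrative assumptions, not part of the LeNet5 code we write later.

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Illustrative 'valid' 2D convolution (no padding, stride 1).

    Like deep learning libraries, this does not flip the kernel,
    so it is technically a cross-correlation.
    """
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Multiply the kernel element-wise with the patch under it and sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 5x5 toy "image": dark on the left, bright on the right (a vertical edge).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# A 3x3 kernel that responds to brightness increasing from left to right.
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

print(conv2d(image, kernel))
```

Every row of the resulting 3x3 feature map is `[3, 3, 0]`: the output is large where the kernel straddles the edge and zero where the patch underneath is uniform.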

## CNNs have three important properties
- Sparse connectivity
- Parameter sharing
- Equivariant representations

A CNN differs from a standard NN in its use of convolution. A typical NN has fully connected layers, which use matrix multiplication (WX + b):
every input unit is connected to every output unit in the next layer, and each connection has its own separate parameter. In other words,
every output unit interacts with every input unit. CNNs, by contrast, have <b>sparse connectivity</b>, which is accomplished by making the kernel
smaller than the input. For example, an image may have millions of pixels, but we can detect small, meaningful features such as edges
with kernels that cover only tens or hundreds of pixels. This also improves efficiency: we store fewer parameters and perform fewer
operations. 
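A quick back-of-the-envelope count shows the size of the saving. The sizes below are assumptions chosen for illustration (an MNIST-sized 28x28 input mapped to the 26x26 output a 3x3 kernel would produce), biases are ignored, and no parameter sharing is applied yet, so every connection still has its own weight.

```python
# Illustrative sizes (assumptions): a 28x28 MNIST-sized image mapped to a 26x26
# output, which is what a 3x3 kernel gives with no padding and stride 1.
in_h = in_w = 28
k = 3
out_h, out_w = in_h - k + 1, in_w - k + 1

n_in = in_h * in_w      # 784 input pixels
n_out = out_h * out_w   # 676 output units

# Fully connected: every output unit is connected to every input pixel.
dense_connections = n_in * n_out
# Sparse connectivity: each output unit only sees the k x k patch beneath it.
sparse_connections = n_out * k * k

# Without sharing, each connection has its own weight and costs one multiplication.
print(f"fully connected:    {dense_connections:,} weights / multiplications")
print(f"sparsely connected: {sparse_connections:,} weights / multiplications")
```

This gives 529,984 connections for the fully connected layer versus 6,084 for the sparsely connected one, roughly 87 times fewer.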

<b>Parameter sharing</b> is a property that arises from the nature of convolution. In a typical NN, each parameter is used exactly once
when computing the output of a layer: it is multiplied by one element of the input and never revisited. In a CNN, the parameters are shared
across the input, because the same kernel (filter) is applied at every position of the input data. This means that instead
of learning a separate set of parameters for every location, we learn only one much smaller set. 
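One way to see parameter sharing (and sparse connectivity) at the same time is to write a small 1D convolution as an ordinary matrix multiplication. The kernel and input values below are arbitrary; the point is the structure of the matrix `W`.

```python
import numpy as np

kernel = np.array([1.0, -2.0, 3.0])             # 3 learnable weights (arbitrary values)
x = np.array([4.0, 0.0, 1.0, 5.0, 2.0, 3.0])    # input of length 6
n_out = len(x) - len(kernel) + 1                # 4 outputs (no padding, stride 1)

# Sliding-window convolution (cross-correlation, as in deep learning libraries).
y_conv = np.array([np.dot(x[i:i + len(kernel)], kernel) for i in range(n_out)])

# The same operation written as a dense matrix multiply W @ x.
W = np.zeros((n_out, len(x)))
for i in range(n_out):
    W[i, i:i + len(kernel)] = kernel  # every row reuses the same 3 weights

y_matmul = W @ x

print(y_conv, np.allclose(y_conv, y_matmul))
print(np.count_nonzero(W), "of", W.size, "entries are non-zero")
print(len(np.unique(W[W != 0])), "distinct weights")
```

Both routes give `[7, 13, -3, 10]`. Only 12 of the 24 entries of `W` are non-zero (sparse connectivity), and those 12 entries take just 3 distinct values, the kernel weights (parameter sharing). A fully connected layer of the same shape would learn all 24 entries independently.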

Lastly, parameter sharing gives us the final property, <b>equivariance to translation</b>. Translation means shifting the input, and a
function is equivariant if, when the input changes, the output changes in the same way. This is kind of a confusing property, so don't worry
if it's not totally clear yet. An example in image processing: convolution creates a 2D feature map showing where certain features appear in
the input. If we move an object in the input by a small amount, its representation moves by the same amount in the output. In practice, this
means the network can detect a feature no matter where it is in the image, e.g. if it can detect a cat's ear in the bottom left of the image,
it can also detect it in the top right (as long as its scale and rotation have not changed).
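We can check equivariance numerically with a small sketch. Below, a made-up 2x2 pattern is placed in a blank 8x8 image at two different positions and convolved with the same kernel. The helper `image_with_pattern` and all values are hypothetical, chosen only to illustrate the property.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation (no padding, stride 1), as in deep learning libraries."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

pattern = np.array([[1.0, 2.0], [3.0, 4.0]])  # a small, made-up "feature"

def image_with_pattern(row, col, size=8):
    """Hypothetical helper: a blank size x size image with `pattern` at (row, col)."""
    img = np.zeros((size, size))
    img[row:row + 2, col:col + 2] = pattern
    return img

kernel = pattern  # a filter that responds most strongly to this exact pattern

feat_a = conv2d(image_with_pattern(1, 1), kernel)
feat_b = conv2d(image_with_pattern(3, 4), kernel)  # same pattern, moved down 2 and right 3

# Location of the strongest response in each feature map.
peak_a = tuple(int(v) for v in np.unravel_index(np.argmax(feat_a), feat_a.shape))
peak_b = tuple(int(v) for v in np.unravel_index(np.argmax(feat_b), feat_b.shape))
print(peak_a, peak_b)

# The feature map moves with the input: feat_a shifted by (2, 3) matches feat_b.
print(np.allclose(feat_b[2:, 3:], feat_a[:-2, :-3]))
```

The peak moves from `(1, 1)` to `(3, 4)`, exactly the shift applied to the input, and the overlapping parts of the two feature maps are identical. Near the image borders this only holds approximately, since part of a shifted pattern can fall off the edge.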

## Pooling - the second key topic in CNNs
Pooling is an operation that modifies the output of a layer (here, the feature map). It is very simple: it replaces the output at each
location with a summary statistic of the nearby outputs (e.g. the max of a 2x2 area or the average of a 3x3 area). The aim of pooling is to
make the representation approximately invariant to small translations of the input: if we translate the input by a small amount, the values
of most of the pooled outputs do not change.
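Here is a minimal sketch of 2x2 max pooling with stride 2, the variant most PyTorch LeNet5 implementations use. The feature map values are made up, and they are deliberately chosen so that each window's maximum stays inside its window after a one-pixel shift; a shift that pushed a maximum across a window boundary would change that pooled value.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    h, w = fmap.shape
    # Group the map into non-overlapping 2x2 blocks, then take the max of each block.
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A made-up 4x4 feature map.
fmap = np.array([
    [9, 2, 8, 1],
    [1, 3, 2, 0],
    [0, 1, 7, 3],
    [6, 2, 1, 0],
])

# Shift the map one pixel to the right, filling the new left column with zeros.
shifted = np.zeros_like(fmap)
shifted[:, 1:] = fmap[:, :-1]

print(max_pool_2x2(fmap))
print(int((fmap != shifted).sum()), "of", fmap.size, "raw values changed")
print(int((max_pool_2x2(fmap) != max_pool_2x2(shifted)).sum()), "of 4 pooled values changed")
```

The shift changes 15 of the 16 raw values, but the pooled output `[[9, 8], [6, 7]]` is unchanged.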

[Further reading on Convolutional Neural Nets](https://www.deeplearningbook.org/contents/convnets.html)

# Let's take a look at that in practice!
We will be implementing LeNet5, which was created by Yann LeCun (a pioneer in computer vision), and we will train it
using PyTorch (which is way easier to download than TensorFlow xD)!

LeNet5 is a very early (1998!!!) deep learning model and one of the first to make use of CNNs. Despite its age, the ideas behind it are still
used in models today, and it offers great insight into how CNNs work and how we can create one ourselves.

To continue, please get the paper [Gradient-Based Learning Applied to Document Recognition](http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf). Here is the model architecture in all its glory:
![LeNet5](https://www.datasciencecentral.com/wp-content/uploads/2021/10/1lvvWF48t7cyRWqct13eU0w.jpeg)

It employs a variety of techniques to achieve its goal: classifying handwritten digits. We will train and test the model
using the MNIST dataset. So let's get started!
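Before writing any PyTorch, it helps to trace how the tensor shapes change through the architecture in the figure. This plain-Python sketch applies the standard output-size formula, (size - kernel + 2 * padding) / stride + 1, to the layers as read from the figure: 5x5 convolutions (C1, C3, C5) and 2x2 subsampling with stride 2 (S2, S4). The layer list and helper name are my own reading of the figure, not code from the paper. Note that the figure's input is 32x32 while MNIST digits are 28x28; padding the MNIST images by 2 pixels on each side is one common way to bridge the gap (an assumption here, not something prescribed by the paper).

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size - kernel + 2 * padding) // stride + 1

# (name, kind, kernel size, stride, output channels), as read from the LeNet5 figure.
layers = [
    ("C1", "conv", 5, 1, 6),
    ("S2", "pool", 2, 2, 6),
    ("C3", "conv", 5, 1, 16),
    ("S4", "pool", 2, 2, 16),
    ("C5", "conv", 5, 1, 120),
]

size, channels = 32, 1  # the figure's input: a single-channel 32x32 image
shapes = [("input", channels, size)]
print(f"input:      {channels:>3} x {size:>2} x {size:>2}")
for name, kind, k, s, c in layers:
    size = out_size(size, k, s)
    channels = c
    shapes.append((name, channels, size))
    print(f"{name} ({kind}): {channels:>3} x {size:>2} x {size:>2}")

# C5's maps are 1x1, so its 120 values feed the fully connected layers.
print("F6: 84 units -> output: 10 classes (digits 0-9)")
```

The spatial size goes 32 → 28 → 14 → 10 → 5 → 1, which is why C5 acts like a fully connected layer with 120 outputs.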

In [None]:
# We start off by importing the necessary libraries