# Deep Learning

Deep Learning is a subset of Machine Learning, which itself may be considered a subset of Statistics. In essence, Deep Learning aims to match two statistical distributions using a powerful graphical structure: the *neural network*. The "deep" in Deep Learning refers to the number of layers in the graph, which will become clearer as we proceed through the course, but it has come to imply more generally the size of the models being used.

This course will address several topics: we will cover the basic theory and history of neural networks; the primitive structures and functions found in a neural network; how to train a neural network; key methods of optimising training; different computational frameworks to work in; and several modern network topologies and the situations where they are best used. These topics will be addressed through the lecture notes and a series of workbooks which can be worked through:

1. Introduction and Basics
2. Hopfield Networks, and Multi-Layer Perceptrons
3. Gradient descent, accelerated descent, and regularisation
4. Flux: Deep Learning in Julia
5. PyTorch, Keras, and TensorFlow: Deep Learning in Python
6. Recurrent Neural Networks
7. Convolutional Neural Networks
8. Variational Autoencoders
9. Generative Adversarial Networks
10. Transformer Networks
11. Graph Neural Networks and beyond...

This notebook will cover the basics: the biology, some of the basic building blocks, some essential terms and concepts, and some history. By the end of the notebook we should understand what a neuron is and why collections of model neurons may be able to perform learning tasks. We will code our own learning neuron: the perceptron.

## Where are we going?

Before we start the course in earnest, we will give a simple example of what we are aiming for: what a neural network can look like and what it might do. The cell block below contains code to load a dataset and train a neural network *from scratch* to classify unseen data. If you have a GPU it's recommended that you run the function with the keyword ``train=true``. If you do not, don't worry: the code will load the necessary pre-trained parameters and you can imagine that it was just trained very quickly.

In [None]:
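using Flux, Plots, ProgressMeter  # ProgressMeter provides @showprogress

# Sketch only: every `load()` below is a placeholder for a data or parameter file
# that these notes do not name yet, and the layer sizes are illustrative assumptions.
function example_classifier(training_data, training_labels, nepochs; train=false)

    if train
        # Assumes single-channel images in WHCN layout and one-hot label batches.
        nclasses = size(first(training_labels), 1)
        network = Chain(
            Conv((3, 3), 1 => 8, relu),
            Conv((3, 3), 8 => 16, relu),
            Conv((3, 3), 16 => nclasses),
            GlobalMeanPool(),  # average each class map down to a single logit
            Flux.flatten,
        ) |> gpu

        # The network outputs logits, so use the logit form of cross-entropy.
        loss(x, y) = Flux.logitcrossentropy(network(x), y)
        ps = Flux.params(network)
        opt = ADAM()
        data = zip(training_data, training_labels)  # assumes batches live on the same device as the network

        # Implicit-parameter training style; train! computes the gradients internally.
        @showprogress for epoch in 1:nepochs
            Flux.train!(loss, ps, data, opt)
        end
    else
        network = load()  # placeholder: pre-trained network
    end

    function classifier(datum, class_labels)
        # Assumes `datum` already has the batch dimension the network expects.
        probs = vec(softmax(network(datum)))
        predict_prob, predictor = findmax(probs)  # findmax returns (value, index)
        return datum, class_labels[predictor], "Prediction probability: $(predict_prob)"
    end

    return classifier
end

train_data = load()    # placeholder: training inputs
train_labels = load()  # placeholder: training labels
nepochs = 100

predict = example_classifier(train_data, train_labels, nepochs; train=true)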

In [None]:
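test_data = load()     # placeholder: held-out inputs
test_labels = load()   # placeholder: their true labels
class_labels = load()  # placeholder: class names, ordered to match the network outputs
for datum in test_data
    println(predict(datum, class_labels))
end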

## Biology

Deep Learning is an offshoot of Theoretical Neuroscience: a field dedicated to modelling and explaining brain-related phenomena. Its fundamental units, neurons, and fundamental topological structure, networks, come together to form the principal tool in the deep learning playbook: neural networks. The etymology of these words suggests a relationship to physical structures found in the brain, and this is no accident. Deep Learning has borrowed heavily from insights generated by explanations of experiments performed on the brain.

### Neurons

For many, the neuron is the fundamental brain unit: an analogy might be to the atom in chemistry. This is a rather simplistic view that is largely wrong, but it is nevertheless a useful starting point to begin developing a mental model of how the brain may work. We will present a somewhat simplified view of neurons to begin with.

At its most primitive level a neuron is nothing more than a *cell* that specialises in developing long-range connections and communicating with other cells. It can be described with a few fundamental structures: the soma (cell body), the axon, and the dendrites. These three structures work together to generate and communicate signalling patterns in the form of events called *action potentials* or *spikes*.

An action potential is a wave of change in the membrane voltage of a cell. When a cell's membrane voltage reaches a certain level (the threshold), it opens gated ionic channels which cause a rapid increase in the potential up to a peak, after which the potential decays and the cell enters a refractory period; see [Image x]. This wave of voltage modulates the voltage in the membrane patch directly next to it, which allows the action potential to be *transmitted* from the soma down the axon and on to another cell. The axon terminates at a location called the *synapse*, which bridges the axon of one cell with the dendrites of another cell. When the action potential reaches the synapse it triggers the release of vesicles containing chemicals known as *neurotransmitters*, which can modulate the potential at the corresponding dendrite. This can be up-modulation, or *excitatory*, causing the neighbouring cell to be more likely to fire, or down-modulation, or *inhibitory*, causing the neighbouring cell to be less likely to fire. Neurons are often referred to as excitatory or inhibitory for this reason. In tandem, these three structures work (in conjunction with other electrical inputs such as stimuli from the sensory organs) to modulate spiking in themselves and other neurons. These patterns of spiking convey information and perform computation, e.g. a high spiking rate in a motor neuron might cause a muscle to contract.
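The threshold, spike, and refractory period described above can be sketched with a leaky integrate-and-fire neuron. This is a much cruder picture than the ionic-channel one: the shape of the spike is not modelled, only the threshold crossing, the reset, and the refractory pause. The function name `simulate_lif` and every constant (time step, time constant, voltages in mV, input currents) are illustrative assumptions rather than values taken from these notes.

```julia
# Leaky integrate-and-fire sketch. All constants are illustrative assumptions.
function simulate_lif(input_current; dt=0.1, tau=10.0, v_rest=-65.0,
                      v_threshold=-50.0, v_reset=-70.0, t_refractory=2.0)
    v = v_rest
    refractory_left = 0.0
    voltages = Float64[]
    spike_times = Float64[]
    for (step, current) in enumerate(input_current)
        if refractory_left > 0
            # During the refractory period the cell ignores its input.
            refractory_left -= dt
            v = v_reset
        else
            # Leak towards rest while integrating the input (forward Euler step).
            v += dt / tau * (v_rest - v + current)
            if v >= v_threshold
                push!(spike_times, step * dt)  # record the time of the spike
                v = v_reset
                refractory_left = t_refractory
            end
        end
        push!(voltages, v)
    end
    return voltages, spike_times
end

# A weak input settles below the threshold; a strong input crosses it repeatedly.
weak = simulate_lif(fill(10.0, 1000))
strong = simulate_lif(fill(20.0, 1000))
println("spikes with weak input: ", length(weak[2]))
println("spikes with strong input: ", length(strong[2]))
```

With a constant input the membrane voltage relaxes towards `v_rest + current`, so whether the neuron fires at all depends on whether that steady state lies above `v_threshold`.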

Neurons are themselves categorised into specialised subdivisions. The beginning of this classification is often regarded as the beginning of modern neuroscience, with Ramón y Cajal producing a series of beautiful drawings of stained neurons.


### Networks


1. Classification
2. Regression
    * Linear Regression
    * Weights and Biases
3. Models
4. Activation functions

## Mathematical Concepts


### Models

A model is simply a reduced explanation of some phenomenon that generates insight about that phenomenon. We can choose any format to outline our model in: words, equations, or computational routines. Mathematical and computational models are desirable because they are *precise*: there is absolutely no ambiguity about what they mean.

### Neuron Models

#### Biological Models

#### Integrate and Fire Models

#### Poisson Models

#### Activation functions

### Network Models

### Statistical Models

### Classification, Regression, and Generation


#### Classification

#### Regression

#### Generation


## Perceptron

We would now like to unify our biology and our statistical modelling paradigms. We note that, in a simplistic sense, a neuron's firing rate output can be modelled with a logistic function. It can therefore be thought of as performing a classification, or some form of logistic regression. We also note that a neuron's output depends on the weighted sum of the inputs that arrive through its dendrites and on some internal resting state. We call the dendritic weights $W$ and the resting points the biases $b$. We can write this as a function:

$$ u_i = \sigma\left(\sum_jW_{ij} v_j + b_i \right) $$

with $\sigma$ being the activation function and $v_j$ being the input to the network. When the activation function is a simple thresholding function (i.e. 1 above the threshold and 0 otherwise) we refer to it as a *perceptron* and it is one of the earliest forms of neurally inspired machine learning models. We can draw an immediate analogy to our statistical model and say that the neuron is performing the role of a classifier: when the firing threshold is crossed the neuron is activated and we classify the input as a different type. By tuning the dendritic weights and the baseline rate we can do biological statistics. 
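As a minimal sketch of the formula above, the forward pass of a perceptron is a matrix-vector product followed by an elementwise threshold. The names `threshold` and `perceptron`, the choice of 0 as the threshold, and the example weights are assumptions made for illustration.

```julia
# Hard threshold activation: 1 above the threshold (assumed to be 0 here), 0 otherwise.
threshold(x) = x > 0 ? 1.0 : 0.0

# u_i = σ(Σ_j W_ij v_j + b_i), written with a matrix-vector product.
perceptron(W, b, v; σ=threshold) = σ.(W * v .+ b)

# Hypothetical example values: one output unit with two inputs.
W = [1.0 1.0]
b = [-1.5]

for v in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0])
    println(v, " -> ", perceptron(W, b, v))
end
```

With these particular weights the unit only fires when both inputs are 1, so it behaves like a logical AND. Changing `W` and `b` moves the decision boundary.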

Assume that the target (desired) output for the input $v_j$ is $d_i$, and let $r > 0$ be a small learning rate. The weights are tuned according to a very simple rule:

$$W_{ij}(t+1) = W_{ij}(t) + r (d_i - u_i) v_j.$$

If you have done a course in linear regression you might immediately recognise this as a gradient descent step on the squared error, i.e. we are minimising the mean squared error of the classification. Alternatively, if you are familiar with biological learning you might understand this as a proxy for an expectation-reward scheme: when a neuron produces an output that diverges from what is expected, neurotransmitters are released which change the weights to move the output closer to what is expected. We can naturally extend this definition to a series of outputs $i \in 1:N$ and perform the classification in a higher-dimensional decision space. The procedures extend analogously.
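To make the gradient descent reading concrete, here is a short derivation. It assumes a squared-error loss $E$ and a differentiable activation. The symbols $E$ and $a_i$ are introduced here and do not appear elsewhere in the notes.

$$E = \frac{1}{2}\sum_i (d_i - u_i)^2, \qquad u_i = \sigma(a_i), \qquad a_i = \sum_j W_{ij} v_j + b_i$$

$$\frac{\partial E}{\partial W_{ij}} = -(d_i - u_i)\,\sigma'(a_i)\,v_j$$

$$W_{ij}(t+1) = W_{ij}(t) - r\,\frac{\partial E}{\partial W_{ij}} = W_{ij}(t) + r\,(d_i - u_i)\,\sigma'(a_i)\,v_j$$

For a linear activation $\sigma'(a_i) = 1$, so a descent step moves each weight by $r (d_i - u_i) v_j$. The hard threshold has zero derivative almost everywhere, so the perceptron rule simply drops the $\sigma'$ factor and keeps the same error-times-input form. The bias follows the same pattern with $v_j$ replaced by 1.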

We can therefore see that the neuron (or perceptron) provides a powerful classification or regression scheme with a natural biological motivation. It unifies very well with our models of linear regressors and classifiers and can be understood through a well-studied statistical lens. Let's train a perceptron on a simple classification task that an early human might have had to learn:

In [38]:
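using Plots
# Class labels: one emoji per animal.
domestic_class = ['🐈', '🐀', '🐔', '🐕']
wild_class = ['🦥', '🐗', '🦊', '🦓', '🦘', '🦉', '🦄']
# Two features per animal: element 1 holds the x coordinates, element 2 the y coordinates.
domestic = [[0.740562, 0.74002, 1.15348, 0.773644], [0.32662, 0.835877,  1.04288,  1.04741]]
# Space-separated entries build 1×7 matrices rather than vectors; wild[1][i] indexes them the same way.
wild = [[3.17844  3.98878  3.19889  3.92041  3.10834  3.55384  3.37955], [4.60585  3.96469  4.68477  3.40337  3.21203  4.63704  4.6018]]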


2-element Vector{Matrix{Float64}}:
 [3.17844 3.98878 … 3.55384 3.37955]
 [4.60585 3.96469 … 4.63704 4.6018]
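A minimal sketch of training a thresholded perceptron on the animal coordinates above, labelling domestic animals 0 and wild animals 1. The names `train_perceptron`, `inputs`, and `targets`, the learning rate `r = 0.1`, the epoch cap, and the bias update are assumptions of this sketch and are not taken from the notes.

```julia
# Coordinates copied from the cell above, stored as one 2×n matrix per class.
domestic_points = [0.740562 0.74002  1.15348 0.773644;
                   0.32662  0.835877 1.04288 1.04741]
wild_points = [3.17844 3.98878 3.19889 3.92041 3.10834 3.55384 3.37955;
               4.60585 3.96469 4.68477 3.40337 3.21203 4.63704 4.6018]

inputs = hcat(domestic_points, wild_points)  # one column per animal
targets = vcat(zeros(4), ones(7))            # assumed labels: 0 = domestic, 1 = wild

threshold(x) = x > 0 ? 1.0 : 0.0

function train_perceptron(inputs, targets; r=0.1, max_epochs=100)
    W = zeros(size(inputs, 1))
    b = 0.0
    for epoch in 1:max_epochs
        mistakes = 0
        for (v, d) in zip(eachcol(inputs), targets)
            u = threshold(W' * v + b)
            # Perceptron rule: move the weights by r * (target - output) * input.
            W .+= r * (d - u) .* v
            b += r * (d - u)
            mistakes += (u != d)
        end
        # Stop as soon as a full pass over the data makes no mistakes.
        mistakes == 0 && return W, b, epoch
    end
    return W, b, max_epochs
end

W, b, epochs = train_perceptron(inputs, targets)
predictions = [threshold(W' * v + b) for v in eachcol(inputs)]
println("stopped after ", epochs, " epochs; accuracy = ",
        sum(predictions .== targets) / length(targets))
```

The two clusters are linearly separable, so the rule stops changing the weights once every animal is on the correct side of the line `W' * v + b = 0`.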

In [51]:
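unicodeplots()  # terminal backend; requires the UnicodePlots package
# Annotating with emoji failed on this backend (InexactError: trunc(UInt16, 128008),
# where 128008 is the code point of '🐈'), so plot plain markers and label each class instead.
p = scatter(domestic[1], domestic[2]; label="domestic", title="Animals")
scatter!(p, vec(wild[1]), vec(wild[2]); label="wild")
p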

In [44]:
using Pkg  # Pkg must be loaded before Pkg.add can be called
Pkg.add("UnicodePlots")

[32m[1m    Updating[22m[39m registry at `~/.julia/registries/General.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m   Installed[22m[39m MarchingCubes ──── v0.1.4
[32m[1m   Installed[22m[39m ConstructionBase ─ v1.4.1
[32m[1m   Installed[22m[39m Unitful ────────── v1.12.0
[32m[1m   Installed[22m[39m FreeType ───────── v4.0.0
[32m[1m   Installed[22m[39m FileIO ─────────── v1.16.0
[32m[1m   Installed[22m[39m UnicodePlots ───── v3.1.6
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.8/Project.toml`
 [90m [b8865327] [39m[92m+ UnicodePlots v3.1.6[39m
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.8/Manifest.toml`
 [90m [187b0558] [39m[92m+ ConstructionBase v1.4.1[39m
 [90m [5789e2e9] [39m[92m+ FileIO v1.16.0[39m
 [90m [b38be410] [39m[92m+ FreeType v4.0.0[39m
 [90m [299715c1] [39m[92m+ MarchingCubes v0.1.4[39m
 [90m [b8865327] [39m[92m+ UnicodePlots v3.1.6[39m
 [90m [1986cc42] [39m[92m+ Unitful v1.12.

In [43]:
using Pkg