Note: Much of this material comes from The Scientist and Engineer's Guide to Digital Signal Processing (free online version: http://www.dspguide.com/pdfbook.htm)

```python
import numpy as np
import pandas as pd
import seaborn as sns
```

## Neural Networks

The neuron is one of the fundamental cells of the nervous system. A neuron acts as a node in a relay chain: it takes inputs from connected neurons, processes them, and passes the result along. Networks of these connected neurons are called neural networks, and among their many interesting properties is the ability to process poorly defined information effectively.

<center>**Obligatory figure of a simplified biological neural network**</center>
<img src="http://www.biologymad.com/nervoussystem/summation.jpg">

Artificial Neural Networks (herein called neural networks) are an attempt to mimic that property, albeit currently in a simple form.

## General Structure

Neural networks consist of at least 3 hierarchical layers:

**Input layer (1st layer):**
* Passively receive data.

**Hidden layer(s) (2nd through second-to-last layers):**
* Where the magic happens.

**Output layer (final layer):**
* Produces the prediction. Compare it with the truth, backpropagate the error. Profit.

**Layers** can be either passive (do not modify data) or active (perform some function).

Each layer consists of one or more nodes.

Each node can be connected to some or all of the nodes down the hierarchy.

Connections are not physical but instead consist of numerical weights.

![](./images/fig1.gif)

Information flow can be purely **feedforward** (as above), meaning the connections form a Directed Acyclic Graph (DAG):

<img src=./images/fig4.png>

Or it can feed back on itself (**recurrent**).
**Recurrent** networks can feed information back to previous layers or into the node itself. This gives the network a "memory".

<img src=./images/fig5.svg>
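To make the "memory" idea concrete, here is a minimal sketch of a single recurrent unit. The update rule (a tanh of the weighted input plus the weighted previous state) is one common choice, and the weight values are arbitrary assumptions, not learned values. Two sequences that end with the same input produce different final states because the hidden state carries their earlier inputs forward.

```python
import numpy as np

# Arbitrary illustrative weights (assumptions, not trained values).
w_in = 1.0    # weight on the current input
w_rec = 0.5   # recurrent weight: the hidden state feeds back into itself
bias = 0.0

def rnn_step(x_t, h_prev):
    """One time step: combine the new input with the previous hidden state."""
    return np.tanh(w_in * x_t + w_rec * h_prev + bias)

def run(sequence):
    h = 0.0  # the hidden state starts out empty
    for x_t in sequence:
        h = rnn_step(x_t, h)
    return h

# Both sequences end with the input 0.0, but their histories differ.
print(run([1.0, 0.0]))
print(run([-1.0, 0.0]))
```

A purely feedforward unit would see only the last input (0.0) and give the same output for both sequences.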

Neural networks can also be **densely** connected (every node connects to every node in the next layer) or **sparsely** connected (take a guess: only a subset of those connections exist).

<img src=./images/fig6.png>
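One simple way to picture the difference in code, assuming connections are stored as a weight matrix: a dense layer uses the full matrix, while a sparse layer zeroes out the missing connections with a 0/1 mask. The mask pattern below is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 4, 3

# Dense: every input node connects to every output node (n_out * n_in weights).
W_dense = rng.normal(size=(n_out, n_in))

# Sparse: a 0 in the mask means "no connection" between that pair of nodes.
mask = np.array([[1, 1, 0, 0],
                 [0, 1, 1, 0],
                 [0, 0, 1, 1]])
W_sparse = W_dense * mask

x = np.array([0.5, -1.0, 2.0, 0.1])
print("dense connections: ", np.count_nonzero(W_dense))
print("sparse connections:", np.count_nonzero(W_sparse))
print("sparse layer output:", W_sparse @ x)
```

With this mask, the first output node only "hears" the first two inputs.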


## Basic method:

Inputs are weighted, passed to the hidden layer, summed, passed through an "activation function" (in the simplest case a 0 / 1 threshold), and then passed to the output layer, which makes a prediction from the "pattern of activation" in the hidden layer.

![](./images/fig2.gif)
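A minimal sketch of that forward pass with a 0 / 1 threshold activation. The weights are hand-picked assumptions for illustration: each hidden node's bias sets its threshold, so hidden node 0 fires when at least one input is on and hidden node 1 fires only when both are. The output node then reads that pattern of activation, and the network as a whole computes XOR.

```python
import numpy as np

def step(z):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return (z > 0).astype(float)

# Hand-picked (hypothetical) weights: row i of W_hidden feeds hidden node i.
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])  # node 0 acts like OR, node 1 like AND
W_out = np.array([[1.0, -1.0]])
b_out = np.array([-0.5])

def forward(x):
    hidden = step(W_hidden @ x + b_hidden)  # weight, sum, threshold
    return step(W_out @ hidden + b_out)     # predict from the hidden pattern

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, forward(np.array(x, dtype=float)))
```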

The sigmoid function in the figure above is only one choice. It can be replaced by other (usually nonlinear) transformations, and whichever function is used is called the **transfer function** / **activation function**.

![](./images/fig3.png)
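A few common activation functions, written as plain NumPy functions of a neuron's weighted sum `z`. This is a sketch for comparison, not an exhaustive list.

```python
import numpy as np

def step(z):
    return np.where(z > 0, 1.0, 0.0)   # hard 0 / 1 threshold

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # smooth, output in (0, 1)

def tanh(z):
    return np.tanh(z)                  # smooth, output in (-1, 1)

def relu(z):
    return np.maximum(0.0, z)          # 0 for negative inputs, identity otherwise

z = np.array([-2.0, 0.0, 2.0])
for name, f in [("step", step), ("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
    print(f"{name:>8}: {f(z)}")
```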

Activation of a neuron can also be altered by the **bias**, which you may know as the "intercept". The bias is added to the weighted sum of the inputs before the activation function is applied, so it shifts the activation curve along the x axis: it changes how large the weighted input must be before the neuron "fires".
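A small sketch of that shift, assuming a single-input sigmoid neuron computing `sigmoid(w * x + b)`. Changing `b` leaves the S-curve's shape alone and slides it along the x axis; the curve crosses 0.5 where `w * x + b = 0`, i.e. at `x = -b / w`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """Single neuron: the bias b is added to the weighted input before the activation."""
    return sigmoid(w * x + b)

w = 1.0
x = np.linspace(-4, 4, 9)  # inputs -4, -3, ..., 4

# Same weight, different biases: same curve, shifted left or right.
for b in [-2.0, 0.0, 2.0]:
    print(f"b = {b:+.1f}:", np.round(neuron(x, w, b), 2))
```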

## Some math:

![](./images/single_hidden_math.jpg)

![](./images/matrix_form.png)

Thanks to [3Blue1Brown](http://www.3blue1brown.com/)
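In matrix form, one layer's activations are commonly written as $a^{(1)} = \sigma\left(W a^{(0)} + b\right)$, where (our assumed notation) $a^{(0)}$ is the vector of input activations, $W$ the weight matrix, $b$ the bias vector and $\sigma$ the activation function. The sketch below, with random weights for illustration, checks that the single matrix product gives the same answer as computing each node's weighted sum one at a time.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
n_inputs, n_hidden = 3, 4

a0 = np.array([0.2, 0.7, -0.1])            # input activations a^(0)
W = rng.normal(size=(n_hidden, n_inputs))  # W[j, k]: weight from input k to node j
b = rng.normal(size=n_hidden)              # one bias per hidden node

# Matrix form: all hidden nodes at once.
a1 = sigmoid(W @ a0 + b)

# Node-by-node form: the same computation, one weighted sum per node.
a1_loop = np.array([
    sigmoid(sum(W[j, k] * a0[k] for k in range(n_inputs)) + b[j])
    for j in range(n_hidden)
])

print(a1)
print(np.allclose(a1, a1_loop))
```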

Like most algorithms you've been exposed to, neural networks are trained iteratively. They are usually supervised, although unsupervised neural networks exist.

### Steps:

1. Randomly set the initial weights / biases.
2. Run the data through the network using the weights / biases / activation function.
3. At the output, check the error (predicted - observed).
4. Update the weights / biases via backpropagation: compute how the loss changes with respect to each weight and bias (its gradient), then adjust each one in the direction that reduces the loss.
5. Repeat until the loss converges (if it converges).
6. Profit.
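The steps above can be sketched end to end in NumPy. This is a minimal illustration resting on assumptions that are not from the source: a single hidden layer of 4 sigmoid nodes, mean squared error as the loss, full-batch gradient descent, XOR as a toy data set, and a hand-picked learning rate and epoch count. Whether it reaches the XOR targets exactly depends on the random initialization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (XOR): the target is 1 when exactly one input is 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
n_hidden = 4          # assumption: hidden layer size
learning_rate = 1.0   # assumption: step size
n_epochs = 5000

# Step 1: random initial weights, zero biases.
W1 = rng.normal(size=(2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

losses = []
for epoch in range(n_epochs):
    # Step 2: forward pass.
    h = sigmoid(X @ W1 + b1)       # hidden activations, shape (4, n_hidden)
    y_hat = sigmoid(h @ W2 + b2)   # predictions, shape (4, 1)

    # Step 3: error (predicted - observed) and mean squared error.
    err = y_hat - y
    losses.append(np.mean(err ** 2))

    # Step 4: backpropagation via the chain rule, output layer first.
    d_out = 2 * err / len(X) * y_hat * (1 - y_hat)  # dLoss/dz at the output
    d_hid = (d_out @ W2.T) * h * (1 - h)            # dLoss/dz at the hidden layer

    # Gradient descent: move each parameter against its gradient.
    W2 -= learning_rate * h.T @ d_out
    b2 -= learning_rate * d_out.sum(axis=0)
    W1 -= learning_rate * X.T @ d_hid
    b1 -= learning_rate * d_hid.sum(axis=0)

# Step 5: see how far the loss came down (predictions from the last forward pass).
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
print(np.round(y_hat.ravel(), 2))
```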

![](./images/neural_network-w_b.png)

For most machine learning methods we've used so far, only a handful of parameters needed to be set or updated, and these governed the entire model.

For neural networks, every **weight** (connection) is a parameter, and so is every node's **bias**.
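To see how quickly this adds up, here is a small helper (hypothetical, written for illustration) that counts weights and biases in a densely connected feedforward network. It assumes the input layer is passive, so only the receiving layers have biases. The layer sizes in the example are illustrative.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a densely connected feedforward network.

    layer_sizes lists the number of nodes per layer, input layer first.
    """
    n_weights = 0
    n_biases = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        n_weights += n_in * n_out  # one weight per connection
        n_biases += n_out          # one bias per receiving node
    return n_weights, n_biases

# Example: 784 inputs (a 28x28 image), two hidden layers of 16 nodes, 10 outputs.
weights, biases = count_parameters([784, 16, 16, 10])
print(weights, biases, weights + biases)  # → 12960 42 13002
```

Even this small network has over 13,000 parameters, each of which training has to adjust.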

How does the network learn all of these parameters?

<img src=./images/fig7.png>

**Gradient Descent** with a little help from **Backpropagation**

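Backpropagation computes the **partial** derivative of the loss with respect to every weight and bias. It starts at the output layer and works backward through the hidden layer(s) toward the input layer, using the chain rule so that each layer's result is reused for the layer before it. Gradient descent then uses those partial derivatives to decide which direction, and how far, to move each weight.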

In English: we know the model's overall error for the current weights (random on the first iteration). For each weight, we ask how the error would change if that weight were nudged slightly; the partial derivative answers exactly that question. Each weight is then moved in the direction that sends the error downward, and the process repeats until moving no longer helps.
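The "nudge and watch the error" idea can be checked directly. This sketch, assuming a single sigmoid neuron with squared-error loss and arbitrary example data, estimates each partial derivative by nudging one weight up and down (a central finite difference) and compares it with the chain-rule formula. Both give the same numbers, but the chain rule gets all of them in one pass instead of two loss evaluations per weight, which is why backpropagation is used in practice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary example input, target, weights and bias.
x = np.array([0.5, -1.2, 2.0])
y = 1.0
w = np.array([0.1, 0.4, -0.3])
b = 0.2

def loss(w, b):
    return (sigmoid(w @ x + b) - y) ** 2

# Nudge each weight slightly in both directions and watch the loss.
eps = 1e-6
numeric = np.zeros_like(w)
for i in range(len(w)):
    w_up, w_down = w.copy(), w.copy()
    w_up[i] += eps
    w_down[i] -= eps
    numeric[i] = (loss(w_up, b) - loss(w_down, b)) / (2 * eps)

# Chain rule: dL/dw = 2 * (y_hat - y) * sigmoid'(z) * x, with sigmoid'(z) = y_hat * (1 - y_hat).
y_hat = sigmoid(w @ x + b)
analytic = 2 * (y_hat - y) * y_hat * (1 - y_hat) * x

print(numeric)
print(analytic)
print(np.allclose(numeric, analytic))
```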

Eventually (hopefully) the model converges and we get an accurate prediction.
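A stripped-down sketch of backpropagation plus gradient descent on the smallest possible chain: one input, one hidden sigmoid node, one output sigmoid node. All values are illustrative assumptions. The `gradients` function follows the backward order described above: output-layer partial derivatives first, then the error is passed back through the output weight to the hidden node.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative data point and starting parameters.
x, y = 1.5, 0.2
w1, b1, w2, b2 = 0.4, 0.1, -0.3, 0.0

def forward(w1, b1, w2, b2):
    h = sigmoid(w1 * x + b1)       # hidden node
    y_hat = sigmoid(w2 * h + b2)   # output node
    return h, y_hat

def loss(w1, b1, w2, b2):
    _, y_hat = forward(w1, b1, w2, b2)
    return (y_hat - y) ** 2

def gradients(w1, b1, w2, b2):
    """Partial derivatives of the loss via the chain rule, output layer first."""
    h, y_hat = forward(w1, b1, w2, b2)
    d_yhat = 2 * (y_hat - y)              # dL/dy_hat
    d_z2 = d_yhat * y_hat * (1 - y_hat)   # back through the output sigmoid
    grad_w2, grad_b2 = d_z2 * h, d_z2     # output-layer parameters
    d_h = d_z2 * w2                       # pass the error back to the hidden node
    d_z1 = d_h * h * (1 - h)              # back through the hidden sigmoid
    grad_w1, grad_b1 = d_z1 * x, d_z1     # hidden-layer parameters
    return grad_w1, grad_b1, grad_w2, grad_b2

# Gradient descent: step every parameter against its partial derivative.
params = (w1, b1, w2, b2)
lr = 0.5
start = loss(*params)
for _ in range(100):
    grads = gradients(*params)
    params = tuple(p - lr * g for p, g in zip(params, grads))

print(f"loss: {start:.4f} -> {loss(*params):.4f}")
```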

## TERMS:

* Node
* Weight
* Bias
* Transfer / Activation Function
* Input layer
* Hidden layer
* Output layer
* Feedforward
* Feedback / Recurrent
* Dense
* Sparse
* Partial derivative
* Backpropagation