# Basics of Deep Learning
In this notebook, we will cover the basics behind Deep Learning. I'm talking about building a brain....

![gif of some colours](https://www.fleetscience.org/sites/default/files/images/neural-mlblog.gif)

Only kidding. Deep learning is a fascinating field that has exploded over the last few years. It is used for everything from facial recognition in apps such as Snapchat and challenger banks to more advanced use cases such as [protein folding](https://www.independent.co.uk/life-style/gadgets-and-tech/protein-folding-ai-deepmind-google-cancer-covid-b1764008.html).

In this notebook we will:
- Explain the building blocks of neural networks
- Go over some applications of Deep Learning

## Building blocks of Neural Networks

I have no doubt that you have heard or seen how similar neural networks are to... our brains.


### The Perceptron

The perceptron is the building block of neural networks. It was created in 1958 by Frank Rosenblatt (I love that name) at Cornell, but that story is for another day... or rather another section of this book (the background section!).

The perceptron is an algorithm that can learn a binary classifier (e.g. is that a cat or a dog?). The classifier is a threshold function, which maps an input vector $x$ to a single binary output $f(x)$ using a weight vector $w$ and a bias $b$. Here is the formal maths to better explain my verbal fluff:

$$ f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases} $$
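
To make the threshold function concrete, here is a minimal sketch in Python. The function name `perceptron` and the weights and bias (`w_and`, `b_and`) are hypothetical values picked by hand for illustration so that the perceptron behaves like a logical AND; nothing here is learned from data.

```python
import numpy as np

def perceptron(x, w, b):
    """Threshold function: returns 1 if w.x + b > 0, otherwise 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Hypothetical hand-picked weights and bias that behave like a logical AND
w_and = np.array([1.0, 1.0])
b_and = -1.5

# Try every combination of two binary inputs
for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), perceptron(np.array([x1, x2]), w_and, b_and))
```

Only the input (1, 1) pushes $w \cdot x + b$ above zero, so it is the only case that returns 1.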


## Gradient Descent Algo

<todo> add info

### Simple Gradient Descent Implementation
With help from our friends over at Udacity, below is an implementation of the gradient descent algorithm.

We begin by defining some functions.

In [1]:
import numpy as np 

def sigmoid(x):
    return 1/(1+np.exp(-x))

# derivative of sigmoid(x)
def sigmoid_prime(x):
    return sigmoid(x)*(1-sigmoid(x))

Next, we define a simple neural network:
- two input neurons: x1 and x2
- one output neuron: y1

In [2]:
x = np.array([1,5])
y = 0.4

We now define the weights, w1 and w2, for the two input neurons, x1 and x2. We also define a learning rate that controls the size of our gradient descent step.

In [4]:
weights = np.array([-0.2,0.4])
learnrate = 0.5

We now start moving forwards through the network, a step known as feed forward. We can combine the input vector with the weight vector using numpy's dot product.

In [5]:
# linear combination
# h = x[0]*weights[0] + x[1]*weights[1]
h = np.dot(x, weights)

We now apply our non-linearity (the sigmoid), which gives us our output.

In [6]:
# apply non-linearity
output = sigmoid(h)

Now that we have our prediction, we are able to determine the error of our neural network. Here, we will use the difference between our actual and predicted values.

In [7]:
error = y - output

The goal now is to determine how to change our weights in order to reduce the error above. This is where our good friend gradient descent and the chain rule come into play:
- we determine the derivative of our squared error with respect to each input weight, and step in the opposite direction. Hence:
- change in weights = $ \Delta w_{i} = -\eta \frac{d}{dw_{i}} \frac{1}{2}{(y - \hat{y})^2}$
- simplifies to $ \Delta w_{i} = \eta \, (y - \hat{y}) \, f'(h) \, x_{i} $ = learning rate * error term * $ x_{i}$
- where:
    - learning rate = $ \eta $
    - error term = $ (y - \hat{y}) * f'(h) $
    - prediction = $ \hat{y} = f(h) $
    - h =  $ \sum_{i} w_{i} x_{i} $
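
One way to convince yourself that the chain rule result is right is to compare it against a numerical estimate. The sketch below is not part of the Udacity implementation: the helper `squared_error` and the step size `eps` are my own assumptions for illustration. It computes the gradient of the squared error analytically and with central finite differences, using the same inputs, target and weights as this notebook. Note that the weight change is the negative of this gradient, scaled by the learning rate.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

def squared_error(weights, x, y):
    """Hypothetical helper: E = 1/2 * (y - y_hat)^2 with y_hat = sigmoid(w . x)."""
    return 0.5 * (y - sigmoid(np.dot(x, weights))) ** 2

x = np.array([1.0, 5.0])
y = 0.4
weights = np.array([-0.2, 0.4])

# Analytic gradient from the chain rule: dE/dw_i = -(y - y_hat) * f'(h) * x_i
h = np.dot(x, weights)
analytic_grad = -(y - sigmoid(h)) * sigmoid_prime(h) * x

# Central finite-difference estimate of the same gradient
eps = 1e-6  # assumed step size, small enough for this smooth function
numeric_grad = np.zeros_like(weights)
for i in range(len(weights)):
    step = np.zeros_like(weights)
    step[i] = eps
    numeric_grad[i] = (squared_error(weights + step, x, y)
                       - squared_error(weights - step, x, y)) / (2 * eps)

print(f'Analytic gradient: {analytic_grad}')
print(f'Numeric gradient:  {numeric_grad}')
```

The two gradients should agree to several decimal places, which suggests the simplified formula is a faithful shortcut for the full derivative.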


We begin by calculating $f'(h)$, the derivative of our activation function at h.

In [8]:
# output gradient - derivative of activation function
output_gradient = sigmoid_prime(h)

Now, we can calculate our error term.

In [9]:
error_trm = error * output_gradient

With that, we can calculate the change for each weight by combining the learning rate, the error term and our x.

In [10]:
# gradient descent step - the change to apply to each weight
dsc_step = [
    learnrate * error_trm * x[0],
    learnrate * error_trm * x[1]
]

Which leaves...

In [12]:
print(f'Actual: {y}')
print(f'NN output: {output}')
print(f'Error: {error}')
print(f'Weight change: {dsc_step}')

Actual: 0.4
NN output: 0.8581489350995123
Error: -0.45814893509951227
Weight change: [-0.02788508381144715, -0.13942541905723577]
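
The cells above compute a single weight change but never apply it. As a rough sketch of what happens next, the loop below adds the change to the weights and repeats the whole process. The number of steps (20) and the `errors` list are assumptions of mine for illustration; everything else reuses the values from this notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

x = np.array([1.0, 5.0])
y = 0.4
weights = np.array([-0.2, 0.4])
learnrate = 0.5

errors = []  # error recorded at the start of each step
for step in range(20):  # 20 steps is an arbitrary choice for illustration
    h = np.dot(x, weights)                 # linear combination
    output = sigmoid(h)                    # apply non-linearity
    error = y - output                     # actual minus predicted
    errors.append(error)
    error_term = error * sigmoid_prime(h)
    weights = weights + learnrate * error_term * x  # apply the weight change

print(f'First error: {errors[0]}')
print(f'Last error: {errors[-1]}')
print(f'Final weights: {weights}')
```

With these values the error shrinks towards zero as the steps go on, which is exactly what gradient descent is meant to do.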


### More in depth...