# Neural Network

### Forward propagation
$x_i$ = input layer<br>
$w_i$ = weights<br>
$b$ = bias<br>
$z$ = weighted input to a hidden-layer neuron (pre-activation)<br>
$f(z)$ = activation<br>

$z = \sum\limits_{i=1}^{n}x_iw_i + b$<br>
Given sigmoid as the activation function: $f(z) = \frac{1}{1+e^{-z}}$
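As a minimal sketch of the forward pass for a single neuron, the weighted sum and the sigmoid activation can be computed with NumPy. The input, weight, and bias values are made-up illustration values, and `forward` is a hypothetical helper name, not something defined elsewhere in these notes.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-z))

def forward(x, w, b):
    # Hypothetical helper: z = sum_i x_i * w_i + b, followed by f(z)
    z = np.dot(x, w) + b
    return z, sigmoid(z)

# Illustrative values only (assumptions for this sketch)
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.0])
b = 0.0

z, a = forward(x, w, b)
print(z, a)  # z is 0.0 here, so f(z) is 0.5
```

With these values the weighted sum cancels to zero, which makes the sigmoid output easy to verify by hand.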

**Remark**<br>
The bias $b$ enters the sum with a fixed coefficient of 1 (equivalently, it is a weight on a constant input of 1).<br>
The choice of activation function depends on the type of output being predicted: a number or a category.<br>
For numerical output (regression), the MSE loss function is typically used, with a ReLU or linear (identity) activation.<br>
For categorical output (classification), the cross-entropy loss function is typically used, with a non-linear activation such as sigmoid, tanh, or softmax.



### Back propagation
Backpropagation computes how the cost changes with respect to each weight and bias, so that the network's parameters can be tweaked incrementally until the cost is as low as possible.

### Partial derivative for $w_i$: $\frac{\partial C}{\partial w_i} = \frac{\partial C}{\partial \hat{y}} * \frac{\partial \hat{y}}{\partial z} * \frac{\partial z}{\partial w_i}$

1. $\frac{\partial C}{\partial \hat{y}} = \frac{\partial}{\partial \hat{y}}\frac{1}{n}\sum\limits_{i=1}^{n}(y_i-\hat{y_i})^2 = -\frac{2}{n}\sum\limits_{i=1}^{n}(y_i-\hat{y_i})$ (the minus sign comes from differentiating $-\hat{y_i}$ inside the square)

2. Given $\sigma$ = sigmoid function (a different activation function has a different derivative in the next step)

3. $\frac{\partial \hat{y}}{\partial z} = \frac{\partial}{\partial z}\sigma(z) = \sigma(z) * (1-\sigma(z))$

4. $\frac{\partial z}{\partial w_i} = \frac{\partial}{\partial w_i}\left(\sum\limits_{i=1}^{n}x_iw_i+b\right) = x_i$

### $\frac{\partial C}{\partial w_i} = -\frac{2}{n} * \sum\limits_{i=1}^{n}(y_i - \hat{y_i}) * \sigma(z) * (1-\sigma(z)) * x_i$

### Partial derivative for $b$
Since $\frac{\partial z}{\partial b} = 1$, the last factor drops out:

### $\frac{\partial C}{\partial b} = -\frac{2}{n} * \sum\limits_{i=1}^{n}(y_i-\hat{y_i}) * \sigma(z) * (1-\sigma(z))$
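One way to sanity-check a chain-rule derivation like this is to compare it against finite differences. The sketch below assumes a single sigmoid neuron with an MSE cost; the data is random illustration data, and `cost` and `gradients` are hypothetical helper names.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(w, b, X, y):
    # MSE between the targets y and the predictions sigmoid(X @ w + b)
    y_hat = sigmoid(X @ w + b)
    return np.mean((y - y_hat) ** 2)

def gradients(w, b, X, y):
    # Chain rule: dC/dy_hat * dy_hat/dz * dz/dw (and dz/db = 1)
    y_hat = sigmoid(X @ w + b)
    n = len(y)
    dC_dyhat = -2.0 / n * (y - y_hat)   # one entry per training example
    dyhat_dz = y_hat * (1 - y_hat)      # sigmoid derivative: s(z) * (1 - s(z))
    delta = dC_dyhat * dyhat_dz
    grad_w = X.T @ delta                # dz/dw_i = x_i, summed over examples
    grad_b = np.sum(delta)
    return grad_w, grad_b

# Illustrative data (made up for this sketch)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.integers(0, 2, size=5).astype(float)
w = rng.normal(size=3)
b = 0.1

grad_w, grad_b = gradients(w, b, X, y)

# Central finite differences as an independent check of the analytic gradients
eps = 1e-6
num_w = np.zeros_like(w)
for i in range(len(w)):
    step = np.zeros_like(w)
    step[i] = eps
    num_w[i] = (cost(w + step, b, X, y) - cost(w - step, b, X, y)) / (2 * eps)
num_b = (cost(w, b + eps, X, y) - cost(w, b - eps, X, y)) / (2 * eps)

print(np.allclose(grad_w, num_w, atol=1e-6), np.isclose(grad_b, num_b, atol=1e-6))
```

If the analytic and numerical gradients disagree, the usual culprits are a sign error in $\frac{\partial C}{\partial \hat{y}}$ or a missing activation derivative.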

# Cost Function
##### MSE = $\frac{1}{n}\sum\limits_{i=1}^{n}(y_i - \hat{y_i})^2$

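##### Cross entropy (binary case, e.g. with a sigmoid output) = $-\frac{1}{n}\sum\limits_{i=1}^{n}\left[y_i\log\hat{y_i} + (1-y_i)\log(1-\hat{y_i})\right]$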
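Both cost functions are short to write in NumPy. This is a sketch under stated assumptions: the cross entropy uses the standard binary form, the `eps` clipping is added here only to avoid `log(0)`, and the sample values are made up.

```python
import numpy as np

def mse(y, y_hat):
    # Mean squared error: average of the squared differences
    return np.mean((y - y_hat) ** 2)

def binary_cross_entropy(y, y_hat, eps=1e-12):
    # Clip predictions away from 0 and 1 so the logarithms stay finite
    # (eps is an assumption of this sketch, not part of the definition)
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Illustrative values: a model that always predicts 0.5
y = np.array([1.0, 0.0, 1.0, 0.0])
y_hat = np.array([0.5, 0.5, 0.5, 0.5])

print(mse(y, y_hat))                   # 0.25
print(binary_cross_entropy(y, y_hat))  # log(2), about 0.693
```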

### Learning algorithm
1. Start with initial values (often random) for the network parameters (weights $w_{ij}$ and biases $b_j$).
2. Take a set of input examples and pass them through the network to obtain predictions.
3. Compare these predictions with the expected labels and calculate the loss.
4. Perform backpropagation to propagate this loss back to every parameter of the network.
5. Use this propagated information to update the parameters with gradient descent so that the total loss is reduced and a better model is obtained.
6. Repeat the previous steps until the model is considered good enough.
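The steps above can be sketched for the simplest case, a single sigmoid neuron trained with MSE and plain gradient descent. The toy dataset (logical OR), the learning rate, and the number of epochs are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy data (made up): learn logical OR with one sigmoid neuron
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=2)   # step 1: small random initial weights
b = 0.0
learning_rate = 1.0                 # assumed value, not tuned
n = len(y)

losses = []
for epoch in range(5000):           # step 6: repeat for a fixed number of epochs
    y_hat = sigmoid(X @ w + b)                  # step 2: forward pass
    losses.append(np.mean((y - y_hat) ** 2))    # step 3: MSE loss
    # step 4: backpropagation via the chain rule
    delta = -2.0 / n * (y - y_hat) * y_hat * (1 - y_hat)
    # step 5: gradient descent update
    w -= learning_rate * (X.T @ delta)
    b -= learning_rate * np.sum(delta)

predictions = np.round(sigmoid(X @ w + b))
print(losses[0], losses[-1])
print(predictions)
```

A fixed epoch count stands in for "until the model is good enough"; in practice a stopping rule based on the loss or on validation data is a common alternative.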

In [4]:
import numpy as np

### Activation function

In [None]:
def linear(z):
    # Identity activation: returns the input unchanged
    return z

In [None]:
def sigmoid(z):
    return 1 / (1+np.exp(-z))

In [1]:
def tanh(z):
    # Hyperbolic tangent of the input z (not of a constant)
    return np.tanh(z)

In [2]:
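def softmax(X):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(X - np.max(X, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)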

In [3]:
def relu(z):
    # Element-wise max(0, z); works for scalars and NumPy arrays
    return np.maximum(0, z)