# 🤔 What *is* a Neural Network? 🤔

# 📚 Intro
Source on Neural Networks: https://enlight.nyc/projects/neural-network/

**Learning objectives:**
* Define *neural network* (or NN)
* Learn **forward** and **backward propagation** in an NN
* Implement an NN entirely in ```numpy```

# 🙇‍ Define *neural network*
A computational system inspired by the **structure**, **processing method**, and **learning ability** of our biological brains!

### Characteristics of Artificial Neural Networks
* **Many** very simple neuron-like processing elements
* **Many** weighted connections between said elements
* Knowledge is represented in a distributed way across the connections (the weights)
* Knowledge is acquired by the network through a learning process

### *Perceptron*:
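* A single neuron that takes **multiple** inputs, computes their weighted sum plus a bias, and passes the result through an activation function to produce **one** output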
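A minimal NumPy sketch of one perceptron, assuming a sigmoid activation to match the rest of this notebook (the classic perceptron uses a step function instead). The weights ```w``` and bias ```b``` are hypothetical values picked only for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b):
    """One neuron: weighted sum of many inputs plus a bias, then an activation.

    x: input vector, shape (n_inputs,)
    w: weight vector, shape (n_inputs,)
    b: scalar bias
    Returns a single scalar output.
    """
    z = np.dot(w, x) + b   # many inputs -> one number
    return sigmoid(z)      # activation -> one output

# Hypothetical weights and bias, for illustration only.
x = np.array([1.0, 0.0, 1.0, 0.0])
w = np.array([0.5, -0.3, 0.8, 0.1])
b = -0.2
print(perceptron(x, w, b))   # sigmoid(0.5 + 0.8 - 0.2) = sigmoid(1.1), about 0.7503
```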

### *Multi-Layer Perceptron* (MLP):
* Layers of **perceptrons** stacked so that the outputs of one layer become the inputs of the next: an input layer, one or more hidden layers, and an output layer.
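To see what "stacking" means in terms of arrays, here is a sketch that builds an MLP as a list of ```(W, b)``` pairs, one pair per layer of perceptrons. The helper ```init_mlp``` and the seed are hypothetical; the layer sizes ```[4, 3, 1]``` match the network trained later in this notebook:

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Build an MLP as a list of (W, b) pairs, one pair per layer.

    layer_sizes: e.g. [4, 3, 1] = 4 inputs, 3 hidden neurons, 1 output neuron.
    Each W has shape (n_in, n_out): column j holds the weights of neuron j.
    """
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.uniform(size=(n_in, n_out))   # weights from previous layer to this one
        b = rng.uniform(size=(1, n_out))      # one bias per neuron in this layer
        layers.append((W, b))
    return layers

rng = np.random.default_rng(0)   # hypothetical seed
mlp = init_mlp([4, 3, 1], rng)
for W, b in mlp:
    print(W.shape, b.shape)
# (4, 3) (1, 3)
# (3, 1) (1, 1)

n_params = sum(W.size + b.size for W, b in mlp)
print(n_params)   # 12 + 3 + 3 + 1 = 19
```

Each extra layer just adds another weight matrix whose row count equals the previous layer's neuron count.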

# 💨 Forward and backward propagation in NN
First off, a **good source**: https://towardsdatascience.com/coding-neural-network-forward-propagation-and-backpropagtion-ccf8cf369f76

### *Forward propagation:*
* The **estimation process** during which the NN
    * takes several **inputs**,
    * **processes** these inputs through the neurons of one or more **hidden layers**, and then
    * **returns** the result through an **output layer**.
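The three steps above can be sketched as a loop over layers. This assumes the ```(W, b)```-per-layer layout and sigmoid activations; the function name ```forward``` and the random weights are illustrative, not from the source:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, layers):
    """Forward propagation through a list of (W, b) layers.

    X: inputs, shape (n_samples, n_features); each row is one example.
    Returns [X, h_1, ..., output] so a backward pass could reuse them.
    """
    activations = [X]
    a = X
    for W, b in layers:
        a = sigmoid(a @ W + b)   # weighted sum (+ bias broadcast over rows), then activation
        activations.append(a)
    return activations

rng = np.random.default_rng(0)   # hypothetical seed
X = np.array([[1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1]])
layers = [(rng.uniform(size=(4, 3)), rng.uniform(size=(1, 3))),   # input -> hidden
          (rng.uniform(size=(3, 1)), rng.uniform(size=(1, 1)))]   # hidden -> output
acts = forward(X, layers)
print([a.shape for a in acts])   # [(3, 4), (3, 3), (3, 1)]
```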
    
### *Backward propagation:*
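* The **adjustment of the weights** (and biases) of the neurons, such that
    * the NN's total error (or **cost function**) is minimized.
* After **forward propagation**,
    * the **result** is compared to the **true output**.
    * We want the NN's output to be as close to the true output as possible, so
    * that's when **backward propagation** takes place: the error is passed backward from the output layer through the hidden layers, and each weight is nudged in the direction that reduces the error.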
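A sketch of one round of forward plus backward propagation for a 4-3-1 sigmoid network, assuming the half sum-of-squared-errors cost and the same update rule as the training loop later in this notebook (the seed and helper names are hypothetical). It checks that a single weight update lowers the cost:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(y_true, y_pred):
    # Half the sum of squared errors.
    return 0.5 * np.sum((y_true - y_pred) ** 2)

def forward(X, wh, bh, wout, bout):
    h = sigmoid(X @ wh + bh)        # hidden-layer activations
    out = sigmoid(h @ wout + bout)  # network output
    return h, out

rng = np.random.default_rng(1)   # hypothetical seed
X = np.array([[1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1]])
y = np.array([[1], [1], [0]])
wh, bh = rng.uniform(size=(4, 3)), rng.uniform(size=(1, 3))
wout, bout = rng.uniform(size=(3, 1)), rng.uniform(size=(1, 1))
lr = 0.1

# 1) Forward pass, then compare the result with the true output.
h, out = forward(X, wh, bh, wout, bout)
before = cost(y, out)

# 2) Backward pass: push the error from the output layer back to the hidden layer.
d_out = (y - out) * out * (1 - out)          # error signal at the output neuron
d_hidden = (d_out @ wout.T) * h * (1 - h)    # error signal at the hidden neurons

# 3) Nudge every weight and bias in the direction that lowers the cost.
wout += lr * h.T @ d_out
bout += lr * d_out.sum(axis=0, keepdims=True)
wh += lr * X.T @ d_hidden
bh += lr * d_hidden.sum(axis=0, keepdims=True)

after = cost(y, forward(X, wh, bh, wout, bout)[1])
print(before, after)   # after should be a little smaller than before
```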

# 🧠 Implement an NN using ```numpy```!
A source: https://www.analyticsvidhya.com/blog/2017/05/neural-network-from-scratch-in-python-and-r/
### Assume the following:
* We want to build and train (i.e., *obtain the weights of*) a **Multi-Layer Perceptron** (MLP)
    * such that the given input, ```X=np.array([[1,0,1,0],[1,0,1,1],[0,1,0,1]])```,
    * produces this desired output: ```y=np.array([[1],[1],[0]])```
* We have only **one hidden layer** (with **three neurons**),
    * and the **activation function** for each **perceptron** is **sigmoid**.

In [1]:
import numpy as np

### Instantiate the input array and the true (target) output array, not the NN's prediction.

In [2]:
X=np.array([[1,0,1,0],[1,0,1,1],[0,1,0,1]])
y=np.array([[1],[1],[0]])

### Sigmoid Function & Derivative of Sigmoid Function

In [3]:
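def sigmoid(x):
    # Logistic sigmoid: squashes any real number into (0, 1)
    return 1/(1 + np.exp(-x))

def derivatives_sigmoid(x):
    # Slope of the sigmoid, written in terms of its OUTPUT:
    # if x = sigmoid(z), then d sigmoid(z)/dz = x * (1 - x)
    return x * (1 - x)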
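Note that ```derivatives_sigmoid``` expects a value that has *already* been passed through ```sigmoid```, because $\sigma'(z) = \sigma(z)(1 - \sigma(z))$; passing the raw ```z``` would give the wrong slope. A quick numerical check of that identity using a central finite difference (the step ```eps``` is an arbitrary small choice):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):
    # Expects x to already be a sigmoid OUTPUT, i.e. x = sigmoid(z).
    return x * (1 - x)

z = np.linspace(-5, 5, 11)
analytic = derivatives_sigmoid(sigmoid(z))                     # sigma(z) * (1 - sigma(z))
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)    # central difference
max_err = np.max(np.abs(analytic - numeric))
print(max_err)                           # tiny: the two slopes agree
print(derivatives_sigmoid(sigmoid(0.0))) # 0.25, the steepest slope of the sigmoid
```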

### Initialize all the variables (hyperparameters and layer sizes)!

In [4]:
epoch=5000 # number of training iterations
lr=0.1 # learning rate: step size of each weight update
inputlayer_neurons = X.shape[1] # number of features in the data set (4)
hiddenlayer_neurons = 3 # number of neurons in the hidden layer
output_neurons = 1 # number of neurons in the output layer

### Initialize weights and biases

In [5]:
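# Random starting values drawn uniformly from [0, 1)
wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))  # input -> hidden weights, shape (4, 3)
bh=np.random.uniform(size=(1,hiddenlayer_neurons))                   # hidden biases, shape (1, 3)
wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))    # hidden -> output weights, shape (3, 1)
bout=np.random.uniform(size=(1,output_neurons))                      # output bias, shape (1, 1)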

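### Train the network! Each pass through this ```for``` loop runs one forward propagation and one backward propagation
* **Forward propagation:** ```X``` goes through the hidden layer (```wh```, ```bh```, sigmoid) and then the output layer (```wout```, ```bout```, sigmoid) to give ```output```.
* **Backward propagation:** ```D = y - output``` is the error. Scaling it by the output layer's sigmoid slope gives ```d_output```; sending ```d_output``` back through ```wout``` and scaling by the hidden layer's slope gives ```d_hiddenlayer```. These error signals, times ```lr```, set how much each weight and bias is nudged.
* Repeating this ```epoch``` (5000) times drives ```output``` toward ```y```.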

In [6]:
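for i in range(epoch):
    # Forward propagation
    hidden_layer_input1 = np.dot(X, wh)                          # weighted sum of inputs for each hidden neuron
    hidden_layer_input = hidden_layer_input1 + bh                # add hidden biases (broadcast over rows)
    hiddenlayer_activations = sigmoid(hidden_layer_input)        # hidden-layer outputs h
    output_layer_input1 = np.dot(hiddenlayer_activations, wout)  # weighted sum of hidden outputs
    output_layer_input = output_layer_input1 + bout              # add the output bias
    output = sigmoid(output_layer_input)                         # network prediction y_p

    # Backward propagation
    D = y - output                                               # error at the output: y_t - y_p
    slope_output_layer = derivatives_sigmoid(output)             # y_p * (1 - y_p)
    slope_hidden_layer = derivatives_sigmoid(hiddenlayer_activations)  # h * (1 - h)
    d_output = D * slope_output_layer                            # error signal at the output neuron
    Error_at_hidden_layer = d_output.dot(wout.T)                 # send that signal back through wout
    d_hiddenlayer = Error_at_hidden_layer * slope_hidden_layer   # error signal at the hidden neurons
    # Update weights and biases; lr is the step size
    wout += hiddenlayer_activations.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr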

print(output)

[[0.98689883]
 [0.97241031]
 [0.03815065]]
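The printed values are sigmoid outputs between 0 and 1, and they will differ a little on each run because the weights start random. A sketch that turns the values printed above into 0/1 predictions by thresholding at 0.5 (an assumed, common cutoff) and computes the cost that remains:

```python
import numpy as np

# The three values printed by the training cell above.
output = np.array([[0.98689883], [0.97241031], [0.03815065]])
y = np.array([[1], [1], [0]])

predictions = (output >= 0.5).astype(int)   # threshold the sigmoid outputs at 0.5
print(predictions.ravel())                  # [1 1 0]
print(np.array_equal(predictions, y))       # True: every row is classified correctly

E = 0.5 * np.sum((y - output) ** 2)         # remaining half sum-of-squared-errors cost
print(E)                                    # small, since output is already close to y
```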


# ❗️How do we update the weights to minimize the error?

### (1) DEFINE the cost function.
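* For our example here, the cost function is half the **sum of squared errors**, a close cousin of the **MSE** (mean squared error) that differs from it only by a constant factor:
    * $E= \frac{1}{2} ({\bf y}_t - {\bf y}_p)^T ({\bf y}_t - {\bf y}_p)$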
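A sketch of this cost in NumPy, using hypothetical predictions ```y_p``` for illustration. It also prints the MSE for comparison; for $n$ samples, $E = \frac{n}{2} \cdot \text{MSE}$:

```python
import numpy as np

def cost(y_t, y_p):
    """E = 1/2 (y_t - y_p)^T (y_t - y_p): half the sum of squared errors."""
    r = y_t - y_p                  # residual column vector, shape (n, 1)
    return 0.5 * (r.T @ r).item()  # (1, n) @ (n, 1) -> (1, 1) -> Python float

y_t = np.array([[1.0], [1.0], [0.0]])
y_p = np.array([[0.9], [0.8], [0.3]])   # hypothetical predictions
E = cost(y_t, y_p)
mse = np.mean((y_t - y_p) ** 2)
print(E, mse)   # about 0.07 and 0.0467 (0.14 / 2 and 0.14 / 3)
```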

### (2) UPDATE the weights (${\bf W}_i$ and ${\bf W}_h$) such that the error, $E$, is minimized.
* The most popular algorithm is **gradient descent**, which steps each weight *against* its gradient, with learning rate $\eta$:
    * ${\bf W}_h \leftarrow {\bf W}_h - \eta \, {\partial E}/{\partial {\bf W}_h}$
* For our example above (one training sample; ${\bf W}_h$ are the hidden-to-output weights), we can show that:
    * ${\partial E}/{\partial {\bf W}_h} = -({\bf y}_t - {\bf y}_p) \, {\bf y}_p (1 - {\bf y}_p) \, {\bf h}$, where
    * ${\bf h} = \sigma({\bf W}_i {\bf x}_i + {\bf b}_i)$
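* In the **actual code** above (where the products are summed over all rows of ```X```),
    * $D = {\bf y}_t - {\bf y}_p$
    * ${\bf y}_p (1 - {\bf y}_p)$ = ```slope_output_layer``` (```slope_hidden_layer``` is the matching ${\bf h}(1 - {\bf h})$ term for the hidden layer)
    * $\bf {h}$ = ```hiddenlayer_activations```
    * so the update ${\bf W}_h \leftarrow {\bf W}_h + \eta \, ({\bf y}_t - {\bf y}_p) \, {\bf y}_p (1 - {\bf y}_p) \, {\bf h}$ is the line ```wout += hiddenlayer_activations.T.dot(d_output) *lr```, with ```d_output = D * slope_output_layer```.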
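The sign of this gradient is easy to get wrong, so here is a sketch of a finite-difference check of the output-layer gradient on the notebook's data, assuming the half sum-of-squared-errors cost summed over all rows of ```X```. For that cost, $\partial E / \partial {\bf W}_h = -{\bf H}^T[({\bf y}_t - {\bf y}_p) \odot {\bf y}_p \odot (1 - {\bf y}_p)]$, so gradient descent's $-\eta \, \partial E/\partial {\bf W}_h$ step is exactly ```wout += lr * h.T @ d_output```. The seed and helper names are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, wh, bh, wout, bout):
    # E = 1/2 * sum of squared errors over all rows (samples) of X
    h = sigmoid(X @ wh + bh)
    y_p = sigmoid(h @ wout + bout)
    return 0.5 * np.sum((y - y_p) ** 2)

rng = np.random.default_rng(3)   # hypothetical seed
X = np.array([[1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1]], dtype=float)
y = np.array([[1.0], [1.0], [0.0]])
wh, bh = rng.uniform(size=(4, 3)), rng.uniform(size=(1, 3))
wout, bout = rng.uniform(size=(3, 1)), rng.uniform(size=(1, 1))

# Analytic gradient: dE/dW_h = -h^T [(y_t - y_p) * y_p * (1 - y_p)]
h = sigmoid(X @ wh + bh)
y_p = sigmoid(h @ wout + bout)
d_output = (y - y_p) * y_p * (1 - y_p)
analytic = -h.T @ d_output            # same shape as wout: (3, 1)

# Numerical gradient: nudge each entry of wout by +/- eps (central difference)
eps = 1e-6
numeric = np.zeros_like(wout)
for j in range(wout.shape[0]):
    w_plus, w_minus = wout.copy(), wout.copy()
    w_plus[j, 0] += eps
    w_minus[j, 0] -= eps
    numeric[j, 0] = (cost(X, y, wh, bh, w_plus, bout)
                     - cost(X, y, wh, bh, w_minus, bout)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))   # tiny: the formula (with its minus sign) checks out
```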