# Neural Network Cheat Sheet
Turning the calculus into code for a Neural Network with 1 Hidden Layer

# Part 1: Calculate the Forward Pass

## Step 1: Calculate the Linear Combination for the Inputs and Weights Layer #1

### Linear Combination Equation:

$$\large \sum^{m}_{i=1}(w_i*x_i)+b=h_{(1)}$$

 $m=\,total\,number\,of\,inputs$
 
 $w =\,weight\,layer\,1$
 
 $x =inputs\,to\,neural\,network$
 
 $b=bias$ (Optional)
 
 $\Sigma=sum$
 
 $i=index\,of\,the\,current\,input$
 
Therefore $\sum^{m}_{i=1}(w_i*x_i)$ is the sum, over every input $i$ from 1 to $m$, of each weight in weight layer 1 multiplied by its matching input to the neural network.

$h_{(1)}=\,input\,to\,hidden\,layer\,1$






In [None]:
## In code, looks like this:
import numpy as np
input_to_hidden_layer_one = np.dot(weight_layer_one, inputs_to_neural_network) + bias # np.dot() = numpy dot product function
# or:
h_one = np.dot(w,x) + b
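
As a minimal sketch of Step 1, the snippet below evaluates the linear combination for two hidden nodes and checks that `np.dot` gives the same result as the sum written out term by term. The input, weight and bias values are made up for illustration, and the layout of `w` (one row per hidden node) is an assumption, since the cheat sheet does not fix the array shapes.

```python
import numpy as np

# Hypothetical values, chosen only to make the arithmetic easy to follow
x = np.array([0.5, -1.0, 2.0])        # inputs to the neural network (m = 3)
w = np.array([[0.1, 0.2, 0.3],        # weight layer 1: one row per hidden node (assumed layout)
              [-0.4, 0.5, 0.6]])
b = 0.1                               # bias (optional)

# Vectorised form used in the cheat sheet
h_one = np.dot(w, x) + b

# The same sum written out term by term, one hidden node at a time
h_one_explicit = np.array(
    [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b for row in w]
)

print(h_one)  # approximately [0.55, 0.6]
```

With this layout, `h_one` has one entry per hidden node, which is the shape the sigmoid in Step 2 is applied to.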

## Step 2: Define the Activation Function for the Hidden Layer (aka Activation Function Node, Perceptron or neuron) and Calculate the Output 

Common Types of Activation Functions: Heaviside Step, Logistic (Sigmoid), tanh and softmax
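
For comparison, the four activation functions named above could be sketched in NumPy as follows. The function names and the test vector `h` are illustrative, and returning 1 at exactly `h = 0` in the step function is one convention among several.

```python
import numpy as np

def heaviside_step(h):
    # 1 where h >= 0, otherwise 0 (the value at exactly 0 is a convention; 1 is assumed here)
    return np.where(h >= 0, 1.0, 0.0)

def sigmoid(h):
    # Squashes any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-h))

def tanh(h):
    # Squashes any real number into the open interval (-1, 1)
    return np.tanh(h)

def softmax(h):
    # Subtracting the maximum leaves the result unchanged but avoids overflow in np.exp
    exps = np.exp(h - np.max(h))
    return exps / np.sum(exps)

h = np.array([-1.0, 0.0, 2.0])  # hypothetical inputs to a layer
for f in (heaviside_step, sigmoid, tanh, softmax):
    print(f.__name__, f(h))
```

Unlike the other three, softmax looks at the whole vector at once: its outputs always sum to 1, which is why it is typically used on an output layer with several classes.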

### Logistic (Sigmoid) Function

$$\large f(h_{(1)})=sigmoid(h_{(1)})=\frac{1}{(1+e^{-h_{(1)}})}= k_{(1)}$$

$f=activation\,function(sigmoid)$

$h_{(1)}=\,input\,to\,hidden\,layer\,one$

$e=\,Euler's\,Number\,(an\,irrational\,number,\,approximately\,2.718281)$

$k_{(1)}=output\,of\,hidden\,layer\,one$


In [3]:
## In code, looks like this:
import numpy as np
def sigmoid(input_to_hidden_layer_one):
    return 1 / (1 + np.exp(-input_to_hidden_layer_one)) # np.exp = numpy exponential function
output_of_hidden_layer_one = sigmoid(input_to_hidden_layer_one)
# or:
def sigmoid(h_one):
    return 1 / (1 + np.exp(-h_one)) 
k_one = sigmoid(h_one)
# Commonly, x is used as the parameter in the definition of the sigmoid function
# however, I used h as this function is used on the input h(n), not on the initial inputs (x)
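
The backward pass in Part 2 multiplies by terms of the form `output * (1 - output)`, which is the derivative of the sigmoid. A small sketch that checks this identity numerically with a central finite difference (the sample points in `h` and the step size `eps` are arbitrary choices):

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

h = np.array([-2.0, 0.0, 2.0])   # hypothetical inputs to a hidden layer
k = sigmoid(h)

# Analytic derivative of the sigmoid, written in terms of its own output
analytic_slope = k * (1 - k)

# Central finite difference as an independent check
eps = 1e-6
numeric_slope = (sigmoid(h + eps) - sigmoid(h - eps)) / (2 * eps)

print(analytic_slope)
print(numeric_slope)
```

The two printed arrays should agree to several decimal places, and the slope is largest at `h = 0`, where the sigmoid outputs 0.5.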
    

## Step 3: Calculate the Linear Combination for the Output of Hidden Layer #1 and Weights Layer #2

For each subsequent layer, the output of the previous layer becomes the input to the current layer. This cheat sheet leaves the bias term out of layer 2:

Therefore the Linear Combination Equation becomes:

$$\large \sum^{m}_{i=1}(W_i*k_{(1)i})=h_{(2)}$$

$m=number\,of\,nodes\,in\,hidden\,layer\,one$

$i=index\,of\,the\,current\,hidden\,node$

$W= weight\,layer\,2$

$k_{(1)}= output\,of\,hidden\,layer\,one$

Therefore $\sum^{m}_{i=1}(W_i*k_{(1)i})$ is the sum, over every hidden node $i$ from 1 to $m$, of each weight in weight layer 2 multiplied by the matching output of hidden layer one.

$h_{(2)}= input\,to\,output\,layer$


In [4]:
# In code, looks like this:
input_to_output_layer = np.dot(weight_layer_two, output_of_hidden_layer_one)
#or:
h_two = np.dot(W,k_one)

## Step 4: Calculate the Activation Function for the Output Layer


$$\large f(h_{(2)})=sigmoid(h_{(2)})=\frac{1}{(1+e^{-h_{(2)}})}= \hat y$$

$\hat y = output\,of\,the\,neural\,network = prediction$

In [None]:
#In code, looks like this:
output_of_neural_network = sigmoid(input_to_output_layer)
#or:
prediction = sigmoid(h_two)
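
Steps 1 to 4 can be chained into a single function. The sketch below assumes one output node, `w` stored with one row per hidden node, and `W` stored as a 1-D array with one weight per hidden node; the function name `forward_pass` and all numeric values are hypothetical.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

def forward_pass(x, w, b, W):
    """Run Part 1 for a single record and return (k_one, prediction)."""
    h_one = np.dot(w, x) + b        # Step 1: input to hidden layer one
    k_one = sigmoid(h_one)          # Step 2: output of hidden layer one
    h_two = np.dot(W, k_one)        # Step 3: input to output layer (no bias)
    prediction = sigmoid(h_two)     # Step 4: output of the neural network
    return k_one, prediction

# Hypothetical record and weights
x = np.array([0.5, -1.0, 2.0])
w = np.array([[0.1, 0.2, 0.3],
              [-0.4, 0.5, 0.6]])
b = 0.1
W = np.array([0.3, -0.2])

k_one, prediction = forward_pass(x, w, b, W)
print(k_one, prediction)
```

The function returns `k_one` as well as the prediction because the backward pass needs the hidden layer output again.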

# Part 2: Calculate the Backwards Pass

## Step 1: Calculate the Error Gradient for Weight Layer #2

$$\large \delta_{(W)}=(y-\hat y)* \hat y * (1 - \hat y)$$

$\delta_{(W)} = Error\,gradient\,for\,weight\,layer\,2$

$y = target\,values$

$\hat y = output\,of\,the\,neural\,network = prediction$

In [None]:
# In code, looks like:
Error_gradient_weight_layer_two = (target_values - output_of_neural_network) * output_of_neural_network * (1 - output_of_neural_network)
# or:
Err_grad_W = (y - prediction) * prediction * (1 - prediction)
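
The cheat sheet does not name the error function. If it is assumed to be the squared error $E = \frac{1}{2}(y - \hat y)^2$, then the error gradient above is the negative slope of $E$ with respect to the input to the output layer. The sketch below checks that numerically for one made-up target and input; `squared_error` is a hypothetical helper.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

def squared_error(h_two, y):
    # Assumed error function: E = 1/2 * (y - y_hat)^2
    return 0.5 * (y - sigmoid(h_two)) ** 2

y = 1.0        # hypothetical target value
h_two = 0.3    # hypothetical input to the output layer
prediction = sigmoid(h_two)

# Error gradient from Step 1
Err_grad_W = (y - prediction) * prediction * (1 - prediction)

# Slope of the error with respect to h_two, by central finite difference
eps = 1e-6
numeric_slope = (squared_error(h_two + eps, y) - squared_error(h_two - eps, y)) / (2 * eps)

print(Err_grad_W, -numeric_slope)
```

Because the term already carries the minus sign of gradient descent, the weight updates later add it rather than subtract it.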

## Step 2: Calculate Error Gradient for Weight Layer #1

$$\large \delta_{(w)}=W_i*\delta_{(W)} * k_{(1)} * (1-k_{(1)})$$

$\delta_{(w)} = Error\,gradient\,for\,weight\,layer\,1$

$W= weight\,layer\,2$

$\delta_{(W)} = Error\,gradient\,for\,weight\,layer\,2$

$k_{(1)}= output\,of\,hidden\,layer\,one$

In [None]:
# In code, looks like:
Error_gradient_weight_layer_one = weight_layer_two * Error_gradient_weight_layer_two * output_of_hidden_layer_one * (1 - output_of_hidden_layer_one)
# or:
Err_grad_w = W * Err_grad_W * k_one * (1 - k_one)
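
A small worked example of Step 2, assuming a single output node so that `Err_grad_W` is a scalar and `W` holds one weight per hidden node. All numbers are made up.

```python
import numpy as np

W = np.array([0.3, -0.2])        # weight layer 2, one weight per hidden node
k_one = np.array([0.6, 0.7])     # output of hidden layer one
Err_grad_W = 0.1                 # error gradient for weight layer 2 (scalar)

# Each hidden node receives the output error scaled by its own weight,
# then multiplied by the sigmoid slope k * (1 - k) at that node
Err_grad_w = W * Err_grad_W * k_one * (1 - k_one)

print(Err_grad_w)  # approximately [0.0072, -0.0042]
```

The second entry is negative because its weight in `W` is negative: a hidden node that pushes the prediction down is nudged in the opposite direction from one that pushes it up.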

## Step 3: Calculate the Weight Change Step for Weight Layer #2

$$\large \Delta W = \delta_{(W)}*k_{(1)}$$

$\Delta W = weight\,change\,step\,for\,weight\,layer\,2$

$\delta_{(W)} = Error\,gradient\,for\,weight\,layer\,2$

$k_{(1)} = output\,of\,hidden\,layer\,one$

The learning rate $\eta$ is applied once, in the weight update (Step 6), so it is not part of the weight change step.

In [None]:
# In code, looks like:
weight_change_step_weight_layer_two = Error_gradient_weight_layer_two * output_of_hidden_layer_one
# or:
del_W = Err_grad_W * k_one

## Step 4: Calculate the Weight Change Step for Weight Layer #1

$$\large \Delta w = \delta _{(w)} *x$$

$\Delta w = weight\,change\,step\,for\,weight\,layer\,1$

$\delta_{(w)} = Error\,gradient\,for\,weight\,layer\,1$

$x = inputs\,to\,neural\,network$

The learning rate $\eta$ is applied once, in the weight update (Step 5), so it is not part of the weight change step.

In [None]:
# In code, looks like:
weight_change_step_weight_layer_one = Error_gradient_weight_layer_one * inputs_to_neural_network
# or:
del_w = Err_grad_w * x
# If this raises a ValueError (the shapes cannot be broadcast together), append [:, None]
# to the last operand, e.g. x[:, None], to turn it into a column
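
The ValueError mentioned in the comment comes from NumPy broadcasting: a 1-D array with one entry per hidden node cannot be multiplied elementwise by a 1-D array with one entry per input when the two lengths differ. The sketch below shows the failure and one fix. Which operand gets `[:, None]` depends on how weight layer 1 is stored; here it is assumed to have one row per hidden node, so the error gradient is the one reshaped. Scalar factors are left out because they do not affect the shapes.

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])            # 3 inputs (hypothetical values)
Err_grad_w = np.array([0.0072, -0.0042])  # 2 hidden nodes (hypothetical values)

# Shapes (2,) and (3,) cannot be broadcast together
try:
    Err_grad_w * x
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# [:, None] turns the (2,) array into a (2, 1) column,
# so the product broadcasts to (2, 3): one row per hidden node
step_shape_fixed = Err_grad_w[:, None] * x

# np.outer computes the same table of pairwise products
step_outer = np.outer(Err_grad_w, x)

print(broadcast_failed, step_shape_fixed.shape)
```

If weight layer 1 were stored with one row per input instead, `Err_grad_w * x[:, None]` would give the transposed `(3, 2)` result.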

## Step 5: Update Weights for Weight Layer #1

$$\large w = w + \frac {\eta \Delta w}{r}$$

$The\,first\,w = Updated\,weight\,layer\,1$

$The\,second\,w = Current\,weight\,layer\,1$

$\eta = learning\,rate$

$\Delta w = weight\,change\,step\,for\,weight\,layer\,1$

$r = number\,of\,records$

In [None]:
# In code, looks like:
weight_layer_one += (learning_rate * weight_change_step_weight_layer_one)/number_of_records
# or:
w += learnrate * del_w / r

## Step 6: Update Weights for Weight Layer #2

$$\large W = W + \frac {\eta \Delta W}{r}$$

$The\,first\,W = Updated\,weight\,layer\,2$

$The\,second\,W = Current\,weight\,layer\,2$

$\eta = learning\,rate$

$\Delta W = weight\,change\,step\,for\,weight\,layer\,2$

$r = number\,of\,records$

In [None]:
# In code, looks like:
weight_layer_two += (learning_rate * weight_change_step_weight_layer_two)/number_of_records
# or:
W += learnrate * del_W / r
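
To see that the two updates move the weights in a useful direction, the sketch below runs one forward pass, one backward pass and one update for a single made-up record, then compares the squared error before and after. It assumes a squared error function, one output node, `w` with one row per hidden node, and a learning rate applied only in the update step; `predict` is a hypothetical helper.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

def predict(x, w, b, W):
    k_one = sigmoid(np.dot(w, x) + b)
    return k_one, sigmoid(np.dot(W, k_one))

# Hypothetical record, starting weights and hyperparameters
x = np.array([0.5, -1.0, 2.0])
y = 1.0
w = np.array([[0.1, 0.2, 0.3],
              [-0.4, 0.5, 0.6]])
b = 0.1
W = np.array([0.3, -0.2])
learnrate = 0.5
r = 1  # a single record

k_one, prediction = predict(x, w, b, W)
error_before = 0.5 * (y - prediction) ** 2

# Part 2, Steps 1 to 4 (all computed from the weights before the update)
Err_grad_W = (y - prediction) * prediction * (1 - prediction)
Err_grad_w = W * Err_grad_W * k_one * (1 - k_one)
del_W = Err_grad_W * k_one
del_w = Err_grad_w[:, None] * x

# Part 2, Steps 5 and 6
w += learnrate * del_w / r
W += learnrate * del_W / r

_, new_prediction = predict(x, w, b, W)
error_after = 0.5 * (y - new_prediction) ** 2
print(error_before, error_after)
```

Both error gradients are computed before either weight layer is changed, so the layer 1 update still uses the original `W`.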

## Putting it all Together

In [None]:
#Import your Packages:
import numpy as np

# Define your Activation Function:
def sigmoid(input_to_hidden_layer_one):
    return 1 / (1 + np.exp(-input_to_hidden_layer_one)) # np.exp = numpy exponential function
#or: def sigmoid(h_one):
    #return 1 / (1 + np.exp(-h_one)) 
    
# Part 1: Calculate the Forward Pass

## Step 1: Calculate the Linear Combination for the Inputs and Weights Layer #1
input_to_hidden_layer_one = np.dot(weight_layer_one, inputs_to_neural_network) + bias # np.dot() = numpy dot product function
# or: h_one = np.dot(w,x) + b

## Step 2: Calculate the Activation Function for the Hidden Layer
output_of_hidden_layer_one = sigmoid(input_to_hidden_layer_one)
# or: k_one = sigmoid(h_one)

## Step 3: Calculate the Linear Combination for the Output of Hidden Layer #1 and Weights Layer #2
input_to_output_layer = np.dot(weight_layer_two, output_of_hidden_layer_one)
#or: h_two = np.dot(W,k_one)

## Step 4: Calculate the Activation Function for the Output Layer
output_of_neural_network = sigmoid(input_to_output_layer)
#or: prediction = sigmoid(h_two)

# Part 2: Calculate the Backwards Pass

## Step 1: Calculate the Error Gradient for Weight Layer #2
Error_gradient_weight_layer_two = (target_values - output_of_neural_network) * output_of_neural_network * (1 - output_of_neural_network)
# or: Err_grad_W = (y - prediction) * prediction * (1 - prediction)

## Step 2: Calculate Error Gradient for Weight Layer #1
Error_gradient_weight_layer_one = weight_layer_two * Error_gradient_weight_layer_two * output_of_hidden_layer_one * (1 - output_of_hidden_layer_one)
# or: Err_grad_w = W * Err_grad_W * k_one * (1 - k_one)

## Step 3: Calculate the Weight Change Step for Weight Layer #2
weight_change_step_weight_layer_two = Error_gradient_weight_layer_two * output_of_hidden_layer_one
# or: del_W = Err_grad_W * k_one
# The learning rate is applied once, in the weight update (Step 6)

## Step 4: Calculate the Weight Change Step for Weight Layer #1
weight_change_step_weight_layer_one = Error_gradient_weight_layer_one * inputs_to_neural_network
# or: del_w = Err_grad_w * x
# The learning rate is applied once, in the weight update (Step 5)
# If this raises a ValueError (the shapes cannot be broadcast together), append [:, None]
# to the last operand, e.g. x[:, None], to turn it into a column

## Step 5: Update Weights for Weight Layer #1
weight_layer_one += (learning_rate * weight_change_step_weight_layer_one)/number_of_records
# or: w += learnrate * del_w / r

## Step 6: Update Weights for Weight Layer #2
weight_layer_two += (learning_rate * weight_change_step_weight_layer_two)/number_of_records
# or: W += learnrate * del_W / r
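
The cell above uses names that are never given values, so it does not run on its own. One way to make it runnable is sketched below, with several assumptions that are not in the cheat sheet: a toy dataset (logical OR), random starting weights, a bias fixed at 0 because no update rule is given for it, weight change steps summed over all records before each update (one reading of dividing by the number of records), and a learning rate applied only in the update step. The function name `train` and its parameters are hypothetical.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

def train(features, targets, n_hidden=2, epochs=500, learnrate=0.5, seed=0):
    rng = np.random.default_rng(seed)
    r, m = features.shape                            # r records, m inputs
    w = rng.normal(scale=0.5, size=(n_hidden, m))    # weight layer 1
    W = rng.normal(scale=0.5, size=n_hidden)         # weight layer 2
    b = 0.0                                          # bias, kept fixed in this sketch
    losses = []
    for _ in range(epochs):
        del_w = np.zeros_like(w)
        del_W = np.zeros_like(W)
        total_error = 0.0
        for x, y in zip(features, targets):
            # Part 1: forward pass
            k_one = sigmoid(np.dot(w, x) + b)
            prediction = sigmoid(np.dot(W, k_one))
            total_error += 0.5 * (y - prediction) ** 2
            # Part 2, Steps 1 to 4: error gradients and weight change steps
            Err_grad_W = (y - prediction) * prediction * (1 - prediction)
            Err_grad_w = W * Err_grad_W * k_one * (1 - k_one)
            del_W += Err_grad_W * k_one
            del_w += Err_grad_w[:, None] * x
        # Part 2, Steps 5 and 6: one update per pass over the data
        w += learnrate * del_w / r
        W += learnrate * del_W / r
        losses.append(total_error / r)
    return w, W, losses

# Toy data: logical OR of two binary inputs
features = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
targets = np.array([0.0, 1.0, 1.0, 1.0])

w, W, losses = train(features, targets)
print(f"mean error: first pass {losses[0]:.4f}, last pass {losses[-1]:.4f}")
```

The mean error recorded for the last pass should be lower than for the first; how low it gets depends on the assumed hyperparameters and on the fixed bias.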