### Neural networks can be intimidating, especially for people new to machine learning. Several Python libraries (e.g. Keras and TensorFlow) abstract away the complicated details and computations (e.g. matrix multiplications and gradient descent). For educational purposes, however, we will build our own NN from scratch using only NumPy. This will help us break down exactly how a NN works.

#### Let's build this NN:
<img src="images/NN_with_weights_notations_updated.png" alt="FeedForwardNeuralNetwork" title="FeedForwardNeuralNetwork" height="620" width="420"/>

### Import necessary libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42) # to reproduce same results

### Activation function
The output of each artificial neuron is the weighted sum of its inputs (inputs times weights) passed through an activation function. There are many activation functions, each with its own pros and cons.
<img src="images/activation_functions.png" alt="activation_functions" title="activation_functions" height="220" width="820"/>
Here, we will be using Sigmoid.

In [2]:
# define sigmoid and its derivative

def sigmoid(x):
    return 1 / (1+np.exp(-x))
    

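def sigmoid_prime(x):
    # x is expected to already be a sigmoid output, i.e. x = sigmoid(z),
    # so the derivative with respect to z is x * (1 - x)
    return x * (1-x)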
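The form `x * (1-x)` is only the sigmoid derivative when `x` is the activation itself, not the raw pre-activation input. The following is a minimal sketch of a sanity check, assuming a central finite difference as the reference; the names `z`, `a`, `eps`, `numeric`, `analytic`, and `wrong` are illustrative and not part of the original notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(a):
    # a is assumed to be an activation, a = sigmoid(z)
    return a * (1 - a)

z = np.linspace(-4, 4, 9)   # raw pre-activation values
a = sigmoid(z)              # activations

# Reference slope of sigmoid at z, estimated by a central difference
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

# Correct usage: pass the activation
analytic = sigmoid_prime(a)
max_gap = np.max(np.abs(numeric - analytic))
print("largest gap between numeric and analytic slope:", max_gap)

# Incorrect usage: passing the raw z does not give the derivative
wrong = sigmoid_prime(z)
print("matches when given raw z?", np.allclose(wrong, analytic))
```

This is why the training code later calls `sigmoid_prime` on `y_hat` and `hidden_output`, both of which are already sigmoid outputs.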
    

### Preparing training data
<img src="images/training_data.png" alt="training_data" title="training_data" height="220" width="420"/>

In [4]:
# x = (hours sleeping, hours studying), y = score on test
x = np.array([[3,18],[6,15],[8,12],[10,8]])  # 4 X 2
y = np.array([[89],[84],[79],[72]]) # 4 X 1
print(x)
print(y)

[[ 3 18]
 [ 6 15]
 [ 8 12]
 [10  8]]
[[89]
 [84]
 [79]
 [72]]


### Normalization (scaling)
Sometimes the range of values in raw data varies widely. In that case, objective functions may not work properly without normalization. Normalization also helps gradient descent converge faster. Here we divide the hours by 24 and the test scores by 100, so every value falls between 0 and 1.

In [6]:
x = x/ 24
y = y/ 100
print(x)

[[ 0.125       0.75      ]
 [ 0.25        0.625     ]
 [ 0.33333333  0.5       ]
 [ 0.41666667  0.33333333]]
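Dividing by a known maximum (24 hours in a day, 100 points on the test) is one way to scale. As a hedged sketch of an alternative, the block below also shows per-column min-max scaling, and how a scaled target can be mapped back to a test score. The helper `min_max_scale` and the `*_raw`, `*_scaled`, `x_minmax`, and `y_restored` names are hypothetical and are not used elsewhere in this notebook.

```python
import numpy as np

x_raw = np.array([[3, 18], [6, 15], [8, 12], [10, 8]], dtype=float)
y_raw = np.array([[89], [84], [79], [72]], dtype=float)

# Scaling used in this notebook: divide by a known upper bound
x_scaled = x_raw / 24
y_scaled = y_raw / 100

def min_max_scale(a):
    """Hypothetical helper: rescale each column of a to the range [0, 1]."""
    lo = a.min(axis=0)
    hi = a.max(axis=0)
    return (a - lo) / (hi - lo)

x_minmax = min_max_scale(x_raw)

# Undo the target scaling to read a value as a test score again
y_restored = y_scaled * 100

print(x_minmax)
print(y_restored)
```

Whichever scaling is chosen, the same transformation has to be applied to any new input before it is fed to the network.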


### Initialize learnable parameters: weights and biases
Weights are usually initialized to small random values, but we use 0.5 here so the results can be compared with our manual solution. Biases are ignored for simplicity.

In [7]:
w1 = np.ones((2,3)) * 0.5 # 2 X 3 --> 3 hidden neurons, each has two inputs
w2 = np.ones((3,1)) * 0.5 # 3 X 1 --> single output neuron with three inputs
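For comparison, here is a rough sketch of the more usual small random initialization. It assumes a normal distribution with standard deviation 0.1, which is an arbitrary illustrative choice; `w1_random`, `w2_random`, and the other names are hypothetical. It also shows one side effect of constant weights: every hidden neuron computes the same output.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([[3, 18], [6, 15], [8, 12], [10, 8]]) / 24

# Constant initialization, as used in this notebook
w1_const = np.ones((2, 3)) * 0.5
hidden_const = sigmoid(x @ w1_const)
# All three hidden columns are identical when the weights are identical
columns_identical = np.allclose(hidden_const, hidden_const[:, [0]])
print("hidden neurons identical with constant weights:", columns_identical)

# Assumed alternative: small random values (scale 0.1 is illustrative only)
rng = np.random.default_rng(42)
w1_random = rng.normal(0.0, 0.1, size=(2, 3))  # 2 X 3
w2_random = rng.normal(0.0, 0.1, size=(3, 1))  # 3 X 1
hidden_random = sigmoid(x @ w1_random)
print(hidden_random)
```

With identical weights the three hidden neurons also receive identical updates, so they stay identical throughout training. That keeps the hand calculation simple, and random values are what break this symmetry in practice.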

### Training

In [None]:
lr = 0.8 # learning rate
epochs = np.arange(10000)
sse = []

# return output of hidden layer & output layer
def feed_forward(x, w1, w2):
    hidden_output = sigmoid( np.matmul(x,w1) ) # 4 X 3
    output = sigmoid( np.matmul(hidden_output,w2) ) # 4 X 1
    return hidden_output, output

# return gradients of last & hidden layer
def backpropagation(x, y, y_hat, hidden_output):
    delta = (y-y_hat) * sigmoid_prime(y_hat) # 4 X 1, output-layer error term
    gradient = np.transpose( np.matmul(delta.T, hidden_output) ) # 3 X 1
    
    # propagate the error back through w2 (read from the enclosing scope)
    delta2 = delta.dot(w2.T) * sigmoid_prime(hidden_output) # 4 X 3
    gradient2 = x.T.dot(delta2) # 2 X 3
    
    return gradient, gradient2

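for i in epochs:

    # feed forward : calculate y_hat
    hidden_output, y_hat = feed_forward(x, w1, w2)
    
    # calculate SSE and append to list
    sse.append(np.sum((y - y_hat) ** 2))

    # backpropagation : calculate gradients
    gradient, gradient2 = backpropagation(x, y, y_hat, hidden_output)
    
    # adjusting weights
    # delta uses (y - y_hat), so these "gradients" already point downhill: add them
    w2 += lr * gradient
    w1 += lr * gradient2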
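One way to gain confidence in the backpropagation formulas is to compare them with finite differences. The block below is a self-contained sketch under a few assumptions: the loss is taken to be half the SSE, the hidden-layer error is propagated through `w2` with shape 3 X 1, and the helpers `half_sse`, `analytic_updates`, and `numeric_gradient` are hypothetical names introduced only for this check.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def half_sse(x, y, w1, w2):
    """0.5 * sum of squared errors for the 2-3-1 network without biases."""
    hidden = sigmoid(x @ w1)
    y_hat = sigmoid(hidden @ w2)
    return 0.5 * np.sum((y - y_hat) ** 2)

def analytic_updates(x, y, w1, w2):
    """Same quantities as the backpropagation step, written with @."""
    hidden = sigmoid(x @ w1)
    y_hat = sigmoid(hidden @ w2)
    delta = (y - y_hat) * y_hat * (1 - y_hat)         # 4 X 1
    grad_w2 = hidden.T @ delta                        # 3 X 1
    delta2 = (delta @ w2.T) * hidden * (1 - hidden)   # 4 X 3
    grad_w1 = x.T @ delta2                            # 2 X 3
    return grad_w1, grad_w2

def numeric_gradient(loss, w, eps=1e-6):
    """Central-difference gradient of loss() with respect to each entry of w."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + eps
        plus = loss()
        w[idx] = old - eps
        minus = loss()
        w[idx] = old            # restore the original weight
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

x = np.array([[3, 18], [6, 15], [8, 12], [10, 8]]) / 24
y = np.array([[89], [84], [79], [72]]) / 100
w1 = np.ones((2, 3)) * 0.5
w2 = np.ones((3, 1)) * 0.5

grad_w1, grad_w2 = analytic_updates(x, y, w1, w2)
num_w1 = numeric_gradient(lambda: half_sse(x, y, w1, w2), w1)
num_w2 = numeric_gradient(lambda: half_sse(x, y, w1, w2), w2)

# Because delta uses (y - y_hat), the backprop quantities are the
# NEGATIVE of the loss gradient, which is why the weights are updated with "+"
print(np.allclose(grad_w1, -num_w1, atol=1e-7))
print(np.allclose(grad_w2, -num_w2, atol=1e-7))
```

If the hidden-layer error were propagated through `w1` instead of `w2`, the shapes would not line up, which makes this kind of check a quick way to catch indexing mistakes.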
    

### Plot SSE during training
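A minimal, self-contained sketch of the plot is given here. To run on its own it repeats a compact version of the training loop inside a hypothetical `train` helper, assuming the same data, 0.5 initial weights, learning rate 0.8, and 10000 epochs; in the notebook itself, the `sse` list filled during training could be passed to `ax.plot` directly.

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(x, y, lr=0.8, n_epochs=10000):
    """Hypothetical helper: train the 2-3-1 network and return SSE per epoch."""
    w1 = np.ones((2, 3)) * 0.5
    w2 = np.ones((3, 1)) * 0.5
    history = []
    for _ in range(n_epochs):
        hidden = sigmoid(x @ w1)                         # 4 X 3
        y_hat = sigmoid(hidden @ w2)                     # 4 X 1
        history.append(np.sum((y - y_hat) ** 2))         # SSE for this epoch
        delta = (y - y_hat) * y_hat * (1 - y_hat)        # 4 X 1
        delta2 = (delta @ w2.T) * hidden * (1 - hidden)  # 4 X 3, uses w2 before its update
        w2 += lr * (hidden.T @ delta)
        w1 += lr * (x.T @ delta2)
    return history

x = np.array([[3, 18], [6, 15], [8, 12], [10, 8]]) / 24
y = np.array([[89], [84], [79], [72]]) / 100
sse_history = train(x, y)

fig, ax = plt.subplots()
ax.plot(sse_history)
ax.set_xlabel("epoch")
ax.set_ylabel("SSE")
ax.set_title("SSE during training")
```

In a notebook the figure is rendered at the end of the cell; in a plain script, a call to `plt.show()` would be needed.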