In this lab, we're going to build a tiny neural network with one hidden layer and one output layer. The network takes the inputs, multiplies each one by its weight, and adds the results together to form the signal:

$$ h = \sum_{i} w_ix_i $$

An activation function is then applied to the signal to produce the output:

$$ \hat y = f(h) $$
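The two formulas above can be sketched in a few lines of NumPy. This is only an illustration: the input and weight values are made up for the example, and the activation is assumed to be the sigmoid used later in this lab.

```python
import numpy as np

def sigmoid(h):
    """Logistic activation f(h) = 1 / (1 + exp(-h)) (assumed choice of f)."""
    return 1.0 / (1.0 + np.exp(-h))

# Illustrative values, chosen only for this sketch
x = np.array([1.0, 2.0, 3.0, 4.0])    # inputs x_i
w = np.array([0.5, -0.5, 0.3, 0.1])   # weights w_i

h = np.dot(w, x)      # signal: sum_i w_i * x_i
y_hat = sigmoid(h)    # output: f(h)

print(h)
print(y_hat)
```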

<img src="assets/neural_net.png" alt="Drawing" style="width: 800px;"/>

The neural network learns from the data by updating its weights so that its predictions have less error. Here we use the sum of squared errors (SSE) as an example to derive the math.

For a single output, the sum of squared errors is:
$$ E = \frac{1}{2}(y-\hat y)^2 = \frac{1}{2}(y-f(\sum w_ix_i))^2 $$
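To make the dependence of the error on the weights concrete, here is a minimal sketch that evaluates $E$ for two different weight vectors. The helper name `sse`, the data values, and the sigmoid activation are assumptions made for this example.

```python
import numpy as np

def sigmoid(h):
    """Assumed activation f."""
    return 1.0 / (1.0 + np.exp(-h))

def sse(w, x, y):
    """E = 1/2 * (y - f(sum_i w_i x_i))^2 for one data point."""
    y_hat = sigmoid(np.dot(w, x))
    return 0.5 * (y - y_hat) ** 2

# Illustrative data point and weights
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 0.5
w = np.array([0.5, -0.5, 0.3, 0.1])

print(sse(w, x, y))             # error with the weights above
print(sse(np.zeros(4), x, y))   # all-zero weights give y_hat = 0.5, so E = 0 here
```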

So our goal in learning is to find the weights that minimize the error. We're going to achieve this using gradient descent.

<img src="assets/gradient_descent.png" alt="Drawing" style="width: 800px;"/>

$$ w_i = w_i + \Delta w_i $$

$$ \Delta w_i \propto -\frac{\partial E}{ \partial w_i}$$

$$ \Delta w_i = -\eta \frac{\partial E}{ \partial w_i} $$

Here $\eta$ is the learning rate, which sets the size of each step.
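As a rough illustration of this update rule, the sketch below runs gradient descent on a one-weight toy error $E(w) = \frac{1}{2}(w-2)^2$. The toy error, the starting weight, and the learning rate are all assumptions picked for the example, not part of the lab's network.

```python
# Toy error surface (an assumption for illustration): E(w) = 1/2 * (w - 2)^2
def grad_E(w):
    """dE/dw for the toy error above."""
    return w - 2.0

eta = 0.1   # learning rate
w = 0.0     # starting weight

for _ in range(50):
    delta_w = -eta * grad_E(w)   # delta_w = -eta * dE/dw
    w = w + delta_w              # w = w + delta_w

print(w)   # should end up close to the minimum at w = 2
```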

<p style="text-align: center;">
Using the chain rule, we can see that:

$$ \begin{aligned} \frac{\partial E}{\partial w_i} &= \frac{\partial}{\partial w_i}\frac{1}{2}(y-\hat y)^2 = \frac{\partial}{\partial w_i}\frac{1}{2}(y-\hat y(w_i))^2 = (y-\hat y)\frac{\partial}{\partial w_i}(y-\hat y) \\ &= -(y-\hat y)\frac{\partial \hat y}{\partial w_i} = -(y-\hat y)f'(h)\frac{\partial}{\partial w_i}\sum_j w_jx_j = -(y-\hat y)f'(h)x_i \end{aligned} $$
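One way to sanity-check this result is to compare it against a finite-difference estimate of the gradient. The sketch below does that for one data point. The helper names, the data values, and the choice of sigmoid for $f$ (with $f'(h) = f(h)(1-f(h))$) are assumptions for the example.

```python
import numpy as np

def sigmoid(h):
    """Assumed activation f."""
    return 1.0 / (1.0 + np.exp(-h))

def error(w, x, y):
    """E = 1/2 * (y - f(sum_j w_j x_j))^2"""
    return 0.5 * (y - sigmoid(np.dot(w, x))) ** 2

def analytic_grad(w, x, y):
    """dE/dw_i = -(y - y_hat) * f'(h) * x_i, using f'(h) = f(h) * (1 - f(h))."""
    y_hat = sigmoid(np.dot(w, x))
    return -(y - y_hat) * y_hat * (1.0 - y_hat) * x

def numeric_grad(w, x, y, eps=1e-6):
    """Central finite differences, perturbing one weight at a time."""
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (error(w + step, x, y) - error(w - step, x, y)) / (2 * eps)
    return grad

# Illustrative data point and weights
x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, -0.5, 0.3, 0.1])
y = 0.5

print(analytic_grad(w, x, y))
print(numeric_grad(w, x, y))
```

The two printed vectors should agree to several decimal places if the derivation holds.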

<p style="text-align: center;">
This gives the weight update:

$$ \Delta w_i = \eta(y-\hat y)f'(h)x_i $$

<p style="text-align: center;">
To make things easier, we define the error term:

$$ \sigma = (y-\hat y)f'(h) $$ 

<p style="text-align: center;">
So the weights can be updated by:

$$ w_i = w_i + \eta\sigma x_i $$
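Applying this update repeatedly should drive the error down. The sketch below runs five updates on a single data point and records the error before each one. The data, the learning rate, and the sigmoid activation are assumptions for the example.

```python
import numpy as np

def sigmoid(h):
    """Assumed activation f, with f'(h) = f(h) * (1 - f(h))."""
    return 1.0 / (1.0 + np.exp(-h))

eta = 0.5                              # learning rate
x = np.array([1.0, 2.0, 3.0, 4.0])     # one input vector
y = 0.5                                # its target
w = np.array([0.5, -0.5, 0.3, 0.1])    # initial weights

errors = []
for step in range(5):
    y_hat = sigmoid(np.dot(w, x))
    errors.append(0.5 * (y - y_hat) ** 2)         # E before this update
    sigma = (y - y_hat) * y_hat * (1.0 - y_hat)   # error term
    w = w + eta * sigma * x                       # w_i = w_i + eta * sigma * x_i

print(errors)
```

With these particular values the recorded error shrinks at every step. A much larger learning rate could overshoot instead.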

<p style="text-align: center;">
We can easily extend this to a neural network with multiple neurons:

<img src="assets/neural_net_multi.png" alt="Drawing" style="width: 800px;"/>
<img src="assets/error_multi.png" alt="Drawing" style="width: 800px;"/>
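As a hedged sketch of what the multi-neuron case can look like in code, the example below assumes several output units that share the same inputs, with one error term per output unit and the update $\Delta w_{ij} = \eta\,\sigma_j x_i$. The weight layout `W[i, j]`, the values, and the sigmoid activation are assumptions for this example and may not match the figures exactly.

```python
import numpy as np

def sigmoid(h):
    """Assumed activation f."""
    return 1.0 / (1.0 + np.exp(-h))

eta = 0.5
x = np.array([1.0, 2.0, 3.0])     # 3 inputs
y = np.array([0.5, 0.1])          # 2 targets, one per output unit
W = np.array([[0.1, -0.2],
              [0.4, 0.3],
              [-0.3, 0.2]])       # W[i, j]: weight from input i to output j

h = x @ W                                     # h_j = sum_i W[i, j] * x_i
y_hat = sigmoid(h)                            # one output per unit
sigma = (y - y_hat) * y_hat * (1.0 - y_hat)   # one error term per output unit
delta_W = eta * np.outer(x, sigma)            # delta_W[i, j] = eta * sigma_j * x_i

print(delta_W)
```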

<p style="text-align: center;">
For the sigmoid function,

$$ f(x) = \frac {1}{1 + e^{-x}} $$

<p style="text-align: center;">
the derivative is:

$$ f'(x) = -(1+e^{-x})^{-2}\cdot e^{-x}\cdot (-1) = \frac{1}{1+e^{-x}}\cdot\frac{1+e^{-x}-1}{1+e^{-x}} = f(x)(1-f(x)) $$
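The identity $f'(x) = f(x)(1-f(x))$ can be checked numerically. The sketch below compares it with a central finite-difference estimate on a small grid. The grid and the step size `eps` are arbitrary choices for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)          # compute sigmoid once and reuse it
    return s * (1.0 - s)

xs = np.linspace(-5.0, 5.0, 11)   # sample points, including x = 0
eps = 1e-6
numeric = (sigmoid(xs + eps) - sigmoid(xs - eps)) / (2 * eps)

# Largest gap between the finite-difference estimate and f(x)(1 - f(x))
print(np.max(np.abs(numeric - sigmoid_prime(xs))))
```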

In [15]:
import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

def sigmoid_prime(x):
    """
    Derivative of the sigmoid function
    """
    return sigmoid(x) * (1 - sigmoid(x))

learnrate = 0.5
x = np.array([1, 2, 3, 4])
y = np.array(0.5)

# Initial weights
w = np.array([0.5, -0.5, 0.3, 0.1])

### Calculate one gradient descent step for each weight
### Note: Some steps have been consolidated, so there are
###       fewer variable names than in the above sample code

# TODO: Calculate the node's linear combination of inputs and weights
h = np.dot(w, x)

# TODO: Calculate output of neural network
nn_output = sigmoid(h)

# TODO: Calculate error of neural network
error = y-nn_output

# TODO: Calculate the error term
#       Remember, this requires the output gradient, which we haven't
#       specifically added a variable for.
# Note: Calling sigmoid_prime(h) would calculate sigmoid(h) again,
#       but nn_output already holds that value. The derivative is
#       therefore computed directly as nn_output * (1 - nn_output)
#       rather than by calling sigmoid_prime.
error_term = error * nn_output * (1 - nn_output)

# TODO: Calculate change in weights
del_w = learnrate*error_term*x

print('Neural Network output:')
print(nn_output)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(del_w)

Neural Network output:
0.6899744811276125
Amount of Error:
-0.1899744811276125
Change in Weights:
[-0.02031869 -0.04063738 -0.06095608 -0.08127477]


If you did it right, you should see:  
Neural Network output:     
0.6899744811276125   
Amount of Error:   
-0.1899744811276125   
Change in Weights:   
[-0.02031869 -0.04063738 -0.06095608 -0.08127477]   