<h1>Simple Neural Network</h1><br><b>Our feed-forward neural network is going to predict the XOR operator. The code is deliberately simple; its only purpose is to illustrate how neural networks work.</b><br>

In [2]:
import numpy as np # numpy because it rocks

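<h4>Activation function</h4>The activation function introduces non-linearity into our network; without it, stacking layers would only ever produce a linear function of the input. We use the sigmoid function $$f(x)=\frac{1}{1+e^{-x}}$$<br>Along the way we will also need its derivative, which can be expressed through the function's own output: $$f'(x)=f(x)\,(1-f(x))$$<br>This is why the deriv=True branch below returns x * (1 - x): it expects x to already be a sigmoid output $s=f(z)$, not the raw input. We could use other activation functions instead of sigmoid, for example tanh.<br>We will see later how it is used.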
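As a quick sanity check of the derivative identity, the sketch below compares $f(x)(1-f(x))$ with a central finite-difference estimate of $f'(x)$. The helper name sigmoid and the step size h are illustrative choices, not part of the original notebook.

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid f(x) = 1 / (1 + e^{-x})
    return 1 / (1 + np.exp(-x))

x = np.linspace(-5, 5, 11)  # a few sample points
h = 1e-5                    # finite-difference step (illustrative choice)

# Central-difference approximation of f'(x)
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

# Closed form written in terms of the sigmoid's own output s = f(x)
s = sigmoid(x)
analytic = s * (1 - s)

# Should be very small: the two estimates agree
print(np.max(np.abs(numeric - analytic)))

# Applying x * (1 - x) to the RAW input instead of s gives a different, wrong result
wrong = x * (1 - x)
print(np.allclose(wrong, analytic))
```

The last check is the reason the notebook's activation(..., deriv=True) is always called on layer outputs that have already passed through the sigmoid.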

In [3]:
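def activation(x, deriv=False):
    # Sigmoid activation. With deriv=True, x must already be a sigmoid
    # output s = f(z); the derivative f'(z) = s * (1 - s) is returned.
    if not deriv:
        return 1 / (1 + np.exp(-x))
    return x * (1 - x)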

<h4>Inputs and Outputs</h4>
Our data consists of binary vectors, and we are going to predict the XOR function. We will feed 4 vectors into our neural network, and we want it to learn to predict the corresponding outputs.<br>
Here are our inputs and the expected outputs.

In [4]:
inputarray = np.array([[0,0], 
                       [0,1],
                       [1,1],
                       [1,0]])

outputarray = np.array([[0],
                        [1],
                        [0],
                        [1]])
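To double-check that these targets really are XOR, one can compute the bitwise XOR of the two input columns and compare. This is only a sanity check; xor_targets is a name introduced here for illustration.

```python
import numpy as np

inputarray = np.array([[0, 0],
                       [0, 1],
                       [1, 1],
                       [1, 0]])
outputarray = np.array([[0], [1], [0], [1]])

# XOR of the two input columns, reshaped into a 4x1 column to match outputarray
xor_targets = np.bitwise_xor(inputarray[:, 0], inputarray[:, 1]).reshape(-1, 1)

print(np.array_equal(xor_targets, outputarray))  # True
```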

<h4>Defining Neural Network</h4>
Our network is going to have 3 layers. Many examples illustrate neural networks with nodes and arrows, but I think a more helpful way to think about them is as matrices. We have 3 matrices representing our layers and 2 matrices representing our weights. Each layer matrix has one row per input vector (4 rows) and one column per node in that layer.
<ul>
  <li>Input layer - <b>layer1 - 4x2 matrix</b></li>
  <li>Hidden layer - <b>layer2 - 4x5 matrix</b></li>
  <li>Output layer - <b>layer3 - 4x1 matrix</b></li>
</ul>
We also define the weights between layers:
<ul>
  <li>Weights between layer1 and layer2 - weights1 - <b>2x5 matrix</b></li>
  <li>Weights between layer2 and layer3 - weights2 - <b>5x1 matrix</b></li>
</ul>
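Instead of feeding the vectors into our neural network one by one, we feed all of them at the same time: each layer is computed for all 4 examples with a single matrix multiplication, which is simpler and faster than looping over the vectors.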
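Feeding all four vectors at once gives exactly the same numbers as feeding them one by one, because row i of the batched product is the product for example i. A small sketch to check this, using a random weights1 with an arbitrary seed (an assumption; these are not the notebook's weights):

```python
import numpy as np

np.random.seed(0)                    # illustrative seed, not the notebook's
inputs = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
weights1 = np.random.random((2, 5))  # same shape as in the notebook

# All four examples at once: (4x2) @ (2x5) -> (4x5)
batched = np.dot(inputs, weights1)

# One example at a time: each (2,) @ (2x5) -> (5,), then stack the rows
one_by_one = np.array([np.dot(row, weights1) for row in inputs])

print(batched.shape)                     # (4, 5)
print(np.allclose(batched, one_by_one))  # True
```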

In [5]:
np.random.seed(1) # a fixed seed makes debugging easier: we get the same random numbers on every run

# defining weights and layers - all are matrices (numpy arrays)
weights1 = np.random.random((2,5)) # random for now (uniform in [0, 1))
weights2 = np.random.random((5,1)) 
layer1 = np.zeros((4,2)) 
layer2 = np.zeros((4,5))
layer3 = np.zeros((4,1)) 

layer3_error = np.zeros((4,1)) # error term, saying how far our prediction is from a correct output

Now we have a neural network initialized and we can make a prediction (although a crappy one, because we have not trained the network yet).<br><b>What we do in the cell below is called forward propagation.</b>
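Written as equations, the forward pass in the next cell is shown below. Here $X$ is the 4x2 input matrix, $W_1$ and $W_2$ stand for weights1 and weights2, $L_1, L_2, L_3$ for layer1, layer2, layer3, and $f$ is the sigmoid applied element-wise; these symbols are introduced only for compactness.
$$L_1 = X,\qquad L_2 = f(L_1 W_1),\qquad L_3 = f(L_2 W_2)$$
The shapes work out as $(4\times 2)(2\times 5) \to 4\times 5$ for the hidden layer and $(4\times 5)(5\times 1) \to 4\times 1$ for the output, one row per input vector.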

In [6]:
layer1 = inputarray # we assign our input to layer1
layer2 = np.dot(layer1,weights1) # We matrix multiply our first layer with the weights to get our second layer
layer2 = activation(layer2) # But we also need to apply our activation function to our second layer
# This activation function is simply applied to all the elements in the matrix

layer3 = activation(np.dot(layer2,weights2)) # We do the same process to get our 3rd layer
# now we have our crappy prediction in the layer3 variable
layer3_error = outputarray - layer3
print(layer3_error) # how far we are from the correct result; this error is what we want to minimize

[[-0.75160406]
 [ 0.22170815]
 [-0.81373132]
 [ 0.20825635]]


Our prediction is useless because the weights are still random. We want to adjust the weights so that our prediction gets closer to the desired output. The weights are adjusted with gradient descent, using backpropagation to compute how each weight should change. This process is repeated many times, and each step brings us closer to the desired result.<br>
<h4>Backpropagation</h4>
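So we have computed our error term (layer3_error) and we want to minimize it. For each element of our output matrix (layer3) we compute the derivative of the sigmoid function at that point, multiply it element-wise by the error term to get layer3_delta, and finally update weights2 by adding the matrix product of the transpose of layer2 and layer3_delta (as we can see in the code). A similar approach is used to update weights1 (and, in general, the weights of networks with more layers): we propagate layer3_delta back through weights2 to get layer2_error, multiply it by the sigmoid derivative at layer2 to get layer2_delta, and add the product of the transpose of layer1 and layer2_delta to weights1.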
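One way to convince yourself that these updates really are gradient descent is a finite-difference check. The sketch below assumes the loss $E=\frac{1}{2}\sum(y-\hat y)^2$ (the notebook never names its loss; this is the one whose negative gradient matches the update rule) and compares the updates for weights2 and weights1 with numerical estimates of $-\partial E/\partial W$. The names loss, numeric_grad and eps are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss(X, y, W1, W2):
    # Assumed loss: half the sum of squared errors
    out = sigmoid(sigmoid(X @ W1) @ W2)
    return 0.5 * np.sum((y - out) ** 2)

np.random.seed(1)
X = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
y = np.array([[0], [1], [0], [1]])
W1 = np.random.random((2, 5))
W2 = np.random.random((5, 1))

# Backpropagation, following the same steps as the training loop
layer2 = sigmoid(X @ W1)
layer3 = sigmoid(layer2 @ W2)
layer3_delta = (y - layer3) * layer3 * (1 - layer3)
layer2_delta = layer3_delta.dot(W2.T) * layer2 * (1 - layer2)
update2 = layer2.T.dot(layer3_delta)
update1 = X.T.dot(layer2_delta)

def numeric_grad(which, eps=1e-6):
    # Central finite differences of the loss for every entry of W1 or W2
    W = W1 if which == 1 else W2
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        orig = W[idx]
        W[idx] = orig + eps
        plus = loss(X, y, W1, W2)
        W[idx] = orig - eps
        minus = loss(X, y, W1, W2)
        W[idx] = orig  # restore the original weight
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

# The backprop updates should equal the NEGATIVE gradient of the loss
print(np.allclose(update2, -numeric_grad(2), atol=1e-6))  # True
print(np.allclose(update1, -numeric_grad(1), atol=1e-6))  # True
```

Because the error is defined as target minus prediction, the gradient's minus sign is already folded into the deltas, which is why the notebook adds (+=) the updates instead of subtracting them.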
<h4>Training</h4>
We train the neural network by simply forward propagating, computing the error term, and then backpropagating the error term to adjust the weights. Here we use 50,000 steps and print the mean absolute error every 5000 steps to see whether it is decreasing.<br>
Tip: Run the cell multiple times to see how quickly the error keeps decreasing; each run continues training from the current weights.

In [7]:
for i in range(50000):
    # forward propagation, exactly as we did before
    layer1 = inputarray
    layer2 = activation(np.dot(layer1, weights1))
    layer3 = activation(np.dot(layer2, weights2))

    # error term, like we computed before
    layer3_error = outputarray - layer3

    # backpropagation: scale the output error by the sigmoid derivative
    layer3_delta = layer3_error * activation(layer3, deriv=True)

    # how much each hidden node contributed to the output error
    layer2_error = layer3_delta.dot(weights2.T)

    layer2_delta = layer2_error * activation(layer2, deriv=True)

    # update weights (no learning rate term, i.e. a step size of 1)
    weights2 += layer2.T.dot(layer3_delta)
    weights1 += layer1.T.dot(layer2_delta)

    if i % 5000 == 0:  # print our error rate every 5000 steps
        print("Error: " + str(np.mean(np.abs(layer3_error))))

Error: 0.498824970753
Error: 0.0240046602858
Error: 0.0162288274875
Error: 0.013027624303
Error: 0.0111711562607
Error: 0.00992267210558
Error: 0.00900949548553
Error: 0.00830426646812
Error: 0.00773847060385
Error: 0.0072715382651
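The loop above effectively uses a learning rate of 1. To experiment with the step size, a minimal sketch is to wrap the same loop in a function with an explicit learning-rate parameter. train, lr and seed are hypothetical names introduced here; with lr=1.0 and seed=1 the function follows the same initialization and update rule as the notebook.

```python
import numpy as np

def activation(x, deriv=False):
    # Sigmoid; with deriv=True, x is assumed to already be a sigmoid output
    if not deriv:
        return 1 / (1 + np.exp(-x))
    return x * (1 - x)

def train(steps, lr=1.0, seed=1):
    """Hypothetical helper: the notebook's training loop with an explicit
    learning rate. lr=1.0 reproduces the notebook's update rule."""
    np.random.seed(seed)
    X = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
    y = np.array([[0], [1], [0], [1]])
    weights1 = np.random.random((2, 5))
    weights2 = np.random.random((5, 1))
    errors = []  # mean absolute error at every step, before that step's update
    for _ in range(steps):
        layer2 = activation(np.dot(X, weights1))
        layer3 = activation(np.dot(layer2, weights2))
        layer3_error = y - layer3
        errors.append(np.mean(np.abs(layer3_error)))
        layer3_delta = layer3_error * activation(layer3, deriv=True)
        layer2_delta = layer3_delta.dot(weights2.T) * activation(layer2, deriv=True)
        weights2 += lr * layer2.T.dot(layer3_delta)
        weights1 += lr * X.T.dot(layer2_delta)
    return weights1, weights2, errors

w1, w2, errors = train(steps=10000)
print(errors[0], errors[-1])
```

Calling, for example, train(steps=10000, lr=0.5) lets you compare how quickly errors falls for different step sizes; a step that is too large can make training unstable.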


We can see our prediction is really close to our desired result. <br>

In [8]:
print(layer3) # prediction
print(outputarray) # desired result

[[ 0.00876447]
 [ 0.99350773]
 [ 0.0055496 ]
 [ 0.99329519]]
[[0]
 [1]
 [0]
 [1]]
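The sigmoid output is a number between 0 and 1, so to read it as a binary XOR answer one common convention (an assumption here; the notebook does not threshold its output) is to cut at 0.5. The values below are copied from the printed prediction above.

```python
import numpy as np

# Prediction values copied from the printed layer3 output above
prediction = np.array([[0.00876447],
                       [0.99350773],
                       [0.0055496],
                       [0.99329519]])
outputarray = np.array([[0], [1], [0], [1]])

# Turn the sigmoid outputs into hard 0/1 labels with a 0.5 threshold
labels = (prediction > 0.5).astype(int)

print(labels.ravel())                       # [0 1 0 1]
print(np.array_equal(labels, outputarray))  # True
```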


If we now want to predict the output for a new vector, we can simply use forward propagation, like this.
Our training data already contains all four possible combinations of two binary values, so [1,0] is not really new: the network was trained on it.

In [9]:
l1=np.array([1,0])
l2 = activation(np.dot(l1,weights1))
l3 = activation(np.dot(l2,weights2))
print(l3)

[ 0.99329526]
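Because the new input is a 1-D vector, the result above prints as a 1-D array with a single element rather than as a 4x1 matrix. The sketch below, using random placeholder weights rather than the trained ones, shows these shapes and one way to handle a single vector and a batch with the same code path via np.atleast_2d. predict is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def sigmoid(x):
    # Sigmoid activation
    return 1 / (1 + np.exp(-x))

def predict(x, weights1, weights2):
    # Hypothetical helper: treat a single vector as a batch with one row
    x = np.atleast_2d(x)
    return sigmoid(np.dot(sigmoid(np.dot(x, weights1)), weights2))

np.random.seed(0)                    # placeholder weights, not the trained ones
weights1 = np.random.random((2, 5))
weights2 = np.random.random((5, 1))

l1 = np.array([1, 0])
print(np.dot(l1, weights1).shape)                           # (5,)
print(predict(l1, weights1, weights2).shape)                # (1, 1)
print(predict([[0, 0], [1, 1]], weights1, weights2).shape)  # (2, 1)
```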
