# Understanding a Basic 2 Layer Neural Network

#### This ipynb is written to assist my personal understanding of neural networks. It would make me even happier if it helped others understand them too :) . I will go in depth on every part of the code as I write it. To follow along, anyone can test the code by uncommenting the print statements to see exactly what is going on in each step.


## Step 1 - importing dependencies :

- The only dependency we need to run this network is numpy, which handles the math on multidimensional arrays and matrices.

In [201]:
import numpy as np

## Step 2 - Defining the non-linearity :

- For this example, I have used the sigmoid non-linearity function. It maps any value passed to it to a value between 0 & 1. This is very helpful because the outputs can then be read as probabilities.
- The function below can also return the derivative of the sigmoid when its `deriv` argument is set to `True`. In that case the argument `x` must already be a sigmoid output, not a raw input.

##### A very helpful property of the sigmoid function is that its output can be used to compute its derivative. For example, if the sigmoid has an output of "x", then the derivative at that point is simply x * (1 - x).
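
The property above can be checked numerically. The snippet below is a minimal sketch, not part of the original notebook: the helper name `sigmoid` and the sample points are assumptions made for illustration, and the code is written with Python 3 style `print()` calls. It compares the slope computed from the sigmoid's own output against a finite-difference estimate of the slope.

```python
import numpy as np

def sigmoid(x):
    # Plain sigmoid, same formula as sigmoidFn with deriv=False
    return 1 / (1 + np.exp(-x))

x = np.array([-2.0, 0.0, 3.0])   # arbitrary sample inputs
s = sigmoid(x)                   # sigmoid outputs at those inputs

# Slope written purely in terms of the sigmoid's output
analytic = s * (1 - s)

# Central finite-difference estimate of the slope at the same inputs
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

print("slope from output:     ", analytic)
print("finite-difference slope:", numeric)
print("match:", np.allclose(analytic, numeric))
```

The two rows should agree to several decimal places, which is why the network can reuse `outputLayer` to get the slope instead of recomputing anything from the raw input.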

In [303]:
# Non-linearity used to squash the output into a confidence between 0 & 1.
# With deriv=True, x is expected to already be a sigmoid output, and the
# function returns the slope of the sigmoid at that point.
def sigmoidFn(x, deriv=False):
    if deriv:
        return x*(1-x)
    return 1/(1+np.exp(-x))

# Uncomment the code below to see the slope calculation in action as we provide an input (x1) array.
# x1 = np.array([[2],[0],[-1]])

# When we pass the x1 values through sigmoidFn, we get the corresponding points (y1) on the sigmoid graph.
# y1 = sigmoidFn(x1, False)

# The values returned above (y1) can now be used to calculate the derivative (slope) at those points.
# slopes = sigmoidFn(y1, True)

# print("x1 values: ", x1)
# print("y1 values: ", y1)
# print("slope values: ", slopes)


## Step 3 - Initializing the input matrix using numpy :

- For this example, I create a very basic input dataset. It is a simple 6 X 3 matrix (6 rows by 3 columns).


- Each row is a "training example" and each column is an input "node" that is fed into the network. This means the network has 3 input nodes and 6 training examples that it can be trained from.

In [241]:
# Input matrix where every row is a training example and each column is
# an input node to the network.
X = np.array([[0,0,1],
              [0,1,1],
              [1,0,1],
              [1,1,1],
              [1,1,1],
              [1,1,0]])

# print(X)

## Step 4 - Initializing the output matrix using numpy :

- In accordance with the simple training data, I devised a simple output (so that the network can be trained quickly), which is a 1 X 6 (1 row by 6 columns) array.


- The next step is to transpose the array, which changes its shape to 6 X 1. This is done so that the output lines up with the input, i.e., each row is a training example and each column is an output node.
### The above statement means that there are 6 training examples and the output layer has a single output node.

In [304]:
# Output dataset where each row is a training example
# .T transposes the matrix, so [[0,0,0,1,1,1]] becomes [[0],
#                                                       [0],
#                                                       [0],
#                                                       [1],
#                                                       [1],
#                                                       [1]]

Y = np.array([[0,0,0,1,1,1]]).T
# print(Y)
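
To see what the transpose does to the shape, here is a small sketch. The variable names `row`, `col` and `flat` are made up for this illustration and do not appear in the notebook. It also shows why the double brackets matter: a 1-D array has no second axis, so `.T` leaves it unchanged.

```python
import numpy as np

row = np.array([[0, 0, 0, 1, 1, 1]])   # 2-D array, shape (1, 6)
col = row.T                            # transposed, shape (6, 1)

flat = np.array([0, 0, 0, 1, 1, 1])    # 1-D array, shape (6,)

print("row shape: ", row.shape)
print("col shape: ", col.shape)
print("flat.T shape:", flat.T.shape)   # still (6,), the transpose does nothing here
print(col)
```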

## Step 5 - Random seeding 

- Seeding numpy's random number generator makes it produce the same sequence of "random" numbers every time the notebook is run, so the weights start from the same values and the training results are reproducible.

In [243]:
np.random.seed(1)
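
A quick way to convince yourself of what the seed does is the sketch below (the names `first` and `second` are illustrative only). Re-seeding with the same value before each draw should give identical arrays.

```python
import numpy as np

np.random.seed(1)
first = np.random.random((3, 1))    # three "random" numbers

np.random.seed(1)                   # reset the generator to the same state
second = np.random.random((3, 1))   # the same three numbers again

print(first.ravel())
print(second.ravel())
print("identical:", np.array_equal(first, second))
```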

## Step 6 - Defining a weight matrix for this Neural Network :

- Because there are only 2 layers in the neural network, the input layer & the output layer, only one weight matrix is necessary to connect the two layers.

- The dimension of the weight matrix is 3 X 1 because the input layer (as discussed in step 3) is of size 3 and the output layer (as discussed in step 4) is of size 1.

- Weight initialization is a theory-intensive topic, but as a reasonable practice for this simple example, we initialize the weights randomly with a zero mean.
    ##### "zero mean" means that the weights are drawn from a distribution centered on 0: `np.random.random` returns values in [0, 1), so `2*np.random.random((3,1)) - 1` returns values in [-1, 1). The three sampled weights themselves will usually not average to exactly 0.

In [253]:
# Initializing the weight matrix randomly.
weightMatrix = 2*np.random.random((3,1)) - 1
# print(weightMatrix)
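
The "zero mean" idea is easier to see with many samples than with three. The following is a rough sketch (the sample size of 100000 and the names `samples` and `weights` are arbitrary choices for illustration): the same `2*r - 1` expression is applied to a large batch, where the average lands close to 0, and then to a 3 X 1 matrix, where it usually does not.

```python
import numpy as np

np.random.seed(1)

# Many draws from the same expression used for the weights
samples = 2 * np.random.random(100000) - 1
print("min:", samples.min(), "max:", samples.max())   # stays inside [-1, 1)
print("mean of many samples:", samples.mean())        # close to 0

# Only three draws, like the actual weight matrix
weights = 2 * np.random.random((3, 1)) - 1
print("mean of three weights:", weights.mean())       # generally not 0
```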

## Step 7 - Training the Neural Network :

- We create a "for" loop that iterates many times over our training data defined in step 3 & 4 to optimize the network to the specified dataset.

### Forward Propagation :

- We are going to utilize a technique known as full-batch training which will process all the input data(defined as X in step 3) at the same time.

- We then let the network predict the output based on the input (I use the term "predict" loosely, because each prediction is just one iteration on the way to a better prediction).

- The next bit of code can be read in 2 steps. Firstly, our inputLayer matrix, which is X (defined in step 3) of size 6 X 3, is multiplied (dot product, i.e. matrix multiplication) with our initialized weightMatrix (from step 6) of size 3 X 1. By the rules of matrix multiplication, the result is a matrix of size 6 X 1.
```python 
outputLayer = sigmoidFn(np.dot(inputLayer,weightMatrix))
```

- Secondly, note that in the above line of code the result of the dot product is passed through the sigmoidFn declared in step 2, which squashes each of the 6 values to between 0 & 1.
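
The shapes in the forward pass can be traced with a standalone sketch like the one below. It rebuilds X and a seeded weight matrix so that it runs on its own. The short names `sigmoid`, `W`, `z` and `out` are assumptions for this illustration, not names from the notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1],
              [1, 1, 1],
              [1, 1, 0]])               # 6 X 3 input matrix from step 3

np.random.seed(1)
W = 2 * np.random.random((3, 1)) - 1    # 3 X 1 weight matrix as in step 6

z = np.dot(X, W)     # (6, 3) . (3, 1) -> (6, 1), one weighted sum per example
out = sigmoid(z)     # same shape, every value squashed into (0, 1)

print("z shape:  ", z.shape)
print("out shape:", out.shape)

# The first training example is [0, 0, 1], so its weighted sum is just W[2]
print(np.isclose(z[0, 0], W[2, 0]))
```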

In [262]:
for iteration in range(10):

    # forward propagation
    inputLayer = X
    outputLayer = sigmoidFn(np.dot(inputLayer, weightMatrix))
    # print("W-Matrix:", weightMatrix)
    # print("inputLayer", inputLayer)
    # print("DotProd", np.dot(inputLayer, weightMatrix))

    #################
    # Read STEP : 8 #
    #################
    outputLayerError = Y - outputLayer
    # print("OutputLayerError: ", outputLayerError)

    #################
    # Read STEP : 9 #
    #################
    errorBasedChange = outputLayerError * sigmoidFn(outputLayer, True)
    # print("Derivative of the outputLayer:", sigmoidFn(outputLayer, True))
    # print("Change based on outputLayerError:", errorBasedChange)

    ##################
    # Read STEP : 10 #
    ##################
    weightMatrix += np.dot(inputLayer.T, errorBasedChange)
    # print("T", inputLayer.T)
    # print("E", errorBasedChange)
    # print("W", weightMatrix)
    # print("Dot Product:", np.dot(inputLayer.T, errorBasedChange))
    # print("Updated Weight Matrix", weightMatrix)
    # print("Outputs after each iteration:", outputLayer)

print("OUTPUT")
print(outputLayer)


OUTPUT
[[  1.28236047e-06]
 [  1.14091723e-02]
 [  1.14091723e-02]
 [  9.90463826e-01]
 [  9.90463826e-01]
 [  9.99999988e-01]]


## Step 8 - Evaluating calculation error :


- The line `outputLayerError = Y - outputLayer` in the training loop simply measures how far our predicted value (outputLayer) is from the actual, predefined output value (Y).
- Remember that Y is a 6 X 1 matrix (as defined in step 4). The outputLayer is also a 6 X 1 matrix, so to calculate the error we subtract the two matrices element by element. A positive error means the prediction was too low; a negative error means it was too high.
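
As a small worked example of this subtraction, the sketch below uses made-up prediction values (the `predictions` array is hypothetical, not output from the network) and also computes the mean absolute error, a single number that summarizes how far off the predictions are.

```python
import numpy as np

Y = np.array([[0, 0, 0, 1, 1, 1]]).T    # targets, 6 X 1

# Hypothetical predictions, chosen by hand for illustration
predictions = np.array([[0.1], [0.2], [0.4], [0.7], [0.9], [0.6]])

error = Y - predictions                 # elementwise, still 6 X 1
print(error.ravel())                    # negative where the guess was too high

mean_abs_error = np.mean(np.abs(error))
print("mean absolute error:", mean_abs_error)
```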

## Step 9 - Calculating the "Error Based Change" :

- To understand this step fully, we can take it step-by-step.
    - Firstly, we calculate the derivative sigmoidFn(outputLayer, True). Each calculated value is between 0 & 0.25: the slope of the sigmoid is largest (0.25) where the output is 0.5 and shrinks towards 0 as the output approaches 0 or 1.
    - Secondly, we multiply the returned 6 X 1 matrix of derivatives by the outputLayerError, which is also a 6 X 1 matrix.
###### The second step multiplies the derivatives with the errors "elementwise". Predictions that carry a high confidence (outputs close to 0 or 1) have a small slope, so their errors are scaled down and they change the weights less than uncertain predictions do.
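
The effect of this scaling shows up clearly when the same error is applied to predictions of different confidence. The values below are hypothetical and picked only to illustrate the point: one very confident low output, one undecided output and one very confident high output.

```python
import numpy as np

# Hypothetical network outputs: confident low, undecided, confident high
outputs = np.array([[0.01], [0.5], [0.99]])

# Sigmoid slope at each output, same formula as sigmoidFn(outputs, True)
slopes = outputs * (1 - outputs)

# Pretend each prediction has the same raw error
error = np.array([[0.2], [0.2], [0.2]])

change = error * slopes     # elementwise product
print("slopes:", slopes.ravel())
print("change:", change.ravel())
```

The undecided output (0.5) gets the largest change, while the two confident outputs are barely moved even though the raw error is identical.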


## Step 10 - Updating the weight matrix based on previously achieved results :

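- This update step depends directly on how big the difference is between the actual output (Y) and the estimated output (outputLayer).
- The update itself is `np.dot(inputLayer.T, errorBasedChange)`: the 3 X 6 transposed input times the 6 X 1 errorBasedChange gives a 3 X 1 matrix, i.e. one adjustment per weight, summed over all 6 training examples.
- If one thinks about it intuitively, it makes sense that if the network is already predicting the correct output value accurately, the weights do not need to be tampered with much.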
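
Putting steps 7 through 10 together, the sketch below runs the same update rule while recording the mean absolute error at every iteration, so the improvement can be observed directly. It is an illustration under assumptions: the names `sigmoid`, `W`, `errors` and the choice of 1000 iterations are mine, not the notebook's.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1],
              [1, 1, 1], [1, 1, 1], [1, 1, 0]])
Y = np.array([[0, 0, 0, 1, 1, 1]]).T

np.random.seed(1)
W = 2 * np.random.random((3, 1)) - 1

errors = []                                # mean absolute error per iteration
for step in range(1000):                   # iteration count is an arbitrary choice
    out = sigmoid(np.dot(X, W))            # forward propagation, 6 X 1
    err = Y - out                          # step 8
    delta = err * out * (1 - out)          # step 9
    update = np.dot(X.T, delta)            # step 10: (3, 6) . (6, 1) -> (3, 1)
    W += update
    errors.append(np.mean(np.abs(err)))

print("update shape:", update.shape)
print("error at first iteration:", errors[0])
print("error at last iteration: ", errors[-1])
```

Comparing the first and last entries of `errors` is a simple check that the weights are moving in a useful direction.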

# Implementing a 3 Layer Neural Network using the methodology defined in steps 1 through 10.

#### Uncomment the print statements at the end of the code to see step-by-step what happens.

In [None]:
import numpy as np

def sigmoidFn(x, deriv=False):
    if deriv:
        return x * (1-x)
    return 1/(1+np.exp(-x))

# 4 X 3 Matrix
X = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])

# 4 X 1 Matrix
y = np.array([[0],[1],[1],[0]])

# To make sure all our randomly generated values are the same in all runs
np.random.seed(1)

# 3 X 4 Matrix
weightMatrix1 = 2 * np.random.random((3,4)) - 1
# 4 X 1 Matrix
weightMatrix2 = 2 * np.random.random((4,1)) - 1

for j in range(60000):
    inputLayer = X
    # We estimate the hiddenLayer
    hiddenLayer = sigmoidFn(np.dot(inputLayer, weightMatrix1))
    # We estimate the outputLayer
    outputLayer = sigmoidFn(np.dot(hiddenLayer, weightMatrix2))

    # How much is our estimation off by for the outputLayer?
    outputLayerError = y - outputLayer

    # Scale the output error by the sigmoid slope at each output, so that
    # confident predictions are changed less
    outputLayerErrorBasedChange = outputLayerError * sigmoidFn(outputLayer, True)

    # Back propagation occurs here: how much did each hiddenLayer value
    # contribute to the outputLayerError?
    hiddenLayerError = outputLayerErrorBasedChange.dot(weightMatrix2.T)

    # Scale the hidden error by the sigmoid slope at each hidden value
    hiddenLayerErrorBasedChange = hiddenLayerError * sigmoidFn(hiddenLayer, True)

    weightMatrix2 += hiddenLayer.T.dot(outputLayerErrorBasedChange)
    weightMatrix1 += inputLayer.T.dot(hiddenLayerErrorBasedChange)
    
# When running this cell, please allow a couple of seconds for the iterations to complete

# print("InputLayer")
# print(inputLayer)
# print("-------------")
# print("weightMatrix1")
# print(weightMatrix1)
# print("-------------")
# print("dot Product of InputLayer & weightMatrix1")
# print(np.dot(inputLayer, weightMatrix1))
# print("-------------")
# print("HiddenLayer")
# print(hiddenLayer)
# print("-------------")
# print("weightMatrix2")
# print(weightMatrix2)
# print("-------------")
# print("dot Product of HiddenLayer & WeightMatrix2")
# print(np.dot(hiddenLayer, weightMatrix2))
# print("-------------")
# print("OutputLayer:")
# print(outputLayer)
# print("OutputLayerError:")
# print(outputLayerError)
# print("-------------")
# print("OutputLayerErrorBasedChange:")
# print(outputLayerErrorBasedChange)
# print("-------------")
# print("HiddenLayerError:")
# print(hiddenLayerError)
# print("-------------")
# print("HiddenLayerErrorBasedChange:")
# print(hiddenLayerErrorBasedChange)
# print("-------------------------------")
# print("MEAN OUTPUT ERROR")
# print(np.mean(np.abs(outputLayerError)))
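
To make the back-propagation step in the 3 layer network easier to follow, the sketch below performs a single pass and prints the shape of every intermediate matrix. It uses shortened, illustrative names (`W1`, `W2`, `hidden`, `output`, `output_delta`, `hidden_error`, `hidden_delta`) in place of the notebook's longer ones, and it does not train the network.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])   # 4 X 3
y = np.array([[0], [1], [1], [0]])                           # 4 X 1

np.random.seed(1)
W1 = 2 * np.random.random((3, 4)) - 1    # input -> hidden, 3 X 4
W2 = 2 * np.random.random((4, 1)) - 1    # hidden -> output, 4 X 1

# Forward pass
hidden = sigmoid(X.dot(W1))              # (4, 3) . (3, 4) -> (4, 4)
output = sigmoid(hidden.dot(W2))         # (4, 4) . (4, 1) -> (4, 1)

# Backward pass
output_delta = (y - output) * output * (1 - output)    # (4, 1)
hidden_error = output_delta.dot(W2.T)                  # (4, 1) . (1, 4) -> (4, 4)
hidden_delta = hidden_error * hidden * (1 - hidden)    # (4, 4)

# Weight updates have the same shapes as the matrices they adjust
W2_update = hidden.T.dot(output_delta)   # (4, 4) . (4, 1) -> (4, 1)
W1_update = X.T.dot(hidden_delta)        # (3, 4) . (4, 4) -> (3, 4)

print("hidden:", hidden.shape, "output:", output.shape)
print("hidden_error:", hidden_error.shape, "hidden_delta:", hidden_delta.shape)
print("W2_update:", W2_update.shape, "W1_update:", W1_update.shape)
```

Multiplying by `W2.T` is what sends the output error backwards: each hidden value receives a share of the error proportional to the weight that connected it to the output.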