# Understanding a Basic 2-Layer Neural Network

#### This notebook is written to assist my personal understanding of neural networks. It would make me even happier if it helped others understand them too :) I will go in depth on every part of the code as I write it.

## Step 1 - Importing dependencies :

- The only dependency we will need for running this network is numpy (for mathematically manipulating multidimensional arrays and matrices).




In [183]:
import numpy as np

## Step 2 - Defining the non-linearity :

- For this example, I have used the sigmoid non-linearity function. It maps any real number to a value between 0 & 1. This is very helpful because the output can then be read as a probability.


##### A very helpful property of the sigmoid function is that its derivative can be computed from its output. For example, if the sigmoid has an output of "x", then the derivative is calculated simply as x * (1 - x).

In [184]:
# Non-linearity that is used to gauge output confidence.
# Values are between 0 & 1. With deriv=True, x must already be a
# sigmoid output, and the function returns the derivative x*(1-x).
def sigmoidFn(x, deriv=False):
    if deriv:
        return x*(1-x)
    return 1/(1+np.exp(-x))
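
The derivative shortcut described above can be checked numerically. The following is a minimal sketch, assuming Python 3; the names `sigmoid`, `z`, `s`, `shortcut` and `numeric_slope` are made up for this illustration and are not part of the notebook. It compares s * (1 - s), computed from the sigmoid output, with a finite-difference estimate of the slope.

```python
import numpy as np

def sigmoid(z):
    # Same formula as sigmoidFn(x) with deriv=False.
    return 1 / (1 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])   # arbitrary test inputs (assumption)
s = sigmoid(z)                   # sigmoid outputs

# Derivative computed from the output, as sigmoidFn(s, deriv=True) does.
shortcut = s * (1 - s)

# Central finite-difference estimate of the slope at each z.
h = 1e-5
numeric_slope = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

print(np.allclose(shortcut, numeric_slope))   # True
```

Note that the shortcut takes the sigmoid output as its argument, not the raw input. Passing a raw input to the deriv=True branch would give a wrong derivative.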


## Step 3 - Initializing the input matrix using numpy :

- For this example, I will be creating a very basic input dataset. It is a simple 6 X 3 matrix (6 rows by 3 columns).


- Each row is a "training-example" and each column is an input "node" that is fed into the network. This means the network has 3 inputs and 6 training examples from which it can be trained.

In [185]:
# Input matrix where every row is a training example and each column is
# an input node to the network.
X = np.array([[0,0,1],
              [0,1,1],
              [0,1,0],
              [1,0,1],
              [1,1,1],
              [1,1,0]])

# print X

## Step 4 - Initializing the output dataset using numpy :

- To go with the simple training data, I devised a simple output (so that the network can be trained quickly), which is a 1 X 6 (1 row by 6 columns) array.


- The next step is to transpose the array, which changes its shape to 6 X 1. This is done so that the output lines up with the input, i.e. each row is a training example and each column is an output node.
### The above statement means that the network has a single output node, with one target value for each of the 6 training examples.

In [186]:
# Output dataset where each row is a training example
# .T transposes the matrix, so [[0,0,0,1,1,1]] becomes [[0],
#                                                       [0],
#                                                       [0],
#                                                       [1],
#                                                       [1],
#                                                       [1]]

Y = np.array([[0,0,0,1,1,1]]).T

# print Y

# y = np.array([[1,1,0,0]]).T
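
The shapes described in steps 3 and 4 can be confirmed directly. A small sketch, assuming Python 3; the name `row` is introduced here only to show the array before it is transposed.

```python
import numpy as np

X = np.array([[0,0,1],
              [0,1,1],
              [0,1,0],
              [1,0,1],
              [1,1,1],
              [1,1,0]])

row = np.array([[0,0,0,1,1,1]])   # 1 X 6 before transposing
Y = row.T                         # 6 X 1 after transposing

print(row.shape)   # (1, 6)
print(Y.shape)     # (6, 1)
print(X.shape)     # (6, 3)

# Same number of rows, so training example i in X pairs with target i in Y.
print(X.shape[0] == Y.shape[0])   # True
```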

## Step 5 - Random seeding 

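- Seeding numpy's random number generator makes it produce the same sequence of random numbers every time the notebook is run. The weights therefore start from the same values on every training run, which makes the results reproducible.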

In [187]:
np.random.seed(1)
# print np.random.seed(1)
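
To see what the seed does, here is a minimal sketch (assuming Python 3; the names `first_draw`, `second_draw` and `third_draw` are made up for this illustration). Drawing after the same seed gives the same numbers, while drawing again without reseeding moves further along the sequence.

```python
import numpy as np

np.random.seed(1)
first_draw = np.random.random((3, 1))

np.random.seed(1)                        # reset to the same seed
second_draw = np.random.random((3, 1))
print(np.array_equal(first_draw, second_draw))   # True

third_draw = np.random.random((3, 1))    # no reseed: the sequence moves on
print(np.array_equal(first_draw, third_draw))    # False

np.random.seed(1)   # restore the seed so later code starts from the same state
```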

## Step 6 - Defining a weight Matrix for this Neural Network :

- Because there are only 2 layers in the neural network, the input layer & the output layer, only one weight matrix is necessary to connect the two layers.

- The dimension of the weight matrix is 3 X 1 because the input layer (as discussed in step 3) is of size 3 and the output layer (as discussed in step 4) is of size 1.

- The initialization of the weights is very theory intensive, but as best practice for this simple example, we initialize the weights randomly with a zero mean.
    ##### "zero mean" means that the weights are drawn from a distribution whose average value is 0. np.random.random returns values in [0, 1), so 2*np.random.random((3,1)) - 1 gives values in [-1, 1), centred on 0. The 3 weights that are actually drawn will not average to exactly 0.

In [188]:
# Initializing the weight matrix randomly with values in [-1, 1).
weightMatrix = 2*np.random.random((3,1)) - 1
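
The range and the "zero mean" of this initialization can be checked with a quick sketch (assuming Python 3). The names `weights` and `many`, and the sample size of 100000, are assumptions chosen for illustration only.

```python
import numpy as np

np.random.seed(1)

# Same recipe as weightMatrix: uniform values in [0, 1) stretched to [-1, 1).
weights = 2 * np.random.random((3, 1)) - 1
print(weights.min() >= -1 and weights.max() < 1)   # True

# With only 3 values the average is not exactly 0, but over a large
# sample the average gets very close to 0.
many = 2 * np.random.random((100000, 1)) - 1
print(abs(many.mean()) < 0.02)                     # True
```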

## Step 7 - Training the Neural Network :

- We create a "for" loop that iterates many times over our training data defined in step 3 & 4 to optimize the network to the specified dataset.

### Forward Propagation :

- We are going to use a technique known as full-batch training, which processes all the input data (defined as X in step 3) at the same time.

- We then let the network predict the output based on the input. (I use the term predict loosely, because each prediction is one more iteration towards an even better prediction.)

- The next bit of code can be interpreted in 2 steps. Firstly, our inputLayer matrix, which is X (defined in step 3) of size 6 X 3, undergoes a dot product (matrix multiplication) with our initialized weightMatrix (from step 6) of size 3 X 1. By the rules of matrix multiplication, the result of the dot product is a matrix of size 6 X 1.
```python 
outputLayer = sigmoidFn(np.dot(inputLayer,weightMatrix))
```

- Secondly, the output of the dot product passes through the sigmoidFn declared back in step 2, which squashes each of the 6 values to between 0 & 1.

In [197]:
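# NOTE: this cell shows only the forward pass. For the weights to be
# updated on every iteration, the code from steps 8-10 below must also
# be indented inside this loop.
for iteration in range(100000):

    # forward propagation
    inputLayer = X
    outputLayer = sigmoidFn(np.dot(inputLayer,weightMatrix))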
#     print outputLayer
#     print "W-Mtrix:",weightMatrix
#     print "iputLayer",inputLayer
#     print "DotProd",np.dot(inputLayer, weightMatrix)
#     print "SigmoidFnCalc:",sigmoidFn(np.dot(inputLayer, weightMatrix))
#     print "Output Layer:",outputLayer
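
The shapes in the forward pass can be traced with a short sketch (assuming Python 3). The names `sigmoid`, `W`, `weighted_sum` and `prediction` are made up here, and the weights are freshly drawn with seed 1 rather than taken from the notebook's current state.

```python
import numpy as np

def sigmoid(z):
    # Same formula as sigmoidFn(x) with deriv=False.
    return 1 / (1 + np.exp(-z))

X = np.array([[0,0,1],
              [0,1,1],
              [0,1,0],
              [1,0,1],
              [1,1,1],
              [1,1,0]])

np.random.seed(1)
W = 2 * np.random.random((3, 1)) - 1   # 3 X 1 weight matrix

weighted_sum = np.dot(X, W)            # (6 X 3) dot (3 X 1) -> 6 X 1
prediction = sigmoid(weighted_sum)     # squashed into (0, 1)

print(weighted_sum.shape)              # (6, 1)

# Row 0 of X is [0,0,1], so its weighted sum is just the third weight.
print(np.isclose(weighted_sum[0, 0], W[2, 0]))        # True
print(((prediction > 0) & (prediction < 1)).all())    # True
```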


## Step 8 - Evaluating Calculation Error :


- The line of code below simply measures how far our predicted value (outputLayer) is from the actual, predefined output value (Y).
- Remember that Y is a 6 X 1 matrix (as defined in step 4). The outputLayer is also a 6 X 1 matrix, so to calculate the error we subtract the two matrices element by element.

In [198]:
outputLayerError = Y - outputLayer
    # print Y
print(outputLayer)
    # print "O-put Error",outputLayerError

[[ 0.325416  ]
 [ 0.42596688]
 [ 0.60603278]
 [ 0.4634023 ]
 [ 0.57052986]
 [ 0.73360785]]
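
As a worked example of the subtraction, the sketch below (assuming Python 3) copies the predictions from the output printed above and computes the error for each training example. The name `meanAbsError` is made up for this illustration. It is only a convenient single-number summary and is not used by the network.

```python
import numpy as np

Y = np.array([[0,0,0,1,1,1]]).T

# Predictions copied from the printed outputLayer above.
outputLayer = np.array([[0.325416  ],
                        [0.42596688],
                        [0.60603278],
                        [0.4634023 ],
                        [0.57052986],
                        [0.73360785]])

outputLayerError = Y - outputLayer
print(outputLayerError)

# Targets of 0 give negative errors (the prediction is too high),
# targets of 1 give positive errors (the prediction is too low).
meanAbsError = np.mean(np.abs(outputLayerError))
print(meanAbsError)
```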


## Step 9 - Scaling the error by the sigmoid derivative :

- To understand this step fully, we can take it step-by-step.
    - Firstly, we calculate the derivative with sigmoidFn(outputLayer, True). Because every value in outputLayer lies between 0 & 1, every derivative x * (1 - x) lies between 0 & 0.25. It is largest when the output is 0.5 (the network is unsure) and close to 0 when the output is near 0 or 1 (the network is confident).
    - Secondly, we multiply the returned 6 X 1 matrix of derivatives with the outputLayerError, which is also a 6 X 1 matrix.
###### What the second step does is multiply the derivatives with the error "elementwise". This shrinks the change for predictions that already carry a high confidence and leaves the largest changes for predictions the network is unsure about.


In [199]:
errorBasedChange = outputLayerError * sigmoidFn(outputLayer, True)
print(outputLayer)
    # print weightMatrix
    # print outputLayerError
    # print "Derivs", sigmoidFn(inputLayer, True)
    # print errorBasedChange


[[ 0.325416  ]
 [ 0.42596688]
 [ 0.60603278]
 [ 0.4634023 ]
 [ 0.57052986]
 [ 0.73360785]]
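
The effect of multiplying by the derivative is easiest to see with a few hand-picked numbers. In the sketch below (assuming Python 3), the arrays `outputs` and `error` hold hypothetical values chosen for illustration. They are not results from this network.

```python
import numpy as np

# Hypothetical predictions: confident low, unsure, confident high.
outputs = np.array([0.01, 0.5, 0.99])

# What sigmoidFn(outputs, True) returns: x * (1 - x).
slopes = outputs * (1 - outputs)
print(slopes)

# The same hypothetical error for all three predictions.
error = np.array([0.1, 0.1, 0.1])

# The unsure prediction (0.5) keeps the largest share of its error;
# the two confident predictions are scaled down much more.
scaled = error * slopes
print(scaled)
```

The derivative peaks at 0.25 when the output is 0.5, so the scaled error is never larger than a quarter of the raw error.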


## Step 10 - Updating the weight matrix based on previously achieved results.

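- The update is the dot product of the transposed input (inputLayer.T, of size 3 X 6) and errorBasedChange (of size 6 X 1), which gives a 3 X 1 matrix, the same size as weightMatrix. Each weight is changed by the sum, over all 6 training examples, of its input value multiplied by that example's errorBasedChange. An input of 0 therefore contributes nothing to its weight for that example.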

In [200]:
#     print "weightMatrix1",weightMatrix
#     print "IPUT LAYER",inputLayer
#     print "EBC",errorBasedChange
weightMatrix += np.dot(inputLayer.T, errorBasedChange)
    
#     print "DOTPROD", np.dot(inputLayer.T, errorBasedChange)
#     print "WMAtrix2", weightMatrix
#     print "weigh:",updatedWeights

print("OUTPUT", outputLayer)

OUTPUT [[ 0.325416  ]
 [ 0.42596688]
 [ 0.60603278]
 [ 0.4634023 ]
 [ 0.57052986]
 [ 0.73360785]]
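
Steps 7 to 10 only train the network when they run together on every iteration of the loop. The following is a minimal sketch of the combined loop, assuming Python 3. It reuses the names from the cells above, but the iteration count of 10000 and the final rounding are assumptions made to keep the example short and fast.

```python
import numpy as np

def sigmoidFn(x, deriv=False):
    # With deriv=True, x must already be a sigmoid output.
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

X = np.array([[0,0,1],
              [0,1,1],
              [0,1,0],
              [1,0,1],
              [1,1,1],
              [1,1,0]])
Y = np.array([[0,0,0,1,1,1]]).T

np.random.seed(1)
weightMatrix = 2 * np.random.random((3, 1)) - 1

for iteration in range(10000):   # 10000 is an assumption, chosen for speed
    # Step 7: forward propagation over the full batch.
    inputLayer = X
    outputLayer = sigmoidFn(np.dot(inputLayer, weightMatrix))

    # Step 8: how far each prediction is from its target.
    outputLayerError = Y - outputLayer

    # Step 9: scale each error by the sigmoid slope at that prediction.
    errorBasedChange = outputLayerError * sigmoidFn(outputLayer, True)

    # Step 10: update the weights.
    weightMatrix += np.dot(inputLayer.T, errorBasedChange)

# Predictions from the last forward pass, rounded for readability.
print(np.round(outputLayer, 2))
```

With every step inside the loop, each forward pass uses the weights updated by the previous iteration, so the printed predictions move towards the targets in Y instead of staying fixed.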
