# A Basic 2-Layer Neural Net Implementation (Using NumPy)

### This code is a follow-along/documentation of the code by Siraj Raval at https://www.youtube.com/watch?v=h3l4qz76JhQ&t=28s

## Step 1 - Importing dependencies :

- The only dependency we need to run this network is NumPy (for manipulating multidimensional arrays and matrices).

```python
import numpy as np
```

## Step 2 - Defining the non-linearity :

- For this example, the sigmoid non-linearity is used. It maps any real input to a value between 0 & 1, which is very helpful because the output can then be read as a probability (or confidence).
- When `deriv` is set to `True`, the same function returns the derivative of the sigmoid instead. In that mode the argument must already be a sigmoid output, not the raw input.

##### A very helpful property of the sigmoid function is that its output can be used to compute its derivative. If the sigmoid produces an output "x", the derivative at that point is simply x * (1 - x).

```python
# Non-linearity used to gauge output confidence.
# Returns values between 0 & 1. With deriv=True it instead returns the
# derivative, and in that case x must already be a sigmoid output.
def sigmoidFn(x, deriv=False):
    if deriv:
        return x*(1-x)
    return 1/(1+np.exp(-x))

# Uncomment the code below to see the slope calculation in action for an input array (x1).
# x1 = np.array([[2],[0],[-1]])

# Passing x1 through sigmoidFn gives the corresponding points (y1) on the sigmoid curve.
# y1 = sigmoidFn(x1, False)

# Those outputs (y1) are then used to calculate the derivatives (slopes) at those points.
# slopes = sigmoidFn(y1, True)

# print("x1 values: ", x1)
# print("y1 values: ", y1)
# print("slope values: ", slopes)
```
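As a quick sanity check of the x * (1 - x) property, the sketch below compares it against a numerical (central finite-difference) estimate of the slope. It defines its own `sigmoid` helper, a stand-in for `sigmoidFn` above, so that it runs on its own; the step size `h` is an arbitrary small value chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid: maps any real z into (0, 1)
    return 1 / (1 + np.exp(-z))

z = np.array([-2.0, 0.0, 1.0, 3.0])
s = sigmoid(z)

# Identity: d/dz sigmoid(z) = s * (1 - s), where s = sigmoid(z)
analytic = s * (1 - s)

# Central finite difference as an independent estimate of the slope
h = 1e-5
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-8))  # True
print(analytic[1])  # 0.25, the largest possible slope, reached at z = 0
```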

## Step 3 - Initializing the input matrix using numpy :

- For this example, I create a very basic input dataset: a simple 4 X 3 matrix (4 rows by 3 columns).


- Each row is a "training example" and each column is an input "node" that is fed into the network. This means the network has 3 inputs and 4 training examples to learn from.

```python
X = np.array([[0,0,1],
              [0,1,1],
              [1,0,1],
              [1,1,1]])

print("X", X)
```

Output:

```
X [[0 0 1]
 [0 1 1]
 [1 0 1]
 [1 1 1]]
```
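A small sketch of the row/column convention on the same `X`: indexing a row gives one training example, while indexing a column gives one input node across all examples.

```python
import numpy as np

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])

print(X.shape)   # (4, 3): 4 training examples, 3 input nodes
print(X[2])      # [1 0 1]: the 3 inputs of the third training example
print(X[:, 0])   # [0 0 1 1]: the first input node across all 4 examples
```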


## Step 4 - Initializing the output matrix using numpy :

- In keeping with the simple training data, a simple output matrix is defined (so that the network can be trained quickly). It starts out as a 1 X 4 (1 row by 4 columns) array.


- The array is then transposed (`.T`), which changes its shape to 4 X 1. This lines the output up with the input: each row is a training example and each column is an output node.
### In other words, there are 4 training examples and the output layer has a single output node.

```python
Y = np.array([[0,0,1,1]]).T
```
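A short sketch of the transpose and the resulting shapes. One observation that falls out of the data, though the original notebook does not state it: the target column is identical to the first input column of `X`, so the network is effectively learning to copy its first input.

```python
import numpy as np

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])

row = np.array([[0, 0, 1, 1]])  # 1 X 4: one row, four columns
Y = row.T                        # 4 X 1: one row per training example

print(row.shape, Y.shape)            # (1, 4) (4, 1)
# The target equals the first input column, kept as a 4 X 1 matrix
print(np.array_equal(Y, X[:, [0]]))  # True
```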

## Step 5 - Random seeding :

- Seeding NumPy's random number generator makes its "random" numbers deterministic: every run draws the same sequence, so the initial weights, and therefore the training results, are reproducible.

```python
np.random.seed(1)
```
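A minimal demonstration that seeding makes the draws repeatable: resetting the seed replays exactly the same sequence. The `np.random.default_rng` lines show NumPy's newer Generator API only as an alternative; the notebook itself uses the legacy `np.random.seed`.

```python
import numpy as np

np.random.seed(1)
first = np.random.random(3)

np.random.seed(1)           # reset to the same seed
second = np.random.random(3)

print(np.array_equal(first, second))  # True: identical sequence

# The same idea with NumPy's Generator API
rng_a = np.random.default_rng(1)
rng_b = np.random.default_rng(1)
print(np.array_equal(rng_a.random(3), rng_b.random(3)))  # True
```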

## Step 6 - Defining a weight matrix for this Neural Network :

- Because there are only 2 layers in the neural network, the input layer & the output layer, only one weight matrix is needed to connect them.

- The weight matrix is 3 X 1 because the input layer (as discussed in step 3) has 3 nodes and the output layer (as discussed in step 4) has 1 node.

- Weight initialization is a theory-intensive topic, but for this simple example we initialize the weights randomly with a mean of zero.
    ##### Here "zero mean" means the weights are drawn from a distribution whose expected value is 0: `2*np.random.random((3,1)) - 1` draws uniformly from [-1, 1). The average of just 3 actual entries will generally not be exactly 0.

```python
# Initializing the weight matrix randomly, uniformly in [-1, 1) (mean 0).
weightMatrix = 2*np.random.random((3,1)) - 1
# print(weightMatrix)
```
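The expression `2*np.random.random(shape) - 1` rescales draws from [0, 1) to [-1, 1), a range centered on 0. The sketch below checks this on a large sample and also shows that, with only 3 weights (drawn with the same seed as above), the actual average is noticeably different from 0.

```python
import numpy as np

np.random.seed(1)
w = 2 * np.random.random((3, 1)) - 1  # same recipe as the notebook
print(w.shape)                   # (3, 1)
print(round(float(w.mean()), 2))  # -0.24: 3 entries rarely average to exactly 0

# Large sample: every value lies in [-1, 1) and the average is close to 0
sample = 2 * np.random.random(100_000) - 1
print(sample.min() >= -1, sample.max() < 1)  # True True
print(abs(sample.mean()) < 0.01)             # True
```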

## Step 7 - Training the Neural Network :

- We create a "for" loop that iterates many times (10,000 here) over the training data defined in steps 3 & 4, gradually adjusting the weights to fit that dataset.

### Forward Propagation :

- We use a technique known as full-batch training, which processes all of the input data (defined as X in step 3) at once.

- The network then predicts the output from the input. (I use the term "predict" loosely, because each prediction is just one step in a series of gradually improving predictions.)

- The line of code below can be read in 2 steps. Firstly, our inputLayer matrix, which is X (defined in step 3) of size 4 X 3, is multiplied (dot product, i.e. matrix multiplication) by our initialized weightMatrix (from step 6) of size 3 X 1. Multiplying a 4 X 3 matrix by a 3 X 1 matrix gives a 4 X 1 matrix: one value per training example.
```python 
outputLayer = sigmoidFn(np.dot(inputLayer,weightMatrix))
```

- Secondly, the result of the dot product is passed through sigmoidFn (defined back in step 2), which squashes each of the 4 values into the range (0, 1).
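A sketch of a single forward pass with the shapes spelled out. The weights are the seeded random values from step 6, and `sigmoid` is a local stand-in for `sigmoidFn` so the snippet runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])

np.random.seed(1)
weightMatrix = 2 * np.random.random((3, 1)) - 1

z = np.dot(X, weightMatrix)   # (4 X 3) . (3 X 1) -> 4 X 1, one score per example
outputLayer = sigmoid(z)      # squashed element-wise into (0, 1)

print(z.shape, outputLayer.shape)  # (4, 1) (4, 1)
print(np.round(outputLayer.ravel(), 3))
```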

```python
for iter in range(10000):
    # Forward propagation: the whole batch (all 4 examples) at once
    inputLayer = X
    outputLayer = sigmoidFn(np.dot(inputLayer, weightMatrix))
    # Print out the following at every 500th iteration.
    # if iter % 500 == 0:
    #     print("inputLayer:", inputLayer)
    #     print("Weight-Matrix:", weightMatrix)
    #     print("DotProduct:", np.dot(inputLayer, weightMatrix))
    #     print("outputLayer", outputLayer)

                #################
                #               #
                # Read STEP : 8 #
                #     Below     #
                #               #
                #################
    outputLayerError = Y - outputLayer
    # Print out the outputLayerError at every 500th iteration.
    # if iter % 500 == 0:
    #     print("OutputLayerError: ", outputLayerError)

                #################
                #               #
                # Read STEP : 9 #
                #     Below     #
                #               #
                #################
    errorBasedChange = outputLayerError * sigmoidFn(outputLayer, True)
    # Print out the outputLayer's derivative at every 500th iteration.
    # if iter % 500 == 0:
    #     print("Derivative of the outputLayer:", sigmoidFn(outputLayer, True))
    #     print("Change based on outputLayerError:", errorBasedChange)

                #################
                #               #
                # Read STEP : 10#
                #     Below     #
                #               #
                #################
    weightMatrix += np.dot(inputLayer.T, errorBasedChange)
    # if iter % 500 == 0:
    #     print("T", inputLayer.T)
    #     print("E", errorBasedChange)
    #     print("W", weightMatrix)
    #     print("Dot Product:", np.dot(inputLayer.T, errorBasedChange))
    #     print("Updated Weight Matrix", weightMatrix)

    # Checkpoint prints. Note that iter is 0-based, so iter == 1 is
    # actually the 2nd pass through the loop (labels kept as in the original run).
    if iter == 1:
        print("-----------------------")
        print("1st Iteration Output :")
        print("-----------------------")
        print(outputLayer)

        print("-----------------------")
        print("1st iteration Output Error :")
        print("-----------------------")
        print(outputLayerError)

    if iter == 1000:
        print("-----------------------")
        print("1000th Iteration Output :")
        print("-----------------------")
        print(outputLayer)

        print("-----------------------")
        print("1000th iteration Output Error :")
        print("-----------------------")
        print(outputLayerError)

    if iter == 5000:
        print("-----------------------")
        print("5000th Iteration Output :")
        print("-----------------------")
        print(outputLayer)

        print("-----------------------")
        print("5000th iteration Output Error :")
        print("-----------------------")
        print(outputLayerError)

print("-----------------------")
print("Final Trained Output :")
print("-----------------------")
print(outputLayer)

print("-----------------------")
print("Final Training Error :")
print("-----------------------")
print(outputLayerError)
```


Output:

```
-----------------------
1st Iteration Output :
-----------------------
[[ 0.00287634]
 [ 0.00234601]
 [ 0.99808547]
 [ 0.99765245]]
-----------------------
1st iteration Output Error :
-----------------------
[[-0.00287634]
 [-0.00234601]
 [ 0.00191453]
 [ 0.00234755]]
-----------------------
1000th Iteration Output :
-----------------------
[[ 0.0028633 ]
 [ 0.00233538]
 [ 0.99809414]
 [ 0.9976631 ]]
-----------------------
1000th iteration Output Error :
-----------------------
[[-0.0028633 ]
 [-0.00233538]
 [ 0.00190586]
 [ 0.0023369 ]]
-----------------------
5000th Iteration Output :
-----------------------
[[ 0.00281278]
 [ 0.00229422]
 [ 0.99812771]
 [ 0.99770431]]
-----------------------
5000th iteration Output Error :
-----------------------
[[-0.00281278]
 [-0.00229422]
 [ 0.00187229]
 [ 0.00229569]]
-----------------------
Final Trained Output :
-----------------------
[[ 0.00275326]
 [ 0.00224572]
 [ 0.99816728]
 [ 0.99775287]]
-----------------------
Final Training Error :
```

## Step 8 - Evaluating calculation error :


- The code under step 8 calculates the error of our predicted values (outputLayer) relative to the actual, predefined output values (Y).
- Remember that Y is a 4 X 1 matrix (as defined in step 4). The outputLayer is also a 4 X 1 matrix, so the error is simply the element-wise difference of the two matrices, Y - outputLayer. A positive entry means the prediction was too low; a negative entry means it was too high.

#### Uncomment parts of the code in step 8 to check out the various values of the output layer error and make sure the calculations make sense to you.
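A tiny worked example of the step 8 calculation, using made-up predictions (hypothetical numbers, not taken from the training run) to show how to read the sign of each error. The mean absolute error at the end is the same summary measure used in Part 2 below.

```python
import numpy as np

Y = np.array([[0], [0], [1], [1]])
outputLayer = np.array([[0.2], [0.1], [0.9], [0.6]])  # hypothetical predictions

outputLayerError = Y - outputLayer  # element-wise, still 4 X 1
print(outputLayerError.ravel())
# approximately [-0.2 -0.1  0.1  0.4]: negative = too high, positive = too low
print(np.mean(np.abs(outputLayerError)))  # mean absolute error, about 0.2
```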

## Step 9 - Calculating the "Error Based Change" :

- To understand this step fully, we can take it step-by-step.
    - Firstly, the derivative sigmoidFn(outputLayer, True) is computed. Since every output lies between 0 & 1, each derivative value x * (1 - x) lies between 0 and 0.25, peaking at 0.25 when the output is 0.5.
    - Secondly, we multiply the resulting 4 X 1 derivative matrix element-wise by the outputLayerError matrix, which is also 4 X 1.

###### The element-wise multiplication scales each error by the slope of the sigmoid at that prediction. Predictions that carry high confidence (outputs near 0 or 1) have a slope near 0, so their errors cause only small weight changes, while uncertain predictions (outputs near 0.5) get the largest updates.
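To see the scaling effect, the sketch below gives three predictions the same error of 0.5 but different confidence levels. The numbers are illustrative only, and `sigmoid_slope` is a hypothetical helper equivalent to `sigmoidFn(s, True)`.

```python
import numpy as np

def sigmoid_slope(s):
    # Derivative of the sigmoid expressed via its output s (same as sigmoidFn(s, True))
    return s * (1 - s)

outputLayer = np.array([[0.01], [0.5], [0.99]])     # confident, unsure, confident
outputLayerError = np.array([[0.5], [0.5], [0.5]])  # identical errors, for comparison

errorBasedChange = outputLayerError * sigmoid_slope(outputLayer)
print(errorBasedChange.ravel())
# roughly [0.00495 0.125 0.00495]: the unsure prediction gets about 25x the change
```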

## Step 10 - Updating the weight matrix based on previously achieved results :

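- This update step depends heavily on how much of a difference there is between the pre-defined output (Y) and the estimated output (outputLayer). Concretely, inputLayer.T (3 X 4) is multiplied by errorBasedChange (4 X 1), giving a 3 X 1 update that is added to weightMatrix.
- Intuitively, if a weight is already predicting the correct output value accurately, it does not need to be changed much.
- In the printed output, the final trained output is the closest to Y of all the checkpoints (1st, 1000th and 5000th iterations). The improvement is small because the printed values suggest the training cell was re-run without re-initializing weightMatrix: already at the first checkpoint every output is within 0.003 of Y, which a fresh random initialization would not produce.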
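A compact sketch of steps 7 to 10 run from a fresh seeded initialization. It checks that the update `X.T.dot(errorBasedChange)` has the same 3 X 1 shape as `weightMatrix` and records the mean absolute error each iteration so the drop from a fresh start is visible. The 1000-iteration count is an arbitrary choice for the demo.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
Y = np.array([[0, 0, 1, 1]]).T

np.random.seed(1)
weightMatrix = 2 * np.random.random((3, 1)) - 1

errors = []
for i in range(1000):
    outputLayer = sigmoid(X.dot(weightMatrix))          # forward pass, 4 X 1
    outputLayerError = Y - outputLayer                  # step 8
    errorBasedChange = outputLayerError * outputLayer * (1 - outputLayer)  # step 9
    update = X.T.dot(errorBasedChange)                  # (3 X 4) . (4 X 1) -> 3 X 1
    weightMatrix += update                              # step 10
    errors.append(np.mean(np.abs(outputLayerError)))

print(update.shape)  # (3, 1), same as weightMatrix
print(round(errors[0], 3), round(errors[-1], 3))  # error at the start vs. after training
```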

# Part 2 - Adding a 3rd Layer

```python
import numpy as np

def nonlin(x, deriv=False):
    # With deriv=True, x must already be a sigmoid output
    if deriv:
        return x*(1-x)

    return 1/(1+np.exp(-x))

X = np.array([[0,0,1],
              [0,1,1],
              [1,0,1],
              [1,1,1]])

y = np.array([[0],
              [1],
              [1],
              [0]])

np.random.seed(1)

# randomly initialize our weights with mean 0
syn0 = 2*np.random.random((3,4)) - 1   # input layer (3) -> hidden layer (4)
print("Initial Random Weight Matrix  Syn0")
print(syn0)
syn1 = 2*np.random.random((4,1)) - 1   # hidden layer (4) -> output layer (1)
# print("Initial Random Weight Matrix Syn1")
# print(syn1)

for j in range(60000):

    # Feed forward through layers 0, 1, and 2
    l0 = X
    l1 = nonlin(np.dot(l0,syn0))
    l2 = nonlin(np.dot(l1,syn1))

    # how much did we miss the target value?
    l2_error = y - l2

    if j in (1, 2, 200, 2000, 20000):
        # print("Weight Matrix for Syn0:", syn0)
        # print("Weight Matrix for Syn1:", syn1)
        # print("l2_error:", l2_error)
        print(f"Output at iteration {j}:")
        print(l2)
        print("Error:")
        print(str(np.mean(np.abs(l2_error))))

    # in what direction is the target value?
    # were we really sure? if so, don't change too much.
    l2_delta = l2_error*nonlin(l2,deriv=True)

    # how much did each l1 value contribute to the l2 error (according to the weights)?
    l1_error = l2_delta.dot(syn1.T)

    # in what direction is the target l1?
    # were we really sure? if so, don't change too much.
    l1_delta = l1_error * nonlin(l1,deriv=True)

    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)

print("Final Output at the 60,000th iteration")
print(l2)
print("Final Error")
print(str(np.mean(np.abs(l2_error))))
```

Output:

```
Initial Random Weight Matrix  Syn0
[[-0.16595599  0.44064899 -0.99977125 -0.39533485]
 [-0.70648822 -0.81532281 -0.62747958 -0.30887855]
 [-0.20646505  0.07763347 -0.16161097  0.370439  ]]
Output at iteration 1:
[[ 0.47070349]
 [ 0.48716227]
 [ 0.54111272]
 [ 0.54275466]]
Error:
0.496295788944
Output at iteration 2:
[[ 0.46828579]
 [ 0.48584984]
 [ 0.53898089]
 [ 0.54126706]]
Error:
0.496180526782
Output at iteration 200:
[[ 0.20672607]
 [ 0.63950689]
 [ 0.71254865]
 [ 0.58436571]]
Error:
0.359759062286
Output at iteration 2000:
[[ 0.01980049]
 [ 0.97453961]
 [ 0.97847519]
 [ 0.0297021 ]]
Error:
0.0241219498915
Output at iteration 20000:
[[ 0.00473102]
 [ 0.99397271]
 [ 0.9946546 ]
 [ 0.00705414]]
Error:
0.00578945986251
Final Output at the 60,000th iteration
[[ 0.00260572]
 [ 0.99672209]
 [ 0.99701711]
 [ 0.00386759]]
Final Error
0.00318353073559
```
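Why add a hidden layer at all? In Part 2 the target `y` is 1 exactly when the first two inputs differ (an XOR pattern), which no single weight matrix followed by a sigmoid can reproduce. The sketch below, an illustration that is not part of the original notebook, trains the Part 1 single-layer update rule on this `y`. Its mean absolute error can never drop below 0.25 for any choice of weights, whereas the 3-layer network above reaches about 0.003.

```python
import numpy as np

def nonlin(x, deriv=False):
    # Same sigmoid helper as in Part 2
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])  # 1 when the first two inputs differ

np.random.seed(1)
w = 2 * np.random.random((3, 1)) - 1  # single weight matrix, no hidden layer

for j in range(10000):
    out = nonlin(X.dot(w))
    err = y - out
    w += X.T.dot(err * nonlin(out, deriv=True))

mae = np.mean(np.abs(err))
print(round(float(mae), 3))  # always >= 0.25, far from the 3-layer result
```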
