# Neural Network Demo

By Grant R. Vousden-Dishington

This notebook is heavily based on two blog series: **A Neural Network in 11 lines of Python** ([Part 1](http://iamtrask.github.io/2015/07/12/basic-python-network/) & [Part 2](http://iamtrask.github.io/2015/07/27/python-network-part2/)) and **Hinton's Dropout in 3 lines of Python**, both by [*@iamtrask*](iamtrask.github.io).

In [1]:
import numpy as np

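# Sigmoid function: "squashes" numbers into the range (0, 1), read here as probabilities
# With deriv=True, x must already be a sigmoid output; the slope is then x * (1 - x)
def nonlin(x, deriv=False):
    return x * (1 - x) if deriv else 1 / (1 + np.exp(-x))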

# Input: 4 training examples, 3-dimensional (4 x 3)
nnInput = np.array([
        [0, 0, 1],
        [0, 1, 1],
        [1, 0, 1],
        [1, 1, 1]
    ])
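
The `deriv=True` branch of `nonlin` expects a value that has already passed through the sigmoid, which is why the training loops below call it on `l1` and `l2` rather than on the raw weighted sums. As a minimal sketch of that convention, the following compares the `s * (1 - s)` shortcut against a finite-difference estimate of the slope. The helper names `sigmoid` and `sigmoid_slope_from_output` are illustrative and do not appear in the notebook.

```python
import numpy as np

def sigmoid(x):
    # Plain logistic function, same formula as nonlin(x) with deriv=False
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_slope_from_output(s):
    # Same formula as nonlin(s, deriv=True): s is a sigmoid OUTPUT, not a raw input
    return s * (1.0 - s)

x = np.linspace(-4.0, 4.0, 9)
s = sigmoid(x)

# Central finite difference as an independent estimate of the slope at x
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid_slope_from_output(s)

max_gap = np.max(np.abs(numeric - analytic))
print("Largest gap between the two slope estimates:", max_gap)
```

Passing a raw input `x` to the shortcut instead of `sigmoid(x)` would give `x * (1 - x)`, which is not the sigmoid slope.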

## 2-layer feed-forward network

In [2]:
# Output: expected classifications (4 x 1)
nnOutput2 = np.array([[0, 0, 1, 1]]).T

# Initialize weights with mean of 0, seed random generator
np.random.seed(261015)
syn0 = 2 * np.random.random((3, 1)) - 1  # (3 x 1)

# Do full-batch training
for i in range(60000):
    
    # Forward propagation
    l0 = nnInput
    l1 = nonlin(l0 @ syn0)
    
    # Error: expected minus predicted (4 x 1)
    errl1 = nnOutput2 - l1
    
    # Calculate delta: the error times slope of the sigmoid at l1 values
    deltal1 = errl1 * nonlin(l1, deriv=True)
    
    # Update weights: matrix multiply the input times the delta values all at once
    syn0 += l0.T @ deltal1
    
print("Output after training")
print(l1)

Output after training
[[ 0.00390254]
 [ 0.00318154]
 [ 0.99740413]
 [ 0.99681544]]
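
The trained outputs sit close to 0 for the first two examples and close to 1 for the last two, matching the expected classifications. As a rough sketch, the same loop can be wrapped in a function and its outputs thresholded at 0.5 to get hard class labels. The function name `train_single_layer`, the threshold, the step count, and the seed are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_single_layer(inputs, targets, steps=10000, seed=0):
    # Hypothetical helper: same update rule as the cell above, with a local RNG
    rng = np.random.RandomState(seed)
    weights = 2 * rng.random_sample((inputs.shape[1], 1)) - 1
    for _ in range(steps):
        out = sigmoid(inputs @ weights)
        # Error times the sigmoid slope at the output values
        delta = (targets - out) * out * (1 - out)
        weights += inputs.T @ delta
    return weights

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

w = train_single_layer(X, y)
predicted = (sigmoid(X @ w) > 0.5).astype(int)
print(predicted.ravel())
```

The targets here equal the first input column, so a single layer of weights is enough to separate the two classes.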


## 3-layer back-propagating network

In [3]:
# Output: expected classifications (4 x 1)
nnOutput3 = np.array([[0, 1, 1, 0]]).T

# Initialize weights with mean of 0, seed random generator
np.random.seed(261015)
syn0 = 2 * np.random.random((3, 4)) - 1  # (3 x 4)
syn1 = 2 * np.random.random((4, 1)) - 1  # (4 x 1)

# Do full-batch learning
for i in range(60000):
    
    # Forward Propagation
    l0 = nnInput
    l1 = nonlin(l0 @ syn0)
    l2 = nonlin(l1 @ syn1)
    
    # Output error: expected minus predicted (4 x 1)
    errl2 = nnOutput3 - l2
    
    # Getting a running view of the error every 10000 steps
    if not i % 10000:
        print("Error: " + str(np.mean(np.abs(errl2))))
        
    # Calculate delta for output layer: the error times slope of the sigmoid at l2 values (4 x 1)
    deltal2 = errl2 * nonlin(l2, deriv=True)
    
    # Hidden error: how much hidden layer contributed to output error (4 x 4)
    # THIS IS THE BACK-PROPAGATION STEP
    errl1 = deltal2 @ syn1.T
    
    # Calculate delta for hidden layer: the hidden error times slope of the sigmoid at l1 values (4 x 4)
    deltal1 = errl1 * nonlin(l1, deriv=True)
    
    # Update weights
    syn0 += l0.T @ deltal1
    syn1 += l1.T @ deltal2

Error: 0.500646983197
Error: 0.0108313939135
Error: 0.00721546010071
Error: 0.00571646037978
Error: 0.00485302997199
Error: 0.00427728962702
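
One way to sanity-check the back-propagation step is to compare the weight updates against a numerical gradient. Assuming the quantity being reduced is the squared error `0.5 * sum((expected - predicted)**2)` (an assumption; the notebook never names a loss), the updates `l0.T @ deltal1` and `l1.T @ deltal2` should equal the negative gradient of that loss with respect to `syn0` and `syn1`. The sketch below tests this at a random starting point; the helpers `loss`, `backprop_updates`, and `numeric_grad` are illustrative names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(syn0, syn1, X, y):
    # Assumed objective: half the summed squared output error
    l2 = sigmoid(sigmoid(X @ syn0) @ syn1)
    return 0.5 * np.sum((y - l2) ** 2)

def backprop_updates(syn0, syn1, X, y):
    # Same steps as the training loop above, without applying the updates
    l1 = sigmoid(X @ syn0)
    l2 = sigmoid(l1 @ syn1)
    deltal2 = (y - l2) * l2 * (1 - l2)
    deltal1 = (deltal2 @ syn1.T) * l1 * (1 - l1)
    return X.T @ deltal1, l1.T @ deltal2

def numeric_grad(f, w, eps=1e-6):
    # Central differences; perturbs w in place and restores each entry
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        up = f()
        w[idx] = old - eps
        down = f()
        w[idx] = old
        grad[idx] = (up - down) / (2 * eps)
    return grad

rng = np.random.RandomState(0)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0, 1, 1, 0]], dtype=float).T
syn0 = 2 * rng.random_sample((3, 4)) - 1
syn1 = 2 * rng.random_sample((4, 1)) - 1

upd0, upd1 = backprop_updates(syn0, syn1, X, y)
g0 = numeric_grad(lambda: loss(syn0, syn1, X, y), syn0)
g1 = numeric_grad(lambda: loss(syn0, syn1, X, y), syn1)

# The updates should be the NEGATIVE gradient, so update + gradient is near zero
gap = max(np.max(np.abs(upd0 + g0)), np.max(np.abs(upd1 + g1)))
print("Largest mismatch:", gap)
```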


## 3-layer back-propagating network Mk. 2: Gradient Descent (alpha tuning)

In [14]:
# Output: expected classifications (4 x 1)
nnOutput3 = np.array([[0, 1, 1, 0]]).T

alphas = [.001, .01, .1, 1, 10, 100, 1000]

# Try training with each different alpha-value
for a in alphas:
    # Initialize weights with mean of 0, seed random generator
    np.random.seed(271015)
    syn0 = 2 * np.random.random((3, 4)) - 1  # (3 x 4)
    syn1 = 2 * np.random.random((4, 1)) - 1  # (4 x 1)

    # This time, we track the previous updates to syn0 and syn1, and count how often each update changes direction
    syn0prev = np.zeros_like(syn0)
    syn1prev = np.zeros_like(syn1)
    syn0dir = np.zeros_like(syn0)
    syn1dir = np.zeros_like(syn1)
    
    print("Training with \u03B1 " + str(a))  # 03B1 is unicode for lowercase alpha
    
    # Do full-batch learning
    for i in range(60000):

        # Forward Propagation
        l0 = nnInput
        l1 = nonlin(l0 @ syn0)
        l2 = nonlin(l1 @ syn1)

        # Output error: expected minus predicted (4 x 1)
        errl2 = nnOutput3 - l2

        # Getting a running view of the error every 10000 steps
        if not i % 10000:
            print("Error: " + str(np.mean(np.abs(errl2))))

        # Calculate delta for output layer: the error times slope of the sigmoid at l2 values (4 x 1)
        deltal2 = errl2 * nonlin(l2, deriv=True)

        # Hidden error: how much hidden layer contributed to output error (4 x 4)
        # THIS IS THE BACK-PROPAGATION STEP
        errl1 = deltal2 @ syn1.T

        # Calculate delta for hidden layer: the hidden error times slope of the sigmoid at l1 values (4 x 4)
        deltal1 = errl1 * nonlin(l1, deriv=True)

        # Calculate the updates to the weights
        syn0update = l0.T @ deltal1
        syn1update = l1.T @ deltal2
        
        # For tracking, we see if the direction of the update has changed since the last step 
        syn0dir += (syn0update * syn0prev) < 0
        syn1dir += (syn1update * syn1prev) < 0
        
        # Update weights: this is where the alpha values come in
        syn0 += a * syn0update
        syn1 += a * syn1update
        
        syn0prev = syn0update
        syn1prev = syn1update
        
    # Print out the results for this alpha
    print("Syn0")
    print(syn0)
    print("Syn0 direction changes")
    print(syn0dir)

    print("Syn1")
    print(syn1)
    print("Syn1 direction changes")
    print(syn1dir)

Training with α 0.001
Error: 0.496410031903
Error: 0.495164025493
Error: 0.493596043188
Error: 0.491606358559
Error: 0.489100166544
Error: 0.485977857846
Syn0
[[-0.28448441  0.32471214 -1.53496167 -0.47594822]
 [-0.7550616  -1.04593014 -1.45446052 -0.32606771]
 [-0.2594825  -0.13487028 -0.29722666  0.40028038]]
Syn0 direction changes
[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 1.  0.  1.  1.]]
Syn1
[[-0.61957526]
 [ 0.76414675]
 [-1.49797046]
 [ 0.40734574]]
Syn1 direction changes
[[ 1.]
 [ 1.]
 [ 0.]
 [ 1.]]
Training with α 0.01
Error: 0.496410031903
Error: 0.457431074442
Error: 0.359097202563
Error: 0.239358137159
Error: 0.143070659013
Error: 0.0985964298089
Syn0
[[ 2.39225985  2.56885428 -5.38289334 -3.29231397]
 [-0.35379718 -4.6509363  -5.67005693 -1.74287864]
 [-0.15431323 -1.17147894  1.97979367  3.44633281]]
Syn0 direction changes
[[ 1.  1.  0.  0.]
 [ 2.  0.  0.  2.]
 [ 4.  2.  1.  1.]]
Syn1
[[-3.70045078]
 [ 4.57578637]
 [-7.63362462]
 [ 4.73787613]]
Syn1 direction changes
[[ 2.
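
The direction-change counters above flag a weight whenever its update has the opposite sign to the previous one, which is a sign that the step overshot. As a toy illustration only, the sketch below applies the same bookkeeping to a single weight minimizing `0.5 * w**2`. The function name `count_direction_changes` and the chosen alphas are assumptions, and the sigmoid network in this notebook will not behave exactly like this quadratic.

```python
def count_direction_changes(alpha, steps=20, w=1.0):
    # Hypothetical one-weight analogue of the syn0dir / syn1dir counters
    prev_update = 0.0
    flips = 0
    for _ in range(steps):
        update = -w  # negative gradient of 0.5 * w**2
        flips += int(update * prev_update < 0)
        w += alpha * update
        prev_update = update
    return flips, w

for alpha in (0.1, 1.5, 2.5):
    flips, final_w = count_direction_changes(alpha)
    print("alpha", alpha, "direction changes", flips, "final w", final_w)
```

In this toy setting a small alpha never changes direction but converges slowly, a moderately large alpha overshoots on every step yet still shrinks `w`, and a very large alpha overshoots by more each time and diverges.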

## 3-layer back-propagating network Mk. 3: Parameterizing the hidden layer size

For this dataset, the previous exercise demonstrated that an alpha of 10 worked best, with the hidden layer fixed at 4 nodes. In the original iamtrask post, this step only tested the performance of a network with 32 hidden nodes. Instead of testing just one size, we'll try several here and keep the alpha constant.

In [16]:
# Output: expected classifications (4 x 1)
nnOutput3 = np.array([[0, 1, 1, 0]]).T

# Set alpha and the hidden layer sizes we want to test
a = 10
l1sizes = [2, 4, 8, 16, 32, 64]

# Try training with each hidden layer size
for s in l1sizes:
    # Initialize weights with mean of 0, seed random generator
    # This is where the different hidden layer sizes come in
    np.random.seed(271015)
    syn0 = 2 * np.random.random((3, s)) - 1  # (3 x s)
    syn1 = 2 * np.random.random((s, 1)) - 1  # (s x 1)
    
    print("Training with {} hidden layer nodes ".format(s))
    
    # Do full-batch learning
    for i in range(60000):

        # Forward Propagation
        l0 = nnInput
        l1 = nonlin(l0 @ syn0)
        l2 = nonlin(l1 @ syn1)

        # Output error: expected minus predicted (4 x 1)
        errl2 = nnOutput3 - l2

        # Getting a running view of the error every 10000 steps
        if not i % 10000:
            print("Error: " + str(np.mean(np.abs(errl2))))

        # Calculate delta for output layer: the error times slope of the sigmoid at l2 values (4 x 1)
        deltal2 = errl2 * nonlin(l2, deriv=True)

        # Hidden error: how much hidden layer contributed to output error (4 x s)
        # THIS IS THE BACK-PROPAGATION STEP
        errl1 = deltal2 @ syn1.T

        # Calculate delta for hidden layer: the hidden error times slope of the sigmoid at l1 values (4 x s)
        deltal1 = errl1 * nonlin(l1, deriv=True)

        # Update weights, scaling each update by alpha
        syn0 += a * (l0.T @ deltal1)
        syn1 += a * (l1.T @ deltal2)

Training with 2 hidden layer nodes 
Error: 0.500542514772
Error: 0.375678086516
Error: 0.375611385993
Error: 0.375562977412
Error: 0.375527817871
Error: 0.375502700424
Training with 4 hidden layer nodes 
Error: 0.498898932827
Error: 0.00277424125556
Error: 0.00194289672755
Error: 0.00157880649911
Error: 0.00136305538713
Error: 0.00121639933639
Training with 8 hidden layer nodes 
Error: 0.498573920102
Error: 0.00247201806838
Error: 0.00170567040317
Error: 0.00137593796584
Error: 0.00118242179508
Error: 0.00105172558577
Training with 16 hidden layer nodes 
Error: 0.496319067934
Error: 0.00230574062297
Error: 0.00159283504815
Error: 0.00128457712047
Error: 0.0011032857294
Error: 0.000980712209009
Training with 32 hidden layer nodes 
Error: 0.493817543054
Error: 0.00332116672913
Error: 0.0018904703995
Error: 0.00144658328493
Error: 0.00121176944923
Error: 0.00106133852859
Training with 64 hidden layer nodes 
Error: 0.495697784403
Error: 0.499999999843
Error: 0.499999999842
Error: 0.4999999