<h1>Phoenix AI & Machine Learning Group</h1>
<h2>Let's get Neural: Solving the MNIST Dataset (Python) Part 1/3</h2>

<h4>Community Support</h4>

As always, a generous shoutout to the community. Without these contributions, my understanding of neural networks would have taken much longer to develop! <br>

Most of the code is copied directly from @Trask: http://iamtrask.github.io/2015/07/12/basic-python-network/

<i>Other Works Cited: </i>
- http://cs231n.github.io/neural-networks-1/
- http://www.cse.unsw.edu.au/~billw/mldict.html
- https://en.wikipedia.org/wiki/Backpropagation
- https://www.quora.com/What-is-the-role-of-the-activation-function-in-a-neural-network
- http://www.wildml.com/2015/09/implementing-a-neural-network-from-scratch/
- http://squall0032.tumblr.com/post/77300791096/plotting-a-sigmoid-function-using
- http://stevenmiller888.github.io/mind-how-to-build-a-neural-network/
- http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html

<h2>3 Layer - XOR Neural Network</h2>

|Inputs|Output|
|------|------|
|0 0 1|1|
|0 1 1|0|
|1 0 1|0|
|1 1 1|1|

In [11]:
import numpy as np

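# sigmoid function
# with deriv=True, x is expected to already be a sigmoid output,
# so x*(1-x) is the slope of the sigmoid at that point
def nonlin(x,deriv=False):
    if deriv:
        return x*(1-x)
    return 1/(1+np.exp(-x))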

# input dataset
X = np.array([  [0,0,1],
                [0,1,1],
                [1,0,1],
                [1,1,1] ])

# output dataset
y = np.array([[1,
               0,
               0,
               1]]).T

# seed random numbers to make calculation
# deterministic (just a good practice)
np.random.seed(1)

# initialize weights randomly with mean 0
syn0 = 2*np.random.random((3,1)) - 1

for i in range(10000):

    # forward propagation
    l0 = X
    l1 = nonlin(np.dot(l0,syn0))

    # how much did we miss?
    l1_error = y - l1

    # multiply how much we missed by the
    # slope of the sigmoid at the values in l1
    l1_delta = l1_error * nonlin(l1,True)

    # update weights
    syn0 += np.dot(l0.T,l1_delta)

print("Input Layers: ")
print(X)
print("")
print("Expected Output: ")
print(y)
print("")
print("Output After Training:")
print(l1)

Input Layers: 
[[0 0 1]
 [0 1 1]
 [1 0 1]
 [1 1 1]]

Expected Output: 
[[1]
 [0]
 [0]
 [1]]

Output After Training:
[[ 0.5]
 [ 0.5]
 [ 0.5]
 [ 0.5]]


In [12]:
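# So what happened?
# Our neural network is simply unable to predict what the values should be!
# No single input column correlates with the output, so a network with
# only one layer of weights has nothing to latch onto.
# Remember our "error weighted derivative"? On its own, it isn't enough.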
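To make the failure concrete, here is a minimal sketch that checks each input column against the targets. The helper `mean_target_when_on` is a hypothetical name introduced only for this illustration; it is not part of the original notebook. It reports the average target value over the rows where a given column is 1.

```python
import numpy as np

# Same inputs and targets as the training cell above.
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([1, 0, 0, 1])

# Hypothetical helper (not in the original notebook): average target
# over the rows where the chosen input column equals 1.
def mean_target_when_on(column):
    return y[X[:, column] == 1].mean()

for column in range(X.shape[1]):
    print("column", column, "->", mean_target_when_on(column))
```

Every column averages to 0.5, so knowing any single input on its own says nothing about the output. That matches the 0.5 predictions the single weight layer settled on.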

<div id="pic1" style="float:left;width:50%"><img class="img-responsive" width="100%" src="https://iamtrask.github.io/img/rcnn.png" alt=""><br></div><div id="pic2" style="float:right;width:50%;"><img class="img-responsive" width="100%" src="https://iamtrask.github.io/img/margritti-this-is-not-a-pipe.jpg" alt=""><br></div>

In [13]:
# Believe it or not, image recognition is a similar problem. 
# If one had 100 identically sized images of pipes and bicycles,
# no individual pixel position would directly correlate with the presence of a bicycle or pipe. 
# The pixels might as well be random from a purely statistical point of view. 
# However, certain combinations of pixels are not random!
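The same point can be illustrated on the toy dataset from earlier. The sketch below builds one hand-made combination of inputs, "are the first two columns equal?", and compares it with the targets. The feature name `same` is an assumption made for this example; a hidden layer has to discover combinations like this by itself during training.

```python
import numpy as np

# Same inputs and targets as the training cells.
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([1, 0, 0, 1])

# Hypothetical hand-built feature: 1 when the first two inputs agree.
same = (X[:, 0] == X[:, 1]).astype(int)

print(same)                     # combined feature, one value per row
print(np.array_equal(same, y))  # does it match the targets exactly?
```

No single column predicts the output, but this one combination of two columns matches it on every row.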

In [14]:
import numpy as np

def nonlin(x,deriv=False):
    # with deriv=True, x must already be a sigmoid output
    if deriv:
        return x*(1-x)

    return 1/(1+np.exp(-x))
    
X = np.array([[0,0,1],
             [0,1,1],
             [1,0,1],
             [1,1,1]])
                
y = np.array([[1],
              [0],
              [0],
              [1]])

np.random.seed(1)

# randomly initialize our weights with mean 0
syn0 = 2*np.random.random((3,4)) - 1
syn1 = 2*np.random.random((4,1)) - 1

for j in range(60000):

    # Feed forward through layers 0, 1, and 2
    l0 = X
    l1 = nonlin(np.dot(l0,syn0))
    l2 = nonlin(np.dot(l1,syn1))

    # how much did we miss the target value?
    l2_error = y - l2
    
    if (j % 10000) == 0:
        print("Error:" + str(np.mean(np.abs(l2_error))))
        
    # in what direction is the target value?
    # were we really sure? if so, don't change too much.
    l2_delta = l2_error*nonlin(l2,deriv=True)

    # how much did each l1 value contribute to the l2 error (according to the weights)?
    l1_error = l2_delta.dot(syn1.T)
    
    # in what direction is the target l1?
    # were we really sure? if so, don't change too much.
    l1_delta = l1_error * nonlin(l1,deriv=True)

    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)

Error:0.503589968097
Error:0.010347374888
Error:0.00680775185908
Error:0.00535111040195
Error:0.00452404113138
Error:0.00397933174579


In [15]:
print(l2)

[[ 0.99733349]
 [ 0.00409725]
 [ 0.00390993]
 [ 0.99632136]]
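As a quick sanity check, the raw outputs can be turned into hard 0/1 predictions and compared with the expected values. This is only a sketch: the numbers are copied from the printout above, and the 0.5 cutoff is an assumption chosen for illustration rather than something the original code uses.

```python
import numpy as np

# Values copied from the printed l2 output above.
l2 = np.array([[0.99733349],
               [0.00409725],
               [0.00390993],
               [0.99632136]])

# Expected outputs from the training data.
y = np.array([[1], [0], [0], [1]])

# Assumption: treat anything at or above 0.5 as a 1.
predictions = (l2 >= 0.5).astype(int)

print(predictions.ravel())
print(np.array_equal(predictions, y))
```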


In [None]:
# Success! Simply by adding a hidden layer to our network,
# we were able to train it successfully!