# Chapter 4: Your First Artificial Neural Networks

In Chapter 4, we focus on how to construct basic artificial intelligence applications, starting with a basic feedforward network in TensorFlow. Artificial neural networks let us model complex non-linear problems, and as we dive into the mechanics of deep learning, you'll begin to see how powerful AI applications can be with deep learning at their core.

## Feed Forward Network for MNIST

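Start with our imports. MNIST is readily available for use within TensorFlow; it's like the "Hello World" of deep learning. The code in this chapter uses the TensorFlow 1.x graph-and-session API (`tf.placeholder`, `tf.Session`).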

In [3]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
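
The `one_hot=True` flag means each label arrives as a 10-element vector rather than as a single digit. As a rough illustration of that encoding, here is a small NumPy sketch. It does not use the TensorFlow loader, and the helper name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(labels, n_classes=10):
    """Hypothetical helper: turn integer class labels into one-hot rows."""
    labels = np.asarray(labels)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(n_classes, dtype=np.float32)[labels]

encoded = one_hot([3, 0, 9])
print(encoded.shape)  # (3, 10)
print(encoded[0])     # a single 1.0 at index 3, zeros elsewhere
```

This is why the label placeholder later in the notebook has `n_classes` columns instead of one.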


Next, we can set up our parameters.

In [4]:
# Layer Sizes, Input Size, and the Size of the total number of classes
n_hidden_1 = 256 # 1st layer number of features
n_hidden_2 = 256 # 2nd layer number of features
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)


# Network Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1

Next, create the placeholder variables for our network.

In [5]:
# Create the Placeholder Variables
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

Initialize our weight and bias variables in the TensorFlow graph.

In [6]:
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),     # 784x256
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),  # 256x256
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))   # 256x10
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),  # 256 values
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),  # 256 values
    'out': tf.Variable(tf.random_normal([n_classes]))   # 10 values
}
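
The shapes in the comments above determine how many values the optimizer has to learn. A quick arithmetic sketch in plain Python, with the layer sizes copied from the parameter cell:

```python
# Layer sizes copied from the parameter cell above.
n_input, n_hidden_1, n_hidden_2, n_classes = 784, 256, 256, 10

layer_shapes = [
    (n_input, n_hidden_1),     # 'h1' weights, plus n_hidden_1 biases
    (n_hidden_1, n_hidden_2),  # 'h2' weights, plus n_hidden_2 biases
    (n_hidden_2, n_classes),   # 'out' weights, plus n_classes biases
]

total = 0
for fan_in, fan_out in layer_shapes:
    n_params = fan_in * fan_out + fan_out  # weight matrix + bias vector
    total += n_params
    print(f"{fan_in:>4} x {fan_out:<4} -> {n_params:,} parameters")

print(f"total trainable parameters: {total:,}")  # 269,322
```

Most of the parameters sit in the first layer, because every one of the 784 input pixels connects to every one of the 256 hidden units.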


Next, let's create our basic feedforward network.

In [7]:
def feedforward_network(x, weights, biases):
    # First layer; a hidden layer with ReLU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)

    # Second layer; a hidden layer with ReLU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)

    # Output layer; linear activation, so it returns raw logits (no softmax here)
    outputLayer = tf.matmul(layer_2, weights['out']) + biases['out']

    # Return the last layer
    return outputLayer
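
To make the shapes flowing through `feedforward_network` concrete, the same computation can be sketched in NumPy. This is only an illustration under stated assumptions: the random arrays stand in for the TensorFlow variables, and the names `relu`, `forward`, `W1`..`W3` and `b1`..`b3` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_input, n_hidden_1, n_hidden_2, n_classes = 784, 256, 256, 10

# Random stand-ins for the TensorFlow variables, with the same shapes.
W1, b1 = rng.standard_normal((n_input, n_hidden_1)), rng.standard_normal(n_hidden_1)
W2, b2 = rng.standard_normal((n_hidden_1, n_hidden_2)), rng.standard_normal(n_hidden_2)
W3, b3 = rng.standard_normal((n_hidden_2, n_classes)), rng.standard_normal(n_classes)

def relu(z):
    return np.maximum(z, 0.0)

def forward(x):
    h1 = relu(x @ W1 + b1)   # (batch, 256)
    h2 = relu(h1 @ W2 + b2)  # (batch, 256)
    return h2 @ W3 + b3      # (batch, 10) raw logits, no softmax

batch = rng.random((5, n_input))  # five fake "images" with pixels in [0, 1)
logits = forward(batch)
print(logits.shape)  # (5, 10)
```

Each row of the result holds ten unnormalized scores, one per digit class.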

Lastly, before training, we'll define our loss and optimizer and initialize all of the variables.

In [None]:
# Construct model
pred = feedforward_network(x, weights, biases)

# Define the optimizer and the loss function for the network 
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

# Initialize the TensorFlow variables
init = tf.global_variables_initializer()
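
The loss above combines a softmax with a cross-entropy in one call. As a rough NumPy sketch of what that combination computes (an illustration, not TensorFlow's actual implementation; the function name `softmax_cross_entropy` is hypothetical):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between one-hot labels and softmax(logits)."""
    # Subtracting the row maximum leaves softmax unchanged but avoids overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_example = -(labels * log_probs).sum(axis=1)
    return per_example.mean()

labels = np.eye(10)[[2, 7]]  # two one-hot labels
uniform = np.zeros((2, 10))  # logits that favour no class
loss_uniform = softmax_cross_entropy(uniform, labels)
print(loss_uniform)          # log(10), about 2.3026

confident = 20.0 * labels    # large logit on the correct class
loss_confident = softmax_cross_entropy(confident, labels)
print(loss_confident)        # close to 0
```

Because the softmax is already part of the loss, the network itself returns raw logits and should not apply a softmax before this step.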

Launch the training process inside a TensorFlow session.

In [15]:
## Run the Training Process Using a TensorFlow Session
with tf.Session() as sess:
    sess.run(init)

    # We'll run the training cycle for the amount of epochs that we defined above
    for epoch in range(training_epochs):
        avg_loss = 0.  # Initialize the loss at zero
        total_batch = int(mnist.train.num_examples/batch_size)
        
        # Now, loop over all of the batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            
            # Here, we'll run the session by feeding in the optimizer, loss operation, and the batches of data
            _, loss = sess.run([train_op, loss_op], feed_dict={x: batch_x, y: batch_y})
            
            # Compute average loss
            avg_loss += loss / total_batch
            
        # Print the average loss (cost) for this epoch
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost={:.9f}".format(avg_loss))
            
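        # Test the model's accuracy. argmax over the raw logits picks the same class
        # as argmax over softmax(logits), so no softmax is needed here. Reassigning
        # pred = tf.nn.softmax(pred) inside this loop would stack one softmax per epoch.
        correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
        
        # Calculate the accuracy
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))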

Epoch: 0001 cost=195.176404755
Accuracy: 0.8573
Epoch: 0002 cost=41.329837662
Accuracy: 0.8954
Epoch: 0003 cost=26.825957718
Accuracy: 0.909
Epoch: 0004 cost=18.745037566
Accuracy: 0.9193
Epoch: 0005 cost=13.856974419
Accuracy: 0.9247
Epoch: 0006 cost=10.398377342
Accuracy: 0.9292
Epoch: 0007 cost=7.796720581
Accuracy: 0.9309
Epoch: 0008 cost=5.936711622
Accuracy: 0.934
Epoch: 0009 cost=4.320155211
Accuracy: 0.9352
Epoch: 0010 cost=3.367201917
Accuracy: 0.098
Epoch: 0011 cost=2.460348041
Accuracy: 0.098
Epoch: 0012 cost=1.858598225
Accuracy: 0.098
Epoch: 0013 cost=1.349503601
Accuracy: 0.098
Epoch: 0014 cost=1.084136149
Accuracy: 0.098
Epoch: 0015 cost=0.898160186
Accuracy: 0.098
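
In the recorded output above, the cost keeps falling while the accuracy drops to 0.098 from epoch 10 onward and stays there. One likely cause is a softmax being applied to the result of the previous epoch's softmax, which is what a line such as `pred = tf.nn.softmax(pred)` does when it runs once per epoch. Each extra pass flattens the distribution until, in float32, the class scores are indistinguishable. If `argmax` then resolves the tie to index 0, every test image is labelled "0", and 0.098 is consistent with the share of zeros in the 10,000-image MNIST test set (980). The NumPy sketch below illustrates the flattening with made-up logits; it is not a reproduction of the training run.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Made-up logits for a single example; class 4 is the clear winner.
logits = np.array([1.0, -2.0, 0.5, 3.0, 8.0, 0.0, -1.0, 2.0, 4.0, 1.5],
                  dtype=np.float32)

first_pass = softmax(logits)
print("argmax after one pass:", int(first_pass.argmax()))  # still class 4

out = logits
spreads = []
for n in range(1, 21):
    out = softmax(out)  # softmax stacked on top of the previous softmax
    spreads.append(float(out.max() - out.min()))
    if n in (1, 2, 5, 10, 20):
        print(n, "pass(es): max - min =", spreads[-1])
```

The gap between the largest and smallest score shrinks with every pass, so after enough passes there is effectively nothing left for `argmax` to distinguish.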


## Defining Loss Functions from Scratch

In [None]:
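## Cross Entropy Loss from Scratch
from math import log

def CrossEntropy(yHat, y):
    # yHat is the predicted probability of the positive class; y is the true label (0 or 1)
    if y == 1:
        return -log(yHat)
    else:
        return -log(1 - yHat)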
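
The scalar definition above scores one prediction at a time and fails if the predicted probability is exactly 0 or 1. A vectorized sketch in NumPy, assuming predictions are probabilities and labels are 0 or 1, might look like this. The name `binary_cross_entropy` and the clipping constant `eps` are illustrative choices, not part of the original notebook.

```python
import numpy as np

def binary_cross_entropy(y_hat, y, eps=1e-12):
    """Mean binary cross-entropy; y_hat are predicted probabilities, y are 0/1 labels."""
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1.0 - eps)  # keep log() finite
    y = np.asarray(y, dtype=float)
    return float(np.mean(-(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))))

good = binary_cross_entropy([0.9, 0.1], [1, 0])
bad = binary_cross_entropy([0.1, 0.9], [1, 0])
print(good)  # -log(0.9), about 0.105
print(bad)   # -log(0.1), about 2.303
```

Confident correct predictions give a loss near zero, while confident wrong ones are penalized heavily.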

## Defining Gradient Descent from Scratch

In [None]:
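## Vanilla Gradient Descent from Scratch
import numpy as np

def gradientDescent(X, y, theta, alpha, num_iters):
    m = y.size  # number of training examples
    for i in range(num_iters):
        y_hat = np.dot(X, theta)  # predictions of the linear model
        # Step against the gradient of the mean squared error
        theta = theta - alpha * (1.0/m) * np.dot(X.T, y_hat-y)
    return theta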
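
As a usage sketch, the update rule above can be run on a small synthetic regression problem. The data, learning rate, and iteration count are made up for illustration, and the function is repeated under the hypothetical name `gradient_descent` so the example runs on its own.

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    """Same update rule as gradientDescent above, repeated so the sketch runs alone."""
    m = y.size
    for _ in range(num_iters):
        y_hat = X @ theta
        theta = theta - alpha * (1.0 / m) * (X.T @ (y_hat - y))
    return theta

# Synthetic, noise-free data from y = 1 + 2x (made up for illustration).
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])  # first column is the intercept term
y = 1.0 + 2.0 * x

theta = gradient_descent(X, y, theta=np.zeros(2), alpha=0.5, num_iters=2000)
print(theta)  # approaches [1, 2]
```

Every iteration uses the full dataset, which is what distinguishes this "vanilla" (batch) variant from the stochastic one.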

In [None]:
## Stochastic Gradient Descent from Scratch
def SGD(f, theta0, alpha, num_iters):
    # f(theta) returns a (cost, gradient) pair, typically evaluated on a random sample
    theta = theta0
    for step in range(num_iters):  # range replaces the Python 2-only xrange
        _, grad = f(theta)
        theta = theta - (alpha * grad)
    return theta
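
The stochastic version leaves the sampling to the function `f`. One way to use it, sketched here under assumptions, is to let `f` draw a random mini-batch on every call. The objective `minibatch_loss`, the synthetic data, and the hyperparameters are all hypothetical, and the loop is repeated as `sgd` so the example runs on its own.

```python
import numpy as np

def sgd(f, theta0, alpha, num_iters):
    """Same loop as SGD above: f returns (cost, gradient) for a fresh sample each call."""
    theta = theta0
    for step in range(num_iters):
        _, grad = f(theta)
        theta = theta - alpha * grad
    return theta

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x  # noise-free targets, made up for illustration

def minibatch_loss(theta, batch_size=10):
    """Hypothetical objective: squared error and its gradient on a random mini-batch."""
    idx = rng.integers(0, len(y), size=batch_size)
    residual = X[idx] @ theta - y[idx]
    cost = 0.5 * np.mean(residual ** 2)
    grad = X[idx].T @ residual / batch_size
    return cost, grad

theta = sgd(minibatch_loss, theta0=np.zeros(2), alpha=0.1, num_iters=3000)
print(theta)  # close to [1, 2]
```

Each step only sees ten examples, so individual updates are noisy, but on this noise-free data they still pull the parameters toward the true values.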

## Defining Activation Functions from Scratch

In [None]:
## Parametric ReLU
def parametric_relu(_x):
    # One learnable slope per channel, initialised to 0 (plain ReLU at the start)
    alphas = tf.get_variable('alpha', _x.get_shape()[-1],
                             initializer=tf.constant_initializer(0.0),
                             dtype=tf.float32)
    pos = tf.nn.relu(_x)
    neg = alphas * (_x - tf.abs(_x)) * 0.5  # (x - |x|) / 2 equals min(x, 0)
    return pos + neg
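
The expression for the negative part can look opaque at first. The NumPy sketch below evaluates the same formula with a fixed slope in place of the learned variable; the function name `prelu` and the value `alpha=0.25` are arbitrary choices for illustration.

```python
import numpy as np

def prelu(x, alpha):
    """NumPy version of the expression used in parametric_relu."""
    pos = np.maximum(x, 0.0)             # ReLU part: keeps positive inputs
    neg = alpha * (x - np.abs(x)) * 0.5  # (x - |x|) / 2 equals min(x, 0)
    return pos + neg

x = np.array([-2.0, -0.5, 0.0, 3.0])
leaky = prelu(x, alpha=0.25)
plain = prelu(x, alpha=0.0)
print(leaky)  # values -0.5, -0.125, 0, 3
print(plain)  # alpha = 0 reduces to plain ReLU: 0, 0, 0, 3
```

Positive inputs pass through unchanged, while negative inputs are scaled by `alpha` instead of being zeroed out.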