# Overview

In the previous assignment you learned the mechanics of creating a basic, single-layer linear classifier. In this next notebook, we will walk through the steps needed to add additional layers.

Recall that in addition to simply adding more layers to the network, the most important new technique is the use of **non-linearities**. Without a non-linear function between layers, a series of matrix operations collapses (reduces) into a single matrix operation, leaving a single-step ("reflex") model that is no more expressive than the linear classifier.
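
To make the collapse concrete, here is a minimal NumPy sketch. The small matrices `x`, `w1` and `w2` are hand-picked toy values (assumptions for illustration, not part of the assignment data). It checks that two stacked linear layers equal one linear layer, and that placing a ReLU between them breaks the equivalence.

```python
import numpy as np

# Toy values chosen by hand for illustration only
x = np.array([[1.0, -2.0],
              [3.0, 0.5]])       # 2 inputs with 2 features each
w1 = np.array([[1.0, -1.0],
               [0.5, 2.0]])      # first layer weights
w2 = np.array([[2.0],
               [-1.0]])          # second layer weights

# Two stacked linear layers equal a single layer with weights w1 @ w2
two_linear = (x @ w1) @ w2
one_linear = x @ (w1 @ w2)
collapses = np.allclose(two_linear, one_linear)

# With a ReLU in between, the result is no longer a single matrix product
with_relu = np.maximum(0, x @ w1) @ w2
differs = not np.allclose(with_relu, one_linear)

print(collapses, differs)
```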

Additionally, as you begin to create and test more models, it will be important to create a system for organizing your code and tracking changes. Here we will re-apply some of the major concepts in object-oriented programming to help keep your experiments on track.

## Restarting your Virtual Machine

If at any point during this assignment you accidentally execute code or do something that you cannot seem to undo and need to "restart" the system (including deleting all temporary folders), run the following single line of code. It will take about 1 minute to restart. Afterwards, you will have to start again from the beginning of the assignment to re-download the data and re-run the code you have written. Note that the code you have already written will **not** be deleted; you simply need to execute it once again from the start.

In [0]:
!kill -9 -1 # Warning this restarts your machine

## Downloading the Data

The following commands can be used to copy over the assignment materials to your local Colaboratory instance and unzip in preparation for your assignment:

In [0]:
!git clone https://github.com/CAIDMRes/lecture_02
!unzip lecture_02/data.zip
!rm -r lecture_02
!ls

## Loading the Data

In [0]:
# Loading a pickle (*.pkl) file
import pickle

# x is a NumPy array with (flattened) image data
with open('x.pkl', 'rb') as f:
    x = pickle.load(f)
print(type(x))
print(x.shape)

# y is a NumPy array with labels
with open('y.pkl', 'rb') as f:
    y = pickle.load(f)
print(y.shape)

# Non-linear Functions

The interposition of *any* non-linear function can serve as a basis for neural network design. Currently the most common non-linear function used in modern neural networks is known as the **ReLU** (rectified linear unit), defined as:

```
relu = max(0, x)
```
![relu](https://github.com/CAIDMRes/images/blob/master/relu.jpg?raw=true)

Due to some favorable features related to matrix calculus (e.g. the slope is piecewise constant: 0 for negative inputs and 1 for positive inputs) and ease of calculation / implementation, it is a widely accepted function to use.
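
As a rough NumPy sketch of the definition above, the function and its slope can be written in a few lines. The helper names `relu` and `relu_slope` are hypothetical, and treating the slope at exactly 0 as 0 is a convention assumed here.

```python
import numpy as np

def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0, x)

def relu_slope(x):
    """Slope of ReLU: 1 where x > 0, otherwise 0 (value at exactly 0 is a convention)."""
    return (x > 0).astype(np.float32)

values = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(values))        # negative entries are clipped to 0
print(relu_slope(values))  # only two distinct slope values appear
```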

TensorFlow also has a very simple implementation of the ReLU function:

In [0]:
import numpy as np
import tensorflow as tf

A = np.random.normal(loc=0, scale=1, size=(10))
a = tf.placeholder(tf.float32, [10])
b = tf.nn.relu(a)

sess = tf.InteractiveSession()
output = sess.run(b, feed_dict={a: A})

print(A)
print(output)

# Multilayer Network

Let's go ahead and incorporate the use of ReLU non-linearities in our first multilayer (true) neural network!

In [0]:
n_hidden_layer_1 = 128 

def create_model():
    
    # Reset our graph to build a new one
    tf.reset_default_graph()

    # ------------------------------------------------------------------------
    # Define placeholders for our images and labels
    # ------------------------------------------------------------------------
    #
    # 1. Define images
    # 
    #  - type: float32
    #  - size: [None, 784] so that we can feed in as many images as we need
    # 
    # 2. Define labels
    # 
    #  - type: int64
    #  - size: [None] so that we can feed in as many labels as we need
    # 
    # ------------------------------------------------------------------------

    im = tf.placeholder(?)
    labels = tf.placeholder(?)

    # ------------------------------------------------------------------------
    # Define our weights
    # ------------------------------------------------------------------------
    # 
    # Let us use a new variable initializer, the tf.get_variable(...) method.
    # In this method call, we simply need to provide the function with the 
    # variable name, size, and type. TensorFlow will subsequently use this 
    # information to generate a reasonable starting distribution.
    # 
    # Keep in mind that a different set of weights (connections) must be 
    # created for each separate layer in our graph. Make sure that the matrix
    # dimensions match up.
    # 
    # ------------------------------------------------------------------------

    w1 = tf.get_variable('w1', ?, dtype=tf.float32)
    w2 = tf.get_variable('w2', ?, dtype=tf.float32)

    # ------------------------------------------------------------------------
    # Define our matmul operations
    # ------------------------------------------------------------------------

    h1 = tf.nn.relu(tf.matmul(?))
    logits = tf.matmul(?)

    # ------------------------------------------------------------------------
    # Define our softmax cross-entropy loss
    # ------------------------------------------------------------------------
    # 
    # HINT: use tf.losses.sparse_softmax_cross_entropy() as above
    #
    # ------------------------------------------------------------------------

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    
    # ------------------------------------------------------------------------
    # Define our optimizer
    # ------------------------------------------------------------------------
    # 
    # An optimizer is a special TensorFlow class that takes your model weights and 
    # adjusts them ever so slightly so that they will make a better prediction the
    # next time around. They are implemented with a technique known as 
    # backpropagation which we will learn about in further detail during later 
    # lectures. For now, just know that this is what we are using here.
    #
    # ------------------------------------------------------------------------

    train_op = tf.train.AdamOptimizer(0.001).minimize(loss)
    
    return im, labels, [w1, w2], logits, loss, train_op

# ------------------------------------------------------------------------
# Test our model
# ------------------------------------------------------------------------
# 
# If the graph was defined properly, we should be able to check what the
# model outputs look like. Can you guess what the shapes of our logits
# and loss will be?
# 
# ------------------------------------------------------------------------

im, labels, weights, logits, loss, train_op = create_model()
print(logits.shape)
print(loss.shape)
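
For intuition about the loss used in `create_model()`, here is a NumPy sketch of a sparse softmax cross-entropy. It assumes the loss is averaged over the batch; the function below is an illustrative re-implementation on toy values, not TensorFlow's own code.

```python
import numpy as np

def sparse_softmax_cross_entropy(labels, logits):
    """Mean cross-entropy between integer labels and rows of raw logits."""
    # Subtract the row-wise max for numerical stability before exponentiating
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    # Pick out the log-probability assigned to the correct class of each row
    picked = log_probs[np.arange(len(labels)), labels]
    return -np.mean(picked)

# Toy logits for 2 examples and 3 classes (assumed values)
toy_logits = np.array([[2.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0]])
toy_labels = np.array([0, 2])
toy_loss = sparse_softmax_cross_entropy(toy_labels, toy_logits)
print(toy_loss)
```

"Sparse" here means the labels are plain integers rather than one-hot vectors, which is why the correct class can be selected by indexing.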

In [0]:
# ------------------------------------------------------------------------
# Create our model
# ------------------------------------------------------------------------

im, labels, weights, logits, loss, train_op = create_model()

# ------------------------------------------------------------------------
# Add to collections
# ------------------------------------------------------------------------
# 
# Collections are used by TensorFlow to keep track of certain intermediate 
# values for quick access during save/load functions.
# 
# ------------------------------------------------------------------------

tf.add_to_collection('im', im)
tf.add_to_collection('logits', logits)

# ------------------------------------------------------------------------
# Initialize our test graph
# ------------------------------------------------------------------------
# 
# What two things do we need to initialize our graph?
# 
# ------------------------------------------------------------------------

sess = ?
tf.?

# ------------------------------------------------------------------------
# Initialize a Saver object
# ------------------------------------------------------------------------
# 
# The Saver will be used later to write the trained model to disk.
# 
# ------------------------------------------------------------------------

saver = tf.train.Saver()

In [0]:
# ------------------------------------------------------------------------
# Train our algorithm 
# ------------------------------------------------------------------------
# 
# Let's set up a loop to train our algorithm by feeding it data iteratively.
# For each iteration, we will feed a batch_size number of images into our 
# model and let it readjust its neuronal weights.
# 
# ------------------------------------------------------------------------

def train_model(iterations=2000, batch_size=256):
    
    accuracies = []
    losses = []

    for i in range(iterations):

        # --------------------------------------------------------------------
        # Grab a total of batch_size number of random images and labels 
        # --------------------------------------------------------------------
        # 
        # 1. Pick batch_size number of random indices between 0 and 60,000
        # 2. Select those images / labels
        #
        # --------------------------------------------------------------------

        rand_indices = np.random.randint(?)
        x_batch = x[?]
        y_batch = y[?]

        # --------------------------------------------------------------------
        # Normalize x_batch
        # --------------------------------------------------------------------
        # 
        # Currently, values in x range from 0 to 255. If we normalize these values
        # to a mean of 0 and SD of 1 we will improve the stability of training
        # and furthermore improve interpretation of learned weights. Use the
        # following code to normalize your batch:
        # 
        # --------------------------------------------------------------------

        x_batch = (x_batch - np.mean(x_batch)) / np.std(x_batch)

        # Convert to types matching our defined placeholders
        x_batch = x_batch.astype(?)
        y_batch = y_batch.astype(?)

        # Prepare feed_dict
        feed_dict = {?}

        # --------------------------------------------------------------------
        # Run training iteration via sess.run()
        # --------------------------------------------------------------------
        # 
        # Here, in addition to whichever outputs we wish to extract, we need to
        # also include the train_op variable. Including train_op will tell 
        # TensorFlow that in addition to calculating the intermediates of our graph,
        # we also need to readjust the variables so that the overall loss goes
        # down.
        # 
        # --------------------------------------------------------------------

        outputs = sess.run([logits, loss, train_op], feed_dict)

        # --------------------------------------------------------------------
        # Use argmax to determine highest logit (model guess)
        # --------------------------------------------------------------------
        # 
        # Keep in mind our logits matrix is (batch_size x 10) in size, representing
        # a total of batch_size predictions. How do we process this matrix
        # with np.argmax() to find the highest logit along each row of the matrix
        # (i.e. find the prediction for each of our images)?
        # 
        # HINT: what does the axis parameter in np.argmax(a, axis) specify?
        # 
        # --------------------------------------------------------------------

        predictions = np.argmax(?)

        # --------------------------------------------------------------------
        # Calculate accuracy 
        # --------------------------------------------------------------------
        # 
        # Consider the following:
        # 
        # - predictions = the predicted digits
        # - y_batch = the ground-truth digits
        # 
        # How do I calculate an accuracy % with this data?
        # 
        # --------------------------------------------------------------------

        accuracy = np.sum(?) / ?

        # --------------------------------------------------------------------
        # Accumulate and print iteration, loss and accuracy 
        # --------------------------------------------------------------------

        print('Iteration %05i | Loss = %07.3f | Accuracy = %0.4f' %
            (i + 1, outputs[1], accuracy))

        losses.append(outputs[1])
        accuracies.append(accuracy)
        
    # --------------------------------------------------------------------
    # Graph outputs and accuracy
    # --------------------------------------------------------------------

    import pylab
    pylab.plot(losses)
    pylab.title('Model loss over time')
    pylab.show()

    pylab.plot(accuracies)
    pylab.title('Model accuracy over time')
    pylab.show()

    return losses, accuracies
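
The argmax and accuracy steps in `train_model()` can be tried out on a small hand-made example first. The arrays `toy_logits` and `toy_truth` below are made-up values, not model outputs; this is only a sketch of how `axis` affects `np.argmax()`.

```python
import numpy as np

# Hypothetical logits for 4 images and 3 classes (not produced by the model above)
toy_logits = np.array([[0.1, 2.0, -1.0],
                       [3.0, 0.2, 0.5],
                       [0.0, 0.1, 4.0],
                       [1.5, 1.0, 0.2]])
toy_truth = np.array([1, 0, 2, 1])

# axis=1 searches across the columns of each row: one prediction per image
toy_predictions = np.argmax(toy_logits, axis=1)

# Fraction of predictions that match the ground truth
toy_accuracy = np.mean(toy_predictions == toy_truth)

print(toy_predictions, toy_accuracy)
```

With `axis=0` the search would instead run down each column, giving one index per class rather than one per image.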

In [0]:
# --------------------------------------------------------------------
# Train model
# --------------------------------------------------------------------
losses, accuracies = train_model(iterations=2000, batch_size=256)

# --------------------------------------------------------------------
# Save model
# --------------------------------------------------------------------
# 
# In this step, all model variables and the underlying graph structure
# are saved so that they can be reloaded. Although it looks like just one
# file is saved here, this single line of code in fact writes several files
# sharing the model.ckpt prefix, including a *.ckpt.meta file for the graph.
#  
# --------------------------------------------------------------------

import os
model_file = './model_128/model.ckpt'
os.makedirs(os.path.dirname(model_file), exist_ok=True)
print('Saving model')
saver.save(sess, model_file)

Congratulations! If that went well, you have now trained your first multilayer neural network (with a single hidden layer). The accuracy seems quite impressive, much better than the linear classifier... have we solved this problem yet? Let's go ahead and perform some checks to see just how well we're doing.

## Model weights

Compared to the average-filter linear classifier, where the weights have a very obvious and intuitive meaning, these model filters are much more complex. What the algorithm has learned is a set of 128 different filters which, when used in combination with each other, can predict the likelihood of each digit. Each one of these filters therefore captures a small "component" of any given digit. Let's take a look.
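
To see why a column of the first weight matrix can be viewed as a filter, consider this NumPy sketch. It uses a random stand-in `toy_w1` (an assumption; a trained model would hold learned values) with the same `[784, 128]` shape as the first layer.

```python
import numpy as np

n_pixels, n_hidden = 784, 128
# Random stand-in for the first-layer weights, for illustration only
toy_w1 = np.random.normal(size=(n_pixels, n_hidden))

# Column i holds one weight per input pixel, so it can be viewed as a 28 x 28 image
i = 0
filter_image = toy_w1[:, i].reshape(28, 28)

# The hidden value (before ReLU) for one flattened image is its dot product with
# that column, i.e. a pixel-by-pixel weighting of the image by the filter
toy_image = np.random.normal(size=(n_pixels,))
activation = toy_image @ toy_w1[:, i]
same = np.isclose(activation, np.sum(toy_image.reshape(28, 28) * filter_image))

print(filter_image.shape, same)
```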

In [0]:
# --------------------------------------------------------------------
# Extract model weights 
# --------------------------------------------------------------------
# 
# What line of code do we need here to extract the model weights?
# 
# HINT: what do we pass to sess.run()?
#
# --------------------------------------------------------------------

W = sess.run(?)

# Visualize
import pylab
fig = pylab.figure()
for i in range(24):
    fig.add_subplot(4, 6, i + 1)
    pylab.axis('off')
    pylab.imshow(W[..., i].reshape(28, 28))
    
pylab.show()

# Running inference

Now that we have a trained model, let's go ahead and see how it performs! We will use the same procedure as before to load up a trained network and then feed in random digits to see how it fares.

In [0]:
# Load the saved model
tf.reset_default_graph()
sess = tf.InteractiveSession()
saver = tf.train.import_meta_graph('./model_128/model.ckpt.meta')
saver.restore(sess, './model_128/model.ckpt')

# Find our placeholders
im = tf.get_collection('im')[0]
logits = tf.get_collection('logits')[0]

# Find a random test image
i = int(?)
image = x[i].reshape(1, 784)
label = y[i]

# Normalize the image
image = (image - np.mean(image)) / np.std(image)

# Create a feed_dict
feed_dict = {?}

# Pass data through the network
l = sess.run(logits, ?)

# Convert logits to predictions
prediction = np.argmax(?)

# Visualize
pylab.imshow(image.reshape(28, 28))
pylab.axis('off')
pylab.title('My prediction is %i' % prediction)
pylab.show()

In [0]:
!ls ./model_128

# Model validation

As we learned in lecture #4, a model with *too much* learning capacity can potentially memorize the dataset without learning anything useful. The fact that we trained to an accuracy so close to 100% means that we should be wary that the algorithm may be at least in part memorizing portions of the dataset. To test for this phenomenon we need to evaluate the model on new data that the algorithm has never seen before. Let's go ahead and download this new data now:
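
As an extreme toy illustration of memorization (a deliberately contrived sketch, not a neural network), the "model" below simply stores every training pair in a dictionary. The parity labeling rule, the data, and the fallback guess of 0 are all assumptions made up for this example.

```python
def parity(pair):
    """Hypothetical ground-truth rule: 1 if the sum of the pair is odd, else 0."""
    return (pair[0] + pair[1]) % 2

train_inputs = [(0, 1), (1, 0), (1, 1), (2, 3)]
test_inputs = [(0, 3), (3, 1), (2, 2), (4, 1)]   # none appear in the training set
train_labels = [parity(p) for p in train_inputs]
test_labels = [parity(p) for p in test_inputs]

# "Training" is pure memorization of the (input -> label) pairs
memory = dict(zip(train_inputs, train_labels))

def predict(pair):
    # Unseen inputs fall back to a fixed guess of 0
    return memory.get(pair, 0)

def accuracy(inputs, labels):
    correct = sum(predict(p) == t for p, t in zip(inputs, labels))
    return correct / len(labels)

train_accuracy = accuracy(train_inputs, train_labels)
test_accuracy = accuracy(test_inputs, test_labels)
print(train_accuracy, test_accuracy)
```

The perfect training accuracy says nothing about whether the underlying rule was learned, which is why a separate test set is needed.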

In [0]:
!git clone https://github.com/CAIDMRes/lecture_03
!unzip lecture_03/data.zip
!rm -r lecture_03 
!ls

## Loading the data

The two new files we downloaded are `x_test.npy` and `y_test.npy` corresponding to our test set data and labels.  The format is identical to before. We have a total of 10,000 examples to test. Let us load them now:

In [0]:
import numpy as np
x_test = np.load('x_test.npy',allow_pickle=True)
y_test = np.load('y_test.npy',allow_pickle=True)

print(x_test.shape)
print(y_test.shape)

## Validating

Using the template code to run inference shown above, let us now write code to:

* load our saved model 
* create a `feed_dict` with new test data 
* pass the data through the network using `sess.run()`
* convert the `logits` to predictions
* calculate overall network accuracy
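
Before writing the TensorFlow version, it may help to see in plain NumPy why the same weights can process one image or many at once. The weights below are random stand-ins (assumptions for illustration), and `forward` is a hypothetical helper mirroring a network with one hidden layer and no bias terms.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-ins for trained weights, shapes matching a 784 -> 128 -> 10 network
toy_w1 = rng.normal(size=(784, 128))
toy_w2 = rng.normal(size=(128, 10))

def forward(images):
    """Forward pass: [N, 784] -> [N, 128] -> [N, 10] for any number of rows N."""
    hidden = np.maximum(0, images @ toy_w1)
    return hidden @ toy_w2

one_image = rng.normal(size=(1, 784))
many_images = rng.normal(size=(50, 784))

print(forward(one_image).shape)
print(forward(many_images).shape)
```

Only the inner dimensions of each matrix product have to match, so the number of rows (images) is free to vary.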

In [0]:
def validate_model(model_file):
    """
    Method to test the validation performance of a model using the 
    test set data.
    
    :params
    
      (str) model_file : name of model file saved by saver object
      
    """
    # Load saved model
    tf.reset_default_graph()
    sess = tf.InteractiveSession()
    saver = tf.train.import_meta_graph('%s.meta' % model_file)
    saver.restore(sess, model_file)

    # Find our placeholders
    im = tf.get_collection('im')[0]
    logits = tf.get_collection('logits')[0]

    # Normalize our input data x_test
    input_data = (x_test - np.mean(x_test, axis=1, keepdims=True)) / \
        np.std(x_test, axis=1, keepdims=True)
    
    # -------------------------------------------------------
    # Create a feed_dict
    # -------------------------------------------------------
    # 
    # HINT: What type of data can we possibly feed into our network?
    # What happens if we feed in more than one image at a time?
    # Think about the rules for matrix multiplication.
    # 
    # -------------------------------------------------------

    feed_dict = {?}

    # Pass data through the network using sess.run() to get our logits 
    output = sess.run(logits, ?)

    # Convert logits to predictions
    predictions = np.argmax(?)

    # Compare predictions to ground-truth to find accuracy
    accuracy = np.sum(?) / ?
    
    print('Network test-set accuracy: %0.4f' % accuracy)
    
# Pass our model_file
model_file = './model_128/model.ckpt' 
validate_model(model_file)
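
The normalization inside `validate_model()` works per image (`axis=1, keepdims=True`), whereas a call such as `np.mean(x_batch)` uses one mean for the whole array. The sketch below, on a tiny made-up array `toy`, shows how the two differ; it is an illustration only and makes no claim about which choice is better here.

```python
import numpy as np

# Two toy "images" with 4 pixels each (assumed values)
toy = np.array([[0.0, 255.0, 0.0, 255.0],
                [10.0, 20.0, 30.0, 40.0]])

# One mean and SD for the whole array
global_norm = (toy - np.mean(toy)) / np.std(toy)

# One mean and SD per row; keepdims=True keeps shape (2, 1) so broadcasting works
row_mean = np.mean(toy, axis=1, keepdims=True)
row_std = np.std(toy, axis=1, keepdims=True)
row_norm = (toy - row_mean) / row_std

print(row_mean.shape)
print(np.mean(row_norm, axis=1), np.std(row_norm, axis=1))
```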

## Notes

How did the algorithm perform? Were you surprised or not? Regardless, it's still quite a bit better than our linear classifier, which maxed out around 90-92%. In the remainder of this tutorial, let's test a handful of different architectures.

# Exercises

For the following exercises, we will evaluate a number of different variations in network architecture. Generally, the steps will include:

* writing a new model in the `create_model()` method
* initializing the training variables
* using the `train_model()` method defined above to run training (repeated as many times as needed to converge)
* saving the model
* validating the model on test set data

The goal is to get a sense of which combinations work better than others. Keep in mind we are already at 97%+ accuracy, so we're not expecting any dramatic changes, but the process of fine-tuning a neural network is an extremely valuable experience to gain first-hand.

## Exercise 1

Re-train several neural networks, this time with either more (196, 256) or fewer (96, 64, 32) nodes. What do you expect to happen to your algorithm's accuracy?
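
One way to reason about these variants is to count how many weights each one has. The helper `count_weights` below is hypothetical and assumes a network with one hidden layer and no bias terms, as in the templates in this notebook.

```python
def count_weights(n_hidden, n_inputs=784, n_classes=10):
    """Number of weights in a bias-free network with one hidden layer."""
    return n_inputs * n_hidden + n_hidden * n_classes

# Hidden layer sizes suggested in this exercise, plus the original 128
for n in (32, 64, 96, 128, 196, 256):
    print(n, count_weights(n))
```

The count grows linearly with the hidden layer size, so doubling the nodes roughly doubles the learning capacity available for memorization as well.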

In [0]:
n_hidden_layer_1 = 256

def create_model():
    
    # Reset our graph to build a new one
    tf.reset_default_graph()

    # ------------------------------------------------------------------------
    # Define placeholders for our images and labels
    # ------------------------------------------------------------------------
    im = tf.placeholder(tf.float32, [None, 784])
    labels = tf.placeholder(tf.int64,[None])

    # ------------------------------------------------------------------------
    # Define our weights
    # ------------------------------------------------------------------------
    w1 = tf.get_variable('w1', [784, ?], dtype=tf.float32)
    w2 = tf.get_variable('w2', [?, 10], dtype=tf.float32)

    # ------------------------------------------------------------------------
    # Define our matmul operations
    # ------------------------------------------------------------------------
    h1 = tf.nn.relu(tf.matmul(?))
    logits = tf.matmul(?)

    # ------------------------------------------------------------------------
    # Define our softmax cross-entropy loss
    # ------------------------------------------------------------------------
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    # ------------------------------------------------------------------------
    # Define our optimizer
    # ------------------------------------------------------------------------
    train_op = tf.train.AdamOptimizer(0.001).minimize(loss)
    
    return im, labels, [w1, w2], logits, loss, train_op

# ------------------------------------------------------------------------
# Create model 
# ------------------------------------------------------------------------
im, labels, weights, logits, loss, train_op = create_model()

# ------------------------------------------------------------------------
# Add to collections, initialize graph and saver
# ------------------------------------------------------------------------
tf.add_to_collection('im', im)
tf.add_to_collection('logits', logits)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
saver = tf.train.Saver()

In [0]:
# ------------------------------------------------------------------------
# Train model 
# ------------------------------------------------------------------------
losses, accuracies = train_model(iterations=2000, batch_size=128)

In [0]:
# ------------------------------------------------------------------------
# Save your model
# ------------------------------------------------------------------------
# 
# HINT: use a meaningful naming convention so you remember which model is 
# what during validation
# 
# ------------------------------------------------------------------------

import os
model_file = './model_256/model.ckpt'
os.makedirs(os.path.dirname(model_file), exist_ok=True)
print('Saving model')
saver.save(sess, model_file)

In [0]:
# ------------------------------------------------------------------------
# Validate model 
# ------------------------------------------------------------------------
validate_model(model_file)

## Exercise 2

Re-train several new networks, this time using **two** hidden layers (instead of one). Try experimenting with a large number of intermediate nodes (e.g. 128, 128), a low number of intermediate nodes (32, 32), or some combination of both. Were you able to get a better result than the single hidden layer model? Why or why not?
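
When stacking layers, each weight matrix must have as many rows as the previous layer has nodes. A small sketch of that bookkeeping follows; the helper `weight_shapes` is hypothetical, and the hidden sizes 128 and 64 are picked only for illustration.

```python
def weight_shapes(layer_sizes):
    """Pair up consecutive layer sizes into [rows, cols] weight shapes."""
    return [[a, b] for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]

# 784 inputs, two hidden layers (sizes assumed for illustration), 10 classes
shapes = weight_shapes([784, 128, 64, 10])
print(shapes)
```

Each shape's second entry equals the next shape's first entry, which is exactly the condition for the chain of matrix multiplications to be valid.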

In [0]:
import tensorflow as tf
import numpy as np

n_hidden_layer_1 = 128
n_hidden_layer_2 = 64
n_hidden_layer_3 = 32
n_hidden_layer_4 = 16

def create_model():
    
    # Reset our graph to build a new one
    tf.reset_default_graph()

    # ------------------------------------------------------------------------
    # Define placeholders for our images and labels
    # ------------------------------------------------------------------------
    im = tf.placeholder(tf.float32,[None, 784])
    labels = tf.placeholder(tf.int64,[None])

    # ------------------------------------------------------------------------
    # Define our weights
    # ------------------------------------------------------------------------
    w1 = tf.get_variable('w1', [784, ?],dtype=tf.float32)
    w2 = tf.get_variable('w2', [?, ?], dtype=tf.float32)
    w3 = tf.get_variable('w3', [?, ?], dtype=tf.float32)
    w4 = tf.get_variable('w4', [?, ?], dtype=tf.float32)
    w5 = tf.get_variable('w5', [?, 10], dtype=tf.float32)

    # ------------------------------------------------------------------------
    # Define our matmul operations
    # ------------------------------------------------------------------------
    h1 = tf.nn.relu(tf.matmul(?))
    h2 = tf.nn.relu(tf.matmul(?))
    h3 = tf.nn.relu(tf.matmul(?))
    h4 = tf.nn.relu(tf.matmul(?))
    logits = tf.matmul(?)

    # ------------------------------------------------------------------------
    # Define our softmax cross-entropy loss
    # ------------------------------------------------------------------------
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    # ------------------------------------------------------------------------
    # Define our optimizer
    # ------------------------------------------------------------------------
    train_op = tf.train.AdamOptimizer(0.001).minimize(loss)
    
    return im, labels, [w1, w2, w3, w4, w5], logits, loss, train_op

# ------------------------------------------------------------------------
# Create model 
# ------------------------------------------------------------------------
im, labels, weights, logits, loss, train_op = create_model()

# ------------------------------------------------------------------------
# Add to collections, initialize graph and saver
# ------------------------------------------------------------------------
tf.add_to_collection('im', im)
tf.add_to_collection('logits', logits)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
saver = tf.train.Saver()

In [0]:
# ------------------------------------------------------------------------
# Train model 
# ------------------------------------------------------------------------
losses, accuracies = train_model(iterations=2000, batch_size=128)

In [0]:
# ------------------------------------------------------------------------
# Save your model
# ------------------------------------------------------------------------
# 
# HINT: use a meaningful naming convention so you remember which model is 
# what during validation
# 
# ------------------------------------------------------------------------

import os
model_file = './model_64/model.ckpt'
os.makedirs(os.path.dirname(model_file), exist_ok=True)
print('Saving model')
saver.save(sess, model_file)

In [0]:
# ------------------------------------------------------------------------
# Validate model 
# ------------------------------------------------------------------------
validate_model(model_file)