# Simple FC Model


We are going to build a fully connected (FC) model from scratch (no layers API, no Keras) and train it on the MNIST digit-classification task.

All of the concepts from the previous notebook are put to use together here. This should get us used to building a full model, to the different operations involved in a TensorFlow model, and to the development workflow. This notebook is adapted from the tutorials by:

[Magnus Erik Hvass Pedersen](http://www.hvass-labs.org/)
/ [GitHub](https://github.com/Hvass-Labs/TensorFlow-Tutorials) / [Videos on YouTube](https://www.youtube.com/playlist?list=PL9Hr9sNUjfsmEu1ZniY0XpHSzl5uihcXZ)


## Imports

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
from sklearn.metrics import confusion_matrix
import utils

## Load Data

First we load the MNIST dataset into memory in preparation for modeling.

In [None]:
import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

num_train = x_train.shape[0]
num_test = x_test.shape[0]

print("Size of:")
print("- Training-set:\t\t{}".format(num_train))
print("- Test-set:\t\t{}".format(num_test))

The MNIST data-set has now been loaded and consists of 70,000 images (60,000 for training and 10,000 for testing) together with their class labels.

Copy some of the data-dimensions for convenience.

In [None]:
# Length of each image once it is flattened into a one-dimensional array.
img_size_flat = x_train[0].flatten().shape[0]

# Tuple with height and width of images used to reshape arrays.
img_shape = x_train[0].shape
H,W = img_shape

# Number of classes, one class for each of 10 digits.
num_classes = len(np.unique(y_train))

print("Flattened Image Size = \t\t{}".format(img_size_flat))
print("Image Shape = \t\t\t{}".format(img_shape))
print("Number of label classes = \t{}".format(num_classes))
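As a quick sanity check on these dimensions, the NumPy sketch below uses a random array as a stand-in for MNIST-sized images (the `demo_`-prefixed names are illustrative and do not touch the notebook's variables). It shows that flattening a 28×28 image gives a row of length 784 and that reshaping back recovers the image exactly.

```python
import numpy as np

# Random stand-in for 5 MNIST-sized images (28 x 28 pixels in [0, 1)).
rng = np.random.default_rng(0)
demo_images = rng.random((5, 28, 28))

height, width = demo_images.shape[1:]
flat_size = height * width   # 784, the same value as img_size_flat for MNIST

# Flatten each image into one row of length flat_size.
demo_flat = demo_images.reshape(demo_images.shape[0], flat_size)
print(demo_flat.shape)  # (5, 784)

# Reshaping back restores the original images exactly.
print(np.array_equal(demo_flat.reshape(-1, height, width), demo_images))  # True
```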

### Plot a few images to see if data is correct

In [None]:
# Get the first images from the test-set.
images = x_test[0:9]

# Get the true classes for those images.
cls_true = y_test[0:9]

# Plot the images and labels using the helper-function from utils.
utils.plot_images(images=images, cls_true=cls_true)

### Placeholder variables

Placeholder variables serve as the input to the graph that we may change each time we execute the graph. We call this feeding the placeholder variables and it is demonstrated further below.

First we define the placeholder variable for the input images. This allows us to change the images that are input to the TensorFlow graph. This is a so-called tensor, which just means that it is a multi-dimensional array. The data-type is set to `float32` and the shape is set to `[None, H, W]`, where `None` means that the tensor may hold an arbitrary number of images, each an `H`×`W` array of pixels. The images are flattened into vectors of length `img_size_flat` later, inside the model.

In [None]:
x = tf.placeholder(tf.float32, [None, H,W])

Finally we have the placeholder variable for the true class of each image in the placeholder variable `x`. These are integers and the dimensionality of this placeholder variable is set to `[None]` which means the placeholder variable is a one-dimensional vector of arbitrary length.

In [None]:
y_true_cls = tf.placeholder(tf.int64, [None])

### Variables to be optimized

Apart from the placeholder variables defined above, which feed input data into the model, there are also model variables that TensorFlow must adjust to make the model perform better on the training data.

The first variable that must be optimized is called `weights`. It is defined here as a TensorFlow variable that is initialized with zeros and whose shape is `[img_size_flat, num_classes]`, so it is a 2-dimensional tensor (or matrix) with `img_size_flat` rows and `num_classes` columns.

In [None]:
weights = tf.Variable(tf.zeros([img_size_flat, num_classes]))

The second variable that must be optimized is called `biases` and is defined as a 1-dimensional tensor (or vector) of length `num_classes`.

In [None]:
biases = tf.Variable(tf.zeros([num_classes]))

### Model

This simple mathematical model multiplies the images in the placeholder variable `x` with the `weights` and then adds the `biases`.

__Technical Note__: When a tensor's shape must be defined relative to another tensor whose first dimension is `None` (unknown until the graph runs), we can use the `tf.shape` operation to read that dimension at run time and pass it as part of the new shape. This keeps the batch dimension consistent across tensors.

In [None]:
batch_size_tensor = tf.shape(x)[0]
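NumPy arrays always have a concrete shape, so there is no static/dynamic distinction in NumPy; the sketch below is only an analogy for what `tf.shape(x)[0]` provides, namely the batch size read when the computation actually runs. The helper `flatten_batch` is a hypothetical name used for illustration.

```python
import numpy as np

def flatten_batch(batch):
    """Flatten images of shape (N, H, W) into rows of shape (N, H * W).

    N is read from the array when the function is called, which is the
    role tf.shape(x)[0] plays when the TensorFlow graph is executed.
    """
    n = batch.shape[0]
    return batch.reshape(n, -1)

for n in (1, 3, 100):
    print(flatten_batch(np.zeros((n, 28, 28))).shape)
# (1, 784)
# (3, 784)
# (100, 784)
```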


Before the multiplication, each `H`×`W` image in `x` is flattened into a vector of length `img_size_flat`, so `x_flat` has shape `[num_images, img_size_flat]`. Using `batch_size_tensor` as the first dimension keeps the batch size flexible, so the same graph works for any number of images.

Note that the name `logits` used below is typical TensorFlow terminology, but other people may call the variable something else.

In [None]:
x_flat = tf.reshape(x,[batch_size_tensor,img_size_flat])

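Multiplying `x_flat` (shape `[num_images, img_size_flat]`) by `weights` (shape `[img_size_flat, num_classes]`) gives a matrix of shape `[num_images, num_classes]`, and the `biases` vector is then added to each row. The resulting `logits` matrix has `num_images` rows and `num_classes` columns, where the element in the $i$'th row and $j$'th column is an unnormalized score of how likely the $i$'th input image is to be of the $j$'th class.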

In [None]:
logits = tf.matmul(x_flat, weights) + biases
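The same linear layer can be written in NumPy to make the shapes and the row-wise bias addition (broadcasting) concrete. This is a sketch with random inputs; `linear_logits` and the `demo_` names are illustrative, not part of the notebook.

```python
import numpy as np

def linear_logits(x_flat, w, b):
    """Compute x_flat @ w + b; the bias vector b is added to every row."""
    return x_flat @ w + b

rng = np.random.default_rng(0)
demo_x = rng.random((4, 784))          # 4 flattened images
demo_w = np.zeros((784, 10))           # same shape as `weights`
demo_b = np.arange(10, dtype=float)    # a distinct bias per class

demo_logits = linear_logits(demo_x, demo_w, demo_b)
print(demo_logits.shape)  # (4, 10)
# With zero weights, every row of the logits is just the bias vector.
print(np.allclose(demo_logits, demo_b))  # True
```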

However, these estimates are a bit rough and difficult to interpret because the numbers may be very small or large, so we want to normalize them so that each row of the `logits` matrix sums to one, and each element is limited between zero and one. This is calculated using the so-called softmax function and the result is stored in `y_pred`.

In [None]:
y_pred = tf.nn.softmax(logits)

The predicted class can be calculated from the `y_pred` matrix by taking the index of the largest element in each row.

In [None]:
y_pred_cls = tf.argmax(y_pred, axis=1)
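For intuition, here is a NumPy sketch of the softmax and argmax steps on a small hand-made logits matrix (our own implementation, not TensorFlow's). Because softmax is monotonic within each row, the argmax of `y_pred` is the same as the argmax of `logits`.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax. Subtracting each row's max first avoids overflow
    in np.exp and does not change the result."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

demo_logits = np.array([[2.0, 1.0, 0.1],
                        [0.0, 0.0, 5.0]])
probs = softmax(demo_logits)
print(np.allclose(probs.sum(axis=1), 1.0))  # True: each row sums to one
print(np.argmax(probs, axis=1))             # [0 2]
# Softmax preserves the ordering within a row, so the predicted class is unchanged.
print(np.array_equal(np.argmax(probs, axis=1), np.argmax(demo_logits, axis=1)))  # True
```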

### Cost-function to be optimized

To make the model better at classifying the input images, we must somehow change the variables for `weights` and `biases`. To do this we first need to know how well the model currently performs by comparing the predicted output of the model `y_pred` to the desired output `y_true`.

The cross-entropy is a performance measure used in classification. It is a continuous function that is always non-negative, and if the predicted output of the model exactly matches the desired output then the cross-entropy equals zero. The goal of optimization is therefore to minimize the cross-entropy, bringing it as close to zero as possible by changing the `weights` and `biases` of the model. To compute it, the true label of each image must be represented as a vector with one binary entry per class (1 for the true class, 0 everywhere else). This is often referred to as a one-hot representation.


In [None]:
y_true = tf.one_hot(y_true_cls, depth=num_classes)


TensorFlow has a built-in function for calculating the cross-entropy. Note that it takes the `logits` rather than `y_pred` as input, because it applies the softmax internally (which is also more numerically stable than taking the log of `y_pred` ourselves).

In [None]:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits,
                                                           labels=y_true)

We have now calculated the cross-entropy for each of the image classifications so we have a measure of how well the model performs on each image individually. But in order to use the cross-entropy to guide the optimization of the model's variables we need a single scalar value, so we simply take the average of the cross-entropy for all the image classifications.

In [None]:
cost = tf.reduce_mean(cross_entropy)
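To see what these ops compute, the NumPy sketch below rebuilds the one-hot labels, the per-image cross-entropy from logits, and the mean cost. It is our own reimplementation, not TensorFlow's internal code. It also computes the same loss in "sparse" form (indexing the true class directly instead of building one-hot vectors), which is the idea behind the `sparse_softmax_cross_entropy_with_logits` exercise at the end of this notebook.

```python
import numpy as np

def one_hot(labels, depth):
    """Integer class labels -> one row per label with a 1.0 at that index."""
    out = np.zeros((labels.shape[0], depth))
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out

def log_softmax(logits):
    """Row-wise log of the softmax, computed stably via the max-shift trick."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy_dense(logits, onehot):
    # Per-example cross-entropy: -sum_j y_j * log(p_j), with one-hot y.
    return -(onehot * log_softmax(logits)).sum(axis=1)

def cross_entropy_sparse(logits, labels):
    # Same quantity, read directly at the index of the true class.
    return -log_softmax(logits)[np.arange(labels.shape[0]), labels]

demo_logits = np.array([[2.0, 1.0, 0.1],
                        [0.0, 0.0, 5.0]])
demo_labels = np.array([0, 1])

dense = cross_entropy_dense(demo_logits, one_hot(demo_labels, depth=3))
sparse = cross_entropy_sparse(demo_logits, demo_labels)
print(np.allclose(dense, sparse))  # True
demo_cost = dense.mean()           # one scalar, like tf.reduce_mean(cross_entropy)
print(demo_cost)
```

The second example gets a much larger loss than the first because the model assigns its true class (1) a very small probability.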

### Optimization method

Now that we have a cost measure that must be minimized, we can then create an optimizer. In this case it is the basic form of Gradient Descent where the step-size is set to 0.5.

Note that optimization is not performed at this point. In fact, nothing is calculated at all, we just add the optimizer-object to the TensorFlow graph for later execution.

In [None]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(cost)
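The optimizer computes the gradients for us, but for this particular model they have a simple closed form: the gradient of the mean softmax cross-entropy with respect to the logits is $(p - y)/N$. The NumPy sketch below applies plain gradient-descent steps with that formula on a small synthetic problem (32 samples, 4 random features, 10 classes; the data and the helper names are illustrative assumptions, not MNIST). It is a hand-written analogue of what `GradientDescentOptimizer(...).minimize(cost)` does, not TensorFlow's implementation.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax; subtracting the row max avoids overflow in np.exp.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_cross_entropy(x, y_onehot, w, b):
    # Mean over the batch of -sum_j y_j * log(p_j).
    p = softmax(x @ w + b)
    return float(-np.mean(np.sum(y_onehot * np.log(p), axis=1)))

def gd_step(x, y_onehot, w, b, learning_rate):
    """One plain gradient-descent step on the mean softmax cross-entropy.

    For softmax followed by cross-entropy, the gradient of the mean loss
    with respect to the logits is (p - y) / N.
    """
    n = x.shape[0]
    grad_logits = (softmax(x @ w + b) - y_onehot) / n
    grad_w = x.T @ grad_logits          # shape (num_features, num_classes)
    grad_b = grad_logits.sum(axis=0)    # shape (num_classes,)
    return w - learning_rate * grad_w, b - learning_rate * grad_b

# Small synthetic problem: 32 samples, 4 features, 10 classes.
rng = np.random.default_rng(0)
demo_x = rng.standard_normal((32, 4))
demo_y = np.eye(10)[rng.integers(0, 10, size=32)]
demo_w, demo_b = np.zeros((4, 10)), np.zeros(10)

losses = [mean_cross_entropy(demo_x, demo_y, demo_w, demo_b)]
for _ in range(20):
    demo_w, demo_b = gd_step(demo_x, demo_y, demo_w, demo_b, learning_rate=0.5)
    losses.append(mean_cross_entropy(demo_x, demo_y, demo_w, demo_b))
print(losses[0], losses[-1])  # starts at log(10) and decreases
```

With zero-initialized parameters every class gets probability 1/10, so the starting loss is exactly $\log 10$, the same situation as the untrained model in this notebook.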

### Performance measures

We need a few more performance measures to display the progress to the user.

This is a vector of booleans indicating whether the predicted class equals the true class of each image.

In [None]:
correct_prediction = tf.equal(y_pred_cls, y_true_cls)

This calculates the classification accuracy by first type-casting the vector of booleans to floats, so that False becomes 0 and True becomes 1, and then calculating the average of these numbers.

In [None]:
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
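The same two steps in NumPy, on made-up predictions for five images (illustrative values only): compare element-wise, cast the booleans to floats, and average.

```python
import numpy as np

demo_pred_cls = np.array([7, 2, 1, 0, 4])
demo_true_cls = np.array([7, 2, 1, 0, 9])

demo_correct = np.equal(demo_pred_cls, demo_true_cls)  # [ True  True  True  True False]
demo_acc = demo_correct.astype(np.float32).mean()      # False -> 0.0, True -> 1.0
print(demo_acc)  # 0.8
```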

## TensorFlow Run

### Create TensorFlow session

Once the TensorFlow graph has been created, we have to create a TensorFlow session which is used to execute the graph.

In [None]:
session = tf.Session()

### Initialize variables

The variables for `weights` and `biases` must be initialized before we start optimizing them.

In [None]:
session.run(tf.global_variables_initializer())

### Helper-function to perform optimization iterations

There are 60,000 images in the training-set. Calculating the gradient of the model using all of these images at every step takes a long time, so we instead use Stochastic Gradient Descent, which uses only a small batch of images in each iteration of the optimizer.

In [None]:
batch_size = 100
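The notebook draws its batches with `utils.batch_data`, whose implementation lives in the `utils` module and is not shown here. As a rough illustration of the idea only, the hypothetical helper `sample_batch` below draws a random mini-batch of matching images and labels without repeating an index inside a batch; it is an assumption for exposition, not the behaviour of `utils.batch_data`.

```python
import numpy as np

def sample_batch(x, y, batch_size, rng):
    """Hypothetical mini-batch sampler: pick batch_size distinct random
    indices and return the matching images and labels."""
    idx = rng.choice(x.shape[0], size=batch_size, replace=False)
    return x[idx], y[idx]

rng = np.random.default_rng(0)
demo_x = np.arange(60 * 4, dtype=float).reshape(60, 4)  # 60 fake "images"
demo_y = np.arange(60)                                  # label i belongs to row i
xb, yb = sample_batch(demo_x, demo_y, batch_size=8, rng=rng)
print(xb.shape, yb.shape)  # (8, 4) (8,)
```

The important property is that each image stays paired with its own label; here row `i` starts with the value `4 * i`, so the pairing is easy to check.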

Function for performing a number of optimization iterations so as to gradually improve the `weights` and `biases` of the model. In each iteration, a new batch of data is selected from the training-set and then TensorFlow executes the optimizer using those training samples.

In [None]:
def optimize(num_iterations):
    for i in range(num_iterations):
        # Get a batch of training examples.
        # x_batch now holds a batch of images and
        # y_true_batch are the true labels for those images.
        x_batch, y_true_batch, _ = utils.batch_data(x_train,y_train,batch_size=batch_size)
        
        # Put the batch into a dict with the proper names
        # for placeholder variables in the TensorFlow graph.
        # y_true_cls must be fed as well, because the one-hot
        # labels used by the cost function are computed from it.
        feed_dict_train = {x: x_batch,
                           y_true_cls: y_true_batch}

        # Run the optimizer using this batch of training data.
        # TensorFlow assigns the variables in feed_dict_train
        # to the placeholder variables and then runs the optimizer.
        session.run(optimizer, feed_dict=feed_dict_train)
        
    

## Performance before any optimization

The accuracy on the test-set is 9.8%. This is because the model has only been initialized and not optimized at all, so it always predicts that the image shows a zero digit, as demonstrated in the plot below, and it turns out that 9.8% of the images in the test-set happen to be zero digits.
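The reason for this baseline can be reproduced in a few lines of NumPy (a sketch with random inputs and made-up labels, not the real test-set). With all-zero `weights` and `biases`, every logit is 0, so all classes tie. NumPy's `argmax` is documented to return the first index on ties, which gives class 0; TensorFlow's `tf.argmax` documentation does not guarantee which index it returns on ties, so the "always predicts zero" behaviour above should be read as what the implementation does in practice.

```python
import numpy as np

demo_w = np.zeros((784, 10))   # untrained weights
demo_b = np.zeros(10)          # untrained biases
rng = np.random.default_rng(0)
demo_x = rng.random((6, 784))  # any inputs give the same result

demo_logits = demo_x @ demo_w + demo_b      # all zeros
demo_pred = np.argmax(demo_logits, axis=1)  # ties resolve to index 0 in NumPy
print(demo_pred)  # [0 0 0 0 0 0]

# The accuracy of "always predict 0" equals the fraction of labels that are 0.
demo_labels = np.array([0, 3, 5, 0, 9, 1])
print(np.mean(demo_pred == demo_labels))  # 2 of 6 labels are 0
```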

In [None]:
### define the test feed dictionary
feed_dict_test = {x: x_test,
                  y_true_cls: y_test}

### print out the current accuracy, a sample of inputs and the current weights of the model. 
utils.print_accuracy(session,accuracy,feed_dict_test)
utils.plot_example_errors(x_test,y_test,session,[correct_prediction,y_pred_cls],feed_dict_test)
utils.plot_weights(session,weights)

## Performance after 1 optimization iteration

After just a single optimization iteration, the model's accuracy on the test-set has already increased significantly.

In [None]:
optimize(num_iterations=1)

In [None]:
utils.print_accuracy(session,accuracy,feed_dict_test)
utils.plot_example_errors(x_test,y_test,session,[correct_prediction,y_pred_cls],feed_dict_test)
utils.plot_weights(session,weights)

## Performance after 10 optimization iterations

In [None]:
# We have already performed 1 iteration.
optimize(num_iterations=9)

In [None]:
utils.print_accuracy(session,accuracy,feed_dict_test)
utils.plot_example_errors(x_test,y_test,session,[correct_prediction,y_pred_cls],feed_dict_test)
utils.plot_weights(session,weights)

## Performance after 1000 optimization iterations

After 1000 optimization iterations, the model misclassifies only about one in ten images. As demonstrated below, some of the misclassifications are justified because the images are hard to identify with certainty even for humans, while others are quite obvious and should have been classified correctly by a good model. This simple model cannot reach much better performance, so more complex models are needed.

In [None]:
# We have already performed 10 iterations.
optimize(num_iterations=990)

In [None]:
utils.print_accuracy(session,accuracy,feed_dict_test)
utils.plot_example_errors(x_test,y_test,session,[correct_prediction,y_pred_cls],feed_dict_test)
utils.plot_weights(session,weights)

The model has now been trained for 1000 optimization iterations, with each iteration using 100 images from the training-set. Because of the great variety of the images, the weights have now become difficult to interpret and we may doubt whether the model truly understands how digits are composed from lines, or whether the model has just memorized many different variations of pixels.

We can also print and plot the so-called confusion matrix, which shows more detail about the misclassifications. For example, it may show that images actually depicting a 5 are misclassified as various other digits, most often as 6 or 8; the exact counts vary from run to run.

In [None]:
utils.print_confusion_matrix(session,y_test,y_pred_cls,feed_dict_test)
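A confusion matrix is simple to build by hand: entry `[i, j]` counts the images whose true class is `i` and whose predicted class is `j`, so correct predictions lie on the diagonal. The NumPy sketch below uses a handful of made-up labels and is only an illustration; the notebook itself relies on `utils.print_confusion_matrix` (and imports scikit-learn's `confusion_matrix`).

```python
import numpy as np

def confusion(true_cls, pred_cls, num_classes):
    """cm[i, j] = number of samples with true class i and predicted class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (true_cls, pred_cls), 1)  # handles repeated (i, j) pairs correctly
    return cm

demo_true = np.array([5, 5, 5, 3, 8, 6])
demo_pred = np.array([5, 6, 8, 3, 8, 6])
demo_cm = confusion(demo_true, demo_pred, num_classes=10)
print(demo_cm[5])        # row for true 5s: one each in columns 5, 6 and 8
print(np.trace(demo_cm)) # 4 correct predictions on the diagonal
```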

We are now done using TensorFlow, so we close the session to release its resources.

In [None]:
session.close()

## Model in One Script


We can now compress this model into a single script. Notice the use of `tf.variable_scope`. This will become important later on when we want to organize our computational graph for variable reuse and to collect relevant sets of variables.

In [None]:
### Hyper parameters
num_iterations = 1000
batch_size = 100
learning_rate = 0.5
display_freq = 100

### This will erase the computational graph in memory
tf.reset_default_graph()




####################################
### define the model operations ####
####################################
with tf.variable_scope('inputs'):
    x = tf.placeholder(tf.float32, [None, H,W])
    y_true_cls = tf.placeholder(tf.int64, [None])

with tf.variable_scope('model_ops'):
    # define model parameters
    weights = tf.Variable(tf.zeros([img_size_flat, num_classes]))
    biases = tf.Variable(tf.zeros([num_classes]))

    # create linear layer
    x_flat = tf.reshape(x,[tf.shape(x)[0],img_size_flat])
    logits = tf.matmul(x_flat, weights) + biases
    # convert the logits into class probabilities (softmax)
    y_pred = tf.nn.softmax(logits)

with tf.variable_scope('loss_ops'): 
    y_pred_cls = tf.argmax(y_pred, axis=1)
    y_true = tf.one_hot(y_true_cls,depth = num_classes)
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits,
                                                               labels=y_true)
    cost = tf.reduce_mean(cross_entropy)

with tf.variable_scope('optimization_ops'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

with tf.variable_scope('metrics'):
    correct_prediction = tf.equal(y_pred_cls, y_true_cls)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

### Define our test set feed dictionary
feed_dict_test = {x: x_test,
                  y_true_cls: y_test}
    
### Start a session to train and test the model    
with tf.Session() as sess:
    ### Initialize model parameters
    tf.global_variables_initializer().run()
    ### Optimize model parameters 
    for i in range(num_iterations):
        x_batch, y_true_batch, _ = utils.batch_data(x_train,y_train,batch_size=batch_size)
        feed_dict = {x: x_batch,
                    y_true_cls: y_true_batch}

        sess.run(optimizer, feed_dict=feed_dict) 
        
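        if i % display_freq == 0:
            # Report loss and accuracy on the current training batch
            # (the test set is only evaluated after training).
            loss_batch, acc_batch = sess.run([cost, accuracy], feed_dict=feed_dict)
            print("iter {0:3d}:\t Loss={1:.2f},\tTraining Accuracy={2:.01%}".
                  format(i, loss_batch, acc_batch))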
        
        

    print('****** Training Complete ******')
    utils.print_accuracy(sess,accuracy,feed_dict_test)
    utils.plot_example_errors(x_test,y_test,sess,[correct_prediction,y_pred_cls],feed_dict_test)
    utils.plot_weights(sess,weights)   
    


## Exercises

These are a few suggestions for exercises that may help improve your skills with TensorFlow. It is important to get hands-on experience with TensorFlow in order to learn how to use it properly.

You may want to back up this Notebook before making any changes.

* Change the learning-rate for the optimizer.
* Change the optimizer to e.g. `AdagradOptimizer` or `AdamOptimizer`.
* Change the batch-size to e.g. 1 or 1000.
* Change the number of iterations used for optimization.
* How do these changes affect the performance?
* Do you think these changes will have the same effect (if any) on other classification problems and mathematical models?
* Do you get the exact same results if you run the Notebook multiple times without changing any parameters? Why or why not?
* Use `sparse_softmax_cross_entropy_with_logits` instead of `softmax_cross_entropy_with_logits_v2`. This may require several changes in multiple places in the source-code. Discuss the advantages and disadvantages of the two methods.