# EECS 531: Computer Vision Assignment 3
**David Fan**

3/30/18

In this notebook we will be exploring different packages for training and configuring deep neural networks.

## Set Up Keras

In [54]:
import keras
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout
from keras import backend as K

We'll be using the MNIST dataset for these first tests to replicate the demo's work.

In [35]:
from keras.datasets import mnist
(imgTrain, labelTrain), (imgTest, labelTest) = mnist.load_data()

## Format dataset

Keras is essentially a high-level API over different deep neural network backends, so we need to reshape our image data into the format that the active backend expects. We are currently using TensorFlow as the Keras backend.

In [51]:
def format_data(imgTrain, imgTest):
    imgRows, imgCols = 28, 28

    if K.image_data_format() == 'channels_first':
        imgTrain = imgTrain.reshape(imgTrain.shape[0], 1, imgRows, imgCols)
        imgTest  = imgTest.reshape(imgTest.shape[0], 1, imgRows, imgCols)
        smpSize  = (1, imgRows, imgCols)
    else:
        imgTrain = imgTrain.reshape(imgTrain.shape[0], imgRows, imgCols, 1)
        imgTest  = imgTest.reshape(imgTest.shape[0], imgRows, imgCols, 1)
        smpSize  = (imgRows, imgCols, 1)

    # Scale pixel values from [0, 255] to [0, 1].
    imgTrain = imgTrain.astype('float') / 255
    imgTest  = imgTest.astype('float') / 255

    print('Training set in shape of ', imgTrain.shape, ' with element type ', type(imgTrain.item(0)))
    print('Testing set in shape of  ', imgTest.shape, ' with element type ', type(imgTest.item(0)))
    
    return (imgTrain, imgTest, smpSize)

imgTrain, imgTest, smpSize = format_data(imgTrain, imgTest)

Training set in shape of  (60000, 28, 28, 1)  with element type  <class 'float'>
Testing set in shape of   (10000, 28, 28, 1)  with element type  <class 'float'>


Currently each label is a single scalar holding the numerical value of the handwritten digit. For our model we would prefer one-hot categorical vectors, so we use a convenient built-in Keras function for the conversion:

In [37]:
def label_to_onehot(labelTrain, labelTest):
    ncat = 10 

    onehotTrain = keras.utils.to_categorical(labelTrain, ncat)
    onehotTest  = keras.utils.to_categorical(labelTest, ncat)
    
    return (onehotTrain, onehotTest)

onehotTrain, onehotTest = label_to_onehot(labelTrain, labelTest)
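
To make the encoding concrete, here is a minimal NumPy sketch of what a one-hot conversion does. The helper name `to_onehot` is hypothetical and is not part of Keras; it only illustrates the idea, assuming integer labels in the range 0 to `ncat - 1`.

```python
import numpy as np

def to_onehot(labels, ncat):
    """Return a (len(labels), ncat) array with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int)
    onehot = np.zeros((labels.shape[0], ncat), dtype=np.float32)
    # In each row, set the column given by that row's label to 1.
    onehot[np.arange(labels.shape[0]), labels] = 1.0
    return onehot

example = to_onehot([3, 0, 9], ncat=10)
print(example.shape)  # (3, 10)
print(example[0])     # the only non-zero entry is at index 3
```

Each row has exactly one non-zero entry, which is the form the categorical cross-entropy loss expects for its targets.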

# Replicating the Demo
## Define the model 

Now we use Keras to define our neural network. We first define the model as a sequential model which means it's just a linear stack of layers. This is a Keras built in model definition for simple model structures. Next, we add in a few layers to build the same model that the demo built:
1. A convolutional layer with 32 3x3 filters. Here we use the ReLU activation function. An explanation of the ReLU activation function can be found [here](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)). Essentially it just returns the positive part of its argument. We use it here because it is cheap to compute.
2. A max pooling layer with a 2x2 filter to perform non-linear down-sampling.
3. A flattening layer to reshape the data for the final linear transformation.
4. A fully connected (dense) layer with softmax activation. Softmax is very commonly used as the activation of the final layer of a classification network because it turns the outputs into probabilities that sum to 1. More detail can be found [here](https://en.wikipedia.org/wiki/Softmax_function).
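
The two activation functions named in the list can be sketched in a few lines of NumPy. This is only an illustration of the math, not how Keras implements them; the function names and the example scores are made up for the demonstration.

```python
import numpy as np

def relu(x):
    # Keep positive entries and replace negative ones with zero.
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtracting the maximum before exponentiating improves numerical
    # stability and does not change the result.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

scores = np.array([-1.0, 0.0, 2.0])
print(relu(scores))         # the negative entry becomes 0
probs = softmax(scores)
print(probs, probs.sum())   # non-negative values that sum to 1
```

The largest score receives the largest probability, which is why the index of the maximum softmax output can be read as the predicted class.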

In [50]:
def simple_model(smpSize):
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=smpSize))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(10, activation='softmax'))
    
    return model

model = simple_model(smpSize)

## Compile the model
Here we compile our model. We use the cross-entropy loss function. A nice explanation of cross-entropy can be found [here](https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/). Keras requires us to choose an optimizer when we compile the model, so we'll use the Adadelta optimizer with its default parameters. We also ask Keras to report accuracy as a metric.
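
As a rough illustration of what the loss measures, the following NumPy sketch computes the categorical cross-entropy for one-hot targets and predicted probabilities. The function and the toy arrays are assumptions made for this example, and the clipping constant `eps` is an arbitrary small value; Keras's own implementation may differ in detail.

```python
import numpy as np

def categorical_crossentropy(onehot, probs, eps=1e-7):
    # Clip to avoid log(0); eps is an arbitrary small constant.
    probs = np.clip(probs, eps, 1.0)
    # Per-sample loss is minus the log-probability of the true class.
    per_sample = -np.sum(onehot * np.log(probs), axis=1)
    return per_sample.mean()

onehot = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
confident = np.array([[0.05, 0.90, 0.05],
                      [0.80, 0.10, 0.10]])
uncertain = np.array([[0.40, 0.30, 0.30],
                      [0.30, 0.40, 0.30]])

print(categorical_crossentropy(onehot, confident))  # smaller loss
print(categorical_crossentropy(onehot, uncertain))  # larger loss
```

The loss shrinks as the probability assigned to the correct class approaches 1, which is the behavior the optimizer exploits during training.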

In [45]:
def compile_simple(model):
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    
compile_simple(model)

## Fit the model
Here we tell Keras to fit our model. Not much about the parameters needs to be explained here except for the `batch_size` and the `epochs` parameters. `batch_size` defines the number of samples per gradient update while `epochs` defines the number of iterations over the entire input dataset.
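
To see how the two parameters interact, this small sketch counts the gradient updates for the 60,000 MNIST training images with the `batch_size=128` and `epochs=3` settings used in this notebook. It assumes that the final, smaller batch of each epoch is still processed.

```python
import math

n_samples = 60000   # MNIST training images, as in the shapes printed earlier
batch_size = 128
epochs = 3

# The last batch of an epoch may be smaller, hence the ceiling.
updates_per_epoch = math.ceil(n_samples / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch)  # 469
print(total_updates)      # 1407
```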

In [46]:
def fit_simple(model):
    model.fit(imgTrain, onehotTrain, validation_data=(imgTest, onehotTest), batch_size=128, epochs=3, verbose=1)
    
fit_simple(model)

Train on 60000 samples, validate on 10000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


## Evaluate the model
Here we tell Keras to evaluate our model and report the metrics we defined earlier. We can now see the accuracy of our model on the MNIST test set:

In [47]:
def evaluate(model):
    score = model.evaluate(imgTest, onehotTest, verbose=0)
    print('Test loss     :', score[0])
    print('Test accuracy :', score[1])
    
evaluate(model)

Test loss     : 0.07534846454262734
Test accuracy : 0.9771
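
For intuition, the accuracy reported for one-hot labels can be thought of as the fraction of samples whose highest-probability class matches the label. The sketch below illustrates that idea in NumPy; the function and the toy data are made up for this example and are not taken from Keras.

```python
import numpy as np

def accuracy(onehot, probs):
    # Compare the index of the largest predicted probability with the
    # index of the 1 in the one-hot label, then average the matches.
    predicted = np.argmax(probs, axis=1)
    actual = np.argmax(onehot, axis=1)
    return np.mean(predicted == actual)

onehot = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3],   # wrong: predicts class 1, label is 2
                  [0.2, 0.6, 0.2]])
print(accuracy(onehot, probs))  # 0.75
```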


## Fashion MNIST
Let's train the previous architecture on the Fashion MNIST dataset. This is another built-in Keras dataset that can be found on their [website](https://keras.io/datasets/). It has the same shape and size as the MNIST dataset, but instead of handwritten digits it contains images of articles of clothing. We're using this dataset for comparison because it doesn't require us to change anything about the shape of our network, so we can make a quick and direct comparison.

In [48]:
from keras.datasets import fashion_mnist
(imgTrain, labelTrain), (imgTest, labelTest) = fashion_mnist.load_data()

In [49]:
# format_data returns three values and simple_model needs the sample shape.
imgTrain, imgTest, smpSize = format_data(imgTrain, imgTest)
onehotTrain, onehotTest = label_to_onehot(labelTrain, labelTest)
model = simple_model(smpSize)
compile_simple(model)
fit_simple(model)
evaluate(model)

Training set in shape of  (60000, 28, 28, 1)  with element type  <class 'float'>
Testing set in shape of   (10000, 28, 28, 1)  with element type  <class 'float'>
Train on 60000 samples, validate on 10000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3
Test loss     : 0.3372094687461853
Test accuracy : 0.8837


# Making a More Complex Network
First we need to get our MNIST data again.

In [52]:
(imgTrain, labelTrain), (imgTest, labelTest) = mnist.load_data()
imgTrain, imgTest, smpSize = format_data(imgTrain, imgTest)
onehotTrain, onehotTest = label_to_onehot(labelTrain, labelTest)

Training set in shape of  (60000, 28, 28, 1)  with element type  <class 'float'>
Testing set in shape of   (10000, 28, 28, 1)  with element type  <class 'float'>


## Define the model
This time around we're going to define a more complex CNN. We'll be using the following layers:
1. A convolutional layer using the ReLU activation function with 32 5x5 filters.
2. A max-pooling layer with a 2x2 filter.
3. A convolutional layer using the ReLU activation function with 64 5x5 filters.
4. A max-pooling layer with a 2x2 filter.
5. A flattening layer.
6. A dense layer with 1024 units using the ReLU activation function.
7. A dropout layer with rate 0.4
8. A final dense layer using softmax activation
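
Dropout is the only layer type in this list that was not used in the simple model. The sketch below shows the commonly used "inverted dropout" formulation with the same rate of 0.4, as an illustration only; the function name, the fixed seed, and the array of ones are assumptions for this example, not the Keras implementation.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    # At test time the layer passes its input through unchanged.
    if not training:
        return x
    # Keep each unit with probability (1 - rate) ...
    mask = rng.random(x.shape) >= rate
    # ... and rescale the survivors so the expected value is unchanged.
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
activations = np.ones((1000, 1024))
dropped = dropout(activations, rate=0.4, rng=rng)

print((dropped == 0).mean())   # close to 0.4
print(dropped.mean())          # close to 1.0
```

Randomly silencing units during training discourages the dense layer from relying on any single activation, which is why it is placed right before the final classifier.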

In [55]:
model = Sequential()
model.add(Conv2D(32, kernel_size=(5, 5),
                 activation='relu',
                 input_shape=smpSize))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Only the first layer needs input_shape; later layers infer their input.
model.add(Conv2D(64, kernel_size=(5, 5),
                 activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(.4))
model.add(Dense(10, activation='softmax'))

## Compile the model

In [56]:
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

## Fit the model

In [57]:
model.fit(imgTrain, onehotTrain, validation_data=(imgTest, onehotTest), batch_size=128, epochs=3, verbose=1)

Train on 60000 samples, validate on 10000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x181e2af208>

## Evaluate the model

In [58]:
score = model.evaluate(imgTest, onehotTest, verbose=0)
print('Test loss     :', score[0])
print('Test accuracy :', score[1])

Test loss     : 0.03277401327027474
Test accuracy : 0.9886


We were able to squeeze out a little over one percentage point of additional accuracy (0.9886 versus 0.9771) with our more complex model!

# Complex model in TensorFlow
For comparison purposes let's see how difficult it is to build the same architecture using TensorFlow directly. We were using TensorFlow as the Keras backend, but here we skip the Keras API and call TensorFlow ourselves.

In [60]:
import numpy as np
import tensorflow as tf

In [61]:
tf.logging.set_verbosity(tf.logging.INFO)

In [94]:
def cnn_model_fn(features, labels, mode):
    input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])
    conv1 = tf.layers.conv2d(
        inputs=input_layer,
        filters=32,
        kernel_size=[5,5],
        padding="same",
        activation=tf.nn.relu)
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
    conv2 = tf.layers.conv2d(
        inputs=pool1,
        filters=64,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

    # With "same" padding, the two 2x2 poolings reduce 28x28 to 7x7 (64 channels).
    pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
    dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
    # Dropout is only active while training.
    dropout = tf.layers.dropout(
        inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)
    
    logits = tf.layers.dense(inputs=dropout, units=10)
    
    predictions = {
        'classes': tf.argmax(input=logits, axis=1),
        'probabilities': tf.nn.softmax(logits, name="softmax_tensor")
    }
    
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
    
    onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10)
    loss=tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.AdagradOptimizer(learning_rate=0.01)
        train_op = optimizer.minimize(
            loss=loss,
            global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
    
    eval_metric_ops = {
        "accuracy": tf.metrics.accuracy(
            labels=labels, predictions=predictions["classes"])
    }
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

In [95]:
mnist = tf.contrib.learn.datasets.load_dataset("mnist")
train_data = mnist.train.images
train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
eval_data = mnist.test.images
eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

Extracting MNIST-data/train-images-idx3-ubyte.gz
Extracting MNIST-data/train-labels-idx1-ubyte.gz
Extracting MNIST-data/t10k-images-idx3-ubyte.gz
Extracting MNIST-data/t10k-labels-idx1-ubyte.gz


In [96]:
mnist_classifier = tf.estimator.Estimator(model_fn=cnn_model_fn, model_dir = "/tmp/mnist_convnet_model")

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/mnist_convnet_model', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x181b7b7550>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


In [97]:
tensors_to_log = {"probabilities": "softmax_tensor"}
logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=50)

In [98]:
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_data},
    y=train_labels,
    batch_size=128,
    num_epochs=3,
    shuffle=True)
mnist_classifier.train(
    input_fn=train_input_fn,
    hooks=[logging_hook])

INFO:tensorflow:Calling model_fn.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into /tmp/mnist_convnet_model/model.ckpt.
INFO:tensorflow:probabilities = [[0.09947185 0.09428356 0.09020665 ... 0.09754863 0.10479341 0.10819811]
 [0.0967101  0.09268875 0.09332853 ... 0.1029065  0.09900474 0.11861431]
 [0.10116234 0.10283292 0.09386651 ... 0.10139182 0.09759238 0.11258373]
 ...
 [0.09149599 0.11180939 0.09391053 ... 0.08764391 0.09967451 0.1082143 ]
 [0.09633005 0.1092521  0.08950268 ... 0.08119695 0.10351996 0.10277978]
 [0.09181795 0.13115564 0.09331577 ... 0.08530012 0.11110679 0.09617934]]
INFO:tensorflow

INFO:tensorflow:global_step/sec: 3.54988
INFO:tensorflow:probabilities = [[0.00008914 0.9919952  0.00024178 ... 0.00578251 0.00093296 0.00032162]
 [0.00001386 0.00002614 0.00210979 ... 0.00008718 0.0001348  0.00699584]
 [0.00023731 0.00004146 0.0014921  ... 0.00004484 0.9925707  0.00424605]
 ...
 [0.00000063 0.0000153  0.00000524 ... 0.00057431 0.00062672 0.8320926 ]
 [0.00002464 0.00000032 0.00088321 ... 0.00000059 0.00004074 0.0000173 ]
 [0.00000975 0.00001524 0.9903094  ... 0.00882894 0.0000039  0.00001754]] (13.918 sec)
INFO:tensorflow:loss = 0.104220234, step = 701 (28.170 sec)
INFO:tensorflow:probabilities = [[0.998895   0.00000363 0.00040399 ... 0.00004001 0.00000599 0.00018817]
 [0.0031018  0.00000261 0.00003349 ... 0.00015691 0.01229582 0.0119734 ]
 [0.00000018 0.00000002 0.00000339 ... 0.00007247 0.00007766 0.00059537]
 ...
 [0.00002519 0.00051321 0.00772043 ... 0.00001007 0.00247321 0.00001831]
 [0.00309345 0.00000324 0.00327289 ... 0.00000009 0.00221275 0.00000212]
 [0.    

<tensorflow.python.estimator.estimator.Estimator at 0x181b7b7518>

In [99]:
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": eval_data},
    y=eval_labels,
    num_epochs=1,  # one pass over the test set is enough for evaluation
    shuffle=False)
eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-05-05-22:40:11
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/mnist_convnet_model/model.ckpt-1290
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-05-05-22:40:31
INFO:tensorflow:Saving dict for global step 1290: accuracy = 0.9738, global_step = 1290, loss = 0.08406767
{'accuracy': 0.9738, 'loss': 0.08406767, 'global_step': 1290}


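Interestingly, we get slightly worse accuracy (0.9738) than when we implemented the model in Keras (0.9886). The two runs are not a like-for-like comparison, though: the TensorFlow version uses `padding="same"` in its convolutions and the Adagrad optimizer with a learning rate of 0.01, whereas the Keras version uses the default convolution padding and the Adadelta optimizer. The gap is therefore more likely due to these configuration differences, and to the lack of any tuning on the TensorFlow side, than to the frameworks themselves.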
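
One concrete difference between the two implementations is the convolution padding. The Keras cells do not set `padding`, so they use the Keras `Conv2D` default of `'valid'`, while the TensorFlow cell passes `padding="same"`. The sketch below is a rough illustration, assuming stride-1 convolutions and 2x2 pooling with stride 2, of how this changes the size of the flattened feature vector that feeds the 1024-unit dense layer. The helper names are hypothetical.

```python
def conv_out(size, kernel, padding):
    # Stride-1 convolution: 'same' keeps the size, 'valid' shrinks it.
    return size if padding == "same" else size - kernel + 1

def flattened_size(padding, size=28, kernel=5, pool=2, filters=(32, 64)):
    for _ in filters:
        size = conv_out(size, kernel, padding)  # convolution
        size //= pool                           # 2x2 max pooling, stride 2
    return size * size * filters[-1]

print(flattened_size("valid"))  # 1024
print(flattened_size("same"))   # 3136
```

The second value matches the `7*7*64` used in the TensorFlow reshape, so the dense layer in the TensorFlow model has roughly three times as many input weights as its Keras counterpart.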