# Deep Neural Net on MNIST Data

This notebook implements the deep neural network discussed in the Bitfusion blog post "Intro to Tensorflow." More details about the code can be found there.

What follows assumes you are set up on the Bitfusion TensorFlow 1.0 AMI and have already run the `./setup.sh` script to download the required data.

## Set up the Environment

First we import the required packages and adjust TensorFlow (TF) logging. Because `tf.contrib.learn` is still part of contrib and not yet core, it emits a number of warning messages. If one of these warnings ever requires a code change, we will update the code. The commented-out line at the end of the cell shows what we would use if TF were not producing so much warning text (and what should be used in the future).

In [1]:
# Read in required packages
import tensorflow as tf
import tensorflow.contrib.layers as layers
import tensorflow.contrib.learn as learn
from tensorflow.contrib.learn.python.learn.metric_spec import MetricSpec
import tensorflow.contrib.metrics as tfmetrics
import cPickle
import gzip

# Override the logging output produced by tf.contrib.learn
from logging import StreamHandler, INFO, getLogger

logger = getLogger('tensorflow')
# Remove TensorFlow's default handler, if one is attached
if logger.handlers:
    logger.removeHandler(logger.handlers[0])

logger.setLevel(INFO)


class DebugFileHandler(StreamHandler):
    """Stream handler that emits INFO records only and drops everything else."""

    def __init__(self):
        StreamHandler.__init__(self)

    def emit(self, record):
        if record.levelno != INFO:
            return
        StreamHandler.emit(self, record)

logger.addHandler(DebugFileHandler())

# Once the contrib warnings are resolved, this single call should replace the handler above:
# tf.logging.set_verbosity(tf.logging.INFO)
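
The same INFO-only behavior can also be sketched with the standard library's `logging.Filter`, independent of TensorFlow. This is a minimal Python 3 illustration, not the notebook's code: the logger name `demo` and the names `InfoOnlyFilter` and `build_info_only_logger` are hypothetical and introduced here only for the example.

```python
import io
import logging


class InfoOnlyFilter(logging.Filter):
    """Hypothetical helper: let INFO records through and drop all others."""

    def filter(self, record):
        return record.levelno == logging.INFO


def build_info_only_logger(name, stream):
    """Return a logger whose single handler writes only INFO records to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records away from the root logger
    handler = logging.StreamHandler(stream)
    handler.addFilter(InfoOnlyFilter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
demo_logger = build_info_only_logger('demo', buffer)
demo_logger.info('loss = 2.38357, step = 1')  # written to the buffer
demo_logger.warning('deprecated argument')    # filtered out by InfoOnlyFilter
captured = buffer.getvalue()
```

A filter keeps the level check separate from the handler, so the same rule could be attached to several handlers if needed.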

## Reading the Data

For this project we read in the MNIST dataset that the `./setup.sh` script downloaded.

In [2]:
# Read the data; the context manager closes the file even if loading fails
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = cPickle.load(f)
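
The cell above relies on `cPickle`, which exists only in Python 2. On Python 3, a loader could look like the sketch below. It assumes the file holds three `(features, labels)` pairs, as the unpacking above implies. `load_mnist` is a hypothetical helper name, and `encoding='latin1'` is the usual way to read NumPy arrays that were pickled under Python 2.

```python
import gzip
import pickle


def load_mnist(path='mnist.pkl.gz'):
    """Load the (train, valid, test) splits from a gzipped pickle (Python 3)."""
    with gzip.open(path, 'rb') as f:
        # encoding='latin1' lets Python 3 decode byte strings written by Python 2
        train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
    return train_set, valid_set, test_set
```

Calling `train_set, valid_set, test_set = load_mnist()` would then stand in for the cell above.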

## Defining the Model Using `tf.contrib.learn`

We will now define the model structure for a deep neural network (DNN). The `tf.contrib.learn` framework provides pre-packaged models that can be run in a very similar manner to a scikit-learn model.

In [3]:
# Infer the feature columns
feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(train_set[0])

# Define the DNN classifier
classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[512, 128],
    n_classes=10
)

Using default config.
Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_task_type': None, '_environment': 'local', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f134c1dd750>, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_task_id': 0, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_evaluation_master': '', '_keep_checkpoint_every_n_hours': 10000, '_master': ''}
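
To get a feel for the size of this network, the sketch below counts the trainable parameters of a fully connected network with 784 inputs (one per pixel of a 28x28 MNIST image), hidden layers of 512 and 128 units, and 10 output classes. It assumes one weight matrix and one bias vector per layer and nothing else; `count_dense_params` is a hypothetical helper, not part of `tf.contrib.learn`.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first and output last.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out  # weight matrix
        total += fan_out           # bias vector
    return total


# 784 input pixels -> 512 -> 128 -> 10 classes
n_params = count_dense_params([784, 512, 128, 10])
print(n_params)  # 468874
```

Most of the parameters sit in the first layer (784 x 512 weights), which is typical when a wide hidden layer follows a high-dimensional input.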


## Training the `tf.contrib.learn` Model

Next we train the model by calling the `fit` method on the `DNNClassifier` object. Training runs for 10,000 steps with a batch size of 100.

In [4]:
classifier.fit(x=train_set[0],
               y=train_set[1],
               batch_size=100,
               steps=10000)

Create CheckpointSaverHook.
Saving checkpoints for 1 into /tmp/tmpl6L7el/model.ckpt.
loss = 2.38357, step = 1
global_step/sec: 291.04
loss = 0.364843, step = 101
global_step/sec: 296.332
loss = 0.293239, step = 201
global_step/sec: 294.974
loss = 0.162038, step = 301
global_step/sec: 294.066
loss = 0.306019, step = 401
global_step/sec: 290.946
loss = 0.146603, step = 501
global_step/sec: 300.217
loss = 0.276031, step = 601
global_step/sec: 294.977
loss = 0.158162, step = 701
global_step/sec: 297.202
loss = 0.178328, step = 801
global_step/sec: 300.972
loss = 0.092749, step = 901
global_step/sec: 295.86
loss = 0.163923, step = 1001
global_step/sec: 297.505
loss = 0.0521841, step = 1101
global_step/sec: 300.212
loss = 0.0600116, step = 1201
global_step/sec: 296.639
loss = 0.0985562, step = 1301
global_step/sec: 299.17
loss = 0.0707446, step = 1401
global_step/sec: 297.624
loss = 0.0718092, step = 1501
global_step/sec: 299.183
loss = 0.0541069, step = 1601
global_step/

DNNClassifier(params={'head': <tensorflow.contrib.learn.python.learn.estimators.head._MultiClassHead object at 0x7f134c1c2fd0>, 'hidden_units': [512, 128], 'feature_columns': (_RealValuedColumn(column_name='', dimension=784, default_value=None, dtype=tf.float32, normalizer=None),), 'embedding_lr_multipliers': None, 'optimizer': None, 'dropout': None, 'gradient_clip_norm': None, 'activation_fn': <function relu at 0x7f1301963668>, 'input_layer_min_slice_size': None})
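
To relate steps to passes over the data: each step consumes one batch, so 10,000 steps at a batch size of 100 process 1,000,000 examples. The sketch below converts that to epochs, assuming the training split holds 50,000 examples. That is the usual size for this `mnist.pkl.gz` file, but it is not confirmed anywhere in this notebook. `epochs_seen` and `ASSUMED_TRAIN_SIZE` are hypothetical names.

```python
def epochs_seen(steps, batch_size, n_examples):
    """Return the (possibly fractional) number of passes over the data."""
    return steps * batch_size / float(n_examples)


# Assumed training-set size; check len(train_set[0]) in the notebook
ASSUMED_TRAIN_SIZE = 50000
n_epochs = epochs_seen(steps=10000, batch_size=100, n_examples=ASSUMED_TRAIN_SIZE)
print(n_epochs)
```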

## Testing the Trained Model

Finally, we measure the accuracy of this model on the held-out test set.

In [5]:
score = classifier.evaluate(x=test_set[0], y=test_set[1])
print('Accuracy: {0:f}'.format(score['accuracy']))

Starting evaluation at 2017-03-10-18:21:46
Finished evaluation at 2017-03-10-18:21:46
Saving dict for global step 10000: accuracy = 0.9814, auc = 0.998549, global_step = 10000, loss = 0.0636759


Accuracy: 0.981400
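
The accuracy reported above is the fraction of test examples whose predicted class matches the true label. A minimal, framework-free sketch of that metric follows. `accuracy` is a hypothetical helper, and the short label lists are made-up illustration data, not MNIST results.

```python
def accuracy(predictions, labels):
    """Fraction of positions where the predicted class equals the true class."""
    if len(predictions) != len(labels):
        raise ValueError('predictions and labels must have the same length')
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / float(len(labels))


# Made-up example: 4 of 5 predictions are correct
example_accuracy = accuracy([7, 2, 1, 0, 4], [7, 2, 1, 0, 9])
```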


## Defining Our Own Model Using `tf.contrib.layers`

Next we illustrate how to implement your own DNN architecture. For this example we implement the exact same model that we already trained; the purpose is to get comfortable with defining models using `tf.contrib.layers`.

In [6]:
# Model Definition
def fully_connected_model(features, labels):
    # Flatten the inputs and one-hot encode the 10 digit classes
    features = layers.flatten(features)
    labels = tf.one_hot(tf.cast(labels, tf.int32), 10, 1, 0)

    # Two ReLU hidden layers (512 and 128 units) followed by a linear output layer
    layer1 = layers.fully_connected(features, 512, activation_fn=tf.nn.relu, scope='fc1')
    layer2 = layers.fully_connected(layer1, 128, activation_fn=tf.nn.relu, scope='fc2')
    logits = layers.fully_connected(layer2, 10, activation_fn=None, scope='out')

    # Mean softmax cross-entropy over the batch
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))

    # Plain SGD with a fixed learning rate of 0.01
    train_op = layers.optimize_loss(
        loss,
        tf.contrib.framework.get_global_step(),
        optimizer='SGD',
        learning_rate=0.01
    )

    # Return the predicted class, the loss, and the training op
    return tf.argmax(logits, 1), loss, train_op

custom_classifier = learn.Estimator(model_fn=fully_connected_model)

Using default config.
Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_task_type': None, '_environment': 'local', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f12efb81190>, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_task_id': 0, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_evaluation_master': '', '_keep_checkpoint_every_n_hours': 10000, '_master': ''}
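
The loss in `fully_connected_model` is the mean softmax cross-entropy between the logits and the one-hot labels. The plain-Python sketch below mirrors that computation for a single example, to show what the loss measures. `one_hot` and `softmax_cross_entropy` are hypothetical helpers written for illustration, not TensorFlow functions.

```python
import math


def one_hot(label, n_classes):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(n_classes)]


def softmax_cross_entropy(logits, onehot_label):
    """Cross-entropy between softmax(logits) and a one-hot label vector."""
    # Subtract the max logit for numerical stability before exponentiating
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(t * lp for t, lp in zip(onehot_label, log_probs))


# A network that assigns identical logits to all 10 classes
uniform_loss = softmax_cross_entropy([0.0] * 10, one_hot(3, 10))
```

With identical logits the loss equals ln(10), about 2.30, which is close to the step-1 losses in this notebook's training logs.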


## Train the Custom Model

We train the custom model with the same number of steps and the same batch size as before.

In [7]:
custom_classifier.fit(x=train_set[0],
                      y=train_set[1],
                      batch_size=100,
                      steps=10000)

Create CheckpointSaverHook.
Saving checkpoints for 1 into /tmp/tmpYMw7hX/model.ckpt.
loss = 2.35727, step = 1
global_step/sec: 279.355
loss = 1.4986, step = 101
global_step/sec: 287.367
loss = 0.900241, step = 201
global_step/sec: 282.195
loss = 0.625328, step = 301
global_step/sec: 281.372
loss = 0.618043, step = 401
global_step/sec: 281.072
loss = 0.49757, step = 501
global_step/sec: 286.787
loss = 0.455548, step = 601
global_step/sec: 288.139
loss = 0.4886, step = 701
global_step/sec: 288.006
loss = 0.426209, step = 801
global_step/sec: 287.755
loss = 0.315149, step = 901
global_step/sec: 283.779
loss = 0.411261, step = 1001
global_step/sec: 286.017
loss = 0.312354, step = 1101
global_step/sec: 285.877
loss = 0.376406, step = 1201
global_step/sec: 285.194
loss = 0.333029, step = 1301
global_step/sec: 287.648
loss = 0.306934, step = 1401
global_step/sec: 286.285
loss = 0.298749, step = 1501
global_step/sec: 289.933
loss = 0.283361, step = 1601
global_step/sec: 289.549
loss = 0.303073

Estimator(params=None)

## Test the Model

In [8]:
# Model Testing
score = custom_classifier.evaluate(x=test_set[0], y=test_set[1],
                                   metrics={'accuracy': MetricSpec(tfmetrics.streaming_accuracy)})
print('Accuracy: {0:f}'.format(score['accuracy']))

Starting evaluation at 2017-03-10-18:22:22
Finished evaluation at 2017-03-10-18:22:22
Saving dict for global step 10000: accuracy = 0.9555, global_step = 10000, loss = 0.14439


Accuracy: 0.955500


## Summary and Note on Accuracy

This is a great starting point for going deeper with TensorFlow. We implemented a DNN twice: once with a pre-built model from `tf.contrib.learn` and once as our own model built with `tf.contrib.layers`. Moving forward, we will expand on the custom model. For the raw code of the final model, see the `model.py` file in this repository.

One thing I want to address is that the custom model is less accurate than the pre-built model (0.9555 versus 0.9814 on the test set). We will cover optimizers in future posts, but the short version is that the optimizer used by the `DNNClassifier` class converges better on this dataset. You may also have noticed that we never specified a learning rate when defining the `DNNClassifier`. That is because the `DNNClassifier` uses an Adagrad optimizer with its own default settings, whereas our custom model uses traditional Stochastic Gradient Descent (SGD) with an explicit learning rate. We will go over these in more detail in later posts.
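
To make the difference between the two optimizers concrete, here is a sketch of a single parameter update under each rule. It uses the textbook form of Adagrad with a small `eps` for numerical stability; TensorFlow's implementation may differ in details such as how the accumulator is initialized. `sgd_step` and `adagrad_step` are hypothetical helpers, and the learning rates are illustrative values only.

```python
import math


def sgd_step(w, grad, learning_rate):
    """Plain SGD: move against the gradient by a fixed learning rate."""
    return w - learning_rate * grad


def adagrad_step(w, grad, accumulator, learning_rate, eps=1e-8):
    """Textbook Adagrad: scale the step by the root of accumulated squared gradients."""
    accumulator = accumulator + grad * grad
    w = w - learning_rate * grad / (math.sqrt(accumulator) + eps)
    return w, accumulator


# One update from w = 1.0 with gradient 2.0
w_sgd = sgd_step(1.0, 2.0, learning_rate=0.01)
w_ada, acc = adagrad_step(1.0, 2.0, accumulator=0.0, learning_rate=0.05)
```

Because Adagrad divides by the accumulated gradient magnitude, each parameter gets its own effective step size, and that step size shrinks as squared gradients accumulate.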