# Evaluation Process
We can evaluate the model against the test dataset either after training has finished or in parallel with it. In this workshop we use the latter approach. In the former, evaluation runs over all the previously saved checkpoints, whereas in the latter it evaluates each new checkpoint as the training process writes it. Let's walk through the evaluation process.

Again, we import `tensorflow`, `mnist`, `lenet`, and `load_batch`.

In [2]:
import sys
print(sys.path)
print(sys.meta_path)

['', '/env/python', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', '/usr/local/lib/python2.7/dist-packages/IPython/extensions', '/content/.ipython']
[<six._SixMetaPathImporter object at 0x7f2489614750>, <pkg_resources.extern.VendorImporter instance at 0x7f2487a94a28>, <pkg_resources._vendor.six._SixMetaPathImporter object at 0x7f24879f1d50>, <IPython.utils.shimmodule.ShimImporter object at 0x7f2456d53790>, <IPython.utils.shimmodule.ShimImporter object at 0x7f2456d53810>]


In [0]:
import tensorflow as tf

from datasets import mnist
from model import lenet, load_batch

In [4]:
!ls -R


.:
data	       index.html?C=D;O=A  index.html?C=N;O=D  mnist_train.ipynb
datalab        index.html?C=D;O=D  index.html?C=S;O=A  model.py
datasets       index.html?C=M;O=A  index.html?C=S;O=D  model.pyc
Ellie	       index.html?C=M;O=D  log		       preprocessing
ict.icrar.org  index.html?C=N;O=A  mnist_eval.ipynb

./data:
index.html	    index.html?C=M;O=D	index.html?C=S;O=D
index.html?C=D;O=A  index.html?C=N;O=A	mnist_test.tfrecord
index.html?C=D;O=D  index.html?C=N;O=D	mnist_train.tfrecord
index.html?C=M;O=A  index.html?C=S;O=A

./datalab:

./datasets:
dataset_utils.py	       index.html?C=M;O=A  __init__.py
dataset_utils.pyc	       index.html?C=M;O=D  __init__.pyc
download_and_convert_mnist.py  index.html?C=N;O=A  mnist.py
index.html		       index.html?C=N;O=D  mnist.pyc
index.html?C=D;O=A	       index.html?C=S;O=A
index.html?C=D;O=D	       index.html?C=S;O=D

./Ellie:
data		    index.html?C=M;O=A	index.html?C=S;O=A  mnist_train.ipynb
datasets	    index.html?C=M

As in the training code, we define short aliases for some long module paths and specify the flags.

In [0]:
slim = tf.contrib.slim
metrics = tf.contrib.metrics

flags = tf.app.flags
flags.DEFINE_string('data_dir', './data/',
                    'Directory with the MNIST data.')
flags.DEFINE_integer('batch_size', 5, 'Batch size.')
flags.DEFINE_integer('eval_interval_secs', 60,
                     'Number of seconds between evaluations.')
flags.DEFINE_integer('num_evals', 100, 'Number of batches to evaluate.')
flags.DEFINE_string('log_dir', './log/',
                    'Directory where to log evaluation data.')
flags.DEFINE_string('checkpoint_dir', './log/',
                    'Directory with the model checkpoint data.')
FLAGS = flags.FLAGS

Load the dataset using `mnist.get_split`. Notice that we load the test split here, because the model must be evaluated on data that is separate from the training set. Otherwise the accuracy would be unrealistically high, i.e. 1 or very close to it. To test the quality of the recognition in real-world conditions, we must use digits that the system has NOT seen during training. A model could learn all the training digits by heart and still fail to recognize an "8" that I just wrote. The MNIST dataset contains 10,000 test digits.
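
To make the "learning by heart" point concrete, here is a minimal sketch in plain Python. The toy data and the `predict` and `accuracy` helpers are hypothetical and have nothing to do with LeNet. The "model" is just a lookup table of its training examples, so it scores perfectly on the training set and poorly on inputs it has never seen.

```python
# A toy "model" that only memorizes its training examples (hypothetical data).
train = [((0, 1), 'a'), ((1, 0), 'b'), ((1, 1), 'a')]
test = [((0, 0), 'b'), ((2, 1), 'a'), ((1, 2), 'b')]

memory = dict(train)

def predict(x):
    # Return the memorized label, or a fixed fallback guess for unseen inputs.
    return memory.get(x, 'a')

def accuracy(examples):
    # Fraction of examples whose predicted label matches the true label.
    correct = sum(1 for x, y in examples if predict(x) == y)
    return correct / float(len(examples))

train_accuracy = accuracy(train)  # every training example is memorized
test_accuracy = accuracy(test)    # unseen inputs fall back to the guess
```

The gap between the two numbers is exactly what evaluating on the training set would hide.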

In [0]:
dataset = mnist.get_split('test', FLAGS.data_dir)

images, labels = load_batch(
    dataset,
    FLAGS.batch_size,
    is_training=False)

Get the model prediction from the LeNet network.

In [0]:
predictions = lenet(images)

Convert the per-class prediction values into a single class prediction for each image by taking the class with the highest score.

In [0]:
predictions = tf.to_int64(tf.argmax(predictions, 1))
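
As an illustration of what the arg-max along axis 1 does, here is a small sketch in plain Python. The scores are made up for the example and the `argmax` helper is hypothetical. Each row holds one score per digit class, and the prediction is the index of the largest score in that row.

```python
def argmax(scores):
    # Index of the largest value; ties resolve to the first occurrence.
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical scores for a batch of two images over the ten digit classes.
batch_scores = [
    [0.01, 0.02, 0.05, 0.02, 0.01, 0.03, 0.01, 0.80, 0.03, 0.02],
    [0.10, 0.60, 0.05, 0.05, 0.05, 0.05, 0.02, 0.03, 0.03, 0.02],
]

# One predicted digit per image (per row), like an arg-max along axis 1.
predicted_digits = [argmax(row) for row in batch_scores]
```

With these scores the first image is classified as a 7 and the second as a 1.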

The accuracy is simply the percentage of correctly recognized digits, computed here on the test set. Alongside it we track the mean squared error between the predicted and true labels. You will see the accuracy go up if the training goes well.

In [0]:
metrics_to_values, metrics_to_updates = metrics.aggregate_metric_map({
    'mse': metrics.streaming_mean_squared_error(predictions, labels),
    'accuracy': metrics.streaming_accuracy(predictions, labels),
})
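
Each streaming metric comes as a pair: a value and an update operation. The following plain-Python sketch shows the idea with a hypothetical `StreamingAccuracy` class. It is a stand-in for illustration, not the TensorFlow implementation: the update step folds one batch into running totals, and the value is the accuracy over everything seen so far.

```python
class StreamingAccuracy(object):
    """Running accuracy over batches: a hypothetical stand-in, not the TF op."""

    def __init__(self):
        self.total = 0.0  # number of correct predictions seen so far
        self.count = 0.0  # number of examples seen so far

    def update(self, predictions, labels):
        # Plays the role of the update op: fold one batch into the totals.
        self.total += sum(1 for p, l in zip(predictions, labels) if p == l)
        self.count += len(labels)
        return self.value()

    def value(self):
        # Plays the role of the value tensor: accuracy over everything seen.
        return self.total / self.count if self.count else 0.0

acc = StreamingAccuracy()
acc.update([7, 1, 3, 3, 0], [7, 1, 3, 8, 0])  # 4 of 5 correct
acc.update([2, 2, 9, 4, 5], [2, 6, 9, 4, 1])  # 3 of 5 correct
final_accuracy = acc.value()                  # 7 correct out of 10 seen
```

This is why the evaluation loop is given the update operations to run on every batch, while the values are what get written as summaries.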

Write the metric values as summaries so that they can be plotted later. We will plot how the accuracy of the model evolves as training produces new checkpoints.

In [0]:
for metric_name, metric_value in metrics_to_values.items():
    tf.summary.scalar(metric_name, metric_value)

With the operations above defined, we are ready to launch the model evaluation. The function `slim.evaluation.evaluation_loop` evaluates the checkpoints found in `checkpoint_dir` in a loop, checking for a new one at intervals of `eval_interval_secs`. Recall that we have set the interval to 60 seconds.
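
It is worth checking how much of the test set one evaluation pass actually covers with the flag values chosen earlier. The arithmetic below is a sketch that assumes each batch yields distinct images, which depends on how `load_batch` reads the data. The variable names are only for illustration.

```python
batch_size = 5         # FLAGS.batch_size
num_evals = 100        # FLAGS.num_evals
test_set_size = 10000  # MNIST test digits

# Images seen in one evaluation pass over a checkpoint.
images_per_pass = batch_size * num_evals

# Share of the test set that one pass covers.
fraction_covered = images_per_pass / float(test_set_size)

# Ceiling division: batches needed to see every test digit once.
num_evals_full = (test_set_size + batch_size - 1) // batch_size
```

Under that assumption each pass looks at 500 images, or 5% of the test set, and covering all 10,000 digits would take 2,000 batches of 5.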

In [0]:
slim.evaluation.evaluation_loop(
    '',
    FLAGS.checkpoint_dir,
    FLAGS.log_dir,
    num_evals=FLAGS.num_evals,
    eval_op=list(metrics_to_updates.values()),
    eval_interval_secs=FLAGS.eval_interval_secs)

INFO:tensorflow:Waiting for new checkpoint at ./log/
INFO:tensorflow:Found new checkpoint at ./log/model.ckpt-0
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
INFO:tensorflow:Restoring parameters from ./log/model.ckpt-0
INFO:tensorflow:Starting evaluation at 2017-11-30-09:48:58
INFO:tensorflow:Evaluation [1/100]
INFO:tensorflow:Evaluation [2/100]
INFO:tensorflow:Evaluation [3/100]
INFO:tensorflow:Evaluation [4/100]
INFO:tensorflow:Evaluation [5/100]
INFO:tensorflow:Evaluation [6/100]
INFO:tensorflow:Evaluation [7/100]
INFO:tensorflow:Evaluation [8/100]
INFO:tensorflow:Evaluation [9/100]
INFO:tensorflow:Evaluation [10/100]
INFO:tensorflow:Evaluation [11/100]
INFO:tensorflow:Evaluation [12/100]
INFO:tensorflow:Evaluation [13/100]
INFO:tensorflow:Evaluation [14/100]
INFO:tensorflow:Evaluation [15/100]
INFO:tensorflow:Evaluation [16/100]
INFO:tensorflow:Evaluation [17/100]
INFO:tensorflow:Evaluation [18/100]
INFO:tensorflow:Evaluation [19/100]
INFO:tensorflo

INFO:tensorflow:Evaluation [60/100]
INFO:tensorflow:Evaluation [61/100]
INFO:tensorflow:Evaluation [62/100]
INFO:tensorflow:Evaluation [63/100]
INFO:tensorflow:Evaluation [64/100]
INFO:tensorflow:Evaluation [65/100]
INFO:tensorflow:Evaluation [66/100]
INFO:tensorflow:Evaluation [67/100]
INFO:tensorflow:Evaluation [68/100]
INFO:tensorflow:Evaluation [69/100]
INFO:tensorflow:Evaluation [70/100]
INFO:tensorflow:Evaluation [71/100]
INFO:tensorflow:Evaluation [72/100]
INFO:tensorflow:Evaluation [73/100]
INFO:tensorflow:Evaluation [74/100]
INFO:tensorflow:Evaluation [75/100]
INFO:tensorflow:Evaluation [76/100]
INFO:tensorflow:Evaluation [77/100]
INFO:tensorflow:Evaluation [78/100]
INFO:tensorflow:Evaluation [79/100]
INFO:tensorflow:Evaluation [80/100]
INFO:tensorflow:Evaluation [81/100]
INFO:tensorflow:Evaluation [82/100]
INFO:tensorflow:Evaluation [83/100]
INFO:tensorflow:Evaluation [84/100]
INFO:tensorflow:Evaluation [85/100]
INFO:tensorflow:Evaluation [86/100]
INFO:tensorflow:Evaluation [