# Optimising a TensorFlow SavedModel for Serving

This notebooks shows how to optimise the TensorFlow exported SavedModel by **shrinking** its size (to have less memory and disk footprints), and **improving** prediction latency. This can be accopmlished by applying the following:
* **Freezing**: That is, converting the variables stored in a checkpoint file of the SavedModel into constants stored directly in the model graph.
* **Pruning**: That is, stripping unused nodes during the prediction path of the graph, merging duplicate nodes, as well as removing other node ops like summary, identity, etc.
* **Quantisation**:  That is, converting any large float Const op into an eight-bit equivalent, followed by a float conversion op so that the result is usable by subsequent nodes.
* **Other refinements**: That includes constant folding, batch_norm folding, fusing convolusion, etc.

The optimisation operations we apply in this example are from the TensorFlow [Graph Conversion Tool](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md#fold_constants), which is a c++ command-line tool. We use the Python APIs to call the c++ libraries. 

The Graph Transform Tool is designed to work on models that are saved as GraphDef files, usually in a binary protobuf format. However, the model exported after training and estimator is in SavedModel format (saved_model.pb file + variables folder with variables.data-* and variables.index files). 

We need to optimise the mode and keep it the SavedModel format. Thus, the optimisation steps will be:
1. Freeze the SavedModel: SavedModel -> GraphDef
2. Optimisae the freezed model: GraphDef -> GraphDef
3. Convert the optimised freezed model to SavedModel: GraphDef -> SavedModel

In [1]:
import os
import numpy as np
from datetime import datetime

import tensorflow as tf
from tensorflow import data

print "TensorFlow : {}".format(tf.__version__)

TensorFlow : 1.10.0


## 1. Train and Export a TensorFlow DNNClassifier

### 1.1 Import Data

In [2]:
mnist = tf.contrib.learn.datasets.load_dataset("mnist")
train_data = mnist.train.images
train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
eval_data = mnist.test.images
eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)
NUM_CLASSES = 10

Instructions for updating:
Please use tf.data.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST-data/train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST-data/train-labels-idx1-ubyte.gz
Extracting MNIST-data/t10k-images-idx3-ubyte.gz
Extracting MNIST-data/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


In [91]:
print "Train data shape: {}".format(train_data.shape)
print "Eval data shape: {}".format(eval_data.shape)

Train data shape: (55000, 784)
Eval data shape: (10000, 784)


### 1.2 Estimator

#### 1.2.1 Model Function

In [4]:
def model_fn(features, labels, mode, params):

    # conv layers
    def _cnn_layers(input_layer, num_conv_layers, init_filters, mode):

        inputs = input_layer

        for i in range(num_conv_layers):

            current_filters = init_filters * (2**i)
            
            conv = tf.layers.conv2d(inputs=inputs, kernel_size=3, filters=current_filters, strides=1,
                                     padding='SAME', name='conv{}'.format(i+1))
            
            pool = tf.layers.max_pooling2d(inputs=conv, pool_size=2, strides=2,
                                            padding='SAME', name='pool{}'.format(i+1))
            
            batch_norm = tf.layers.batch_normalization(pool, name='batch_norm{}'.format(i+1))
            
            if params.debug == True:
                tf.summary.histogram('Batch_Normalisation', batch_norm)

            if mode==tf.estimator.ModeKeys.TRAIN:
                batch_norm = tf.nn.dropout(batch_norm, params.drop_out)
                
            inputs = batch_norm

        outputs = batch_norm
        return outputs

    # model body
    def _inference(features, mode, params):
        
        input_layer = tf.reshape(features["input_image"], [-1, 28, 28, 1], name='input_image')

        conv_outputs = _cnn_layers(input_layer, params.num_conv_layers, params.init_filters, mode)
        
        flatten = tf.layers.flatten(inputs=conv_outputs, name='flatten')
        
        fully_connected = tf.contrib.layers.stack(inputs=flatten, layer=tf.contrib.layers.fully_connected,
                                                stack_args=params.hidden_units,
                                                activation_fn=tf.nn.relu)
        if params.debug == True:
            tf.summary.histogram('Fully_Connected', fully_connected)
        
        # unused_layer
        unused_layers = tf.layers.dense(flatten, units=100, name='unused', activation=tf.nn.relu)
        
        logits = tf.layers.dense(fully_connected, units=NUM_CLASSES, name='logits', activation=None)
        return logits
    

    # model head
    head = tf.contrib.estimator.multi_class_head(n_classes=NUM_CLASSES)
    
    return head.create_estimator_spec(
            features=features,
            mode=mode,
            logits=_inference(features, mode, params),
            labels=labels,
            optimizer=tf.train.AdamOptimizer(params.learning_rate)
    )

#### 1.2.2 Create Custom Estimator

In [5]:
def create_estimator(params, run_config):

    # evaluation metric_fn
    def _metric_fn(labels, predictions):

        metrics = {}
        pred_class = predictions['class_ids']
        metrics['micro_accuracy'] = tf.metrics.mean_per_class_accuracy(
            labels=labels, predictions=pred_class, num_classes=NUM_CLASSES
        )

        return metrics

    mnist_classifier = tf.estimator.Estimator(
        model_fn=model_fn, params=params, config=run_config)

    mnist_classifier = tf.contrib.estimator.add_metrics(
        estimator=mnist_classifier, metric_fn=_metric_fn)
    
    return mnist_classifier

### Train and Evaluate Experiment

In [6]:
def run_experiment(hparam, run_config):
    
    train_spec = tf.estimator.TrainSpec(
        input_fn = tf.estimator.inputs.numpy_input_fn(
            x={"input_image": train_data},
            y=train_labels,
            batch_size=hparam.batch_size,
            num_epochs=None,
            shuffle=True),
        max_steps=hparams.max_traning_steps
    )

    eval_spec = tf.estimator.EvalSpec(
        input_fn = tf.estimator.inputs.numpy_input_fn(
            x={"input_image": train_data},
            y=train_labels,
            batch_size=hparam.batch_size,
            num_epochs=1,
            shuffle=False),
        steps=None,
        throttle_secs=hparams.eval_throttle_secs
    )

    print("Removing previous artifacts...")
    if tf.gfile.Exists(run_config.model_dir):
        tf.gfile.DeleteRecursively(run_config.model_dir)

    tf.logging.set_verbosity(tf.logging.INFO)

    time_start = datetime.utcnow() 
    print("Experiment started at {}".format(time_start.strftime("%H:%M:%S")))
    print(".......................................") 

    estimator = create_estimator(hparams, run_config)

    tf.estimator.train_and_evaluate(
        estimator=estimator,
        train_spec=train_spec, 
        eval_spec=eval_spec
    )

    time_end = datetime.utcnow() 
    print(".......................................")
    print("Experiment finished at {}".format(time_end.strftime("%H:%M:%S")))
    print("")
    time_elapsed = time_end - time_start
    print("Experiment elapsed time: {} seconds".format(time_elapsed.total_seconds()))
    
    return estimator


### Experiment Parameters

In [7]:
MODELS_LOCATION = 'models/mnist'
MODEL_NAME = 'cnn_classifier'
model_dir = os.path.join(MODELS_LOCATION, MODEL_NAME)

hparams  = tf.contrib.training.HParams(
    batch_size=100,
    hidden_units=[1024],
    num_conv_layers=2, 
    init_filters=64,
    drop_out=0.85,
    max_traning_steps=50, #20000,
    eval_throttle_secs=10,
    learning_rate=1e-3,
    debug=True
)

run_config = tf.estimator.RunConfig(
    tf_random_seed=19830610,
    save_checkpoints_steps=1000,
    keep_checkpoint_max=3,
    model_dir=model_dir
)

### TensorFlow Graph 
![image](tensorboard-unused.jpeg)

### Run Experiment

In [8]:
estimator = run_experiment(hparams, run_config)

Removing previous artifacts...
Experiment started at 17:15:17
.......................................
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_global_id_in_cluster': 0, '_session_config': None, '_keep_checkpoint_max': 3, '_tf_random_seed': 19830610, '_task_type': 'worker', '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x1232b5c90>, '_model_dir': 'models/mnist/cnn_classifier', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_master': '', '_save_checkpoints_steps': 1000, '_keep_checkpoint_every_n_hours': 10000, '_evaluation_master': '', '_service': None, '_device_fn': None, '_save_summary_steps': 100, '_num_ps_replicas': 0}
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_session_config': None, '_keep_checkpoint_max': 3, '_task_type': 'worker', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.C

### Export the model

In [9]:
def make_serving_input_receiver_fn():
    inputs = {'input_image': tf.placeholder(shape=[None,784], dtype=tf.float32, name='input_image')}
    return tf.estimator.export.build_raw_serving_input_receiver_fn(inputs)

export_dir = os.path.join(model_dir, 'export')

if tf.gfile.Exists(export_dir):
    tf.gfile.DeleteRecursively(export_dir)
        
estimator.export_savedmodel(
    export_dir_base=export_dir,
    serving_input_receiver_fn=make_serving_input_receiver_fn()
)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures EXCLUDED from export because they cannot be be served via TensorFlow Serving APIs:
INFO:tensorflow:'serving_default' : Classification input must be a single string Tensor; got {'input_image': <tf.Tensor 'input_image:0' shape=(?, 784) dtype=float32>}
INFO:tensorflow:'classification' : Classification input must be a single string Tensor; got {'input_image': <tf.Tensor 'input_image:0' shape=(?, 784) dtype=float32>}
INFO:tensorflow:Restoring parameters from models/mnist/cnn_classifier/model.ckpt-50
INFO

'models/mnist/cnn_classifier/export/1535131004'

## 2. Inspect the Exported SavedModel

In [10]:
%%bash

saved_models_base=models/mnist/cnn_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)
echo ${saved_model_dir}
ls ${saved_model_dir}
saved_model_cli show --dir=${saved_model_dir} --all

models/mnist/cnn_classifier/export/1535131004
saved_model.pb
variables

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_image'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 784)
        name: input_image:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['class_ids'] tensor_info:
        dtype: DT_INT64
        shape: (-1, 1)
        name: head/predictions/ExpandDims:0
    outputs['classes'] tensor_info:
        dtype: DT_STRING
        shape: (-1, 1)
        name: head/predictions/str_classes:0
    outputs['logits'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: logits/BiasAdd:0
    outputs['probabilities'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: head/predictions/probabilities:0
  Method name is: tensorflow/serving/predict


### Prediction with SavedModel

In [172]:
def inference_test(saved_model_dir, signature="predict", repeat=10):

    tf.logging.set_verbosity(tf.logging.ERROR)
    
    time_start = datetime.utcnow() 
    
    predictor = tf.contrib.predictor.from_saved_model(
        export_dir = saved_model_dir,
        signature_def_key=signature
    )
    
    output = None
    for i in range(repeat):
        output = predictor(
            {
                'input_image': eval_data[:10]
            }
        )
    
    time_end = datetime.utcnow() 

    time_elapsed = time_end - time_start
    print("Inference elapsed time: {} seconds".format(time_elapsed.total_seconds()))
    print ""

    
    print "Prediction produced for {} instances".format(len(output['class_ids']))
    print ""
    
    print "Prediction output for the last instance:"
    for key in output.keys():
        print "{}: {}".format(key,output[key][0])

## 3. Test Prediction with SavedModel 

In [175]:
saved_model_dir = os.path.join(export_dir, os.listdir(export_dir)[-1]) 
print(saved_model_dir)
inference_test(saved_model_dir, repeat=1000)

models/mnist/cnn_classifier/export/1535131004
Inference elapsed time: 18.519939 seconds

Prediction produced for 10 instances

Prediction output for the last instance:
probabilities: [3.4098350e-06 3.6456075e-08 6.1648639e-06 1.5369851e-05 9.4937924e-08
 1.6283158e-06 8.4963381e-10 9.9959904e-01 1.0351395e-05 3.6393310e-04]
logits: [-0.24423252 -4.7825427   0.34797013  1.2615114  -3.8254282  -0.98335
 -8.5416     12.3442135   0.8662256   4.426074  ]
classes: ['7']
class_ids: [7]


Latency for **10** instances, repeated **1000** is around **18.5 sec**.

### Describe GraphDef

In [186]:
def describe_graph(graph_def, show_nodes=False):
    
    print 'Input Features: {}'.format([node.name for node in graph_def.node if node.op=='Placeholder'])
    print 'Unused Nodes: {}'.format([node.name for node in graph_def.node if node.name=='unused'])
    print 'Output Probabilities: {}'.format( [node.name for node in graph_def.node if node.op=='Softmax'])
    print 'Constant Count: {}'.format( len([node for node in graph_def.node if node.op=='Const']))
    print 'Variable Count: {}'.format( len([node for node in graph_def.node if 'Variable' in node.op]))
    print 'Identity Count: {}'.format( len([node for node in graph_def.node if node.op=='Identity']))
    print 'Total nodes: {}'.format( len(graph_def.node))
    print ''
    
    if show_nodes==True:
        for node in graph_def.node:
            print(node.op, node.name)

## 4. Describe the SavedModel Graph (before optimisation)

### Load GraphDef from a SavedModel Directory

In [184]:
def get_graph_def_from_saved_model(saved_model_dir):
    
    print saved_model_dir
    print ""
    
    from tensorflow.python.saved_model import tag_constants
    
    with tf.Session() as session:
        meta_graph_def = tf.saved_model.loader.load(
            session,
            tags=[tag_constants.SERVING],
            export_dir=saved_model_dir
        )
        
    return meta_graph_def.graph_def

In [185]:
describe_graph(get_graph_def_from_saved_model(saved_model_dir), True)

models/mnist/cnn_classifier/export/1535131004

Input Features: [u'input_image']
Unused Noed: []
Output Probabilities: [u'head/predictions/probabilities']
Constant Count: 61
Variable Count: 19
Identity Count: 21
Total nodes: 211

(u'Const', u'global_step/Initializer/zeros')
(u'VariableV2', u'global_step')
(u'Assign', u'global_step/Assign')
(u'Identity', u'global_step/read')
(u'Placeholder', u'input_image')
(u'Const', u'input_image_1/shape')
(u'Reshape', u'input_image_1')
(u'Const', u'conv1/kernel/Initializer/random_uniform/shape')
(u'Const', u'conv1/kernel/Initializer/random_uniform/min')
(u'Const', u'conv1/kernel/Initializer/random_uniform/max')
(u'RandomUniform', u'conv1/kernel/Initializer/random_uniform/RandomUniform')
(u'Sub', u'conv1/kernel/Initializer/random_uniform/sub')
(u'Mul', u'conv1/kernel/Initializer/random_uniform/mul')
(u'Add', u'conv1/kernel/Initializer/random_uniform')
(u'VariableV2', u'conv1/kernel')
(u'Assign', u'conv1/kernel/Assign')
(u'Identity', u'conv1/kernel/read

### Get model size

In [33]:
def get_size(model_dir):
    
    print model_dir
    print ""
    
    pb_size = os.path.getsize(os.path.join(model_dir,'saved_model.pb'))
    
    variables_size = 0
    if os.path.exists(os.path.join(model_dir,'variables/variables.data-00000-of-00001')):
        variables_size = os.path.getsize(os.path.join(model_dir,'variables/variables.data-00000-of-00001'))
        variables_size += os.path.getsize(os.path.join(model_dir,'variables/variables.index'))

    print "Model siz: {} KB".format(round(pb_size/(1024.0),3))
    print "Variables size: {} KB".format(round( variables_size/(1024.0),3))
    print "Total Size: {} KB".format(round((pb_size + variables_size)/(1024.0),3))
    

In [34]:
get_size(saved_model_dir)

models/mnist/cnn_classifier/export/1535131004

Model siz: 43.121 KB
Variables size: 27877.202 KB
Total Size: 27920.323 KB


## 5. Freeze SavedModel

This function will convert the SavedModel into a GraphDef file (freezed_model.pb), and storing the variables as constrant to the freezed_model.pb

You need to define the graph output nodes for freezing. We are only interested in the **class_id**, which is produced by **head/predictions/ExpandDims** node

In [18]:
def freeze_graph(saved_model_dir):
    
    from tensorflow.python.tools import freeze_graph
    from tensorflow.python.saved_model import tag_constants
    
    output_graph_filename = os.path.join(saved_model_dir, "freezed_model.pb")
    output_node_names = "head/predictions/ExpandDims"
    initializer_nodes = ""

    freeze_graph.freeze_graph(
        input_saved_model_dir=saved_model_dir,
        output_graph=output_graph_filename,
        saved_model_tags = tag_constants.SERVING,
        output_node_names=output_node_names,
        initializer_nodes=initializer_nodes,

        input_graph=None, 
        input_saver=False,
        input_binary=False, 
        input_checkpoint=None, 
        restore_op_name=None, 
        filename_tensor_name=None, 
        clear_devices=False,
        input_meta_graph=False,
    )
    
    print "SavedModel graph freezed!"

In [19]:
freeze_graph(saved_model_dir)

SavedModel graph freezed!


In [20]:
%%bash
saved_models_base=models/mnist/cnn_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)
echo ${saved_model_dir}
ls ${saved_model_dir}

models/mnist/cnn_classifier/export/1535131004
freezed_model.pb
saved_model.pb
variables


## 6. Describe the freezed_model.pb Graph (after freezing)

### Load GraphDef from GraphDef File

In [21]:
def get_graph_def_from_file(graph_filepath):
    
    print graph_filepath
    print ""
    
    from tensorflow.python import ops
    
    with ops.Graph().as_default():
        with tf.gfile.GFile(graph_filepath, "rb") as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
            
            return graph_def
            

In [22]:
freezed_filepath=os.path.join(saved_model_dir,'freezed_model.pb')
describe_graph(get_graph_def_from_file(freezed_filepath))

models/mnist/cnn_classifier/export/1535131004/freezed_model.pb

Input Features: [u'input_image']
Output Probabilities: []
Constant Count: 23
Variable Count: 0
Identity Count: 16
Total nodes: 60



## 8. Optimise the freezed_model.pb

Note that, the optimised graph will replace freezed_model.pb

### Optimise GraphDef

In [41]:
def optimize_graph(model_dir, graph_filename, transforms):
    
    from tensorflow.tools.graph_transforms import TransformGraph
    
    input_names = []
    output_names = ['head/predictions/ExpandDims']
    
    graph_def = get_graph_def_from_file(os.path.join(model_dir, graph_filename))
    optimised_graph_def = TransformGraph(graph_def, 
                                         input_names,
                                         output_names,
                                         transforms 
                                        )
    tf.train.write_graph(optimised_graph_def,
                        logdir=model_dir,
                        as_text=False,
                        name='optimised_model.pb')
    
    print "Freezed graph optimised!"

In [155]:
transforms = [
    'remove_nodes(op=Identity)', 
    'fold_constants(ignore_errors=true)',
#     'fold_batch_norms',
    'round_weights(num_steps=256)',
#     'quantize_weights', 
#     'quantize_nodes',
    'merge_duplicate_nodes',
    'strip_unused_nodes', 
    'sort_by_execution_order'
]

optimize_graph(saved_model_dir, 'freezed_model.pb', transforms)

models/mnist/cnn_classifier/export/1535131004/freezed_model.pb

Freezed graph optimised!


In [156]:
%%bash
saved_models_base=models/mnist/cnn_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)
echo ${saved_model_dir}
ls ${saved_model_dir}

models/mnist/cnn_classifier/export/1535131004
freezed_model.pb
optimised
optimised_model.pb
saved_model.pb
variables


## 8. Describe the Optimised Graph

In [157]:
optimised_filepath=os.path.join(saved_model_dir,'optimised_model.pb')
describe_graph(get_graph_def_from_file(optimised_filepath))

models/mnist/cnn_classifier/export/1535131004/optimised_model.pb

Input Features: [u'input_image']
Output Probabilities: []
Constant Count: 20
Variable Count: 0
Identity Count: 0
Total nodes: 41



## 9. Convert Optimised graph (GraphDef) to SavedModel

In [158]:
def convert_graph_def_to_saved_model(graph_filepath):

    from tensorflow.python import ops
    export_dir=os.path.join(saved_model_dir,'optimised')

    if tf.gfile.Exists(export_dir):
        tf.gfile.DeleteRecursively(export_dir)

    graph_def = get_graph_def_from_file(graph_filepath)
    
    with tf.Session(graph=tf.Graph()) as session:
        tf.import_graph_def(graph_def, name="")
        tf.saved_model.simple_save(session,
                export_dir,
                inputs={
                    node.name: session.graph.get_tensor_by_name("{}:0".format(node.name)) 
                    for node in graph_def.node if node.op=='Placeholder'},
                outputs={
                    "class_ids": session.graph.get_tensor_by_name("head/predictions/ExpandDims:0"),
                }
            )

        print "Optimised graph converted to SavedModel!"

In [159]:
optimised_filepath=os.path.join(saved_model_dir,'optimised_model.pb')
convert_graph_def_to_saved_model(optimised_filepath)

models/mnist/cnn_classifier/export/1535131004/optimised_model.pb

Optimised graph converted to SavedModel!


### Optimised SavedModel Size

In [160]:
optimised_saved_model_dir = os.path.join(saved_model_dir,'optimised') 
get_size(optimised_saved_model_dir)

models/mnist/cnn_classifier/export/1535131004/optimised

Model siz: 25433.796 KB
Variables size: 0.0 KB
Total Size: 25433.796 KB


In [161]:
%%bash

saved_models_base=models/mnist/cnn_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)/optimised
ls ${saved_model_dir}
saved_model_cli show --dir ${saved_model_dir} --all

saved_model.pb
variables

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_image'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 784)
        name: input_image:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['class_ids'] tensor_info:
        dtype: DT_INT64
        shape: (-1, 1)
        name: head/predictions/ExpandDims:0
  Method name is: tensorflow/serving/predict


## 10. Prediction with the Optimised SavedModel

In [178]:
freezed_saved_model_dir = os.path.join(saved_model_dir,'optimised') 
print(freezed_saved_model_dir)
inference_test(saved_model_dir=freezed_saved_model_dir, signature='serving_default', repeat=1000)

models/mnist/cnn_classifier/export/1535131004/optimised
Inference elapsed time: 16.009274 seconds

Prediction produced for 10 instances

Prediction output for the last instance:
class_ids: [7]


Latency for **10** instances, repeated **1000** is around ** 15.8 sec**.