# Optimising a TensorFlow SavedModel for Serving

This notebook shows how to optimise an exported TensorFlow SavedModel by **shrinking** its size (to reduce its memory and disk footprint) and **improving** its prediction latency. This can be accomplished by applying the following:
* **Freezing**: converting the variables stored in the checkpoint files of the SavedModel into constants stored directly in the model graph.
* **Pruning**: stripping nodes that are not used in the prediction path of the graph, merging duplicate nodes, and removing other ops such as summary and identity nodes.
* **Quantisation**: converting any large float Const op into an eight-bit equivalent, followed by a float conversion op so that the result is usable by subsequent nodes.
* **Other refinements**: these include constant folding, batch norm folding, and fusing convolutions.
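
To make the quantisation idea concrete, here is a minimal sketch in plain Python. It is not the Graph Transform Tool's implementation; it assumes a simple linear mapping of each value onto 256 levels between the minimum and maximum of the tensor, and the names `quantise` and `dequantise` are hypothetical helpers introduced only for this illustration.

```python
def quantise(values, levels=256):
    """Map floats onto integer codes 0..levels-1 using the min/max range."""
    lo, hi = min(values), max(values)
    # Guard against a constant tensor, where the range would be zero.
    scale = (hi - lo) / (levels - 1) or 1.0
    codes = [int(round((v - lo) / scale)) for v in values]
    return codes, lo, scale


def dequantise(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


weights = [-1.5, -0.25, 0.0, 0.4, 2.0]
codes, lo, scale = quantise(weights)
restored = dequantise(codes, lo, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes)
print("max error: {:.5f}".format(max_error))
```

Each weight now needs one byte instead of four, at the cost of a rounding error of at most half a quantisation step.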

The optimisation operations we apply in this example come from the TensorFlow [Graph Transform Tool](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md#fold_constants), which is a C++ command-line tool. We use the Python APIs to call the C++ libraries.

The Graph Transform Tool is designed to work on models that are saved as GraphDef files, usually in a binary protobuf format. However, the model exported after training an estimator is in SavedModel format (a saved_model.pb file plus a variables folder with variables.data-* and variables.index files).

We need to optimise the model and keep it in the SavedModel format. Thus, the optimisation steps will be:
1. Freeze the SavedModel: SavedModel -> GraphDef
2. Optimise the frozen model: GraphDef -> GraphDef
3. Convert the optimised frozen model to SavedModel: GraphDef -> SavedModel

In [1]:
import os
from datetime import datetime

import tensorflow as tf
from tensorflow import data

print("TensorFlow : {}".format(tf.__version__))

TensorFlow : 1.10.0


## 1. Train and Export a TensorFlow DNNClassifier

### Dataset Metadata

In [2]:
CSV_HEADER = ['sepal_length', 'sepal_width',  'petal_length', 'petal_width', 'species']
NUMERIC_FEATURE_NAMES = ['sepal_length', 'sepal_width',  'petal_length', 'petal_width']
TARGET_NAME = 'species'
TARGET_LABELS = ['setosa', 'virginica', 'versicolor']

### Input Function

In [3]:
def make_input_fn(file_pattern, num_epochs=1, batch_size=50, mode=tf.estimator.ModeKeys.EVAL):
    
    def _input_fn():
        
        features = tf.contrib.data.make_csv_dataset(
            file_pattern=file_pattern,
            column_names=CSV_HEADER,
            header=True,
            num_epochs=num_epochs,
            batch_size=batch_size,
            shuffle=(mode == tf.estimator.ModeKeys.TRAIN)
        ).make_one_shot_iterator().get_next()

        target = features.pop(TARGET_NAME)
        return features, target
    
    return _input_fn

### Serving Function

In [4]:
def make_serving_fn():
    
    inputs = {
        feature_name: tf.placeholder(name=feature_name, shape=[None], dtype=tf.float32)
        for feature_name in NUMERIC_FEATURE_NAMES
    }
    serving_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(inputs)
    
    return serving_fn

### Estimator

In [5]:
def create_estimator(hparams, run_config):
    
    feature_columns = [
        tf.feature_column.numeric_column(feature_name) 
        for feature_name in NUMERIC_FEATURE_NAMES
    ]
    
    estimator = tf.estimator.DNNClassifier(
        n_classes=len(TARGET_LABELS),
        label_vocabulary=TARGET_LABELS, 
        feature_columns= feature_columns,
        hidden_units= hparams.hidden_units,
        config=run_config
    )
    
    return estimator

### Train and Evaluate Experiment

In [6]:
def run_experiment(hparams, run_config):
    
    train_spec = tf.estimator.TrainSpec(
        input_fn = make_input_fn(
            hparams.train_files,
            num_epochs=hparams.num_epochs,
            batch_size=hparams.batch_size,
            mode=tf.estimator.ModeKeys.TRAIN
        ),
        max_steps=hparams.max_steps
    )

    eval_spec = tf.estimator.EvalSpec(
        input_fn = make_input_fn(
            hparams.eval_files,
            batch_size=hparams.batch_size
        ),
        exporters=[tf.estimator.LatestExporter(
            name="estimate", 
            serving_input_receiver_fn=make_serving_fn(),
            exports_to_keep=1,
            as_text=False)],
        steps=hparams.eval_steps,
        throttle_secs=hparams.eval_throttle_secs,
        start_delay_secs=1
    )

    print("Removing previous artifacts...")
    if tf.gfile.Exists(run_config.model_dir):
        tf.gfile.DeleteRecursively(run_config.model_dir)

    tf.logging.set_verbosity(tf.logging.INFO)

    time_start = datetime.utcnow() 
    print("Experiment started at {}".format(time_start.strftime("%H:%M:%S")))
    print(".......................................") 

    estimator = create_estimator(hparams, run_config)

    tf.estimator.train_and_evaluate(
        estimator=estimator,
        train_spec=train_spec, 
        eval_spec=eval_spec
    )

    time_end = datetime.utcnow() 
    print(".......................................")
    print("Experiment finished at {}".format(time_end.strftime("%H:%M:%S")))
    print("")
    time_elapsed = time_end - time_start
    print("Experiment elapsed time: {} seconds".format(time_elapsed.total_seconds()))


### Experiment Parameters

In [7]:
TRAIN_DATA_FILES = 'iris/data/train-*.csv'
EVAL_DATA_FILES = 'iris/data/train-*.csv'  # note: evaluation reuses the training files
MODELS_LOCATION = 'iris/models'
MODEL_NAME = 'iris_classifier'
MODEL_DIR = os.path.join(MODELS_LOCATION, MODEL_NAME)

hparams  = tf.contrib.training.HParams(
    train_files=TRAIN_DATA_FILES,
    eval_files=EVAL_DATA_FILES,
    num_epochs=1000, 
    batch_size=50,
    hidden_units=[512, 512, 512, 512],
    max_steps=None,
    eval_throttle_secs=1,
    eval_steps=None
)

run_config = tf.estimator.RunConfig(
    tf_random_seed=19830610,
    save_checkpoints_steps=1000,
    keep_checkpoint_max=3,
    model_dir=MODEL_DIR
)

### Run Experiment

In [8]:
run_experiment(hparams, run_config)

Removing previous artifacts...
Experiment started at 13:46:18
.......................................
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_global_id_in_cluster': 0, '_session_config': None, '_keep_checkpoint_max': 3, '_tf_random_seed': 19830610, '_task_type': 'worker', '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x11a1f2110>, '_model_dir': 'iris/models/iris_classifier', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_master': '', '_save_checkpoints_steps': 1000, '_keep_checkpoint_every_n_hours': 10000, '_evaluation_master': '', '_service': None, '_device_fn': None, '_save_summary_steps': 100, '_num_ps_replicas': 0}
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_ch

INFO:tensorflow:global_step/sec: 97.9638
INFO:tensorflow:loss = 5.948579, step = 2301 (1.020 sec)
INFO:tensorflow:global_step/sec: 100.675
INFO:tensorflow:loss = 0.7604561, step = 2401 (0.993 sec)
INFO:tensorflow:global_step/sec: 90.0901
INFO:tensorflow:loss = 1.6357081, step = 2501 (1.111 sec)
INFO:tensorflow:global_step/sec: 93.7716
INFO:tensorflow:loss = 3.4249454, step = 2601 (1.071 sec)
INFO:tensorflow:global_step/sec: 108.654
INFO:tensorflow:loss = 1.380976, step = 2701 (0.915 sec)
INFO:tensorflow:global_step/sec: 134.267
INFO:tensorflow:loss = 8.004151, step = 2801 (0.745 sec)
INFO:tensorflow:global_step/sec: 142.636
INFO:tensorflow:loss = 4.270151, step = 2901 (0.700 sec)
INFO:tensorflow:Saving checkpoints for 3000 into iris/models/iris_classifier/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-08-13-13:46:50
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from iris/models

## 2. Inspect the Exported SavedModel

In [9]:
%%bash

saved_models_base=iris/models/iris_classifier/export/estimate/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)
echo ${saved_model_dir}
ls ${saved_model_dir}
saved_model_cli show --dir=${saved_model_dir} --all

iris/models/iris_classifier/export/estimate/1534168011
saved_model.pb
variables

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['petal_length'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: petal_length:0
    inputs['petal_width'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: petal_width:0
    inputs['sepal_length'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: sepal_length:0
    inputs['sepal_width'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: sepal_width:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['class_ids'] tensor_info:
        dtype: DT_INT64
        shape: (-1, 1)
        name: dnn/head/predictions/ExpandDims:0
    outputs['classes'] tensor_info:
        dtype: DT_STRING
        shape: (-1, 1)
        name: d

### Prediction with SavedModel

In [44]:
def inference_test(saved_model_dir, signature="predict", size=100000):

    tf.logging.set_verbosity(tf.logging.ERROR)
    
    time_start = datetime.utcnow() 
    
    predictor = tf.contrib.predictor.from_saved_model(
        export_dir = saved_model_dir,
        signature_def_key=signature
    )
    
    output = predictor(
        {
            'sepal_length': range(size), 
            'sepal_width': range(size),  
            'petal_length': range(size), 
            'petal_width': range(size)

        }
    )
    
    
    time_end = datetime.utcnow() 

    time_elapsed = time_end - time_start
    print("Inference elapsed time: {} seconds".format(time_elapsed.total_seconds()))
    
    print("Sample Prediction output:")
    for key in output.keys():
        print("{}: {}".format(key,output[key][0]))

    print(len(output['probabilities']))
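
Note that the elapsed time measured by `inference_test` covers both loading the SavedModel and running the prediction. The following is a hedged sketch of how the two could be timed separately. `time_load_and_predict` and `fake_load` are hypothetical names introduced for this illustration, and the stand-in loader exists only so the sketch runs without TensorFlow.

```python
from timeit import default_timer as timer


def time_load_and_predict(load_fn, inputs, repeats=3):
    """Time model loading once and prediction as the best of several runs."""
    start = timer()
    predictor = load_fn()
    load_seconds = timer() - start

    predict_seconds = []
    for _ in range(repeats):
        start = timer()
        predictor(inputs)
        predict_seconds.append(timer() - start)
    return load_seconds, min(predict_seconds)


# Stand-in loader (an assumption for this sketch). In the notebook it could be
# replaced by a function that calls tf.contrib.predictor.from_saved_model.
def fake_load():
    return lambda batch: [sum(values) for values in zip(*batch.values())]


batch = {"a": list(range(1000)), "b": list(range(1000))}
load_s, predict_s = time_load_and_predict(fake_load, batch)
print("load: {:.6f}s, predict (best of 3): {:.6f}s".format(load_s, predict_s))
```

Taking the best of a few prediction runs reduces the influence of one-off warm-up costs on the comparison.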

## 3. Test Prediction with SavedModel 

In [45]:
export_dir = os.path.join(MODEL_DIR, "export/estimate")
# os.listdir returns entries in arbitrary order, so sort to pick the latest timestamp.
saved_model_dir = os.path.join(export_dir, sorted(os.listdir(export_dir))[-1])
print(saved_model_dir)
print("")

inference_test(saved_model_dir)

iris/models/iris_classifier/export/estimate/1534168011

Inference elapsed time: 3.291696 seconds
Sample Prediction output:
probabilities: [4.1754058e-05 5.3128667e-07 9.9995768e-01]
logits: [-1.0639172 -5.428168   9.019754 ]
classes: ['versicolor']
class_ids: [2]
100000
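
The cell above takes the last entry of the export directory listing as the latest export. A slightly more defensive variant, sketched below under the assumption that exports are written to sub-directories named with integer timestamps (as in the path printed above), compares the names numerically and ignores anything else. `latest_export_dir` is a hypothetical helper, and the directory names in the demonstration are made up.

```python
import os
import tempfile


def latest_export_dir(export_base):
    """Return the sub-directory whose name is the largest integer timestamp."""
    candidates = [
        name for name in os.listdir(export_base)
        if name.isdigit() and os.path.isdir(os.path.join(export_base, name))
    ]
    if not candidates:
        raise ValueError("no timestamped exports under {}".format(export_base))
    # Compare numerically so that ordering does not depend on name length.
    return os.path.join(export_base, max(candidates, key=int))


# Demonstration on a throwaway directory with made-up timestamps.
demo_base = tempfile.mkdtemp()
for stamp in ("1534168011", "1534160000", "999"):
    os.mkdir(os.path.join(demo_base, stamp))
os.mkdir(os.path.join(demo_base, "not-a-timestamp"))  # ignored: not all digits

print(os.path.basename(latest_export_dir(demo_base)))
```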


### Describe GraphDef

In [12]:
def describe_graph(graph_def, show_nodes=False):
    
    print('Input Features: {}'.format([node.name for node in graph_def.node if node.op=='Placeholder']))
    print('Output Probabilities: {}'.format( [node.name for node in graph_def.node if node.op=='Softmax']))
    print('Constant Count: {}'.format( len([node for node in graph_def.node if node.op=='Const'])))
    print('Variable Count: {}'.format( len([node for node in graph_def.node if 'Variable' in node.op])))
    print('Identity Count: {}'.format( len([node for node in graph_def.node if node.op=='Identity'])))
    print('Total nodes: {}'.format( len(graph_def.node)))
    print('')
    
    if show_nodes:
        for node in graph_def.node:
            print('{} {}'.format(node.op, node.name))
            #print(node.op, node.name, node.attr['value'].tensor)
   

## 4. Describe the SavedModel Graph (before optimisation)

### Load GraphDef from a SavedModel Directory

In [13]:
def get_graph_def_from_saved_model(saved_model_dir):
    
    print(saved_model_dir)
    print("")
    
    from tensorflow.python.saved_model import tag_constants
    
    # Load into a fresh graph so repeated calls do not pile nodes onto the default graph.
    with tf.Session(graph=tf.Graph()) as session:
        meta_graph_def = tf.saved_model.loader.load(
            session,
            tags=[tag_constants.SERVING],
            export_dir=saved_model_dir
        )
        
    return meta_graph_def.graph_def

In [14]:
describe_graph(get_graph_def_from_saved_model(saved_model_dir))

iris/models/iris_classifier/export/estimate/1534168011

Input Features: [u'sepal_width', u'petal_width', u'sepal_length', u'petal_length']
Output Probabilities: [u'dnn/head/predictions/probabilities']
Constant Count: 84
Variable Count: 11
Identity Count: 23
Total nodes: 252



### Get model size

In [15]:
def get_size(model_dir):
    
    print(model_dir)
    print("")
    
    pb_size = os.path.getsize(os.path.join(model_dir,'saved_model.pb'))
    
    variables_size = 0
    if os.path.exists(os.path.join(model_dir,'variables/variables.data-00000-of-00001')):
        variables_size = os.path.getsize(os.path.join(model_dir,'variables/variables.data-00000-of-00001'))
        variables_size += os.path.getsize(os.path.join(model_dir,'variables/variables.index'))

    print("Model size: {} KB".format(round(pb_size/(1024.0),3)))
    print("Variables size: {} KB".format(round( variables_size/(1024.0),3)))
    print("Total Size: {} KB".format(round((pb_size + variables_size)/(1024.0),3)))
    

In [16]:
get_size(saved_model_dir)

iris/models/iris_classifier/export/estimate/1534168011

Model size: 51.087 KB
Variables size: 3094.483 KB
Total Size: 3145.57 KB
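
The variables size reported above can be sanity-checked with a little arithmetic. The sketch below counts the weights and biases of the network configured earlier (4 input features, four hidden layers of 512 units, 3 classes) and assumes each value is stored as a 4-byte float32. `dense_param_count` is a hypothetical helper introduced for this illustration.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total


# 4 input features, four hidden layers of 512 units, 3 output classes.
layer_sizes = [4, 512, 512, 512, 512, 3]
params = dense_param_count(layer_sizes)
size_kb = params * 4 / 1024.0  # assuming 4 bytes per float32 value

print("parameters: {}".format(params))
print("estimated size: {:.1f} KB".format(size_kb))
```

The estimate comes to roughly 3094 KB, which is close to the reported variables size; the small remainder is plausibly the index file and bookkeeping variables such as the global step.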


## 5. Freeze SavedModel

This function converts the SavedModel into a GraphDef file (freezed_model.pb), storing the variables as constants inside freezed_model.pb.

You need to specify the graph's output nodes for freezing.

In [17]:
def freeze_graph(saved_model_dir):
    
    from tensorflow.python.tools import freeze_graph
    from tensorflow.python.saved_model import tag_constants
    
    output_graph_filename = os.path.join(saved_model_dir, "freezed_model.pb")
    output_node_names = "dnn/head/predictions/probabilities"
    #output_node_names += ", dnn/head/predictions/ExpandDims, dnn/head/predictions/class_string_lookup_Lookup"
    initializer_nodes = ""

    freeze_graph.freeze_graph(
        input_saved_model_dir=saved_model_dir,
        output_graph=output_graph_filename,
        saved_model_tags = tag_constants.SERVING,
        output_node_names=output_node_names,
        initializer_nodes=initializer_nodes,

        input_graph=None, 
        input_saver=False,
        input_binary=False, 
        input_checkpoint=None, 
        restore_op_name=None, 
        filename_tensor_name=None, 
        clear_devices=False,
        input_meta_graph=False,
    )
    
    print("SavedModel graph freezed!")

In [18]:
freeze_graph(saved_model_dir)

SavedModel graph freezed!


In [19]:
%%bash
saved_models_base=iris/models/iris_classifier/export/estimate/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)
echo ${saved_model_dir}
ls ${saved_model_dir}

iris/models/iris_classifier/export/estimate/1534168011
freezed_model.pb
saved_model.pb
variables


## 6. Describe the freezed_model.pb Graph (after freezing)

### Load GraphDef from GraphDef File

In [20]:
def get_graph_def_from_file(graph_filepath):
    
    print(graph_filepath)
    print("")
    
    from tensorflow.python import ops
    
    with ops.Graph().as_default():
        with tf.gfile.GFile(graph_filepath, "rb") as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
            
            return graph_def
            

In [21]:
freezed_filepath=os.path.join(saved_model_dir,'freezed_model.pb')
describe_graph(get_graph_def_from_file(freezed_filepath))

iris/models/iris_classifier/export/estimate/1534168011/freezed_model.pb

Input Features: [u'sepal_width', u'petal_width', u'sepal_length', u'petal_length']
Output Probabilities: [u'dnn/head/predictions/probabilities']
Constant Count: 31
Variable Count: 0
Identity Count: 20
Total nodes: 91
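
The output above shows the variable count dropping to 0 and the total node count dropping from 252 to 91. A toy model of why this happens is sketched below in plain Python. It is only an illustration under simplifying assumptions (a graph represented as a dictionary, a checkpoint as a name-to-value mapping), not the logic of TensorFlow's `freeze_graph` tool, and all node names are made up.

```python
def freeze(graph, checkpoint, output_names):
    """Return a copy of `graph` with variables inlined and dead nodes removed."""
    # 1. Walk backwards from the outputs to find the nodes that matter.
    needed, stack = set(), list(output_names)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(graph[name]["inputs"])

    # 2. Copy the needed nodes, turning each variable into a constant.
    frozen = {}
    for name in needed:
        node = dict(graph[name])
        if node["op"] == "Variable":
            node = {"op": "Const", "inputs": [], "value": checkpoint[name]}
        frozen[name] = node
    return frozen


toy_graph = {
    "x": {"op": "Placeholder", "inputs": []},
    "w": {"op": "Variable", "inputs": []},
    "b": {"op": "Variable", "inputs": []},
    "matmul": {"op": "MatMul", "inputs": ["x", "w"]},
    "logits": {"op": "Add", "inputs": ["matmul", "b"]},
    "probabilities": {"op": "Softmax", "inputs": ["logits"]},
    # Training-only nodes that prediction never touches.
    "global_step": {"op": "Variable", "inputs": []},
    "save": {"op": "SaveV2", "inputs": ["w", "b", "global_step"]},
}
toy_checkpoint = {"w": [[0.5]], "b": [0.1], "global_step": 3000}

frozen = freeze(toy_graph, toy_checkpoint, ["probabilities"])
print(sorted(frozen))
print([n for n, node in frozen.items() if node["op"] == "Variable"])
```

This is also why the output node names must be given: they are the starting points from which the needed part of the graph is discovered.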



## 7. Optimise the freezed_model.pb

Note that the optimised graph is written to a new file, optimised_model.pb, alongside freezed_model.pb.

### Optimise GraphDef

In [22]:
def optimize_graph(model_dir, graph_filename, transforms):
    
    from tensorflow.tools.graph_transforms import TransformGraph
    
    input_names = []
    output_names = ['dnn/head/predictions/probabilities']
    
    graph_def = get_graph_def_from_file(os.path.join(model_dir, graph_filename))
    optimised_graph_def = TransformGraph(graph_def, 
                                         input_names,
                                         output_names,
                                         transforms 
                                        )
    tf.train.write_graph(optimised_graph_def,
                        logdir=model_dir,
                        as_text=False,
                        name='optimised_model.pb')
    
    print("Freezed graph optimised!")

In [23]:
transforms = [
    'remove_nodes(op=Identity)', 
    'fold_constants(ignore_errors=true)',
    'fold_batch_norms',
    'fold_old_batch_norms',
    'round_weights(num_steps=256)',
    'quantize_weights', 
#   'quantize_nodes',
#   'merge_duplicate_nodes',
    'strip_unused_nodes', 
    'sort_by_execution_order'
]

optimize_graph(saved_model_dir, 'freezed_model.pb', transforms)

iris/models/iris_classifier/export/estimate/1534168011/freezed_model.pb

Freezed graph optimised!


In [24]:
%%bash
saved_models_base=iris/models/iris_classifier/export/estimate/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)
echo ${saved_model_dir}
ls ${saved_model_dir}

iris/models/iris_classifier/export/estimate/1534168011
freezed_model.pb
optimised_model.pb
saved_model.pb
variables


## 8. Describe the Optimised Graph

In [25]:
optimised_filepath=os.path.join(saved_model_dir,'optimised_model.pb')
describe_graph(get_graph_def_from_file(optimised_filepath))

iris/models/iris_classifier/export/estimate/1534168011/optimised_model.pb

Input Features: [u'petal_length', u'sepal_length', u'petal_width', u'sepal_width']
Output Probabilities: [u'dnn/head/predictions/probabilities']
Constant Count: 41
Variable Count: 0
Identity Count: 0
Total nodes: 86
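
The Identity count above has dropped from 20 to 0. Removing a pass-through node means reconnecting its consumers to whatever fed it. The following plain-Python sketch shows that rewiring on a toy graph. It is an illustration only, not the Graph Transform Tool's `remove_nodes` implementation, and the node names are made up.

```python
def remove_identity_nodes(graph):
    """Drop Identity nodes and reconnect consumers to the original producer."""

    def resolve(name):
        # Follow chains of Identity nodes back to the first real producer.
        while graph[name]["op"] == "Identity":
            name = graph[name]["inputs"][0]
        return name

    return {
        name: {"op": node["op"], "inputs": [resolve(i) for i in node["inputs"]]}
        for name, node in graph.items()
        if node["op"] != "Identity"
    }


toy_graph = {
    "w": {"op": "Const", "inputs": []},
    "w/read": {"op": "Identity", "inputs": ["w"]},
    "w/read_1": {"op": "Identity", "inputs": ["w/read"]},
    "x": {"op": "Placeholder", "inputs": []},
    "matmul": {"op": "MatMul", "inputs": ["x", "w/read_1"]},
}

pruned = remove_identity_nodes(toy_graph)
print(sorted(pruned))
print(pruned["matmul"]["inputs"])
```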



## 9. Convert (optimised) graph (GraphDef) to SavedModel

In [48]:
def convert_graph_def_to_saved_model(graph_filepath):

    # Write the new SavedModel next to the GraphDef file it is built from,
    # rather than relying on the global saved_model_dir.
    export_dir=os.path.join(os.path.dirname(graph_filepath),'optimised')

    if tf.gfile.Exists(export_dir):
        tf.gfile.DeleteRecursively(export_dir)

    graph_def = get_graph_def_from_file(graph_filepath)
    
    with tf.Session(graph=tf.Graph()) as session:
        tf.import_graph_def(graph_def, name="")
        tf.saved_model.simple_save(session,
                export_dir,
                inputs={
                    node.name: session.graph.get_tensor_by_name("{}:0".format(node.name)) 
                    for node in graph_def.node if node.op=='Placeholder'},
                outputs={
                    "probabilities": session.graph.get_tensor_by_name("dnn/head/predictions/probabilities:0"),
                    #"class_ids": session.graph.get_tensor_by_name("dnn/head/predictions/ExpandDims:0"),
                    #"classes": session.graph.get_tensor_by_name("dnn/head/predictions/class_string_lookup_Lookup:0"),
                }
            )

        print("Optimised graph converted to SavedModel!")

In [49]:
optimised_filepath=os.path.join(saved_model_dir,'optimised_model.pb')
convert_graph_def_to_saved_model(optimised_filepath)

iris/models/iris_classifier/export/estimate/1534168011/optimised_model.pb

Optimised graph converted to SavedModel!


### Optimised SavedModel Size

In [50]:
optimised_saved_model_dir = os.path.join(saved_model_dir,'optimised') 
get_size(optimised_saved_model_dir)

iris/models/iris_classifier/export/estimate/1534168011/optimised

Model size: 796.803 KB
Variables size: 0.0 KB
Total Size: 796.803 KB
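
As a rough plausibility check on the size above, the sketch below estimates what the parameters of the 4-512-512-512-512-3 network would occupy at one byte each. This is a back-of-the-envelope assumption: the real file also contains the graph structure, and small tensors may be left unquantised.

```python
# Layer widths taken from the notebook: 4 features, hidden_units, 3 classes.
layer_sizes = [4, 512, 512, 512, 512, 3]

# Weights (fan_in * fan_out) plus biases (fan_out) for each dense layer.
params = sum(i * o + o for i, o in zip(layer_sizes[:-1], layer_sizes[1:]))

one_byte_kb = params / 1024.0  # assumption: one byte per quantised parameter
print("one byte per parameter: {:.1f} KB".format(one_byte_kb))
```

The estimate is a little under the reported total, which is consistent with eight-bit weights making up most of the optimised file.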


In [51]:
%%bash

saved_models_base=iris/models/iris_classifier/export/estimate/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)/optimised
ls ${saved_model_dir}
saved_model_cli show --dir ${saved_model_dir} --all

saved_model.pb
variables

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['petal_length'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: petal_length:0
    inputs['petal_width'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: petal_width:0
    inputs['sepal_length'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: sepal_length:0
    inputs['sepal_width'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: sepal_width:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['probabilities'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 3)
        name: dnn/head/predictions/probabilities:0
  Method name is: tensorflow/serving/predict


## 10. Prediction with the Optimised SavedModel

In [47]:
optimised_saved_model_dir = os.path.join(saved_model_dir,'optimised') 
print(optimised_saved_model_dir)
inference_test(saved_model_dir=optimised_saved_model_dir, signature='serving_default')

iris/models/iris_classifier/export/estimate/1534168011/optimised
Inference elapsed time: 2.710049 seconds
Sample Prediction output:
probabilities: [4.0991668e-05 4.2484294e-07 9.9995863e-01]
100000
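
To summarise the effect of the optimisation, the figures reported in this notebook can be compared directly. The sketch below uses only the sizes and elapsed times printed above. `relative_reduction` is a hypothetical helper, and the latency figures come from a single run that includes model loading, so they should be read as indicative rather than as a benchmark.

```python
def relative_reduction(before, after):
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / float(before)


# Values copied from the outputs earlier in this notebook.
size_before_kb, size_after_kb = 3145.57, 796.803
latency_before_s, latency_after_s = 3.291696, 2.710049

size_cut = relative_reduction(size_before_kb, size_after_kb)
latency_cut = relative_reduction(latency_before_s, latency_after_s)

print("size:    {:.1%} smaller".format(size_cut))
print("latency: {:.1%} lower".format(latency_cut))
```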
