# Optimising a TensorFlow SavedModel for Serving

This notebook shows how to optimise an exported TensorFlow SavedModel by **shrinking** its size (for a smaller memory and disk footprint) and **improving** prediction latency. This can be accomplished by applying the following:
* **Freezing**: That is, converting the variables stored in a checkpoint file of the SavedModel into constants stored directly in the model graph.
* **Pruning**: That is, stripping nodes that are not used in the prediction path of the graph, merging duplicate nodes, and removing other ops such as summary, identity, etc.
* **Quantisation**: That is, converting any large float Const op into an eight-bit equivalent, followed by a float conversion op so that the result is usable by subsequent nodes.
* **Other refinements**: That includes constant folding, batch norm folding, fusing convolutions, etc.
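
To make the quantisation idea concrete, the sketch below maps a list of float weights to eight-bit codes using the minimum and maximum of the range, then converts them back to floats. It is a simplified illustration in plain Python, not the Graph Transform Tool's implementation; the function names `quantise` and `dequantise` are made up for this example.

```python
def quantise(weights, levels=256):
    """Map floats to integer codes in [0, levels - 1] using a linear min/max scale."""
    lo, hi = min(weights), max(weights)
    # Guard against a constant tensor, where hi == lo would give a zero step.
    step = (hi - lo) / float(levels - 1) or 1.0
    codes = [int(round((w - lo) / step)) for w in weights]
    return codes, lo, step


def dequantise(codes, lo, step):
    """Recover approximate floats from the eight-bit codes."""
    return [lo + c * step for c in codes]


weights = [-0.51, -0.1, 0.0, 0.27, 0.9]
codes, lo, step = quantise(weights)
restored = dequantise(codes, lo, step)

# Each restored value differs from the original by at most half a quantisation step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("codes: {}".format(codes))
print("max error: {:.6f} (step: {:.6f})".format(max_error, step))
```

Each weight then needs one byte instead of four, at the cost of the rounding error measured above.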

The optimisation operations we apply in this example are from the TensorFlow [Graph Transform Tool](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md#fold_constants), which is a C++ command-line tool. We use the Python APIs to call the C++ libraries.

The Graph Transform Tool is designed to work on models that are saved as GraphDef files, usually in a binary protobuf format. However, the model exported after training an estimator is in SavedModel format (saved_model.pb file + variables folder with variables.data-* and variables.index files).

We need to optimise the model and keep it in the SavedModel format. Thus, the optimisation steps will be:
1. Freeze the SavedModel: SavedModel -> GraphDef
2. Optimise the frozen model: GraphDef -> GraphDef
3. Convert the optimised frozen model to SavedModel: GraphDef -> SavedModel
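
The three steps can be pictured as a chain of artefacts on disk. The sketch below only computes the paths that this notebook writes (`freezed_model.pb`, `optimised_model.pb`, and an `optimised` SavedModel directory); the helper name `pipeline_artefacts` is hypothetical and the directory passed in is just an example.

```python
import os


def pipeline_artefacts(saved_model_dir):
    """Return (step, input, output) triples for the three optimisation steps."""
    frozen = os.path.join(saved_model_dir, 'freezed_model.pb')
    optimised = os.path.join(saved_model_dir, 'optimised_model.pb')
    optimised_saved_model = os.path.join(saved_model_dir, 'optimised')
    return [
        ('freeze', saved_model_dir, frozen),            # SavedModel -> GraphDef
        ('optimise', frozen, optimised),                # GraphDef -> GraphDef
        ('convert', optimised, optimised_saved_model),  # GraphDef -> SavedModel
    ]


for step, source, target in pipeline_artefacts('export/1538827587'):
    print("{:<9} {} -> {}".format(step, source, target))
```

The output of each step is the input of the next, which is why the intermediate GraphDef files are kept next to the original `saved_model.pb`.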

In [1]:
import os
import numpy as np
from datetime import datetime

import tensorflow as tf

print "TensorFlow : {}".format(tf.__version__)

TensorFlow : 1.10.0


## 1. Train and Export a Keras Model

### 1.1 Import Data

In [2]:
mnist = tf.contrib.learn.datasets.load_dataset("mnist")
train_data = mnist.train.images
train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
eval_data = mnist.test.images
eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)
NUM_CLASSES = 10

Instructions for updating:
Please use tf.data.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST-data/train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST-data/train-labels-idx1-ubyte.gz
Extracting MNIST-data/t10k-images-idx3-ubyte.gz
Extracting MNIST-data/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


In [3]:
print "Train data shape: {}".format(train_data.shape)
print "Eval data shape: {}".format(eval_data.shape)

Train data shape: (55000, 784)
Eval data shape: (10000, 784)


### 1.2 Estimator

#### 1.2.1 Keras Model Function

In [4]:
def keras_model_fn(params):
    
    inputs = tf.keras.layers.Input(shape=(None,784), name='input_image')
    input_layer = tf.keras.layers.Reshape(target_shape=(28, 28, 1), name='reshape')(inputs)
    
    # convolutional layers
    _inputs = input_layer
    for i in range(params.num_conv_layers):
        
        current_filters = params.init_filters * (2**i)

        conv = tf.keras.layers.Conv2D(kernel_size=3, filters=current_filters, strides=1, 
                                         padding='SAME', name='conv-{}'.format(i+1))(_inputs)
        
        max_pool = tf.keras.layers.MaxPool2D(pool_size=2, strides=2, 
                                            padding='SAME', name='pool-{}'.format(i+1))(conv)
        
        batch_norm = tf.keras.layers.BatchNormalization(name='batch_norm-{}'.format(i+1))(max_pool)
        
        drop_out = tf.keras.layers.Dropout(params.drop_out, name='drop_out-{}'.format(i+1))(batch_norm)
        
        _inputs = drop_out

    flatten = tf.keras.layers.Flatten(name='flatten')(_inputs)
    
    # fully-connected layers
    _inputs = flatten
    for i in range(len(params.hidden_units)):
        
        dense = tf.keras.layers.Dense(units=params.hidden_units[i], name='dense-{}'.format(i+1))(_inputs)
        activation = tf.keras.layers.Activation('relu')(dense)
        drop_out = tf.keras.layers.Dropout(params.drop_out)(activation)
        
        _inputs = drop_out
        
    # unused layer
    unused = tf.keras.layers.Dense(units=100, name='unused')(_inputs)
    
    # softmax classifier
    logits = tf.keras.layers.Dense(units=NUM_CLASSES, name='logits')(_inputs)
    softmax = tf.keras.layers.Activation('softmax', name='softmax')(logits)

    # keras model
    model = tf.keras.models.Model(inputs, softmax)
    return model

#### 1.2.2 Convert Keras model to Estimator

In [5]:
def create_estimator(params, run_config):
    
    keras_model = keras_model_fn(params)
    keras_model.summary()
    optimizer = tf.keras.optimizers.Adam(lr=params.learning_rate)
    # pass the configured optimizer so that params.learning_rate takes effect
    keras_model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    mnist_classifier = tf.keras.estimator.model_to_estimator(
        keras_model=keras_model,
        config=run_config
    )
    
    return mnist_classifier

### 1.3 Train and Evaluate

#### 1.3.1 Experiment Function

In [6]:
def run_experiment(params, run_config):
    
    train_spec = tf.estimator.TrainSpec(
        input_fn = tf.estimator.inputs.numpy_input_fn(
            x={"input_image": train_data},
            y=train_labels,
            batch_size=params.batch_size,
            num_epochs=None,
            shuffle=True),
        max_steps=params.max_traning_steps
    )

    eval_spec = tf.estimator.EvalSpec(
        input_fn = tf.estimator.inputs.numpy_input_fn(
            x={"input_image": eval_data},
            y=eval_labels,
            batch_size=params.batch_size,
            num_epochs=1,
            shuffle=False),
        steps=None,
        throttle_secs=params.eval_throttle_secs
    )

    tf.logging.set_verbosity(tf.logging.INFO)

    time_start = datetime.utcnow() 
    print("Experiment started at {}".format(time_start.strftime("%H:%M:%S")))
    print(".......................................") 

    estimator = create_estimator(params, run_config)

    tf.estimator.train_and_evaluate(
        estimator=estimator,
        train_spec=train_spec, 
        eval_spec=eval_spec
    )

    time_end = datetime.utcnow() 
    print(".......................................")
    print("Experiment finished at {}".format(time_end.strftime("%H:%M:%S")))
    print("")
    time_elapsed = time_end - time_start
    print("Experiment elapsed time: {} seconds".format(time_elapsed.total_seconds()))
    
    return estimator


#### 1.3.2 Experiment Parameters

In [7]:
MODELS_LOCATION = 'models/mnist'
MODEL_NAME = 'keras_classifier'
model_dir = os.path.join(MODELS_LOCATION, MODEL_NAME)

print model_dir

params  = tf.contrib.training.HParams(
    batch_size=100,
    hidden_units=[512, 512],
    num_conv_layers=3, 
    init_filters=64,
    drop_out=0.2,
    max_traning_steps=50,
    eval_throttle_secs=10,
    learning_rate=1e-3,
    debug=True
)

run_config = tf.estimator.RunConfig(
    tf_random_seed=19830610,
    save_checkpoints_steps=1000,
    keep_checkpoint_max=3,
    model_dir=model_dir
)

models/mnist/keras_classifier


### TensorFlow Graph 
![](tf-graph.jpeg)

#### 1.3.3 Run Experiment

In [8]:
if tf.gfile.Exists(model_dir):
    print("Removing previous artifacts...")
    tf.gfile.DeleteRecursively(model_dir)
    
estimator = run_experiment(params, run_config)

Removing previous artifacts...
Experiment started at 12:05:41
.......................................
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_image (InputLayer)     (None, None, 784)         0         
_________________________________________________________________
reshape (Reshape)            (None, 28, 28, 1)         0         
_________________________________________________________________
conv-1 (Conv2D)              (None, 28, 28, 64)        640       
_________________________________________________________________
pool-1 (MaxPooling2D)        (None, 14, 14, 64)        0         
_________________________________________________________________
batch_norm-1 (BatchNormaliza (None, 14, 14, 64)        256       
_________________________________________________________________
drop_out-1 (Dropout)         (None, 14, 14, 64)        0         
________________________________________

### 1.4 Export the model

In [9]:
def make_serving_input_receiver_fn():
    inputs = {'input_image': tf.placeholder(shape=[None,784], dtype=tf.float32, name='input_image')}
    return tf.estimator.export.build_raw_serving_input_receiver_fn(inputs)

export_dir = os.path.join(model_dir, 'export')

if tf.gfile.Exists(export_dir):
    tf.gfile.DeleteRecursively(export_dir)
        
estimator.export_savedmodel(
    export_dir_base=export_dir,
    serving_input_receiver_fn=make_serving_input_receiver_fn()
)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['serving_default']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Restoring parameters from models/mnist/keras_classifier/model.ckpt-50
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: models/mnist/keras_classifier/export/temp-1538827587/saved_model.pb


'models/mnist/keras_classifier/export/1538827587'

## 2. Inspect the Exported SavedModel

In [10]:
%%bash

saved_models_base=models/mnist/keras_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)
echo ${saved_model_dir}
ls ${saved_model_dir}
saved_model_cli show --dir=${saved_model_dir} --all

models/mnist/keras_classifier/export/1538827587
saved_model.pb
variables

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_image'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 784)
        name: input_image_1:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['softmax'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: softmax/Softmax:0
  Method name is: tensorflow/serving/predict


### Prediction with SavedModel

In [11]:
def inference_test(saved_model_dir, signature="serving_default", input_name='input_image', batch=200, repeat=200):

    tf.logging.set_verbosity(tf.logging.ERROR)
    
    time_start = datetime.utcnow() 
    
    predictor = tf.contrib.predictor.from_saved_model(
        export_dir = saved_model_dir,
        signature_def_key=signature
    )
    time_end = datetime.utcnow() 
        
    time_elapsed = time_end - time_start
   
    print ""
    print("Model loading time: {} seconds".format(time_elapsed.total_seconds()))
    print ""
    
    time_start = datetime.utcnow() 
    output = None
    for i in range(repeat):
        predictions = predictor(
            {
                input_name: eval_data[:batch]
            }
        )
        
        output=[np.argmax(prediction) for prediction in predictions['softmax']]
    
    time_end = datetime.utcnow() 

    time_elapsed_sec = (time_end - time_start).total_seconds()
    
    #print predictions

    print "Inference elapsed time: {} seconds".format(time_elapsed_sec)
    print ""
    
    print "Prediction produced for {} instances batch, repeated {} times".format(len(output), repeat)
    print "Average latency per batch: {} seconds".format(time_elapsed_sec/repeat)
    print ""
    
    #print "Prediction output for the last batch: {}".format(output)
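
The timing pattern used in `inference_test` can be isolated into a small helper. The sketch below assumes that any callable stands in for the predictor; `measure_latency` and `fake_predict` are hypothetical names, and a warm-up call is added here so that one-off initialisation cost is not counted in the average.

```python
from timeit import default_timer


def measure_latency(fn, repeat=200, warmup=1):
    """Return the average wall-clock seconds per call of fn()."""
    for _ in range(warmup):
        fn()  # warm-up calls are excluded from the measurement
    start = default_timer()
    for _ in range(repeat):
        fn()
    return (default_timer() - start) / repeat


def fake_predict():
    # Stand-in workload; in the notebook this would wrap the predictor call.
    return sum(i * i for i in range(1000))


avg = measure_latency(fake_predict, repeat=50)
print("Average latency per call: {:.6f} seconds".format(avg))
```

Keeping the loading time and the per-batch time separate, as `inference_test` does, makes it easier to see which of the two an optimisation actually changes.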

## 3. Test Prediction with SavedModel 

In [12]:
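# os.listdir returns entries in arbitrary order, so sort to pick the latest timestamped export
saved_model_dir = os.path.join(export_dir, sorted(os.listdir(export_dir))[-1])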
print(saved_model_dir)
inference_test(saved_model_dir)

models/mnist/keras_classifier/export/1538827587

Model loading time: 0.146573 seconds

Inference elapsed time: 51.426721 seconds

Prediction produced for 200 instances batch, repeated 200 times
Average latency per batch: 0.257133605 seconds



### Describe GraphDef

In [13]:
def describe_graph(graph_def, show_nodes=False):
    
    print 'Input Feature Nodes: {}'.format([node.name for node in graph_def.node if node.op=='Placeholder'])
    print ""
    print 'Unused Nodes: {}'.format([node.name for node in graph_def.node if 'unused'  in node.name])
    print ""
    print 'Output Nodes: {}'.format( [node.name for node in graph_def.node if 'predictions' in node.name])
    print ""
    print 'Quanitization Nodes: {}'.format( [node.name for node in graph_def.node if 'quant' in node.name])
    print ""
    print 'Constant Count: {}'.format( len([node for node in graph_def.node if node.op=='Const']))
    print ""
    print 'Variable Count: {}'.format( len([node for node in graph_def.node if 'Variable' in node.op]))
    print ""
    print 'Identity Count: {}'.format( len([node for node in graph_def.node if node.op=='Identity']))
    print ""
    print 'Total nodes: {}'.format( len(graph_def.node))
    print ''
    
    if show_nodes:
        for node in graph_def.node:
            print 'Op:{} - Name: {}'.format(node.op, node.name)

## 4. Describe the SavedModel Graph (before optimisation)

### Load GraphDef from a SavedModel Directory

In [14]:
def get_graph_def_from_saved_model(saved_model_dir):
    
    print saved_model_dir
    print ""
    
    from tensorflow.python.saved_model import tag_constants
    
    with tf.Session() as session:
        meta_graph_def = tf.saved_model.loader.load(
            session,
            tags=[tag_constants.SERVING],
            export_dir=saved_model_dir
        )
        
    return meta_graph_def.graph_def

In [15]:
describe_graph(get_graph_def_from_saved_model(saved_model_dir))

models/mnist/keras_classifier/export/1538827587

Input Feature Nodes: [u'input_image_1', u'input_image']

Unused Nodes: []

Output Nodes: []

Quanitization Nodes: []

Constant Count: 61

Variable Count: 97

Identity Count: 33

Total nodes: 308



### Get model size

In [16]:
def get_size(model_dir):
    
    print model_dir
    print ""
    
    pb_size = os.path.getsize(os.path.join(model_dir,'saved_model.pb'))
    
    variables_size = 0
    if os.path.exists(os.path.join(model_dir,'variables/variables.data-00000-of-00001')):
        variables_size = os.path.getsize(os.path.join(model_dir,'variables/variables.data-00000-of-00001'))
        variables_size += os.path.getsize(os.path.join(model_dir,'variables/variables.index'))

    print "Model size: {} KB".format(round(pb_size/(1024.0),3))
    print "Variables size: {} KB".format(round( variables_size/(1024.0),3))
    print "Total Size: {} KB".format(round((pb_size + variables_size)/(1024.0),3))
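
`get_size` assumes a single variables shard named `variables.data-00000-of-00001`. A more general sketch, assuming nothing about shard names, is to walk the directory and sum every file. `dir_size_kb` is a hypothetical helper, and the temporary directory below only mimics a SavedModel layout with dummy bytes.

```python
import os
import shutil
import tempfile


def dir_size_kb(path):
    """Sum the sizes of all files under path, in KB."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / 1024.0


# Build a throwaway directory that mimics a SavedModel layout.
demo_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_dir, 'variables'))
with open(os.path.join(demo_dir, 'saved_model.pb'), 'wb') as f:
    f.write(b'\0' * 2048)
with open(os.path.join(demo_dir, 'variables', 'variables.index'), 'wb') as f:
    f.write(b'\0' * 1024)

size_kb = dir_size_kb(demo_dir)
print("Total size: {} KB".format(size_kb))
shutil.rmtree(demo_dir)
```

Note that this counts every file under the directory, so on the export directory used in this notebook it would also include `freezed_model.pb` and `optimised_model.pb` once they exist.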
    

In [17]:
get_size(saved_model_dir)

models/mnist/keras_classifier/export/1538827587

Model size: 54.571 KB
Variables size: 10691.968 KB
Total Size: 10746.539 KB


## 5. Freeze SavedModel

This function converts the SavedModel into a GraphDef file (freezed_model.pb), storing the variables as constants inside freezed_model.pb.

You need to define the graph output nodes for freezing. We are only interested in the output of the **softmax/Softmax** node.
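
Conceptually, freezing replaces each variable node with a constant that carries the value stored in the checkpoint. The toy below models a graph as a list of dictionaries and a checkpoint as a name-to-value mapping. It illustrates the idea only and is not how `freeze_graph` is implemented; all names and values are invented.

```python
def freeze(nodes, checkpoint):
    """Return a copy of nodes where every Variable becomes a Const holding its checkpoint value."""
    frozen = []
    for node in nodes:
        if node['op'] == 'Variable':
            frozen.append({'name': node['name'], 'op': 'Const',
                           'value': checkpoint[node['name']], 'inputs': []})
        else:
            frozen.append(dict(node))
    return frozen


toy_graph = [
    {'name': 'input', 'op': 'Placeholder', 'inputs': []},
    {'name': 'kernel', 'op': 'Variable', 'inputs': []},
    {'name': 'logits', 'op': 'MatMul', 'inputs': ['input', 'kernel']},
]
toy_checkpoint = {'kernel': [[0.1, 0.2], [0.3, 0.4]]}

frozen_graph = freeze(toy_graph, toy_checkpoint)
print([(n['name'], n['op']) for n in frozen_graph])
```

After freezing, the graph no longer needs the checkpoint files, because the weights travel inside the graph definition itself.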

In [18]:
def freeze_graph(saved_model_dir):
    
    from tensorflow.python.tools import freeze_graph
    from tensorflow.python.saved_model import tag_constants
    
    output_graph_filename = os.path.join(saved_model_dir, "freezed_model.pb")
    output_node_names = "softmax/Softmax"
    initializer_nodes = ""

    freeze_graph.freeze_graph(
        input_saved_model_dir=saved_model_dir,
        output_graph=output_graph_filename,
        saved_model_tags = tag_constants.SERVING,
        output_node_names=output_node_names,
        initializer_nodes=initializer_nodes,

        input_graph=None, 
        input_saver=False,
        input_binary=False, 
        input_checkpoint=None, 
        restore_op_name=None, 
        filename_tensor_name=None, 
        clear_devices=False,
        input_meta_graph=False,
    )
    
    print "SavedModel graph freezed!"

In [19]:
freeze_graph(saved_model_dir)

SavedModel graph freezed!


In [20]:
%%bash
saved_models_base=models/mnist/keras_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)
echo ${saved_model_dir}
ls ${saved_model_dir}

models/mnist/keras_classifier/export/1538827587
freezed_model.pb
saved_model.pb
variables


## 6. Describe the freezed_model.pb Graph (after freezing)

### Load GraphDef from GraphDef File

In [21]:
def get_graph_def_from_file(graph_filepath):
    
    print graph_filepath
    print ""
    
    from tensorflow.python import ops
    
    with ops.Graph().as_default():
        with tf.gfile.GFile(graph_filepath, "rb") as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
            
            return graph_def
            

In [22]:
freezed_filepath=os.path.join(saved_model_dir,'freezed_model.pb')
describe_graph(get_graph_def_from_file(freezed_filepath))

models/mnist/keras_classifier/export/1538827587/freezed_model.pb

Input Feature Nodes: [u'input_image_1']

Unused Nodes: []

Output Nodes: []

Quanitization Nodes: []

Constant Count: 34

Variable Count: 0

Identity Count: 30

Total nodes: 94



## 7. Optimise the freezed_model.pb

### Optimise GraphDef

In [23]:
def optimize_graph(model_dir, graph_filename, transforms):
    
    from tensorflow.tools.graph_transforms import TransformGraph
    
    input_names = []
    output_names = ['softmax/Softmax']
    
    graph_def = get_graph_def_from_file(os.path.join(model_dir, graph_filename))
    optimised_graph_def = TransformGraph(graph_def, 
                                         input_names,
                                         output_names,
                                         transforms 
                                        )
    tf.train.write_graph(optimised_graph_def,
                        logdir=model_dir,
                        as_text=False,
                        name='optimised_model.pb')
    
    print "Freezed graph optimised!"

In [24]:
transforms = [
    'remove_nodes(op=Identity)', 
    'fold_constants(ignore_errors=true)',
    'fuse_resize_pad_and_conv',
    'fold_batch_norms',
#     'quantize_weights',
#     'quantize_nodes',
    'merge_duplicate_nodes',
    'strip_unused_nodes', 
    'sort_by_execution_order'
]

optimize_graph(saved_model_dir, 'freezed_model.pb', transforms)

models/mnist/keras_classifier/export/1538827587/freezed_model.pb

Freezed graph optimised!
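
As an illustration of what a transform such as `fold_batch_norms` aims for, the arithmetic below folds a batch-normalisation scale and shift into the weight and bias of the preceding linear op, for a single scalar channel. It is a simplified sketch of the general idea with made-up numbers and hypothetical function names; whether the transform actually fires on a given graph depends on how the ops are laid out.

```python
import math


def fold_batch_norm(w, b, gamma, beta, mean, variance, eps=1e-3):
    """Fold y = gamma * (w*x + b - mean) / sqrt(variance + eps) + beta into y = w2*x + b2."""
    scale = gamma / math.sqrt(variance + eps)
    return w * scale, (b - mean) * scale + beta


def linear_then_batch_norm(x, w, b, gamma, beta, mean, variance, eps=1e-3):
    """Reference computation: linear op followed by inference-mode batch norm."""
    z = w * x + b
    return gamma * (z - mean) / math.sqrt(variance + eps) + beta


bn_params = dict(gamma=1.5, beta=0.2, mean=0.4, variance=0.25)
w2, b2 = fold_batch_norm(w=0.8, b=0.1, **bn_params)

x = 2.0
reference = linear_then_batch_norm(x, 0.8, 0.1, **bn_params)
folded = w2 * x + b2
print("reference: {:.6f}, folded: {:.6f}".format(reference, folded))
```

The folded form computes the same value with one multiply and one add, which is the kind of saving these refinements target at prediction time.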


In [25]:
%%bash
saved_models_base=models/mnist/keras_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)
echo ${saved_model_dir}
ls ${saved_model_dir}

models/mnist/keras_classifier/export/1538827587
freezed_model.pb
optimised_model.pb
saved_model.pb
variables


## 8. Describe the Optimised Graph

In [26]:
optimised_filepath=os.path.join(saved_model_dir,'optimised_model.pb')
describe_graph(get_graph_def_from_file(optimised_filepath))

models/mnist/keras_classifier/export/1538827587/optimised_model.pb

Input Feature Nodes: [u'input_image_1']

Unused Nodes: []

Output Nodes: []

Quanitization Nodes: []

Constant Count: 29

Variable Count: 0

Identity Count: 0

Total nodes: 59
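
One of the transforms applied above is `strip_unused_nodes`. A toy version of the idea is a reachability walk backwards from the output node: anything the output does not depend on can be dropped. The graph below is invented and loosely echoes the `unused` dense layer defined in the model; `strip_unused` is a hypothetical helper, not the tool's code.

```python
def strip_unused(graph, output_names):
    """Keep only the nodes that the given outputs depend on (directly or transitively)."""
    keep, stack = set(), list(output_names)
    while stack:
        name = stack.pop()
        if name not in keep:
            keep.add(name)
            stack.extend(graph[name])
    # Rebuild the graph with only the kept nodes.
    return {name: inputs for name, inputs in graph.items() if name in keep}


# Each key is a node name; each value lists the names of its input nodes.
toy_graph = {
    'input_image': [],
    'dense': ['input_image'],
    'unused': ['dense'],
    'logits': ['dense'],
    'softmax': ['logits'],
}

pruned = strip_unused(toy_graph, ['softmax'])
print(sorted(pruned))
```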



## 9. Convert Optimised graph (GraphDef) to SavedModel

In [27]:
def convert_graph_def_to_saved_model(graph_filepath):

    from tensorflow.python import ops
    export_dir=os.path.join(saved_model_dir,'optimised')

    if tf.gfile.Exists(export_dir):
        tf.gfile.DeleteRecursively(export_dir)

    graph_def = get_graph_def_from_file(graph_filepath)
    
    with tf.Session(graph=tf.Graph()) as session:
        tf.import_graph_def(graph_def, name="")
        tf.saved_model.simple_save(session,
                export_dir,
                inputs={
                    node.name: session.graph.get_tensor_by_name("{}:0".format(node.name)) 
                    for node in graph_def.node if node.op=='Placeholder'},
                outputs={
                    "softmax": session.graph.get_tensor_by_name("softmax/Softmax:0"),
                }
            )

        print "Optimised graph converted to SavedModel!"

In [28]:
optimised_filepath=os.path.join(saved_model_dir,'optimised_model.pb')
convert_graph_def_to_saved_model(optimised_filepath)

models/mnist/keras_classifier/export/1538827587/optimised_model.pb

Optimised graph converted to SavedModel!


### Optimised SavedModel Size

In [29]:
optimised_saved_model_dir = os.path.join(saved_model_dir,'optimised') 
get_size(optimised_saved_model_dir)

models/mnist/keras_classifier/export/1538827587/optimised

Model size: 10700.945 KB
Variables size: 0.0 KB
Total Size: 10700.945 KB
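
Using the two totals reported by `get_size`, the relative size change can be worked out directly. The numbers are copied from the outputs in this notebook; nothing else is assumed.

```python
original_kb = 10746.539   # Total Size of the original SavedModel
optimised_kb = 10700.945  # Total Size of the optimised SavedModel

saved_kb = original_kb - optimised_kb
saved_pct = 100.0 * saved_kb / original_kb
print("Saved {:.3f} KB ({:.2f}%)".format(saved_kb, saved_pct))
```

The reduction is small because the `quantize_weights` transform is commented out in the transforms list: the weights, which dominate the size, are still stored as floats, now inside `saved_model.pb` rather than in the variables folder.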


In [30]:
%%bash

saved_models_base=models/mnist/keras_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)/optimised
ls ${saved_model_dir}
saved_model_cli show --dir ${saved_model_dir} --all

saved_model.pb
variables

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_image_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 784)
        name: input_image_1:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['softmax'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: softmax/Softmax:0
  Method name is: tensorflow/serving/predict


## 10. Prediction with the Optimised SavedModel

In [31]:
optimized_saved_model_dir = os.path.join(saved_model_dir,'optimised') 
print(optimized_saved_model_dir)
inference_test(saved_model_dir=optimized_saved_model_dir, signature='serving_default', input_name='input_image_1')

models/mnist/keras_classifier/export/1538827587/optimised

Model loading time: 0.134012 seconds

Inference elapsed time: 44.96652 seconds

Prediction produced for 200 instances batch, repeated 200 times
Average latency per batch: 0.2248326 seconds
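
The two average latencies reported by `inference_test` can be compared in the same way. The figures are copied from the outputs in this notebook and come from a single run each, so the result should be read as indicative rather than as a stable benchmark.

```python
original_latency = 0.257133605  # seconds per batch, original SavedModel
optimised_latency = 0.2248326   # seconds per batch, optimised SavedModel

improvement_pct = 100.0 * (original_latency - optimised_latency) / original_latency
speedup = original_latency / optimised_latency
print("Latency reduced by {:.1f}% ({:.2f}x speed-up)".format(improvement_pct, speedup))
```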



# Cloud ML Engine Deployment and Prediction

In [None]:
PROJECT = 'ksalama-gcp-playground'
BUCKET = 'ksalama-gcs-cloudml'
REGION = 'europe-west1'
MODEL_NAME = 'mnist_classifier'

os.environ['BUCKET'] = BUCKET
os.environ['PROJECT'] = PROJECT
os.environ['REGION'] = REGION
os.environ['MODEL_NAME'] = MODEL_NAME

## 1. Upload the model artefacts to Google Cloud Storage bucket

In [None]:
%%bash

gsutil -m rm -r gs://${BUCKET}/tf-model-optimisation

In [None]:
%%bash

saved_models_base=models/mnist/keras_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)

echo ${saved_model_dir}

gsutil -m cp -r ${saved_model_dir} gs://${BUCKET}/tf-model-optimisation/original

In [None]:
%%bash

saved_models_base=models/mnist/keras_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)/optimised

echo ${saved_model_dir}

gsutil -m cp -r ${saved_model_dir} gs://${BUCKET}/tf-model-optimisation

## 2. Deploy models to Cloud ML Engine

Don't forget to delete the model and the model version if they were previously deployed!

In [None]:
%%bash

echo ${MODEL_NAME}

gcloud ml-engine models create ${MODEL_NAME} --regions=${REGION}

**Version: v_org** is the original SavedModel (before optimisation)

In [None]:
%%bash

MODEL_VERSION='v_org'
MODEL_ORIGIN=gs://${BUCKET}/tf-model-optimisation/original

gcloud ml-engine versions create ${MODEL_VERSION}\
            --model=${MODEL_NAME} \
            --origin=${MODEL_ORIGIN} \
            --runtime-version=1.10

**Version: v_opt** is the optimised SavedModel (after optimisation)

In [None]:
%%bash

MODEL_VERSION='v_opt'
MODEL_ORIGIN=gs://${BUCKET}/tf-model-optimisation/optimised

gcloud ml-engine versions create ${MODEL_VERSION}\
            --model=${MODEL_NAME} \
            --origin=${MODEL_ORIGIN} \
            --runtime-version=1.10

## 3. Cloud ML Engine online predictions

In [None]:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()
api = discovery.build(
    'ml', 'v1', 
    credentials=credentials, 
    discoveryServiceUrl='https://storage.googleapis.com/cloud-ml/discovery/ml_v1_discovery.json'
)

    
def predict(version, instances):

    request_data = {'instances': instances}

    model_url = 'projects/{}/models/{}/versions/{}'.format(PROJECT, MODEL_NAME, version)
    response = api.projects().predict(body=request_data, name=model_url).execute()

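    class_ids = None
    
    try:
        # the exported signature exposes a single 'softmax' output, so take the argmax as the class id
        class_ids = [int(np.argmax(item["softmax"])) for item in response["predictions"]]
    except KeyError:
        # the response has no usable predictions (for example, it carries an error message instead)
        print(response)
    
    return class_ids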
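
The body sent to the online prediction endpoint is a JSON object with an `instances` list, where each instance maps an input name to its values. A sketch of building that body from plain Python lists follows, assuming that the input key matches the SavedModel signature; the key `input_image` and the helper name `build_request_body` are used here for illustration only.

```python
import json


def build_request_body(images, input_name='input_image'):
    """Wrap each image (a sequence of numbers) as one instance keyed by the signature input name."""
    return {
        'instances': [
            {input_name: [float(pixel) for pixel in image]}
            for image in images
        ]
    }


# Two fake 4-pixel "images"; real MNIST rows would have 784 values each.
body = build_request_body([[0, 0.5, 1, 0], [1, 1, 0, 0]])
print(json.dumps(body))
```

Converting every pixel to a plain `float` matters because NumPy scalar types are not JSON serialisable.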

In [None]:
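def inference_cmle(version, input_name='input_image_3', batch=100, repeat=10):
    
    # input_name must match the input key of the deployed version's signature
    # (check the saved_model_cli output of the corresponding SavedModel)
    instances = [
            {input_name: [float(i) for i in list(eval_data[img])] }
        for img in range(batch)
    ]

    #warmup request (the API expects a list of instances, so send a one-element list)
    predict(version, instances[:1])
    print 'Warm up request performed!'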
    print 'Timer started...'
    print ''
    
    time_start = datetime.utcnow() 
    output = None
    
    for i in range(repeat):
        output = predict(version, instances)
    
    time_end = datetime.utcnow() 

    time_elapsed_sec = (time_end - time_start).total_seconds()
    
    print "Inference elapsed time: {} seconds".format(time_elapsed_sec)
    print ""
    
    print "Prediction produced for {} instances batch, repeated {} times".format(len(output), repeat)
    print "Average latency per batch: {} seconds".format(time_elapsed_sec/repeat)
    print ""
    
    print "Prediction output for the first instance of the last batch: {}".format(output[0])


In [None]:
version='v_org'
inference_cmle(version)

In [None]:
version='v_opt'
inference_cmle(version)

## Happy serving!