# MobileNet-SSD Object Detection Example
This example demonstrates how to convert a publicly available TensorFlow object detection model to CoreML and verify its numerical correctness against the original TensorFlow model.

We recommend you go through the MNIST example (linear_mnist_example.ipynb) and Inception V3 example before this one, as they contain important documentation for the workflow.

We use a MobileNet + SSD model provided by Google, which can be downloaded at this URL:
https://storage.googleapis.com/download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_android_export.zip

Please refer to the [TensorFlow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection) for more details.

For detailed documentation of CoreML, see [Apple's CoreML documentation](https://developer.apple.com/documentation/coreml).

In [None]:
from __future__ import print_function
import os, sys, zipfile
from os.path import dirname
import numpy as np
import tensorflow as tf
from tensorflow.core.framework import graph_pb2

In [None]:
# Download the model and class label package
mobilenet_ssd_url = 'https://storage.googleapis.com/download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_android_export.zip'
example_dir = '/tmp/tfcoreml_ssd_example/'
if not os.path.exists(example_dir):
    os.makedirs(example_dir)
mobilenet_ssd_fpath = example_dir + 'ssd_mobilenet_v1_android_export.zip'
if sys.version_info[0] < 3:
    import urllib
    urllib.urlretrieve(mobilenet_ssd_url, mobilenet_ssd_fpath)
else:
    import urllib.request
    urllib.request.urlretrieve(mobilenet_ssd_url, mobilenet_ssd_fpath)
zip_ref = zipfile.ZipFile(mobilenet_ssd_fpath, 'r')
zip_ref.extractall(example_dir)
zip_ref.close()

In [None]:
# Load the TF graph definition
tf_model_path = example_dir + 'ssd_mobilenet_v1_android_export.pb'
with open(tf_model_path, 'rb') as f:
    serialized = f.read()
tf.reset_default_graph()
original_gdef = tf.GraphDef()
original_gdef.ParseFromString(serialized)

with tf.Graph().as_default() as g:
    tf.import_graph_def(original_gdef, name='')

The full MobileNet-SSD TF model contains 4 subgraphs: *Preprocessor*, *FeatureExtractor*, *MultipleGridAnchorGenerator*, and *Postprocessor*. Here we will extract the *FeatureExtractor* from the model and strip off the other subgraphs, as these subgraphs contain structures not currently supported in CoreML. The tasks in *Preprocessor*, *MultipleGridAnchorGenerator* and *Postprocessor* subgraphs can be achieved by other means, although they are non-trivial.

By inspecting the TensorFlow GraphDef, we find that:

1. The input tensor of the MobileNet-SSD feature extractor is `Preprocessor/sub:0` of shape `(1,300,300,3)`, which contains the preprocessed image.
2. The output tensors are `concat:0` of shape `(1,1917,4)`, the box coordinate encodings for each of the 1917 anchor boxes, and `concat_1:0` of shape `(1,1917,91)`, the confidence scores (logits) for each of the 91 object classes (including 1 background class) for each of the 1917 anchor boxes.

So we extract the feature extractor as follows:

In [None]:
# Strip unused subgraphs and save it as another frozen TF model
from tensorflow.python.tools import strip_unused_lib
from tensorflow.python.framework import dtypes
from tensorflow.python.platform import gfile
input_node_names = ['Preprocessor/sub']
output_node_names = ['concat', 'concat_1']
gdef = strip_unused_lib.strip_unused(
        input_graph_def = original_gdef,
        input_node_names = input_node_names,
        output_node_names = output_node_names,
        placeholder_type_enum = dtypes.float32.as_datatype_enum)
# Save the feature extractor to an output file
frozen_model_file = example_dir + 'ssd_mobilenet_feature_extractor.pb'
with gfile.GFile(frozen_model_file, "wb") as f:
    f.write(gdef.SerializeToString())
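
As a quick sanity check on the `1917` in the output shapes above, the anchor count can be reproduced with simple arithmetic. This is only a sketch: the feature-map sizes and the number of anchors per cell below are assumptions about the usual 300x300 SSD-MobileNet v1 configuration and are not read from the graph in this notebook.

```python
# Assumed (not read from the model): six square feature maps and the
# number of anchor boxes predicted at each cell of each map.
feature_map_sizes = [19, 10, 5, 3, 2, 1]
anchors_per_cell = [3, 6, 6, 6, 6, 6]

# Each layer contributes size * size * anchors_per_cell boxes
anchors_per_layer = [size * size * n
                     for size, n in zip(feature_map_sizes, anchors_per_cell)]
total_anchors = sum(anchors_per_layer)

print(anchors_per_layer)  # [1083, 600, 150, 54, 24, 6]
print(total_anchors)      # 1917
```

If the model had been exported with a different anchor configuration, the second axis of `concat:0` and `concat_1:0` would change accordingly.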


In [None]:
# Now we have a TF model ready to be converted to CoreML
import tfcoreml
# Supply a dictionary mapping each input tensor's name to its shape (including the batch axis)
input_tensor_shapes = {"Preprocessor/sub:0":[1,300,300,3]} # batch size is 1
# Output CoreML model path
coreml_model_file = example_dir + 'ssd_mobilenet_feature_extractor.mlmodel'
# The TF model's output tensor names
output_tensor_names = ['concat:0', 'concat_1:0']

# Call the converter. This may take a while
coreml_model = tfcoreml.convert(
        tf_model_path=frozen_model_file,
        mlmodel_path=coreml_model_file,
        input_name_shape_dict=input_tensor_shapes,
        output_feature_names=output_tensor_names)

# CoreML saved at location: /tmp/tfcoreml_ssd_example/ssd_mobilenet_feature_extractor.mlmodel

Now that we have converted the model to CoreML, we can test its numerical correctness by comparing its outputs with those of the TensorFlow model.

In [None]:
# Load an image as input
import PIL.Image
import requests
from io import BytesIO
from matplotlib.pyplot import imshow
img_url = 'https://upload.wikimedia.org/wikipedia/commons/9/93/Golden_Retriever_Carlos_%2810581910556%29.jpg'
response = requests.get(img_url)
%matplotlib inline
img = PIL.Image.open(BytesIO(response.content))
imshow(np.asarray(img))

In [None]:
# Preprocess the image - normalize to [-1,1]
# LANCZOS is the filter that the older ANTIALIAS name referred to
img = img.resize((300, 300), PIL.Image.LANCZOS)
img_array = np.array(img).astype(np.float32) * 2.0 / 255 - 1
batch_img_array = img_array[None,:,:,:]

# Evaluate TF
tf.reset_default_graph()
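# import_graph_def returns None here; the nodes are added to the default
# graph under the "import/" prefix, so fetch that graph explicitly
tf.import_graph_def(gdef)
g = tf.get_default_graph()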

tf_input_name = 'Preprocessor/sub:0'
# concat:0 holds the bounding-box encodings for the 1917 anchor boxes
# concat_1:0 holds the confidence scores over 91 classes for each anchor box
tf_output_names = ['concat:0', 'concat_1:0']
with tf.Session(graph = g) as sess:
    image_input_tensor = sess.graph.get_tensor_by_name("import/" + tf_input_name)
    tf_output_tensors = [sess.graph.get_tensor_by_name("import/" + output_name)
                         for output_name in tf_output_names]
    tf_output_values = sess.run(tf_output_tensors, 
                                feed_dict={image_input_tensor: batch_img_array})
    tf_box_encodings, tf_scores = tf_output_values


Now we evaluate the CoreML model and compare its results against those of the TensorFlow model.
CoreML uses 5D arrays to represent rank-1 to rank-5 tensors. The 5 axes are in the order `(S,B,C,H,W)`, where S is sequence length, B is batch size, C is number of channels, H is height and W is width. This layout usually differs from TensorFlow's default, where a rank-4 tensor for convolutional nets typically uses the `(B,H,W,C)` layout. To compare the two, one of the results must be transposed.
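
The layout change described above can be sketched with NumPy alone. The array below is a stand-in for a preprocessed image batch, not the real input, and the variable names are illustrative.

```python
import numpy as np

# Stand-in for a TensorFlow-style image batch in (B, H, W, C) layout
nhwc = np.arange(1 * 300 * 300 * 3, dtype=np.float32).reshape(1, 300, 300, 3)

# Move channels ahead of height and width: (B, H, W, C) -> (B, C, H, W)
nchw = np.transpose(nhwc, (0, 3, 1, 2))

# Prepend the sequence axis: (B, C, H, W) -> (S, B, C, H, W)
sbchw = nchw[None, ...]

print(sbchw.shape)  # (1, 1, 3, 300, 300)

# The same pixel is addressed with reordered indices
assert nhwc[0, 7, 11, 2] == sbchw[0, 0, 2, 7, 11]
```

The cell that follows reaches the same layout by transposing the unbatched `(H,W,C)` image and then adding both leading axes at once.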

In [None]:
import coremltools
# Input shape should be [1,1,3,300,300]
mlmodel_path = example_dir + 'ssd_mobilenet_feature_extractor.mlmodel'
img_array_coreml = np.transpose(img_array, (2,0,1))[None,None,:,:,:]
mlmodel = coremltools.models.MLModel(mlmodel_path)
# Note the '__0' suffix: ':' and '/' in TF tensor names are replaced with '__'
# (so ':0' becomes '__0') to keep MLModel's generated Swift/Obj-C code valid
coreml_input_name = tf_input_name.replace(':', '__').replace('/', '__')
coreml_output_names = [output_name.replace(':', '__').replace('/', '__') 
                       for output_name in tf_output_names]
coreml_input = {coreml_input_name: img_array_coreml}

# When useCPUOnly == True, the relative error should be around 0.001.
# When useCPUOnly == False on GPU-enabled devices, relative errors
# are expected to be larger because lower-precision arithmetic is used.

coreml_outputs_dict = mlmodel.predict(coreml_input, useCPUOnly=True)
coreml_outputs = [coreml_outputs_dict[out_name] for out_name in 
                  coreml_output_names]
coreml_box_encodings, coreml_scores = coreml_outputs
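
The renaming used in the cell above can be wrapped in a small helper. `tf_to_coreml_name` is a hypothetical name introduced here for illustration; it is not part of tfcoreml or coremltools, and it only mirrors the two `replace` calls shown above.

```python
def tf_to_coreml_name(tf_tensor_name):
    """Hypothetical helper: apply the same renaming as the cell above."""
    return tf_tensor_name.replace(':', '__').replace('/', '__')

for name in ['Preprocessor/sub:0', 'concat:0', 'concat_1:0']:
    print(name, '->', tf_to_coreml_name(name))
# Preprocessor/sub:0 -> Preprocessor__sub__0
# concat:0 -> concat__0
# concat_1:0 -> concat_1__0
```

If a lookup in `coreml_outputs_dict` fails, the names actually stored in the converted model can be checked via `mlmodel.get_spec().description`.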


In [None]:
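# Now we compare the two results
def max_relative_error(x, y):
    # Use the larger magnitude as the denominator, clamped to at least 1
    # so that near-zero values are compared by their absolute difference
    den = np.maximum(np.abs(x), np.abs(y))
    den = np.maximum(den, 1)
    rel_err = np.abs(x - y) / den
    return np.max(rel_err)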

rel_error_box = max_relative_error(coreml_box_encodings.squeeze(), 
        np.transpose(tf_box_encodings.squeeze(),(1,0)))
rel_error_score = max_relative_error(coreml_scores.squeeze(), 
        np.transpose(tf_scores.squeeze(),(1,0)))

In [None]:
print('Max relative error on box encoding: %f' %(rel_error_box))
print('Max relative error on scores: %f' %(rel_error_score))
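
To get a feel for what this kind of metric reports, here is a small self-contained sketch on made-up numbers. `clamped_relative_error` is an illustrative stand-in that takes magnitudes in the denominator and clamps them to at least 1. The input values are invented and are not model outputs.

```python
import numpy as np

def clamped_relative_error(x, y):
    # Denominator is the larger magnitude, but never below 1
    den = np.maximum(np.maximum(np.abs(x), np.abs(y)), 1.0)
    return np.max(np.abs(x - y) / den)

# Made-up values: a tiny entry, an exact match, and a large entry
reference = np.array([0.001, 0.5, 200.0])
candidate = np.array([0.002, 0.5, 201.0])

# Absolute differences are about [0.001, 0, 1]; denominators are [1, 1, 201]
err = clamped_relative_error(reference, candidate)
print(err)  # about 0.005, i.e. 1/201, from the large entry
```

Because of the clamp, the first pair contributes only its absolute difference of 0.001 even though the two values differ by a factor of two. This keeps near-zero logits from dominating the reported error.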

Up to this point we have converted the MobileNet-SSD feature extractor. The remaining work is post-processing: generating anchor boxes, decoding the bounding boxes, and performing non-maximum suppression. These tasks are necessary and non-trivial. CoreML does not provide out-of-the-box support for them at this time, so developers need to write their own post-processing code.
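
As a rough starting point, two of these steps (box decoding and non-maximum suppression) can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the model's actual post-processor: we assume encodings are ordered `(ty, tx, th, tw)` with scale factors `(10, 10, 5, 5)`, anchors are given as `(ycenter, xcenter, height, width)`, and the IoU threshold of 0.6 is an illustrative default. Anchor generation, score conversion and per-class handling are left out, and the function names and toy data are our own.

```python
import numpy as np

def decode_boxes(encodings, anchors, scales=(10.0, 10.0, 5.0, 5.0)):
    """Turn (N, 4) encodings into (N, 4) boxes [ymin, xmin, ymax, xmax].

    Assumes encodings are (ty, tx, th, tw) and anchors are
    (ycenter, xcenter, height, width); the scales are an assumption.
    """
    ty, tx, th, tw = (encodings / np.asarray(scales)).T
    ya, xa, ha, wa = anchors.T
    yc = ty * ha + ya          # shift the anchor center
    xc = tx * wa + xa
    h = np.exp(th) * ha        # rescale the anchor size
    w = np.exp(tw) * wa
    return np.stack([yc - h / 2, xc - w / 2, yc + h / 2, xc + w / 2], axis=1)

def iou(box, boxes):
    """Intersection over union of one box against an (M, 4) array."""
    ymin = np.maximum(box[0], boxes[:, 0])
    xmin = np.maximum(box[1], boxes[:, 1])
    ymax = np.minimum(box[2], boxes[:, 2])
    xmax = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ymax - ymin, 0, None) * np.clip(xmax - xmin, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.6):
    """Greedy NMS: keep the best box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

# Toy data: two heavily overlapping anchors and one far away
anchors = np.array([[0.5, 0.50, 0.2, 0.2],
                    [0.5, 0.52, 0.2, 0.2],
                    [0.2, 0.20, 0.1, 0.1]])
encodings = np.zeros((3, 4))   # a zero encoding reproduces the anchor itself
scores = np.array([0.9, 0.8, 0.7])

boxes = decode_boxes(encodings, anchors)
keep = non_max_suppression(boxes, scores)
print(keep)  # [0, 2]: box 1 overlaps box 0 too much and is dropped
```

In a real pipeline this would run once per class on the scores from `concat_1:0`, after converting the logits to probabilities and discarding low-scoring boxes.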