In this tutorial, we’ll take an SSD(Lite) MobileNet v2 model and deploy it on an iOS device using Core ML. The downloaded ssdlite_mobilenet_v2_coco folder contains the trained SSD model in a few different formats: a frozen graph, a checkpoint, and a SavedModel. tfcoreml needs a frozen graph, but the downloaded one gives errors: it contains “cycles” (loops), which tfcoreml cannot handle. Instead, we’ll load the SavedModel and convert it to a frozen graph without cycles.
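To make “cycles” concrete: a graph has a cycle when following a node’s inputs can lead back to that same node, which is what TensorFlow while loops do through ops such as Merge, Switch and NextIteration. The sketch below is a minimal depth-first check, assuming the graph has already been flattened into a hypothetical dict that maps each node name to the names of its input nodes (one could build this from a GraphDef’s `node.name` and `node.input` fields after stripping any `:N` suffix and `^` prefix). It is an illustration only, not part of tfcoreml.

```python
def find_cycle(inputs_of):
    """Return a list of node names forming a cycle, or None if the graph is acyclic.

    inputs_of: hypothetical dict mapping node name -> list of input node names.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in inputs_of}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in inputs_of.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we came back to it: a cycle
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(inputs_of):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Toy graph loosely modeled on a TensorFlow while loop (made-up, not the SSD graph)
while_loop = {
    "Enter": [],
    "Merge": ["Enter", "NextIteration"],
    "Switch": ["Merge"],
    "NextIteration": ["Switch"],
    "Exit": ["Switch"],
}
print(find_cycle(while_loop))  # ['Merge', 'NextIteration', 'Switch', 'Merge']
print(find_cycle({"output": ["add"], "add": ["x", "y"], "x": [], "y": []}))  # None
```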

In [1]:
# Import Dependencies
import tensorflow as tf
from tensorflow.python.tools import strip_unused_lib
from tensorflow.python.framework import dtypes
from tensorflow.python.platform import gfile

In [2]:
# Function to load a saved TensorFlow model into a new graph
def load_model(model_path=None):
    # Define a new graph
    graph = tf.Graph()
    # Load the saved model graph into new graph
    with tf.Session(graph=graph) as sess:
        tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], model_path)
    return graph

In [3]:
# Get Saved model graph
saved_model = './ssdlite_mobilenet_v2_coco_2018_05_09/saved_model'
graph = load_model(model_path=saved_model)

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:The specified SavedModel has no variables; no checkpoints were restored.


Next, we’ll use a helper function to strip away unused subgraphs and save the result as another frozen model.

In [4]:
# Frozen model
frozen_model = './optimized_frozen_model.pb'
input_node = "Preprocessor/sub"
bbox_output_node = "concat"
class_output_node = "Postprocessor/convert_scores"

In [5]:
# Function to Optimize model graph
def optimize_model_graph(graph=None):
    # Strip out the unused subgraphs
    gdef = strip_unused_lib.strip_unused(
            input_graph_def = graph.as_graph_def(),
            input_node_names = [input_node],
            output_node_names = [bbox_output_node, class_output_node],
            placeholder_type_enum = dtypes.float32.as_datatype_enum)
    
    # Save the Optimized graph model as a new model
    with gfile.GFile(frozen_model, "wb") as f:
        f.write(gdef.SerializeToString())

In [6]:
# Get optimized model
optimize_model_graph(graph=graph)

The strip_unused() function keeps only the portion of the graph between the specified input and output nodes and removes everything else. What’s left is the only part of the original graph that Core ML can actually handle; the rest is full of unsupported operations.
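The idea behind stripping can be sketched in plain Python: walk backwards from the output nodes, keep everything they depend on, and stop at the input nodes. This is a simplified sketch over a hypothetical dict of node name to input names; the real strip_unused() works on a GraphDef and additionally replaces each input node with a Placeholder of type `placeholder_type_enum`, whereas here the input node simply loses its incoming edges. Only `Preprocessor/sub`, `concat` and `Postprocessor/convert_scores` come from this notebook; the other node names are made up.

```python
def extract_subgraph(inputs_of, input_nodes, output_nodes):
    """Keep every node the outputs depend on, cutting the graph at input_nodes.

    inputs_of: hypothetical dict mapping node name -> list of input node names.
    """
    keep = set()
    frontier = list(output_nodes)
    while frontier:
        node = frontier.pop()
        if node in keep:
            continue
        keep.add(node)
        if node not in input_nodes:
            # Keep walking upstream, but never past an input node
            frontier.extend(inputs_of.get(node, []))
    return {node: [] if node in input_nodes else list(inputs_of.get(node, []))
            for node in keep}


toy_graph = {
    "image_tensor": [],
    "Preprocessor/resize": ["image_tensor"],
    "Preprocessor/sub": ["Preprocessor/resize"],
    "weights": [],
    "FeatureExtractor/conv": ["Preprocessor/sub", "weights"],
    "concat": ["FeatureExtractor/conv"],
    "Postprocessor/convert_scores": ["FeatureExtractor/conv"],
    "Postprocessor/nms": ["concat", "Postprocessor/convert_scores"],
}
sub = extract_subgraph(toy_graph, {"Preprocessor/sub"},
                       ["concat", "Postprocessor/convert_scores"])
print(sorted(sub))
```

Note that the constant `weights` node survives even though it is not downstream of the input: anything the outputs need is kept, and only the preprocessing before the input and the postprocessing after the outputs are dropped.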

The part of the TensorFlow graph that we keep has one input for the image and two outputs: one for the bounding box coordinate predictions and one for the classes. While YOLO combines the coordinate and class predictions into a single tensor, SSD makes these predictions on two separate branches. That’s why we also have to supply the names of two output nodes.

In [7]:
# Convert frozen model to CoreML Model
import tfcoreml

# Define model name
coreml_model = 'SSDLite_MobileNetv2.mlmodel'

# Define Input Image Width and Height
input_width = 300
input_height = 300

# Define Input and Output Tensor Names
input_tensor = input_node + ":0"
bbox_output_tensor = bbox_output_node + ":0"
class_output_tensor = class_output_node + ":0"



In [8]:
# Function to convert tensorflow model to CoreML Model
def tf_to_coreml(tf_model_path=None, coreml_model_path=None):
    ssd_model = tfcoreml.convert(
                tf_model_path=tf_model_path,
                mlmodel_path=coreml_model_path,
                input_name_shape_dict={ input_tensor: [1, input_height, input_width, 3] },
                image_input_names=input_tensor,
                output_feature_names=[bbox_output_tensor, class_output_tensor],
                is_bgr=False,
                red_bias=-1.0,
                green_bias=-1.0,
                blue_bias=-1.0,
                image_scale=2./255)
    return ssd_model

In [9]:
# Get CoreML Model from TF Model
mlmodel = tf_to_coreml(tf_model_path=frozen_model, coreml_model_path=coreml_model)


Loading the TF graph...
Graph Loaded.
Collecting all the 'Const' ops from the graph, by running it....
Done.
Now finding ops in the TF graph that can be dropped for inference
Now starting translation to CoreML graph.
Automatic shape interpretation succeeded for input blob Preprocessor/sub:0
1/1678: Analysing op name: Postprocessor/scale_logits/y ( type:  Const )
2/1678: Analysing op name: concat_1/axis ( type:  Const )
3/1678: Analysing op name: concat/axis ( type:  Const )
4/1678: Analysing op name: BoxPredictor_5/stack_1/2 ( type:  Const )
5/1678: Analysing op name: BoxPredictor_5/stack_1/1 ( type:  Const )
6/1678: Analysing op name: BoxPredictor_5/stack/3 ( type:  Const )
7/1678: Analysing op name: BoxPredictor_5/stack/2 ( type:  Const )
8/1678: Analysing op name: BoxPredictor_5/stack/1 ( type:  Const )
9/1678: Analysing op name: BoxPredictor_5/strided_slice/stack_2 ( type:  Const )
10/1678: Analysing op name: BoxPredictor_5/strided_slice/stack_1 ( type:  Const )
...

851/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_8/expand/BatchNorm/beta ( type:  Const )
852/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_8/expand/BatchNorm/beta/read ( type:  Identity )
853/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_8/expand/BatchNorm/gamma ( type:  Const )
854/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_8/expand/BatchNorm/gamma/read ( type:  Identity )
855/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_8/expand/BatchNorm/batchnorm/mul ( type:  Mul )
856/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_8/expand/BatchNorm/batchnorm/mul_2 ( type:  Mul )
857/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_8/expand/BatchNorm/batchnorm/sub ( type:  Sub )
858/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_8/expand/weights ( type:  Const )
...

1210/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv/project/weights ( type:  Const )
1211/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv/project/weights/read ( type:  Identity )
1212/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv/depthwise/BatchNorm/batchnorm/add/y ( type:  Const )
1213/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv/depthwise/BatchNorm/moving_variance ( type:  Const )
1214/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv/depthwise/BatchNorm/moving_variance/read ( type:  Identity )
1215/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv/depthwise/BatchNorm/batchnorm/add ( type:  Add )
1216/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv/depthwise/BatchNorm/batchnorm/Rsqrt ( type:  Rsqrt )
1217/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv/depthwise/BatchNorm/moving_mean ( type:  Const )
...

1404/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_9/depthwise_output ( type:  Identity )
1405/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_9/project/Conv2D ( type:  Conv2D )
1406/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_9/project/BatchNorm/batchnorm/mul_1 ( type:  Mul )
1407/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_9/project/BatchNorm/batchnorm/add_1 ( type:  Add )
1408/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_9/project/Identity ( type:  Identity )
1409/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_9/add ( type:  Add )
1410/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_9/output ( type:  Identity )
1411/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_10/input ( type:  Identity )
1412/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_10/expand/Conv2D ( type:  Conv2D )
...

1512/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_15/expand/BatchNorm/batchnorm/mul_1 ( type:  Mul )
1513/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_15/expand/BatchNorm/batchnorm/add_1 ( type:  Add )
1514/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_15/expand/Relu6 ( type:  Relu6 )
1515/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_15/expansion_output ( type:  Identity )
1516/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_15/depthwise/depthwise ( type:  DepthwiseConv2dNative )
1517/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_15/depthwise/BatchNorm/batchnorm/mul_1 ( type:  Mul )
1518/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_15/depthwise/BatchNorm/batchnorm/add_1 ( type:  Add )
1519/1678: Analysing op name: FeatureExtractor/MobilenetV2/expanded_conv_15/depthwise/Relu6 ( type:  Relu6 )
...

1606/1678: Analysing op name: BoxPredictor_3/stack ( type:  Pack )
1607/1678: Analysing op name: BoxPredictor_3/ClassPredictor_depthwise/depthwise ( type:  DepthwiseConv2dNative )
1608/1678: Analysing op name: BoxPredictor_3/ClassPredictor_depthwise/BatchNorm/FusedBatchNorm ( type:  FusedBatchNorm )
1609/1678: Analysing op name: BoxPredictor_3/ClassPredictor_depthwise/Relu6 ( type:  Relu6 )
1610/1678: Analysing op name: BoxPredictor_3/ClassPredictor/Conv2D ( type:  Conv2D )
1611/1678: Analysing op name: BoxPredictor_3/ClassPredictor/BiasAdd ( type:  BiasAdd )
1612/1678: Analysing op name: BoxPredictor_3/Reshape_1 ( type:  Reshape )
1613/1678: Analysing op name: BoxPredictor_3/BoxEncodingPredictor_depthwise/depthwise ( type:  DepthwiseConv2dNative )
1614/1678: Analysing op name: BoxPredictor_3/BoxEncodingPredictor_depthwise/BatchNorm/FusedBatchNorm ( type:  FusedBatchNorm )
1615/1678: Analysing op name: BoxPredictor_3/BoxEncodingPredictor_depthwise/Relu6 ( type:  Relu6 )
...

We now have a Core ML model that takes a 300×300 image as input and produces two outputs: a multi-array with the coordinates for 1917 bounding boxes and another multi-array with the class predictions for the same 1917 bounding boxes.

In [10]:
# Cleaning up the model by renaming the inputs and outputs
spec = mlmodel.get_spec()

spec.description.input[0].name = "image"
spec.description.input[0].shortDescription = "Input image"
spec.description.output[0].name = "scores"
spec.description.output[0].shortDescription = "Predicted class scores for each bounding box"
spec.description.output[1].name = "boxes"
spec.description.output[1].shortDescription = "Predicted coordinates for each bounding box"

It’s not enough to change these names in the spec.description. Any layers that are connected to the old input or output names must now use the new names too. Likewise for the object that handles the image preprocessing.

In [11]:
input_mlmodel = input_tensor.replace(":", "__").replace("/", "__")
class_output_mlmodel = class_output_tensor.replace(":", "__").replace("/", "__")
bbox_output_mlmodel = bbox_output_tensor.replace(":", "__").replace("/", "__")

for i in range(len(spec.neuralNetwork.layers)):
    if spec.neuralNetwork.layers[i].input[0] == input_mlmodel:
        spec.neuralNetwork.layers[i].input[0] = "image"
    if spec.neuralNetwork.layers[i].output[0] == class_output_mlmodel:
        spec.neuralNetwork.layers[i].output[0] = "scores"
    if spec.neuralNetwork.layers[i].output[0] == bbox_output_mlmodel:
        spec.neuralNetwork.layers[i].output[0] = "boxes"

spec.neuralNetwork.preprocessing[0].featureName = "image"        
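The loop above only checks the first input and output slot of each layer. A slightly more defensive variant renames a blob wherever it appears. The sketch below runs on hypothetical stand-in layers (SimpleNamespace objects with `input`/`output` lists); because it only indexes and assigns, it should also work on the repeated string fields of `spec.neuralNetwork.layers`, but that is an untested assumption. The name mangling mirrors the `replace()` calls in the cell above.

```python
from types import SimpleNamespace

def coreml_blob_name(tf_tensor_name):
    # Same mangling as the replace() calls in the cell above
    return tf_tensor_name.replace(":", "__").replace("/", "__")

def rename_blobs(layers, renames):
    """Rename blobs in every input and output slot of every layer, in place."""
    for layer in layers:
        for blobs in (layer.input, layer.output):
            for j, name in enumerate(blobs):
                if name in renames:
                    blobs[j] = renames[name]

renames = {
    coreml_blob_name("Preprocessor/sub:0"): "image",
    coreml_blob_name("Postprocessor/convert_scores:0"): "scores",
    coreml_blob_name("concat:0"): "boxes",
}

# Toy stand-ins for spec.neuralNetwork.layers (made-up layer wiring)
toy_layers = [
    SimpleNamespace(input=["Preprocessor__sub__0"], output=["conv_out"]),
    SimpleNamespace(input=["conv_out", "Preprocessor__sub__0"], output=["concat__0"]),
]
rename_blobs(toy_layers, renames)
print([(layer.input, layer.output) for layer in toy_layers])
# [(['image'], ['conv_out']), (['conv_out', 'image'], ['boxes'])]
```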

If we look at the outputs using print(spec.description), the "scores" output correctly shows up as a multi-array but its shape is not filled in.

In [12]:
print(spec.description)

input {
  name: "image"
  shortDescription: "Input image"
  type {
    imageType {
      width: 300
      height: 300
      colorSpace: RGB
    }
  }
}
output {
  name: "scores"
  shortDescription: "Predicted class scores for each bounding box"
  type {
    multiArrayType {
      dataType: DOUBLE
    }
  }
}
output {
  name: "boxes"
  shortDescription: "Predicted coordinates for each bounding box"
  type {
    multiArrayType {
      shape: 4
      shape: 1917
      shape: 1
      dataType: DOUBLE
    }
  }
}



We know for a fact that this always outputs an array of shape (91, 1917) because there are 91 classes and 1917 bounding boxes. Why 91 classes? This model was trained on the COCO dataset and so it can detect 90 possible types of objects, plus one class for “no object detected”.
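Where does 1917 come from? SSD predicts boxes at several feature-map resolutions, with a fixed number of anchors per grid cell. Assuming the default anchor settings of the TF Object Detection API config for this model (feature maps of 19×19, 10×10, 5×5, 3×3, 2×2 and 1×1 for a 300×300 input, with 3 anchors per cell on the first map and 6 on the others), the count works out as follows.

```python
# Assumed layout: (grid size, anchors per grid cell) for each SSD prediction layer
feature_maps = [(19, 3), (10, 6), (5, 6), (3, 6), (2, 6), (1, 6)]

total_anchors = sum(size * size * per_cell for size, per_cell in feature_maps)
coco_classes = 90  # not counting the extra "no object" class

print(total_anchors)                      # 1917
print((coco_classes + 1, total_anchors))  # (91, 1917)
```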

In [13]:
# Let's fill in the output shape for the scores and bounding boxes
num_classes = 90
num_anchors = 1917

spec.description.output[0].type.multiArrayType.shape.append(num_classes + 1)
spec.description.output[0].type.multiArrayType.shape.append(num_anchors)

In [14]:
# Confirming that the changes took place
print(spec.description)

input {
  name: "image"
  shortDescription: "Input image"
  type {
    imageType {
      width: 300
      height: 300
      colorSpace: RGB
    }
  }
}
output {
  name: "scores"
  shortDescription: "Predicted class scores for each bounding box"
  type {
    multiArrayType {
      shape: 91
      shape: 1917
      dataType: DOUBLE
    }
  }
}
output {
  name: "boxes"
  shortDescription: "Predicted coordinates for each bounding box"
  type {
    multiArrayType {
      shape: 4
      shape: 1917
      shape: 1
      dataType: DOUBLE
    }
  }
}



The first two dimensions of the box coordinates are correct, but there is no reason to have that third dimension of size 1, so we might as well get rid of it.

In [15]:
# Delete third dimension in bounding box values
del spec.description.output[1].type.multiArrayType.shape[-1]

In [16]:
print(spec.description)

input {
  name: "image"
  shortDescription: "Input image"
  type {
    imageType {
      width: 300
      height: 300
      colorSpace: RGB
    }
  }
}
output {
  name: "scores"
  shortDescription: "Predicted class scores for each bounding box"
  type {
    multiArrayType {
      shape: 91
      shape: 1917
      dataType: DOUBLE
    }
  }
}
output {
  name: "boxes"
  shortDescription: "Predicted coordinates for each bounding box"
  type {
    multiArrayType {
      shape: 4
      shape: 1917
      dataType: DOUBLE
    }
  }
}



Finally, let’s convert the weights to 16-bit floats.

In [17]:
# Import Dependencies
import coremltools

spec = coremltools.utils.convert_neural_network_spec_weights_to_fp16(spec)
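Converting to 16-bit floats roughly halves the size of the stored weights at the cost of some precision. Python’s struct module supports IEEE 754 half precision through the `"e"` format code, so the sketch below rounds a few made-up weight values to float16 and back to show both effects. It only illustrates the number format; it is not what coremltools does internally.

```python
import struct

def to_fp16_and_back(x):
    """Round a Python float to IEEE 754 half precision and back."""
    # "e" is the struct format code for a 16-bit (half-precision) float
    return struct.unpack("<e", struct.pack("<e", x))[0]

weights = [0.1, -1.2345678, 3.14159265]  # made-up values

# Bytes needed for three weights: 4 each as float32 ("f"), 2 each as float16 ("e")
print(len(struct.pack("<3f", *weights)), len(struct.pack("<3e", *weights)))  # 12 6

for w in weights:
    h = to_fp16_and_back(w)
    print(f"{w:.8f} -> {h:.8f}  (abs error {abs(w - h):.1e})")
```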

In [18]:
# Save the new coreml model with optimized weights
coreml_model_path = 'SSDLite_MobileNetv2_WeightOptimized.mlmodel'
mlmodel = coremltools.models.MLModel(spec)
mlmodel.save(coreml_model_path)

The four numbers that SSD predicts for each bounding box describe how the position and size of the corresponding anchor box should be modified in order to fit the detected object. For example, the predicted numbers may say, “move my anchor box 20 pixels to the left, and make it 5% wider but also 3% less tall.”

The model is trained to make its prediction using the anchor box that best fits the detected object, and then tweak the box a little so that it fits perfectly.

The anchor boxes are chosen prior to training and are always fixed. In other words, they are a hyperparameter. 
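For reference, this is the decoding that the Core ML layers built later in this notebook carry out. It assumes the TF Object Detection API’s default box coder with scale factors (10, 10, 5, 5), which is what the `alpha=0.1` and `alpha=0.2` multipliers in the decoder correspond to. Here $(t_y, t_x, t_h, t_w)$ are the four predicted numbers for one box and $(y_a, x_a, h_a, w_a)$ is its anchor box in center form:

$$
\begin{aligned}
y &= \frac{t_y}{10}\, h_a + y_a, & x &= \frac{t_x}{10}\, w_a + x_a, \\
h &= \exp\!\left(\frac{t_h}{5}\right) h_a, & w &= \exp\!\left(\frac{t_w}{5}\right) w_a
\end{aligned}
$$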

To be able to decode the coordinate predictions, we first need to know what the anchor boxes are. It’s possible to dig up the mathematical formula for computing the anchor box positions and sizes — but as this formula is part of the original TensorFlow model, we can also simply ask the graph.

In [19]:
# Import Dependencies
import numpy as np


# Function to get the anchor boxes (in center form) from the TensorFlow graph
def get_anchors(sess, graph, tensor_name):
    image_tensor = graph.get_tensor_by_name("image_tensor:0")
    box_corners_tensor = graph.get_tensor_by_name(tensor_name)
    # Only the image size matters for the anchors, so feed an all-zeros uint8 image
    box_corners = sess.run(box_corners_tensor, feed_dict={
        image_tensor: np.zeros((1, input_height, input_width, 3), dtype=np.uint8)})

    # Convert (ymin, xmin, ymax, xmax) corners to (ycenter, xcenter, height, width)
    ymin, xmin, ymax, xmax = np.transpose(box_corners)
    width = xmax - xmin
    height = ymax - ymin
    ycenter = ymin + height / 2.
    xcenter = xmin + width / 2.
    return np.stack([ycenter, xcenter, height, width])

In [20]:
# Name of Anchor tensor
anchors_tensor = "Concatenate/concat:0"

with graph.as_default():
    with tf.Session(graph=graph) as sess:
        anchors = get_anchors(sess, graph, anchors_tensor)

To get the appropriate anchor boxes for our desired input image size, we must run the graph on such an image. Here we’re simply using a fake image that is all zeros. For the anchor boxes it doesn’t matter what is actually in the image, only how large it is (300×300 pixels in our case).
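The conversion at the end of get_anchors() turns each anchor from corner form (ymin, xmin, ymax, xmax) into center form (ycenter, xcenter, height, width). Here it is for a single made-up box in plain Python (`corners_to_center` is a hypothetical helper), which makes the arithmetic easy to check by hand.

```python
def corners_to_center(box):
    """(ymin, xmin, ymax, xmax) -> (ycenter, xcenter, height, width)."""
    ymin, xmin, ymax, xmax = box
    height = ymax - ymin
    width = xmax - xmin
    return (ymin + height / 2.0, xmin + width / 2.0, height, width)

print(corners_to_center((0.25, 0.125, 0.75, 0.625)))  # (0.5, 0.375, 0.5, 0.5)
```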

Adding the decoding logic to the mlmodel involves implementing the box-decoding formula (the inverse of how SSD encodes each box relative to its anchor) using various Core ML layer types.

We could directly add these layers to the SSD model that we just converted, but instead let’s create a completely new model from scratch. Later, we’ll connect these models together using a pipeline.
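A pipeline hands each model’s outputs to the next model’s inputs by feature name, so the names (and, as the comment in the pipeline cell notes, the shapes) have to line up. The sketch below is a hypothetical checker over plain dicts of feature name to shape, not coremltools’ own validation; it shows why the SSD outputs need the extra trailing dimension of size 1 before they can feed the decoder.

```python
def check_pipeline(pipeline_inputs, stages):
    """Return a list of problems; an empty list means every stage input is satisfied."""
    available = dict(pipeline_inputs)  # feature name -> shape
    problems = []
    for stage_name, inputs, outputs in stages:
        for feature, shape in inputs.items():
            if feature not in available:
                problems.append(f"{stage_name}: no upstream feature named {feature!r}")
            elif available[feature] != shape:
                problems.append(f"{stage_name}: {feature!r} has shape "
                                f"{available[feature]}, expected {shape}")
        # Outputs of this stage become available to later stages
        available.update(outputs)
    return problems

pipeline_inputs = {"image": (3, 300, 300)}
decoder_stage = ("decoder",
                 {"scores": (91, 1917, 1), "boxes": (4, 1917, 1)},
                 {"raw_confidence": (1917, 90), "raw_coordinates": (1917, 4)})

# SSD outputs as declared after conversion: two dimensions only
ssd_rank2 = ("ssd", {"image": (3, 300, 300)}, {"scores": (91, 1917), "boxes": (4, 1917)})
# SSD outputs with a dimension of size 1 added at the back
ssd_rank3 = ("ssd", {"image": (3, 300, 300)},
             {"scores": (91, 1917, 1), "boxes": (4, 1917, 1)})

for problem in check_pipeline(pipeline_inputs, [ssd_rank2, decoder_stage]):
    print(problem)
print(check_pipeline(pipeline_inputs, [ssd_rank3, decoder_stage]))  # []
```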

In [21]:
# Import Dependencies
from coremltools.models import datatypes
from coremltools.models import neural_network

# Input features to the model
input_features = [
    ("scores", datatypes.Array(num_classes + 1, num_anchors, 1)),
    ("boxes", datatypes.Array(4, num_anchors, 1))
]

# Output features from model
output_features = [
    ("raw_confidence", datatypes.Array(num_anchors, num_classes)),
    ("raw_coordinates", datatypes.Array(num_anchors, 4))
]

# Neural Network Builder
builder = neural_network.NeuralNetworkBuilder(input_features, output_features)

The inputs to the decoder model are exactly the same as the outputs from the SSD model. Well, almost. The boxes output from SSD has shape (4, num_anchors) but here we say the shape is (4, num_anchors, 1). Similarly for the scores output.

In Core ML, if the input to a neural network model is a multi-array it must have either one or three dimensions. Since our arrays only have two dimensions, we need to add an unused dimension of size 1 at the front or back.

All right, let’s build this decoder model. First let’s look at the scores input. The decoder needs to do two things with the scores:

1. swap around the dimensions, and
2. strip off the predictions for the “unknown” class.

To swap the dimensions we use a permute operation.

In [22]:
builder.add_permute(name="permute_scores",
                    dim=(0, 3, 2, 1),
                    input_name="scores",
                    output_name="permute_scores_output")

Even though our input is a tensor with three dimensions, (91, 1917, 1), the permute layer treats it as having four dimensions. The first dimension is used for sequences and we’ll leave it alone.

After permuting, the shape of the data is now (1, 1, 1917, 91). Each bounding box prediction gets a 91-element vector with the class scores, the first of which is the prediction for class “unknown”. To strip this off we use a slice operation that works on the “width” axis (the last one). We only want to keep elements 1 through 90, so we set start_index=1 and end_index=91 (the end index is exclusive).

In [23]:
builder.add_slice(name="slice_scores",
                  input_name="permute_scores_output",
                  output_name="raw_confidence",
                  axis="width",
                  start_index=1,
                  end_index=num_classes + 1)

Now the data has shape (1, 1, 1917, 90). That tensor can go straight into the decoder model’s first output, "raw_confidence". Note that we declared this output to have shape (1917, 90). The first two dimensions are automatically dropped by Core ML because they are of size 1.
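The shape bookkeeping for the scores can be traced with a few tuple helpers. These are hypothetical helpers that only mimic the shape effects described above (the rank-3 input viewed as (Seq, C, H, W), permute, slice on the last axis, and dropping the leading size-1 dimensions); they do not call Core ML.

```python
def as_rank4(shape):
    # A rank-3 (C, H, W) blob as the permute layer sees it: (Seq, C, H, W)
    return (1,) + tuple(shape)

def permute_shape(shape, dim):
    # Output axis i takes its size from input axis dim[i]
    return tuple(shape[d] for d in dim)

def slice_width(shape, start_index, end_index):
    # Slice along the last ("width") axis; end_index is exclusive
    return tuple(shape[:-1]) + (end_index - start_index,)

def drop_leading_ones(shape):
    shape = tuple(shape)
    while len(shape) > 1 and shape[0] == 1:
        shape = shape[1:]
    return shape

s = as_rank4((91, 1917, 1))         # (1, 91, 1917, 1)
s = permute_shape(s, (0, 3, 2, 1))  # (1, 1, 1917, 91)
s = slice_width(s, 1, 91)           # (1, 1, 1917, 90)
print(drop_leading_ones(s))         # (1917, 90)
```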

Next up is the second input, boxes, that has the bounding box “coordinates”.

In [24]:
builder.add_slice(name="slice_yx",
                  input_name="boxes",
                  output_name="slice_yx_output",
                  axis="channel",
                  start_index=0,
                  end_index=2)

In [25]:
builder.add_elementwise(name="scale_yx",
                        input_names="slice_yx_output",
                        output_name="scale_yx_output",
                        mode="MULTIPLY",
                        alpha=0.1)

In [26]:
anchors_yx = np.expand_dims(anchors[:2, :], axis=-1)
anchors_hw = np.expand_dims(anchors[2:, :], axis=-1)

In [27]:
builder.add_load_constant(name="anchors_yx",
                          output_name="anchors_yx",
                          constant_value=anchors_yx,
                          shape=[2, num_anchors, 1])

builder.add_load_constant(name="anchors_hw",
                          output_name="anchors_hw",
                          constant_value=anchors_hw,
                          shape=[2, num_anchors, 1])

In [28]:
builder.add_elementwise(name="yw_times_hw",
                        input_names=["scale_yx_output", "anchors_hw"],
                        output_name="yw_times_hw_output",
                        mode="MULTIPLY")

In [29]:
builder.add_elementwise(name="decoded_yx",
                        input_names=["yw_times_hw_output", "anchors_yx"],
                        output_name="decoded_yx_output",
                        mode="ADD")

In [30]:
builder.add_slice(name="slice_hw",
                  input_name="boxes",
                  output_name="slice_hw_output",
                  axis="channel",
                  start_index=2,
                  end_index=4)

builder.add_elementwise(name="scale_hw",
                        input_names="slice_hw_output",
                        output_name="scale_hw_output",
                        mode="MULTIPLY",
                        alpha=0.2)

In [31]:
builder.add_unary(name="exp_hw",
                  input_name="scale_hw_output",
                  output_name="exp_hw_output",
                  mode="exp")

In [32]:
builder.add_elementwise(name="decoded_hw",
                        input_names=["exp_hw_output", "anchors_hw"],
                        output_name="decoded_hw_output",
                        mode="MULTIPLY")

In [33]:
builder.add_slice(name="slice_y",
                  input_name="decoded_yx_output",
                  output_name="slice_y_output",
                  axis="channel",
                  start_index=0,
                  end_index=1)

builder.add_slice(name="slice_x",
                  input_name="decoded_yx_output",
                  output_name="slice_x_output",
                  axis="channel",
                  start_index=1,
                  end_index=2)

builder.add_slice(name="slice_h",
                  input_name="decoded_hw_output",
                  output_name="slice_h_output",
                  axis="channel",
                  start_index=0,
                  end_index=1)

builder.add_slice(name="slice_w",
                  input_name="decoded_hw_output",
                  output_name="slice_w_output",
                  axis="channel",
                  start_index=1,
                  end_index=2)

builder.add_elementwise(name="concat",
                        input_names=["slice_x_output", "slice_y_output", 
                                     "slice_w_output", "slice_h_output"],
                        output_name="concat_output",
                        mode="CONCAT")

In [34]:
builder.add_permute(name="permute_output",
                    dim=(0, 3, 2, 1),
                    input_name="concat_output",
                    output_name="raw_coordinates")
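The layers above decode each box relative to its anchor and then reorder the result to [x, y, width, height]. A plain-Python reference of the same computation can help sanity-check the decoder model’s output on a few anchors. It is a sketch under the same assumptions as the layers (channel order ty, tx, th, tw in `boxes`, and scale factors 10 and 5, i.e. the alpha values 0.1 and 0.2); `decode_boxes` is a hypothetical helper, not part of coremltools.

```python
import math

def decode_boxes(boxes, anchors):
    """Decode SSD box predictions into [x, y, width, height] rows.

    boxes:   [ty, tx, th, tw], four lists of length N (like the "boxes" output, shape (4, N))
    anchors: [ycenter, xcenter, height, width], four lists of length N (like get_anchors())
    """
    ty, tx, th, tw = boxes
    ya, xa, ha, wa = anchors
    rows = []
    for i in range(len(ty)):
        y = ty[i] * 0.1 * ha[i] + ya[i]    # scale_yx, yw_times_hw, decoded_yx
        x = tx[i] * 0.1 * wa[i] + xa[i]
        h = math.exp(th[i] * 0.2) * ha[i]  # scale_hw, exp_hw, decoded_hw
        w = math.exp(tw[i] * 0.2) * wa[i]
        rows.append([x, y, w, h])          # concat order, then transposed to (N, 4)
    return rows

# One made-up anchor and one made-up prediction
toy_anchors = [[0.5], [0.5], [0.2], [0.4]]
toy_boxes = [[1.0], [-2.0], [0.0], [5 * math.log(2)]]
print(decode_boxes(toy_boxes, toy_anchors))
```

With all-zero predictions the decoded box is exactly the anchor, which makes a handy first check.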

In [35]:
decoder_model = coremltools.models.MLModel(builder.spec)
decoder_model.save("Decoder.mlmodel")

In [36]:
# Non-Maximum Suppression

nms_spec = coremltools.proto.Model_pb2.Model()
nms_spec.specificationVersion = 3

for i in range(2):
    decoder_output = decoder_model._spec.description.output[i].SerializeToString()

    nms_spec.description.input.add()
    nms_spec.description.input[i].ParseFromString(decoder_output)

    nms_spec.description.output.add()
    nms_spec.description.output[i].ParseFromString(decoder_output)
    
nms_spec.description.output[0].name = "confidence"
nms_spec.description.output[1].name = "coordinates"

output_sizes = [num_classes, 4]
for i in range(2):
    ma_type = nms_spec.description.output[i].type.multiArrayType
    ma_type.shapeRange.sizeRanges.add()
    ma_type.shapeRange.sizeRanges[0].lowerBound = 0
    ma_type.shapeRange.sizeRanges[0].upperBound = -1
    ma_type.shapeRange.sizeRanges.add()
    ma_type.shapeRange.sizeRanges[1].lowerBound = output_sizes[i]
    ma_type.shapeRange.sizeRanges[1].upperBound = output_sizes[i]
    del ma_type.shape[:]

nms = nms_spec.nonMaximumSuppression
nms.confidenceInputFeatureName = "raw_confidence"
nms.coordinatesInputFeatureName = "raw_coordinates"
nms.confidenceOutputFeatureName = "confidence"
nms.coordinatesOutputFeatureName = "coordinates"
nms.iouThresholdInputFeatureName = "iouThreshold"
nms.confidenceThresholdInputFeatureName = "confidenceThreshold"

default_iou_threshold = 0.6
default_confidence_threshold = 0.4
nms.iouThreshold = default_iou_threshold
nms.confidenceThreshold = default_confidence_threshold

nms.pickTop.perClass = True

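# Read one class label per line (some labels, such as "traffic light", contain spaces)
with open("coco_labels.txt") as f:
    labels = [line.strip() for line in f if line.strip()]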
nms.stringClassLabels.vector.extend(labels)

nms_model = coremltools.models.MLModel(nms_spec)
nms_model.save("NMS.mlmodel")
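The NMS model configured above does the suppression inside Core ML. To show what the IoU and confidence thresholds mean, here is a generic greedy per-class NMS in plain Python, using the same default thresholds (0.6 and 0.4) and treating each box as (x center, y center, width, height), as the decoder produces. This is a sketch of the standard technique under those assumptions, not Core ML’s actual implementation, and its tie and threshold-boundary behavior may differ.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_center, y_center, width, height)."""
    ax0, ay0, ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx0, by0, bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter = max(0.0, min(ax1, bx1) - max(ax0, bx0)) * max(0.0, min(ay1, by1) - max(ay0, by0))
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms_per_class(coords, confidences, iou_threshold=0.6, confidence_threshold=0.4):
    """Greedy per-class NMS. Returns (box index, class index) pairs that survive."""
    kept = []
    num_classes = len(confidences[0]) if confidences else 0
    for c in range(num_classes):
        # Drop low-confidence boxes, then visit the rest from most to least confident
        candidates = [i for i in range(len(coords)) if confidences[i][c] >= confidence_threshold]
        candidates.sort(key=lambda i: confidences[i][c], reverse=True)
        selected = []
        for i in candidates:
            # Keep a box only if it does not overlap too much with an already kept box
            if all(iou(coords[i], coords[j]) <= iou_threshold for j in selected):
                selected.append(i)
        kept.extend((i, c) for i in selected)
    return kept


# Made-up boxes and scores for two classes
coords = [(0.5, 0.5, 0.4, 0.4), (0.52, 0.5, 0.4, 0.4), (0.2, 0.2, 0.1, 0.1)]
confidences = [[0.9, 0.1], [0.8, 0.05], [0.3, 0.7]]
print(nms_per_class(coords, confidences))  # [(0, 0), (2, 1)]
```

Box 1 overlaps box 0 with an IoU of about 0.9, so it is suppressed for class 0; box 2 only passes the confidence threshold for class 1.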

In [37]:
from coremltools.models.pipeline import Pipeline

input_features = [ ("image", datatypes.Array(3, 300, 300)),
                   ("iouThreshold", datatypes.Double()),
                   ("confidenceThreshold", datatypes.Double()) ]

output_features = [ "confidence", "coordinates" ]

pipeline = Pipeline(input_features, output_features)

# We added a dimension of size 1 to the back of the inputs of the decoder 
# model, so we should also add this to the output of the SSD model or else 
# the inputs and outputs do not match and the pipeline is not valid.
ssd_output = mlmodel._spec.description.output
ssd_output[0].type.multiArrayType.shape[:] = [num_classes + 1, num_anchors, 1]
ssd_output[1].type.multiArrayType.shape[:] = [4, num_anchors, 1]

pipeline.add_model(mlmodel)
pipeline.add_model(decoder_model)
pipeline.add_model(nms_model)

# The "image" input should really be an image, not a multi-array.
pipeline.spec.description.input[0].ParseFromString(mlmodel._spec.description.input[0].SerializeToString())

# Copy the declarations of the "confidence" and "coordinates" outputs.
# The Pipeline makes these strings by default.
pipeline.spec.description.output[0].ParseFromString(nms_model._spec.description.output[0].SerializeToString())
pipeline.spec.description.output[1].ParseFromString(nms_model._spec.description.output[1].SerializeToString())

# Add descriptions to the inputs and outputs.
pipeline.spec.description.input[1].shortDescription = "(optional) IOU Threshold override"
pipeline.spec.description.input[2].shortDescription = "(optional) Confidence Threshold override"
pipeline.spec.description.output[0].shortDescription = u"Boxes \xd7 Class confidence"
pipeline.spec.description.output[1].shortDescription = u"Boxes \xd7 [x, y, width, height] (relative to image size)"

# Add metadata to the model.
pipeline.spec.description.metadata.versionString = "ssdlite_mobilenet_v2_coco_2018_05_09"
pipeline.spec.description.metadata.shortDescription = "MobileNetV2 + SSDLite, trained on COCO"
pipeline.spec.description.metadata.author = "Converted to Core ML by Matthijs Hollemans"
pipeline.spec.description.metadata.license = "https://github.com/tensorflow/models/blob/master/research/object_detection"

# Add the list of class labels and the default threshold values too.
user_defined_metadata = {
    "iou_threshold": str(default_iou_threshold),
    "confidence_threshold": str(default_confidence_threshold),
    "classes": ",".join(labels)
}
pipeline.spec.description.metadata.userDefined.update(user_defined_metadata)

# Don't forget this or Core ML might attempt to run the model on an unsupported
# operating system version!
pipeline.spec.specificationVersion = 3

coreml_model_path_final = 'SSDLite_MobileNetv2_Final.mlmodel'

final_model = coremltools.models.MLModel(pipeline.spec)
final_model.save(coreml_model_path_final)

print(final_model)

input {
  name: "image"
  shortDescription: "Input image"
  type {
    imageType {
      width: 300
      height: 300
      colorSpace: RGB
    }
  }
}
input {
  name: "iouThreshold"
  shortDescription: "(optional) IOU Threshold override"
  type {
    doubleType {
    }
  }
}
input {
  name: "confidenceThreshold"
  shortDescription: "(optional) Confidence Threshold override"
  type {
    doubleType {
    }
  }
}
output {
  name: "confidence"
  shortDescription: "Boxes \303\227 Class confidence"
  type {
    multiArrayType {
      dataType: DOUBLE
      shapeRange {
        sizeRanges {
          upperBound: -1
        }
        sizeRanges {
          lowerBound: 90
          upperBound: 90
        }
      }
    }
  }
}
output {
  name: "coordinates"
  shortDescription: "Boxes \303\227 [x, y, width, height] (relative to image size)"
  type {
    multiArrayType {
      dataType: DOUBLE
      shapeRange {
        sizeRanges {
          upperBound: -1
        }
        sizeRanges {
       