## Capstone Project

## Project: Real-time object detection magnifier
---

### The Road Ahead

I have broken the notebook into separate steps. Feel free to use the links below to navigate the notebook.

* [Step 0](#step0): Environment Setup
* [Step 1](#step1): Train MobileNetv1 with EgoHands to generate CoreML model
* [Step 2](#step2): Train MobileNetv1 with MS-COCO to generate CoreML model
* [Step 3](#step3): Train YOLOv2 with MS-COCO to generate CoreML model
* [Step 4](#step4): Train Tiny YOLO with MS-COCO to generate CoreML model

---
<a id='step0'></a>
## Step 0: Environment Setup

### Python3 Kernel

Change kernel to Python3 (***Kernel > Change kernel > Python3***)

### The Egohands Dataset

The hand detector model is built using data from the [Egohands Dataset](http://vision.soic.indiana.edu/projects/egohands/). This dataset works well for several reasons. It contains high-quality, pixel-level annotations (more than 15,000 ground-truth labels) marking where hands are located across 4,800 images. All images are captured from an egocentric view (Google Glass) across 48 different environments (indoor and outdoor) and activities (playing cards, chess, Jenga, solving puzzles, etc.).

<p float="left">
<img src="images/egohands_0.png" width="180" />
<img src="images/egohands_1.png" width="180" />
<img src="images/egohands_2.png" width="180" />
</p>

The Egohands dataset (zip file with labeled data) contains 48 folders of locations where video data was collected (100 images per folder).
```
-- LOCATION_X
  -- frame_1.jpg
  -- frame_2.jpg
  ...
  -- frame_100.jpg
  -- polygons.mat  // contains annotations for all 100 images in current folder
-- LOCATION_Y
  -- frame_1.jpg
  -- frame_2.jpg
  ...
  -- frame_100.jpg
  -- polygons.mat  // contains annotations for all 100 images in current folder
  ```

### Set up environment variables

In [None]:
TF_OBJECT_DETECTION_HOME = "/root/Project/tensorflow/models/research/object_detection"
DARKNET_HOME = "/root/Project/darknet"
DARKFLOW_HOME = "/root/Project/darkflow"

MLND_CAPSTONE_HOME = "/root/Project/capstone"

IMAGES_HOME = MLND_CAPSTONE_HOME + "/images"
SAMPLE_HOME = MLND_CAPSTONE_HOME + "/sample"

EGOHANDS_HOME = MLND_CAPSTONE_HOME + "/dataset/EgoHands"
MSCOCO_HOME = MLND_CAPSTONE_HOME + "/dataset/MS-COCO"

YOLOV2_EGOHANDS_HOME = MLND_CAPSTONE_HOME + "/model/yolov2_608_egohands"
YOLOV2_TINY_EGOHANDS_HOME = MLND_CAPSTONE_HOME + "/model/yolov2_tiny_608_egohands"
MOBILENET_EGOHANDS_HOME = MLND_CAPSTONE_HOME + "/model/ssd_mobilenet_v1_300_egohands"
YOLOV2_COCO_HOME = MLND_CAPSTONE_HOME + "/model/yolov2_608_coco"
YOLOV2_TINY_COCO_HOME = MLND_CAPSTONE_HOME + "/model/yolov2_tiny_608_coco"
MOBILENET_COCO_HOME = MLND_CAPSTONE_HOME + "/model/ssd_mobilenet_v1_300_coco"

### Set up the EgoHands dataset

- All scripts create and expect the following folder structure, which supports both the TensorFlow and the YOLO/Darknet projects:
```
.
├── data
│   ├── train 
│   │   ├── labels
│   │   │   ├──file1.txt
│   │   │   └── ...
│   │   └── images
│   │       ├──file1.jpg
│   │       └── ...
│   ├── eval
│   │   └── ...
│   │
│   ├── train_labels.csv
│   ├── eval_labels.csv
│   ├── label_map.pbtxt
│   ├── train.record
│   ├── eval.record
│   ├── train.txt
│   └── eval.txt
│   
└── model
```

- The egohands_setup.py script does the following:
    - Downloads the Egohands dataset.
    - Renames all files to include their directory names, so that each filename is unique.
    - Splits the dataset into train (90%) and eval (10%) folders.
    - Reads polygons.mat for each folder, generates bounding boxes, and visualizes them to verify correctness (see image above).
    - Once the script has finished, you should have data/train/images and data/eval/images folders, plus one CSV label file for each split (data/train_labels.csv and data/eval_labels.csv) that can be used to generate TFRecords.
    - Generates data/label_map.pbtxt, which is used by TensorFlow. Its content is shown below.
```
        item {
          id: 1
          name: 'hand'
        }
```

Note: While the Egohands dataset provides four separate labels for hands (own left, own right, other left, and other right), for my purposes I am only interested in the general `hand` class, so all training data is labeled `hand`.
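
The renaming and bounding-box steps listed above can be sketched as follows. This is a minimal illustration and not the code in egohands_setup.py: the function names `polygon_to_bbox` and `unique_name` are hypothetical, and the polygon is assumed to be a list of `(x, y)` pixel coordinates already read from polygons.mat.

```python
def polygon_to_bbox(points):
    """Return (xmin, ymin, xmax, ymax) enclosing a polygon given as (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def unique_name(folder, filename):
    """Prefix a frame's filename with its folder name so names never collide."""
    return "{}_{}".format(folder, filename)


# Hypothetical hand outline in pixel coordinates
hand = [(120.5, 300.0), (180.0, 280.2), (210.3, 350.7), (150.0, 390.1)]
bbox = polygon_to_bbox(hand)
name = unique_name("CHESS_OFFICE_S_B", "frame_0905.jpg")
```

The bounding box is simply the smallest axis-aligned rectangle that contains every polygon vertex.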

In [None]:
%cd $EGOHANDS_HOME

%run egohands_setup.py

### Generate Labels for Darknet

- Now we need to generate the label files (data/train/labels/* and data/eval/labels/*) that Darknet uses. Darknet expects a .txt file for each image, with one line for each ground-truth object in the image, in the form:
```
    <object-class> <x> <y> <width> <height>
```
- The generated data/handsnet.data contains:
```
    classes = 1            
    train = /root/Project/capstone/dataset/EgoHands/data/train.txt            
    valid = /root/Project/capstone/dataset/EgoHands/data/eval.txt            
    names = /root/Project/capstone/dataset/EgoHands/data/handsnet.names            
    backup = /root/Project/capstone/dataset/EgoHands/model/yolo_backup/
```
- The generated data/handsnet.names contains:
```
    hand
```
- The generated data/train.txt and data/eval.txt list one image path per line:
```
    /root/Project/capstone/dataset/EgoHands/data/train/images/JENGA_OFFICE_S_B_frame_1914.jpg
    /root/Project/capstone/dataset/EgoHands/data/train/images/CHESS_OFFICE_S_B_frame_0905.jpg
    /root/Project/capstone/dataset/EgoHands/data/train/images/CHESS_LIVINGROOM_H_T_frame_1495.jpg
    ...
```
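
Each line of a Darknet label file stores the box center and size as fractions of the image width and height. The conversion from a pixel box can be sketched as below. `to_darknet_line` is a hypothetical helper, not the code in csv_to_yolo_txt.py, and the 1280x720 image size is only an example.

```python
def to_darknet_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel box to '<object-class> <x> <y> <width> <height>'.

    x and y are the box center; all four values are relative to the image size.
    """
    x = (xmin + xmax) / 2.0 / img_w
    y = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / float(img_w)
    h = (ymax - ymin) / float(img_h)
    return "{} {:.6f} {:.6f} {:.6f} {:.6f}".format(class_id, x, y, w, h)


# Class 0 is `hand`, the only entry in handsnet.names
line = to_darknet_line(0, 320, 180, 640, 540, img_w=1280, img_h=720)
```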

In [None]:
%cd $EGOHANDS_HOME

%run csv_to_yolo_txt.py

### Generate TFRecord for tensorflow

In [None]:
%cd $EGOHANDS_HOME

%run csv_to_tfrecord.py

### Get pre-trained MobileNet MS-COCO sample model

In [None]:
%cd $SAMPLE_HOME

!tar -zxvf ssd_mobilenet_v1_coco_2017_11_17.tar.gz

<a id='step1'></a>
## Step 1: Train MobileNetv1 with EgoHands to generate CoreML model


### Training the hand detection Model


Once the record files have been generated, we are almost ready to train the model.
> **Note**: You don't have to do the steps below by hand. They have already been done in this Docker image; just run the cells below.

- Download the pre-trained [ssd_mobilenet_v1_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2017_11_17.tar.gz) model.

- Download the config file for the same model. In my case, this is [ssd_mobilenet_v1_coco.config](https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v1_coco.config). Then make the modifications below.


     - Change the number of classes to match our dataset (a single `hand` class).
```
    -- before
        num_classes: 90
    -- after
        num_classes: 1
```
     - I do not have a powerful GPU, so I decrease the batch_size.
```
    -- before
        batch_size: 24
    -- after
        batch_size: 6
```
     - Give the path to the checkpoint of the downloaded model, i.e. ssd_mobilenet_v1_coco, the pre-trained model downloaded above.
```    
    -- before
        fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
    -- after
        fine_tune_checkpoint: "/root/Project/capstone/sample/ssd_mobilenet_v1_coco_2017_11_17/model.ckpt"
```
     - Give the paths to the train.record file and the label map.
```    
    -- before
        train_input_reader: {  
        tf_record_input_reader {   
        input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
        }
        label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
        }
    -- after
        train_input_reader: {  
        tf_record_input_reader {   
        input_path: "/root/Project/capstone/dataset/EgoHands/data/train.record"
        }
        label_map_path: "/root/Project/capstone/dataset/EgoHands/data/label_map.pbtxt"
        }
```
     - Give the paths to the eval.record file and the label map.
```    
    -- before
        eval_input_reader: {  
        tf_record_input_reader {
        input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record" 
        }
        label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
        shuffle: false
        num_readers: 1
        }
    -- after
        eval_input_reader: {  
        tf_record_input_reader {
        input_path: "/root/Project/capstone/dataset/EgoHands/data/eval.record" 
        }
        label_map_path: "/root/Project/capstone/dataset/EgoHands/data/label_map.pbtxt"  
        shuffle: false
        num_readers: 1
        }
```
     - Set the number of eval examples to 480, i.e. the 10% eval split of the 4800 images.
```
    -- before
        eval_config: {
          num_examples: 8000
    -- after
        eval_config: {
          num_examples: 480
```
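
The scalar edits above can also be applied with a small script instead of by hand. The following is only a sketch: `patch_config` is a hypothetical helper that rewrites the config text with regular expressions, and the short `sample` string stands in for the real config file. The input reader paths are not handled here.

```python
import re


def patch_config(text, num_classes, batch_size, checkpoint, num_examples):
    """Apply the scalar edits listed above to a pipeline config given as text."""
    text = re.sub(r"num_classes: \d+", "num_classes: {}".format(num_classes), text)
    text = re.sub(r"batch_size: \d+", "batch_size: {}".format(batch_size), text)
    # A function is used as the replacement so the path is inserted literally
    text = re.sub(r'fine_tune_checkpoint: "[^"]*"',
                  lambda m: 'fine_tune_checkpoint: "{}"'.format(checkpoint), text)
    text = re.sub(r"num_examples: \d+", "num_examples: {}".format(num_examples), text)
    return text


sample = (
    'num_classes: 90\n'
    'batch_size: 24\n'
    'fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"\n'
    'num_examples: 8000\n'
)
patched = patch_config(
    sample,
    num_classes=1,
    batch_size=6,
    checkpoint="/root/Project/capstone/sample/ssd_mobilenet_v1_coco_2017_11_17/model.ckpt",
    num_examples=480,
)
```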

Run the cells below to train the model.

In [None]:
%cd $TF_OBJECT_DETECTION_HOME

In [None]:
%env PYTHONPATH=/root/Project/tensorflow/models/research:/root/Project/tensorflow/models/research/slim:/root/Project/cocoapi/PythonAPI

In [None]:
!python train.py \
    --logtostderr \
    --train_dir=$MOBILENET_EGOHANDS_HOME/training \
    --pipeline_config_path=$MOBILENET_EGOHANDS_HOME/training/ssd_mobilenet_v1_coco_egohands.config

To visualize the training results

In [None]:
!tensorboard --logdir=$MOBILENET_EGOHANDS_HOME/training 

<p float="left">
<img src="sample/ssd_mobilenet_v1_300_egohands/images/total_loss.png" width="300" />
<img src="sample/ssd_mobilenet_v1_300_egohands/images/regularization_loss.png"  width="300" />
</p>

### Evaluating the hand detection Model

In [None]:
!python eval.py \
    --logtostderr \
    --pipeline_config_path=$MOBILENET_EGOHANDS_HOME/training/ssd_mobilenet_v1_coco_egohands.config \
    --checkpoint_dir=$MOBILENET_EGOHANDS_HOME/training/ \
    --eval_dir=$MOBILENET_EGOHANDS_HOME/eval/

To visualize the eval results

In [None]:
!tensorboard --logdir=$MOBILENET_EGOHANDS_HOME/eval

<img src="sample/ssd_mobilenet_v1_300_egohands/images/map.png" />

### Convert to tensorflow model

In [None]:
!python export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path $MOBILENET_EGOHANDS_HOME/training/ssd_mobilenet_v1_coco_egohands.config \
    --trained_checkpoint_prefix $MOBILENET_EGOHANDS_HOME/training/model.ckpt-200000 \
    --output_directory $MOBILENET_EGOHANDS_HOME/egohands_inference_graph

### Check tensorflow model functionality

In [None]:
%cd $TF_OBJECT_DETECTION_HOME

In [None]:
import numpy as np
import tensorflow as tf

from matplotlib import pyplot as plt
from PIL import Image

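from utils import label_map_util
from utils import ops as utils_ops  # needed by run_inference_for_single_image for models with masks
from utils import visualization_utils as vis_util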

In [None]:
MODEL_NAME = 'egohands_inference_graph'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MOBILENET_EGOHANDS_HOME + "/" + MODEL_NAME + '/frozen_inference_graph.pb'

# Label map file: the list of strings used to add the correct label to each box.
PATH_TO_LABELS = EGOHANDS_HOME + "/data/label_map.pbtxt"

NUM_CLASSES = 1

#### Load a (frozen) Tensorflow model into memory.

In [None]:
detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')

#### Loading label map
Label maps map indices to category names, so that when our convolutional network predicts `1`, we know that this corresponds to `hand`. Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine.

In [None]:
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
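
As noted above, any dictionary that maps integer ids to label names will do. A minimal stand-in that does not depend on `label_map_util` might look like the sketch below. The `parse_label_map` helper is hypothetical and only handles simple `item { id: ... name: '...' }` entries like the one in label_map.pbtxt.

```python
import re


def parse_label_map(pbtxt_text):
    """Build {id: {'id': id, 'name': name}} from simple label map text."""
    index = {}
    for body in re.findall(r"item\s*\{(.*?)\}", pbtxt_text, flags=re.S):
        item_id = int(re.search(r"id:\s*(\d+)", body).group(1))
        name = re.search(r"name:\s*['\"]([^'\"]+)['\"]", body).group(1)
        index[item_id] = {"id": item_id, "name": name}
    return index


label_map_text = """
item {
  id: 1
  name: 'hand'
}
"""
simple_category_index = parse_label_map(label_map_text)
```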

#### Helper code

In [None]:
def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

#### Detection

In [None]:
# For the sake of simplicity we will use only 6 images:
# oxfordhands_0.jpg ... oxfordhands_5.jpg
# If you want to test the code with your own images, just add their paths to TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = IMAGES_HOME
TEST_IMAGE_PATHS = [ PATH_TO_TEST_IMAGES_DIR + '/oxfordhands_{}.jpg'.format(i) for i in range(0, 6) ]

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)

In [None]:
def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: np.expand_dims(image, 0)})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.uint8)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict

In [None]:
for image_path in TEST_IMAGE_PATHS:
  image = Image.open(image_path)
  # the array based representation of the image will be used later in order to prepare the
  # result image with boxes and labels on it.
  image_np = load_image_into_numpy_array(image)
  # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
  image_np_expanded = np.expand_dims(image_np, axis=0)
  # Actual detection.
  output_dict = run_inference_for_single_image(image_np, detection_graph)
  # Visualization of the results of a detection.
  vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      instance_masks=output_dict.get('detection_masks'),
      use_normalized_coordinates=True,
      line_thickness=8)
  plt.figure(figsize=IMAGE_SIZE)
  plt.imshow(image_np)
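
The detection boxes returned above are normalized to the [0, 1] range, which is why `use_normalized_coordinates=True` is passed to the visualizer. If pixel coordinates are needed, for example to crop the region to magnify, a sketch could look like this. The helper name, the 0.5 threshold, and the sample values are assumptions, and the box order is assumed to be `[ymin, xmin, ymax, xmax]`.

```python
def boxes_to_pixels(boxes, scores, im_width, im_height, min_score=0.5):
    """Keep boxes scoring at least min_score and scale them to pixels.

    Returns a list of (left, top, right, bottom) integer tuples.
    """
    result = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < min_score:
            continue
        result.append((int(xmin * im_width), int(ymin * im_height),
                       int(xmax * im_width), int(ymax * im_height)))
    return result


# Made-up detections for a 640x480 image
sample_boxes = [[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 0.5, 0.5]]
sample_scores = [0.9, 0.2]
pixel_boxes = boxes_to_pixels(sample_boxes, sample_scores, 640, 480)
```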

<p float="left">
<img src="sample/ssd_mobilenet_v1_300_egohands/images/oxfordhands_0.jpg" width="300"/>
<img src="sample/ssd_mobilenet_v1_300_egohands/images/oxfordhands_1.jpg" width="300"/>
<img src="sample/ssd_mobilenet_v1_300_egohands/images/oxfordhands_2.jpg" width="300"/>
</p>
<p float="left">
<img src="sample/ssd_mobilenet_v1_300_egohands/images/oxfordhands_3.jpg" width="300"/>
<img src="sample/ssd_mobilenet_v1_300_egohands/images/oxfordhands_4.jpg" width="300"/>
<img src="sample/ssd_mobilenet_v1_300_egohands/images/oxfordhands_5.jpg" width="300"/>
</p>

### Convert tensorflow model to CoreML format

In [None]:
# Load the TF graph definition
tf_model_path = PATH_TO_CKPT
with open(tf_model_path, 'rb') as f:
    serialized = f.read()
tf.reset_default_graph()
original_gdef = tf.GraphDef()
original_gdef.ParseFromString(serialized)

with tf.Graph().as_default() as g:
    tf.import_graph_def(original_gdef, name='')

The full MobileNet-SSD TF model contains 4 subgraphs: *Preprocessor*, *FeatureExtractor*, *MultipleGridAnchorGenerator*, and *Postprocessor*. Here we will extract the *FeatureExtractor* from the model and strip off the other subgraphs, as these subgraphs contain structures not currently supported in CoreML. The tasks in *Preprocessor*, *MultipleGridAnchorGenerator* and *Postprocessor* subgraphs can be achieved by other means, although they are non-trivial.

By inspecting the TensorFlow GraphDef, we find that:
(1) The input tensor of the MobileNet-SSD feature extractor is `Preprocessor/sub:0` of shape `(1,300,300,3)`, which contains the preprocessed image.
(2) The output tensors are `concat:0` of shape `(1,1917,4)`, the box coordinate encoding for each of the 1917 anchor boxes, and `concat_1:0` of shape `(1,1917,2)`, the confidence scores (logits) for each of the 2 classes (the `hand` class plus 1 class for background), for each of the 1917 anchor boxes.
So we extract the feature extractor as follows:

In [None]:
# Strip unused subgraphs and save it as another frozen TF model
from tensorflow.python.tools import strip_unused_lib
from tensorflow.python.framework import dtypes
from tensorflow.python.platform import gfile
input_node_names = ['Preprocessor/sub']
output_node_names = ['concat', 'concat_1']
gdef = strip_unused_lib.strip_unused(
        input_graph_def = original_gdef,
        input_node_names = input_node_names,
        output_node_names = output_node_names,
        placeholder_type_enum = dtypes.float32.as_datatype_enum)
# Save the feature extractor to an output file
frozen_model_file = MOBILENET_EGOHANDS_HOME + "/" + MODEL_NAME + '/ssd_mobilenet_v1_egohands_feature_extractor.pb'
with gfile.GFile(frozen_model_file, "wb") as f:
    f.write(gdef.SerializeToString())
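
The figure of 1917 anchor boxes mentioned above can be reproduced with a little arithmetic. This assumes the default anchor settings of ssd_mobilenet_v1_coco.config for a 300x300 input (six feature maps, with 3 anchors per cell on the first map and 6 on the others), which this notebook does not state explicitly.

```python
# (grid size, anchors per cell) for each of the six SSD feature maps,
# assuming the default 300x300 ssd_mobilenet_v1 anchor settings
feature_maps = [(19, 3), (10, 6), (5, 6), (3, 6), (2, 6), (1, 6)]

num_anchors = sum(size * size * per_cell for size, per_cell in feature_maps)
```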

In [None]:
# Now we have a TF model ready to be converted to CoreML
import tfcoreml
# Supply a dictionary of the input tensors' names and shapes (including the batch axis)
input_tensor_shapes = {"Preprocessor/sub:0":[1,300,300,3]} # batch size is 1
# Output CoreML model path
coreml_model_file = MOBILENET_EGOHANDS_HOME + '/ssd_mobilenet_v1_egohands_feature_extractor.mlmodel'
# The TF model's output tensor names
output_tensor_names = ['concat:0', 'concat_1:0']

# Call the converter. This may take a while
coreml_model = tfcoreml.convert(
        tf_model_path=frozen_model_file,
        mlmodel_path=coreml_model_file,
        input_name_shape_dict=input_tensor_shapes,
        output_feature_names=output_tensor_names,
        image_input_names=['Preprocessor/sub:0'],
        image_scale=2./255.,
        red_bias=-1.0,
        green_bias=-1.0,
        blue_bias=-1.0
)
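
The `image_scale` and bias arguments above stand in for the stripped *Preprocessor* subgraph: each pixel value `p` in [0, 255] becomes `image_scale * p + bias`, which maps it to the [-1, 1] range. A quick check of this arithmetic (an illustration only, not part of the conversion):

```python
image_scale = 2.0 / 255.0
bias = -1.0


def preprocess(pixel):
    """Map a pixel value in [0, 255] to the [-1, 1] range."""
    return image_scale * pixel + bias


lowest = preprocess(0)
highest = preprocess(255)
```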

---
<a id='step2'></a>
## Step 2: Train MobileNetv1 with MS-COCO to generate CoreML model

### Check tensorflow model functionality

In [None]:
%cd $TF_OBJECT_DETECTION_HOME

In [None]:
import numpy as np
import tensorflow as tf

from matplotlib import pyplot as plt
from PIL import Image

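from utils import label_map_util
from utils import ops as utils_ops  # needed by run_inference_for_single_image for models with masks
from utils import visualization_utils as vis_util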

In [None]:
MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = SAMPLE_HOME +"/" + MODEL_NAME + '/frozen_inference_graph.pb'

# Label map file: the list of strings used to add the correct label to each box.
PATH_TO_LABELS = MSCOCO_HOME + "/data/mscoco_label_map.pbtxt"

NUM_CLASSES = 90

#### Load a (frozen) Tensorflow model into memory.

In [None]:
detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')

#### Loading label map
Label maps map indices to category names, so that when our convolutional network predicts `5`, we know that this corresponds to `airplane`. Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine.

In [None]:
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

#### Helper code

In [None]:
def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

#### Detection

In [None]:
# For the sake of simplicity we will use only 2 images:
# image1.jpg
# image2.jpg
# If you want to test the code with your own images, just add their paths to TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = MOBILENET_COCO_HOME + '/test_images'
TEST_IMAGE_PATHS = [ PATH_TO_TEST_IMAGES_DIR + '/image{}.jpg'.format(i) for i in range(1, 3) ]

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)

In [None]:
def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: np.expand_dims(image, 0)})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.uint8)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict

In [None]:
for image_path in TEST_IMAGE_PATHS:
  image = Image.open(image_path)
  # the array based representation of the image will be used later in order to prepare the
  # result image with boxes and labels on it.
  image_np = load_image_into_numpy_array(image)
  # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
  image_np_expanded = np.expand_dims(image_np, axis=0)
  # Actual detection.
  output_dict = run_inference_for_single_image(image_np, detection_graph)
  # Visualization of the results of a detection.
  vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      instance_masks=output_dict.get('detection_masks'),
      use_normalized_coordinates=True,
      line_thickness=8)
  plt.figure(figsize=IMAGE_SIZE)
  plt.imshow(image_np)

<p float="left">
<img src="sample/ssd_mobilenet_v1_300_coco/images/image1.jpg" width="300"/>
<img src="sample/ssd_mobilenet_v1_300_coco/images/image2.jpg" width="300"/>
</p>

### Convert tensorflow model to CoreML format

In [None]:
# Load the TF graph definition
tf_model_path = PATH_TO_CKPT
with open(tf_model_path, 'rb') as f:
    serialized = f.read()
tf.reset_default_graph()
original_gdef = tf.GraphDef()
original_gdef.ParseFromString(serialized)

with tf.Graph().as_default() as g:
    tf.import_graph_def(original_gdef, name='')

The full MobileNet-SSD TF model contains 4 subgraphs: *Preprocessor*, *FeatureExtractor*, *MultipleGridAnchorGenerator*, and *Postprocessor*. Here we will extract the *FeatureExtractor* from the model and strip off the other subgraphs, as these subgraphs contain structures not currently supported in CoreML. The tasks in *Preprocessor*, *MultipleGridAnchorGenerator* and *Postprocessor* subgraphs can be achieved by other means, although they are non-trivial.

By inspecting TensorFlow GraphDef, it can be found that:
(1) the input tensor of MobileNet-SSD Feature Extractor is `Preprocessor/sub:0` of shape `(1,300,300,3)`, which contains the preprocessed image.
(2) The output tensors are: `concat:0` of shape `(1,1917,4)`, the box coordinate encoding for each of the 1917 anchor boxes; and `concat_1:0` of shape `(1,1917,91)`, the confidence scores (logits) for each of the 91 object classes (including 1 class for background), for each of the 1917 anchor boxes.
So we extract the feature extractor out as follows:

In [None]:
# Strip unused subgraphs and save it as another frozen TF model
from tensorflow.python.tools import strip_unused_lib
from tensorflow.python.framework import dtypes
from tensorflow.python.platform import gfile
input_node_names = ['Preprocessor/sub']
output_node_names = ['concat', 'concat_1']
gdef = strip_unused_lib.strip_unused(
        input_graph_def = original_gdef,
        input_node_names = input_node_names,
        output_node_names = output_node_names,
        placeholder_type_enum = dtypes.float32.as_datatype_enum)
# Save the feature extractor to an output file
frozen_model_file = 'ssd_mobilenet_v1_feature_extractor.pb'
with gfile.GFile(frozen_model_file, "wb") as f:
    f.write(gdef.SerializeToString())

In [None]:
# Now we have a TF model ready to be converted to CoreML
import tfcoreml
# Supply a dictionary of the input tensors' names and shapes (including the batch axis)
input_tensor_shapes = {"Preprocessor/sub:0":[1,300,300,3]} # batch size is 1
# Output CoreML model path
coreml_model_file = MOBILENET_COCO_HOME + '/ssd_mobilenet_v1_feature_extractor.mlmodel'
# The TF model's output tensor names
output_tensor_names = ['concat:0', 'concat_1:0']

# Call the converter. This may take a while
coreml_model = tfcoreml.convert(
        tf_model_path=frozen_model_file,
        mlmodel_path=coreml_model_file,
        input_name_shape_dict=input_tensor_shapes,
        output_feature_names=output_tensor_names,
        image_input_names=['Preprocessor/sub:0'],
        image_scale=2./255.,
        red_bias=-1.0,
        green_bias=-1.0,
        blue_bias=-1.0,
)
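
Because the *Postprocessor* subgraph is stripped, the raw outputs have to be decoded outside the model. The sketch below shows one way to decode a single anchor, assuming the box coder scale factors (10, 10, 5, 5) and the sigmoid score conversion from the default ssd_mobilenet_v1_coco.config. The function names and the sample anchor are hypothetical, and non-maximum suppression is not shown.

```python
import math


def decode_box(encoding, anchor, scales=(10.0, 10.0, 5.0, 5.0)):
    """Decode (ty, tx, th, tw) against an anchor (ycenter, xcenter, h, w).

    Returns (ymin, xmin, ymax, xmax) in normalized image coordinates.
    """
    ty, tx, th, tw = encoding
    ya, xa, ha, wa = anchor
    ycenter = ty / scales[0] * ha + ya
    xcenter = tx / scales[1] * wa + xa
    h = math.exp(th / scales[2]) * ha
    w = math.exp(tw / scales[3]) * wa
    return (ycenter - h / 2.0, xcenter - w / 2.0,
            ycenter + h / 2.0, xcenter + w / 2.0)


def sigmoid(logit):
    """Convert a class logit to a score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))


# A zero encoding decodes to the anchor box itself
box = decode_box((0.0, 0.0, 0.0, 0.0), anchor=(0.5, 0.5, 0.2, 0.2))
score = sigmoid(0.0)
```

In a full pipeline this would run once for each of the 1917 anchors, using the anchor boxes that the *MultipleGridAnchorGenerator* would otherwise have produced.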

---
<a id='step3'></a>
## Step 3: Train YOLOv2 with MS-COCO to generate CoreML model

### Convert pre-trained darknet weight to tensorflow model
> **Note**: You don't have to do the steps below by hand. They have already been done in this Docker image; just run the cells below.

1. Download the pre-trained Darknet [yolov2 weights](https://pjreddie.com/media/files/yolov2.weights) and rename the file to bin/yolo.weights.

1. Download the [yolov2 configuration file](https://github.com/pjreddie/darknet/blob/master/cfg/yolov2.cfg) and rename it to cfg/yolo.cfg.
    - Change width and height to 608
```
    -- before
        width=416
        height=416
    -- after
        width=608
        height=608
```

1. Use the following command to convert the weights to a TensorFlow pb file
```
    flow --model cfg/yolo.cfg --load bin/yolo.weights --savepb --verbalise
```
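
Changing the input size also changes the size of the network output. Assuming the standard YOLOv2 COCO setup (a total stride of 32, 5 anchor boxes, and 80 classes, none of which are stated in this notebook), the output shape can be worked out as follows. `yolo_output_shape` is a hypothetical helper.

```python
def yolo_output_shape(input_size, stride=32, num_anchors=5, num_classes=80):
    """Return (grid, grid, channels) for a square YOLOv2 input."""
    grid = input_size // stride
    # Each anchor predicts 4 box values, 1 objectness score, and the class scores
    channels = num_anchors * (5 + num_classes)
    return (grid, grid, channels)


shape_416 = yolo_output_shape(416)
shape_608 = yolo_output_shape(608)
```

Under these assumptions, moving from 416 to 608 gives a finer grid (19x19 instead of 13x13) at the cost of more computation per frame.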

Run the cells below to start the conversion.

In [None]:
%cd $YOLOV2_COCO_HOME

In [None]:
!cp loader.py $DARKFLOW_HOME/darkflow/utils

!flow --model cfg/yolo.cfg --load bin/yolo.weights --savepb --verbalise

### Check tensorflow model functionality

In [None]:
!flow --pbLoad built_graph/yolo.pb --metaLoad built_graph/yolo.meta --imgdir sample_img/

In [None]:
%matplotlib inline

from matplotlib import pyplot as plt
from matplotlib import image as mpimg

fig = plt.figure(figsize=(32, 32))
columns = 1
rows = 6
# Show each darkflow output image in its own subplot
for i, name in enumerate(["dog", "eagle", "giraffe", "horses", "person", "scream"], start=1):
    fig.add_subplot(rows, columns, i)
    plt.imshow(mpimg.imread(YOLOV2_COCO_HOME + "/sample_img/out/" + name + ".jpg"))

<p float="left">
<img src="sample/yolov2_608_coco/images/dog.jpg" width="300"/>
<img src="sample/yolov2_608_coco/images/eagle.jpg" width="300"/>
<img src="sample/yolov2_608_coco/images/giraffe.jpg" width="300"/>
</p>
<p float="left">
<img src="sample/yolov2_608_coco/images/horses.jpg" width="300"/>
<img src="sample/yolov2_608_coco/images/person.jpg" width="300"/>
<img src="sample/yolov2_608_coco/images/scream.jpg" width="300"/>
</p>



### Convert tensorflow model to CoreML format

In [None]:
%cd $YOLOV2_COCO_HOME

In [None]:
import tfcoreml as tf_converter

import tensorflow as tf

#### Output the graph

In this step we just want to find the exact names of the input and output nodes in the TensorFlow graph.

In [None]:
def load_graph(frozen_graph_filename):
    # We load the protobuf file from the disk and parse it to retrieve the 
    # unserialized graph_def
    with tf.gfile.GFile(frozen_graph_filename, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    # Then, we import the graph_def into a new Graph and return it 
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name="")
    return graph

graph = load_graph('built_graph/yolo.pb')
# for op in graph.get_operations(): 
#     print (op.name)

#### Convert to mlmodel format

From the previous step, we know the input and output node names. We can also get the input shape from the cfg file. We specify these in the conversion call and save the result as yolo.mlmodel.

In [None]:
coreml_model = tf_converter.convert(
    tf_model_path='built_graph/yolo.pb',
    mlmodel_path='yolo.mlmodel',
    output_feature_names=['output:0'],  # output node name found in the previous step
    image_input_names=['input:0'],  # Core ML accepts images directly; mark which node is the image input
    input_name_shape_dict={'input:0': [1, 608, 608, 3]},  # input node name from the previous step; shape from the cfg file
    is_bgr=True,  # channel order is BGR rather than RGB
    image_scale=1 / 255.0)  # the model expects pixel values normalized to the range 0 to 1
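
To make the two preprocessing flags concrete, here is a plain-Python sketch of what they are intended to do to a single pixel, based on the comments in the call above: `is_bgr` reorders the channels and `image_scale` multiplies each value. This only illustrates the arithmetic, not how Core ML implements it internally, and `preprocess_pixel` is a hypothetical name.

```python
def preprocess_pixel(rgb, is_bgr=True, image_scale=1 / 255.0):
    """Mimic the effect of is_bgr and image_scale on one (R, G, B) pixel."""
    # Reverse the channel order when the model expects BGR input.
    channels = tuple(reversed(rgb)) if is_bgr else tuple(rgb)
    # Scale every channel value, e.g. from [0, 255] down to [0, 1].
    return tuple(value * image_scale for value in channels)


pixel = preprocess_pixel((255, 0, 51))
print(pixel)
```

The red value 255 ends up last and close to 1.0, while the blue value 51 moves to the front and becomes roughly 0.2.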

---
<a id='step4'></a>
## Step 4: Train Tiny YOLO with MS-COCO to generate CoreML model

### Transfer pre-trained darknet weight to tensorflow format
> **Note**: You do not need to perform the steps below by hand; they have already been done in this Docker image. Just run the cells below.

1. Download the pre-trained darknet [yolov2-tiny weight](https://pjreddie.com/media/files/yolov2-tiny.weights) and rename it to bin/tiny-yolo.weights.

1. Download the [yolov2-tiny configuration file](https://github.com/pjreddie/darknet/blob/master/cfg/yolov2-tiny.cfg) and rename it to cfg/tiny-yolo.cfg.
    - Change width and height to 608
```
    -- before
        width=416
        height=416
    -- after
        width=608
        height=608
```

1. Modify darkflow/darkflow/utils/loader.py to fix the build error
```
    -- before
        self.offset = 16
    -- after    
        self.offset = 20
```

1. Use the following command to convert the weights to a TensorFlow pb file
```
    flow --model cfg/tiny-yolo.cfg --load bin/tiny-yolo.weights --savepb --verbalise
```
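
The width and height change in the second step can be scripted rather than edited by hand. A minimal sketch, assuming the cfg is plain text with `width=` and `height=` lines in the `[net]` section; `set_input_size` is a hypothetical helper. It also rejects sizes that are not a multiple of 32, on the assumption (not stated in this notebook) that the YOLOv2 family downsamples its input by a factor of 32.

```python
import re


def set_input_size(cfg_text, size):
    """Return cfg_text with the first width= and height= lines set to size."""
    if size % 32 != 0:
        raise ValueError("input size should be a multiple of 32, got %d" % size)
    for key in ("width", "height"):
        # count=1 touches only the first occurrence, i.e. the [net] section.
        pattern = r"(?m)^%s[ \t]*=[ \t]*\d+[ \t]*$" % key
        cfg_text = re.sub(pattern, "%s=%d" % (key, size), cfg_text, count=1)
    return cfg_text


before = "[net]\nbatch=1\nwidth=416\nheight=416\nchannels=3\n"
after = set_input_size(before, 608)
print(after)
```

Applied to the real file, this would read cfg/tiny-yolo.cfg, call `set_input_size(text, 608)`, and write the result back.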

Run the cells below to start the conversion.


In [None]:
%cd $YOLOV2_TINY_COCO_HOME

In [None]:
!cp loader.py $DARKFLOW_HOME/darkflow/utils

!flow --model cfg/tiny-yolo.cfg --load bin/tiny-yolo.weights --savepb --verbalise
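
The `loader.py` copied in the cell above carries the offset change from 16 to 20. My understanding of why, stated as an assumption rather than something the darkflow documentation confirms: a darknet weights file starts with three 32-bit integers (major, minor, revision) followed by a `seen` counter that is 4 bytes in older files and 8 bytes in newer ones, which gives a header of 16 or 20 bytes. The sketch below parses such a header from raw bytes. `header_size` is a hypothetical helper, the version rule mirrors what I recall from darknet's parser, and the sample bytes are synthetic.

```python
import struct


def header_size(raw):
    """Return the assumed header length (16 or 20 bytes) of a darknet weights blob."""
    major, minor, _revision = struct.unpack("<3i", raw[:12])
    # Assumption: versions >= 0.2 store `seen` as a 64-bit integer.
    wide_seen = (major * 10 + minor) >= 2 and major < 1000 and minor < 1000
    return 12 + (8 if wide_seen else 4)


# Synthetic headers: an old-style file (version 0.1) and a new-style one (0.2).
old_header = struct.pack("<3i", 0, 1, 0) + struct.pack("<i", 0)
new_header = struct.pack("<3i", 0, 2, 0) + struct.pack("<q", 0)
sizes = (header_size(old_header), header_size(new_header))
print(sizes)
```

To inspect a real file, pass its first 20 bytes, for example `open('bin/tiny-yolo.weights', 'rb').read(20)`.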

### Check tensorflow model functionality

In [None]:
!flow --pbLoad built_graph/tiny-yolo.pb --metaLoad built_graph/tiny-yolo.meta --imgdir sample_img/

In [None]:
%matplotlib inline

from matplotlib import pyplot as plt
from matplotlib import image as mpimg

# Show the six annotated sample images that flow wrote to sample_img/out/
names = ["dog", "eagle", "giraffe", "horses", "person", "scream"]
fig = plt.figure(figsize=(32, 32))
for i, name in enumerate(names, start=1):
    fig.add_subplot(len(names), 1, i)
    plt.imshow(mpimg.imread(YOLOV2_TINY_COCO_HOME + "/sample_img/out/" + name + ".jpg"))

<p float="left">
<img src="sample/yolov2_tiny_608_coco/images/dog.jpg" width="300"/>
<img src="sample/yolov2_tiny_608_coco/images/eagle.jpg" width="300"/>
<img src="sample/yolov2_tiny_608_coco/images/giraffe.jpg" width="300"/>
</p>
<p float="left">
<img src="sample/yolov2_tiny_608_coco/images/horses.jpg" width="300"/>
<img src="sample/yolov2_tiny_608_coco/images/person.jpg" width="300"/>
<img src="sample/yolov2_tiny_608_coco/images/scream.jpg" width="300"/>
</p>

### Convert tensorflow model to CoreML format

In [None]:
%cd $YOLOV2_TINY_COCO_HOME

In [None]:
import tfcoreml as tf_converter

import tensorflow as tf

#### Output the graph

In this step we only need the exact names of the input and output nodes in the TensorFlow graph.

In [None]:
def load_graph(frozen_graph_filename):
    # We load the protobuf file from the disk and parse it to retrieve the 
    # unserialized graph_def
    with tf.gfile.GFile(frozen_graph_filename, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    # Then, we import the graph_def into a new Graph and return it 
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name="")
    return graph

graph = load_graph('built_graph/tiny-yolo.pb')
# Uncomment the loop below to print every op name and locate the input and output nodes.
# for op in graph.get_operations():
#     print (op.name)

#### Convert to mlmodel format

The previous step gave us the input and output node names, and the cfg file gives the input shape. We pass both to the converter and save the result as tiny_yolo.mlmodel.

In [None]:
coreml_model = tf_converter.convert(
    tf_model_path='built_graph/tiny-yolo.pb',
    mlmodel_path='tiny_yolo.mlmodel',
    output_feature_names=['output:0'],  # output node name found in the previous step
    image_input_names=['input:0'],  # Core ML accepts images directly; mark which node is the image input
    input_name_shape_dict={'input:0': [1, 608, 608, 3]},  # input node name from the previous step; shape from the cfg file
    is_bgr=True,  # channel order is BGR rather than RGB
    image_scale=1 / 255.0)  # the model expects pixel values normalized to the range 0 to 1
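
When consuming the converted model it helps to know what shape to expect from `output:0`. The arithmetic below is a sketch under assumptions not stated in this notebook: that the network downsamples by a factor of 32, and that the COCO configuration uses 5 anchor boxes and 80 classes, with each box carrying 4 coordinates plus 1 confidence score. `yolo_output_shape` is a hypothetical helper, and the shape refers to the TensorFlow tensor; Core ML may report the axes in a different order.

```python
def yolo_output_shape(input_size, num_anchors=5, num_classes=80, stride=32):
    """Return (grid, grid, channels) for a YOLOv2-style detection head."""
    grid = input_size // stride
    # Each anchor predicts 4 box coordinates, 1 confidence, and the class scores.
    channels = num_anchors * (num_classes + 5)
    return (grid, grid, channels)


shape_608 = yolo_output_shape(608)
print(shape_608)
```

Under these assumptions a 608x608 input yields a 19x19 grid, compared with 13x13 for the default 416x416 input.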