## Inference with the TensorFlow Object Detection API

Date created: May 20, 2018   
Last modified: May 29, 2018  
Tags: TensorFlow Object Detection API, Faster RCNN, COCO dataset

In the examples below, we use models trained on the [COCO dataset](http://cocodataset.org/) for out-of-the-box inference. The models are available in the TensorFlow Object Detection API [Model Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md). The objective is to detect, localize, and label (and mask, if using Mask RCNN) objects in an input video, and to generate an output video annotated with these detections.

TensorFlow and the TensorFlow Object Detection API must be installed before running these examples; see the [installation instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md).


### I. Imports

In [6]:
import os
import sys

import numpy as np
import tensorflow as tf

import time
import multiprocessing

import cv2
from matplotlib import pyplot as plt

# object detection imports
sys.path.append("..")
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

# imports to edit/save/watch video clips
from moviepy.editor import VideoFileClip
from IPython.display import HTML

### II. Model preparation 

#### 1. Variables

Any model exported using the `export_inference_graph.py` tool can be loaded here simply by changing `PATH_TO_CKPT` to point to a new .pb file.  

See the [detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md) for a list of models that can be run out-of-the-box with varying speeds and accuracies.

In [7]:
# Model (download and untar the model into the object detection folder)
#MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
#MODEL_NAME = 'mask_rcnn_resnet101_atrous_coco_2018_01_28'
MODEL_NAME = 'faster_rcnn_resnet101_coco_2018_01_28'

# Path to frozen detection graph. This is the actual model used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# Label map file: maps each class ID to the label string drawn on its box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

# Largest class ID in the COCO label map (IDs run up to 90, although only 80 categories are used)
NUM_CLASSES = 90
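
The comment at the top of this cell asks for the model to be downloaded and untarred by hand. A minimal sketch of the untar step is given below. It assumes the archive has already been downloaded and that it stores the graph as `<MODEL_NAME>/frozen_inference_graph.pb`, which is the layout `PATH_TO_CKPT` expects. The helper name `extract_frozen_graph` is hypothetical and not part of the Object Detection API.

```python
import os
import tarfile


def extract_frozen_graph(tar_path, dest_dir='.'):
    """Extract only frozen_inference_graph.pb from a model archive.

    Hypothetical helper: assumes the archive stores the graph as
    <model_name>/frozen_inference_graph.pb. Returns the extracted path.
    """
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            # Skip checkpoints and other files; only the frozen graph is needed.
            if os.path.basename(member.name) == 'frozen_inference_graph.pb':
                tar.extract(member, dest_dir)
                return os.path.join(dest_dir, member.name)
    raise FileNotFoundError('frozen_inference_graph.pb not found in ' + tar_path)
```

If the archive was saved as `MODEL_NAME + '.tar.gz'` in the working directory, calling `extract_frozen_graph(MODEL_NAME + '.tar.gz')` would be expected to produce the file that `PATH_TO_CKPT` points to.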

#### 2. Load a (frozen) TensorFlow model into memory

In [8]:
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

#### 3. Loading label map
Label maps map indices to category names, so that when our convolutional network predicts `5`, we know that this corresponds to `airplane`. Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine.

In [9]:
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
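
As noted above, any dictionary that maps integer IDs to label strings would do. The sketch below builds one with a small regular-expression parser instead of `label_map_util`. It is a simplified stand-in, not the API's own parser: `parse_label_map` is a hypothetical name, the sample text is an abridged excerpt in the style of the label map file, and the code assumes every `item` block carries an integer `id` and a quoted `display_name`.

```python
import re


def parse_label_map(pbtxt_text):
    """Build {id: {'id': id, 'name': display_name}} from label-map text.

    Simplified stand-in for the label_map_util calls: assumes each
    `item { ... }` block holds an integer `id` and a quoted `display_name`.
    """
    index = {}
    for block in re.findall(r'item\s*\{(.*?)\}', pbtxt_text, flags=re.DOTALL):
        id_match = re.search(r'\bid:\s*(\d+)', block)
        name_match = re.search(r'display_name:\s*"([^"]*)"', block)
        if id_match and name_match:
            class_id = int(id_match.group(1))
            index[class_id] = {'id': class_id, 'name': name_match.group(1)}
    return index


# Abridged sample in the style of mscoco_label_map.pbtxt
sample = '''
item {
  name: "/m/01g317"
  id: 1
  display_name: "person"
}
item {
  name: "/m/05czz6l"
  id: 5
  display_name: "airplane"
}
'''
category_index_sketch = parse_label_map(sample)
print(category_index_sketch[5]['name'])  # airplane
```

The result has the same shape as `category_index` (a dictionary keyed by class ID whose values hold `id` and `name`), which is the form the visualization call looks labels up in.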

### III. Object detection for video

Adapted from Priyanka Dwivedi's object detection [GitHub page](https://github.com/priya-dwivedi/Deep-Learning/blob/master/Object_Detection_Tensorflow_API.ipynb).

In [11]:
def detect_objects(image_np, sess, detection_graph):
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')

    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')

    # Actual detection
    (boxes, scores, classes, num_detections) = sess.run(
        [boxes, scores, classes, num_detections],
        feed_dict={image_tensor: image_np_expanded})

    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=8)
    return image_np
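
`detect_objects` draws the boxes but does not return them. If the detections themselves are needed, the squeezed outputs can be filtered and rescaled by hand. The sketch below assumes the boxes are normalized `[ymin, xmin, ymax, xmax]` rows (consistent with `use_normalized_coordinates=True` above). The helper name `boxes_to_pixels` is hypothetical, and the `0.5` cut-off is chosen to mirror what is, to my understanding, the visualization helper's default score threshold.

```python
import numpy as np


def boxes_to_pixels(boxes, scores, image_height, image_width, min_score=0.5):
    """Keep detections scoring at least min_score and return pixel boxes.

    Assumes `boxes` has shape [N, 4] in normalized [ymin, xmin, ymax, xmax]
    order and `scores` has shape [N], i.e. the outputs after np.squeeze.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    keep = scores >= min_score
    # y coordinates scale with the height, x coordinates with the width.
    scale = np.array([image_height, image_width, image_height, image_width])
    return np.rint(boxes[keep] * scale).astype(np.int32)


# Two made-up detections on a 640x480 frame; only the first passes the threshold.
boxes = np.array([[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]])
scores = np.array([0.9, 0.3])
pixel_boxes = boxes_to_pixels(boxes, scores, image_height=480, image_width=640)
print(pixel_boxes)  # [[ 48 128 240 384]]
```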

In [12]:
def process_image(image):
    # NOTE: The returned frame must be a 3-channel color image for the video processing below.
    # A new tf.Session is opened for every frame here, which adds per-frame overhead.
    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            image_process = detect_objects(image, sess, detection_graph)
            return image_process
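
`process_image` opens a session for every frame. One possible alternative is to open the session once and hand the frame callback a reference to it. The sketch below shows only the pattern: `make_process_image` is a hypothetical helper, it has not been timed against the run recorded in this notebook, and the commented usage assumes the names defined in the earlier cells.

```python
def make_process_image(sess, graph, detect_fn):
    """Return a one-argument frame callback that reuses `sess`.

    Hypothetical helper: `sess` and `graph` are created once by the caller,
    and `detect_fn` has the signature of detect_objects(image, sess, graph).
    """
    def process_image(image):
        return detect_fn(image, sess, graph)
    return process_image


# Intended use with the names defined earlier (not executed here):
#
# with detection_graph.as_default():
#     with tf.Session(graph=detection_graph) as sess:
#         process_image = make_process_image(sess, detection_graph, detect_objects)
#         white_clip = clip1.fl_image(process_image)
#         white_clip.write_videofile(white_output, audio=False)
```

Because `fl_image` applies the callback lazily, the call to `write_videofile` has to stay inside the `with` block so that the session is still open while frames are processed.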

In [29]:
white_output = 'worthit_cake4_out.mp4'
clip1 = VideoFileClip("worthit_cake.mp4").subclip(175,190)  # time in seconds
white_clip = clip1.fl_image(process_image) #NOTE: this function expects color images
%time white_clip.write_videofile(white_output, audio=False)

[MoviePy] >>>> Building video worthit_cake4_out.mp4
[MoviePy] Writing video worthit_cake4_out.mp4


100%|██████████| 360/360 [2:41:40<00:00, 26.95s/it]


[MoviePy] Done.
[MoviePy] >>>> Video ready: worthit_cake4_out.mp4 

CPU times: user 3h 58s, sys: 42min 44s, total: 3h 43min 42s
Wall time: 2h 41min 41s
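
The numbers in this log can be cross-checked with a little arithmetic: a 15-second subclip (175 s to 190 s) that yields 360 frames implies 24 frames per second, and 360 frames at roughly 26.95 s each comes to about 2 h 41 min. The frame rate is inferred from the log rather than stated in the source, and `estimate_processing` is a hypothetical helper written for this check.

```python
def estimate_processing(start_s, end_s, fps, seconds_per_frame):
    """Return (frame_count, 'H:MM:SS') for processing a subclip frame by frame."""
    frames = round((end_s - start_s) * fps)
    total = round(frames * seconds_per_frame)
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return frames, '{}:{:02d}:{:02d}'.format(hours, minutes, seconds)


# fps=24 is inferred from 360 frames over 15 seconds; 26.95 s/frame is from the log.
frames, duration = estimate_processing(175, 190, fps=24, seconds_per_frame=26.95)
print(frames, duration)  # 360 2:41:42
```

The same function gives a rough estimate before committing to a longer clip; for example, a full minute at the same rate would take about 1440 × 26.95 s, or close to 11 hours.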


### IV. Output examples
##### 1. The Worthit guys eating cake

In [3]:
HTML("""
<div align="middle">
<video width="80%" controls>
      <source src="worthit_cake4_out.mp4" type="video/mp4">
</video></div>""")

##### 2. Dog performing cartwheels

In [1]:
%%HTML
<div align="middle">
<video width="80%" controls>
      <source src="dog_cartwheel2_out.mp4" type="video/mp4">
</video></div>

Author:  Meena Mani  <br>
Email:   meenas.mailbag@gmail.com   <br> 
Twitter: @meena_uvaca    <br>