<a href="https://colab.research.google.com/github/nyp-sit/sdaai-pdc2-students/blob/master/iti107/session-5/od_using_tfod_api/object_detection_using_tfod_api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" align="left"/></a>

# Object Detection using TensorFlow Object Detection API (aka TFOD API)

Welcome to the programming exercise of 'Object Detection using TFOD API'.  This notebook will walk you through, step by step, the process of using the TFOD API for object detection.

Before you run the code in this notebook, ensure the TFOD API has been installed. If you are using the lab machine or the cloud VM provided, the TFOD API has already been installed. If you are using your own machine, follow the [TFOD API installation instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md) before you start.

Ensure that you are using a TensorFlow 1.14 environment. The code below relies on TensorFlow 1.x APIs such as `tf.Session` and `tf.GraphDef`.

***Credit: This notebook is adapted from the Object Detection Tutorial in the TFOD API.***

## 1. Imports

In [None]:
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from shutil import copy2

from distutils.version import StrictVersion
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image


### Import Tensorflow object detection API 

Because the TensorFlow object detection library is not installed as a pip package, we need to tell the Python interpreter explicitly where to find the modules in the object detection package. We do this by adding the install directories to the interpreter's search path, either by appending them to `sys.path` or by setting the `PYTHONPATH` environment variable. Modify the following according to your own system environment.
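
As a minimal sketch of the search-path idea, the snippet below appends directories to `sys.path` only if they exist and are not already listed. The helper name `add_to_search_path` and the temporary demo directory are hypothetical and not part of the TFOD API.

```python
import os
import sys
import tempfile


def add_to_search_path(directories):
    """Append existing directories to sys.path, skipping duplicates.

    Returns the list of directories that were actually added.
    """
    added = []
    for directory in directories:
        if not os.path.isdir(directory):
            # A missing directory usually means the path needs adjusting.
            print('Skipping missing directory:', directory)
            continue
        if directory not in sys.path:
            sys.path.append(directory)
            added.append(directory)
    return added


# Hypothetical demo: a temporary directory stands in for models/research.
demo_dir = tempfile.mkdtemp()
added_dirs = add_to_search_path([demo_dir, os.path.join(demo_dir, 'missing')])
print(added_dirs)
```

Checking that each directory exists before appending it makes a wrong install path visible immediately, instead of surfacing later as an import error.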

## 2. Environment setup

In [None]:
# root Tensorflow model directory. Modify this accordingly
#TF_MODELS_RESEARCH_DIR = '/home/ubuntu/git/tensorflow/models/research'
TF_MODELS_RESEARCH_DIR = '/Users/markk/git/tensorflow/models/research'
TF_SLIM_DIR = os.path.join(TF_MODELS_RESEARCH_DIR, 'slim')
TF_OD_DIR = os.path.join(TF_MODELS_RESEARCH_DIR, 'object_detection')

sys.path.append(TF_MODELS_RESEARCH_DIR)
sys.path.append(TF_SLIM_DIR)
sys.path.append(TF_OD_DIR)

### TFOD API imports
Here are the imports of the required object detection modules in the TFOD API.

In [None]:
from utils import ops as utils_ops
from utils import label_map_util
from utils import visualization_utils as vis_util

## 3. Model preparation 

### Choose the detection model

Any model exported using the `export_inference_graph.py` tool of the TFOD API can be loaded here simply by changing `PATH_TO_FROZEN_GRAPH` to point to a new .pb file. We will cover the `export_inference_graph.py` tool in the next exercise, when we do our own custom training.

By default we use an "SSD with Mobilenet" model here. See the [detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md) for a list of other models that can be run out-of-the-box with varying speeds and accuracies. Note the filename of the downloaded file is in the format of \<model name\>.tar.gz, e.g. **faster_rcnn_resnet50_coco_2018_01_28.tar.gz**. Change the variable **MODEL_NAME** below to the \<model name\>, e.g. **faster_rcnn_resnet50_coco_2018_01_28**. 

You will also need to provide the path to the appropriate label map file (explained later in 'Loading label map'). A list of label map files (with the file suffix .pbtxt) is provided in the `data` subfolder of the TFOD API object detection folder. Depending on the model you choose, copy the label map file (.pbtxt) to an appropriate working directory (e.g. the current directory of this notebook). In the example below, since we chose the model 'ssd_mobilenet_v1_coco_2017_11_17', which is trained on the MS COCO dataset, we copy the file 'mscoco_label_map.pbtxt' to the current directory. If you train your own custom detection model, you will need to provide your own label map file.

In [None]:
# What model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'


# Label map file: the list of strings used to add the correct label to each box.
LABEL_FILE = 'mscoco_label_map.pbtxt'
PATH_TO_LABELS = os.path.join(TF_OD_DIR, 'data',LABEL_FILE)

copy2(PATH_TO_LABELS, LABEL_FILE)


### Download Model

Now we download the pre-trained model from the model zoo and extract the frozen graph.
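
The extraction step can be sketched in isolation with the standard `tarfile` module. The example below builds a tiny stand-in archive in a temporary directory (the archive and its contents are made up for illustration) and extracts only the member whose base name matches. The helper name `extract_member` is hypothetical.

```python
import os
import tarfile
import tempfile


def extract_member(archive_path, target_name, dest_dir):
    """Extract the first member whose base name equals target_name.

    Returns the path of the extracted file, or None if nothing matched.
    """
    with tarfile.open(archive_path) as archive:
        for member in archive.getmembers():
            if os.path.basename(member.name) == target_name:
                archive.extract(member, dest_dir)
                return os.path.join(dest_dir, member.name)
    return None


# Build a small stand-in archive that mimics the model zoo layout.
work_dir = tempfile.mkdtemp()
model_dir = os.path.join(work_dir, 'demo_model')
os.makedirs(model_dir)
for name in ('frozen_inference_graph.pb', 'checkpoint'):
    with open(os.path.join(model_dir, name), 'wb') as f:
        f.write(b'placeholder')

archive_path = os.path.join(work_dir, 'demo_model.tar.gz')
with tarfile.open(archive_path, 'w:gz') as archive:
    archive.add(model_dir, arcname='demo_model')

# Extract only the frozen graph; the other archive members are left alone.
out_dir = os.path.join(work_dir, 'extracted')
extracted = extract_member(archive_path, 'frozen_inference_graph.pb', out_dir)
print(extracted)
```

The notebook cell that follows applies the same idea to the real archive downloaded from the model zoo.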

In [None]:
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
    file_name = os.path.basename(file.name)
    if 'frozen_inference_graph.pb' in file_name:
        tar_file.extract(file, os.getcwd())

### Load the (frozen) Tensorflow model into memory.

We will now load the frozen graph (downloaded earlier from the model zoo) into memory. A frozen graph is a TensorFlow graph whose variables have been converted to constants, so it cannot be trained further (hence the word frozen) and is meant for inference only.

In [None]:
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

### Loading label map
A label map maps indices to category names, so that when our convolutional network predicts `5`, we know that this corresponds to `airplane`. Here we use an internal utility function of the TFOD API, but anything that returns a dictionary mapping integers to the appropriate string labels would be fine.
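
To illustrate the dictionary idea, here is a hand-written stand-in for a category index. The nested `{'id': ..., 'name': ...}` layout is assumed to mirror what the TFOD utility returns, and the entries are only a small illustrative subset of the COCO classes. The helper `class_name` is hypothetical.

```python
# Hand-written stand-in for a category index: class id -> category info.
category_index_demo = {
    1: {'id': 1, 'name': 'person'},
    2: {'id': 2, 'name': 'bicycle'},
    3: {'id': 3, 'name': 'car'},
    5: {'id': 5, 'name': 'airplane'},
}


def class_name(class_id, index, default='N/A'):
    """Look up the display name for a predicted class id."""
    # Model outputs are often floats, so cast before the lookup.
    entry = index.get(int(class_id))
    return entry['name'] if entry else default


print(class_name(5, category_index_demo))
print(class_name(99, category_index_demo))
```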

In [None]:
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

## 4. Object Detection on Image

In [None]:
# This is needed to display the images.
%matplotlib inline

### Helper code

The image is read using Pillow as an `Image` object. `Image.size` gives the image dimensions in (width, height) order. `Image.getdata()` returns a flattened sequence of pixel values, so we need to reshape it to `(height, width, channels)`.
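
The width/height ordering is easy to get wrong, so here is a small NumPy-only sketch. A plain list of RGB tuples stands in for the output of `Image.getdata()`; the 3x2 image and its pixel values are made up for illustration.

```python
import numpy as np

# Stand-in for a Pillow image that is 3 pixels wide and 2 pixels high.
im_width, im_height = 3, 2

# Stand-in for Image.getdata(): pixels listed row by row, one RGB tuple each.
flat_pixels = [(10 * i, 10 * i + 1, 10 * i + 2)
               for i in range(im_width * im_height)]

# Reshape to (height, width, channels), the layout the model input uses.
image_np = np.array(flat_pixels).reshape(
    (im_height, im_width, 3)).astype(np.uint8)

print(image_np.shape)
# The pixel at row 1, column 0 is the 4th pixel of the flat list (index 3).
print(image_np[1, 0])
```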

In [None]:
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)

Models trained with the TFOD API use some standard tensor names, e.g. `num_detections`, `detection_boxes`, `detection_scores`, `detection_classes`, etc.

The following code assumes the presence of the following tensors:

- detection_boxes: coordinates of the detection boxes in the image.
- detection_scores: detection scores for the detection boxes in the image.
- detection_classes: detection-level class labels.
- num_detections: number of detections in the batch.

In our case, the model's configuration specifies a maximum total number of detections (`max_total_detections`) of 100 and a maximum number of detections per class (`max_detections_per_class`) of 100, so the output tensors for `detection_scores` and `detection_classes` have the shape (?, 100) and `detection_boxes` has the shape (?, 100, 4), where the 4 values are the coordinates of two diagonal corners of the bounding box (in the order ymin, xmin, ymax, xmax).
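
To make the shapes concrete, the sketch below uses made-up arrays with a batch of one image and three detections instead of 100. It assumes the boxes are normalized to [0, 1] in `[ymin, xmin, ymax, xmax]` order, which is consistent with the `use_normalized_coordinates=True` argument used in this notebook. The helper `to_pixel_boxes` is hypothetical.

```python
import numpy as np


def to_pixel_boxes(boxes, im_height, im_width):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to pixel units."""
    scale = np.array([im_height, im_width, im_height, im_width],
                     dtype=np.float32)
    return np.round(boxes * scale).astype(np.int32)


# Made-up output for one image: shape (1, 3, 4), like (?, 100, 4) in the text.
boxes = np.array([[[0.10, 0.20, 0.50, 0.60],
                   [0.00, 0.00, 1.00, 1.00],
                   [0.25, 0.25, 0.75, 0.50]]], dtype=np.float32)

# Drop the batch axis, then scale to a 200 (height) x 400 (width) image.
pixel_boxes = to_pixel_boxes(np.squeeze(boxes, axis=0),
                             im_height=200, im_width=400)
print(pixel_boxes)
```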

Here, we read the image file using the Pillow `Image` class. Because the network always expects its input tensors in batches, we need to add an additional dimension as the first axis by calling `np.expand_dims(x, axis=0)`.
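
A quick NumPy check of the batching step, using a zero-filled array as a stand-in for a real image (the 480x640 size is arbitrary):

```python
import numpy as np

# Stand-in for a single RGB image of height 480 and width 640.
image_np = np.zeros((480, 640, 3), dtype=np.uint8)

# Add a leading batch axis so the shape becomes (1, height, width, channels).
image_np_expanded = np.expand_dims(image_np, axis=0)

print(image_np.shape, '->', image_np_expanded.shape)
```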

We then call `run_inference_for_single_image()`, defined below, to get the predicted bounding boxes and classes. It uses the utility function provided by the TFOD API, `visualization_utils.visualize_boxes_and_labels_on_image_array()`, to draw the boxes on the image. We can control the score threshold for a box to be visualized by changing the `min_score_thresh` parameter value.
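
The effect of the score threshold can be sketched without the visualization utility. The arrays and the tiny category index below are made up, and `filter_detections` is a hypothetical helper, not a TFOD API function; it only mimics the idea of keeping detections whose score exceeds `min_score_thresh`.

```python
import numpy as np


def filter_detections(classes, scores, category_index, min_score_thresh=0.5):
    """Return (name, score) pairs for detections above the threshold."""
    kept = []
    for class_id, score in zip(classes, scores):
        if score > min_score_thresh:
            name = category_index.get(int(class_id), {}).get('name', 'N/A')
            kept.append((name, float(score)))
    return kept


# Made-up outputs after np.squeeze(): one entry per detection.
classes = np.array([18.0, 1.0, 3.0], dtype=np.float32)
scores = np.array([0.75, 0.5, 0.25], dtype=np.float32)
category_index_demo = {1: {'id': 1, 'name': 'person'},
                       3: {'id': 3, 'name': 'car'},
                       18: {'id': 18, 'name': 'dog'}}

detections = filter_detections(classes, scores, category_index_demo,
                               min_score_thresh=0.4)
print(detections)
```

Lowering the threshold shows more (and less certain) boxes, while raising it keeps only the most confident detections.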

If the label text is unclear or illegible, you may want to change the font used by `visualize_boxes_and_labels_on_image_array()`. By default, it tries to load the font arial.ttf; if loading fails, it falls back to `ImageFont.load_default()`, and this default font may not be legible on certain platforms (e.g. macOS). For more information on ImageFont, refer to the [PIL documentation](https://pillow.readthedocs.io/en/stable/reference/ImageFont.html).


In [None]:
def run_inference_for_single_image(image_path, graph):
    image = Image.open(image_path)
    
    with graph.as_default():
        with tf.Session() as sess:
            # Get handles to input and output tensors
            image_tensor = graph.get_tensor_by_name('image_tensor:0')
            detection_boxes = graph.get_tensor_by_name('detection_boxes:0')
            detection_scores = graph.get_tensor_by_name('detection_scores:0')
            detection_classes = graph.get_tensor_by_name('detection_classes:0')
            num_detections = graph.get_tensor_by_name('num_detections:0')

            image_np = load_image_into_numpy_array(image)
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            # Run inference
            (boxes, scores, classes, num) = sess.run(
                            [detection_boxes, detection_scores, detection_classes, num_detections],
                            feed_dict={image_tensor: image_np_expanded})
            vis_util.visualize_boxes_and_labels_on_image_array(
                        image_np,
                        np.squeeze(boxes),
                        np.squeeze(classes).astype(np.int32),
                        np.squeeze(scores),
                        category_index,
                        min_score_thresh=0.4,
                        use_normalized_coordinates=True,
                        line_thickness=10)
            

            # Size, in inches, of the output images.
            IMAGE_SIZE = (12, 8)
            plt.figure(figsize=IMAGE_SIZE)
            plt.imshow(image_np)
    

In [None]:
image = 'data/dog.jpg'
run_inference_for_single_image(image, detection_graph)

## 5. Object Detection on Video (Optional) 

Only run this section on a local computer: the cv2 video player opens in a separate window on the local machine, not within the notebook.

In [None]:
import cv2

def run_inference_for_video(video_filepath, graph):
    video_player = cv2.VideoCapture(video_filepath)

    with graph.as_default():
        with tf.Session() as sess:
            image_tensor = graph.get_tensor_by_name('image_tensor:0')
            detection_boxes = graph.get_tensor_by_name('detection_boxes:0')
            detection_scores = graph.get_tensor_by_name('detection_scores:0')
            detection_classes = graph.get_tensor_by_name('detection_classes:0')
            num_detections = graph.get_tensor_by_name('num_detections:0')

            while video_player.isOpened():
                ret, image_np = video_player.read()
                if ret:
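                    # OpenCV returns frames in BGR order, but the model expects RGB.
                    image_rgb = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
                    image_np_expanded = np.expand_dims(image_rgb, axis=0)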

                    (boxes, scores, classes, num) = sess.run(
                      [detection_boxes, detection_scores, detection_classes, num_detections],
                      feed_dict={image_tensor: image_np_expanded})

                    vis_util.visualize_boxes_and_labels_on_image_array(
                        image_np,
                        np.squeeze(boxes),
                        np.squeeze(classes).astype(np.int32),
                        np.squeeze(scores),
                        category_index,
                        use_normalized_coordinates=True,
                        line_thickness=10)

                    cv2.imshow('Object Detection', image_np)
                    if cv2.waitKey(1) == 13: #13 is the Enter Key
                        break
                else:
                    break
                    
    # Release camera and close windows
    video_player.release()
    cv2.destroyAllWindows() 
    cv2.waitKey(1)

In [None]:
video_file = 'data/dashcam.mp4'
run_inference_for_video(video_file, detection_graph)