<a href="https://colab.research.google.com/github/DataKind-SG/otters-spotting/blob/master/monkey_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Extract Identified Monkeys in Bounding Boxes

Uses a pretrained object detection model from the TensorFlow Object Detection API.

Potential improvements:
1. The notebook currently crops the first monkey that is detected in each picture and does not raise an error when no monkey is detected.
2. Inference is too slow, at about 1 min per picture.

**Mount Google Drive**

In [0]:
from google.colab import drive
drive.mount('/content/gdrive/')

Mounted at /content/gdrive/


**Clone Model**

In [0]:
!git clone https://github.com/tensorflow/models.git
!apt-get -qq install libprotobuf-java protobuf-compiler
!protoc ./models/research/object_detection/protos/string_int_label_map.proto --python_out=.
!cp -R models/research/object_detection/ object_detection/
!rm -rf models

Cloning into 'models'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 21658 (delta 0), reused 0 (delta 0), pack-reused 21652
Receiving objects: 100% (21658/21658), 559.41 MiB | 33.85 MiB/s, done.
Resolving deltas: 100% (12712/12712), done.
Checking out files: 100% (2809/2809), done.
Selecting previously unselected package libprotobuf10:amd64.
(Reading database ... 18408 files and directories currently installed.)
Preparing to unpack .../libprotobuf10_3.0.0-9ubuntu5_amd64.deb ...
Unpacking libprotobuf10:amd64 (3.0.0-9ubuntu5) ...
Selecting previously unselected package libprotoc10:amd64.
Preparing to unpack .../libprotoc10_3.0.0-9ubuntu5_amd64.deb ...
Unpacking libprotoc10:amd64 (3.0.0-9ubuntu5) ...
Selecting previously unselected package libprotobuf-java.
Preparing to unpack .../libprotobuf-java_3.0.0-9ubuntu5_all.deb ...
Unpacking libprotobuf-java (3.0.0-9ubuntu5) ...
Selectin

**Import necessary packages**

In [0]:
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import re

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt

from PIL import Image

from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util


**Check GPU allocated**

Note: inference is quite slow without a GPU.

In [0]:
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


In [0]:
# Check whether the monkey images are on the drive.
# This setup assumes a directory named "monkey_image" in the root of your Google Drive that holds all the images,
# and a sub-directory in that folder named "monkey_images_processed" where the cropped photos will be stored.
# !ls "/content/gdrive/My Drive/monkey_image"
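The commented-out `ls` only lists the folder. A minimal sketch of a stricter check follows, assuming the directory layout used by the path constants later in this notebook. `missing_dirs` is a hypothetical helper name, not part of the notebook.

```python
import os


def missing_dirs(image_dir, processed_subdir="monkey_images_processed"):
    """Return the expected directories that do not exist yet.

    `image_dir` is the folder holding the source images; the cropped
    images are expected in `processed_subdir` inside it.
    """
    expected = [image_dir, os.path.join(image_dir, processed_subdir)]
    return [path for path in expected if not os.path.isdir(path)]


# Intended use with the paths this notebook assumes on the mounted drive:
# missing = missing_dirs('/content/gdrive/My Drive/monkey_image')
# if missing:
#     raise FileNotFoundError('Create these folders first: {}'.format(missing))
```

Failing early here avoids discovering a missing output folder only when the first cropped image is saved.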

**Download model for processing**

In [0]:
# What model to download.
# MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'

# This model is more accurate, but feel free to use a different one.
MODEL_NAME = 'faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2018_01_28'

MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

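# Label map file that provides the display name for each class ID.
PATH_TO_LABELS = os.path.join('object_detection/data', 'oid_bbox_trainable_label_map.pbtxt')

# Must cover every class ID in the label map. The Open Images label map used here has 545 classes;
# a value of 90 (the COCO count) would drop all IDs above 90, including Monkey (169).
NUM_CLASSES = 545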

# Download the model archive and extract only the frozen inference graph.
urllib.request.urlretrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
with tarfile.open(MODEL_FILE) as tar_file:
  for member in tar_file.getmembers():
    file_name = os.path.basename(member.name)
    if 'frozen_inference_graph.pb' in file_name:
      tar_file.extract(member, os.getcwd())

detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')
    

**Loading label map**

The Monkey class has ID 169 in this label map.

In [0]:
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
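Rather than hard-coding the ID, it can be looked up by name. The sketch below assumes `category_index` maps each class ID to a dict with `'id'` and `'name'` keys, and that `max_num_classes` was large enough to include the class. `class_id_for` is a hypothetical helper, and `example_index` is a made-up stand-in for the real index.

```python
def class_id_for(category_index, display_name):
    """Return the class ID whose 'name' matches display_name (case-insensitive)."""
    wanted = display_name.lower()
    for class_id, category in category_index.items():
        if category['name'].lower() == wanted:
            return class_id
    raise KeyError('No class named {!r} in the label map'.format(display_name))


# Stand-in for the real category_index (ID -> {'id': ..., 'name': ...}).
# The first entry is invented for illustration; 169 is the ID noted above.
example_index = {
    1: {'id': 1, 'name': 'Example class'},
    169: {'id': 169, 'name': 'Monkey'},
}
monkey_id = class_id_for(example_index, 'monkey')
print(monkey_id)  # prints 169
```

With the real index, `class_id_for(category_index, 'Monkey')` would raise `KeyError` if the class was filtered out while the categories were built, which makes that kind of misconfiguration visible.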

**Helper code**

In [0]:
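def load_image_into_numpy_array(image):
  # Convert to RGB first so grayscale or RGBA files also yield shape (height, width, 3).
  # Calling np.array on the PIL image directly is much faster than going through image.getdata().
  return np.array(image.convert('RGB'), dtype=np.uint8)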

**Find the directory and the images**

Will need to change later when we have more than 219 images.

In [0]:
PATH_TO_TEST_IMAGES_DIR = '/content/gdrive/My Drive/monkey_image'
PATH_TO_OUTPUT_IMAGES_DIR = '/content/gdrive/My Drive/monkey_image/monkey_images_processed'
CONTAINED_IMAGE_PATHS = [f for f in os.listdir(PATH_TO_TEST_IMAGES_DIR) if f.endswith('.jpg')]
TEST_IMAGE_PATHS = set(CONTAINED_IMAGE_PATHS) - set(os.listdir(PATH_TO_OUTPUT_IMAGES_DIR))
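The set difference above selects the images that still have no cropped counterpart. A small sketch of the same idea as a function, assuming it is acceptable to also match upper-case and `.jpeg` extensions (the cell above matches only lower-case `.jpg`). `pending_images` is a hypothetical name.

```python
def pending_images(input_names, processed_names, extensions=('.jpg', '.jpeg')):
    """Return image file names that have no processed counterpart yet, sorted."""
    processed = set(processed_names)
    return sorted(
        name for name in input_names
        if name.lower().endswith(extensions) and name not in processed
    )


# Made-up file names, standing in for the two os.listdir() results.
todo = pending_images(['b.JPG', 'a.jpg', 'notes.txt', 'c.jpg'], ['c.jpg'])
print(todo)  # prints ['a.jpg', 'b.JPG']
```

Returning a sorted list instead of a set keeps the processing order stable between runs, which helps when a run is interrupted and resumed.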

**Run Inference for a single image**

This needs to be put in a loop later on.

In [0]:
def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
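      if 'detection_masks' in tensor_dict:
        # utils_ops is only needed for models that output masks, so it is imported here.
        from object_detection.utils import ops as utils_ops
        # The following processing is only for single image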
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: np.expand_dims(image, 0)})

      # All outputs are float32 numpy arrays, so convert types as appropriate.
      # Class IDs are cast to int64: uint8 would wrap IDs above 255 (for example 425 -> 169).
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.int64)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict
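`run_inference_for_single_image` opens a new `tf.Session` on every call, so the session setup is repeated for each picture, which may account for part of the per-image time. One possible way to put the inference in a loop is to open the session once and reuse it. The sketch below assumes a graph without `detection_masks` output, so the mask reframing step is left out. `run_inference_for_images` is a hypothetical helper; the session, tensor handles and images are supplied by the caller.

```python
import numpy as np


def run_inference_for_images(sess, tensor_dict, image_tensor, images):
    """Yield one output dict per image, reusing an already open session.

    sess         -- an open tf.Session bound to the detection graph
    tensor_dict  -- output name -> tensor, built as in run_inference_for_single_image
    image_tensor -- the graph's 'image_tensor:0' input
    images       -- iterable of uint8 arrays shaped (height, width, 3)
    """
    for image in images:
        # The model expects a batch dimension: (1, height, width, 3).
        output = sess.run(tensor_dict,
                          feed_dict={image_tensor: np.expand_dims(image, 0)})
        # Drop the batch dimension again and convert types.
        yield {
            'num_detections': int(output['num_detections'][0]),
            # int64 rather than uint8, so class IDs above 255 are not wrapped.
            'detection_classes': output['detection_classes'][0].astype(np.int64),
            'detection_boxes': output['detection_boxes'][0],
            'detection_scores': output['detection_scores'][0],
        }


# Intended use with the TF1 API this notebook relies on (not executed here):
# with tf.Session(graph=detection_graph) as sess:
#     image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
#     tensor_dict = {key: detection_graph.get_tensor_by_name(key + ':0')
#                    for key in ('num_detections', 'detection_boxes',
#                                'detection_scores', 'detection_classes')}
#     for output_dict in run_inference_for_images(sess, tensor_dict, image_tensor, images):
#         ...
```

Because the function is a generator, each image is processed only when the caller asks for the next result, so cropped images can be saved one at a time while the session stays open.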

**Label images**

This cell keeps timing out on Colab roughly every 30+ minutes, because long-running jobs are not allowed. Current speed with a GPU is about 1 min per picture. The speed-up over CPU is not significant; I'm not sure what I'm doing wrong.

The loop currently crops the first monkey that is detected and silently skips pictures in which no monkey is detected, without raising an error.

In [0]:
MONKEY_CLASS_ID = 169  # ID of the Monkey class in the label map

for image_path in TEST_IMAGE_PATHS:
    input_image_path = os.path.join(PATH_TO_TEST_IMAGES_DIR, image_path)
    # The cropped image is stored under the same file name in the output directory.
    output_image_path = os.path.join(PATH_TO_OUTPUT_IMAGES_DIR, image_path)
    image = Image.open(input_image_path)
    # Array representation of the image, used as the model input.
    image_np = load_image_into_numpy_array(image)
    # Actual detection; run_inference_for_single_image adds the batch dimension itself.
    output_dict = run_inference_for_single_image(image_np, detection_graph)
    # Boxes of all detections labelled as monkey.
    monkey_boxes = output_dict['detection_boxes'][output_dict['detection_classes'] == MONKEY_CLASS_ID]
    # Pictures without a detected monkey are skipped silently; this could be improved to raise an error.
    if len(monkey_boxes) > 0:
        # Boxes are normalised (ymin, xmin, ymax, xmax), so scale them to pixel coordinates.
        width, height = image.size
        ymin, xmin, ymax, xmax = monkey_boxes[0]
        coords = (xmin * width, ymin * height, xmax * width, ymax * height)
        cropped_image = image.crop(coords)
        cropped_image.save(output_image_path)
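The two limitations noted for this loop (taking the first monkey, and no error when none is found) could be addressed along the following lines. This is only a sketch: `best_detection` and `box_to_pixels` are hypothetical helpers, the `min_score` threshold of 0.5 is an assumed value, and the example detections are made up. The box order `(ymin, xmin, ymax, xmax)` is the one used in the loop above.

```python
def best_detection(classes, scores, boxes, target_class, min_score=0.5):
    """Return (score, box) of the highest-scoring detection of target_class.

    Raises LookupError when no detection of that class reaches min_score.
    """
    candidates = [
        (float(score), tuple(float(v) for v in box))
        for cls, score, box in zip(classes, scores, boxes)
        if int(cls) == target_class and float(score) >= min_score
    ]
    if not candidates:
        raise LookupError(
            'No detection of class {} with score >= {}'.format(target_class, min_score))
    return max(candidates, key=lambda item: item[0])


def box_to_pixels(box, width, height):
    """Convert a normalised (ymin, xmin, ymax, xmax) box to (left, upper, right, lower) pixels."""
    ymin, xmin, ymax, xmax = box
    return (int(round(xmin * width)), int(round(ymin * height)),
            int(round(xmax * width)), int(round(ymax * height)))


# Made-up detections: one other class and two monkeys (class 169).
classes = [1, 169, 169]
scores = [0.9, 0.6, 0.8]
boxes = [(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 0.5, 0.5), (0.25, 0.5, 0.75, 1.0)]

score, box = best_detection(classes, scores, boxes, target_class=169)
print(box_to_pixels(box, width=200, height=100))  # prints (100, 25, 200, 75)
```

In the loop, the three sequences would come from `output_dict['detection_classes']`, `output_dict['detection_scores']` and `output_dict['detection_boxes']`, and the resulting tuple could be passed to `image.crop`. Catching `LookupError` per picture would let the loop log the file name and continue.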

In [0]:
# Cell to test stuff
TEST_IMAGE_PATHS

{'image_00038.jpg',
 'image_00043.jpg',
 'image_00167.jpg',
 'image_00176.jpg',
 'image_00204.jpg',
 'image_00206.jpg'}