# Mask R-CNN Image Segmentation

In this notebook, we will explore how to use a [Mask R-CNN](https://arxiv.org/abs/1703.06870) model from TensorFlow Hub for object detection and instance segmentation. In addition to bounding boxes, this model predicts a segmentation mask for each object instance in an image. Many of the commands will look familiar; here we adapt them to instance segmentation models.

<br/>

*Note: This model has heavy processing requirements, so a TPU runtime is recommended. Open this notebook in Google Colab and switch the runtime by selecting Runtime --> Change runtime type, then choosing TPU.*

## Installation

We will use the TensorFlow 2 [Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection). This involves cloning the [TensorFlow Model Garden](https://github.com/tensorflow/models) and installing the object detection packages.

In [None]:
# Clone the tensorflow models repository
!git clone --depth 1 https://github.com/tensorflow/models

Cloning into 'models'...
remote: Enumerating objects: 4107, done.
remote: Counting objects: 100% (4107/4107), done.
remote: Compressing objects: 100% (3108/3108), done.
remote: Total 4107 (delta 1188), reused 2036 (delta 937), pack-reused 0
Receiving objects: 100% (4107/4107), 45.35 MiB | 50.75 MiB/s, done.
Resolving deltas: 100% (1188/1188), done.


In [None]:
%%bash
sudo apt install -y protobuf-compiler
cd models/research/
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install .

Reading package lists...
Building dependency tree...
Reading state information...
protobuf-compiler is already the newest version (3.12.4-1ubuntu7.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Processing /content/models/research
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting avro-python3 (from object-detection==0.1)
  Downloading avro-python3-1.10.2.tar.gz (38 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting apache-beam (from object-detection==0.1)
  Downloading apache_beam-2.56.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.5/14.5 MB 57.6 MB/s eta 0:00:00
Collecting lxml (from object-detection==0.1)
  Downloading lxml-5.2.2-cp310-cp310-manylinux_2_28_x86_64.whl (5.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.0/5.0 MB 64.3 MB/s eta 0:00:00
Collect





## Import libraries

In [None]:
import matplotlib
import matplotlib.pyplot as plt

import numpy as np
from io import BytesIO
from PIL import Image
from urllib.request import urlopen

import tensorflow as tf
import tensorflow_hub as hub

from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.utils import ops as utils_ops

tf.get_logger().setLevel('ERROR')

%matplotlib inline

## Utilities

For convenience, we will use a function to convert an image to a numpy array. This function can handle both local paths and URLs to images, as demonstrated in the `TEST_IMAGES` dictionary. Some paths in this dictionary refer to test images included with the API package (e.g., `Beach`), while others are URLs pointing to online images (e.g., `Street`).

In [None]:
def load_image_into_numpy_array(path):
  """Load an image from a local file or URL into a numpy array.

  Puts image into numpy array to feed into tensorflow graph.
  Note that by convention we put it into a numpy array with shape
  (batch, height, width, channels), where batch=1 and channels=3 for RGB.

  Args:
    path: the file path or http(s) URL of the image

  Returns:
    uint8 numpy array with shape (1, img_height, img_width, 3)
  """
  if path.startswith('http'):
    # Remote image: download the bytes first
    image_data = urlopen(path).read()
  else:
    # Local image: read the bytes through tf.io.gfile
    with tf.io.gfile.GFile(path, 'rb') as f:
      image_data = f.read()

  # Force 3 channels so grayscale or RGBA inputs do not break the reshape
  image = Image.open(BytesIO(image_data)).convert('RGB')

  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (1, im_height, im_width, 3)).astype(np.uint8)


# Dictionary with image tags as keys, and image paths as values
TEST_IMAGES = {
  'Beach' : 'models/research/object_detection/test_images/image2.jpg',
  'Dogs' : 'models/research/object_detection/test_images/image1.jpg',
  # By Américo Toledano, Source: https://commons.wikimedia.org/wiki/File:Biblioteca_Maim%C3%B3nides,_Campus_Universitario_de_Rabanales_007.jpg
  'Phones' : 'https://upload.wikimedia.org/wikipedia/commons/thumb/0/0d/Biblioteca_Maim%C3%B3nides%2C_Campus_Universitario_de_Rabanales_007.jpg/1024px-Biblioteca_Maim%C3%B3nides%2C_Campus_Universitario_de_Rabanales_007.jpg',
  # By 663highland, Source: https://commons.wikimedia.org/wiki/File:Kitano_Street_Kobe01s5s4110.jpg
  'Street' : 'https://upload.wikimedia.org/wikipedia/commons/thumb/0/08/Kitano_Street_Kobe01s5s4110.jpg/2560px-Kitano_Street_Kobe01s5s4110.jpg'
}

## Load the Model

TensorFlow Hub offers a Mask R-CNN model integrated with the Object Detection API. The specifics of this model can be explored [here](https://tfhub.dev/tensorflow/mask_rcnn/inception_resnet_v2_1024x1024/1). We'll begin by loading the model and then explore how to use it for inference in the following section.

In [None]:
model_display_name = 'Mask R-CNN Inception ResNet V2 1024x1024'
model_handle = 'https://tfhub.dev/tensorflow/mask_rcnn/inception_resnet_v2_1024x1024/1'

print('Selected model:'+ model_display_name)
print('Model Handle at TensorFlow Hub: {}'.format(model_handle))

Selected model:Mask R-CNN Inception ResNet V2 1024x1024
Model Handle at TensorFlow Hub: https://tfhub.dev/tensorflow/mask_rcnn/inception_resnet_v2_1024x1024/1


In [None]:
# This will take 10 to 15 minutes to finish
print('loading model...')
hub_model = hub.load(model_handle)
print('model loaded!')

loading model...
model loaded!


## Inference

We will use the model we just loaded to perform instance segmentation on an image. We start by selecting one of the test images defined earlier and converting it into a numpy array.

For inference, we pass the numpy array of a *single* image to the model, as it does not support batch processing. This will generate a dictionary containing the results, detailed in the `Outputs` section of the [documentation](https://tfhub.dev/tensorflow/mask_rcnn/inception_resnet_v2_1024x1024/1).
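
Because the model takes one image at a time, its input needs a leading batch dimension of size 1. The following is a minimal sketch of that layout, assuming the `uint8` `(1, height, width, 3)` shape that `load_image_into_numpy_array` returns. The random array and its 480x640 size are made up and only stand in for a real photo:

```python
import numpy as np

# Stand-in for a decoded RGB image (hypothetical 480x640 size).
rng = np.random.default_rng(0)
single_image = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

# Add the leading batch dimension so the array holds exactly one image.
batched_image = np.expand_dims(single_image, axis=0)

print(batched_image.shape)  # (1, 480, 640, 3)
print(batched_image.dtype)  # uint8
```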

In [None]:
# Choose one and use as key for TEST_IMAGES below:
# ['Beach', 'Street', 'Dogs','Phones']

image_path = TEST_IMAGES['Street']

image_np = load_image_into_numpy_array(image_path)

plt.figure(figsize=(24,32))
plt.imshow(image_np[0])
plt.show()


In [None]:
# Run inference
results = hub_model(image_np)

# Output values are tensors; convert them to numpy arrays with .numpy() so we can visualize the results
result = {key:value.numpy() for key,value in results.items()}

# Print the keys
for key in result.keys():
  print(key)

num_proposals
detection_classes
rpn_objectness_predictions_with_background
detection_scores
image_shape
class_predictions_with_background
rpn_features_to_crop
num_detections
detection_multiclass_scores
rpn_box_predictor_features
detection_masks
proposal_boxes_normalized
mask_predictions
detection_anchor_indices
box_classifier_features
rpn_box_encodings
refined_box_encodings
raw_detection_scores
final_anchors
proposal_boxes
anchors
detection_boxes
raw_detection_boxes
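
Only a few of these keys are needed for plotting. As a rough sketch, assuming `detection_scores`, `detection_classes`, and `detection_boxes` each carry a leading batch dimension of 1 (which the `[0]` indexing in the visualization code suggests), detections can be filtered by confidence with plain NumPy. The `fake_result` dictionary holds made-up values standing in for real model output:

```python
import numpy as np

# Made-up stand-in for the model output (3 detections, batch size 1).
fake_result = {
    'detection_scores': np.array([[0.95, 0.72, 0.30]], dtype=np.float32),
    'detection_classes': np.array([[1.0, 3.0, 1.0]], dtype=np.float32),
    # Boxes as normalized [ymin, xmin, ymax, xmax].
    'detection_boxes': np.array([[[0.1, 0.1, 0.5, 0.4],
                                  [0.2, 0.5, 0.6, 0.9],
                                  [0.0, 0.0, 0.1, 0.1]]], dtype=np.float32),
}

min_score = 0.70  # same cutoff as min_score_thresh in the plotting call
keep = fake_result['detection_scores'][0] >= min_score

# Apply the boolean mask to every per-detection array.
kept_classes = fake_result['detection_classes'][0][keep].astype(int)
kept_boxes = fake_result['detection_boxes'][0][keep]

print(kept_classes)      # [1 3]
print(kept_boxes.shape)  # (2, 4)
```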


## Visualizing the results

Next, to visualize the results on the original image, we need a `category_index` dictionary that maps class IDs to class names. The model was trained on the [COCO2017 dataset](https://cocodataset.org/), and the API package ships its labels as a label map file (`mscoco_label_map.pbtxt`). The `create_category_index_from_labelmap` utility function from the TensorFlow Object Detection API converts this file into the required dictionary format, which lets us map model outputs to human-readable class names when plotting.

In [None]:
PATH_TO_LABELS = './models/research/object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

# Print sample outputs
print(category_index[1])
print(category_index[2])
print(category_index[4])

{'id': 1, 'name': 'person'}
{'id': 2, 'name': 'bicycle'}
{'id': 4, 'name': 'motorcycle'}
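
To illustrate how such a dictionary is used, here is a small sketch that looks up class names for a few detected class IDs. The excerpt is hand-copied from the sample output above, and `class_name` is a hypothetical helper, not part of the Object Detection API:

```python
# Hand-copied excerpt of the category index printed above.
category_index_excerpt = {
    1: {'id': 1, 'name': 'person'},
    2: {'id': 2, 'name': 'bicycle'},
    4: {'id': 4, 'name': 'motorcycle'},
}

def class_name(class_id, index):
  """Return the display name for class_id, or 'N/A' if it is not in index."""
  return index.get(int(class_id), {}).get('name', 'N/A')

# Illustrative float class IDs; 3 is deliberately missing from the excerpt.
detected_ids = [1.0, 4.0, 3.0]
names = [class_name(i, category_index_excerpt) for i in detected_ids]
print(names)  # ['person', 'motorcycle', 'N/A']
```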


Next, let's process the segmentation masks and visualize the results:

1. **Mask Processing**: The result dictionary from the model includes a `detection_masks` key containing segmentation masks for each detected box. These need to be reformatted to match the full image size, ensuring they overlay correctly.

2. **Thresholding Masks**: We'll apply a threshold to the mask pixel values to refine the mask quality. The threshold is set at 0.6, but adjusting this value can influence the mask's precision. A lower threshold might include more pixels that don't belong to the object, thereby expanding the mask area inaccurately.

3. **Visualization**: For plotting the results, use the `visualize_boxes_and_labels_on_image_array()` function, similar to previous tasks. This time, include the `instance_masks` parameter with the adjusted masks to display the segmentation on the image alongside the detection boxes.

These steps ensure that the segmentation masks line up with the objects in the image. The threshold can then be tuned to suit a particular image or the requirements of the task.
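
The mask processing and thresholding steps can be sketched in plain NumPy. This is a simplified nearest-neighbor stand-in written for illustration, not the implementation of `reframe_box_masks_to_image_masks`, which may resize and interpolate differently. The `paste_box_mask` helper, the toy mask, and the 8x8 image size are all assumptions:

```python
import numpy as np

def paste_box_mask(box_mask, box, image_height, image_width, threshold=0.6):
  """Paste a box-relative soft mask into a full-image binary mask.

  Simplified nearest-neighbor stand-in, not the library routine.
  box is normalized [ymin, xmin, ymax, xmax].
  """
  ymin, xmin, ymax, xmax = box
  # Convert normalized box corners to pixel coordinates.
  y0, y1 = int(round(ymin * image_height)), int(round(ymax * image_height))
  x0, x1 = int(round(xmin * image_width)), int(round(xmax * image_width))
  full_mask = np.zeros((image_height, image_width), dtype=np.uint8)
  box_h, box_w = y1 - y0, x1 - x0
  if box_h <= 0 or box_w <= 0:
    return full_mask
  # Nearest-neighbor resize of the soft mask to the box size.
  rows = np.arange(box_h) * box_mask.shape[0] // box_h
  cols = np.arange(box_w) * box_mask.shape[1] // box_w
  resized = box_mask[rows[:, None], cols[None, :]]
  # Binarize and write the result into the box region of the image.
  full_mask[y0:y1, x0:x1] = (resized > threshold).astype(np.uint8)
  return full_mask

# Toy 2x2 soft mask: left column confident, right column less so.
toy_mask = np.array([[0.9, 0.5],
                     [0.8, 0.2]], dtype=np.float32)
toy_box = (0.25, 0.25, 0.75, 0.75)

strict = paste_box_mask(toy_mask, toy_box, 8, 8, threshold=0.6)
loose = paste_box_mask(toy_mask, toy_box, 8, 8, threshold=0.4)
print(strict.sum(), loose.sum())  # 8 12
```

Lowering the threshold from 0.6 to 0.4 grows the toy mask from 8 to 12 pixels, which is the mask expansion described in the thresholding step.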

In [None]:
# Handle models with masks:
label_id_offset = 0
image_np_with_mask = image_np.copy()

if 'detection_masks' in result:

  # Convert np.arrays to tensors
  detection_masks = tf.convert_to_tensor(result['detection_masks'][0])
  detection_boxes = tf.convert_to_tensor(result['detection_boxes'][0])

  # Reframe the bounding box masks to the image size.
  detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
      detection_masks, detection_boxes,
      image_np.shape[1], image_np.shape[2])

  # Binarize the masks: keep pixels whose value is above the threshold
  detection_masks_reframed = tf.cast(detection_masks_reframed > 0.6,
                                     tf.uint8)

  # Get the numpy array
  result['detection_masks_reframed'] = detection_masks_reframed.numpy()

# Overlay labeled boxes and segmentation masks on the image
viz_utils.visualize_boxes_and_labels_on_image_array(
      image_np_with_mask[0],
      result['detection_boxes'][0],
      (result['detection_classes'][0] + label_id_offset).astype(int),
      result['detection_scores'][0],
      category_index,
      use_normalized_coordinates=True,
      max_boxes_to_draw=100,
      min_score_thresh=.70,
      agnostic_mode=False,
      instance_masks=result.get('detection_masks_reframed', None),
      line_thickness=8)

plt.figure(figsize=(24,32))
plt.imshow(image_np_with_mask[0])
plt.show()
