<a href="https://colab.research.google.com/github/tonyscan6003/CE6003/blob/master/Example_5_2_MaskRCNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Mask R-CNN

In this example, we're going to look at an open-source implementation of [Mask R-CNN](https://arxiv.org/abs/1703.06870).

![Mask R-CNN from our video lectures](https://github.com/tonyscan6003/CE6003/blob/master/images/CE6003_RCNN.jpg?raw=true)

The implementation of Mask R-CNN we are going to use is one by [Matterport](https://matterport.com/), a company specialising in 3D capture. Their implementation is based on TensorFlow 1.x using the Keras API, and is available on [GitHub](https://github.com/matterport/Mask_RCNN).

The model generates bounding boxes and segmentation masks for each instance of an object in the image. It is based on a Feature Pyramid Network (FPN) with a ResNet101 backbone. Because it is quite a large network, it is not feasible for us to train Mask R-CNN for a specific object detection task within the Colab environment, so our usage here is restricted to demonstrating its powerful inference capabilities. However, the GitHub link above does include information on downloading and training Mask R-CNN, including a Jupyter Notebook tutorial on training.




First, we'll import TensorFlow 1.x. (Note: TensorFlow 1.x is now the legacy version of TensorFlow, so do not spend time trying to understand TensorFlow 1.x commands. It is better to study [examples](https://www.tensorflow.org/tutorials) written in TensorFlow 2.x, as it is more user-friendly and better integrated with Keras and Python.)

In [None]:
%tensorflow_version 1.x
import tensorflow as tf
import keras


To ensure this demo runs as fast as possible, from the menu above select **Edit > Notebook settings** or **Runtime > Change runtime type** and select GPU as the Hardware accelerator option.

Let's test that we are running using the GPU.

In [None]:
tf.test.gpu_device_name()

If this outputs `''`, then we are running on CPU only. If it outputs something like `'/device:GPU:0'`, then we are running on GPU. If you see something like ...

    Failed to assign a backend
    No backend with GPU available. Would you like to use a runtime with no accelerator?

This suggests that other users currently have all the GPU resources on Colab occupied, so try again later or try the TPU runtime instead.


Now, we'll continue to import the other Python libraries we need for this demo.

In [None]:
import os
import sys

import skimage.io
import random

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

# Matterport implementation of Mask R-CNN

We'll clone the Mask R-CNN code directly from the Matterport GitHub repository. If you have run this step already, you'll see an error of the form `fatal: destination path 'Mask_RCNN' already exists and is not an empty directory`, which you can safely ignore.

In [None]:
!git clone https://github.com/matterport/Mask_RCNN.git

In [None]:
# Root directory of the project
ROOT_DIR = os.path.abspath("./Mask_RCNN")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize

# Import COCO config - this is used to configure our model later on
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # To find local version
import coco

# Pre-trained weights - "Here's one we made earlier..."

We can leverage the fact that the model has already been pre-trained on the Microsoft COCO dataset and download the pre-trained weights for the model.





In [None]:
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")

# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")

We'll create a list of class names for COCO that we can use later in the lab. We have 81 classes (the 80 COCO object categories plus a background class, `BG`), and so 81 names.

In [None]:
# COCO Class names
# Index of the class in the list is its ID. For example, to get ID of
# the teddy bear class, use: class_names.index('teddy bear')
class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
               'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
               'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
               'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
               'kite', 'baseball bat', 'baseball glove', 'skateboard',
               'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
               'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
               'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
               'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
               'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
               'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
               'teddy bear', 'hair drier', 'toothbrush']
len(class_names)
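
Since the position of a name in the list is its class ID, a name-to-ID lookup table can be built in one line. The sketch below uses a shortened, illustrative list (`demo_class_names` is a hypothetical stand-in for the full `class_names` list above).

```python
# Shortened stand-in for the full COCO list (illustrative only)
demo_class_names = ['BG', 'person', 'bicycle', 'car']

# The position of a name in the list is its class ID
name_to_id = {name: class_id for class_id, name in enumerate(demo_class_names)}

# Look up an ID by name, then go back from the ID to the name
car_id = name_to_id['car']
print(car_id, demo_class_names[car_id])
```

A dictionary lookup avoids the linear scan that `list.index` performs, which matters little for 81 names but keeps the intent explicit.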

Let's take a look at the size of our downloaded weights file:

In [None]:
!ls -lh /content/Mask_RCNN/mask_rcnn_coco.h5

That's quite a lot of weight parameters when you think about it!  
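
As a rough sanity check, the size of a weights file can be turned into an approximate parameter count. The sketch below assumes every weight is stored as a 4-byte float32 and ignores the HDF5 container overhead, so treat the result as an order-of-magnitude estimate. `estimate_param_count` is a hypothetical helper, not part of the Mask R-CNN code.

```python
import os
import tempfile

def estimate_param_count(path, bytes_per_param=4):
    """Rough parameter count, assuming float32 weights and no file overhead."""
    return os.path.getsize(path) // bytes_per_param

# Demonstrate on a small dummy file; on Colab you could pass COCO_MODEL_PATH instead
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 4000)  # 4000 bytes, i.e. 1000 float32 values
    dummy_path = f.name

dummy_params = estimate_param_count(dummy_path)
print("Approximately %d parameters" % dummy_params)
os.remove(dummy_path)
```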

We need to set our batch size to 1 for inference, so we'll subclass the configuration class used in the Mask R-CNN code as follows. We'll also display the configuration, which will help us understand what type of network we're dealing with.

In [None]:
class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()
config.display()
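
The pattern at work here is plain class-attribute overriding: the subclass changes a few constants, and the base class derives other settings from them when it is instantiated. A minimal sketch of the idea follows; `BaseConfig` is a simplified, hypothetical stand-in and not the actual `mrcnn` configuration class.

```python
class BaseConfig:
    # Defaults suited to a multi-image batch (illustrative values)
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2

    def __init__(self):
        # Derived setting, computed from whatever the (sub)class defines
        self.BATCH_SIZE = self.GPU_COUNT * self.IMAGES_PER_GPU

class DemoInferenceConfig(BaseConfig):
    # One image at a time for inference
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

print(BaseConfig().BATCH_SIZE)
print(DemoInferenceConfig().BATCH_SIZE)
```

Because the derived value is computed in `__init__`, overriding the two constants in the subclass is enough to change the batch size.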

Okay, we can see our base network ("BACKBONE") is *resnet101*, a 101-layer ResNet model. We have a mask shape ("MASK_SHAPE") of 28x28 pixels. We have a learning rate of 0.001 and a learning momentum of 0.9.

Our model has been trained with 81 classes ("NUM_CLASSES"), and we can see that we conveniently set up a list of class names earlier...

In [None]:
len(class_names)


# Model Creation, and Loading Weights

Now we need to create our model (using the Mask R-CNN code API) and then load the pre-trained weights into our model.

The Matterport code uses Keras, but the Keras version on Colab warns about some deprecated functions in TensorFlow 1.x that are changing name in TensorFlow 2.x.  As we don't care about this, we'll disable most of these warnings. You may still see some of the form `WARNING: ... is deprecated and will be removed in a future version` but you can safely ignore these.

In [None]:
# Disable warnings about deprecated TF 1.x functions vis-a-vis TF 2.x API
try:
    from tensorflow.python.util import module_wrapper as deprecation
except ImportError:
    from tensorflow.python.util import deprecation_wrapper as deprecation
deprecation._PER_MODULE_WARNING_LIMIT = 0


# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)

Let's take a quick peek at our model. It is hidden as a `keras_model` variable within our mrcnn `model`:

In [None]:
model.keras_model.summary()

# Performing Inference

At this stage, we can take any random image and pass it through our model for inference.

You can execute the following cell as many times as you like, and it will pull a random image from the images folder, and perform inference on it, displaying the output.

In [None]:
# Load a random image from the images folder
file_names = next(os.walk(IMAGE_DIR))[2]
file_name = random.choice(file_names)
print("Looking at %s" % file_name)
image = skimage.io.imread(os.path.join(IMAGE_DIR, file_name))

# Run detection
results = model.detect([image], verbose=1)

# Visualize results
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'], class_names, r['scores'])

## Examining the results in more detail

The results dictionary, `r`, is interesting. We can see it has a number of items in it, identified by the keys `rois`, `class_ids`, `scores`, and `masks`.

In [None]:
r.keys()

We can easily see how many objects we identified and what they were...

In [None]:
numObjects = r['class_ids'].shape[0]

print("We identified %d objects:" % numObjects)

# Print the class names, ten per line
for count, class_id in enumerate(r['class_ids']):
    if count > 0 and count % 10 == 0:
        print()
    print("%s" % class_names[class_id], end=" ")
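
When an image contains many instances, a per-class tally can be easier to read than a flat list. A minimal sketch using `collections.Counter` follows; `demo_class_ids` and `demo_class_names` are made-up stand-ins for `r['class_ids']` and `class_names`.

```python
from collections import Counter

demo_class_names = ['BG', 'person', 'bicycle', 'car']
demo_class_ids = [1, 1, 3, 1, 2, 3]  # made-up detections

# Map each detected ID to its name, then tally the names
counts = Counter(demo_class_names[class_id] for class_id in demo_class_ids)

# Most frequent classes first
for name, n in counts.most_common():
    print("%-8s %d" % (name, n))
```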

The `rois` key is used to get the *region of interest* bounding boxes.

In [None]:
r['rois']

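The bounding box format is `[y1, x1, y2, x2]`, i.e. `[topLeft_y, topLeft_x, bottomRight_y, bottomRight_x]`: the row (y) coordinate comes before the column (x) coordinate, matching NumPy's row-then-column array indexing.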
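
In the Matterport code, each `rois` row is documented as `(y1, x1, y2, x2)` in pixel coordinates, with the row (y) value first. Under that assumption, box dimensions can be computed as sketched below; `box_size` is a hypothetical helper and the box values are made up for illustration.

```python
def box_size(box):
    """Return (height, width, area) of a box given as (y1, x1, y2, x2)."""
    y1, x1, y2, x2 = box
    height = y2 - y1
    width = x2 - x1
    return height, width, height * width

demo_box = (50, 120, 250, 220)  # made-up values: y1, x1, y2, x2
height, width, area = box_size(demo_box)
print("height=%d, width=%d, area=%d" % (height, width, area))
```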

Now, let's take a look at the masks item in more detail.

In [None]:
mask = r['masks']
mask = mask.astype(int)
print("Our mask shape (height, width, num_instances) is ", mask.shape, " - it has the same height and width as the input image.")
print("Let's plot mask 0...")
plt.imshow(mask[:,:,0])

In [None]:

# Each box is stored as (y1, x1, y2, x2): rows first, then columns
(tl_y, tl_x, br_y, br_x) = r['rois'][0]
print("\nLet's plot again, this time restricting to just the bounding box (%d, %d, %d, %d)..." % (tl_y, tl_x, br_y, br_x))

plt.imshow(mask[tl_y:br_y, tl_x:br_x, 0])

`masks` has the same height and width as our input image, but note the size of the third dimension above: this corresponds to the number of objects we detected. In our masks, we have arrays of 0s and 1s which delineate the shape of each object.
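
Because each channel is a binary image of one instance, simple per-object measurements reduce to array operations. The sketch below uses a tiny made-up mask stack (`demo_masks`) in place of `r['masks']` to count the pixels of each instance and to recover a tight bounding box from a mask, assuming the `(height, width, num_instances)` layout.

```python
import numpy as np

# Made-up stack of 2 instance masks on a 4x5 image
demo_masks = np.zeros((4, 5, 2), dtype=bool)
demo_masks[1:3, 1:4, 0] = True  # instance 0: rows 1-2, columns 1-3
demo_masks[0, 0, 1] = True      # instance 1: a single pixel

# Pixel area of each instance: sum over the height and width axes
areas = demo_masks.sum(axis=(0, 1))
print(areas)

# Tight bounding box (y1, x1, y2, x2) of instance 0, with exclusive end coordinates
rows, cols = np.nonzero(demo_masks[:, :, 0])
tight_box = (rows.min(), cols.min(), rows.max() + 1, cols.max() + 1)
print(tight_box)
```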

The following cell will allow you to iterate through the shapes and examine each one. Set `i` to the index you want to look at...

In [None]:
i = 0  # set i to the individual shape you want to look at...

class_id = r['class_ids'][i]
print("There are %d shape%s in this image, we're going to look at shape %d." % (mask.shape[2], "s" if mask.shape[2] != 1 else "", i))
print("Shape %d is class_id %d, which has the label \"%s\"\n" % (i, class_id, class_names[class_id]))

temp = image.copy()
for j in range(temp.shape[2]):
  temp[:,:,j] = temp[:,:,j] * mask[:,:,i]
plt.figure(figsize=(8,8))
plt.imshow(temp)
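
The per-channel loop can also be written as a single broadcast multiplication. This is an alternative formulation rather than what the cell above does; `demo_image` and `demo_mask` are small made-up arrays.

```python
import numpy as np

demo_image = np.full((2, 3, 3), 200, dtype=np.uint8)  # 2x3 RGB image, all pixels 200
demo_mask = np.array([[1, 0, 0],
                      [1, 1, 0]], dtype=np.uint8)     # 1 = object, 0 = background

# Add a trailing axis so the (2, 3, 1) mask broadcasts across the 3 colour channels
masked = demo_image * demo_mask[:, :, np.newaxis]

print(masked[:, :, 0])
```

Keeping the mask in the same `uint8` type as the image means the result can be passed straight to `plt.imshow` without a further cast.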


# Try an example image from a web URL

You can also set the variable `image_url` in the following cell to point to any image URL, and run the cell to perform Mask R-CNN instance segmentation on that image. By default, we'll look at this image from a musical event at the Irish World Academy of Music and Dance: ![Social Singing Event](https://www.ul.ie/sites/default/files/user_media/Social%20Singing%20Event.jpg)

In [None]:
import numpy as np
import cv2
import urllib.request

def url_to_image(url):
    # Download the raw bytes and decode them into an image array
    resp = urllib.request.urlopen(url)
    temp_image = np.asarray(bytearray(resp.read()), dtype="uint8")
    temp_image = cv2.imdecode(temp_image, cv2.IMREAD_COLOR)
    temp_image = cv2.cvtColor(temp_image, cv2.COLOR_BGR2RGB)  # OpenCV defaults to BGR, but we need RGB here
    return temp_image

# read in test image
image_url = "https://www.ul.ie/sites/default/files/user_media/Social%20Singing%20Event.jpg"
image = url_to_image(image_url)

# Run detection
results = model.detect([image], verbose=1)

# Visualize results
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'], class_names, r['scores'])