# Detectron2 for Computer Vision Image Recognition

**Benedict Aryo**

We will try out the latest package from Facebook AI Research ([FAIR](https://ai.facebook.com/)), the creators of [Mask R-CNN](https://arxiv.org/abs/1703.06870), which aims to be the next-generation platform for object detection and segmentation.

<br>

___

<img src="https://dl.fbaipublicfiles.com/detectron2/Detectron2-Logo-Horz.png" width="300">

Detectron2 is Facebook AI Research's next generation software system
that implements state-of-the-art object detection algorithms.
It is a ground-up rewrite of the previous version,
[Detectron](https://github.com/facebookresearch/Detectron/),
and it originates from [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark/). Detectron2 is available at: https://github.com/facebookresearch/detectron2

<div align="center">
  <img src="https://user-images.githubusercontent.com/1381301/66535560-d3422200-eace-11e9-9123-5535d469db19.png"/>
</div>

---
## Detectron2 Benchmark
In the benchmark below, Detectron2 trains faster than several other popular open-source Mask R-CNN implementations. <br>
**Here's the benchmark:**

<img src="https://github.com/BenedictusAryo/Mask-RCNN_Detectron2/raw/master/assets/detectron2_result.png" width="400">

___
## Detectron2 Model Zoo
Detectron2 provides a large set of baseline results and trained models, available for download in the [Detectron2 Model Zoo](https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md).

Detectron2 pretrained models can be used for:
* Object Detection
* Instance Segmentation
* Panoptic Segmentation
* Person Keypoint Detection
* Semantic Segmentation (soon)

___

# Inference using Detectron2
We will use a Detectron2 pretrained model to test its prediction output while learning about its functionality.

See `detectron2` Documentation at: https://detectron2.readthedocs.io/

In [None]:
# Some basic setup:
# Setup detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import os
import numpy as np
import cv2
import random
import matplotlib.pyplot as plt

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog

### Load Images to predict
We need a sample input image to run prediction on. Here we use a sample image from the [MS COCO dataset](https://cocodataset.org/#home).

In [None]:
# download sample image using wget
!wget http://images.cocodataset.org/val2017/000000439715.jpg -O input.jpg

In [None]:
# Load BGR test-image using opencv
img = cv2.imread("./input.jpg")

# Show the RGB image using matplotlib imshow
plt.imshow(img[...,::-1]);

<br>

## Mask R-CNN Instance Segmentation inference
There are many implementations of instance segmentation provided by `detectron2`.

Read about detectron2 model zoo here: <br> https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md

For now, we will use the **`Mask R-CNN with ResNet-50 backbone and Feature Pyramid Network (FPN)`**

In [None]:
# The model config lives in the model zoo's 'COCO-InstanceSegmentation' directory; store its path in a variable
mrcnn_model = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"

<br>

Then, we create a detectron2 config and a detectron2 `DefaultPredictor` to run inference on this image.

**Note:** The first time you use this model, its weights file will be downloaded.

<br>

In [None]:
# Create Model Configuration object
cfg = get_cfg()

# Add model architecture config, which we set earlier
cfg.merge_from_file(model_zoo.get_config_file(mrcnn_model))

# Load the model weights that we set earlier
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(mrcnn_model)

# Set the minimum confidence score a detection needs to be kept at inference time
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7

# Create Predictor for inference based on configuration set
predictor = DefaultPredictor(cfg)

<br>

After setting the model configuration, we can simply use the predictor object to predict the instance segmentation of the input image.

**Note:** By default, the predictor expects a BGR image as input (the raw OpenCV image), so if you previously converted the image to RGB for visualization with matplotlib, don't forget to convert it back to BGR.

<br>

In [None]:
# Predict the Instance segmentation of input image using predictor
outputs = predictor(img)

In [None]:
# You can examine the output structure if you wish
outputs

<br>

The prediction output is always a `dictionary`. <br>
Since the model we are using (**`Mask R-CNN`**) belongs to the **`Instance Segmentation`** category, the output dictionary has an `instances` key.

For detail output format specification, See: https://detectron2.readthedocs.io/tutorials/models.html#model-output-format

The predictions are stored as fields of the `instances` object, such as:
* `.scores` for probability score
* `.pred_classes` for classification output
* `.pred_boxes` for bounding box detection
* `.pred_masks` for instance segmentation


Now, to examine the inference result for this image, we will print the predicted classes and bounding boxes.

**Note:** Some output tensors are still on the GPU (`cuda`) because inference ran on the GPU.<br> You can tell by the **`device='cuda:0'`** notation at the end of the tensor. <br>Before visualizing, we need to move those tensors back to the CPU using `.to('cpu')`, or convert them to a **list** using `.tolist()`.
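
The fields are parallel: entry `i` of each one describes the same detection. As a minimal sketch of that layout, the example below zips plain Python lists into one record per detection. The lists stand in for what `.tolist()` would return, and every number in them is invented for illustration rather than taken from real model output. The helper name `to_records` is hypothetical.

```python
# Plain Python lists stand in for the tensors after `.tolist()`.
# All values below are made up for illustration, not real model output.
scores = [0.98, 0.91, 0.74]
pred_classes = [17, 0, 0]
pred_boxes = [
    [126.6, 244.9, 459.8, 480.0],  # [x1, y1, x2, y2] in pixels
    [251.1, 157.8, 338.9, 413.6],
    [114.8, 268.7, 148.2, 398.8],
]


def to_records(scores, classes, boxes):
    """Zip the parallel per-detection fields into one dict per detection."""
    return [
        {"score": s, "class_id": c, "box_xyxy": b}
        for s, c, b in zip(scores, classes, boxes)
    ]


records = to_records(scores, pred_classes, pred_boxes)
for rec in records:
    print(rec)
```

With real output, the three lists could plausibly come from `outputs['instances'].scores.tolist()`, `outputs['instances'].pred_classes.tolist()`, and `outputs['instances'].pred_boxes.tensor.tolist()`.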

<br>

In [None]:
# Print how many objects were detected
print('Object Detected: ', len(outputs['instances'].scores))

# Print the predicted class IDs
print('Classification class: ', outputs['instances'].pred_classes.tolist())

# Print bounding box for each predicted object
print('Bounding Box: \n', outputs['instances'].pred_boxes)

<br>

Since the model was trained on the `COCO dataset`, it has **`80 object classes`** such as `person`, `dog`, `cat`, `car`, etc. For more information, see: https://cocodataset.org/#home

We can map the class numbers to names using `MetadataCatalog`, which we already imported.
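
The mapping itself is just a list lookup: the class number is an index into a list of names. The sketch below shows the idea with a hard-coded excerpt of the COCO class list, so it runs without detectron2. The excerpt (the first 18 names, in the order assumed here) and the helper `ids_to_names` are illustrative assumptions; in the notebook, `metadata.thing_classes` plays the role of the list.

```python
# Excerpt only: the first 18 names of the 80-entry COCO class list,
# in the order assumed for this sketch.
COCO_THING_CLASSES_EXCERPT = [
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
    "truck", "boat", "traffic light", "fire hydrant", "stop sign",
    "parking meter", "bench", "bird", "cat", "dog", "horse",
]


def ids_to_names(class_ids, class_names):
    """Translate predicted class indices into human-readable names."""
    return [class_names[i] for i in class_ids]


# Hypothetical predicted class IDs, e.g. from `pred_classes.tolist()`
names = ids_to_names([17, 0, 0], COCO_THING_CLASSES_EXCERPT)
print(names)
```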

<br>

In [None]:
# Get the metadata of the training dataset named in the configuration
metadata = MetadataCatalog.get(cfg.DATASETS.TRAIN[0])

# Look up the name of class `17`
print('Class 17 is ', metadata.thing_classes[17])

# Look up the name of class `0`
print('Class 0 is ', metadata.thing_classes[0])

<br>

## Visualize the output prediction
Detectron2 also provides a `Visualizer` class to draw the predictions on the image. The visualization uses the class names from the `metadata` we set previously.

**Things to note:** `Visualizer()` expects its input image in `RGB` order. You can reverse the color channels using `[:, :, ::-1]` or the `cv2.cvtColor()` function from OpenCV.
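
To make the channel reversal concrete, here is a small sketch on a hand-built two-pixel array, assuming only NumPy. It shows that slicing the last axis with `::-1` swaps the blue and red channels, and that applying it twice restores the original order.

```python
import numpy as np

# A 1x2 "image" in BGR order: one pure-blue pixel and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, giving RGB order.
rgb = bgr[:, :, ::-1]

print(rgb[0, 0].tolist())  # the blue pixel, now in RGB order
print(rgb[0, 1].tolist())  # the red pixel, now in RGB order

# Reversing a second time restores the original BGR array.
assert np.array_equal(rgb[:, :, ::-1], bgr)
```

On a real image, `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` should give the same result as the slice, but as a new contiguous array rather than a view.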

In [None]:
# Create a `Visualizer` object
vis = Visualizer(img_rgb=img[...,::-1], metadata=metadata, scale=0.6) # scale resizes the rendered image

# Draw the predictions using the `.draw_instance_predictions()` method
out = vis.draw_instance_predictions(outputs['instances'].to('cpu'))

# Show the image segmentation result
plt.figure(figsize=(7,10))
plt.imshow(out.get_image());

<br>

We can also save the output image to disk using:
```python
out.save('filename')
```

<br>

In [None]:
# Save the output result to disk
out.save('mrcnn_result.png')

# Check whether it's exported
!ls | grep mrcnn_result.png

<br>

## Other types of builtin models

What if you want to do another type of computer vision task, for example `human keypoints detection`?

How do you do that in detectron2?<br>
Simple: you only need to change the model type.

In [None]:
# Change model to Human Keypoints detection
keypoints_model = "COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"

<br>

The rest of the script is the same as before.
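
What does change is the output: a keypoint model adds a `pred_keypoints` field holding, for each detected person, one `(x, y, score)` row per keypoint. The sketch below shows one way such rows might be filtered by score, using plain lists so it runs without a model. The coordinates, the three-row length (COCO person keypoints have 17 rows), the `0.05` cutoff, and the helper `confident_keypoints` are all illustrative assumptions.

```python
# One detected person with three (x, y, score) keypoint rows.
# Values are invented for illustration.
person_keypoints = [
    [320.0, 110.5, 0.92],
    [330.2, 102.1, 0.03],
    [311.7, 101.9, 0.47],
]


def confident_keypoints(keypoints, min_score):
    """Return the (x, y) pairs whose score is at least min_score."""
    return [(x, y) for x, y, score in keypoints if score >= min_score]


# 0.05 is a hypothetical cutoff chosen for this example
kept = confident_keypoints(person_keypoints, min_score=0.05)
print(kept)
```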

<br>

In [None]:
# Create model configuration
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(keypoints_model))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(keypoints_model)
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.9

# Predict the keypoints result
predictor = DefaultPredictor(cfg)
outputs = predictor(img)

# Print how many people (sets of keypoints) were detected
print("Person keypoints detected: ", len(outputs['instances'].scores))

# Get metadata for visualization
metadata = MetadataCatalog.get(cfg.DATASETS.TRAIN[0])

# Visualize the results
vis = Visualizer(img_rgb=img[...,::-1], metadata=metadata, scale=1)
out = vis.draw_instance_predictions(outputs['instances'].to('cpu'))
plt.figure(figsize=(8,10))
plt.imshow(out.get_image());

<br><br>

___

# Training Mask R-CNN on a custom dataset <br> (Optional, if time permits)
This section shows how to use detectron2 to train Mask R-CNN on your own dataset. Detectron2 provides a simple API for training its models on custom image data.

**Note:** The Mask R-CNN training below expects annotations in the COCO dataset format. To read more about how it differs from Pascal VOC, see: https://medium.com/towards-artificial-intelligence/understanding-coco-and-pascal-voc-annotations-for-object-detection-bb8ffbbb36e3 

For simplicity, we will use the [Fruits dataset](https://github.com/Tony607/detectron2_instance_segmentation_demo/releases/tag/V0.1) provided by [Chengwei Zhang](https://github.com/Tony607), which has only 18 images.

In [None]:
# download & decompress the data (uncomment for download)
# !wget https://github.com/Tony607/detectron2_instance_segmentation_demo/releases/download/V0.1/data.zip
!unzip -o data.zip 

<br>

After successfully downloading the fruits dataset, we can inspect its folder structure using the Linux command `tree`.

<br>

In [None]:
!ls data
!tree data

<br>

Our dataset is already in COCO dataset format: as the listing above shows, it contains a **`.json`** file, `trainval.json`, that holds all image annotations (**class, bounding box,** and **instance mask**).
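
For orientation, a COCO-style annotation file has three main lists: `images`, `categories`, and `annotations`, where each annotation points to an image and a category by ID. The sketch below builds a tiny hand-written example and summarizes it. All IDs, coordinates, and file names are hypothetical and not copied from `trainval.json`; the category names follow the three classes used later in this notebook.

```python
import json
from collections import Counter

# A tiny, hand-written COCO-style annotation structure (hypothetical values).
coco = {
    "images": [{"id": 1, "file_name": "0.jpg", "width": 640, "height": 480}],
    "categories": [
        {"id": 1, "name": "date"},
        {"id": 2, "name": "fig"},
        {"id": 3, "name": "hazelnut"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1,
         "bbox": [100, 120, 50, 40],  # [x, y, width, height]
         "segmentation": [[100, 120, 150, 120, 150, 160, 100, 160]],
         "area": 2000, "iscrowd": 0},
        {"id": 11, "image_id": 1, "category_id": 3,
         "bbox": [300, 200, 30, 30],
         "segmentation": [[300, 200, 330, 200, 330, 230, 300, 230]],
         "area": 900, "iscrowd": 0},
    ],
}


def summarize(coco_dict):
    """Return the number of categories and the annotation count per class name."""
    id_to_name = {c["id"]: c["name"] for c in coco_dict["categories"]}
    counts = Counter(id_to_name[a["category_id"]] for a in coco_dict["annotations"])
    return len(id_to_name), dict(counts)


# Round-trip through JSON text to mimic reading the file from disk.
num_classes, per_class = summarize(json.loads(json.dumps(coco)))
print(num_classes, per_class)
```

The category count is the number that the training configuration needs for `NUM_CLASSES`.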

So we can simply register the dataset using the `register_coco_instances()` function from detectron2.

**Note:** If your dataset is in `Pascal VOC` format, you can use the `register_pascal_voc()` function from `detectron2.data.datasets.pascal_voc`.

<br>

Then, to register the fruits_nuts dataset with detectron2, we follow the [detectron2 custom dataset tutorial](https://detectron2.readthedocs.io/tutorials/datasets.html).


In [None]:
# Import `register_coco_instances()` function
from detectron2.data.datasets import register_coco_instances

# Register the COCO-format dataset under a name, giving the annotation file and the image directory
register_coco_instances("fruits_nuts", {}, "./data/trainval.json", "./data/images")

<br>

**Remember:** You can't register the same dataset name twice. For example, if you run the cell above again, it will throw **`AssertionError: Dataset 'fruits_nuts' is already registered!`**. <br>
If that happens, you can clear the catalog (note that this removes all registered datasets) with:

```python
from detectron2.data.catalog import DatasetCatalog
DatasetCatalog.clear()
```
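
An alternative to clearing the catalog is to skip registration when the name is already present. The sketch below shows that guard pattern with a stand-in list so it runs without detectron2. The helper `register_once` and the `fake_registry` list are hypothetical names introduced for illustration.

```python
def register_once(name, registered_names, register_fn):
    """Call register_fn() only if name is not already registered.

    Returns True when registration ran, False when it was skipped.
    """
    if name in registered_names:
        return False
    register_fn()
    return True


# Stand-in registry so the sketch runs without detectron2 installed.
fake_registry = []
first = register_once("fruits_nuts", fake_registry,
                      lambda: fake_registry.append("fruits_nuts"))
second = register_once("fruits_nuts", fake_registry,
                       lambda: fake_registry.append("fruits_nuts"))
print(first, second, fake_registry)
```

In the notebook, `registered_names` could plausibly be `DatasetCatalog.list()` and `register_fn` a lambda wrapping the `register_coco_instances(...)` call, so that re-running the cell becomes harmless.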

After the dataset is registered in COCO format, we can fetch its metadata and dataset records from the catalogs.

<br>

In [None]:
# Import `MetadataCatalog` and `DatasetCatalog`
from detectron2.data import MetadataCatalog, DatasetCatalog

# Look up the metadata and load the dataset records
fruits_nuts_metadata = MetadataCatalog.get("fruits_nuts")
dataset_dicts = DatasetCatalog.get("fruits_nuts")

<br>

If this succeeds, you will see a log message like:

```python
Loaded ** images in COCO format from data/folder
```

Now, to verify the data loading is correct, let's visualize the annotations of randomly selected samples in the dataset:

In [None]:
from detectron2.utils.visualizer import Visualizer
import random


for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], 
                            metadata=fruits_nuts_metadata, 
                            scale=0.4)
    vis = visualizer.draw_dataset_dict(d)
    plt.imshow(vis.get_image())  # get_image() already returns RGB, which matplotlib expects
    plt.show()
    

<br>

## Train the Model !

Now, let's fine-tune a COCO-pretrained R-50 FPN Mask R-CNN model on our dataset.
<br>
We will use **`DefaultTrainer`** from the `detectron2.engine` module.
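
Training length in detectron2 is set in iterations (batches), not epochs. As a rough back-of-the-envelope sketch, the settings used in this section (`MAX_ITER = 300`, `IMS_PER_BATCH = 2`, 18 training images) can be converted into approximate passes over the dataset. The helper `approx_epochs` is a hypothetical name, and the result ignores details such as how the data loader samples images.

```python
def approx_epochs(max_iter, ims_per_batch, num_images):
    """Approximate number of passes over the dataset for a given schedule."""
    return max_iter * ims_per_batch / num_images


# Values taken from this section: 18 images, 2 images per batch, 300 iterations
iters_per_epoch = 18 / 2
epochs = approx_epochs(max_iter=300, ims_per_batch=2, num_images=18)
print(f"about {iters_per_epoch:.0f} iterations per pass, {epochs:.1f} passes in total")
```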

In [None]:
# Import DefaultTrainer from engine & config function
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg

# Mask R-CNN Model used
mrcnn_model = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"

# Set configuration
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(mrcnn_model))
cfg.DATASETS.TRAIN = ("fruits_nuts",)
cfg.DATASETS.TEST = ()   # no metrics implemented for this dataset
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(mrcnn_model) # initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.02
cfg.SOLVER.MAX_ITER = 300    # 300 iterations seems good enough, but you can certainly train longer
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this toy dataset
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3  # 3 classes (date, fig, hazelnut)

# Create output folder & Train
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

<br>

## Inference & Evaluation using trained model
Now let's evaluate the trained model by running inference on some images. To do so, we must create a predictor that uses the weights we just trained.

In [None]:
# Load model weights
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, 'model_final.pth')
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set the testing threshold for this model
cfg.DATASETS.TEST = ("fruits_nuts", )

# Create Predictor for inference
predictor = DefaultPredictor(cfg)

<br>

Now we will use `predictor` to run inference and visualize its output.

<br>

In [None]:
from detectron2.utils.visualizer import ColorMode

for d in random.sample(dataset_dicts, 3):    
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)
    v = Visualizer(im[:, :, ::-1],
                   metadata=fruits_nuts_metadata, 
                   scale=0.4,
                   # remove the colors of unsegmented pixels
                   instance_mode=ColorMode.IMAGE_BW   )
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    plt.imshow(v.get_image())  # get_image() already returns RGB, which matplotlib expects
    plt.show()

<br>

We can also evaluate its performance using the AP metric implemented in the COCO API.
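
COCO-style AP decides whether a prediction matches a ground-truth object by their intersection over union (IoU), and averages precision over IoU thresholds from 0.50 to 0.95. The sketch below computes IoU for two axis-aligned boxes as a minimal illustration. It is not the COCO API's implementation, and for segmentation the same ratio is computed on masks rather than boxes. The helper name `box_iou` is hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes have no intersection area
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes that overlap on half their width
iou = box_iou([0, 0, 10, 10], [5, 0, 15, 10])
print(round(iou, 3))
```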

<br>

In [None]:
from detectron2.evaluation import COCOEvaluator
evaluator = COCOEvaluator("fruits_nuts", cfg, False, output_dir="./output/")
trainer.test(cfg, trainer.model, evaluator)

<br>

### Where to go from here
You can experiment with a different model or tune the model parameters.

To get hands-on with another dataset, you can try [the balloon segmentation dataset](https://github.com/matterport/Mask_RCNN/tree/master/samples/balloon),
which has only one class: balloon.

Simply download it with:

```python
# download and decompress the data
!wget https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip
!unzip balloon_dataset.zip
```

Or you can try some projects built on detectron2, for example [PointRend](https://github.com/facebookresearch/detectron2/tree/master/projects/PointRend).

<img src="https://github.com/BenedictusAryo/Mask-RCNN_Detectron2/raw/master/assets/pointrend.png" width="500" align="center">

For another tutorial: https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5

