# Detectron2: Inference on an Image

<img src="https://dl.fbaipublicfiles.com/detectron2/Detectron2-Logo-Horz.png" width="500">

In this lab session, we will use pre-trained [Detectron2](https://github.com/facebookresearch/detectron2) models on:
* An image
* A video
* A custom dataset, by fine-tuning a pre-trained model

Detectron2 is an open-source project by Facebook AI Research (FAIR).

You can use Detectron2 for state-of-the-art object detection, segmentation and person keypoint detection tasks.

The [Model Zoo](https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md) contains numerous pre-trained models.


# Step 1: Installs & Imports

## Install Dependencies

In [None]:
# Install dependencies (quote the version specifier so the shell does not treat ">" as an output redirect)
!pip install pyyaml==5.1 'pycocotools>=2.0.1'
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())
!gcc --version
# opencv is pre-installed on colab

In [None]:
# Install detectron2 (this notebook was written for Colab with CUDA 11.1 + torch 1.10)
# See https://detectron2.readthedocs.io/tutorials/install.html for instructions
import torch
assert torch.__version__.startswith("1.10")
!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.10/index.html # This URL needs to be adapted to your torch/CUDA version
exit(0)  # After installation, Colab needs a runtime restart; this line triggers one
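The wheel index URL above encodes the CUDA and torch versions. Here is a small sketch that builds it from the installed versions, assuming the `cu{version}/torch{major.minor}` layout seen in that URL (and a `cpu` directory for CPU-only builds). `wheel_index_url` is a hypothetical helper. Pre-built wheels were only published for some torch/CUDA combinations, so check the install docs before relying on the result. In the notebook you would call it as `wheel_index_url(torch.__version__, torch.version.cuda)`.

```python
def wheel_index_url(torch_version, cuda_version):
    """Build a Detectron2 pre-built wheel index URL (hypothetical helper).

    torch_version: e.g. torch.__version__, such as "1.10.0+cu111"
    cuda_version: e.g. torch.version.cuda, such as "11.1", or None for CPU-only torch
    """
    # Keep only "major.minor" from the torch version, dropping any "+cu111" suffix
    major_minor = ".".join(torch_version.split("+")[0].split(".")[:2])
    # "11.1" -> "cu111"; CPU-only builds are assumed to live under "cpu"
    cuda_dir = "cu" + cuda_version.replace(".", "") if cuda_version else "cpu"
    return f"https://dl.fbaipublicfiles.com/detectron2/wheels/{cuda_dir}/torch{major_minor}/index.html"


print(wheel_index_url("1.10.0+cu111", "11.1"))
```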

## Import Libraries

In [None]:
# Some basic setup:
# Setup detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import numpy as np
import os, json, cv2, random
from google.colab.patches import cv2_imshow

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog

## Where does the code come from?

Let's look at where this code comes from. Open the Detectron2 repo on GitHub:

*   GitHub: https://github.com/facebookresearch/detectron2

# Step 2: Load an Image

Let's download an image from the COCO dataset. We'll display it in the next step.

*   https://cocodataset.org/#home

In [None]:
# Bike - https://farm1.staticflickr.com/103/300626851_2ef81f255a_z.jpg
# Living Room - https://farm5.staticflickr.com/4017/4445210526_45c53f6dc2_z.jpg
# Tennis Player - https://farm4.staticflickr.com/3334/3593807246_67b87f30b5_z.jpg

!wget https://farm5.staticflickr.com/4017/4445210526_45c53f6dc2_z.jpg -q -O input.jpg

# Step 3: Visualize the Image

In [None]:
# Show an image with OpenCV
im = cv2.imread("./input.jpg")
cv2_imshow(im)

# Step 4: Define the Model

Now create a Detectron2 config for a COCO-pretrained Mask R-CNN model. In the next step, we pass this config to a `DefaultPredictor` to run inference on the image.

In [None]:
cfg = get_cfg()
# add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
# Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
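`SCORE_THRESH_TEST = 0.5` tells the model to drop detections whose confidence score does not exceed 0.5. In Detectron2 this filtering happens inside the model, before non-maximum suppression. The NumPy sketch below only illustrates the effect on a hypothetical set of boxes, scores and class indices. It is not Detectron2's internal code.

```python
import numpy as np


def filter_by_score(boxes, scores, classes, thresh=0.5):
    """Keep only detections whose score is above `thresh`.

    boxes: (N, 4) XYXY array; scores: (N,) array; classes: (N,) array.
    The inputs used below are hypothetical stand-ins for model outputs.
    """
    keep = scores > thresh
    return boxes[keep], scores[keep], classes[keep]


boxes = np.array([[10, 10, 50, 50], [20, 30, 80, 90], [0, 0, 5, 5]], dtype=float)
scores = np.array([0.92, 0.55, 0.31])
classes = np.array([0, 56, 57])

kept_boxes, kept_scores, kept_classes = filter_by_score(boxes, scores, classes, thresh=0.5)
print(kept_scores, kept_classes)
```

Raising the threshold (as Step 13 does with 0.7) trades recall for fewer false positives.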


# Step 5: Inference on the Image

In [None]:
predictor = DefaultPredictor(cfg)
outputs = predictor(im)

In [None]:
# look at the outputs. See https://detectron2.readthedocs.io/tutorials/models.html#model-output-format for specification
print(outputs["instances"].pred_classes)
print(outputs["instances"].pred_boxes)
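`pred_classes` holds integer indices into the dataset's class list, not names. The names live in the dataset metadata (`thing_classes`). The sketch below counts detections per class name. The short class list is a hypothetical stand-in. In the notebook you would pass `MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes` and `outputs["instances"].pred_classes.tolist()` instead.

```python
from collections import Counter


def summarize_detections(pred_classes, class_names):
    """Count detections per class name.

    pred_classes: integer class indices, e.g. outputs["instances"].pred_classes.tolist()
    class_names: index -> name list, e.g. the metadata's thing_classes
    """
    return Counter(class_names[int(c)] for c in pred_classes)


# Hypothetical stand-in values; the real COCO class list has 80 entries in a different order
class_names = ["person", "bicycle", "car", "couch", "tv"]
print(summarize_detections([0, 3, 3, 4], class_names))
```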

# Step 6: Visualize the Output Image

In [None]:
# We can use `Visualizer` to draw the predictions on the image.
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2_imshow(out.get_image()[:, :, ::-1])
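OpenCV loads images in BGR channel order, while `Visualizer` expects RGB. That is why the code reverses the channel axis with `[:, :, ::-1]` on the way in, and again before `cv2_imshow`, which expects BGR. A tiny NumPy check of what that slice does:

```python
import numpy as np

# A 1x2 "image" in BGR order: one pure-blue pixel, then one pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps BGR <-> RGB; the result is a view, not a copy
rgb = bgr[:, :, ::-1]

print(rgb[0, 0], rgb[0, 1])
```

Because the slice is a view, it is cheap. Reversing twice gives back the original layout.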

# Step 7: Inference on a Video

## Display the Video

In [None]:
# This is the video we're going to process
from IPython.display import YouTubeVideo, display
video = YouTubeVideo("ehtsmxu1w10", width=500)
display(video)

## Install Additional Dependencies

In [None]:
# Install dependencies, download the video, and crop the first 6 seconds for processing
!pip install youtube-dl
# !pip uninstall -y opencv-python-headless opencv-contrib-python
# !apt install python3-opencv  # the pre-installed one has some issues
!youtube-dl https://www.youtube.com/watch?v=ehtsmxu1w10 -f 22 -o video.mp4
!ffmpeg -i video.mp4 -t 00:00:06 -c:v copy video-clip.mp4

## Run frame-by-frame Inference

In [None]:
# Run frame-by-frame inference on this video (takes 3-4 minutes) with the demo.py tool from the Detectron2 repo.
!git clone https://github.com/facebookresearch/detectron2
!python detectron2/demo/demo.py --config-file detectron2/configs/COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml --video-input video-clip.mp4 --confidence-threshold 0.6 --output video-output.mkv \
  --opts MODEL.WEIGHTS detectron2://COCO-PanopticSegmentation/panoptic_fpn_R_101_3x/139514519/model_final_cafdb1.pkl
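`demo.py` handles video reading, visualization and writing for us. The core idea of frame-by-frame inference fits in a few lines. This is a simplified sketch, not `demo.py`'s actual implementation. `iter_frames`, `run_frame_by_frame` and `ListCapture` are hypothetical names. `capture` can be any object with an OpenCV-style `read()` method returning `(ok, frame)`, such as `cv2.VideoCapture`, and `predict` can be any callable, such as the `DefaultPredictor` from Step 5.

```python
class ListCapture:
    """Tiny stand-in for cv2.VideoCapture that replays a list of frames (for testing)."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if not self._frames:
            return False, None
        return True, self._frames.pop(0)


def iter_frames(capture):
    """Yield frames until capture.read() reports that no frame is left."""
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        yield frame


def run_frame_by_frame(capture, predict):
    """Call `predict` on every frame and collect the results in order.

    Keeping every output in memory is fine for a short clip; for long videos,
    draw and write each frame as you go instead.
    """
    return [predict(frame) for frame in iter_frames(capture)]


# With a real video (assumes OpenCV and the Step 5 predictor):
# cap = cv2.VideoCapture("video-clip.mp4")
# outputs = run_frame_by_frame(cap, predictor)
# cap.release()
```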

In [None]:
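# Optional: mount Google Drive, e.g. to keep a copy of the output video; later cells do not depend on it
from google.colab import drive
drive.mount('/content/drive')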

## Download the Resultant Video

In [None]:
# Download the results
from google.colab import files
files.download('video-output.mkv')

## Watch the Video

[Code Credit](https://stackoverflow.com/questions/57377185/how-play-mp4-video-in-google-colab)

In [None]:
from IPython.display import HTML
from base64 import b64encode
import os

# Input video path
save_path = 'video-output.mkv'

# Compressed video path
compressed_path = 'video-output-compressed.mp4'

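# -y overwrites an existing output file instead of waiting for an interactive prompt
os.system(f'ffmpeg -y -i {save_path} -vcodec libx264 {compressed_path}')

# Show video
with open(compressed_path, 'rb') as f:
    mp4 = f.read()
data_url = 'data:video/mp4;base64,' + b64encode(mp4).decode()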
HTML('''
<video width=400 controls>
      <source src="%s" type="video/mp4">
</video>
''' % data_url)

# Step 8: Load the Custom Dataset


## Download the Zip

In [None]:
# download, decompress the data
!wget https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip
!unzip balloon_dataset.zip

## Register the Custom Dataset

In [None]:
# if your dataset is in COCO format, this cell can be replaced by the following three lines:
# from detectron2.data.datasets import register_coco_instances
# register_coco_instances("my_dataset_train", {}, "json_annotation_train.json", "path/to/image/dir")
# register_coco_instances("my_dataset_val", {}, "json_annotation_val.json", "path/to/image/dir")

from detectron2.structures import BoxMode

def get_balloon_dicts(img_dir):
    json_file = os.path.join(img_dir, "via_region_data.json")
    with open(json_file) as f:
        imgs_anns = json.load(f)

    dataset_dicts = []
    for idx, v in enumerate(imgs_anns.values()):
        record = {}
        
        filename = os.path.join(img_dir, v["filename"])
        height, width = cv2.imread(filename).shape[:2]
        
        record["file_name"] = filename
        record["image_id"] = idx
        record["height"] = height
        record["width"] = width
      
        annos = v["regions"]
        objs = []
        for _, anno in annos.items():
            assert not anno["region_attributes"]
            anno = anno["shape_attributes"]
            px = anno["all_points_x"]
            py = anno["all_points_y"]
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = [p for x in poly for p in x]

            obj = {
                "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": 0,
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts

for d in ["train", "val"]:
    DatasetCatalog.register("balloon_" + d, lambda d=d: get_balloon_dicts("balloon/" + d))
    MetadataCatalog.get("balloon_" + d).set(thing_classes=["balloon"])
balloon_metadata = MetadataCatalog.get("balloon_train")
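The `lambda d=d:` in the registration loop is deliberate. `DatasetCatalog.register` stores the function and only calls it later. A plain closure looks up `d` when it is called, and by then the loop has finished. Binding `d` as a default argument captures its value when the lambda is defined. Here is a pure-Python illustration with no Detectron2 involved. `make_loaders` is a hypothetical helper that mimics the loop.

```python
def make_loaders(splits, bind_default):
    """Return zero-argument loaders keyed by split, mimicking the registration loop."""
    loaders = {}
    for d in splits:
        if bind_default:
            loaders[d] = lambda d=d: "balloon/" + d  # value of d captured now
        else:
            loaders[d] = lambda: "balloon/" + d      # d looked up only when called
    return loaders


late = make_loaders(["train", "val"], bind_default=False)
early = make_loaders(["train", "val"], bind_default=True)

print(late["train"]())   # prints balloon/val (the last loop value wins)
print(early["train"]())  # prints balloon/train
```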

# Step 9: Visualize the Custom Data

To verify the data loading is correct, let's visualize the annotations of randomly selected samples in the training set:



In [None]:
dataset_dicts = get_balloon_dicts("balloon/train")
for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=balloon_metadata, scale=0.5)
    out = visualizer.draw_dataset_dict(d)
    cv2_imshow(out.get_image()[:, :, ::-1])
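A visual spot check only covers three images. The sketch below checks every record programmatically. It assumes the dict layout produced by `get_balloon_dicts`: XYXY absolute boxes and flat polygons of alternating x, y values. `check_record` is a hypothetical helper, not part of Detectron2.

```python
def check_record(record):
    """Return a list of problems found in one dataset dict (empty if none)."""
    problems = []
    w, h = record["width"], record["height"]
    for i, obj in enumerate(record["annotations"]):
        x0, y0, x1, y1 = obj["bbox"]
        # An XYXY box must be non-empty and lie inside the image
        if not (0 <= x0 < x1 <= w and 0 <= y0 < y1 <= h):
            problems.append(f"annotation {i}: bbox {obj['bbox']} empty or outside {w}x{h} image")
        for poly in obj["segmentation"]:
            # A flat polygon alternates x, y and needs at least 3 points
            if len(poly) % 2 != 0 or len(poly) < 6:
                problems.append(f"annotation {i}: malformed polygon of length {len(poly)}")
    return problems


# Usage with the records loaded above:
# problems = {d["file_name"]: check_record(d) for d in dataset_dicts}
# print({name: p for name, p in problems.items() if p})
```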

# Step 10: Fine-tune the Model

Now, let's fine-tune a COCO-pretrained R50-FPN Mask R-CNN model on the balloon dataset. It takes ~6 minutes to train 300 iterations on Colab's K80 GPU, or ~2 minutes on a P100 GPU.


In [None]:
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("balloon_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
cfg.SOLVER.MAX_ITER = 300    # 300 iterations seems good enough for this toy dataset; you will need to train longer for a practical dataset
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this toy dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (balloon). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
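Detectron2 schedules training in iterations rather than epochs. Each iteration consumes `IMS_PER_BATCH` images, so the number of passes over the data is roughly `MAX_ITER * IMS_PER_BATCH / num_train_images`. Note that `BATCH_SIZE_PER_IMAGE` counts sampled regions of interest per image, not images. Both helpers below are hypothetical. The image count is left as a parameter, and `len(get_balloon_dicts("balloon/train"))` gives it for this dataset.

```python
import math


def approx_epochs(max_iter, ims_per_batch, num_images):
    """Approximate passes over the training set: each iteration sees ims_per_batch images."""
    return max_iter * ims_per_batch / num_images


def iters_for_epochs(epochs, ims_per_batch, num_images):
    """Iterations needed to cover the training set `epochs` times (rounded up)."""
    return math.ceil(epochs * num_images / ims_per_batch)


# With the settings above (MAX_ITER=300, IMS_PER_BATCH=2) and a hypothetical 60 images:
print(approx_epochs(300, 2, 60))
```

For a larger dataset, `iters_for_epochs` gives a starting point for `cfg.SOLVER.MAX_ITER`.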

# Step 11: Train the Model

In [None]:
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.train()

# Step 12: View Results on Tensorboard


In [None]:
# Look at training curves in tensorboard:
%load_ext tensorboard
%tensorboard --logdir output
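If TensorBoard is unavailable, the same scalars can be read from disk. In the Detectron2 versions we have used, `DefaultTrainer` also writes a `metrics.json` file to `cfg.OUTPUT_DIR`, with one JSON object per line. The sketch assumes that lines may contain `iteration` and `total_loss` keys. Check your own file first, since the exact keys depend on the model and the Detectron2 version. `parse_metrics` and `load_metrics` are hypothetical helpers.

```python
import json


def parse_metrics(lines, key="total_loss"):
    """Extract (iterations, values) for `key` from JSON-lines text.

    Blank lines and lines without `key` or "iteration" are skipped.
    """
    iterations, values = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if key in record and "iteration" in record:
            iterations.append(record["iteration"])
            values.append(record[key])
    return iterations, values


def load_metrics(path, key="total_loss"):
    """Read a metrics file written as one JSON object per line."""
    with open(path) as f:
        return parse_metrics(f, key)


# In the notebook (assumes training wrote output/metrics.json):
# iters, losses = load_metrics(os.path.join(cfg.OUTPUT_DIR, "metrics.json"))
```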

# Step 13: Inference on the Custom Dataset
Now, let's run inference with the trained model on the balloon validation dataset. First, let's create a predictor using the model we just trained:



In [None]:
# cfg already contains everything we set previously. Now we modify it slightly for inference:
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # path to the model we just trained
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set a custom testing threshold
predictor = DefaultPredictor(cfg)

# Step 14: Visualize the Output

Then, we randomly select several samples to visualize the prediction results.

In [None]:
from detectron2.utils.visualizer import ColorMode
dataset_dicts = get_balloon_dicts("balloon/val")
for d in random.sample(dataset_dicts, 3):    
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)  # format is documented at https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
    v = Visualizer(im[:, :, ::-1],
                   metadata=balloon_metadata, 
                   scale=0.5, 
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels. This option is only available for segmentation models
    )
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2_imshow(out.get_image()[:, :, ::-1])
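`ColorMode.IMAGE_BW` keeps color only on the predicted instance masks and turns the rest of the image gray. Below is a rough NumPy approximation of that effect, given an RGB image and a boolean mask. The inputs are hypothetical, and this is not the `Visualizer`'s own code. The sketch uses a plain average of the three channels as the gray level.

```python
import numpy as np


def grayscale_outside_mask(img_rgb, mask):
    """Return a copy of img_rgb where pixels outside `mask` are gray.

    img_rgb: (H, W, 3) uint8 array; mask: (H, W) bool array, True = keep color.
    """
    gray = np.rint(img_rgb.astype(np.float32).mean(axis=2)).astype(np.uint8)
    out = img_rgb.copy()
    # Broadcast the (K,) gray values across the 3 channels of the K unmasked pixels
    out[~mask] = gray[~mask][:, None]
    return out


# In the notebook, a combined mask of all predicted instances could be built with
# outputs["instances"].pred_masks.any(dim=0).cpu().numpy()
```

Without masks (a detection-only model), nothing is kept in color. That is why the option only makes sense for segmentation models.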

# Step 15: Metrics

We can also evaluate the model's performance using the AP metric implemented in the COCO API.
This gives an AP of ~70. Not bad!

In [None]:
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader
evaluator = COCOEvaluator("balloon_val", distributed=False, output_dir="./output/")  # passing cfg positionally is deprecated in recent Detectron2 versions
val_loader = build_detection_test_loader(cfg, "balloon_val")
print(inference_on_dataset(trainer.model, val_loader, evaluator))
# another equivalent way to evaluate the model is to use `trainer.test`
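COCO-style AP counts a prediction as correct when its overlap with a ground-truth object, measured as intersection over union (IoU), is high enough. The headline AP averages the result over IoU thresholds from 0.50 to 0.95 in steps of 0.05. Below is a minimal box IoU in plain Python for XYXY boxes like those in `pred_boxes`. For instance segmentation, the COCO evaluator also computes mask IoU, which this sketch does not cover.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x0, y0, x1, y1) in absolute pixels."""
    # Corners of the intersection rectangle
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The IoU thresholds averaged by COCO's primary AP metric
iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

Two 10x10 boxes offset by 5 pixels in each direction overlap in a 5x5 square, giving an IoU of 25/175. That would not count as a match even at the loosest 0.50 threshold.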

# Further Reading

Check out the links below to learn more about the models in Detectron2.

*   This code has been modified from [Detectron2](https://github.com/facebookresearch/detectron2)

*   [Detectron2 Docs - Models](https://detectron2.readthedocs.io/tutorials/models.html)

*  [Detectron2 Docs - Datasets](https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)

*   [Digging into Detectron 2](https://medium.com/@hirotoschwert/digging-into-detectron-2-47b2e794fabd)

*  [Mask R-CNN for Object Detection and Segmentation](https://github.com/matterport/Mask_RCNN)

*  [Semantic Segmentation TorchVision](https://pytorch.org/docs/stable/torchvision/models.html#semantic-segmentation)

*  [Matterport Balloon Color Splash](https://github.com/matterport/Mask_RCNN/tree/master/samples/balloon)

*  [Image Credit in Video](https://medium.com/onepanel/instance-segmentation-with-mask-r-cnn-and-tensorflow-on-onepanel-6a072a4273dd)

*  [Average Precision](https://jonathan-hui.medium.com/map-mean-average-precision-for-object-detection-45c121a31173#:~:text=IoU%20measures%20the%20overlap%20between,positive%20or%20a%20false%20positive)

