# Object Detection with Detectron2 - PyTorch 🔥🔥

![Detectron2](https://miro.medium.com/max/4000/0*VbMjGBHMC6GnDKUp.png)

**In this notebook we'll be checking out the object detection framework "Detectron2", which is built on PyTorch. It allows us to quickly build object detection models.**

*Note: This notebook is a work in progress. I will keep extending it to fully explain and explore all the capabilities of the Detectron2 framework!*



## Table of contents
* 1.[Introduction](#intro)
* 2.[Installing dependencies and libraries](#import)
* 3.[Loading & using existing model](#loadmodel)
    * 3.1.[The base data](#basedata)
    * 3.2.[The model zoo](#modelzoo)
    * 3.3.[Inference with a pretrained model](#pretrainedinference)
* 4.[Train with custom data](#trainmodel)
    * 4.1.[Download the dataset](#downloaddataset)
    * 4.2.[Helper functions](#helperfunction)
    * 4.3.[Training](#traincustom)
    * 4.4.[Model evaluation](#modelevaluation)
* 5.[Other models](#othermodels)
    * 5.1.[Keypoint detection](#keypoint)
    * 5.2.[Panoptic segmentation](#panoptic)
    * 5.3.[Semantic, Densepose, ...](#semantic)
* 6.[Video](#video)
    * 6.1.[Libraries](#videolib)
    * 6.2.[The video](#thevideo)
    * 6.3.[Inferencing](#videoinference)

<a id=intro ></a>
# 1. Introduction
[Detectron2](https://ai.facebook.com/blog/-detectron2-a-pytorch-based-modular-object-detection-library-/) is Facebook AI Research's next-generation software system that implements state-of-the-art object detection algorithms. It is a ground-up rewrite of the previous version, Detectron, and it originates from maskrcnn-benchmark.

This platform is implemented in PyTorch. Thanks to its modular design, it's a very flexible and extensible framework that provides fast training.

It includes implementations of state-of-the-art object detection algorithms such as:
* Box detection
* Mask detection
* KeyPoint detection
* Densepose detection
* Semantic segmentation
* Panoptic segmentation

In this notebook we'll have a look at several of these implementations and show how you can use custom datasets to train your own custom model.

*Future updates will add more high-end object detection algorithms and explain more of the functions within the framework.*

<a id=import ></a>
# 2. Installing dependencies and libraries

We can use both CPU and GPU for training and inference of the models.

Running on CPU:

In [None]:
# !pip install -q -U torch torchvision -f https://download.pytorch.org/whl/torch_stable.html 
# !pip install -q -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
# !pip install -q detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/index.html

Running on GPU:

In [None]:
# Helper commands, used for debugging purposes
# the detectron2 build only succeeds if the CUDA version is correct

#!nvidia-smi
#!nvcc --version

#import torch
#torch.__version__
#import torchvision
#torchvision.__version__

!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.7/index.html
!pip install fastai

Below you'll find the extra libraries we'll use in this notebook. More libraries will be added throughout the notebook when needed.

In [None]:
# Base setup:
# detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# common libraries
import numpy as np
import os, json, cv2, random
import matplotlib.pyplot as plt
%matplotlib inline

# detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.structures import BoxMode

<a id="loadmodel" ></a>
# 3. Loading and using an existing model

In this chapter we'll have a look at a pretrained model and the base data it's trained on. Later on we'll do some inference with it.

<a id="basedata" ></a>
## 3.1. The base data

The model we'll be using is pretrained on the [COCO](https://cocodataset.org/#home) dataset (2017). This dataset contains a large amount of labeled data that people can use to train their object detection models (object, keypoint, panoptic, instance and DensePose detectors).

Let's take a look at a sample:

In [None]:
# path = Path('/kaggle/input/dsta-brainhack-2021/c1_release/c1_release');path.ls()
# imgs, lbl_bbox = get_annotations(path/'train.json')


!wget http://images.cocodataset.org/val2017/000000282037.jpg -q -O input.jpg
im = cv2.imread("./input.jpg")

plt.figure(figsize=(15,7.5))
plt.imshow(im[..., ::-1]) #bgr to rgb

<a id="modelzoo"></a>
## 3.2. The model zoo
Many pretrained models can be found in the "[model zoo](https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md)". This is a collection of models that are pretrained on a given dataset and ready to be used. Most people use the pretrained weights of these models to initialize their own custom model. This significantly shortens training time and usually improves performance. And that's exactly what we'll be doing!

The model we'll be using is Mask R-CNN with a ResNet-50 FPN backbone. Its config can be found [here](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml).

How does it work?
As we can find [here](https://www.researchgate.net/figure/Our-Mask-R-CNN-framework-In-the-first-stage-we-use-Resnet50-Resnet101-and-Resnet_fig1_334011187):
> Region proposal network (RPN) utilizes feature maps at one of the intermediate layers (usually the last convolutional layer) of the CNN feature extractor networks to generate box proposals (300 boxes in our study). The proposed boxes are a grid of anchors tiled in different aspect ratios and scales. The second stage predicts the confidence value, the offsets for the proposed box and the mask within the box for each anchor.

![Mask R-CNN Resnet](https://www.researchgate.net/profile/Hemin_Ali_Qadir/publication/334011187/figure/fig1/AS:774289183735808@1561616335154/Our-Mask-R-CNN-framework-In-the-first-stage-we-use-Resnet50-Resnet101-and-Resnet.ppm)
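
To make the phrase "a grid of anchors tiled in different aspect ratios and scales" more concrete, here is a minimal pure-Python sketch. It is not Detectron2's anchor generator. The function name `make_anchors` and the values for stride, scales and ratios are assumptions chosen only for illustration.

```python
from itertools import product


def make_anchors(feature_size, stride, scales, aspect_ratios):
    """Tile anchors (x1, y1, x2, y2) over a feature map.

    Every feature-map cell gets one anchor per (scale, aspect ratio) pair,
    centered on the cell's position in input-image coordinates.
    """
    rows, cols = feature_size
    anchors = []
    for row, col in product(range(rows), range(cols)):
        # center of this cell, expressed in input-image pixels
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for scale, ratio in product(scales, aspect_ratios):
            # ratio = height / width; the anchor area stays scale ** 2
            w = scale / ratio ** 0.5
            h = scale * ratio ** 0.5
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# Hypothetical setup: a tiny 2x3 feature map with stride 16
anchors = make_anchors(
    feature_size=(2, 3), stride=16, scales=[32, 64], aspect_ratios=[0.5, 1.0, 2.0]
)
print(len(anchors))  # 6 cells * 2 scales * 3 ratios
```

The region proposal network then scores each of these candidate boxes and predicts offsets for them. The second stage refines the best proposals.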

Alright, so now that we know how our model works, let's test it out!

<a id="pretrainedinference"></a>
## 3.3. Inference with a pretrained model

This first coding part uses two important utilities from Detectron2. The first is **cfg**, or rather [configs](https://detectron2.readthedocs.io/en/latest/tutorials/configs.html), which represents the complete configuration of an object detection model. These configurations are stored in YAML files and can easily be retrieved from the model zoo.

After the configuration is complete we'll use the [**DefaultPredictor**](https://detectron2.readthedocs.io/en/latest/tutorials/models.html?highlight=DefaultPredictor#use-a-model) class to make predictions.

In [None]:
cfg = get_cfg()

cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model

cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(im)  # DefaultPredictor expects BGR input (the default cfg.INPUT.FORMAT), as loaded by cv2.imread

The predictions are stored in the `outputs` variable. See [model output format](https://detectron2.readthedocs.io/tutorials/models.html#model-output-format) for all the available fields.
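
The `SCORE_THRESH_TEST` value set above decides which detections end up in the output. As a rough illustration of that filtering step, here is a sketch with plain Python data. The detections below are made up and `filter_by_score` is a hypothetical helper, not a Detectron2 function.

```python
# Hypothetical raw detections as (class name, confidence score) pairs.
raw_detections = [("person", 0.98), ("dog", 0.62), ("chair", 0.31), ("person", 0.50)]


def filter_by_score(detections, threshold):
    """Keep only detections whose score is strictly above the threshold."""
    return [(name, score) for name, score in detections if score > threshold]


kept = filter_by_score(raw_detections, threshold=0.5)
print(kept)
```

Raising the threshold gives fewer but more reliable detections. Lowering it gives more detections at the cost of more false positives.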

In [None]:
print(outputs["instances"].pred_classes)
# print(outputs["instances"].pred_boxes)

In the output above we see an array which shows us the predictions made by the model. But what number stands for which class?

Every dataset is associated with [metadata](https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html#metadata-for-datasets). It is a key-value mapping that contains information about the dataset and can be used to further interpret it. This information can later be used for data augmentation, evaluation, visualization, logging, and so on.
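
To answer the question of which number stands for which class: a predicted class id is simply an index into the list of class names stored in the metadata. Here is a small sketch with stand-in data. The shortened class list and the ids below are assumptions for illustration, not actual model output.

```python
# Hypothetical stand-ins: a shortened class list and some predicted class ids.
thing_classes = ["person", "bicycle", "car", "motorcycle"]
pred_classes = [0, 2, 2, 0]


def ids_to_names(class_ids, class_names):
    """Translate integer class ids into human-readable class names."""
    return [class_names[i] for i in class_ids]


names = ids_to_names(pred_classes, thing_classes)
print(names)
```

With the real objects, the ids would come from `outputs["instances"].pred_classes.tolist()` and the names from the `thing_classes` attribute of the metadata.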

In [None]:
MetadataCatalog.get(cfg.DATASETS.TRAIN[0])

For example you can find which objects it can recognize:

In [None]:
import pandas as pd
modelclasses = MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes
df = pd.DataFrame(modelclasses,columns=['Model classes'])
df

Now that we know each label through the metadata, let's visualize the result of the pretrained model from the model zoo. For this we'll use the [**Visualizer**](https://detectron2.readthedocs.io/en/latest/modules/utils.html#module-detectron2.utils.visualizer) class.

In [None]:
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
plt.figure(figsize=(20,10))
plt.imshow(out.get_image())  # get_image() returns RGB, which is what matplotlib expects

<a id=trainmodel > </a>
# 4. Train on a custom dataset

Alright so it's pretty easy to run an existing model. Let's now train the model with our own data!

<a id=downloaddataset > </a>
## 4.1. Download the dataset

Without data we are nothing. So let's download our dataset!

In [None]:
# !wget https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip
# !unzip balloon_dataset.zip > /dev/null


<a id=helperfunction > </a>
## 4.2. Helper functions

Let's create some helper functions.

The **get_balloon_dicts()** function converts our data to Detectron2's standard dataset format. [**BoxMode**](https://detectron2.readthedocs.io/en/latest/modules/structures.html?highlight=BoxMode#detectron2.structures.BoxMode) specifies how the bounding-box coordinates are encoded (for example `XYXY_ABS` for absolute corner coordinates).

Afterwards we have to register our new dataset with [**DatasetCatalog.register()**](https://detectron2.readthedocs.io/en/latest/modules/data.html?highlight=DatasetCatalog#detectron2.data.DatasetCatalog). Of course, don't forget to add your metadata with [**MetadataCatalog.get()**](https://detectron2.readthedocs.io/en/latest/modules/data.html?highlight=MetadataCatalog#detectron2.data.MetadataCatalog).
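
The two box encodings you will meet most often are corner coordinates (`XYXY`: left, top, right, bottom) and COCO-style boxes (`XYWH`: left, top, width, height). Here is a minimal sketch of the conversion in plain Python. The two helper functions are hypothetical and only illustrate the arithmetic.

```python
def xyxy_to_xywh(box):
    """Convert [x1, y1, x2, y2] corner coordinates to [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def xywh_to_xyxy(box):
    """Convert [x, y, width, height] back to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


corner_box = [10, 20, 50, 80]
coco_box = xyxy_to_xywh(corner_box)
print(coco_box)
print(xywh_to_xyxy(coco_box))
```

In practice, `BoxMode.convert(box, from_mode, to_mode)` from Detectron2 performs this kind of conversion for you.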

In [None]:
training_path = "/kaggle/input/dsta-brainhack-2021/c1_release/c1_release"
train_annotation = os.path.join(training_path, "train.json")
val_annotation = os.path.join(training_path, "val.json")
image_path = os.path.join(training_path,"images")



from detectron2.structures import BoxMode
# if your dataset is in COCO format, this cell can be replaced by the following three lines:
from detectron2.data.datasets import register_coco_instances
register_coco_instances("train_data", {}, train_annotation, image_path)
register_coco_instances("val_data", {}, val_annotation, image_path)



# def get_balloon_dicts(img_dir):
#     json_file = os.path.join(img_dir, "via_region_data.json")
#     with open(json_file) as f:
#         imgs_anns = json.load(f)

#     dataset_dicts = []
#     for idx, v in enumerate(imgs_anns.values()):
#         record = {}
        
#         filename = os.path.join(img_dir, v["filename"])
#         height, width = cv2.imread(filename).shape[:2]
        
#         record["file_name"] = filename
#         record["image_id"] = idx
#         record["height"] = height
#         record["width"] = width
      
#         annos = v["regions"]
#         objs = []
#         for _, anno in annos.items():
#             assert not anno["region_attributes"]
#             anno = anno["shape_attributes"]
#             px = anno["all_points_x"]
#             py = anno["all_points_y"]
#             poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
#             poly = [p for x in poly for p in x]

#             obj = {
#                 "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
#                 "bbox_mode": BoxMode.XYXY_ABS,
#                 "segmentation": [poly],
#                 "category_id": 0,
#             }
#             objs.append(obj)
#         record["annotations"] = objs
#         dataset_dicts.append(record)
#     return dataset_dicts

# for d in ["train", "val"]:
#     DatasetCatalog.register("balloon_" + d, lambda d=d: get_balloon_dicts("balloon/" + d))
#     MetadataCatalog.get("balloon_" + d).set(thing_classes=["balloon"])
# balloon_metadata = MetadataCatalog.get("balloon_train")

<a id=traincustom> </a>
## 4.3. Training with a custom dataset
Let's first check our training data! Of course we'll use the **Visualizer** class again.

In [None]:
# dataset_dicts = get_balloon_dicts("balloon/train")
# for d in random.sample(dataset_dicts, 3):
#     img = cv2.imread(d["file_name"])
#     visualizer = Visualizer(img[:, :, ::-1], metadata=balloon_metadata, scale=0.5)
#     out = visualizer.draw_dataset_dict(d)
#     plt.figure(figsize=(15,7))
#     plt.imshow(out.get_image()[:, :, ::-1][..., ::-1])


#visualize training data
my_dataset_train_metadata = MetadataCatalog.get("train_data")
dataset_dicts = DatasetCatalog.get("train_data")

my_dataset_val_metadata = MetadataCatalog.get("val_data")
val_dicts = DatasetCatalog.get("val_data")

import random
from detectron2.utils.visualizer import Visualizer
import cv2
import matplotlib.pyplot as plt

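for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])  # OpenCV loads images as BGR
    visualizer = Visualizer(img[:, :, ::-1], metadata=my_dataset_train_metadata, scale=0.5)
    vis = visualizer.draw_dataset_dict(d)
    plt.figure(figsize=(15, 7))  # one figure per sample
    plt.imshow(vis.get_image())  # get_image() returns RGB, which is what matplotlib expects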

In [None]:
# # DATA AUG

# from detectron2.data import transforms as T
# # Define a sequence of augmentations:
# augs = T.AugmentationList([
#     T.RandomBrightness(0.9, 1.1),
#     T.RandomFlip(prob=0.5),
#     T.RandomCrop("absolute", (640, 640))
# ])  # type: T.Augmentation

# # Define the augmentation input ("image" required, others optional):
# input = T.AugInput(image, boxes=boxes, sem_seg=sem_seg)

# # Apply the augmentation:
# transform = augs(input)  # type: T.Transform
# image_transformed = input.image  # new image
# sem_seg_transformed = input.sem_seg  # new semantic segmentation

# # For any extra data that needs to be augmented together, use transform, e.g.:
# image2_transformed = transform.apply_image(image2)
# polygons_transformed = transform.apply_polygons(polygons)

Since our new data looks good, let's now train our model!

As initial weights we'll use the pretrained weights of a model from the model zoo. After setting up the config we'll use the [**DefaultTrainer()**](https://detectron2.readthedocs.io/en/latest/modules/engine.html#detectron2.engine.defaults.DefaultTrainer) class to train our model!

More information about updating the config-file can be found [here](https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets).
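
The solver settings used below (`BASE_LR`, `WARMUP_ITERS`, `STEPS`, `GAMMA`) together define a learning-rate schedule: a linear warmup followed by step decay. The sketch below is a simplified approximation of such a schedule, not Detectron2's own scheduler. The `warmup_factor` of 0.001 and the single decay step at iteration 20000 are assumptions made for illustration.

```python
from bisect import bisect_right


def lr_at(iteration, base_lr, warmup_iters, warmup_factor, steps, gamma):
    """Learning rate at a given iteration: linear warmup, then step decay."""
    if iteration < warmup_iters:
        # ramp linearly from warmup_factor * base_lr up to base_lr
        alpha = iteration / warmup_iters
        scale = warmup_factor * (1 - alpha) + alpha
    else:
        scale = 1.0
    # multiply by gamma once for every decay step that has been reached
    decay = gamma ** bisect_right(steps, iteration)
    return base_lr * scale * decay


# base_lr, warmup_iters and gamma mirror the config in this notebook;
# warmup_factor and steps are assumed values.
for it in (0, 500, 1000, 19999, 20000):
    lr = lr_at(it, base_lr=0.001, warmup_iters=1000,
               warmup_factor=0.001, steps=(20000,), gamma=0.05)
    print(it, lr)
```

Note that a decay step only has an effect if it lies below `MAX_ITER`.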

In [None]:
# # Run training

from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator


class CocoTrainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        if output_folder is None:
            os.makedirs("coco_eval", exist_ok=True)
            output_folder = "coco_eval"
        return COCOEvaluator(dataset_name, cfg, False, output_folder)
    


In [None]:
from detectron2.config.config import CfgNode as CN

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("train_data",)
cfg.DATASETS.TEST = ("val_data",)
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.001
cfg.SOLVER.WARMUP_ITERS = 1000
cfg.SOLVER.MAX_ITER = 30000 #adjust up if val mAP is still rising, adjust down if overfit
# cfg.SOLVER.STEPS = [0,20000,40000]
cfg.SOLVER.GAMMA = 0.05
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 64
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5
# cfg.TEST.EVAL_PERIOD = 1000


os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = CocoTrainer(cfg) 
# trainer.resume_or_load(resume=False)
trainer.resume_or_load(resume=False)

trainer.train()

In [None]:
# # Checkpoints
# from detectron2.checkpoint import DetectionCheckpointer

# checkpointer = DetectionCheckpointer(trainer.model, save_dir=cfg.OUTPUT_DIR)
# cfg.MODEL.WEIGHTS = os.path.join("/kaggle/input/objectron-retinanetv1/model_0034999.pth")  # path to the model we trained
# trainer.resume_or_load(resume=True)
# trainer.train()

# cfg = get_cfg()
# cfg.merge_from_file(model_zoo.get_config_file('COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml'))
# cfg.MODEL.WEIGHTS = os.path.join("/kaggle/input/objectron-retinanetv1/model_0034999.pth")  # path to the model we trained

In [None]:
# cfg = get_cfg()
# cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/retinanet_R_50_FPN_1x.yaml"))
# cfg.DATASETS.TRAIN = ("train_data",)
# cfg.DATASETS.TEST = ("val_data",)
# cfg.DATALOADER.NUM_WORKERS = 4
# cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/retinanet_R_50_FPN_1x.yaml")  # Let training initialize from model zoo
# cfg.SOLVER.IMS_PER_BATCH = 4
# cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
# cfg.SOLVER.MAX_ITER = 300    # 300 iterations enough for this dataset; Train longer for a practical dataset
# cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, enough for this dataset (default: 512)
# # cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5  # classes for RCNN
# cfg.MODEL.RETINANET.NUM_CLASSES = 5 # Classes for Retina
# cfg.TEST.EVAL_PERIOD = 500


# os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
# trainer = DefaultTrainer(cfg) 
# trainer.resume_or_load(resume=False)
# trainer.train()

Is TensorBoard usable on Kaggle?

In [None]:
# Look at training curves in TensorBoard.
# Detectron2 writes its event files to cfg.OUTPUT_DIR ("./output" by default).
%load_ext tensorboard
%tensorboard --logdir output --bind_all

*Note: I suppose TensorBoard doesn't work on Kaggle?*

<a id="modelevaluation" ></a>
## 4.4. Model evaluation
Let's check out the performance of our model!

First of all, let's make some predictions! We're going to use the [**DefaultPredictor**](https://detectron2.readthedocs.io/en/latest/modules/engine.html?highlight=DefaultPredictor#detectron2.engine.defaults.DefaultPredictor) class. Of course we'll use the same cfg that we used during training. We'll change two parameters for inference.

In [None]:
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # path to the model we trained

In [None]:
# model saved weights
# cfg.MODEL.WEIGHTS = os.path.join("/kaggle/input/objectron-retinanetv1/model_final.pth")

cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set a testing threshold
predictor = DefaultPredictor(cfg)

Notice that by using [**ColorMode.IMAGE_BW**](https://detectron2.readthedocs.io/en/latest/modules/utils.html?highlight=ColorMode#module-detectron2.utils.visualizer) we can remove the colors from pixels that don't belong to a detected object!

In [None]:
from detectron2.utils.visualizer import ColorMode
val_dict = DatasetCatalog.get("val_data")

for d in random.sample(val_dict, 3):    
    im = cv2.imread(d["file_name"])
    outputs = predictor(im) 
    v = Visualizer(im[:, :, ::-1],
                   metadata=my_dataset_train_metadata, 
                   scale=0.5, 
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels. Only available for segmentation models
    )
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    plt.figure(figsize=(15,7))
    plt.imshow(out.get_image())  # get_image() returns RGB, which is what matplotlib expects

Above we can see that our model performs pretty well! Let's now evaluate our custom model with [Evaluators](https://detectron2.readthedocs.io/en/latest/modules/evaluation.html). Two evaluators can be used:
* [**COCOEvaluator**](https://detectron2.readthedocs.io/en/latest/modules/evaluation.html#detectron2.evaluation.COCOEvaluator) can evaluate AP (Average Precision) for box detection, instance segmentation and keypoint detection.
* [**SemSegEvaluator**](https://detectron2.readthedocs.io/en/latest/modules/evaluation.html#detectron2.evaluation.SemSegEvaluator) can evaluate semantic segmentation metrics.

Afterwards we'll use the [**build_detection_test_loader**](https://detectron2.readthedocs.io/en/latest/modules/data.html?highlight=build_detection_test_loader#detectron2.data.build_detection_test_loader) which returns a torch DataLoader, that loads the given detection dataset.

Finally, we'll pass the model, evaluator and data loader to the [inference_on_dataset](https://detectron2.readthedocs.io/en/latest/modules/evaluation.html#detectron2.evaluation.inference_on_dataset) function. It runs the model on the data loader and computes the metrics with the evaluator.
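
COCO-style AP is built on the Intersection over Union (IoU) between a predicted box and a ground-truth box. A detection only counts as a match when its IoU exceeds a threshold, and COCO averages over thresholds from 0.50 to 0.95. Here is a minimal sketch of the IoU computation for two boxes in `[x1, y1, x2, y2]` format. `box_iou` is a hypothetical helper, not the evaluator's implementation.

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    # corners of the overlapping rectangle (if there is any overlap)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


prediction = [0, 0, 10, 10]
ground_truth = [5, 0, 15, 10]
print(box_iou(prediction, ground_truth))  # half of each box overlaps
```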

In [None]:
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader
from detectron2.modeling import build_model


evaluator = COCOEvaluator("val_data", None, False, output_dir="./output/")
# evaluator = COCOEvaluator("val_data", ("bbox", "segm"), False, output_dir="./output/")

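# Build the model and load the trained weights.
# build_model() alone only creates the architecture with random parameters.
from detectron2.checkpoint import DetectionCheckpointer
model_uploaded = build_model(cfg)
DetectionCheckpointer(model_uploaded).load(cfg.MODEL.WEIGHTS)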

val_loader = build_detection_test_loader(cfg, "val_data")
# print(inference_on_dataset(trainer.model, val_loader, evaluator))
print(inference_on_dataset(model_uploaded, val_loader, evaluator))

## Test

In [None]:
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
# cfg.MODEL.WEIGHTS = os.path.join("/kaggle/input/objectron-retinanetv1/model_final.pth")


test_img_path = "/kaggle/input/dsta-brainhack-2021/c1_test_release/c1_test_release/images" # extracted testing images path
cfg.DATASETS.TEST = ("my_dataset_test", )
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set the testing threshold for this model
predictor = DefaultPredictor(cfg)
test_metadata = MetadataCatalog.get("train_data")  # reuse the training metadata so class names are shown ("my_dataset_test" is never registered)

from detectron2.utils.visualizer import ColorMode
import glob

ou_test = []
for imageName in glob.glob('/kaggle/input/dsta-brainhack-2021/c1_test_release/c1_test_release/images/*jpg'):
  im = cv2.imread(imageName)
  outputs = predictor(im)
  ou_test.append(outputs)
  v = Visualizer(im[:, :, ::-1],
                metadata=test_metadata, 
                scale=0.8
                 )
  out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
  plt.imshow(out.get_image())  # get_image() returns RGB, which is what matplotlib expects

In [None]:
ou_test[0]['instances']

In [None]:
# generate detections on the folder of test images (this will be used for submission)
from PIL import Image, ImageDraw
from torchvision import transforms
from torchvision.ops import batched_nms
from torchvision.transforms import functional as F
import torch

detections = []

for imageName in glob.glob('/kaggle/input/dsta-brainhack-2021/c1_test_release/c1_test_release/images/*jpg'):
        

        im = cv2.imread(imageName)
        outputs = predictor(im)
        classes = outputs["instances"].pred_classes.tolist()
        box_round = outputs["instances"].pred_boxes.tensor.tolist()
        score_output = outputs["instances"].scores.tolist()
        head, tail = os.path.split(imageName)
        img_id = int(tail.split('.')[0])

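        # pred_classes are contiguous ids (0..N-1). If the category ids in the
        # annotation file differ (e.g. start at 1), map them back with
        # MetadataCatalog.get("train_data").thing_dataset_id_to_contiguous_id.
        for i in range(len(box_round)):
            x1, y1, x2, y2 = box_round[i]
            label = int(classes[i])
            score = float(score_output[i])

            # COCO results use [left, top, width, height]
            left = int(x1)
            top = int(y1)
            width = int(x2 - x1)
            height = int(y2 - y1)

            detections.append({'image_id': img_id, 'category_id': label, 'bbox': [left, top, width, height], 'score': score})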

test_pred_json = os.path.join("/kaggle/working", "test_preds.json")
with open(test_pred_json, 'w') as f:
    json.dump(detections, f)

In [None]:
# Check 
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

sample_json_path = os.path.join("/kaggle/input/dsta-brainhack-2021/c1_test_release/c1_test_release", "c1_test_sample.json")

coco_gt = COCO(sample_json_path)
coco_dt = coco_gt.loadRes(test_pred_json)
cocoEval = COCOeval(cocoGt=coco_gt, cocoDt=coco_dt, iouType='bbox')
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()

<a id="othermodels" ></a>
# 5. Other models

It's possible to use other high-end object detection models as well. Let's check it out!

<a id="keypoint" ></a>
## 5.1. Keypoint detection

Reload the data.

In [None]:
# !wget http://images.cocodataset.org/val2017/000000282037.jpg -q -O input.jpg
# im = cv2.imread("./input.jpg")

In [None]:
# cfg = get_cfg()   # fresh config
# cfg.merge_from_file(model_zoo.get_config_file("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"))
# cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # set threshold for this model
# cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml")
# predictor = DefaultPredictor(cfg)
# outputs = predictor(im)
# v = Visualizer(im[:,:,::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
# out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
# plt.figure(figsize=(15,7))
# plt.imshow(out.get_image()[:, :, ::-1][..., ::-1])

<a id="panoptic" ></a>
## 5.2. Panoptic segmentation

In [None]:
# !wget http://images.cocodataset.org/val2017/000000282037.jpg -q -O input.jpg
# im = cv2.imread("./input.jpg")

In [None]:
# cfg = get_cfg()
# cfg.merge_from_file(model_zoo.get_config_file("COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml"))
# cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml")
# predictor = DefaultPredictor(cfg)
# panoptic_seg, segments_info = predictor(im)["panoptic_seg"]
# v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
# out = v.draw_panoptic_seg_predictions(panoptic_seg.to("cpu"), segments_info)
# plt.figure(figsize=(25,15))
# plt.imshow(out.get_image()[:, :, ::-1][..., ::-1])

<a id="semantic" ></a>
## 5.3. Semantic, Densepose, ...

Will be added in a future version! Stay tuned!

<a id="video" ></a>
# 6. Video

So up until now we've been working with images only. Can we quickly use the models for videos? The answer is YES!

<a id="videolib" ></a>
## 6.1. Libraries
As you can see, we actually don't need many other libraries. Let's import a library to handle the video.

In [None]:
# from IPython.display import YouTubeVideo, display, Video # for viewing the video
# !pip install youtube-dl # for downloading the video

<a id="thevideo" ></a>
## 6.2. The video

In [None]:
# #video = YouTubeVideo("ll8TgCZ0plk", width=500)#7HaJArMDKgI
# video = YouTubeVideo("7HaJArMDKgI", width=750, height= 450)#
# display(video)

Downloading the video and cutting out the first 10 seconds for processing:


In [None]:
# !youtube-dl https://www.youtube.com/watch?v=7HaJArMDKgI -f 22 -o video.mp4
# !ffmpeg -i video.mp4 -t 00:00:10 -c:v copy video-clip.mp4 

<a id="videoinference" ></a>
## 6.3. Inference on the video
Let's now run a panoptic model over the video above.

*Note: For now I'll be using some [demo](https://github.com/facebookresearch/detectron2/tree/master/demo) files; I'll add the code implementations to this notebook later.*
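
Conceptually, video inference is just image inference applied frame by frame. The sketch below shows that loop with stand-in data. `run_on_frames` is a hypothetical helper, and the fake frames and fake predictor exist only so the example runs without a video file or a model.

```python
def run_on_frames(frames, predict, every_nth=1):
    """Apply `predict` to every n-th frame and collect (frame index, result) pairs."""
    results = []
    for index, frame in enumerate(frames):
        if index % every_nth != 0:
            continue  # skip frames to save compute
        results.append((index, predict(frame)))
    return results


# Stand-in data: three fake "frames" and a fake predictor that reports the frame length.
fake_frames = [[0] * 4, [0] * 4, [0] * 4]
results = run_on_frames(fake_frames, predict=len, every_nth=2)
print(results)
```

In a real pipeline the frames could be read with `cv2.VideoCapture`, and `predict` could be a `DefaultPredictor` instance like the ones used earlier in this notebook.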

In [None]:
# !git clone https://github.com/facebookresearch/detectron2
# !python detectron2/demo/demo.py --config-file detectron2/configs/COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml --video-input video-clip.mp4 --confidence-threshold 0.6 --output 1video-output.mkv \
#   --opts MODEL.WEIGHTS detectron2://COCO-PanopticSegmentation/panoptic_fpn_R_101_3x/139514519/model_final_cafdb1.pkl

Let's check the result! 

*I ran into some trouble with video encoding in OpenCV and ffmpeg (to be fixed in a future version of this notebook).*

In [None]:
# !git clone https://github.com/vandeveldemaarten/tempdetector2video.git
# Video("./tempdetector2video/myvideo.mkv")

# That's all for now!

Thank you for reading this notebook! If you enjoyed it, please upvote!

*More coming soon!*