# Ensemble of Object Detection Models based on Detectron2

## Prepare Development Environment
1. Environments
    * GPU: RTX3070 8GB
    * OS: Ubuntu20.04
    * CUDA: 11.3
    * PyTorch==1.10
    * Detectron2==0.6

2. Installation
    ```bash
    sudo apt-get install -y python3-dev python3-venv
    python3 -m venv env
    source env/bin/activate
    python -m pip install pip -U
    python -m pip install -r requirements.txt
    python -m ipykernel install --user --name env --display-name ensemble_detectron2
    python -m pip install "git+https://github.com/facebookresearch/detectron2@v0.6"
    ```

## Download Dataset

In [7]:
!wget -c http://images.cocodataset.org/zips/val2017.zip
!unzip val2017.zip
!rm val2017.zip
!wget -c http://images.cocodataset.org/annotations/annotations_trainval2017.zip
!unzip annotations_trainval2017.zip
!rm annotations_trainval2017.zip
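
Before moving on, it can help to confirm that the archives unpacked where the later cells expect them. The following is a minimal sketch; the helper name `missing_dataset_paths` is hypothetical, and the two paths are the ones passed to `register_coco_instances` further down.

```python
from pathlib import Path

# Paths that the later cells read (image folder and annotation file).
EXPECTED_PATHS = ("val2017", "annotations/instances_val2017.json")


def missing_dataset_paths(root="."):
    """Return the expected dataset paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


missing = missing_dataset_paths(".")
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("Dataset layout looks complete.")
```

An empty list means both the image folder and the annotation file are in place.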

## Import Modules

In [2]:
import torch
from tqdm import tqdm

from detectron2.data.datasets import register_coco_instances
from detectron2.config import get_cfg
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.evaluation.coco_evaluation import COCOEvaluator
from detectron2.data.build import build_detection_test_loader
from detectron2.evaluation.evaluator import inference_on_dataset
from detectron2.evaluation.evaluator import inference_context
from detectron2.structures import Instances, Boxes

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

import ensemble_boxes

## Prepare Models, Data and Evaluation

In [3]:
register_coco_instances("dataset_val", {}, "./annotations/instances_val2017.json", "./val2017")

In [4]:
model_configs = [
    "faster_rcnn_R_50_C4_1x.yaml",
    "faster_rcnn_R_50_C4_3x.yaml",
    "faster_rcnn_R_50_DC5_1x.yaml",
    "faster_rcnn_R_50_FPN_1x.yaml",
    # "faster_rcnn_R_50_FPN_3x.yaml",
    # "faster_rcnn_R_50_DC5_3x.yaml",
    # "faster_rcnn_R_101_DC5_3x.yaml",
    # "faster_rcnn_R_101_C4_3x.yaml",
    "retinanet_R_50_FPN_1x.yaml",
    "retinanet_R_50_FPN_3x.yaml",
    # "retinanet_R_101_FPN_3x.yaml",
]

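# Build one model per config, loading COCO-pretrained weights from the model zoo.
models = dict()
for config in model_configs:
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(f"COCO-Detection/{config}"))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(f"COCO-Detection/{config}")
    # Detectron2 has no DATASETS.VAL key; evaluation datasets go in DATASETS.TEST.
    cfg.DATASETS.TEST = ("dataset_val",)

    # DefaultPredictor builds the model and loads the weights; only the model is kept.
    models[config] = DefaultPredictor(cfg).model

# The test loader is built from a default config and the registered dataset name.
cfg = get_cfg()
cfg.DATASETS.TEST = ("dataset_val",)
val_loader = build_detection_test_loader(cfg, "dataset_val")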

Loading config /home/kyungpyo/git/Ensemble-Object-Detection-using-Detectron2/env/lib/python3.8/site-packages/detectron2/model_zoo/configs/COCO-Detection/../Base-RetinaNet.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
The checkpoint state_dict contains keys that are not used by the model:
  pixel_mean
  pixel_std

Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you.



## Evaluate Baseline Models

In [38]:
for config, model in models.items():
    # distributed=False: evaluate in this single process; results go to results/<config name>/.
    evaluator = COCOEvaluator("dataset_val", distributed=False, output_dir=f"results/{config.split('.')[0]}")
    evaluator.reset()
    with inference_context(model), torch.no_grad():
        for inputs in tqdm(val_loader, total=len(val_loader)):
            outputs = model(inputs)
            torch.cuda.synchronize()  # wait for pending GPU work before processing outputs
            evaluator.process(inputs, outputs)

    print("\n================================================================\n")
    print(config)
    print("\n================================================================\n")
    results = evaluator.evaluate()
    print("\n================================================================\n")

100%|██████████| 5000/5000 [14:25<00:00,  5.78it/s]




faster_rcnn_R_101_C4_3x.yaml


Loading and preparing results...
DONE (t=0.14s)
creating index...
index created!
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.411
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.614
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.441
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.222
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.455
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.559
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.340
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.527
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.551
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.335
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.599

## Box mAP of Baseline Models

| Baselines                   | Box AP @(IoU=0.50:0.95, area=all, maxDets=100) |
|-----------------------------|------|
|faster_rcnn_R_50_C4_1x.yaml  | 0.357|
|faster_rcnn_R_50_C4_3x.yaml  | 0.384|
|faster_rcnn_R_50_DC5_1x.yaml | 0.373|
|faster_rcnn_R_50_FPN_1x.yaml | 0.379|
|retinanet_R_50_FPN_1x.yaml   | 0.374|
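
A table like the one above can be assembled from the values returned by `evaluator.evaluate()`. The sketch below assumes each result is a dict shaped like `{"bbox": {"AP": 35.7, ...}}` with AP on a 0-100 scale; the helper name `format_ap_table` and the `example` dict are hypothetical, and the two sample values are taken from the table above.

```python
def format_ap_table(results_by_model):
    """Build a Markdown table of box AP (0-1 scale) from per-model results.

    `results_by_model` maps a config name to a dict shaped like
    {"bbox": {"AP": 35.7, ...}}, with AP on a 0-100 scale (an assumption).
    """
    header = "| Baselines | Box AP @(IoU=0.50:0.95, area=all, maxDets=100) |"
    lines = [header, "|---|---|"]
    for name, results in results_by_model.items():
        ap = results["bbox"]["AP"] / 100.0  # convert percent to the 0-1 scale
        lines.append(f"|{name} | {ap:.3f}|")
    return "\n".join(lines)


# Sample input using two values from the baseline table.
example = {
    "faster_rcnn_R_50_C4_1x.yaml": {"bbox": {"AP": 35.7}},
    "faster_rcnn_R_50_C4_3x.yaml": {"bbox": {"AP": 38.4}},
}
print(format_ap_table(example))
```

Collecting `results` into a dict inside the evaluation loop, keyed by `config`, would provide the input for this helper.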

## Ensemble using Weighted Boxes Fusion Method
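
Weighted Boxes Fusion merges overlapping boxes from several models into a single box whose coordinates are the confidence-weighted average of the cluster, instead of discarding all but one box as NMS does. The following is a simplified single-class sketch of that idea, not the `ensemble_boxes` implementation: the names `iou`, `fuse`, and `toy_wbf` are hypothetical, and class labels and per-model weights are left out.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(cluster):
    """Confidence-weighted average of the boxes in one cluster."""
    total = sum(score for _, score in cluster)
    return [sum(box[i] * score for box, score in cluster) / total for i in range(4)]


def toy_wbf(boxes_list, scores_list, iou_thr=0.7, skip_box_thr=0.0001):
    """Fuse single-class boxes from several models (simplified sketch)."""
    num_models = len(boxes_list)
    # Pool all boxes above the score threshold, highest confidence first.
    pooled = [
        (box, score)
        for boxes, scores in zip(boxes_list, scores_list)
        for box, score in zip(boxes, scores)
        if score >= skip_box_thr
    ]
    pooled.sort(key=lambda item: item[1], reverse=True)

    clusters = []  # each cluster is a list of (box, score) pairs
    fused = []     # fused[i] is the current fused box of clusters[i]
    for box, score in pooled:
        # Find the fused box with the highest IoU above the threshold.
        best, best_iou = None, iou_thr
        for i, fused_box in enumerate(fused):
            overlap = iou(box, fused_box)
            if overlap > best_iou:
                best, best_iou = i, overlap
        if best is None:
            clusters.append([(box, score)])
            fused.append(list(box))
        else:
            clusters[best].append((box, score))
            fused[best] = fuse(clusters[best])

    results = []
    for cluster, fused_box in zip(clusters, fused):
        mean_score = sum(score for _, score in cluster) / len(cluster)
        # Down-weight boxes that only some of the models predicted.
        mean_score *= min(len(cluster), num_models) / num_models
        results.append((fused_box, mean_score))
    return results


# Two models agree on one object; only the second model predicts another.
model_a = [[0.10, 0.10, 0.50, 0.50]]
model_b = [[0.12, 0.10, 0.52, 0.50], [0.70, 0.70, 0.90, 0.90]]
for box, score in toy_wbf([model_a, model_b], [[0.9], [0.6, 0.8]]):
    print([round(c, 3) for c in box], round(score, 3))
```

In this toy run the two overlapping boxes are merged into one, while the box predicted by a single model is kept with a reduced score.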

### Load Detection Results and Ground Truth

In [5]:
gt_path = "./annotations/instances_val2017.json"
coco_gt = COCO(gt_path)

dt_paths = [
    "./results/faster_rcnn_R_50_C4_1x/coco_instances_results.json",
    "./results/faster_rcnn_R_50_C4_3x/coco_instances_results.json",
    "./results/faster_rcnn_R_50_DC5_1x/coco_instances_results.json",
    # "./results/faster_rcnn_R_50_DC5_3x/coco_instances_results.json",
    "./results/faster_rcnn_R_50_FPN_1x/coco_instances_results.json",
    # "./results/faster_rcnn_R_50_FPN_3x/coco_instances_results.json",
    # "./results/faster_rcnn_R_101_DC5_3x/coco_instances_results.json",
    "./results/retinanet_R_50_FPN_1x/coco_instances_results.json",
    "./results/retinanet_R_50_FPN_3x/coco_instances_results.json",
    # "./results/retinanet_R_101_FPN_3x/coco_instances_results.json",
    # "./results/faster_rcnn_R_101_C4_3x/coco_instances_results.json",
]

coco_dts = [coco_gt.loadRes(dt_path) for dt_path in dt_paths]
img_ids = coco_gt.getImgIds()

loading annotations into memory...
Done (t=0.42s)
creating index...
index created!
Loading and preparing results...
DONE (t=1.08s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.93s)
creating index...
index created!
Loading and preparing results...
DONE (t=1.42s)
creating index...
index created!
Loading and preparing results...
DONE (t=1.05s)
creating index...
index created!
Loading and preparing results...
DONE (t=2.87s)
creating index...
index created!
Loading and preparing results...
DONE (t=2.32s)
creating index...
index created!
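
The result paths follow the `output_dir` pattern used in the evaluation loop (`results/<config name without extension>`), so the list can also be derived from the config names instead of being written out by hand. A minimal sketch, with the hypothetical helper `result_path` and a shortened sample `model_configs` list:

```python
def result_path(config, results_dir="./results"):
    """Map a config file name to the detection results JSON written for it."""
    stem = config.split(".")[0]  # same rule as the evaluator's output_dir
    return f"{results_dir}/{stem}/coco_instances_results.json"


# Sample subset of the configs used above.
model_configs = [
    "faster_rcnn_R_50_C4_1x.yaml",
    "retinanet_R_50_FPN_1x.yaml",
]
dt_paths = [result_path(config) for config in model_configs]
print("\n".join(dt_paths))
```

This keeps the ensemble members in sync with whichever configs were actually evaluated.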


In [7]:
# parameters
iou_thr = 0.7
skip_box_thr = 0.0001

print("\n================================================================\n")
print("Ensemble using Weighted Boxes Fusion")
print(f"iou thr: {iou_thr}, skip_box_thr: {skip_box_thr}")
print("\n================================================================\n")

ensemble = []
cnt_id = 0
for img_id in tqdm(img_ids, total=len(img_ids)):
    # The image size is needed to normalize box coordinates to [0, 1] for WBF.
    img_info = coco_gt.loadImgs(img_id)[0]
    height = float(img_info["height"])
    width = float(img_info["width"])

    boxes_list = []
    scores_list = []
    labels_list = []

    for coco_dt in coco_dts:
        boxes = []
        scores = []
        labels = []
        for ann in coco_dt.imgToAnns[img_id]:
            # COCO boxes are [x, y, width, height]; WBF expects normalized [x1, y1, x2, y2].
            x1, y1 = ann["bbox"][0], ann["bbox"][1]
            x2 = ann["bbox"][0] + ann["bbox"][2]
            y2 = ann["bbox"][1] + ann["bbox"][3]
            x1, x2 = x1/width, x2/width
            y1, y2 = y1/height, y2/height

            # Clamp to [0, 1] in case a box extends past the image border.
            x1 = min(1.000, max(0.000, x1))
            x2 = min(1.000, max(0.000, x2))
            y1 = min(1.000, max(0.000, y1))
            y2 = min(1.000, max(0.000, y2))

            boxes.append([x1,y1,x2,y2])
            scores.append(ann["score"])
            labels.append(ann["category_id"])

        boxes_list.append(boxes)
        scores_list.append(scores)
        labels_list.append(labels)
    
    # Fuse the per-model predictions; weights=None gives every model equal weight.
    boxes, scores, labels = ensemble_boxes.weighted_boxes_fusion(
        boxes_list,
        scores_list,
        labels_list,
        weights=None,
        iou_thr=iou_thr,
        skip_box_thr=skip_box_thr,
    )

    for box, score, label in zip(boxes, scores, labels):
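        x1, y1, x2, y2 = box

        # Scale back to pixels before converting to COCO [x, y, width, height].
        x1 *= width
        x2 *= width
        y1 *= height
        y2 *= height

        ann = dict(
            image_id=img_id,
            category_id=int(label),  # WBF returns labels as floats
            bbox=[float(x1), float(y1), float(x2-x1), float(y2-y1)],
            score=float(score),
            id=cnt_id,
        )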

        ensemble.append(ann)
        
        cnt_id += 1

coco_ensemble = coco_gt.loadRes(ensemble)
coco_eval = COCOeval(coco_gt, coco_ensemble, "bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
print("\n================================================================\n")



Ensemble using Weighted Boxes Fusion
iou thr: 0.7, skip_box_thr: 0.0001




100%|██████████| 5000/5000 [00:48<00:00, 102.31it/s]


Loading and preparing results...
DONE (t=0.71s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=43.71s).
Accumulating evaluation results...
DONE (t=8.61s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.425
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.620
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.473
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.266
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.470
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.554
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.345
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.570
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.622
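
The fusion loop above converts each box twice: COCO `[x, y, width, height]` in pixels to normalized, clamped `[x1, y1, x2, y2]` before fusion, and back again afterwards. The two conversions can be isolated as small helpers; this is a sketch with hypothetical function names that mirrors the arithmetic in the loop.

```python
def xywh_to_normalized_xyxy(bbox, width, height):
    """COCO [x, y, w, h] in pixels -> [x1, y1, x2, y2] clamped to [0, 1]."""
    x, y, w, h = bbox
    coords = [x / width, y / height, (x + w) / width, (y + h) / height]
    return [min(1.0, max(0.0, c)) for c in coords]


def normalized_xyxy_to_xywh(box, width, height):
    """Normalized [x1, y1, x2, y2] -> COCO [x, y, w, h] in pixels."""
    x1, y1, x2, y2 = box
    x1, x2 = x1 * width, x2 * width
    y1, y2 = y1 * height, y2 * height
    return [x1, y1, x2 - x1, y2 - y1]


# Round trip for a box inside a 640x480 image.
normalized = xywh_to_normalized_xyxy([64, 48, 128, 96], width=640, height=480)
print(normalized)
print(normalized_xyxy_to_xywh(normalized, width=640, height=480))
```

Because of the clamping step, a box that extends past the image border does not round-trip exactly; it comes back cropped to the image.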

## Results

| Models                   | Box AP @(IoU=0.50:0.95, area=all, maxDets=100) |
|-----------------------------|------|
|faster_rcnn_R_50_C4_1x.yaml  | 0.357|
|faster_rcnn_R_50_DC5_1x.yaml | 0.373|
|retinanet_R_50_FPN_1x.yaml   | 0.374|
|__Ensemble Model__   | __*0.403 (+0.029)*__|
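
The value in parentheses appears to be the ensemble's AP minus the best single-model AP in the same table. A minimal sketch of that arithmetic for the table above, using the hypothetical helper `ensemble_gain`:

```python
def ensemble_gain(baseline_aps, ensemble_ap):
    """Gain of the ensemble over the best single baseline, rounded to 3 decimals."""
    return round(ensemble_ap - max(baseline_aps), 3)


# Values copied from the first results table.
baselines = {
    "faster_rcnn_R_50_C4_1x.yaml": 0.357,
    "faster_rcnn_R_50_DC5_1x.yaml": 0.373,
    "retinanet_R_50_FPN_1x.yaml": 0.374,
}
gain = ensemble_gain(baselines.values(), 0.403)
print(f"{gain:+.3f}")  # prints "+0.029"
```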

---

| Models                   | Box AP @(IoU=0.50:0.95, area=all, maxDets=100) |
|-----------------------------|------|
|faster_rcnn_R_50_C4_1x.yaml  | 0.357|
|faster_rcnn_R_50_DC5_1x.yaml | 0.373|
|faster_rcnn_R_50_FPN_1x.yaml | 0.379|
|retinanet_R_50_FPN_1x.yaml   | 0.374|
|__Ensemble Model__   | __*0.411 (+0.032)*__|

---

| Models                   | Box AP @(IoU=0.50:0.95, area=all, maxDets=100) |
|-----------------------------|------|
|faster_rcnn_R_50_C4_1x.yaml  | 0.357|
|faster_rcnn_R_50_C4_3x.yaml  | 0.384|
|faster_rcnn_R_50_DC5_1x.yaml | 0.373|
|faster_rcnn_R_50_FPN_1x.yaml | 0.379|
|retinanet_R_50_FPN_1x.yaml   | 0.374|
|__Ensemble Model__   | __*0.418 (+0.034)*__|

---


| Models                   | Box AP @(IoU=0.50:0.95, area=all, maxDets=100) |
|-----------------------------|------|
|faster_rcnn_R_50_C4_1x.yaml  | 0.357|
|faster_rcnn_R_50_C4_3x.yaml  | 0.384|
|faster_rcnn_R_50_DC5_1x.yaml | 0.373|
|faster_rcnn_R_50_FPN_1x.yaml | 0.379|
|retinanet_R_50_FPN_1x.yaml   | 0.374|
|retinanet_R_50_FPN_3x.yaml   | 0.387|
|__Ensemble Model__   | __*0.425 (+0.038)*__|

---

| Models                   | Box AP @(IoU=0.50:0.95, area=all, maxDets=100) |
|-----------------------------|------|
|faster_rcnn_R_50_C4_1x.yaml  | 0.357|
|faster_rcnn_R_50_C4_3x.yaml  | 0.384|
|faster_rcnn_R_50_DC5_1x.yaml | 0.373|
|faster_rcnn_R_50_DC5_3x.yaml | 0.391|
|faster_rcnn_R_50_FPN_1x.yaml | 0.379|
|retinanet_R_50_FPN_1x.yaml   | 0.374|
|retinanet_R_50_FPN_3x.yaml   | 0.387|
|__Ensemble Model__   | __*0.429 (+0.038)*__|

---

| Models                   | Box AP @(IoU=0.50:0.95, area=all, maxDets=100) |
|-----------------------------|------|
|faster_rcnn_R_50_C4_1x.yaml  | 0.357|
|faster_rcnn_R_50_C4_3x.yaml  | 0.384|
|faster_rcnn_R_50_DC5_1x.yaml | 0.373|
|faster_rcnn_R_50_DC5_3x.yaml | 0.391|
|faster_rcnn_R_50_FPN_1x.yaml | 0.379|
|faster_rcnn_R_50_FPN_3x.yaml | 0.402|
|retinanet_R_50_FPN_1x.yaml   | 0.374|
|retinanet_R_50_FPN_3x.yaml   | 0.387|
|__Ensemble Model__   | __*0.434 (+0.032)*__|


---

| Models                   | Box AP @(IoU=0.50:0.95, area=all, maxDets=100) |
|-----------------------------|------|
|faster_rcnn_R_50_C4_1x.yaml  | 0.357|
|faster_rcnn_R_50_C4_3x.yaml  | 0.384|
|faster_rcnn_R_50_DC5_1x.yaml | 0.373|
|faster_rcnn_R_50_DC5_3x.yaml | 0.391|
|faster_rcnn_R_50_FPN_1x.yaml | 0.379|
|faster_rcnn_R_50_FPN_3x.yaml | 0.402|
|retinanet_R_50_FPN_1x.yaml   | 0.374|
|retinanet_R_50_FPN_3x.yaml   | 0.387|
|retinanet_R_101_FPN_3x.yaml  | 0.404|
|__Ensemble Model__   | __*0.448 (+0.035)*__|


---


| Models                   | Box AP @(IoU=0.50:0.95, area=all, maxDets=100) |
|-----------------------------|------|
|faster_rcnn_R_50_C4_1x.yaml  | 0.357|
|faster_rcnn_R_50_C4_3x.yaml  | 0.384|
|faster_rcnn_R_50_DC5_1x.yaml | 0.373|
|faster_rcnn_R_50_DC5_3x.yaml | 0.391|
|faster_rcnn_R_50_FPN_1x.yaml | 0.379|
|faster_rcnn_R_50_FPN_3x.yaml | 0.402|
|faster_rcnn_R_101_DC5_3x.yaml| 0.406|
|faster_rcnn_R_101_C4_3x.yaml | 0.411|
|retinanet_R_50_FPN_1x.yaml   | 0.374|
|retinanet_R_50_FPN_3x.yaml   | 0.387|
|retinanet_R_101_FPN_3x.yaml  | 0.404|
|__Ensemble Model__   | __*0.448 (+0.037)*__|

