# Training an Object Detection model using Detectron2

This notebook demonstrates how to train a [Detectron2](https://github.com/facebookresearch/detectron2/) model on object detection datasets and produce predictions required to run cleanlab's tutorial on detecting label errors in object detection data.  Note that this notebook fits the model to an entire training set and produces predictions on a held-out validation set. Thus these predictions are only *out-of-sample* for the validation data, and should ideally *only* be used to find mislabeled images amongst the validation set. To instead find mislabeled images amongst an entire dataset, see the analogous notebook in this folder which uses K-fold cross-validation to produce out-of-sample predictions for every image in the dataset.

In object detection data, each image is annotated with multiple bounding boxes. Each bounding box surrounds a physical object within an image scene and is annotated with a given class label. Using this labeled data, we train a model to predict the locations and classes of objects in an image. The trained model can then be used to identify mislabeled images, which, once corrected, let you train an even better model without changing your training code!

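Here we fit a Faster R-CNN model with a ResNet-50 FPN backbone (`faster_rcnn_R_50_FPN_3x`), starting from COCO-pretrained weights in the Detectron2 model zoo. You could instead start from a larger pretrained backbone such as [X-101](https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md#imagenet-pretrained-models), which may give higher accuracy at a higher compute cost. First, let's install and import the required packages and download the data.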

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cleanlab/examples/blob/master/object_detection/detectron2_training.ipynb)

In [None]:
# Install PyTorch and torchvision (pin versions that match your CUDA setup if needed)
!pip install torch torchvision

# Install Detectron2
!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

# Import Detectron2 to verify installation
try:
    import detectron2
    print("Detectron2 was successfully installed!")
except ImportError as e:
    print("Error during Detectron2 installation:", e)


In [None]:
# Standard library and third-party imports
import os
import json
import pickle
import random

import cv2
import numpy as np

# Detectron2 utilities
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import DatasetCatalog, MetadataCatalog, build_detection_test_loader
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultPredictor, DefaultTrainer
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.utils.visualizer import Visualizer


In [None]:
!wget -nc "https://cleanlab-public.s3.amazonaws.com/ObjectDetectionBenchmarking/DATASET_annotations/instances_val2017_5labels.json"
!wget -nc "https://cleanlab-public.s3.amazonaws.com/ObjectDetectionBenchmarking/DATASET_annotations/instances_train2017_5labels.json"
!wget -nc "https://cleanlab-public.s3.amazonaws.com/ObjectDetectionBenchmarking/tutorial_obj/labels.pkl"
!wget -nc "http://images.cocodataset.org/zips/val2017.zip" && unzip -q -o val2017.zip
!wget -nc "http://images.cocodataset.org/zips/train2017.zip" && unzip -q -o train2017.zip


Before you begin training on a custom dataset, be sure to review the COCO dataset guidelines for formatting your data, which can be found on their [website](https://cocodataset.org/#format-data).
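For reference, a COCO detection annotation file is a JSON object with `images`, `annotations`, and `categories` lists, where each annotation's `bbox` is `[x, y, width, height]` in pixels from the top-left corner. Detectron2's predicted boxes, by contrast, are corner coordinates `[x1, y1, x2, y2]`. The sketch below builds a minimal file with hypothetical values and shows the conversion; `xywh_to_xyxy` is an illustrative helper, not part of Detectron2 or cleanlab.

```python
import json

# A minimal, hypothetical COCO-style detection annotation file.
# "bbox" is [x, y, width, height] in pixels, measured from the image's top-left corner.
coco = {
    "images": [
        {"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 3,
         "bbox": [100.0, 120.0, 50.0, 40.0], "area": 2000.0, "iscrowd": 0},
    ],
    "categories": [
        {"id": 3, "name": "car"},
    ],
}


def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, w, h] box to [x1, y1, x2, y2] corner coordinates."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


# JSON text you would save to a file and pass to register_coco_instances
text = json.dumps(coco, indent=2)
print(xywh_to_xyxy(coco["annotations"][0]["bbox"]))  # [100.0, 120.0, 150.0, 160.0]
```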

Here we register two custom datasets, "my_dataset_train" and "my_dataset_val", built from COCO 2017 train and val. Only five of the COCO labels (car, chair, cup, person, and traffic light) are used for training and for detecting label errors in this notebook.
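The notebook downloads ready-made 5-label annotation files, and it does not say exactly how they were produced. If you want to restrict your own COCO file to a subset of classes, one possible approach is sketched below, assuming all images are kept and category ids are left unchanged. `filter_coco_categories` and the toy data are hypothetical.

```python
def filter_coco_categories(coco, keep_names):
    """Return a copy of a COCO dict restricted to the categories named in keep_names.

    All images are kept (even if they end up with no annotations), and the
    original category ids are left unchanged.
    """
    keep_ids = {c["id"] for c in coco["categories"] if c["name"] in keep_names}
    return {
        **coco,
        "categories": [c for c in coco["categories"] if c["id"] in keep_ids],
        "annotations": [a for a in coco["annotations"] if a["category_id"] in keep_ids],
    }


# Toy annotation file using real COCO ids for person (1), car (3), and dog (18)
toy = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [0, 0, 10, 10]},
        {"id": 2, "image_id": 1, "category_id": 18, "bbox": [5, 5, 10, 10]},
        {"id": 3, "image_id": 1, "category_id": 3, "bbox": [20, 20, 5, 5]},
    ],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 3, "name": "car"},
        {"id": 18, "name": "dog"},
    ],
}

FIVE_CLASSES = {"car", "chair", "cup", "person", "traffic light"}
subset = filter_coco_categories(toy, FIVE_CLASSES)
print([a["id"] for a in subset["annotations"]])  # [1, 3]
```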

In [None]:
# Directory containing the unzipped COCO images (Colab's default working directory)
IMAGE_PATH = "/content/"
TRAIN_PATH = os.path.join(IMAGE_PATH, "train2017")
VAL_PATH = os.path.join(IMAGE_PATH, "val2017")

# register_coco_instances(name, metadata, json_file, image_root)
register_coco_instances("my_dataset_train", {}, "instances_train2017_5labels.json", TRAIN_PATH)
register_coco_instances("my_dataset_val", {}, "instances_val2017_5labels.json", VAL_PATH)
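Note that the model's `pred_classes` are contiguous indices `0..K-1`, not raw COCO category ids. As we understand Detectron2's `load_coco_json` (which `register_coco_instances` calls lazily), it sorts the category ids in ascending order and assigns index `k` to the `k`-th id, recording the result in `MetadataCatalog.get(name).thing_dataset_id_to_contiguous_id`. The class indices in the labels you pass to cleanlab should follow the same ordering. A stdlib sketch of that mapping, with hypothetical category ids (the ids in the 5-label files may differ):

```python
def contiguous_id_map(categories):
    """Map COCO category ids to contiguous indices 0..K-1 in ascending id order.

    Returns (id_map, class_names), where class_names[k] is the name of the
    category assigned contiguous index k.
    """
    sorted_cats = sorted(categories, key=lambda c: c["id"])
    id_map = {c["id"]: k for k, c in enumerate(sorted_cats)}
    class_names = [c["name"] for c in sorted_cats]
    return id_map, class_names


# Hypothetical categories, deliberately listed out of id order
cats = [{"id": 62, "name": "chair"}, {"id": 1, "name": "person"}, {"id": 3, "name": "car"}]
id_map, names = contiguous_id_map(cats)
print(id_map)  # {1: 0, 3: 1, 62: 2}
print(names)   # ['person', 'car', 'chair']
```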


We next define the configuration for training an object detection model with Detectron2. The model architecture is "faster_rcnn_R_50_FPN_3x" from the COCO-Detection model zoo. The training data is the "my_dataset_train" dataset and the validation data is the "my_dataset_val" dataset; these refer to COCO 2017 train and val, restricted to the subset of labels listed above.

Here the number of data-loading worker processes is set to 2 and the batch size (images per training iteration) is set to 2. The learning rate and maximum number of iterations are also specified, and learning-rate decay is disabled (`SOLVER.STEPS = []`). You'll want to tinker with these values to get the best performance on your own data.
The model weights are initialized from the same COCO-Detection model zoo entry, the box predictor is resized to 5 classes, and the output directory for the trained model is created. Finally, this configuration is passed to the `DefaultTrainer` class, which trains the object detection model.
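Since Detectron2 counts training length in iterations rather than epochs, it can help to translate `MAX_ITER` and `IMS_PER_BATCH` into approximate passes over the training set. The sketch below assumes a hypothetical training-set size; the real number of images Detectron2 trains on can be checked with `len(DatasetCatalog.get("my_dataset_train"))`, which may be smaller than the raw COCO count because images without annotations are dropped by default (`cfg.DATALOADER.FILTER_EMPTY_ANNOTATIONS`).

```python
def iterations_to_epochs(max_iter, ims_per_batch, num_train_images):
    """Approximate number of passes over the training set for a given schedule."""
    return max_iter * ims_per_batch / num_train_images


# Hypothetical training-set size of 60,000 images
print(iterations_to_epochs(30000, 2, 60000))  # 1.0
```

Doubling `IMS_PER_BATCH` at a fixed `MAX_ITER` doubles the number of epochs, so these two settings should be tuned together.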

<strong>Note:</strong> The number of iterations (30,000) was chosen based on [early stopping](https://en.wikipedia.org/wiki/Early_stopping).
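The notebook does not say which validation metric or patience was used for early stopping. The sketch below assumes you recorded a validation score (for example COCO AP) at several checkpoints and stop once it fails to improve for `patience` consecutive evaluations; `pick_stopping_iteration` and the scores are hypothetical.

```python
def pick_stopping_iteration(val_scores, patience=3):
    """Choose a checkpoint by early stopping on a validation metric (higher is better).

    val_scores: list of (iteration, score) pairs in the order they were evaluated.
    Stops once `patience` consecutive evaluations fail to beat the best score so far,
    and returns the iteration with the best score seen up to that point
    (None if val_scores is empty).
    """
    best_iter, best_score = None, float("-inf")
    evals_without_improvement = 0
    for iteration, score in val_scores:
        if score > best_score:
            best_iter, best_score = iteration, score
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                break
    return best_iter


# Hypothetical validation scores recorded every 5,000 iterations
history = [(5000, 30.1), (10000, 34.0), (15000, 35.2), (20000, 35.0), (25000, 34.8), (30000, 35.1)]
print(pick_stopping_iteration(history, patience=2))  # 15000
```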

In [None]:
cfg = get_cfg()
# Start from the Faster R-CNN R50-FPN (3x schedule) config in the COCO-Detection model zoo
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("my_dataset_train",)
cfg.DATASETS.TEST = ("my_dataset_val",)
cfg.DATALOADER.NUM_WORKERS = 2
# Initialize training from the matching COCO-pretrained model zoo checkpoint
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.SOLVER.IMS_PER_BATCH = 2  # images per training iteration (the usual "batch size")
cfg.SOLVER.BASE_LR = 0.00025  # IMPORTANT: pick a good learning rate for your dataset
cfg.SOLVER.MAX_ITER = 30000  # chosen via early stopping (see note above)
cfg.SOLVER.STEPS = []  # do not decay the learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128  # RoIs sampled per image to train the box head
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5  # car, chair, cup, person, traffic light
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)



## Train the model


In [None]:
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()


model_final_280758.pkl: 167MB [00:01, 101MB/s]                           
roi_heads.box_predictor.bbox_pred.{bias, weight}
roi_heads.box_predictor.cls_score.{bias, weight}


[03/16 11:07:03 d2.engine.train_loop]: Starting training from iteration 0




[03/16 11:07:12 d2.utils.events]:  eta: 2:28:48  iter: 19  total_loss: 2.667  loss_cls: 1.787  loss_box_reg: 0.7842  loss_rpn_cls: 0.02757  loss_rpn_loc: 0.03303    time: 0.2657  last_time: 0.1876  data_time: 0.0205  last_data_time: 0.0071   lr: 4.9953e-06  max_mem: 2477M
[03/16 11:07:26 d2.utils.events]:  eta: 2:18:06  iter: 39  total_loss: 2.514  loss_cls: 1.701  loss_box_reg: 0.7783  loss_rpn_cls: 0.01867  loss_rpn_loc: 0.02611    time: 0.2515  last_time: 0.3093  data_time: 0.0094  last_data_time: 0.0066   lr: 9.9902e-06  max_mem: 2477M
[03/16 11:07:32 d2.utils.events]:  eta: 2:13:37  iter: 59  total_loss: 2.487  loss_cls: 1.569  loss_box_reg: 0.8207  loss_rpn_cls: 0.02485  loss_rpn_loc: 0.03557    time: 0.2629  last_time: 0.2601  data_time: 0.0098  last_data_time: 0.0148   lr: 1.4985e-05  max_mem: 2477M
[03/16 11:07:39 d2.utils.events]:  eta: 2:15:49  iter: 79  total_loss: 2.221  loss_cls: 1.402  loss_box_reg: 0.7278  loss_rpn_cls: 0.02362  loss_rpn_loc: 0.02871    time: 0.2767  la

## Inference & evaluation using the trained model
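To run inference, point `cfg.MODEL.WEIGHTS` at the trained checkpoint (here, the final weights written to `cfg.OUTPUT_DIR`) and build a `DefaultPredictor`. The same cell can load a previously trained model, as long as `cfg` is defined as above.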

In [None]:
evaluator = COCOEvaluator("my_dataset_val", output_dir="/content/output")
val_loader = build_detection_test_loader(cfg, "my_dataset_val")
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # path to the model we just trained
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # keep only detections scoring above 0.7
predictor = DefaultPredictor(cfg)

# Optional: uncomment to report COCO AP on the full validation set (slow: runs every image)
# print(inference_on_dataset(predictor.model, val_loader, evaluator))

[03/16 15:54:55 d2.data.datasets.coco]: Loaded 5000 images in COCO format from instances_val2017_5labels.json
[03/16 15:54:55 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[03/16 15:54:55 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[03/16 15:54:55 d2.data.common]: Serializing 5000 elements to byte tensors and concatenating them all ...
[03/16 15:54:55 d2.data.common]: Serialized dataset takes 9.72 MiB
[03/16 15:54:57 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from ./output/model_final.pth ...
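`SCORE_THRESH_TEST` controls how many low-confidence boxes survive: Detectron2 keeps only detections whose score is above the threshold. If you save predictions made with a lower threshold, you can apply a stricter one afterwards. Because greedy NMS never lets a lower-scoring box suppress a higher-scoring one, this should closely match rerunning with the higher threshold, although the per-image detection cap (`cfg.TEST.DETECTIONS_PER_IMAGE`) can make the two differ. A minimal sketch on the per-class `[x1, y1, x2, y2, score]` rows that `format_detectron2_predictions` produces, using plain lists so it runs without Detectron2; `apply_score_threshold` is a hypothetical helper.

```python
def apply_score_threshold(per_class_preds, thresh):
    """Drop boxes whose confidence score (the fifth value in each row) is not above thresh.

    per_class_preds: list with one entry per class; each entry is a list of
    [x1, y1, x2, y2, score] rows. Returns a new list with the same layout.
    """
    return [[row for row in rows if row[4] > thresh] for rows in per_class_preds]


# Hypothetical predictions for one image with 3 classes
image_preds = [
    [[10, 20, 50, 80, 0.95], [12, 22, 48, 79, 0.40]],
    [],
    [[60, 10, 90, 40, 0.72]],
]
print(apply_score_threshold(image_preds, 0.7))
# [[[10, 20, 50, 80, 0.95]], [], [[60, 10, 90, 40, 0.72]]]
```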


The code below defines a function, `format_detectron2_predictions`, that converts Detectron2's prediction output into the format cleanlab uses to identify label errors. It takes the predicted `Instances` for one image and the number of classes, and returns a list with one NumPy array per class; each row of an array holds one predicted box's corner coordinates followed by its confidence score, `[x1, y1, x2, y2, score]`.

In [None]:
def format_detectron2_predictions(instances, num_classes):
    """Convert one image's Detectron2 `Instances` (already moved to CPU) to cleanlab's format.

    Returns a list of length `num_classes`. Entry k is a float32 array of shape (N_k, 5)
    whose rows are [x1, y1, x2, y2, score] for the boxes predicted as class k.
    """
    fields = instances.get_fields()
    boxes = fields["pred_boxes"].tensor.numpy()  # (N, 4) corner coordinates in pixels
    scores = fields["scores"].numpy()             # (N,) confidence of each box
    classes = fields["pred_classes"].numpy()      # (N,) contiguous class indices
    per_class = []
    for k in range(num_classes):
        mask = classes == k
        # Append the score as a fifth column; classes with no boxes give shape (0, 5)
        per_class.append(np.hstack([boxes[mask], scores[mask][:, None]]).astype(np.float32))
    return per_class
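To see the output layout without a GPU or Detectron2, here is a hypothetical pure-Python analogue, `group_by_class`, that applies the same grouping to plain lists of boxes, scores, and class indices. `format_detectron2_predictions` does the same thing but returns a float32 NumPy array of shape `(N_k, 5)` per class.

```python
def group_by_class(boxes, scores, classes, num_classes):
    """Pure-Python analogue of format_detectron2_predictions (illustrative only).

    boxes: list of [x1, y1, x2, y2]; scores: list of floats; classes: list of ints.
    Returns a list of length num_classes; entry k lists the [x1, y1, x2, y2, score]
    rows for boxes predicted as class k (an empty list if there are none).
    """
    per_class = [[] for _ in range(num_classes)]
    for box, score, cls in zip(boxes, scores, classes):
        per_class[cls].append(list(box) + [score])
    return per_class


preds = group_by_class(
    boxes=[[10, 20, 50, 80], [5, 5, 30, 30], [60, 10, 90, 40]],
    scores=[0.95, 0.80, 0.72],
    classes=[3, 0, 3],
    num_classes=5,
)
print(preds)
# [[[5, 5, 30, 30, 0.8]], [], [], [[10, 20, 50, 80, 0.95], [60, 10, 90, 40, 0.72]], []]
```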



Cleanlab's tutorial uses a limited subset of the COCO 2017 validation set: the images listed in `labels.pkl`. The cell below runs the trained model on each of these images and saves the formatted predictions, in the same order as `labels.pkl`, to `predictions.pkl`. To find label errors in this subset, please run our [tutorial](https://docs.cleanlab.ai/stable/index.html) notebook on [Finding Label Errors in Object Detection Datasets](https://github.com/cleanlab/cleanlab/blob/master/docs/source/tutorials/object_detection.ipynb).

In [None]:
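# Load the annotations for the subset of validation images used in cleanlab's tutorial
with open("labels.pkl", "rb") as f:
    labels = pickle.load(f)

results = []
for label in labels:
    # Each entry's "seg_map" names its image; the COCO images on disk are .jpg files
    im_name = os.path.join(VAL_PATH, label["seg_map"].replace(".png", ".jpg"))
    im = cv2.imread(im_name)  # BGR image, the format DefaultPredictor expects by default
    if im is None:
        raise FileNotFoundError(f"Could not read image {im_name}")
    outputs = predictor(im)
    results.append(format_detectron2_predictions(outputs["instances"].to("cpu"), cfg.MODEL.ROI_HEADS.NUM_CLASSES))

# Save predictions in the same order as labels.pkl
with open("predictions.pkl", "wb") as f:
    pickle.dump(results, f)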
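Before handing `predictions.pkl` to cleanlab, a quick sanity check can catch misaligned or malformed predictions: there should be one prediction per labeled image, each with `num_classes` entries whose rows have 5 values. The helper `check_predictions` below is hypothetical; it only uses `len` and iteration, so it works on both NumPy arrays and plain lists. In this notebook you would call it as `check_predictions(results, labels, cfg.MODEL.ROI_HEADS.NUM_CLASSES)`.

```python
def check_predictions(results, labels, num_classes):
    """Raise ValueError if predictions don't line up with labels in the expected layout."""
    if len(results) != len(labels):
        raise ValueError(f"{len(results)} predictions but {len(labels)} labeled images")
    for img_idx, per_class in enumerate(results):
        if len(per_class) != num_classes:
            raise ValueError(
                f"image {img_idx}: expected {num_classes} class entries, got {len(per_class)}"
            )
        for boxes in per_class:
            for row in boxes:
                if len(row) != 5:
                    raise ValueError(f"image {img_idx}: box row has {len(row)} values, expected 5")
    return True


# Toy data: two labeled images, three classes
toy_labels = [{"seg_map": "a.png"}, {"seg_map": "b.png"}]
toy_results = [
    [[[1, 2, 3, 4, 0.9]], [], []],
    [[], [[5, 6, 7, 8, 0.8]], []],
]
print(check_predictions(toy_results, toy_labels, num_classes=3))  # True
```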

In [None]:
from google.colab import files

# Specify the file path you want to download
file_path = "/content/output/model_final.pth"

# Use the files.download function to download the file
files.download(file_path)
