# Fitting a (Detectron2) Object Detection Model via K-Fold Cross-Validation to produce Out-of-sample Predictions for every Image in a Dataset 

This notebook demonstrates how to fit a [Detectron2](https://github.com/facebookresearch/detectron2/) model on object detection datasets via *cross-validation*, and produce *out-of-sample* predictions required to run cleanlab for detecting label errors in such data.  Out-of-sample predictions (from a copy of the model that was never trained on the image it is producing a prediction for) are less subject to overfitting and can more accurately detect label issues, but are harder to obtain for every image in an entire dataset unless we employ K-fold cross-validation as demonstrated here.

Run this notebook if you wish to find every mislabeled image in a dataset. Once you have the out-of-sample predictions produced by this notebook, you can directly run the cleanlab [tutorial](https://docs.cleanlab.ai/stable/index.html) notebook on [Finding Label Errors in Object Detection Datasets](https://github.com/cleanlab/cleanlab/blob/master/docs/source/tutorials/object_detection.ipynb). Although the cleanlab tutorial focuses on finding issues in only the validation subset of COCO, you can easily adapt it to run on the full dataset for which we obtain predictions via cross-validation in this notebook.

In object detection data, each image is annotated with multiple bounding boxes.  Each bounding box surrounds a physical object within an image scene, and is annotated with a given class label. Using this labeled data, we train a model to predict the locations and classes of objects in an image. The trained model can subsequently be used to identify mislabeled images, which when corrected, allow you to train an even better model without changing your training code! 

Here we train a state-of-the-art neural network, initialized from a pretrained [X-101](https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md#imagenet-pretrained-models) backbone network. First let's import the required packages and download the data.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cleanlab/examples/blob/master/object_detection/detectron2_training.ipynb)

In [None]:
# Standard library
import glob
import json
import os
import pickle
import random
from collections import defaultdict

# Common third-party libraries
import cv2
import numpy as np
from sklearn.model_selection import KFold

# Detectron2 utilities
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog, DatasetCatalog, build_detection_test_loader
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultPredictor, DefaultTrainer
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.utils.visualizer import Visualizer

In [None]:
!wget -nc "http://images.cocodataset.org/annotations/annotations_trainval2017.zip"  && unzip -q -o annotations_trainval2017.zip
!wget -nc "http://images.cocodataset.org/zips/val2017.zip" && unzip -q -o val2017.zip
!wget -nc "http://images.cocodataset.org/zips/train2017.zip" && unzip -q -o train2017.zip

If the preceding cell fails, download the [COCO dataset](https://cocodataset.org/#download) manually.

If you wish to train on a custom dataset, format your data according to the COCO [data format guidelines](https://cocodataset.org/#format-data).
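
As a rough sketch of what that format looks like, a detection annotation file is a single JSON object with `images`, `annotations`, and `categories` lists. The splitting code in the next section also copies the `info` and `licenses` entries, so they are included here as empty placeholders. The file name, IDs, and box coordinates below are made up for illustration and are not taken from COCO.

```python
import json

# Hypothetical single-image dataset laid out in COCO detection format.
coco_style = {
    "info": {},
    "licenses": [],
    "images": [
        {"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 10,
            "image_id": 1,       # must match an entry in "images"
            "category_id": 3,    # must match an entry in "categories"
            "bbox": [100.0, 120.0, 50.0, 80.0],  # [x, y, width, height] in pixels
            "area": 4000.0,
            "iscrowd": 0,
        },
    ],
    "categories": [{"id": 3, "name": "car", "supercategory": "vehicle"}],
}

# Basic consistency check: every annotation points at a known image and category.
known_image_ids = {img["id"] for img in coco_style["images"]}
known_category_ids = {cat["id"] for cat in coco_style["categories"]}
is_consistent = all(
    ann["image_id"] in known_image_ids and ann["category_id"] in known_category_ids
    for ann in coco_style["annotations"]
)
serialized = json.dumps(coco_style)
```

Running a check like this on a custom annotation file before splitting can catch annotations that reference missing images.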

## Split data into k-folds


The code below prepares K-fold cross-validation splits for a dataset in the COCO format.
* It loads the JSON file containing the dataset annotations and collects the ID of every image.
* It then defines a function that, given the train and test indices of one fold, builds separate training and test dictionaries. Both dictionaries keep the common metadata (`info`, `licenses`, `categories`), and each image and annotation is assigned to one of the two splits based on its image ID.
* Finally, it saves the training and test data of each fold as separate JSON files and prints the size of each split.
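
The property that matters for out-of-sample predictions is that every image ends up in the held-out split of exactly one fold. The sketch below checks this on toy data. It uses a small pure-Python stand-in (`contiguous_folds`, a hypothetical helper) that mimics the contiguous, unshuffled folds that `KFold` produces by default; it is not the code used in this notebook.

```python
def contiguous_folds(n_items, n_folds):
    """Yield (train_indices, test_indices) for unshuffled K-fold splitting."""
    # The first n_items % n_folds folds receive one extra item.
    sizes = [
        n_items // n_folds + (1 if i < n_items % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size

toy_image_ids = [101, 102, 103, 104, 105, 106, 107]  # hypothetical image IDs
held_out_count = {image_id: 0 for image_id in toy_image_ids}
for train, test in contiguous_folds(len(toy_image_ids), 3):
    # No image may be in both the training and the held-out split of a fold.
    assert not set(train) & set(test)
    for i in test:
        held_out_count[toy_image_ids[i]] += 1
```

After the loop, every value in `held_out_count` is 1, which is why training one model per fold yields exactly one out-of-sample prediction per image.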

In [None]:
import json
from sklearn.model_selection import KFold
from collections import defaultdict
NUM_FOLDS = 5  # Reduce this to get faster results but potentially less accurate predictions

# Load the COCO annotation file
with open("annotations/instances_train2017.json") as f:
    data = json.load(f)
image_data = data['images']

# Collect image IDs; annotations refer to these same IDs through 'image_id'
image_ids = [image['id'] for image in image_data]

def split_data(train_indices, test_indices):
    """Build COCO-format training and test dictionaries for one fold."""
    train_split = defaultdict(list)
    test_split = defaultdict(list)
    common_metadata = ['info', 'licenses', 'categories']

    # Both splits share the same dataset-level metadata
    for cm in common_metadata:
        train_split[cm] = data[cm]
        test_split[cm] = data[cm]

    train_image_ids = {image_ids[i] for i in train_indices}
    test_image_ids = {image_ids[i] for i in test_indices}

    # Assign each image to the split that contains its ID
    for image in image_data:
        if image['id'] in train_image_ids:
            train_split['images'].append(image)
        elif image['id'] in test_image_ids:
            test_split['images'].append(image)

    # Assign each annotation to the same split as its image
    for annotation in data['annotations']:
        if annotation['image_id'] in train_image_ids:
            train_split['annotations'].append(annotation)
        elif annotation['image_id'] in test_image_ids:
            test_split['annotations'].append(annotation)

    return train_split, test_split

def print_data_info(data_dict):
    images_count = len(data_dict['images'])
    annotations_count = len(data_dict['annotations'])
    print(f"Number of images: {images_count}, Number of annotations: {annotations_count}")

# Generate K-Fold cross-validation splits and save each one to disk
kf = KFold(n_splits=NUM_FOLDS)
pairs = []  # [training JSON path, test JSON path] for each fold
for fold, (train_indices, test_indices) in enumerate(kf.split(image_ids)):
    train_split, test_split = split_data(train_indices, test_indices)
    train_file = f"train_coco_{fold}_fold.json"
    test_file = f"test_coco_{fold}_fold.json"
    pairs.append([train_file, test_file])
    with open(train_file, 'w') as f:
        json.dump(train_split, f)
    with open(test_file, 'w') as f:
        json.dump(test_split, f)
    print(f"Data info for training data fold {fold}:")
    print_data_info(train_split)
    print(f"Data info for test data fold {fold}:")
    print_data_info(test_split)


## Train the model


We define the configuration settings for training an object detection model using Detectron2. The model architecture used in this example is "faster_rcnn_X_101_32x8d_FPN_3x" from the COCO-Detection model zoo. The training and validation datasets are passed to the function by name through the `TRAIN` and `VALIDATION` arguments; for each fold they are the training split and the held-out split created above.

Data loading uses 2 workers and the batch size is set to 2 images.
The learning rate and maximum number of iterations are also specified. The model weights are initialized from the COCO-Detection model zoo, and the output directory for the trained model is created. Finally, the configuration is passed to the `DefaultTrainer` class, which trains the object detection model.

<strong>Note:</strong> The number of iterations was set based on [early stopping.](https://en.wikipedia.org/wiki/Early_stopping#:~:text=In%20machine%20learning%2C%20early%20stopping,training%20data%20with%20each%20iteration.)
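
To put the training budget in scale, the arithmetic below relates the iteration count to the amount of data in one training fold. The iteration count and batch size are the values used in this notebook; the total image count is an assumption (the size of COCO `train2017`) and should be replaced with your own dataset size.

```python
MAX_ITER = 6000          # cfg.SOLVER.MAX_ITER in this notebook
IMS_PER_BATCH = 2        # cfg.SOLVER.IMS_PER_BATCH in this notebook
NUM_FOLDS = 5
TOTAL_IMAGES = 118_287   # assumed size of COCO train2017; replace with your own count

# Each fold trains on (NUM_FOLDS - 1) / NUM_FOLDS of the images.
images_per_training_fold = TOTAL_IMAGES * (NUM_FOLDS - 1) // NUM_FOLDS
images_seen = MAX_ITER * IMS_PER_BATCH
epochs = images_seen / images_per_training_fold
print(f"{images_seen} images seen, about {epochs:.2f} epochs per fold")
```

Under these assumptions each fold's model sees only a fraction of its training split, so the run is a short fine-tuning pass rather than a full training schedule.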

In [None]:
def train_data(TRAIN, VALIDATION, folder):
    """Train on the dataset registered as TRAIN and return the config that was used.

    TRAIN and VALIDATION are names of datasets registered with Detectron2;
    folder is the output directory for this fold's checkpoints.
    """
    cfg = get_cfg()
    MODEL = 'faster_rcnn_X_101_32x8d_FPN_3x.yaml'
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/"+MODEL))
    cfg.DATASETS.TRAIN = (TRAIN,)
    cfg.DATASETS.TEST = (VALIDATION,)
    cfg.DATALOADER.NUM_WORKERS = 2
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/"+MODEL)  # Let training initialize from model zoo
    cfg.SOLVER.IMS_PER_BATCH = 2  # This is the real "batch size" commonly known to deep learning people
    cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
    cfg.SOLVER.MAX_ITER = 6000    # chosen based on early stopping (see the note above)
    cfg.SOLVER.STEPS = []        # milestones where LR is reduced, in this case there's no decay
    cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # The "RoIHead batch size".
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 80  # number of object categories in COCO
    cfg.TEST.EVAL_PERIOD = 500  # DefaultTrainer only reports metrics here if build_evaluator is overridden
    cfg.OUTPUT_DIR = folder  # keep each fold's checkpoints in its own directory
    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()
    return cfg


## Save the results


The function `format_detectron2_predictions` below converts the output of Detectron2 into the format that cleanlab uses for identifying label errors. It takes the predicted instances and the number of classes as inputs, and returns a list with one NumPy array per class. Each row of an array holds the four coordinates of a predicted bounding box followed by its confidence score.

In [None]:
def format_detectron2_predictions(instances, num_classes):
    """
    Format Detectron2 predictions into a list of NumPy arrays.

    Args:
        instances (detectron2.structures.Instances): Predicted instances from Detectron2 (on CPU).
        num_classes (int): Number of classes.

    Returns:
        list of numpy.ndarray: One array per class, of shape (number of boxes, 5).
        Each row contains the bounding box coordinates followed by the score.

    """
    fields = instances.get_fields()
    boxes = fields['pred_boxes'].tensor.numpy()
    results = [[] for _ in range(num_classes)]

    # Group every predicted box, together with its score, under its predicted class
    for i in range(len(fields['pred_classes'])):
        pred_class = fields['pred_classes'][i].item()
        score = fields['scores'][i].item()
        box_coordinates = list(boxes[i])
        box_coordinates.append(score)
        results[pred_class].append(box_coordinates)

    formatted_results = []
    for class_boxes in results:
        if len(class_boxes) == 0:
            # Classes without detections still need 5 columns: 4 box coordinates + score
            formatted_array = np.array(class_boxes, dtype=np.float32).reshape((0, 5))
        else:
            formatted_array = np.array(class_boxes, dtype=np.float32)
        formatted_results.append(formatted_array)

    return formatted_results

In [None]:
TRAIN_PATH = "train2017"  # directory of training images extracted by the download cell

for k in range(NUM_FOLDS):
    result_dict = {}
    train_json, val_json = pairs[k]
    # Register this fold's splits with Detectron2 (the dataset names are arbitrary labels)
    train_name = f"coco_train_fold_{k}"
    val_name = f"coco_val_fold_{k}"
    register_coco_instances(train_name, {}, train_json, TRAIN_PATH)
    register_coco_instances(val_name, {}, val_json, TRAIN_PATH)

    cfg = train_data(train_name, val_name, "COCO_TRAIN_"+str(k)+"_FOLD")
    cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # path to the model we just trained
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.1   # set a custom testing threshold
    predictor = DefaultPredictor(cfg)

    # Report COCO metrics on the held-out fold
    evaluator = COCOEvaluator(val_name, output_dir="output")
    val_loader = build_detection_test_loader(cfg, val_name)
    print(inference_on_dataset(predictor.model, val_loader, evaluator))

    # Produce out-of-sample predictions for every image in the held-out fold
    with open(val_json) as f:
        dataset = json.load(f)
    for image in dataset['images']:
        im_name = os.path.join(TRAIN_PATH, image['file_name'])
        im = cv2.imread(im_name)
        outputs = predictor(im)
        result_dict[im_name] = format_detectron2_predictions(outputs["instances"].to("cpu"), cfg.MODEL.ROI_HEADS.NUM_CLASSES)
    with open("results_fold_"+str(k)+".pkl", 'wb') as f:
        pickle.dump(result_dict, f)
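
To make the layout of each saved prediction concrete, here is a minimal pure-Python sketch of the grouping that `format_detectron2_predictions` performs. Plain lists stand in for the NumPy arrays, `group_by_class` is a hypothetical helper, and the boxes, scores, and classes are invented; boxes are assumed to be given as corner coordinates `(x1, y1, x2, y2)`.

```python
def group_by_class(boxes, scores, classes, num_classes):
    """Group detections into one list per class; each row is [x1, y1, x2, y2, score]."""
    per_class = [[] for _ in range(num_classes)]
    for box, score, cls in zip(boxes, scores, classes):
        per_class[cls].append(list(box) + [score])
    return per_class

# Hypothetical detections for a single image in a 3-class problem.
toy_boxes = [(10.0, 20.0, 50.0, 80.0), (30.0, 40.0, 90.0, 120.0), (5.0, 5.0, 25.0, 25.0)]
toy_scores = [0.9, 0.6, 0.3]
toy_classes = [2, 0, 2]

toy_prediction = group_by_class(toy_boxes, toy_scores, toy_classes, num_classes=3)
```

Here `toy_prediction` has one entry per class: class 0 holds one row, class 1 is empty, and class 2 holds two rows. The prediction stored for each image follows the same per-class structure.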

In [None]:
# Merge the per-fold predictions into a single dictionary keyed by image path
result_dict = {}
for k in range(NUM_FOLDS):
    with open("results_fold_"+str(k)+'.pkl', 'rb') as f:
        res_d = pickle.load(f)
    for r in res_d:
        result_dict[r] = res_d[r]
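
Because each image is held out in exactly one fold, the per-fold dictionaries should never share a key. A slightly more defensive merge, sketched below with toy dictionaries in place of the pickled results, makes that assumption explicit. `merge_fold_results` is a hypothetical helper, not part of the notebook.

```python
def merge_fold_results(per_fold_results):
    """Merge per-fold {image_path: prediction} dicts, refusing duplicate keys."""
    merged = {}
    for fold, fold_results in enumerate(per_fold_results):
        overlap = merged.keys() & fold_results.keys()
        if overlap:
            raise ValueError(f"fold {fold} repeats images: {sorted(overlap)}")
        merged.update(fold_results)
    return merged

# Hypothetical stand-ins for the contents of two per-fold result files.
fold_0 = {"train2017/a.jpg": "pred_a", "train2017/b.jpg": "pred_b"}
fold_1 = {"train2017/c.jpg": "pred_c"}
merged = merge_fold_results([fold_0, fold_1])
```

A duplicate key would indicate that an image was predicted by more than one fold's model, meaning the splits overlap and some predictions may not be out-of-sample.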

In [None]:
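# "TRAIN_COCO_ALL_labels.pkl" is not created in this notebook and must already exist.
# Each of its entries is expected to have a 'seg_map' field that names the corresponding image.
with open("TRAIN_COCO_ALL_labels.pkl", 'rb') as f:
    dataset = pickle.load(f)

# Order the predictions to match the order of the entries in the labels file
results = []
for i in dataset:
    im_name = os.path.join(TRAIN_PATH, i['seg_map'].replace(".png", '.jpg'))
    results.append(result_dict[im_name])
with open("results_train_ALL.pkl", 'wb') as f:
    pickle.dump(results, f)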