# Testing an Object Detection Model with Doleus

This notebook demonstrates how to use **Doleus** to test an object detection model across different data slices. We use a small subset of the COCO validation dataset (**TinyCocoDataset**, 50 images) and create slices based on:
- Image brightness ("brighter" vs. "darker")
- Metadata ("indoor" vs. "outdoor")

We use the pre-trained **Faster R-CNN (ResNet50-FPN v2)** model from TorchVision.

### Steps in this tutorial:
1. Download the TinyCocoDataset.
2. Create a PyTorch dataset.
3. Generate predictions using the model.
4. Slice the dataset based on brightness and metadata.
5. Evaluate performance on each data slice with Doleus.

In [1]:
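import os
import zipfile

import requests
import torch  # used in TinyCocoDataset.__getitem__
from torchvision.datasets import CocoDetection
from torchvision.ops import box_convert  # used in TinyCocoDataset.__getitem__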

In [2]:
class TinyCocoDataset(CocoDetection):
    """
    A simplified dataset class for Tiny COCO.
    Inherits from torchvision's CocoDetection but only loads bounding boxes.
    """

    @staticmethod
    def download_dataset(dir: str = "tiny_coco", url: str = "https://www.dropbox.com/scl/fo/ea21g39o29gihqqyd239w/ALuoU3X4UigKJuJVySSyXVk?rlkey=jloatvwfq3024l5701m29g2ij&st=7071uuzd&dl=1"):
        """Download the demo dataset from Dropbox, store it in dir, unzip it and return the directory"""
        if os.path.exists(dir):
            return dir
            
        os.makedirs(dir, exist_ok=True)
        
        zip_path = f"{dir}.zip"
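        response = requests.get(url, timeout=60)
        response.raise_for_status()  # fail early instead of saving an HTTP error page as the zip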
        
        with open(zip_path, "wb") as f:
            f.write(response.content)
                
        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall(dir)
            
        os.remove(zip_path)
        return dir
    
    def __init__(
        self, root: str | None = None, split: str = "val", transform=None, target_transform=None
    ) -> None:
        if root is None:
            root = TinyCocoDataset.download_dataset()
        ann_file = os.path.join(root, "annotations", f"instances_{split}2017.json")
        img_folder = os.path.join(root, f"{split}2017")
        super().__init__(
            img_folder, ann_file, transform=transform, target_transform=target_transform
        )

    def __getitem__(self, idx: int):
        img, target = super().__getitem__(idx)

        if target:
            boxes = torch.tensor([ann["bbox"] for ann in target], dtype=torch.float32)
        else:
            boxes = torch.empty((0, 4), dtype=torch.float32)
        boxes = box_convert(boxes, in_fmt="xywh", out_fmt="xyxy")

        labels = [ann["category_id"] for ann in target]
        return img, boxes, labels

# Step 2: Load and Prepare the Dataset

We have prepared a subset of the COCO dataset (https://cocodataset.org/#home) for download. It is 9.1 MB in size and contains 50 images from the COCO validation set, their annotations, and some additional metadata.

**Dataset Details:**
- The dataset consists of 50 images from the COCO validation set.
- Each image has corresponding annotations (bounding boxes, labels).
- Additional metadata includes location information (e.g., 'indoor' or 'outdoor').
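
COCO annotation files store each bounding box as `[x, y, width, height]`, while the dataset class above returns boxes in `[x_min, y_min, x_max, y_max]` form via `box_convert`. The following plain-Python sketch shows the arithmetic behind that conversion. The helper name `xywh_to_xyxy` and the example values are made up for illustration.

```python
from typing import List, Sequence


def xywh_to_xyxy(box: Sequence[float]) -> List[float]:
    """Convert [x_min, y_min, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


# Made-up example box, not taken from the dataset
coco_box = [10.0, 20.0, 30.0, 40.0]
print(xywh_to_xyxy(coco_box))  # [10.0, 20.0, 40.0, 60.0]
```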

In [None]:
dataset = TinyCocoDataset()  # Downloads the dataset on first use and stores it in ./tiny_coco

# Step 3: Initialize the Model and Preprocessing Transform

In [4]:
from torchvision.models.detection import FasterRCNN_ResNet50_FPN_V2_Weights, fasterrcnn_resnet50_fpn_v2

weights = FasterRCNN_ResNet50_FPN_V2_Weights.COCO_V1
model = fasterrcnn_resnet50_fpn_v2(weights=weights)
model.eval()

preprocess = weights.transforms()


# Step 4: Create Predictions for the Dataset
Depending on your hardware, generating predictions for all 50 images can take a couple of minutes.
To speed things up, you can restrict the loop to a smaller number of images.
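
One way to limit the work is to cap the number of samples the loop visits. The sketch below is a generic helper, not part of Doleus: `predict_first_n`, `predict_fn`, and `max_images` are hypothetical names, and the toy data stands in for the real dataset and model.

```python
from typing import Any, Callable, List, Optional, Sequence


def predict_first_n(
    dataset: Sequence[Any],
    predict_fn: Callable[[Any], Any],
    max_images: Optional[int] = None,
) -> List[Any]:
    """Apply predict_fn to the image of the first max_images samples (all if None)."""
    n_total = len(dataset)
    n = n_total if max_images is None else max(0, min(max_images, n_total))
    # Each sample is assumed to be a tuple whose first element is the image
    return [predict_fn(dataset[idx][0]) for idx in range(n)]


# Stand-in data: (image, boxes, labels) tuples like the ones the dataset returns
toy_dataset = [(f"img_{i}", [], []) for i in range(5)]
toy_predictions = predict_first_n(toy_dataset, lambda img: {"image": img}, max_images=2)
print(len(toy_predictions))  # 2
```

In this notebook, `predict_fn` could wrap `model([preprocess(img)])[0]` inside `torch.no_grad()`. Note that if you only predict on part of the data, the dataset you wrap later presumably has to cover the same images, for example via `torch.utils.data.Subset`; whether the wrapper accepts that is an assumption to verify.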

In [5]:
import torch
from torchvision.ops import box_convert

predictions = []
with torch.no_grad():
    for idx in range(len(dataset)):
        img, gt_boxes, gt_labels = dataset[idx]

        img_processed = preprocess(img)
        prediction = model([img_processed])[0]

        # Save predictions in the format expected by Doleus
        pred_entry = {
            "boxes": prediction["boxes"].cpu(),
            "labels": prediction["labels"].cpu(),
            "scores": prediction["scores"].cpu(),
        }
        predictions.append(pred_entry)

# Step 5: Create a Doleus Dataset Wrapper and Add Metadata

**Why Wrap the Dataset?**
- Using the detection wrapper (`MoonwatcherDetection` in the code below), we can associate model predictions with the images in the dataset.
- This allows us to apply tests to different data slices.
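
Before handing predictions to the wrapper, it can help to confirm that there is exactly one entry per image and that each entry has matching `boxes`, `labels`, and `scores`. The check below is a minimal sketch under that assumption; `validate_predictions` is a hypothetical helper, not a Doleus function.

```python
from typing import Any, Mapping, Sequence

REQUIRED_KEYS = ("boxes", "labels", "scores")


def validate_predictions(predictions: Sequence[Mapping[str, Any]], n_images: int) -> None:
    """Raise ValueError if the predictions do not line up with the dataset."""
    if len(predictions) != n_images:
        raise ValueError(f"expected {n_images} predictions, got {len(predictions)}")
    for idx, pred in enumerate(predictions):
        missing = [key for key in REQUIRED_KEYS if key not in pred]
        if missing:
            raise ValueError(f"prediction {idx} is missing keys: {missing}")
        # len() works for lists as well as for tensors with at least one dimension
        n_boxes = len(pred["boxes"])
        if len(pred["labels"]) != n_boxes or len(pred["scores"]) != n_boxes:
            raise ValueError(f"prediction {idx} has mismatched lengths")


# Made-up prediction for a single image
toy_predictions = [{"boxes": [[0, 0, 10, 10]], "labels": [1], "scores": [0.9]}]
validate_predictions(toy_predictions, n_images=1)  # passes silently
```

With the real data this would be called as `validate_predictions(predictions, len(dataset))`.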

In [None]:
from moonwatcher.dataset.dataset import MoonwatcherDetection

moonwatcher_dataset = MoonwatcherDetection(
    name="tiny-coco-val-subset",
    dataset=dataset,
    num_classes=91,  # COCO category IDs go up to 90 (80 are in use); index 0 is background
)

In [8]:
# Associate the model predictions with the dataset
moonwatcher_dataset.add_model_predictions(predictions, "faster_rcnn")

## Adding pre-defined metadata
Doleus offers some methods that add metadata out of the box. These metadata are based on classic image statistics.

**Why Does Metadata Matter?**
- Metadata provides additional insights beyond raw labels.
- Using attributes like brightness, contrast, or resolution, we can analyze whether certain factors impact model performance.
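
To make "image statistics" concrete, here is one common way to define brightness: the mean luma of all pixels. This is only an illustration. How Doleus actually computes its "brightness" value is not shown in this notebook, and `mean_brightness` is a hypothetical helper.

```python
from typing import Iterable, Tuple


def mean_brightness(pixels: Iterable[Tuple[int, int, int]]) -> float:
    """Mean luma in [0, 255] of RGB pixels, using the ITU-R BT.601 weights."""
    total = 0.0
    count = 0
    for r, g, b in pixels:
        total += 0.299 * r + 0.587 * g + 0.114 * b
        count += 1
    if count == 0:
        raise ValueError("no pixels given")
    return total / count


# A made-up two-pixel "image": one black pixel, one white pixel
toy_pixels = [(0, 0, 0), (255, 255, 255)]
print(round(mean_brightness(toy_pixels), 1))  # 127.5
```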

In [None]:
# Add predefined metadata ("brightness") to allow slicing based on image properties
moonwatcher_dataset.add_predefined_metadata("brightness")

# You can add other predefined metadata as well and perform slicing based on them
# moonwatcher_dataset.add_predefined_metadata("contrast")
# moonwatcher_dataset.add_predefined_metadata("saturation")
# moonwatcher_dataset.add_predefined_metadata("resolution")


## Adding custom metadata
With Doleus you can also add custom metadata. We have prepared a metadata file that labels each image's location as either "indoor" or "outdoor".

**Custom Metadata Example:**
- The dataset includes a CSV file that assigns images to 'indoor' or 'outdoor' categories.
- This allows us to evaluate if the model behaves differently in varying environments.
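
A quick sanity check on such a file is to confirm that it has one row per image and only the expected category values. The sketch below uses the standard `csv` module on made-up CSV text. Only the `location` column is known from this notebook; the real file may contain further columns, and `summarize_locations` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

ALLOWED_LOCATIONS = {"indoor", "outdoor"}


def summarize_locations(csv_text: str, n_images: int) -> Counter:
    """Count location values, checking row count and allowed categories."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if len(rows) != n_images:
        raise ValueError(f"expected {n_images} rows, got {len(rows)}")
    counts = Counter(row["location"] for row in rows)
    unexpected = set(counts) - ALLOWED_LOCATIONS
    if unexpected:
        raise ValueError(f"unexpected location values: {sorted(unexpected)}")
    return counts


# Made-up CSV content for three images
toy_csv = "location\nindoor\noutdoor\nindoor\n"
counts = summarize_locations(toy_csv, n_images=3)
print(counts["indoor"], counts["outdoor"])  # 2 1
```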

In [10]:
import pandas as pd

metadata = pd.read_csv("tiny_coco/metadata/tiny_coco_metadata.csv")

# For the complete dataset with 50 images, the metadata file has 50 rows.
moonwatcher_dataset.add_metadata_from_dataframe(metadata)

## Creating slices based on metadata
Now we create slices based on the metadata we have defined.
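
Slicing by percentile means computing a threshold from the metadata values and keeping the samples that satisfy a comparison against it. The sketch below illustrates the idea with linear-interpolation percentiles. The exact percentile definition Doleus uses is an assumption here, and both helper names are hypothetical.

```python
import operator
from typing import List, Sequence

OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0 to 100) with linear interpolation between neighbours."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def slice_indices_by_percentile(values: Sequence[float], op: str, q: float) -> List[int]:
    """Indices of the samples whose value compares as requested to the threshold."""
    threshold = percentile(values, q)
    compare = OPERATORS[op]
    return [idx for idx, value in enumerate(values) if compare(value, threshold)]


# Made-up brightness values for four images; the 50th percentile is 100.0
toy_brightness = [40.0, 120.0, 80.0, 200.0]
print(slice_indices_by_percentile(toy_brightness, ">=", 50))  # [1, 3]
print(slice_indices_by_percentile(toy_brightness, "<", 50))   # [0, 2]
```

Using `>=` for one slice and `<` for the other with the same percentile guarantees that every image lands in exactly one of the two slices.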

In [11]:
# Create slices based on brightness percentile
slice_bright = moonwatcher_dataset.slice_by_percentile("brightness", ">=", 50)
slice_dim = moonwatcher_dataset.slice_by_percentile("brightness", "<", 50)

In [12]:
# Create slices based on location
slice_indoor = moonwatcher_dataset.slice_by_metadata_value("location", "indoor")
slice_outdoor = moonwatcher_dataset.slice_by_metadata_value("location", "outdoor")

# Create checks for each of the slices
We can create Checks for each Slice to evaluate and analyze their respective performance.

**What Are Checks?**
- Checks define performance criteria (e.g., `mAP > 0.8`, where mAP is the mean average precision) for a dataset or slice and a model.
- If a Check fails, it indicates that the slice doesn't perform as well as expected.
- That helps you to identify subsets of the data that underperform.
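
Conceptually, a check boils down to comparing a computed metric against a threshold with a given operator. The toy class below sketches only that comparison. It is not the library's `Check` implementation, and the metric values passed in are placeholders rather than real results.

```python
import operator
from dataclasses import dataclass

_OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass
class ToyCheck:
    """Hypothetical stand-in for a check: a name, an operator, and a threshold."""

    name: str
    operator: str
    value: float

    def evaluate(self, metric_value: float) -> bool:
        """Return True if the metric satisfies the criterion."""
        return _OPERATORS[self.operator](metric_value, self.value)


toy_check = ToyCheck(name="map_bright", operator=">", value=0.8)
print(toy_check.evaluate(0.85))  # True
print(toy_check.evaluate(0.8))   # False, because ">" is a strict comparison
```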

In [13]:
from moonwatcher.check import Check

check_map_bright = Check(
    name="map_bright",
    dataset=slice_bright,
    model_id="faster_rcnn",
    metric="mAP",
    operator=">",
    value=0.8,
)

check_map_dim = Check(
    name="map_dim",
    dataset=slice_dim,
    model_id="faster_rcnn",
    metric="mAP",
    operator=">",
    value=0.8,
)

check_map_indoor = Check(
    name="map_indoor",
    dataset=slice_indoor,
    model_id="faster_rcnn",
    metric="mAP",
    operator=">",
    value=0.8,
)

check_map_outdoor = Check(
    name="map_outdoor",
    dataset=slice_outdoor,
    model_id="faster_rcnn",
    metric="mAP",
    operator=">",
    value=0.8,
)

# Combining the Checks into a CheckSuite

**Why Use a CheckSuite?**
- Instead of running individual checks manually, a `CheckSuite` groups multiple checks together.
- This allows for easy evaluation across different dataset slices.
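
The grouping idea can be sketched in a few lines: evaluate every check, record each outcome, and report whether all of them passed. The function below is a simplified illustration with placeholder metric values. The structure of the report produced by the real `CheckSuite` is not shown in this notebook, so the dictionary layout here is an assumption.

```python
import json
from typing import Dict


def run_toy_suite(name: str, thresholds: Dict[str, float], metrics: Dict[str, float]) -> dict:
    """Evaluate 'metric > threshold' for each named check and aggregate the results."""
    results = [
        {
            "check": check_name,
            "threshold": threshold,
            "value": metrics[check_name],
            "passed": metrics[check_name] > threshold,
        }
        for check_name, threshold in thresholds.items()
    ]
    return {"suite": name, "passed": all(r["passed"] for r in results), "checks": results}


# Placeholder metric values, not real results from the model
toy_report = run_toy_suite(
    "check_suite",
    thresholds={"map_bright": 0.8, "map_dim": 0.8},
    metrics={"map_bright": 0.85, "map_dim": 0.6},
)
print(json.dumps(toy_report, indent=2))
```

The suite as a whole fails as soon as one check fails, which is what makes it useful for spotting an underperforming slice at a glance.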


In [16]:
from moonwatcher.check import CheckSuite

check_suite = CheckSuite(
    name="check_suite",
    checks=[check_map_bright, check_map_dim, check_map_indoor, check_map_outdoor],
)

In [None]:
# Now we can run the CheckSuite and save the results as a JSON file
report = check_suite.run_all(show=True)
report.to_json("check_suite_report.json")