# Evaluating Object Detections with FiftyOne

This walkthrough demonstrates how to use FiftyOne to perform hands-on evaluation of your detection model.

It covers the following concepts:

-   Loading a dataset with ground truth detections
-   Adding model predictions to your dataset
-   Performing COCO-style evaluation of the predictions
-   Sorting and searching samples by model performance
-   Performing complex queries on your dataset and visualizing label efficacy

## Setup

In this tutorial, we'll use an off-the-shelf [Faster R-CNN detection model](https://pytorch.org/docs/stable/torchvision/models.html#faster-r-cnn) provided by PyTorch.

To use it, you'll need to install `torch` and `torchvision` if you don't already have them. We'll also need `pycocotools` for evaluation:

In [1]:
# Modify as necessary (e.g., GPU install). See https://pytorch.org for options
!pip install torch
!pip install torchvision

!pip install pycocotools



The following snippet will download the pretrained model from the web and load it:

In [2]:
import torch
import torchvision

# Run the model on GPU if it is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load a pre-trained Faster R-CNN model
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.to(device)
model.eval()

print("Model ready")

Model ready


We'll perform our analysis on the validation split of the [COCO dataset](https://cocodataset.org/#home), which is conveniently available for download via the [FiftyOne Dataset Zoo](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/zoo.html).

The snippet below will download the validation split to `~/fiftyone/coco-2017/validation`:

In [3]:
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("coco-2017", split="validation")

Split 'validation' already downloaded
Loading 'coco-2017' split 'validation'
 100% |█████| 5000/5000 [32.1s elapsed, 0s remaining, 158.8 samples/s]


Let's inspect the dataset to see what we downloaded:

In [4]:
# Print some information about the dataset
print(dataset)

Name:           coco-2017-validation
Num samples:    5000
Persistent:     False
Info:           {'classes': ['0', 'person', 'bicycle', ...]}
Tags:           ['validation']
Sample fields:
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)


In [5]:
# Print a sample ground truth detection
sample = dataset.first()
print(sample.ground_truth.detections[0])

<Detection: {
    'id': '5f46a8e098ce209ec9da498f',
    'label': 'potted plant',
    'bounding_box': array([0.37028125, 0.33453052, 0.03859375, 0.16314554]),
    'confidence': None,
    'attributes': BaseDict({
        'area': <NumericAttribute: {'value': 531.8071000000001}>,
        'iscrowd': <NumericAttribute: {'value': 0.0}>,
    }),
}>


Note that the ground truth detections are stored in the `ground_truth` field of the samples.

## Add predictions to dataset

Now let's generate some predictions to analyze.

The code below randomly selects 100 samples from the dataset, performs inference on each of them with the Faster R-CNN model, and stores the resulting predictions in a `faster_rcnn` field of the samples.

In [6]:
subset = dataset.take(100, seed=51)

In [7]:
import json
import os
from PIL import Image

from torchvision.transforms import functional as func

import fiftyone as fo


# Get class list
classes = dataset.info["classes"]

# Add predictions to dataset
with fo.ProgressBar() as pb:
    for sample in pb(subset):
        # Load image
        image = Image.open(sample.filepath)
        image = func.to_tensor(image).to(device)
        c, h, w = image.shape
        
        # Perform inference
        preds = model([image])[0]
        labels = preds["labels"].cpu().detach().numpy()
        scores = preds["scores"].cpu().detach().numpy()
        boxes = preds["boxes"].cpu().detach().numpy()
        
        # Convert detections to FiftyOne format
        detections = []
        for label, score, box in zip(labels, scores, boxes):
            # Convert to [top-left-x, top-left-y, width, height]
            # in relative coordinates in [0, 1] x [0, 1]
            x1, y1, x2, y2 = box
            rel_box = [x1 / w, y1 / h, (x2 - x1) / w, (y2 - y1) / h]

            detections.append(fo.Detection(
                label=classes[label],
                bounding_box=rel_box,
                confidence=score
            ))
        
        # Save predictions to dataset
        sample["faster_rcnn"] = fo.Detections(detections=detections)
        sample.save()

print("Finished adding predictions")

 100% |█████| 100/100 [133.2ms elapsed, 13.2s remaining, 7.5 samples/s]
Finished adding predictions
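
To make the box conversion in the loop above concrete, here is a minimal standalone sketch of the same arithmetic. The helper name `to_relative_xywh` is hypothetical (it is not part of FiftyOne or torchvision), and it assumes the model returns absolute `[x1, y1, x2, y2]` pixel coordinates, as in the code above.

```python
def to_relative_xywh(box, img_width, img_height):
    """Convert an absolute [x1, y1, x2, y2] box to relative [x, y, w, h].

    Hypothetical helper for illustration; mirrors the arithmetic used in
    the inference loop.
    """
    x1, y1, x2, y2 = box
    return [
        x1 / img_width,          # top-left x as a fraction of image width
        y1 / img_height,         # top-left y as a fraction of image height
        (x2 - x1) / img_width,   # box width as a fraction of image width
        (y2 - y1) / img_height,  # box height as a fraction of image height
    ]


# A 100 x 50 pixel box with its top-left corner at (200, 100) in a 400 x 200 image
rel_box = to_relative_xywh([200, 100, 300, 150], img_width=400, img_height=200)
print(rel_box)  # [0.5, 0.5, 0.25, 0.25]
```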


## Evaluate detections

Now that we have a dataset with ground truth and predicted objects, let's use FiftyOne to evaluate the quality of the detections.

We'll start with some basic analysis of the predictions according to their confidence scores; then we'll compute true/false positives for each sample and analyze those.

### Applying a confidence threshold

FiftyOne provides the ability to [write expressions](https://voxel51.com/docs/fiftyone/user_guide/using_views.html#filtering) that match, filter, and sort detections based on their attributes. See [using DatasetViews](https://voxel51.com/docs/fiftyone/user_guide/using_views.html) for full details.

For example, let's generate a view that contains only detections whose `confidence` is greater than `0.75`:

In [8]:
from fiftyone import ViewField as F

# Only keep detections with confidence > 0.75
high_conf_view = subset.filter_detections("faster_rcnn", F("confidence") > 0.75)

In [9]:
# Print some information about the view
print(high_conf_view)

Dataset:        coco-2017-validation
Num samples:    100
Tags:           ['validation']
Sample fields:
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    faster_rcnn:  fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
Pipeline stages:
    1. Take(size=100, seed=51)
    2. FilterDetections(field='faster_rcnn', filter={'$gt': ['$$this.confidence', 0.75]})


In [10]:
# Print a sample prediction from the view
sample = high_conf_view.first()
print(sample.faster_rcnn.detections[0])

<Detection: {
    'id': '5f46a90698ce209ec9db7c8b',
    'label': 'dog',
    'bounding_box': array([0.38723122, 0.5448051 , 0.56090901, 0.40119233]),
    'confidence': 0.9992108345031738,
    'attributes': BaseDict({}),
}>
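
As a rough mental model of what the filter does, the following plain-Python sketch applies the same `confidence > 0.75` test to lists of dictionaries. It is only an illustration, not how FiftyOne implements `filter_detections()`; the function name and the data layout are made up for this example. Note that a sample whose detections are all filtered out is still kept, which is consistent with the view above still reporting 100 samples.

```python
def filter_by_confidence(samples, threshold):
    """Keep only detections whose confidence is strictly above `threshold`.

    `samples` is a list of samples, each represented as a list of
    detection dicts (a hypothetical layout used only for this sketch).
    """
    return [
        [det for det in detections if det["confidence"] > threshold]
        for detections in samples
    ]


samples = [
    [{"label": "dog", "confidence": 0.99}, {"label": "cat", "confidence": 0.40}],
    [{"label": "person", "confidence": 0.75}],  # exactly 0.75 does not pass `>`
]

filtered = filter_by_confidence(samples, 0.75)
print([len(detections) for detections in filtered])  # [1, 0]
```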


Suppose we want to study these high confidence detections (those with `confidence > 0.75`) in more detail.

We can conveniently do that by creating a new field on our samples that contains only the detections from the filtered view:

In [11]:
# Create a new `faster_rcnn_75` field on `dataset` that contains the detections
# from the `faster_rcnn` field of the samples in `high_conf_view`
new_field = "faster_rcnn_75"
dataset.clone_field("faster_rcnn", new_field, samples=high_conf_view)

# Verify that the new field was created
print(dataset)

 100% |█████| 100/100 [4.3s elapsed, 0s remaining, 32.6 samples/s]      
Name:           coco-2017-validation
Num samples:    5000
Persistent:     False
Info:           {'classes': ['0', 'person', 'bicycle', ...]}
Tags:           ['validation']
Sample fields:
    filepath:       fiftyone.core.fields.StringField
    tags:           fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:       fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth:   fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    faster_rcnn:    fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    faster_rcnn_75: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)


### Perform COCO-style evaluation

Now let's evaluate the quality of the detections in the `faster_rcnn_75` field of our samples with respect to the ground truth detections in the `ground_truth` field.

FiftyOne provides a `fiftyone.utils.eval` module that contains a collection of utility methods for performing evaluation of model predictions.

In particular, we'll use the [evaluate_detections()](https://voxel51.com/docs/fiftyone/api/fiftyone.utils.eval.coco.html#fiftyone.utils.eval.coco.evaluate_detections) method, which performs the following for each sample:

- Matches predicted and ground truth detections across a range of [Intersection over Union (IoU)](https://en.wikipedia.org/wiki/Jaccard_index) values: `[0.50, 0.55, ..., 0.90, 0.95]`

- Computes true positive (TP), false positive (FP), and false negative (FN) counts for each sample

- Stores this information in the dataset
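
For reference, IoU can be sketched in a few lines for boxes in the relative `[top-left-x, top-left-y, width, height]` format used in this tutorial. This is a simplified illustration rather than the code that `pycocotools` runs; the function name `iou` and the variable `iou_thresholds` are our own.

```python
def iou(box_a, box_b):
    """Intersection over Union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the overlapping rectangle (0 if the boxes are disjoint)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds listed above: 0.50, 0.55, ..., 0.95
iou_thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]

value = iou([0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.375])
print(value)  # 0.75

# Thresholds at which this pair of boxes would count as a match
print([t for t in iou_thresholds if value >= t])  # [0.5, 0.55, 0.6, 0.65, 0.7, 0.75]
```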


In [12]:
import fiftyone.utils.eval as foue

foue.evaluate_detections(subset, "faster_rcnn_75", gt_field="ground_truth")

Evaluating detections...
 100% |███| 100/100 [4.6s elapsed, 0s remaining, 27.5 samples/s]      


Let's inspect the contents of the dataset to see what information was added:

In [13]:
# Print the schema of the dataset
print(subset)

Dataset:        coco-2017-validation
Num samples:    100
Tags:           ['validation']
Sample fields:
    filepath:       fiftyone.core.fields.StringField
    tags:           fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:       fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth:   fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    faster_rcnn:    fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    faster_rcnn_75: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    tp_iou_0_75:    fiftyone.core.fields.IntField
    fp_iou_0_75:    fiftyone.core.fields.IntField
    fn_iou_0_75:    fiftyone.core.fields.IntField
Pipeline stages:
    1. Take(size=100, seed=51)


Each sample now contains new fields `tp_iou_0_75`, `fp_iou_0_75`, and `fn_iou_0_75` corresponding to the total true positive (TP), false positive (FP), and false negative (FN) counts for the detections in the `faster_rcnn_75` field of the samples at an IoU of 0.75 (this value can be customized via the `save_iou` argument of the evaluation method).
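
These per-sample counts can be rolled up into aggregate metrics. The sketch below computes precision and recall from `(tp, fp, fn)` tuples; the function name `precision_recall` and the toy counts are assumptions made for illustration, not values from this dataset.

```python
def precision_recall(counts):
    """Aggregate per-sample (tp, fp, fn) tuples into precision and recall."""
    tp = fp = fn = 0
    for sample_tp, sample_fp, sample_fn in counts:
        tp += sample_tp
        fp += sample_fp
        fn += sample_fn

    # Guard against division by zero when there are no predictions or no objects
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Toy counts for three samples: totals are tp=6, fp=2, fn=2
counts = [(3, 1, 0), (2, 0, 2), (1, 1, 0)]
precision, recall = precision_recall(counts)
print(precision, recall)  # 0.75 0.75
```

With the fields described above, the tuples could be collected with something like `[(s["tp_iou_0_75"], s["fp_iou_0_75"], s["fn_iou_0_75"]) for s in subset]`, assuming every sample in `subset` has been evaluated.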

In addition, the predictions in the `faster_rcnn_75` field of each sample contain a new `ground_truth_eval` field that tabulates TP, FP, and FN counts for each IoU under test.

Finally, the individual detections in the `faster_rcnn_75` field of each sample have a new `ground_truth_eval` field that contains:

- An `eval_id` field that specifies a UUID for the detection

- An `ious` field that contains the IoUs for every class of that detection with respect to the ground truth detections of that class

- A `matches` field that lists the `eval_id` and `iou` for the matching ground truth detection (if any) using the matching algorithm provided by `pycocotools`
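
To build intuition for how TP, FP, and FN counts arise at a single IoU threshold, here is a deliberately simplified greedy matcher. It is not the `pycocotools` algorithm (for instance, it ignores `iscrowd` and evaluates one threshold only), and the names `iou` and `match_detections` as well as the dict layout are assumptions made for this sketch.

```python
def iou(box_a, box_b):
    """Intersection over Union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds, gts, iou_thresh=0.75):
    """Greedily match predictions to ground truth; return (tp, fp, fn)."""
    unmatched = list(range(len(gts)))  # indices of ground truth boxes not yet used
    tp = fp = 0

    # Visit predictions from most to least confident
    for pred in sorted(preds, key=lambda d: d["confidence"], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx in unmatched:
            gt = gts[idx]
            if gt["label"] != pred["label"]:
                continue  # only match boxes of the same class
            value = iou(pred["bounding_box"], gt["bounding_box"])
            if value >= iou_thresh and value > best_iou:
                best_iou, best_idx = value, idx

        if best_idx is None:
            fp += 1  # no unused ground truth box overlaps enough
        else:
            tp += 1
            unmatched.remove(best_idx)  # each ground truth box matches at most once

    fn = len(unmatched)  # ground truth boxes that no prediction matched
    return tp, fp, fn


gts = [
    {"label": "dog", "bounding_box": [0.0, 0.0, 0.5, 0.5]},
    {"label": "cat", "bounding_box": [0.5, 0.5, 0.25, 0.25]},
]
preds = [
    {"label": "dog", "bounding_box": [0.0, 0.0, 0.5, 0.5], "confidence": 0.9},
    {"label": "dog", "bounding_box": [0.0, 0.0, 0.5, 0.25], "confidence": 0.8},
    {"label": "person", "bounding_box": [0.1, 0.1, 0.2, 0.2], "confidence": 0.95},
]

print(match_detections(preds, gts))  # (1, 2, 1)
```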

## Visualize detections

Now let's visualize this evaluation information that we've added to our dataset!

### Visualizing bounding boxes

First, let's launch the [FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html) and view the ground truth and predicted bounding boxes:

In [14]:
session = fo.launch_app(dataset=dataset)
session.view = subset

App launched


![launch](images/eval_dets/eval_dets_1.png)

Each field of the samples is shown as a togglable bubble in the left sidebar. Use these bubbles to control whether ground truth detections (`ground_truth`), raw predictions (`faster_rcnn`), or high confidence predictions (`faster_rcnn_75`) are rendered on the images:

![bubbles](images/eval_dets/eval_dets_2.png)

### Manually select samples of interest

You can select images in the App by clicking on them. Then, you can create a view that contains only those samples:

In [15]:
# The currently selected images in the App
selected_samples = session.selected

# Create a new view that contains only the selected samples
# And open this view in the App!
session.view = dataset.select(selected_samples)

![selected](images/eval_dets/eval_dets_3.png)

Let's reset the session to show the entire dataset again:

In [16]:
# Resets the session; the entire dataset will now be shown
session.view = None

### View the best-performing samples

Recall that a `tp_iou_0_75` field was added to each sample that tabulates the number of true positive detections in that sample.

Let's create a view that sorts by `tp_iou_0_75` so we can see the best-performing cases of our model (i.e., the samples with the most correct predictions):

In [17]:
# Show samples with most true positives first
session.view = subset.sort_by("tp_iou_0_75", reverse=True)

![tprev](images/eval_dets/eval_dets_4.png)

### View the worst-performing samples

Similarly, we can sort by the `fp_iou_0_75` field to see the worst-performing cases of our model (i.e., the samples with the most false positive predictions):

In [18]:
# Show samples with most false positives first
session.view = subset.sort_by("fp_iou_0_75", reverse=True)

![fprev](images/eval_dets/eval_dets_5.png)

### Filtering by bounding box area

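`DatasetView` pipelines can also filter on quantities computed from the labels. For example, let's look at how our model performed on small objects by creating a view that contains only detections whose bounding box area is less than `0.005`. Because bounding boxes are stored in relative coordinates, this area is a fraction of the total image area: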

In [19]:
# [top-left-x, top-left-y, width, height]
bbox_area = F("bounding_box")[2] * F("bounding_box")[3]

# Create a view that contains only predictions whose area is < 0.005
small_boxes_view = subset.filter_detections("faster_rcnn_75", bbox_area < 0.005)

session.view = small_boxes_view

![small](images/eval_dets/eval_dets_6.png)
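
Since the `0.005` threshold is a fraction of the image area, the same threshold corresponds to different pixel areas at different resolutions. The plain-Python sketch below shows the arithmetic behind the `bbox_area` expression; the helper names `relative_area` and `to_pixel_area` are hypothetical and exist only for this example.

```python
def relative_area(bounding_box):
    """Area of a relative [x, y, width, height] box as a fraction of the image."""
    _, _, width, height = bounding_box
    return width * height


def to_pixel_area(bounding_box, img_width, img_height):
    """Approximate pixel area of a relative box for a given image size."""
    return relative_area(bounding_box) * img_width * img_height


boxes = [
    [0.1, 0.1, 0.05, 0.05],  # covers about 0.25% of the image
    [0.2, 0.2, 0.5, 0.5],    # covers 25% of the image
]

# Same test as `bbox_area < 0.005` in the view above
small_boxes = [box for box in boxes if relative_area(box) < 0.005]
print(len(small_boxes))  # 1

# A box covering 12.5% of a 640 x 480 image
print(to_pixel_area([0.0, 0.0, 0.25, 0.5], 640, 480))  # 38400.0
```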

### Viewing detections in a crowd

Recall that our ground truth annotations from the COCO dataset have an `iscrowd = 0/1` attribute that indicates whether a box contains multiple instances of the same object.

Let's create a view that contains only samples with at least one detection for which `iscrowd` is 1:

In [20]:
# Create a view that contains only samples for which at least one detection has 
# its iscrowd attribute set to 1
crowded_images_view = subset.match(
    F("ground_truth.detections").filter(F("attributes.iscrowd.value") == 1).length() > 0
)

session.view = crowded_images_view

![crowd](images/eval_dets/eval_dets_7.png)
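
The `match()` expression above reads as "keep the sample if filtering its ground truth detections down to those with `iscrowd == 1` leaves at least one". A plain-Python equivalent of that logic might look like the following; the flat dict layout and the `has_crowd` helper are assumptions made for illustration and do not reflect how FiftyOne stores or queries labels.

```python
def has_crowd(detections):
    """Return True if at least one detection is flagged as a crowd."""
    return any(det["iscrowd"] == 1 for det in detections)


# Hypothetical samples, keyed by file name
samples = {
    "a.jpg": [{"label": "person", "iscrowd": 0}],
    "b.jpg": [{"label": "person", "iscrowd": 1}, {"label": "person", "iscrowd": 0}],
    "c.jpg": [],  # no ground truth detections at all
}

crowded = [name for name, detections in samples.items() if has_crowd(detections)]
print(crowded)  # ['b.jpg']
```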

### More complex insights

Let's combine our previous operations to form more complex queries that provide deeper insight into the quality of our detections.

For example, let's sort our view of crowded images from the previous section in decreasing order of false positive counts, so that we can see samples that have many (allegedly) spurious predictions in images that are known to contain crowds of objects:

In [21]:
sorted_crowded_images_view = crowded_images_view.sort_by(
    "fp_iou_0_75", reverse=True
)

session.view = sorted_crowded_images_view

![crowdsort](images/eval_dets/eval_dets_8.png)

Let's compare the above view to another view that just sorts by false positive count, regardless of whether the image is crowded:

In [22]:
session.view = subset.sort_by("fp_iou_0_75", reverse=True)

![fprev](images/eval_dets/eval_dets_5.png)

See anything interesting?

Comparing the individual examples, we see that the samples that contain many false positives are the ones where the underlying ground truth bounding box was missing the `iscrowd` attribute! The effect of this omission is that crowds of correct predictions are evaluated as false positives even though they are true positives.

In other words, the quality of the model may not be responsible for the purportedly low quantitative performance of the detections; in fact, the ground truth annotations from the COCO dataset should be refined to fix the missing `iscrowd` annotations!
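
The toy example below sketches why a missing `iscrowd` flag inflates false positive counts. It assumes the COCO convention that overlap with a crowd region is measured relative to the prediction's own area and that predictions falling inside a crowd region are ignored rather than penalized. It is a simplified model of that behavior, not the `pycocotools` implementation, and all function names are our own.

```python
def box_area(box):
    """Area of a relative [x, y, width, height] box."""
    return box[2] * box[3]


def intersection_area(box_a, box_b):
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    return inter_w * inter_h


def overlap(pred_box, gt_box, iscrowd):
    """IoU for regular boxes; intersection over prediction area for crowds."""
    inter = intersection_area(pred_box, gt_box)
    if iscrowd:
        denom = box_area(pred_box)
    else:
        denom = box_area(pred_box) + box_area(gt_box) - inter
    return inter / denom if denom > 0 else 0.0


def count_false_positives(pred_boxes, gt_box, iscrowd, iou_thresh=0.75):
    """Count false positives against a single ground truth box."""
    fp = 0
    gt_taken = False
    for pred_box in pred_boxes:
        hit = overlap(pred_box, gt_box, iscrowd) >= iou_thresh
        if hit and iscrowd:
            continue  # absorbed by the crowd region: neither TP nor FP
        if hit and not gt_taken:
            gt_taken = True  # first match to a regular box is a true positive
            continue
        fp += 1
    return fp


# One wide ground truth box around a row of four people,
# and four correct per-person predictions inside it
crowd_region = [0.0, 0.0, 1.0, 0.5]
people = [[0.25 * i, 0.0, 0.25, 0.5] for i in range(4)]

print(count_false_positives(people, crowd_region, iscrowd=True))   # 0
print(count_false_positives(people, crowd_region, iscrowd=False))  # 4
```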

This conclusion would have been nearly impossible to reach without visually inspecting the individual samples in the dataset according to the variety of criteria that we considered in this tutorial.

FiftyOne enables rapid experimentation with your datasets!