# Remove Duplicate Objects

This recipe demonstrates a simple workflow for finding and removing duplicate objects in your FiftyOne datasets using [intersection over union (IoU)](https://en.wikipedia.org/wiki/Jaccard_index).

Specifically, it covers:

- Using the [`compute_max_ious()`](https://docs.voxel51.com/api/fiftyone.utils.iou.html#fiftyone.utils.iou.compute_max_ious) utility to compute overlap between spatial objects
- Using the App’s [tagging UI](https://docs.voxel51.com/user_guide/app.html#tags-and-tagging) to review and delete duplicate labels
- Using the [`find_duplicates()`](https://docs.voxel51.com/api/fiftyone.utils.iou.html#fiftyone.utils.iou.find_duplicates) utility to automatically detect duplicate objects



## Setup

If you haven't already, install FiftyOne:

In [None]:
!pip install fiftyone

## Load a Dataset

This walkthrough uses a 1,000-sample subset of the [MSCOCO 2017](https://cocodataset.org/#home) validation split from the [FiftyOne Dataset Zoo](https://docs.voxel51.com/user_guide/dataset_zoo/datasets.html#coco-2017). We can load it as follows:

In [None]:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("coco-2017", split="validation", max_samples=1000)

## Finding Duplicate Objects

Now let’s use the [`compute_max_ious()`](https://docs.voxel51.com/api/fiftyone.utils.iou.html#fiftyone.utils.iou.compute_max_ious) utility to compute, for each object in the `ground_truth` field, the maximum IoU with any other object of the same class (`classwise=True`) in the same image.

The max IoU will be stored in a `max_iou` attribute of each object. The idea is that a duplicate object will necessarily have a high [IoU](https://en.wikipedia.org/wiki/Jaccard_index) with another object.

In [2]:
import fiftyone.utils.iou as foui

foui.compute_max_ious(dataset, "ground_truth", iou_attr="max_iou", classwise=True)
print("Max IoU range: (%f, %f)" % dataset.bounds("ground_truth.detections.max_iou"))

 100% |███████████████| 1000/1000 [972.3ms elapsed, 0s remaining, 1.0K samples/s]       
Max IoU range: (0.000000, 0.951640)
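
To make the quantity stored in `max_iou` concrete, here is a minimal pure-Python sketch of the idea. It is an illustration, not FiftyOne's implementation: the helper names `box_iou` and `max_ious` and the toy `objects` list are hypothetical, and the boxes are assumed to use FiftyOne's `[top-left-x, top-left-y, width, height]` relative-coordinate convention.

```python
def box_iou(a, b):
    """IoU of two boxes given as [x, y, width, height]."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping rectangle (zero if disjoint)
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def max_ious(objects, classwise=True):
    """For each (label, box) pair, the highest IoU with any other object."""
    result = []
    for i, (label_i, box_i) in enumerate(objects):
        best = 0.0
        for j, (label_j, box_j) in enumerate(objects):
            # Skip the object itself and, if classwise, other classes
            if i == j or (classwise and label_i != label_j):
                continue
            best = max(best, box_iou(box_i, box_j))
        result.append(best)
    return result


# Hypothetical objects from a single image
objects = [
    ("cat", [0.10, 0.10, 0.40, 0.40]),
    ("cat", [0.12, 0.10, 0.40, 0.40]),  # near-duplicate of the first cat
    ("dog", [0.10, 0.10, 0.40, 0.40]),  # same box as the first cat, other class
    ("cat", [0.70, 0.70, 0.20, 0.20]),  # overlaps nothing
]
print(max_ious(objects))
```

In this toy example the two overlapping cats receive the same high value, while the dog scores zero because `classwise=True` only compares objects that share a label.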


Note that [`compute_max_ious()`](https://docs.voxel51.com/api/fiftyone.utils.iou.html#fiftyone.utils.iou.compute_max_ious) provides an optional `other_field` parameter if you would like to compute IoUs between objects in different fields instead.

In any case, let’s create a [view](https://docs.voxel51.com/user_guide/using_views.html#filtering-sample-contents) that contains only labels with a max IoU > 0.75:

In [3]:
from fiftyone import ViewField as F

# Retrieve detections that overlap above a chosen threshold
dups_view = dataset.filter_labels("ground_truth", F("max_iou") > 0.75)
print(dups_view)

Dataset:     coco-2017-validation-1000
Media type:  image
Num samples: 7
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
View stages:
    1. FilterLabels(field='ground_truth', filter={'$gt': ['$$this.max_iou', 0.75]}, only_matches=True, trajectories=False)


In [None]:
session = fo.launch_app(view=dups_view)

![dups_view](../assets/find_duplicates.png)
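
One thing to keep in mind when reviewing `dups_view`: IoU is symmetric, so both members of an overlapping pair pass the `max_iou > 0.75` filter, and removing every label in the view would delete the original along with its copy. The sketch below shows one possible way to flag only the later object of each pair. It is a greedy illustration under stated assumptions, not the algorithm FiftyOne uses: `find_duplicate_ids`, `box_iou`, and the toy data are hypothetical names, and boxes are assumed to be `[x, y, width, height]`.

```python
def box_iou(a, b):
    """IoU of two boxes given as [x, y, width, height]."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def find_duplicate_ids(objects, iou_thresh=0.75):
    """Greedy sketch: keep the first object seen and flag any later
    same-class object whose IoU with a kept object exceeds the threshold."""
    kept, dup_ids = [], []
    for obj_id, label, box in objects:
        is_dup = any(
            label == kept_label and box_iou(box, kept_box) > iou_thresh
            for kept_label, kept_box in kept
        )
        if is_dup:
            dup_ids.append(obj_id)
        else:
            kept.append((label, box))
    return dup_ids


# Hypothetical (id, label, box) triples from a single image
objects = [
    ("a", "cat", [0.10, 0.10, 0.40, 0.40]),
    ("b", "cat", [0.12, 0.10, 0.40, 0.40]),  # overlaps "a" heavily
    ("c", "dog", [0.10, 0.10, 0.40, 0.40]),  # same box as "a", other class
    ("d", "cat", [0.70, 0.70, 0.20, 0.20]),  # overlaps nothing
]
dup_ids = find_duplicate_ids(objects)
print(dup_ids)
```

Which object of a pair to keep is a judgment call; this sketch simply keeps whichever appears first, which is why visual review in the App remains useful.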

In FiftyOne, we can tag these samples and export them for an annotation job with one of the labeling integrations: CVAT, Label Studio, V7, or Labelbox. This can get our dataset back into tip-top shape to train again!