# Video Object Detection and Tracking

In [None]:
!pip install -U fiftyone ultralytics deep-sort-realtime

Load an example video dataset and remove its ground truth fields, so that the detection and tracking workflow below starts from unlabeled videos:

In [1]:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart-video")

# Keep only the default fields, permanently deleting the ground truth labels
dataset.select_fields().keep_fields()

Dataset already downloaded
Loading existing dataset 'quickstart-video'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use


In [2]:
session = fo.launch_app(dataset, auto=False)

Session launched. Run `session.show()` to open the App in a cell output.


Download and load an object detection model (YOLOv8l) from Ultralytics:

In [4]:
from ultralytics import YOLO
model = YOLO('yolov8l.pt')

Downloading https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8l.pt to 'yolov8l.pt'...


100%|██████████| 83.7M/83.7M [00:14<00:00, 6.02MB/s]


Apply the model to all frames across all videos in the dataset:

In [5]:
dataset.apply_model(model, label_field="yolo_v8l")

Computing metadata...
 100% |███████████████████| 10/10 [166.4ms elapsed, 0s remaining, 60.1 samples/s] 
 100% |███████████████| 1279/1279 [5.3m elapsed, 0s remaining, 4.1 samples/s]       


In [6]:
session.refresh()

Inspect the first frame of the first video (frame numbers are 1-based):

In [9]:
dataset.first().frames[1]

<Frame: {
    'id': '66421a3d44b3ca4add818000',
    'frame_number': 1,
    'yolo_v8l': <Detections: {
        'detections': [
            <Detection: {
                'id': '66436ef9307856cf2faaec5b',
                'attributes': {},
                'tags': [],
                'label': 'stop sign',
                'bounding_box': [
                    0.9099450334906578,
                    0.4144589528441429,
                    0.06314297020435333,
                    0.1033763438463211,
                ],
                'mask': None,
                'confidence': 0.911078155040741,
                'index': None,
            }>,
            <Detection: {
                'id': '66436ef9307856cf2faaec5c',
                'attributes': {},
                'tags': [],
                'label': 'car',
                'bounding_box': [
                    0.8739790096879005,
                    0.7539754882454872,
                    0.1255653351545334,
                    0.238774105906
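
FiftyOne stores each `bounding_box` as `[top-left-x, top-left-y, width, height]` in relative coordinates in `[0, 1]`. The following is a minimal sketch of converting such a box to pixel coordinates. The helper name `to_pixel_box` and the 1280x720 frame size are illustrative assumptions, not values taken from this dataset.

```python
def to_pixel_box(bounding_box, frame_width, frame_height):
    """Convert a relative [x, y, w, h] box to integer pixel (x1, y1, x2, y2)."""
    x, y, w, h = bounding_box
    x1 = round(x * frame_width)
    y1 = round(y * frame_height)
    x2 = round((x + w) * frame_width)
    y2 = round((y + h) * frame_height)
    return x1, y1, x2, y2


# Box of the 'stop sign' detection shown above; the frame size is assumed
stop_sign_box = [
    0.9099450334906578,
    0.4144589528441429,
    0.06314297020435333,
    0.1033763438463211,
]
print(to_pixel_box(stop_sign_box, 1280, 720))
```

In the notebook itself, the real dimensions should be available from `sample.metadata.frame_width` and `sample.metadata.frame_height`, since metadata was computed when the model was applied.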

Filter the detections to keep only those with confidence above 0.75, then compare the detection counts before and after filtering:

In [19]:
from fiftyone import ViewField as F
view = dataset.filter_labels("frames.yolo_v8l", F("confidence") > 0.75)

print(dataset.count("frames.yolo_v8l.detections"))
print(view.count("frames.yolo_v8l.detections"))

10813
3708
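
In the run above, the threshold keeps 3708 of 10813 detections, or about 34%. Conceptually, `filter_labels` returns a view containing only the labels that satisfy the expression and leaves the underlying dataset unchanged. A plain-Python sketch of the same thresholding, using hypothetical detections stored as dictionaries rather than FiftyOne objects (the helper names are assumptions for illustration):

```python
def filter_by_confidence(frames, threshold):
    """Return a new mapping that keeps only detections above `threshold`."""
    return {
        frame_number: [d for d in detections if d["confidence"] > threshold]
        for frame_number, detections in frames.items()
    }


def count_detections(frames):
    """Total number of detections across all frames."""
    return sum(len(detections) for detections in frames.values())


# Hypothetical detections keyed by 1-based frame number
frames = {
    1: [
        {"label": "stop sign", "confidence": 0.91},
        {"label": "car", "confidence": 0.42},
    ],
    2: [{"label": "car", "confidence": 0.80}],
}

high_conf = filter_by_confidence(frames, 0.75)
print(count_detections(frames), count_detections(high_conf))

# Fraction of detections retained in the notebook run above
print(f"{3708 / 10813:.1%}")
```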


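Track objects across frames using DeepSort. Note that this call runs on the full `dataset`, not the filtered `view`, and writes its results to the `frames.ds_tracks` field used below. It may take a while (about 15 minutes in the run shown here):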

In [20]:
from fiftyone.utils.tracking import DeepSort

DeepSort.track(dataset, "frames.yolo_v8l")

 100% |███████████████████| 10/10 [14.8m elapsed, 0s remaining, 0.0 samples/s]   


In [21]:
session.refresh()

![Video Tracking](../assets/video_tracking.gif)
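
To give a rough intuition for what a tracker does, the sketch below links boxes in consecutive frames by greedy IoU (intersection over union) matching and assigns each chain a persistent ID. This is a deliberately simplified stand-in and not the DeepSORT algorithm, which additionally uses Kalman-filter motion prediction and appearance embeddings. The function names, the example boxes, and the `iou_threshold` value are all assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two relative [x, y, w, h] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def assign_track_ids(frames, iou_threshold=0.3):
    """Greedily link boxes across consecutive frames; returns IDs per frame."""
    next_id = 1
    previous = []  # (track_id, box) pairs from the preceding frame
    all_ids = []
    for boxes in frames:
        current = []
        used = set()
        for box in boxes:
            # Best-overlapping track from the previous frame not yet claimed
            candidates = [
                (iou(box, prev_box), track_id)
                for track_id, prev_box in previous
                if track_id not in used
            ]
            best_iou, best_id = max(candidates, default=(0.0, None))
            if best_id is None or best_iou < iou_threshold:
                # No sufficiently overlapping track: start a new one
                best_id = next_id
                next_id += 1
            used.add(best_id)
            current.append((best_id, box))
        previous = current
        all_ids.append([track_id for track_id, _ in current])
    return all_ids


# Hypothetical boxes for three frames: two objects, one leaves after frame 2
example_frames = [
    [[0.10, 0.10, 0.20, 0.20], [0.60, 0.60, 0.20, 0.20]],
    [[0.62, 0.60, 0.20, 0.20], [0.12, 0.10, 0.20, 0.20]],
    [[0.14, 0.10, 0.20, 0.20]],
]
print(assign_track_ids(example_frames))
```

In the notebook's output field, the analogous identifier is the `index` attribute of each detection, which is `None` before tracking, as seen in the frame printed earlier.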

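Convert the dataset to a trajectories view, which contains one clip per tracked object, and load it in the App to browse the tracks one by one: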

In [24]:
tracks_view = dataset.to_trajectories("frames.ds_tracks")

In [80]:
session.view = tracks_view.view()

![Video Trajectories](../assets/video_trajectories.gif)
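
A trajectory in this view is defined by the `(label, index)` pair of the detections in the tracked field. The grouping can be sketched in plain Python as follows. The data and the helper name `group_trajectories` are hypothetical, and skipping detections whose `index` is `None` is an assumption of this sketch rather than documented behavior.

```python
from collections import defaultdict


def group_trajectories(frames):
    """Map each (label, index) pair to the frame numbers where it appears."""
    trajectories = defaultdict(list)
    for frame_number in sorted(frames):
        for det in frames[frame_number]:
            # Assumption: detections without a track index are skipped
            if det["index"] is not None:
                key = (det["label"], det["index"])
                trajectories[key].append(frame_number)
    return dict(trajectories)


# Hypothetical tracked detections keyed by 1-based frame number
frames = {
    1: [{"label": "car", "index": 1}, {"label": "stop sign", "index": 2}],
    2: [{"label": "car", "index": 1}],
    3: [{"label": "car", "index": 1}, {"label": "car", "index": 3}],
}

for key, frame_numbers in group_trajectories(frames).items():
    print(key, frame_numbers)
```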