The massive sea of computer vision models can be difficult to navigate if you are trying to find the best models or even just relevant baselines for your task. Model zoos like TensorFlow Hub and Facebook’s Detectron2 make it easy to access popular models. Further, libraries like PyTorch Lightning make it easy to then modify these models to suit your needs. This is all well and good for images, but for videos, it’s another story. Video data is becoming increasingly popular, but the additional complexity that comes with it often leaves video-related tasks on the back burner.

PyTorchVideo provides access to a video model zoo, video data processing functions, and a video-focused accelerator for deploying models, all backed by PyTorch, which allows for seamless integration into existing workflows.

The only thing missing from PyTorchVideo to complete your video workflows is a way to visualize your datasets and interpret your model results. This is where FiftyOne comes in. FiftyOne is an open-source tool that I have been working on at Voxel51. It is designed to make it easy to visualize any image or video dataset and explore ground truth and predicted labels stored locally or in the cloud. The flexible representation of FiftyOne datasets and the FiftyOne App let you quickly get hands-on with your datasets and interpret your models to find failure modes and annotation mistakes, visualize complex labels, and more.

In [3]:
!pip install fiftyone torch torchvision

While PyTorchVideo is also installable through pip, the functionality in this post requires it to be installed through GitHub:

In [5]:
!git clone https://github.com/facebookresearch/pytorchvideo.git
%cd pytorchvideo
!pip install -e .

Cloning into 'pytorchvideo'...
remote: Enumerating objects: 1411, done.
remote: Counting objects: 100% (405/405), done.
remote: Compressing objects: 100% (238/238), done.
remote: Total 1411 (delta 233), reused 262 (delta 152), pack-reused 1006
Receiving objects: 100% (1411/1411), 5.43 MiB | 6.14 MiB/s, done.
Resolving deltas: 100% (718/718), done.
/content/pytorchvideo
Obtaining file:///content/pytorchvideo
Collecting fvcore>=0.1.4
  Downloading https://files.pythonhosted.org/packages/f5/f2/2873bf64ce3e7bc2ef1b5f389729c18925edab7bfefd909235aadf73ef50/fvcore-0.1.5.post20210609.tar.gz (49kB)
     |████████████████████████████████| 51kB 2.9MB/s
Collecting av
  Downloading https://files.pythonhosted.org/packages/66/ff/bacde7314c646a2bd2f240034809a10cc3f8b096751284d0828640fff3dd/av-8.0.3-cp37-cp37m-manylinux2010_x86_64.whl (37.2MB)
     |████████████████████████████████| 37.2MB 1.4MB/s
Collecting parameterized
  Downloading https://files.pythonhos

This walkthrough uses a subset of the Kinetics-400 dataset, which can be downloaded with the following code snippet:

In [6]:
!pip install youtube-dl
!wget https://storage.googleapis.com/deepmind-media/Datasets/kinetics400.tar.gz
!tar -xvf ./kinetics400.tar.gz

Collecting youtube-dl
  Downloading https://files.pythonhosted.org/packages/a4/43/1f586e49e68f8b41c4be416302bf96ddd5040b0e744b5902d51063795eb9/youtube_dl-2021.6.6-py2.py3-none-any.whl (1.9MB)
     |████████████████████████████████| 1.9MB 6.8MB/s
Installing collected packages: youtube-dl
Successfully installed youtube-dl-2021.6.6
--2021-06-09 14:26:29--  https://storage.googleapis.com/deepmind-media/Datasets/kinetics400.tar.gz
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.139.128, 74.125.141.128, 173.194.210.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.139.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10810465 (10M) [application/octet-stream]
Saving to: ‘kinetics400.tar.gz’


2021-06-09 14:26:30 (158 MB/s) - ‘kinetics400.tar.gz’ saved [10810465/10810465]

kinetics400/
kinetics400/validate.json
kinetics400/validate.csv
kinetics400/train.json
kinetics400/train.csv
kinetics400/test.json
k

One of the many reasons that video datasets are more difficult to work with than image datasets is that many popular video datasets are available only through YouTube. Instead of downloading a single zip file containing everything you need, you have to run scripts like the one below to download individual videos from YouTube, some of which may have become unavailable since the dataset was curated.

In [None]:
from datetime import timedelta
import json
import os
import subprocess

import youtube_dl
from youtube_dl.utils import (DownloadError, ExtractorError)

def download_video(url, start, dur, output):
    """Download `dur` seconds of `url` starting at `start`; return True on success."""
    output_tmp = os.path.join("/tmp", os.path.basename(output))
    try:
        # From https://stackoverflow.com/questions/57131049/is-it-possible-to-download-a-specific-part-of-a-file
        # Resolve the direct media URL without downloading the whole video
        with youtube_dl.YoutubeDL({'format': 'best'}) as ydl:
            result = ydl.extract_info(url, download=False)
            video = result['entries'][0] if 'entries' in result else result

        url = video['url']

        # First pass: start up to 5 seconds early, since a stream copy
        # may not cut exactly at the requested timestamp
        offset = min(start, 5)
        start -= offset
        offset_dur = dur + offset
        start_str = str(timedelta(seconds=start))
        dur_str = str(timedelta(seconds=offset_dur))

        cmd = ['ffmpeg', '-i', url, '-ss', start_str, '-t', dur_str, '-c:v',
               'copy', '-c:a', 'copy', output_tmp]
        subprocess.call(cmd)

        # Second pass: trim the padded temporary clip down to the requested segment
        start_str_2 = str(timedelta(seconds=offset))
        dur_str_2 = str(timedelta(seconds=dur))

        cmd = ['ffmpeg', '-i', output_tmp, '-ss', start_str_2, '-t', dur_str_2, output]
        subprocess.call(cmd)
        return True

    except (DownloadError, ExtractorError):
        print("Failed to download %s" % output)
        return False
        
with open("./kinetics400/test.json", "r") as f:
    test_data = json.load(f)

target_classes = [
 'springboard diving',
 'surfing water',
 'swimming backstroke',
 'swimming breast stroke',
 'swimming butterfly stroke',
]
data_dir = "./videos"
max_samples = 5
    
classes_count = {c:0 for c in target_classes}

for fn, data in test_data.items():
    label = data["annotations"]["label"]
    segment = data["annotations"]["segment"]
    url = data["url"]
    dur = data["duration"]
    if label in classes_count and classes_count[label] < max_samples:
        c_dir = os.path.join(data_dir, label)
        os.makedirs(c_dir, exist_ok=True)

        start = segment[0]
        output = os.path.join(c_dir, "%s_%s.mp4" % (label.replace(" ", "_"), fn))

        # A video that is already on disk counts as a success;
        # otherwise attempt the download
        result = True
        if not os.path.exists(output):
            result = download_video(url, start, dur, output)
        if result:
            classes_count[label] += 1

print("Finished downloading videos!")

[youtube] --coBvtS-eQ: Downloading webpage
[youtube] --coBvtS-eQ: Downloading MPD manifest
[youtube] -AJ3JIMaS18: Downloading webpage
[youtube] -JIvn5VWIKQ: Downloading webpage
[youtube] -JIvn5VWIKQ: Downloading MPD manifest
[youtube] -MzUbQLVWFI: Downloading webpage
[youtube] -Taqg91Q2gc: Downloading webpage


ERROR: Private video
Sign in if you've been granted access to this video


Failed to download ./videos/swimming butterfly stroke/swimming_butterfly_stroke_-Taqg91Q2gc.mp4
[youtube] -XLx2qBr3I4: Downloading webpage
[youtube] -XLx2qBr3I4: Downloading player 68cc98b3
[youtube] -ggniXt9sgA: Downloading webpage
[youtube] -wALxeb6hzo: Downloading webpage


ERROR: Private video
Sign in if you've been granted access to this video


Failed to download ./videos/surfing water/surfing_water_-wALxeb6hzo.mp4
[youtube] -z-ybS14u-8: Downloading webpage
[youtube] 05FLJw6nljs: Downloading webpage
[youtube] 078IIP8JcAs: Downloading webpage
[youtube] 078IIP8JcAs: Downloading MPD manifest
[youtube] 0F7D9hqf8Ng: Downloading webpage
[youtube] 0F7D9hqf8Ng: Downloading MPD manifest
[youtube] 0LKeVIpQzOQ: Downloading webpage
[youtube] 0LKeVIpQzOQ: Downloading MPD manifest
[youtube] 0Lc49M_qMqs: Downloading webpage
[youtube] 0MJucSWQ4WU: Downloading webpage
[youtube] 0MJucSWQ4WU: Downloading MPD manifest
[youtube] 0T_PWMC44Dg: Downloading webpage
[youtube] 0bTqP8WFpxo: Downloading webpage
[youtube] 0bTqP8WFpxo: Downloading MPD manifest
[youtube] 11ak7MOIROI: Downloading webpage
[youtube] 16DuCTaAS9o: Downloading webpage
[youtube] 16DuCTaAS9o: Downloading MPD manifest
[youtube] 1AD6ltFmiaI: Downloading webpage
[youtube] 1AD6ltFmiaI: Downloading MPD manifest
[youtube] 1fGhiEXT8pc: Downloading webpage
[youtube] 2dD7-peMjIY: Downloadin
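
The two-pass trimming in `download_video` above can be easier to follow in isolation. The sketch below repeats only the timestamp arithmetic. The helper name `clip_windows` and its `pad` parameter are hypothetical, introduced here for illustration, and it does not call ffmpeg.

```python
from datetime import timedelta


def clip_windows(start, dur, pad=5):
    """Return the (-ss, -t) strings for the coarse and fine trimming passes.

    `clip_windows` and `pad` are illustrative names; `pad` mirrors the
    5-second head start used in download_video.
    """
    offset = min(start, pad)       # cannot start before the beginning of the video
    coarse_start = start - offset  # first pass begins slightly early
    coarse_dur = dur + offset      # and runs long enough to cover the whole clip
    coarse = (str(timedelta(seconds=coarse_start)), str(timedelta(seconds=coarse_dur)))
    # Second pass is relative to the temporary clip, so it skips the padding
    fine = (str(timedelta(seconds=offset)), str(timedelta(seconds=dur)))
    return coarse, fine


coarse, fine = clip_windows(start=12, dur=10)
print(coarse)  # ('0:00:07', '0:00:15')
print(fine)    # ('0:00:05', '0:00:10')
```

For a segment that starts less than 5 seconds into the video, the padding shrinks to whatever is available, so the first pass simply starts at `0:00:00`.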

**Loading and exploring video datasets**

For image datasets, there are some rudimentary options for visualizing batches of data, like Pillow and OpenCV. For video datasets, there are very few. FiftyOne is a new open-source library that provides simple and powerful visualization for both image and video datasets.

If your dataset follows a common format, like the COCO format for detections, then you can load it in a single line of code:

In [None]:
import fiftyone as fo

dataset = fo.Dataset.from_dir(
    "/path/to/dataset_dir",
    dataset_type=fo.types.COCODetectionDataset,
    name="my_dataset",
)

Even if your dataset is in a custom format, it is still straightforward to load your dataset with FiftyOne. For example, if you are using an object detection video model, you can load your data as follows:

In [None]:
import random
import fiftyone as fo

num_frames = 5
num_objects_per_frame = 3
video_path = "/path/to/video.mp4"

# Create video sample
sample = fo.Sample(filepath=video_path)

# Add some frame labels (frame numbers are 1-based)
for frame_number in range(1, num_frames + 1):
    frame = sample.frames[frame_number]

    # Frame classification
    weather = random.choice(["sunny", "cloudy"])
    frame["weather"] = fo.Classification(label=weather)

    # Object detections
    detections = []
    for _ in range(num_objects_per_frame):
        label = random.choice(["cat", "dog", "bird", "rabbit"])

        # Bounding box coordinates are stored as relative numbers in [0, 1]
        # in the following format:
        # [top-left-x, top-left-y, width, height]
        bounding_box = [
            0.8 * random.random(),
            0.8 * random.random(),
            0.2,
            0.2,
        ]
        detections.append(fo.Detection(label=label, bounding_box=bounding_box))

    # Store the detections on the frame
    frame["objects"] = fo.Detections(detections=detections)

# Create dataset
dataset = fo.Dataset(name="my-labeled-video-dataset")
dataset.add_sample(sample)
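
The bounding boxes in the example above are random, but real detectors often report boxes as absolute pixel corners. As a minimal sketch, assuming your model outputs `(x1, y1, x2, y2)` in pixels, a conversion to the relative `[top-left-x, top-left-y, width, height]` format could look like this. The helper `to_relative_box` is hypothetical and not part of FiftyOne.

```python
def to_relative_box(x1, y1, x2, y2, frame_width, frame_height):
    """Convert absolute pixel corners to [top-left-x, top-left-y, width, height] in [0, 1].

    Hypothetical helper: assumes (x1, y1) is the top-left corner and
    (x2, y2) the bottom-right corner, both in pixels.
    """
    return [
        x1 / frame_width,
        y1 / frame_height,
        (x2 - x1) / frame_width,
        (y2 - y1) / frame_height,
    ]


box = to_relative_box(320, 180, 960, 540, frame_width=1280, frame_height=720)
print(box)  # [0.25, 0.25, 0.5, 0.5]
```

The resulting list can be passed as the `bounding_box` argument of `fo.Detection`.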

In this example, we will be following the PyTorchVideo tutorial on running a video classification model. Generally, video classification datasets are stored on disk in a directory tree whose subfolders define the dataset classes. This format can be loaded in one line of code:

In [None]:
import fiftyone as fo

name = "kinetics-subset"
dataset_dir = "./videos"

# Create the dataset
dataset = fo.Dataset.from_dir(
    dataset_dir, fo.types.VideoClassificationDirectoryTree, name=name
)

# Launch the App and view the dataset
session = fo.launch_app(dataset)
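
Since some downloads can fail, it may be worth checking what actually landed on disk. The following sketch walks the same `<dataset_dir>/<label>/<video>` layout described above using only the standard library. The helper `list_labeled_videos` and its `extensions` parameter are hypothetical and only illustrate the directory convention; they are not how FiftyOne parses the tree internally.

```python
import os


def list_labeled_videos(dataset_dir, extensions=(".mp4",)):
    """Return sorted (label, filepath) pairs from a <dataset_dir>/<label>/<video> tree.

    Hypothetical helper for inspecting the layout; each subfolder name
    is treated as the class label of the videos inside it.
    """
    pairs = []
    for label in sorted(os.listdir(dataset_dir)):
        class_dir = os.path.join(dataset_dir, label)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files at the top level
        for fn in sorted(os.listdir(class_dir)):
            if fn.lower().endswith(extensions):
                pairs.append((label, os.path.join(class_dir, fn)))
    return pairs


# Only runs if the videos from the download step are present
if os.path.isdir("./videos"):
    for label, path in list_labeled_videos("./videos"):
        print(label, path)
```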