# Model Inference-in-the-Loop with FiftyOne

This walkthrough provides a glimpse into the possibilities for integrating
FiftyOne into your machine learning workflows.

It covers the following concepts:

-   Loading your existing dataset in FiftyOne
-   Adding predictions from your model to your FiftyOne dataset
-   Launching the FiftyOne App and visualizing/exploring your data
-   Integrating the App into your data wrangling workflow

## Setup

Install `torch` and `torchvision`, if necessary:

In [1]:
# Modify as necessary (e.g., GPU install). See https://pytorch.org for options
!pip install torch
!pip install torchvision



Download the test split of the CIFAR-10 dataset to `~/fiftyone/cifar10/test`:

In [2]:
!fiftyone zoo download cifar10 --splits test

Download a pretrained CIFAR-10 PyTorch model:

In [3]:
# Download the software
!git clone https://github.com/huyvnphan/PyTorch_CIFAR10

# Download the pretrained model (90MB)
!eta gdrive download --public \
    1dGfpeFK_QG0kV-U6QDHMX2EOGXPqaNzu \
    PyTorch_CIFAR10/cifar10_models/state_dicts/resnet50.pt

Cloning into 'PyTorch_CIFAR10'...
remote: Enumerating objects: 551, done.[K
remote: Total 551 (delta 0), reused 0 (delta 0), pack-reused 551[K
Receiving objects: 100% (551/551), 6.54 MiB | 3.19 MiB/s, done.
Resolving deltas: 100% (182/182), done.
Downloading '1dGfpeFK_QG0kV-U6QDHMX2EOGXPqaNzu' to 'PyTorch_CIFAR10/cifar10_models/state_dicts/resnet50.pt'
 100% |████|  719.8Mb/719.8Mb [45.0s elapsed, 0s remaining, 11.6Mb/s]    


## Importing FiftyOne

In [4]:
import fiftyone as fo

## Loading an image classification dataset

Suppose you have an image classification dataset on disk in the following
format:

```
<dataset_dir>/
    data/
        <uuid1>.<ext>
        <uuid2>.<ext>
        ...
    labels.json
```

where `labels.json` is a JSON file in the following format:

```
{
    "classes": [
        <labelA>,
        <labelB>,
        ...
    ],
    "labels": {
        <uuid1>: <target1>,
        <uuid2>: <target2>,
        ...
    }
}
```

In your current workflow, you may parse this data into a list of
`(image_path, label)` tuples as follows:

In [5]:
import json
import os

# The location of the dataset on disk that you downloaded above
dataset_dir = os.path.expanduser("~/fiftyone/cifar10/test")

# Maps image UUIDs to image paths
images_dir = os.path.join(dataset_dir, "data")
image_uuids_to_paths = {
    os.path.splitext(n)[0]: os.path.join(images_dir, n)
    for n in os.listdir(images_dir)
}

labels_path = os.path.join(dataset_dir, "labels.json")
with open(labels_path, "rt") as f:
    _labels = json.load(f)

# Get classes
classes = _labels["classes"]

# Maps image UUIDs to int targets
labels = _labels["labels"]

# Make a list of (image_path, label) samples
samples = [(image_uuids_to_paths[u], classes[t]) for u, t in labels.items()]

# Print a few samples
print(samples[:5])

[('/home/voxel51/fiftyone/cifar10/test/data/00001.jpg', 'horse'), ('/home/voxel51/fiftyone/cifar10/test/data/00002.jpg', 'airplane'), ('/home/voxel51/fiftyone/cifar10/test/data/00003.jpg', 'frog'), ('/home/voxel51/fiftyone/cifar10/test/data/00004.jpg', 'truck'), ('/home/voxel51/fiftyone/cifar10/test/data/00005.jpg', 'dog')]


Building a FiftyOne dataset from your samples is simple:

In [6]:
dataset = fo.Dataset.from_image_classification_samples(
    samples, name="my-dataset"
)

# Print some information about the entire dataset
print(dataset)

 100% |█████████████████████████| 10000/10000 [2.2s elapsed, 0s remaining, 4.7K samples/s]      
Name:           my-dataset
Persistent:     False
Num samples:    10000
Tags:           []
Sample fields:
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)


In [7]:
# Print a few samples from the dataset
print(dataset.view().head())

<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5ef39418c79591cdceaeee82',
    'filepath': '/home/voxel51/fiftyone/cifar10/test/data/00001.jpg',
    'tags': BaseList([]),
    'ground_truth': <Classification: {'label': 'horse'}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5ef39418c79591cdceaeee83',
    'filepath': '/home/voxel51/fiftyone/cifar10/test/data/00002.jpg',
    'tags': BaseList([]),
    'ground_truth': <Classification: {'label': 'airplane'}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5ef39418c79591cdceaeee84',
    'filepath': '/home/voxel51/fiftyone/cifar10/test/data/00003.jpg',
    'tags': BaseList([]),
    'ground_truth': <Classification: {'label': 'frog'}>,
}>


## Working with views into your dataset

FiftyOne provides a powerful notion of _dataset views_ for you to access
subsets of the samples in your dataset.

Here's an example operation:

In [8]:
# Gets five random airplanes from the dataset
view = (dataset.view()
    .match(filter={"ground_truth.label": "airplane"})
    .take(5)
)

# Print some information about the view you created
print(view)

Dataset:        my-dataset
Num samples:    5
Tags:           []
Sample fields:
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
Pipeline stages:
    1. <fiftyone.core.stages.Match object at 0x7f8fa48cb3c8>
    2. <fiftyone.core.stages.Take object at 0x7f8fa48cbc50>


In [9]:
# Print a few samples from the view
print(view.head())

<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5ef39418c79591cdceaef60a',
    'filepath': '/home/voxel51/fiftyone/cifar10/test/data/01929.jpg',
    'tags': BaseList([]),
    'ground_truth': <Classification: {'label': 'airplane'}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5ef39419c79591cdceaf0da3',
    'filepath': '/home/voxel51/fiftyone/cifar10/test/data/07970.jpg',
    'tags': BaseList([]),
    'ground_truth': <Classification: {'label': 'airplane'}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5ef39419c79591cdceaefe27',
    'filepath': '/home/voxel51/fiftyone/cifar10/test/data/04006.jpg',
    'tags': BaseList([]),
    'ground_truth': <Classification: {'label': 'airplane'}>,
}>


Iterating over the samples in a view is easy:

In [10]:
for sample in view:
    print(sample.filepath)

/home/voxel51/fiftyone/cifar10/test/data/01869.jpg
/home/voxel51/fiftyone/cifar10/test/data/07106.jpg
/home/voxel51/fiftyone/cifar10/test/data/09445.jpg
/home/voxel51/fiftyone/cifar10/test/data/02064.jpg
/home/voxel51/fiftyone/cifar10/test/data/06995.jpg


## Adding model predictions to your dataset

The following code demonstrates how to add predictions from a model to your
FiftyOne dataset, with minimal changes to your existing ML code:

In [11]:
import sys

import numpy as np
import torch
import torchvision
from torch.utils.data import DataLoader

import fiftyone.utils.torch as fout

sys.path.insert(1, "PyTorch_CIFAR10")
from cifar10_models import *


def make_cifar10_data_loader(image_paths, sample_ids, batch_size):
    mean = [0.4914, 0.4822, 0.4465]
    std = [0.2023, 0.1994, 0.2010]
    transforms = torchvision.transforms.Compose(
        [
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize(mean, std),
        ]
    )
    dataset = fout.TorchImageDataset(
        image_paths, sample_ids=sample_ids, transform=transforms
    )
    return DataLoader(dataset, batch_size=batch_size, num_workers=4)


def predict(model, imgs):
    logits = model(imgs).detach().cpu().numpy()
    predictions = np.argmax(logits, axis=1)
    odds = np.exp(logits)
    confidences = np.max(odds, axis=1) / np.sum(odds, axis=1)
    return predictions, confidences


#
# Load a model
#
# Model performance numbers are available at:
#   https://github.com/huyvnphan/PyTorch_CIFAR10
#

model = resnet50(pretrained=True)
model_name = "resnet50"

#
# Extract a few images to process
#

num_samples = 25
batch_size = 5
view = dataset.view().take(num_samples)
image_paths, sample_ids = zip(*[(s.filepath, s.id) for s in view])
data_loader = make_cifar10_data_loader(image_paths, sample_ids, batch_size)

#
# Perform prediction and store results in dataset
#

for imgs, sample_ids in data_loader:
    predictions, confidences = predict(model, imgs)

    # Add predictions to your FiftyOne dataset
    for sample_id, prediction, confidence in zip(
        sample_ids, predictions, confidences
    ):
        sample = dataset[sample_id]
        sample[model_name] = fo.Classification(
            label=classes[prediction],
            confidence=confidence,
        )
        sample.save()

#
# Get the last batch of samples for which we added predictions
#

view = dataset.view().select(sample_ids)
print(view.head(batch_size))

<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5ef39418c79591cdceaef213',
    'filepath': '/home/voxel51/fiftyone/cifar10/test/data/00914.jpg',
    'tags': BaseList([]),
    'ground_truth': <Classification: {'label': 'airplane'}>,
    'resnet50': <Classification: {'label': 'bird', 'confidence': 0.39588668942451477}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5ef39418c79591cdceaef55c',
    'filepath': '/home/voxel51/fiftyone/cifar10/test/data/01755.jpg',
    'tags': BaseList([]),
    'ground_truth': <Classification: {'label': 'horse'}>,
    'resnet50': <Classification: {'label': 'horse', 'confidence': 0.798724353313446}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5ef39418c79591cdceaefae7',
    'filepath': '/home/voxel51/fiftyone/cifar10/test/data/03174.jpg',
    'tags': BaseList([]),
    'ground_truth': <Classification: {'label': 'automobile'}>,
    'resnet50': <Classification: {'label': 'automobile', 'confidence': 0.8348385095596313}>,
}>
<Sam

In [12]:
#
# Get all samples for which we added predictions, in reverse order of
# confidence
#

pred_view = (dataset.view()
    .exists(model_name)
    .sort_by("%s.confidence" % model_name, reverse=True)
)
print(len(pred_view))
print(pred_view.head())

25
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5ef39418c79591cdceaefae7',
    'filepath': '/home/voxel51/fiftyone/cifar10/test/data/03174.jpg',
    'tags': BaseList([]),
    'ground_truth': <Classification: {'label': 'automobile'}>,
    'resnet50': <Classification: {'label': 'automobile', 'confidence': 0.8348385095596313}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5ef39419c79591cdceaf0c06',
    'filepath': '/home/voxel51/fiftyone/cifar10/test/data/07557.jpg',
    'tags': BaseList([]),
    'ground_truth': <Classification: {'label': 'automobile'}>,
    'resnet50': <Classification: {'label': 'automobile', 'confidence': 0.8322188258171082}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5ef39418c79591cdceaef17c',
    'filepath': '/home/voxel51/fiftyone/cifar10/test/data/00763.jpg',
    'tags': BaseList([]),
    'ground_truth': <Classification: {'label': 'deer'}>,
    'resnet50': <Classification: {'label': 'deer', 'confidence': 0.8268129229545593}>

## Using the FiftyOne App

FiftyOne provides a powerful App that allows you easily visualize,
explore, search, filter, your datasets.

You can explore the App interactively through the GUI, and you can even
interact with it in real-time from your Python interpreter!

In [13]:
# Launch the FiftyOne App
session = fo.launch_app()

# Open your dataset in the App
session.dataset = dataset

App launched


![dataset](images/inference_1.png)

In [14]:
# Show five random samples in the App
view = dataset.view().limit(5)
session.view = view

![limit](images/inference_2.png)

In [15]:
# Show the samples for which we previously added pre
session.view = pred_view

![pred-view](images/inference_3.png)

In [16]:
# Show the full dataset again
session.view = None

![selected](images/inference_4.png)

In [17]:
# Print details about the selected samples
selected_view = dataset.view().select(session.selected)
print(selected_view)
print(selected_view.head())

Dataset:        my-dataset
Num samples:    4
Tags:           []
Sample fields:
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
    resnet50:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
Pipeline stages:
    1. <fiftyone.core.stages.Select object at 0x7f8fc0d92e10>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5ef39418c79591cdceaeee82',
    'filepath': '/home/voxel51/fiftyone/cifar10/test/data/00001.jpg',
    'tags': BaseList([]),
    'ground_truth': <Classification: {'label': 'horse'}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5ef39418c79591cdceaeee84',
    'filepath': '/home/voxel51/fiftyone/cifar10/test/data/00003.jpg',
    'tags': BaseList([]),