<a href="https://colab.research.google.com/github/alexthaman/evs2023/blob/main/Cocostats.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Coco Enhanced Dataset Statistics sample
This notebook contains a demonstration of data analysis on the Coco 2017 dataset.  In this sample, the Coco 2017 val dataset is augmented with additional metadata from the paper ["Understanding and Evaluating Racial Biases in Image Captioning"](https://arxiv.org/abs/2106.08503).

## Prerequisites
To run this notebook, first download the supplemental metadata manually to Google Drive (recommended) or to another location of your choice.  The notebook is configured to mount a Google Drive that contains the downloaded metadata, but with small code modifications it can load the file from any other location.  The supplemental metadata can be requested from [the project website](https://princetonvisualai.github.io/imagecaptioning-bias/).  The sample below expects the `instances_2014all.csv` file from the metadata to be placed at the root of your Google Drive.
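
If the metadata is stored somewhere other than Google Drive, the lookup can be isolated in a small helper.  The sketch below is a hypothetical helper (`find_metadata_file` and the list of candidate directories are illustrative and not part of this notebook) that returns the first location containing the file.

```python
import os

def find_metadata_file(filename, search_dirs):
    """Return the first existing path to `filename` in `search_dirs`, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None

# Hypothetical search order: mounted Drive root first, then the working directory
path = find_metadata_file(
    'instances_2014all.csv',
    ['/content/gdrive/My Drive', '.'],
)
print(path if path else 'Metadata file not found; download it before continuing')
```

Returning `None` instead of raising keeps the decision about what to do next (stop, or fall back to another source) with the caller.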

In [1]:
# Mount a Google Drive to this Colab instance
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [2]:
# Download the Coco 2017 val dataset
!mkdir -p val2017
!curl -O http://images.cocodataset.org/zips/val2017.zip
!curl -O http://images.cocodataset.org/annotations/annotations_trainval2017.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  777M  100  777M    0     0  10.6M      0  0:01:12  0:01:12 --:--:-- 13.3M
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  241M  100  241M    0     0  11.9M      0  0:00:20  0:00:20 --:--:-- 13.2M


In [3]:
# Unzip images and annotations
!unzip -q val2017.zip -d val2017
!unzip -q annotations_trainval2017.zip -d annotations
!mv val2017/val2017 val2017/data

In [4]:
# Install packages that will be used for this sample
!pip install fiftyone
!pip install torch torchvision umap-learn


In [5]:
# Augment annotations file with supplemental metadata
import csv
import os
import shutil
import json

with open('./annotations/annotations/instances_val2017.json') as f:
    js = json.load(f)

# Copy the supplemental metadata from Google Drive into the working directory
coco_extra_metadata_file = 'instances_2014all.csv'
drive_path = f'/content/gdrive/My Drive/{coco_extra_metadata_file}'
if not os.path.isfile(drive_path):
    # Check the source first: shutil.copyfile raises if it is missing
    raise FileNotFoundError(f'Unable to find {drive_path}.  Please place this file at the root of your Google Drive to continue')
shutil.copyfile(drive_path, coco_extra_metadata_file)
print(f'Copied {coco_extra_metadata_file}')

# Load supplemental metadata into a dictionary
data = {}
with open(coco_extra_metadata_file, 'r') as csvfile:
    csvreader = csv.DictReader(csvfile)
    for row in csvreader:
        data[row['annId']] = row

# Join labels and extra metadata on annotation ID
for ann in js['annotations']:
    # CSV keys are strings, so convert the integer annotation ID before the lookup
    ann_id = str(ann['id'])
    if ann_id in data:
        ann.update(data[ann_id])

# Write the labels file back with the updated metadata
with open('./val2017/labels.json', 'w') as f:
    json.dump(js, f, indent=2)

Copied instances_2014all.csv


# Data visualization and exploration
For this sample we use the open source tool FiftyOne to visualize the dataset.  FiftyOne provides an interactive visualizer for computer vision datasets.  We will work exclusively with images that contain objects of category "person".

## Embedding Analysis
Included below are two examples of embedding analysis on the images.  Embedding analysis is useful to evaluate image similarity, and can be used to discover categories of problematic predictions.
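
As a rough illustration of what "similarity" means here, embeddings are typically compared with a measure such as cosine similarity.  The sketch below uses made-up three-dimensional vectors (real embeddings from models such as MobileNet or CLIP have hundreds of dimensions) and is not the computation FiftyOne performs internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings for illustration only
person_a = [0.9, 0.1, 0.0]
person_b = [0.8, 0.2, 0.1]
horse = [0.1, 0.9, 0.3]

# The two "person" vectors point in nearly the same direction,
# so their similarity is higher than person vs. horse
print(cosine_similarity(person_a, person_b))
print(cosine_similarity(person_a, horse))
```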

## Metadata Statistics
Also included below are a number of example charts showing the kinds of statistics that can be computed on the dataset with the supplementary metadata.  Data analysis is an exploratory activity, and imbalances or unintended correlations are not necessarily problematic, but surfacing them can prompt deeper investigation of the data and suggest additional experiments.
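
The simplest statistic of this kind is a cross-tabulation of two metadata attributes.  A minimal sketch with the standard library, using a handful of made-up (skin, gender) pairs rather than the real annotations:

```python
from collections import Counter

# Made-up (skin, gender) pairs standing in for per-detection metadata
records = [('1', 'Female'), ('2', 'Male'), ('2', 'Male'), ('1', 'Male'), ('2', 'Unsure')]

pair_counts = Counter(records)
gender_totals = Counter(gender for _, gender in records)

# Share of each skin value within a gender; large gaps between genders
# hint at a correlation worth investigating
for (skin, gender), n in sorted(pair_counts.items()):
    print(f'skin={skin} gender={gender}: {n} ({n / gender_totals[gender]:.0%} of {gender})')
```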

In [6]:
# Load Coco dataset into FiftyOne and display only the subset of data with objects of category "person".
# From this point forward we will only work with this portion of the Coco dataset.
import fiftyone as fo
from fiftyone import ViewField as F
import fiftyone.brain as fob

dataset_name = "coco-2017-val"
dataset_dir = "./val2017"
dataset_type = fo.types.COCODetectionDataset  # for example
if dataset_name not in fo.list_datasets():
    dataset = fo.Dataset.from_dir(
        dataset_dir=dataset_dir,
        dataset_type=dataset_type,
        name=dataset_name,
    )
else:
    dataset = fo.load_dataset(dataset_name)

personimages = (
    dataset.filter_labels("detections", F("label").is_in(["person"]))
    .filter_labels("detections", F("iscrowd") == 0)
    .exclude_fields("segmentations")
)

counts = personimages.count_values("detections.detections.gender")
print(counts)

session = fo.launch_app(personimages)

Migrating database to v0.20.1


 100% |███████████████| 5000/5000 [1.9m elapsed, 0s remaining, 45.7 samples/s]


{'Female': 867, '': 163, 'Male': 1678, 'Unsure': 818, None: 7251}


In [7]:
# Compute a 2D UMAP of the person crops using the default mobilenet_v2 backbone
import fiftyone.brain as fob

# Compute 2D representation
results = fob.compute_visualization(
    personimages,
    num_dims=2,
    method="umap",
    brain_key="coco_person_crop",
    patches_field='detections',
    seed=51,
)

session = fo.launch_app(personimages)

Downloading model from 'https://download.pytorch.org/models/mobilenet_v2-b0353104.pth'...


 100% |████|  108.4Mb/108.4Mb [177.7ms elapsed, 0s remaining, 610.3Mb/s]
Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /root/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
100%|██████████| 13.6M/13.6M [00:00<00:00, 94.7MB/s]

Computing patch embeddings...
 100% |███████████████| 2693/2693 [3.2m elapsed, 0s remaining, 29.7 samples/s]

Generating visualization...


UMAP(random_state=51, verbose=True)
Fri May 12 18:36:11 2023 Construct fuzzy simplicial set
Fri May 12 18:36:11 2023 Finding Nearest Neighbors
Fri May 12 18:36:11 2023 Building RP forest with 10 trees
Fri May 12 18:36:17 2023 NN descent for 13 iterations
	 1  /  13
	 2  /  13
	 3  /  13
	 4  /  13
	 5  /  13
	Stopping threshold met -- exiting after 5 iterations
Fri May 12 18:36:35 2023 Finished Nearest Neighbor Search
Fri May 12 18:36:38 2023 Construct embedding


Epochs completed:   0%|            0/200 [00:00]

Fri May 12 18:36:49 2023 Finished embedding


In [8]:
# Create a UMAP of cropped objects from a few categories, keeping only crops above a
# minimum size threshold (bounding box area of at least 5% of the image), using CLIP
# embeddings.  In this view we can see that crops that share common objects are closer
# in CLIP space, even though only the cropped bounding box is embedded.

import fiftyone.zoo as foz

# Bboxes are in [top-left-x, top-left-y, width, height] format
bbox_area = F("bounding_box")[2] * F("bounding_box")[3]

# Keep only boxes whose area is at least 5% of the image
mysample = dataset.filter_labels("detections", bbox_area >= 0.05)

mysample = (
    mysample.filter_labels("detections", F("label").is_in(["person", "dog", "cat", "horse", "cow"]))
    .filter_labels("detections", F("iscrowd") != 1)
    .exclude_fields("segmentations")
)
view_pch = mysample.to_patches('detections', other_fields=['skin', 'gender'])
view_pch.compute_patch_embeddings(foz.load_zoo_model("clip-vit-base32-torch"), 'detections', embeddings_field='clip')
fob.compute_visualization(mysample, patches_field='detections', embeddings='clip',
                          brain_key='mn_clip_umap3', num_dims=2, method='umap')

session = fo.launch_app(view_pch)

Downloading model from 'https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt'...


 100% |██████|    2.6Gb/2.6Gb [6.8s elapsed, 0s remaining, 401.1Mb/s]

Downloading CLIP tokenizer...
 100% |█████|   10.4Mb/10.4Mb [20.2ms elapsed, 0s remaining, 511.2Mb/s]
 100% |███████████████| 3803/3803 [1.9m elapsed, 0s remaining, 45.8 samples/s]

Generating visualization...


UMAP( verbose=True)
Fri May 12 18:40:48 2023 Construct fuzzy simplicial set
Fri May 12 18:41:01 2023 Finding Nearest Neighbors
Fri May 12 18:41:04 2023 Finished Nearest Neighbor Search
Fri May 12 18:41:04 2023 Construct embedding


Epochs completed:   0%|            0/500 [00:00]

Fri May 12 18:41:11 2023 Finished embedding


In [9]:
# Load supplementary metadata into a dataframe for further analysis.
import pandas as pd

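# Flatten the per-image detection lists, keeping detections that carry the supplemental metadata
person_dets = [det for dets in personimages.values("detections.detections") for det in dets if 'skin' in det]
df = pd.DataFrame([(det['skin'], det['gender']) for det in person_dets], columns=['skin', 'gender'])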
df = df.replace('', '<None>')
df

Unnamed: 0,skin,gender
0,1,Female
1,2,Male
2,2,Male
3,<None>,Male
4,2,Unsure
...,...,...
3521,<None>,Male
3522,1,Female
3523,1,Unsure
3524,2,Female


In [10]:
# Create a 2D histogram (density heatmap) of skin compared with gender
import plotly.express as px

order = {
    'skin': ['1', '2', '3', '4', '5', '6', 'Unsure', '<None>'],
    'gender': ['Male', 'Female', 'Unsure', '<None>'],
}
fig = px.density_heatmap(df, x='skin', y='gender', width=1000, height=600, category_orders=order, title='Skin / Gender distribution in Coco Val 2017')

fig.update_yaxes(title = 'Gender', title_standoff=20, automargin=True, title_font_size=26, tickwidth=40)
fig.update_xaxes(title = 'Skin', automargin=True, title_font_size=26, tickwidth=40)
fig.update_layout(
    font_size=20,
    margin=dict(l=0, r=50, t=50, b=20)
)

fig.show()

In [12]:
# Compute per-image brightness and augment the dataset in FiftyOne with this computed metric
import cv2
from tqdm import tqdm
import concurrent.futures

dataset.add_sample_field(
    "brightness",
    fo.FloatField,
)

def process_sample(sample):
    img = cv2.imread(sample.filepath)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    brightness = cv2.mean(gray)[0]
    sample.brightness = brightness
    sample.save()

# Multithreaded calculation of brightness
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = []
    for sample in tqdm(personimages):
        futures.append(executor.submit(process_sample, sample))
    for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures)):
        future.result()  # re-raise any exception from a worker thread

session = fo.launch_app(personimages)

100%|██████████| 2693/2693 [00:36<00:00, 74.58it/s] 
100%|██████████| 2693/2693 [00:00<00:00, 135897.55it/s]


In [13]:
# Build a dataframe of all person detections with gender compared with overall image brightness

mylist = []
compare_variable = 'gender'
for image in personimages:
  br = image.brightness
  for det in image.detections.detections:
    if compare_variable in det:
      mylist.append((br, det[compare_variable]))
df = pd.DataFrame(mylist, columns=['brightness', compare_variable])
df

Unnamed: 0,brightness,gender
0,196.192713,Female
1,121.900274,Male
2,121.900274,Male
3,79.382575,Male
4,79.382575,Unsure
...,...,...
3521,50.396309,Male
3522,148.317618,Female
3523,110.586665,Unsure
3524,163.739274,Female


In [14]:
# Calculate the count of each categorical value
compare_variable_counts = df.groupby(compare_variable).size()
compare_variable_column = f'{compare_variable}_normalized'

# Normalize by giving each row a weight of 1 / (size of its category), so that the
# weights within each category sum to 1
df[compare_variable_column] = 1 / df.groupby(compare_variable)[compare_variable].transform('count')

df

Unnamed: 0,brightness,gender,gender_normalized
0,196.192713,Female,0.001153
1,121.900274,Male,0.000596
2,121.900274,Male,0.000596
3,79.382575,Male,0.000596
4,79.382575,Unsure,0.001222
...,...,...,...
3521,50.396309,Male,0.000596
3522,148.317618,Female,0.001153
3523,110.586665,Unsure,0.001222
3524,163.739274,Female,0.001153


In [15]:
# Plot a histogram showing a normalized count for each brightness bin for each gender.
# histnorm='percent' normalizes each gender's bars to sum to 100%, so categories of
# different sizes can be compared directly.
fig = px.histogram(df, x="brightness", color='gender', histfunc='count', histnorm='percent', barmode="group", nbins=20)
fig.update_layout(bargap=0.2, bargroupgap=0.1)
fig.update_yaxes(title='% of Category Total', title_standoff=20, automargin=True, title_font_size=26, tickwidth=40)
fig.update_xaxes(title='Brightness (mean grayscale value)', automargin=True, title_font_size=26, tickwidth=40)
fig.update_layout(
    title='Image Brightness per Gender in Coco Val 2017',
    font_size=20,
    margin=dict(l=0, r=50, t=50, b=20),
    width=1200
)
fig.show()

# Inference Metrics
The next few sections of the notebook show how to use supplementary metadata along with inference results to better predict model behavior in the real world.  We use YOLOv8 Nano for this sample.  While this model is smaller and less accurate than the larger YOLO models, it is much faster.

An experiment left to the reader is to run the same analysis below on other model sizes and architectures.
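
The per-slice metric computed later in this section is recall: the fraction of ground-truth boxes that are matched by a prediction.  As a simplified sketch (it ignores confidence ordering and one-to-one matching, which a full evaluation handles), a ground-truth box counts as found when some predicted box overlaps it with an intersection-over-union (IoU) of at least 0.5.  Boxes use the `[x, y, width, height]` format; the helper names and sample boxes are illustrative and not part of this notebook.

```python
def iou(box_a, box_b):
    """IoU of two boxes in [x, y, width, height] format."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def recall(gt_boxes, pred_boxes, iou_threshold=0.5):
    """Fraction of ground-truth boxes overlapped by at least one prediction."""
    if not gt_boxes:
        return 0.0
    found = sum(
        1 for gt in gt_boxes
        if any(iou(gt, pred) >= iou_threshold for pred in pred_boxes)
    )
    return found / len(gt_boxes)

# Two made-up ground-truth people, only one of which is predicted
gt = [[0.1, 0.1, 0.2, 0.4], [0.6, 0.5, 0.2, 0.3]]
pred = [[0.1, 0.1, 0.2, 0.4]]
print(recall(gt, pred))
```

Computing this separately for each metadata slice (for example, per gender) is what makes differences in model behavior between groups visible.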

In [16]:
!pip install fiftyone ultralytics



In [17]:
# Load the YOLOv8 Nano model and export the person images in YOLO format for inference
from ultralytics import YOLO

detection_model = YOLO("yolov8n.pt")

coco_classes = [c for c in dataset.default_classes if not c.isnumeric()]

# Creates an on-disk dataset based on a FiftyOne SampleView
def export_yolo_data(
    samples,
    export_dir,
    classes,
    label_field = "detections",
    split = None
    ):

    if isinstance(split, list):
        splits = split
        for split in splits:
            export_yolo_data(
                samples,
                export_dir,
                classes,
                label_field,
                split
            )
    else:
        if split is None:
            split_view = samples
            split = "val"
        else:
            split_view = samples.match_tags(split)

        split_view.export(
            export_dir=export_dir,
            dataset_type=fo.types.YOLOv5Dataset,
            label_field=label_field,
            classes=classes,
            split=split
        )

coco_val_dir = "yolo_export"
export_yolo_data(personimages, coco_val_dir, coco_classes)

Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt to yolov8n.pt...
100%|██████████| 6.23M/6.23M [00:00<00:00, 25.5MB/s]


 100% |███████████████| 2693/2693 [21.6s elapsed, 0s remaining, 146.5 samples/s]      


In [18]:
# Generate predictions for the exported dataset
# TODO: Move to tqdm/streaming, optional GPU, and ideally multithread
# TODO: should not need to load from disk, can use the in-memory streaming data to push to FiftyOne
model = YOLO("yolov8n.pt")  # load the pretrained YOLOv8 Nano weights
predictions = model.predict(source=f'{coco_val_dir}/images/val', save_txt=True, save_conf=True, stream=True)
# With stream=True, predict() returns a generator; iterate over it to actually run
# inference and write the label files under runs/detect/predict/labels
for prediction in predictions:
    pass


image 1/2693 /content/yolo_export/images/val/000000000139.jpg: 448x640 1 person, 5 chairs, 1 potted plant, 2 dining tables, 1 tv, 1 refrigerator, 1 clock, 1 vase, 128.1ms
image 2/2693 /content/yolo_export/images/val/000000000785.jpg: 448x640 1 person, 1 skis, 15.0ms
image 3/2693 /content/yolo_export/images/val/000000000872.jpg: 640x640 2 persons, 11.6ms
image 4/2693 /content/yolo_export/images/val/000000000885.jpg: 448x640 4 persons, 1 tennis racket, 12.0ms
image 5/2693 /content/yolo_export/images/val/000000001000.jpg: 480x640 13 persons, 2 tennis rackets, 104.2ms
image 6/2693 /content/yolo_export/images/val/000000001268.jpg: 448x640 4 persons, 2 boats, 1 bird, 1 backpack, 2 handbags, 8.8ms
image 7/2693 /content/yolo_export/images/val/000000001296.jpg: 640x448 3 persons, 97.8ms
image 8/2693 /content/yolo_export/images/val/000000001353.jpg: 640x480 5 persons, 1 suitcase, 1 chair, 98.4ms
image 9/2693 /content/yolo_export/images/val/000000001490.jpg: 320x640 1 person, 1 surfboard, 118.5m

In [19]:
# This cell contains a number of helper methods to load the inference results into FiftyOne
import numpy as np

def read_yolo_detections_file(filepath):
    detections = []
    if not os.path.exists(filepath):
        return np.array([])

    with open(filepath) as f:
        lines = [line.rstrip('\n').split(' ') for line in f]

    for line in lines:
        detection = [float(l) for l in line]
        detections.append(detection)
    return np.array(detections)

def _uncenter_boxes(boxes):
    '''convert from center coords to corner coords'''
    boxes[:, 0] -= boxes[:, 2]/2.
    boxes[:, 1] -= boxes[:, 3]/2.

def _get_class_labels(predicted_classes, class_list):
    labels = (predicted_classes).astype(int)
    labels = [class_list[l] if l < len(class_list) else '' for l in labels]
    return labels

def convert_yolo_detections_to_fiftyone(
    yolo_detections,
    class_list
    ):

    detections = []
    if yolo_detections.size == 0:
        return fo.Detections(detections=detections)

    boxes = yolo_detections[:, 1:-1]
    _uncenter_boxes(boxes)

    confs = yolo_detections[:, -1]
    labels = _get_class_labels(yolo_detections[:, 0], class_list)

    for label, conf, box in zip(labels, confs, boxes):
      if label == 'person':
        detections.append(
            fo.Detection(
                label=label,
                bounding_box=box.tolist(),
                confidence=conf
            )
        )

    return fo.Detections(detections=detections)

def get_prediction_filepath(filepath, run_number = 1):
    run_num_string = ""
    if run_number != 1:
        run_num_string = str(run_number)
    filename = filepath.split("/")[-1].split(".")[0]
    return f"runs/detect/predict{run_num_string}/labels/{filename}.txt"

def add_yolo_detections(
    samples,
    prediction_field,
    prediction_filepath,
    class_list
    ):

    prediction_filepaths = samples.values(prediction_filepath)
    yolo_detections = [read_yolo_detections_file(pf) for pf in prediction_filepaths]
    detections =  [convert_yolo_detections_to_fiftyone(yd, class_list) for yd in yolo_detections]
    samples.set_values(prediction_field, detections)
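
The helper `_uncenter_boxes` above relies on the layout of the YOLO label files: each line is `class x_center y_center width height confidence`, with coordinates relative to the image size, while FiftyOne expects `[top-left-x, top-left-y, width, height]`.  A standalone sketch of the same conversion for a single made-up label line (`parse_yolo_line` is a hypothetical name, not part of this notebook):

```python
def parse_yolo_line(line):
    """Convert 'class xc yc w h conf' into (class_id, [x, y, w, h], conf)."""
    values = [float(v) for v in line.split()]
    class_id = int(values[0])
    xc, yc, w, h = values[1:5]
    conf = values[5]
    # Shift from box center to top-left corner; width and height are unchanged
    return class_id, [xc - w / 2, yc - h / 2, w, h], conf

# Made-up label line: class 0, centered box, 91% confidence
class_id, box, conf = parse_yolo_line('0 0.5 0.5 0.2 0.4 0.91')
print(class_id, box, conf)
```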


In [20]:
# View results with predictions in the FiftyOne app
filepaths = dataset.values("filepath")
prediction_filepaths = [get_prediction_filepath(fp) for fp in filepaths]
dataset.set_values(
    "yolov8n_det_filepath",
    prediction_filepaths
)

add_yolo_detections(
    personimages,
    "yolov8n",
    "yolov8n_det_filepath",
    coco_classes
)

fo.launch_app(personimages)

Dataset:          coco-2017-val
Media type:       image
Num samples:      2693
Selected samples: 0
Selected labels:  0
Session type:     colab
View stages:
    1. FilterLabels(field='detections', filter={'$in': ['$$this.label', [...]]}, only_matches=True, trajectories=False)
    2. FilterLabels(field='detections', filter={'$eq': ['$$this.iscrowd', 0]}, only_matches=True, trajectories=False)
    3. ExcludeFields(field_names=['segmentations'])

In [23]:
# Slice personimages into views by gender and evaluate detection recall per slice
gender_acc_df = pd.DataFrame(columns=['recall'])
for gender in list(personimages.count_values("detections.detections.gender")):
    if gender == '':
        continue
    subset = personimages.filter_labels("detections", F("gender") == gender)
    print(f'Evaluating detections for {gender}')
    results = subset.evaluate_detections(
        "yolov8n",
        gt_field="detections",
        eval_key="eval",
        compute_mAP=False)
    gender_acc_df.loc[gender] = results.metrics()['recall']

Evaluating detections...


 100% |███████████████| 1541/1541 [46.8s elapsed, 0s remaining, 54.3 samples/s]

Evaluating detections...
 100% |█████████████████| 623/623 [9.5s elapsed, 0s remaining, 92.1 samples/s]

Evaluating detections...
 100% |███████████████| 1141/1141 [18.1s elapsed, 0s remaining, 86.8 samples/s]

Evaluating detections...
 100% |█████████████████| 639/639 [9.3s elapsed, 0s remaining, 51.1 samples/s]


In [24]:
# Show chart of recall per gender
fig = px.bar(gender_acc_df,
             x=gender_acc_df.index.astype(str),
             y='recall',
             width=1000,
             range_y=[0.5, 1],
             #text='recall',
             text_auto='.2f',
             labels={'x': 'Gender', 'recall': 'Recall'},
             title='Recall by gender',
             category_orders={'x': ['Male', 'Female', 'Unsure', 'None']})
fig.update_layout(font_size=20)
fig.show()

# Synthetic Data
Synthetic Data is a powerful tool for computer vision.  It can be used to pretrain models for better generalization, fill data gaps for rare or hard to capture cases, and study model behaviors.

The example below uses a dataset generated with the Unity engine, the Unity Perception package, and the Unity Synthetic Humans package to create a synthetic dataset with labels, and then runs inference on this data.  In this experiment there is only one clipped person in each image and the only variable that changes is the skin tone.  We can then capture the ground truth skin tone in the metadata, similar to what was done by human labelers in the example above, and compare inference results against this single variable.

The procedure shown is a controlled experiment that is only possible with synthetic data due to the difficulty in controlling single descriptive variables in the real world.  The typical procedure for studying effects with real world data is via ablation studies.

The original Unity project that generated this dataset is also available in the source repository.
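
One way to analyze such a controlled experiment is to bin the single varied attribute and compare the detection rate across bins.  The sketch below uses made-up skin tone values and detection outcomes; the bin width and the `detection_rate_by_bin` helper are assumptions for illustration, not part of this notebook or the Unity packages.

```python
from collections import defaultdict

def detection_rate_by_bin(samples, bin_width=5.0):
    """samples: iterable of (skin_tone_value, person_detected) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for value, detected in samples:
        # Floor the value to the lower edge of its bin
        bin_start = (value // bin_width) * bin_width
        totals[bin_start] += 1
        hits[bin_start] += int(detected)
    return {b: hits[b] / totals[b] for b in sorted(totals)}

# Made-up outcomes for illustration only
samples = [(-8.4, False), (-7.9, True), (-2.4, True), (0.6, True), (2.8, False), (6.2, True)]
for bin_start, rate in detection_rate_by_bin(samples).items():
    print(f'[{bin_start:+.0f}, {bin_start + 5:+.0f}): {rate:.0%} detected')
```

Because every other scene variable is held fixed, a consistent difference in detection rate between bins can be attributed to the varied attribute rather than to a confounder.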

In [25]:
!pip install pysolotools


In [34]:
# Unzip the generated synthetic dataset
# TODO:  Check in UnitySynthFull.zip
unity_synth = "UnitySynth"
coco_synth = "CocoSynth"

!cp '/content/gdrive/My Drive/UnitySynthFull.zip' UnitySynthFull.zip
# This will work when the repo is public (-L follows the GitHub release redirect)
#!curl -L -O https://github.com/alexthaman/evs2023/releases/download/dataset_v1/UnitySynthFull.zip
print('Copied UnitySynthFull.zip')

!unzip -q UnitySynthFull.zip
!mv UnitySynthFull {unity_synth}

Copied UnitySynthFull.zip


In [35]:
# Convert the Unity SOLO format to Coco format
!solo2coco {unity_synth} {coco_synth}

In [36]:
# Augment Coco data with extra metadata
import csv
import os
import glob
import json
import re

# Read extra skin tone data from Unity synth
skin_tones = {}
# Raw f-string so that '\.' is passed to the regex engine unchanged
pattern = re.compile(rf'{re.escape(unity_synth)}/sequence\.(.*)/step0\.frame_data\.json')
for file in glob.glob(f'{unity_synth}/sequence.*/step0.frame_data.json'):
    # The sequence number in the path is matched against the Coco image_id below
    index = pattern.search(file).group(1)
    with open(file, 'r') as f:
        js = json.load(f)
    metadata = [i for i in js['metrics'] if i['description'] == 'Metadata labeler']
    skin_tone = metadata[0]['values'][0]['instances'][0]['Person data']['Skin tone']
    skin_tones[index] = skin_tone
print(skin_tones)

# Augment annotation metadata with extra information
with open(f'{coco_synth}/coco/bbox.json') as f:
    js = json.load(f)
for ann in js['annotations']:
    ann.update({'skin': skin_tones[str(ann['image_id'])]})
print(js['annotations'])

# Write the labels file back with the updated metadata
with open(f'{coco_synth}/coco/bbox2.json', 'w') as f:
    json.dump(js, f, indent=2)

{'106': 0.6621692, '179': 1.72474885, '158': 0.4645226, '108': -2.42970324, '34': 0.0309360027, '186': -8.457423, '146': -7.95697, '101': 2.8042028, '178': 0.289458036, '73': 6.262553, '27': 2.48165536, '133': 2.11224437, '149': 2.4306376, '69': 6.060336, '50': 4.196192, '134': -4.90179253, '68': -5.93427324, '29': -5.852691, '187': 1.48282433, '42': -0.4372232, '99': -9.22485, '127': -4.79540062, '111': 7.012754, '115': -6.397379, '172': -1.55508208, '196': -1.55677938, '47': 2.83169, '168': -9.896701, '63': 5.963952, '17': 6.850128, '190': 0.5501559, '121': 6.316472, '83': -7.57226, '43': -7.81850147, '45': -9.026798, '163': -8.42891, '3': -3.66666937, '153': -4.69803429, '103': -6.41292524, '129': -1.78371787, '54': -1.11029029, '166': -8.15327549, '86': -7.553609, '139': 7.67909431, '151': 5.99931431, '24': -3.59881163, '79': 5.103058, '124': 1.90857244, '10': 5.18042755, '150': -4.195324, '48': 4.5961504, '199': 1.60135651, '39': 4.630787, '113': -9.836494, '131': -5.7503767, '138

In [37]:
# Load the CocoSynth dataset into FiftyOne
dataset_name = "synth"
dataset_dir = "./CocoSynth/coco"
dataset_type = fo.types.COCODetectionDataset  # for example
if dataset_name not in fo.list_datasets():
    dataset = fo.Dataset.from_dir(
        dataset_dir=dataset_dir,
        dataset_type=dataset_type,
        name=dataset_name,
        data_path='images',
        labels_path='bbox2.json'
    )
else:
    dataset = fo.load_dataset(dataset_name)

dataset = dataset.exclude_fields('keypoints')

session = fo.launch_app(dataset.to_patches('detections'))

 100% |█████████████████| 200/200 [2.1s elapsed, 0s remaining, 113.1 samples/s]     


In [38]:
# Export the dataset from FiftyOne in YOLO format
detection_model = YOLO("yolov8n.pt")

coco_classes = [c for c in dataset.default_classes if not c.isnumeric()]

coco_val_dir = "yolo_export_synth"
export_yolo_data(dataset, coco_val_dir, coco_classes)

 100% |█████████████████| 200/200 [434.4ms elapsed, 0s remaining, 460.4 samples/s]      




In [44]:
# Run inference on the synthetic dataset
import tqdm

# Clean any previous inference runs
!rm -rf runs

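model = YOLO("yolov8n.pt")  # load pretrained YOLOv8 nano weights for inference
# stream=True returns a generator, so results are produced one image at a time
predictions = model.predict(source=f'{coco_val_dir}/images/val', save_txt=True, save_conf=True, stream=True)
for prediction in predictions:
  # Iterating consumes the generator, which runs inference and writes the label files
  pass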


image 1/200 /content/yolo_export_synth/images/val/camera_0.png: 480x640 1 bench, 1 dog, 9.3ms
image 2/200 /content/yolo_export_synth/images/val/camera_1.png: 480x640 1 bench, 8.1ms
image 3/200 /content/yolo_export_synth/images/val/camera_10.png: 480x640 1 bench, 1 dog, 7.8ms
image 4/200 /content/yolo_export_synth/images/val/camera_100.png: 480x640 1 bench, 1 dog, 8.3ms
image 5/200 /content/yolo_export_synth/images/val/camera_101.png: 480x640 1 bench, 8.4ms
image 6/200 /content/yolo_export_synth/images/val/camera_102.png: 480x640 1 bench, 8.3ms
image 7/200 /content/yolo_export_synth/images/val/camera_103.png: 480x640 1 person, 1 bench, 7.9ms
image 8/200 /content/yolo_export_synth/images/val/camera_104.png: 480x640 1 bench, 8.4ms
image 9/200 /content/yolo_export_synth/images/val/camera_105.png: 480x640 1 person, 1 bench, 9.7ms
image 10/200 /content/yolo_export_synth/images/val/camera_106.png: 480x640 1 bench, 8.1ms
image 11/200 /content/yolo_export_synth/images/val/camera_107.png: 480x6
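Because `stream=True` makes `predict` return a generator, no image is processed until the results are iterated. That is why the cell above loops over `predictions` even though it does nothing with each item. The sketch below illustrates the same behaviour in plain Python; `fake_predict` is a hypothetical stand-in written for this illustration, not part of Ultralytics.

```python
def fake_predict(paths, log):
    """Hypothetical stand-in for a streaming predictor: yields one result per path."""
    for path in paths:
        log.append(path)  # the side effect happens only when this item is requested
        yield {"path": path}


processed = []
results = fake_predict(["camera_0.png", "camera_1.png"], processed)
lazy_count = len(processed)  # the generator body has not run yet

for _ in results:  # iterating is what drives the work
    pass
done_count = len(processed)  # every path has now been processed
```

If the loop were dropped, the generator would never be consumed and, in the real cell, no label files would be written for the later loading step.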

In [45]:
# Load the predictions on the synthetic dataset back into FiftyOne
filepaths = dataset.values("filepath")
prediction_filepaths = [get_prediction_filepath(fp) for fp in filepaths]
dataset.set_values(
    "yolov8n_det_filepath",
    prediction_filepaths,
)

add_yolo_detections(
    dataset,
    "yolov8n",
    "yolov8n_det_filepath",
    coco_classes
)

session = fo.launch_app(dataset)
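The loading step above relies on helpers defined earlier in the notebook. As a rough illustration of the conversion such a helper has to perform, the sketch below parses one line of a YOLO label file written with `save_txt=True, save_conf=True`. It assumes the line layout `class x_center y_center width height confidence` with normalised coordinates, and converts it to the relative `[top-left-x, top-left-y, width, height]` box layout that FiftyOne uses. The function name `yolo_line_to_box` and the sample line are hypothetical and are not the notebook's own implementation.

```python
def yolo_line_to_box(line, classes):
    """Convert one YOLO label line (with confidence) to (label, box, confidence).

    Assumed input layout: `class x_center y_center width height confidence`,
    with all coordinates normalised to [0, 1].
    """
    fields = line.split()
    class_id = int(fields[0])
    xc, yc, w, h, conf = (float(v) for v in fields[1:6])
    # Shift from centre-based to top-left-based coordinates
    box = [xc - w / 2, yc - h / 2, w, h]
    return classes[class_id], box, conf


# Hypothetical two-class list and label line, for illustration only
label, box, conf = yolo_line_to_box("1 0.5 0.5 0.2 0.4 0.9", ["person", "dog"])
```

Both formats use relative coordinates, so only the box origin changes and the image size is not needed.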

In [None]:
# Optional cleanup cell, uncomment as needed
#!rm -rf runs
#!rm -rf UnitySynth
#!rm -rf yolo_export_synth
#!rm -rf CocoSynth

In [46]:
# Slice dataset into views by tone and compute detections per slice
import pandas as pd
from fiftyone import ViewField as F

# calculate the interval size
interval_count = 6
start = -10
end = 8
interval_size = (end - start) / interval_count
skin_acc_df = pd.DataFrame(columns=['count', 'precision', 'recall'])

# loop through the range and slice it into 6 even intervals
for i in range(interval_count):
  s = start + i * interval_size
  e = s + interval_size

  sub = dataset.match(F("detections.detections").filter((F('skin') >= s) & (F('skin') < e)).length() > 0)
  results = sub.evaluate_detections(
      "yolov8n",
      gt_field="detections",
      eval_key="eval",
      compute_mAP=False,
  )
  metrics = results.metrics()
  skin_acc_df.loc[f'({s})-({e})'] = [len(sub), metrics['precision'], metrics['recall']]

# np.int was deprecated in NumPy 1.20 and removed in 1.24; the builtin int is equivalent
skin_acc_df['count'] = skin_acc_df['count'].astype(int)
skin_acc_df

Evaluating detections...
 100% |███████████████████| 33/33 [188.2ms elapsed, 0s remaining, 175.4 samples/s]

Evaluating detections...
 100% |███████████████████| 46/46 [250.6ms elapsed, 0s remaining, 183.5 samples/s]

Evaluating detections...
 100% |███████████████████| 28/28 [121.6ms elapsed, 0s remaining, 230.2 samples/s]

Evaluating detections...
 100% |███████████████████| 30/30 [85.4ms elapsed, 0s remaining, 351.1 samples/s]

Evaluating detections...
 100% |███████████████████| 29/29 [101.4ms elapsed, 0s remaining, 286.1 samples/s]

Evaluating detections...
 100% |███████████████████| 34/34 [105.7ms elapsed, 0s remaining, 321.7 samples/s]




skin interval,count,precision,recall
(-10.0)-(-7.0),33,1.0,1.0
(-7.0)-(-4.0),46,1.0,0.586957
(-4.0)-(-1.0),28,0.0,0.0
(-1.0)-(2.0),30,0.0,0.0
(2.0)-(5.0),29,0.0,0.0
(5.0)-(8.0),34,0.0,0.0
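The interval arithmetic used in the cell above can be checked in isolation. The sketch below rebuilds the six half-open intervals `[s, e)` from the same `start`, `end` and `interval_count` values and looks up which interval a given `skin` value falls into. The helper names `interval_edges` and `bin_index` are hypothetical and exist only for this illustration.

```python
def interval_edges(start, end, count):
    """Return `count` equal-width half-open intervals [s, e) covering [start, end)."""
    size = (end - start) / count
    return [(start + i * size, start + (i + 1) * size) for i in range(count)]


def bin_index(value, edges):
    """Return the index of the interval containing `value`, or None if it is outside."""
    for i, (s, e) in enumerate(edges):
        if s <= value < e:
            return i
    return None


edges = interval_edges(-10, 8, 6)      # same parameters as the cell above
darkest = bin_index(-8.457423, edges)  # a value from the lowest interval
lightest = bin_index(6.262553, edges)  # a value from the highest interval
outside = bin_index(8, edges)          # the upper bound itself is excluded
```

Because the comparison is `>= s` and `< e`, a value exactly equal to `end` belongs to no interval, so samples at the upper limit would be left out of every slice.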


In [47]:
# Show chart of recall per skin-shading interval.  We use recall because the synthetic
# backgrounds are simple, so false detections are unlikely and precision says little.
import plotly.express as px

fig = px.bar(skin_acc_df,
             x=skin_acc_df.index.astype(str),
             y='recall',
             width=1000,
             #range_y=[0.5, 1],
             text='recall',
             text_auto='.2f',
             labels={'x': 'skin shading'},
             title='Recall by skin shading (synthetic)')
fig.update_layout(font_size=20)
tt = ['I', 'II', 'III', 'IV', 'V', 'VI']
# Use .iloc for positional access; the DataFrame index holds the interval labels
ticktexts = [f'{tt[i]} ({skin_acc_df["count"].iloc[i]})' for i in range(len(tt))]
fig.update_xaxes(tickvals=list(range(len(tt))), ticktext=ticktexts)
fig.show()
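To make the choice of recall over precision concrete, the sketch below computes both metrics from true-positive, false-positive and false-negative counts. The counts are hypothetical and are not taken from the evaluation above. Returning 0.0 when a denominator is zero is an assumption made for this sketch; FiftyOne's own handling of that case may differ.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from raw counts.

    Returns 0.0 for a metric whose denominator is zero (an assumed convention).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: no false positives, but some objects are missed
p_partial, r_partial = precision_recall(tp=6, fp=0, fn=4)

# Hypothetical counts: nothing is detected at all
p_none, r_none = precision_recall(tp=0, fp=0, fn=10)
```

With no false positives, precision stays at 1.0 however many objects are missed, while recall drops in proportion to the misses. That makes recall the more informative of the two for this chart.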