Creating a dataset integration.

This tutorial shows how to build a Velour dataset integration to speed up your workflow.

An integration is useful when your organization uses a standard data format
or when you work with a particular dataset often. This notebook uses the COCO Panoptic dataset
to show ...

In [7]:
import os
import zipfile
import tempfile
import json
import PIL.Image
import requests
import numpy as np
from tqdm import tqdm
from io import BytesIO
from copy import deepcopy
from pathlib import Path, PosixPath
from typing import Any, Dict, List, Tuple, Union
from collections import defaultdict

from velour import Client, Dataset, Model, Datum, Prediction
from velour.enums import TaskType, JobStatus
from velour.viz import create_combined_segmentation_mask

import coco_integration as coco
import yolo_integration as yolo

import ultralytics

In [8]:
# connect
client = Client("http://localhost:8000")

Succesfully connected to http://localhost:8000/.


Using `coco_integration`, create a dataset from COCO Panoptic trainval2017.

In [9]:
velour_dataset = coco.create_dataset_from_coco_panoptic(client, limit=2, reset=True)

coco already exists at ./coco!


100%|██████████| 2/2 [00:00<00:00, 59.64it/s]
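
The internals of `coco_integration` are not shown in this notebook, so the following is only a minimal sketch of the kind of parsing such an integration has to do. It relies on two facts about the public COCO Panoptic format: each annotation carries a `segments_info` list, and each pixel of the panoptic PNG encodes a segment id as `R + 256*G + 256^2*B`. The function names `rgb_to_segment_id` and `segments_to_labels` are hypothetical, and the sample values are made up for illustration.

```python
from typing import Any, Dict, List


def rgb_to_segment_id(r: int, g: int, b: int) -> int:
    """Decode one COCO panoptic PNG pixel into its segment id."""
    return r + 256 * g + 256 * 256 * b


def segments_to_labels(
    annotation: Dict[str, Any],
    categories: Dict[int, Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Pair each segment of one panoptic annotation with its category metadata."""
    labels = []
    for segment in annotation["segments_info"]:
        category = categories[segment["category_id"]]
        labels.append(
            {
                "segment_id": segment["id"],
                "name": category["name"],
                "supercategory": category["supercategory"],
                "isthing": bool(category["isthing"]),
                "iscrowd": bool(segment["iscrowd"]),
            }
        )
    return labels


# Toy data in the COCO panoptic layout (values are illustrative only).
categories = {
    1: {"id": 1, "name": "person", "supercategory": "person", "isthing": 1},
    184: {"id": 184, "name": "tree-merged", "supercategory": "plant", "isthing": 0},
}
annotation = {
    "image_id": 139,
    "file_name": "000000000139.png",
    "segments_info": [
        {"id": 3226892, "category_id": 1, "iscrowd": 0},
        {"id": 6979964, "category_id": 184, "iscrowd": 0},
    ],
}
labels = segments_to_labels(annotation, categories)
```

A real integration would additionally turn each segment id into a pixel mask and wrap the result in Velour's own annotation types before uploading.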


Create a `Model` using the standard Velour initializer.

In [10]:
velour_model = Model(client, "yolov8n-seg")

Create predictions using YOLOv8 and upload them to Velour using `Model.add_prediction`.

In [11]:
inference_engine = ultralytics.YOLO(f"{velour_model.name}.pt")

for datum in tqdm(velour_dataset.get_datums()):

    image = coco.download_image(datum)

    results = inference_engine(image, verbose=False)

    # convert YOLO result into Velour Prediction
    prediction: Prediction = yolo.parse_yolo_object_detection(
        results,            # raw inference
        datum=datum,        # velour datum
        label_key='name',   # label_key override
    )

    # add prediction to the model
    velour_model.add_prediction(prediction)

100%|██████████| 2/2 [00:02<00:00,  1.46s/it]
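
The body of `yolo.parse_yolo_object_detection` is not shown here. As a rough illustration of what such a parser does, the sketch below pairs raw boxes, confidences and class ids with a class-name lookup and attaches the chosen `label_key`. `ScoredBox` and `parse_detections` are hypothetical names, not part of Velour or Ultralytics. In Ultralytics the corresponding raw values are usually read from `boxes.xyxy`, `boxes.conf`, `boxes.cls` and `names` on a result object; plain Python lists stand in for them here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


@dataclass
class ScoredBox:
    """Hypothetical stand-in for one labeled, scored detection."""

    label: Tuple[str, str]  # (label key, label value)
    score: float
    box: Box


def parse_detections(
    boxes_xyxy: List[Box],
    confidences: List[float],
    class_ids: List[int],
    names: Dict[int, str],
    label_key: str = "class",
) -> List[ScoredBox]:
    """Zip raw detector outputs into labeled, scored boxes."""
    if not (len(boxes_xyxy) == len(confidences) == len(class_ids)):
        raise ValueError("boxes, confidences and class ids must have equal length")
    return [
        ScoredBox(label=(label_key, names[class_id]), score=float(conf), box=tuple(box))
        for box, conf, class_id in zip(boxes_xyxy, confidences, class_ids)
    ]


# Made-up raw outputs for a single image.
detections = parse_detections(
    boxes_xyxy=[(10.0, 20.0, 110.0, 220.0), (50.0, 60.0, 90.0, 100.0)],
    confidences=[0.91, 0.42],
    class_ids=[0, 56],
    names={0: "person", 56: "chair"},
    label_key="name",
)
```

Passing `label_key="name"` mirrors the override used in the cell above, so the predicted labels share a key with the ground-truth labels.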


The `coco_integration` module handles dataset finalization internally. Because the YOLO integration only defines annotation parsers, we have to finalize the inferences manually.

In [12]:
velour_model.finalize_inferences(velour_dataset)

# Exploring the Dataset

In [14]:
groundtruth_139 = velour_dataset.get_groundtruth('139')
groundtruth_139.datum.uid

'139'

In [15]:
coco.download_image(groundtruth_139.datum)

In [None]:
instance_mask, instance_legend = create_combined_segmentation_mask(
    [groundtruth_139], 
    label_key="name",
    task_type=TaskType.DETECTION,
)

In [None]:
instance_mask

In [None]:
for k, v in instance_legend.items():
    print(k)
    display(v)

In [None]:
semantic_mask, semantic_legend = create_combined_segmentation_mask(
    [groundtruth_139], 
    label_key="name",
    task_type=TaskType.SEGMENTATION,
)

In [None]:
semantic_mask

In [None]:
for k, v in semantic_legend.items():
    print(k)
    display(v)
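
The cells above return a combined mask together with a legend. How `create_combined_segmentation_mask` builds them is not shown in this notebook; the sketch below only illustrates the general idea of painting one binary mask per label into a single image and recording which color belongs to which label. `combine_masks` and its arguments are hypothetical, and nested lists stand in for image arrays.

```python
from typing import Dict, List, Tuple

Mask = List[List[bool]]
Color = Tuple[int, int, int]


def combine_masks(
    masks: Dict[str, Mask], palette: List[Color]
) -> Tuple[List[List[Color]], Dict[str, Color]]:
    """Paint each label's mask in its own color; later labels overwrite earlier ones."""
    if not masks:
        raise ValueError("at least one mask is required")
    first = next(iter(masks.values()))
    height, width = len(first), len(first[0])
    # Start from an all-black image.
    image = [[(0, 0, 0) for _ in range(width)] for _ in range(height)]
    legend: Dict[str, Color] = {}
    for index, (label, mask) in enumerate(masks.items()):
        color = palette[index % len(palette)]  # cycle if labels outnumber colors
        legend[label] = color
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    image[y][x] = color
    return image, legend


# Two tiny 2x2 masks, made up for illustration.
masks = {
    "person": [[True, False], [False, False]],
    "chair": [[False, False], [False, True]],
}
image, legend = combine_masks(masks, palette=[(255, 0, 0), (0, 255, 0)])
```

Iterating over the legend, as in the loops above, then lets you match each color in the combined image to its label.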