# Loading a Segmentation Dataset

In this first step, we will explore how to load segmentation datasets into FiftyOne. Segmentation datasets may be of two types: semantic segmentation (pixel-wise class labels) and instance segmentation (individual object masks).
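
As a rough illustration of the difference, the following sketch builds both representations with NumPy for a made-up 3x5 image containing two objects of the same class. The arrays, class IDs, and variable names are invented for this example and are not part of any FiftyOne API.

```python
import numpy as np

# Semantic segmentation: a single array where every pixel holds a class ID
# (here 0 = background and 1 = "bean", both made up for this example)
semantic_mask = np.array([
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
])

# Instance segmentation: one boolean mask per object, so two objects of the
# same class remain distinguishable
bean_a = np.zeros(semantic_mask.shape, dtype=bool)
bean_a[0:2, 1:3] = True
bean_b = np.zeros(semantic_mask.shape, dtype=bool)
bean_b[1:3, 4] = True
instance_masks = [bean_a, bean_b]

# Merging the instances recovers the semantic mask for that class,
# but the semantic mask alone cannot tell the two objects apart
merged = np.logical_or.reduce(instance_masks)
print(np.array_equal(merged, semantic_mask == 1))
```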

FiftyOne makes it easy to load both types, either from its Dataset Zoo or from common formats such as COCO or the FiftyOne format. Let’s start by loading an instance segmentation dataset stored in a common format.

## Loading a Common Format Segmentation Dataset

Segmentation datasets are often provided in standard formats such as COCO, VOC, YOLO, KITTI, and FiftyOne format. FiftyOne supports direct ingestion of these datasets with just a few lines of code.

Make sure your dataset follows the folder structure and file naming conventions required by the specific format (e.g., COCO JSON annotations or class mask folders for semantic segmentation).
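
Before importing, it can help to sanity-check the annotation file. The sketch below is a minimal check for COCO-style JSON, assuming the standard top-level keys `images`, `annotations`, and `categories`; the helper name `check_coco_annotations` is hypothetical and not part of FiftyOne.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("images", "annotations", "categories")

def check_coco_annotations(labels_path):
    """Return a list of problems found in a COCO-style JSON file."""
    data = json.loads(Path(labels_path).read_text())

    # The three top-level sections a COCO annotation file is expected to have
    problems = [
        f"missing top-level key: {key}"
        for key in REQUIRED_KEYS
        if key not in data
    ]

    # For segmentation work, every annotation should carry a segmentation
    for ann in data.get("annotations", []):
        if not ann.get("segmentation"):
            problems.append(f"annotation {ann.get('id')} has no segmentation")

    return problems
```

An empty list means the file passed these basic checks; it does not guarantee that the import will succeed.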

In [None]:
!pip install huggingface_hub

In [None]:
import fiftyone as fo

# Dataset name and location
name = "mercedes-dataset"
dataset_dir = "../datasets/mercedes"  # Change with your path

# Create the dataset
dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.COCODetectionDataset, # Change with your type
    name=name,
)

# View summary info about the dataset
print(dataset)

# Print the first few samples in the dataset
print(dataset.head())

Check the documentation for each format to find the optional parameters you can pass, such as train/test splits, subfolders, or label paths. See the Using Datasets section of the User Guide for more details.
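
As a sketch of how such optional parameters fit together, the helper below builds keyword arguments for one split of a COCO-format dataset. The `images/<split>` and `annotations/instances_<split>.json` layout and the helper name `coco_import_kwargs` are assumptions made for illustration; adjust them to your own dataset.

```python
def coco_import_kwargs(dataset_dir, split, max_samples=None):
    """Build keyword arguments for fo.Dataset.from_dir() for one split.

    The folder layout used here is hypothetical; change the paths to match
    where your images and annotation files actually live.
    """
    return dict(
        dataset_dir=dataset_dir,
        data_path=f"images/{split}",                        # image subfolder
        labels_path=f"annotations/instances_{split}.json",  # COCO JSON file
        label_types="segmentations",                        # load instance masks
        tags=split,                                         # tag samples with the split name
        max_samples=max_samples,                            # None imports everything
    )

kwargs = coco_import_kwargs("../datasets/mercedes", "train", max_samples=50)
print(kwargs["labels_path"])
```

The resulting dictionary can then be passed along with the dataset type, for example `fo.Dataset.from_dir(dataset_type=fo.types.COCODetectionDataset, **kwargs)`.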

## FiftyOne with a Coffee-Beans Dataset

We will walk through how to use FiftyOne to build better segmentation datasets and models.

- Load your own dataset into FiftyOne. For this example, we use a Coffee-Beans Dataset in COCO format.
- Use FiftyOne in a notebook
- Explore your segmentation dataset using views and the FiftyOne App

Note: To load the dataset locally, visit the Coffee-Beans Dataset page on Hugging Face and download the files, for example by cloning the repository with the following command.

In [None]:
!git clone https://huggingface.co/datasets/pjramg/colombian_coffee

If you only see small pointer files instead of the actual images, it means Git LFS wasn’t used. In that case, use Git LFS to pull the full dataset.

In [None]:
!sudo apt install -y git-lfs
!git lfs install
!cd colombian_coffee && git lfs pull
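
To confirm whether the clone contains real images or only pointer files, you can look for the header line that Git LFS pointer files start with. This is a minimal sketch; the helper names are hypothetical.

```python
from pathlib import Path

# Git LFS pointer files are small text files that begin with this line
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path):
    """Return True if the file looks like a Git LFS pointer, not real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER

def find_lfs_pointers(root):
    """List files under `root` (ignoring .git) that are still LFS pointers."""
    root = Path(root)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(root).parts
        and is_lfs_pointer(p)
    )
```

For example, `find_lfs_pointers("./colombian_coffee")` returning an empty list suggests the full files were downloaded.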

In [None]:
import fiftyone as fo

dataset = fo.Dataset.from_dir(
    dataset_type=fo.types.COCODetectionDataset,
    dataset_dir="./colombian_coffee",
    data_path="images/default",
    labels_path="annotations/instances_default.json",
    label_types="segmentations",
    label_field="categories",
    name="coffee",
    include_id=True,
    overwrite=True
)

# View summary info about the dataset
print(dataset)

# Print the first few samples in the dataset
print(dataset.head())

We can see our images have loaded in the App, but no segmentation masks are shown yet. Next, we’ll ensure annotations are properly loaded.

In [None]:
session = fo.launch_app(dataset)

With the FiftyOne App, you can visualize your samples and their segmentation masks in an interactive UI. Double-click any sample to enter the expanded view, where you can study individual samples with overlaid masks.

The view bar lets you filter and search your dataset to analyze specific classes or objects.

You can seamlessly move between Python and the App. For example, create a filtered view using the Shuffle() and Limit() stages in Python or directly in the App UI.
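
On the Python side, such a view can be sketched as a small helper. The function name `random_subset` and its defaults are illustrative; it only relies on the `shuffle()` and `limit()` view methods, which append the Shuffle and Limit stages.

```python
def random_subset(dataset, n=10, seed=51):
    """Return a view with `n` randomly chosen samples.

    `dataset` is expected to be a FiftyOne dataset or view. A fixed seed
    makes the random order reproducible across runs.
    """
    return dataset.shuffle(seed=seed).limit(n)
```

With an App session open, assigning the result to `session.view` (for example `session.view = random_subset(dataset)`) shows the same subset in the UI.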

Once your annotations are loaded correctly, you can confirm that your segmentation masks (not detections!) are present and visualized correctly. 🎉

# Step 2: Adding Instance Segmentation to a FiftyOne Dataset

We will explore how to enrich your dataset by adding instance segmentation predictions.

In this notebook, we’ll cover:

- Using the FiftyOne Model Zoo to apply instance segmentation
- Integrating predictions from a custom model (e.g., a model deployed via Intel Geti)

## Using an Instance Segmentation Dataset

For educational purposes, use this link in Drive to download an upgraded dataset with 100+ unique annotated images.

Download the dataset with this Link and unzip it in your working folder.

Let’s kick things off by loading colombian_coffee-dataset_1600. (This is a new dataset, different from the one used in the previous notebook.)

In [None]:
import fiftyone as fo
from fiftyone.utils.coco import COCODetectionDatasetImporter

dataset = fo.Dataset.from_dir(
    dataset_type=fo.types.COCODetectionDataset,
    dataset_dir="./colombian_coffee-dataset_1600",
    data_path="images/default",
    labels_path="annotations/instances_default.json",
    label_types="segmentations",
    label_field="ground_truth",
    name="coffe_1600",
    include_id=True,
    overwrite=True
)

# Shuffle the samples and open the shuffled view in the App
view = dataset.shuffle()
session = fo.launch_app(view=view)

## Loading predictions using SAM2

With FiftyOne, you have tons of pretrained models at your disposal to use via the FiftyOne Model Zoo or using one of our integrations such as HuggingFace! To get started using them, first load the model in and pass it into the apply_model function.

Install SAM2 following the instructions from this repo. You can also jump to the next step of this tutorial to understand how SAM2 works with FiftyOne.

https://github.com/facebookresearch/sam2

In [None]:
!python -m pip install "sam2"

In [None]:
!pip install 'git+https://github.com/facebookresearch/sam2.git'

If you encounter any issues, refer to the main SAM2 repository to verify the installation process.

Now apply Segment Anything SAM2 from the FiftyOne Model Zoo. As you can see, some images in the dataset include ground truth annotations, but not all of them. With SAM2, we will apply segmentation across the entire dataset. (This could take around 1.5 hours)

In [None]:
import fiftyone.zoo as foz
model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")

# No prompt field is passed, so masks are generated automatically for every sample
dataset.apply_model(
    model,
    label_field="sam2_predictions",
)

Alternatively, you can apply SAM 2 only to the images that already have ground truth segmentations.

In [None]:
dataset.apply_model(
    model,
    label_field="sam2_predictions",
    prompt_field="ground_truth_segmentations",
)

This runs SAM 2 only on the images that have labels in the ground_truth_segmentations field, using those labels as prompts.
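
If you prefer to make that restriction explicit, one option is to first build a view of the samples whose prompt field is populated. This is a sketch that assumes the field name used above; the helper name `with_labels` is hypothetical, and it relies only on the `exists()` view method.

```python
def with_labels(dataset, field="ground_truth_segmentations"):
    """Return a view containing only samples where `field` is populated.

    `dataset` is expected to be a FiftyOne dataset or view.
    """
    return dataset.exists(field)
```

The model can then be applied to that view instead of the full dataset, for example `with_labels(dataset).apply_model(model, label_field="sam2_predictions", prompt_field="ground_truth_segmentations")`.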

## Loading predictions using a custom model (Intel Geti Example)

Let’s now simulate the pipeline with a custom instance segmentation model. If you want to run the inference using the same setup, please refer to this example.

Assuming you’ve already set up inference with a model (e.g., via OpenVINO + Intel Geti SDK):

In [None]:
!pip install geti-sdk==2.10.*

## Preparing the models for inference

The Intel Geti SDK will be used to run inference with Intel Geti Models. The deployment folder of the best model must be downloaded and unzipped in the same folder as the project.

Download and unzip the model

## Generating instance segmentation masks from polygons and bounding boxes
This function extracts instance segmentation masks from polygon annotations, combining detection (bounding boxes) and segmentation (masks) in the same instance using fo.Detection.

1. Load Image – Reads and converts the image to RGB.
2. Process Annotations – Extracts polygon points, computes bounding boxes, and normalizes coordinates.
3. Generate Masks – Creates, crops, and resizes binary masks for each annotation.
4. Save & Return – Stores masks as temp files and returns fo.Detection objects, ensuring the bounding box and mask belong to the same instance.

This enables accurate visualization, analysis, and further processing of segmentation data in FiftyOne, preserving both object localization and shape details.

In [None]:
import numpy as np
import cv2
import fiftyone as fo
from PIL import Image as PILImage
from tempfile import NamedTemporaryFile
from geti_sdk.deployment import Deployment
from geti_sdk.data_models.shapes import Polygon

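def generate_mask_from_polygon_and_bboxes(sample, prediction):
    """Convert Geti polygon annotations into fo.Detection objects with masks."""
    # 1. Load the image and convert it to RGB (only its size is used below)
    image = cv2.imread(sample.filepath)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {sample.filepath}")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    img_height, img_width = image.shape[:2]
    print(f"Image size: {img_width}x{img_height}")
    detections = []
    for annotation in prediction.annotations:
        # Only polygon shapes can be turned into instance masks
        if isinstance(annotation.shape, Polygon):
            # 2. Extract the polygon points as integer pixel coordinates
            polygon_points = [(point.x, point.y) for point in annotation.shape.points]
            polygon_points = np.array(polygon_points, dtype=np.int32)
            label = annotation.labels[0].name
            confidence = annotation.labels[0].probability
            # Bounding box in pixels, then normalized to [0, 1] for FiftyOne
            x, y, w, h = cv2.boundingRect(polygon_points)
            scaled_x = x / img_width
            scaled_y = y / img_height
            scaled_w = w / img_width
            scaled_h = h / img_height
            bounding_box = [scaled_x, scaled_y, scaled_w, scaled_h]
            # 3. Rasterize the polygon on a full-size mask, then crop it to the box
            mask = np.zeros((img_height, img_width), dtype=np.uint8)
            cv2.fillPoly(mask, [polygon_points], 255)
            cropped_mask = mask[y:y + h, x:x + w]
            # Resize to (w, h) so the mask matches the box even if the crop was clipped
            mask_resized = cv2.resize(cropped_mask, (w, h), interpolation=cv2.INTER_NEAREST)
            print(f"Mask size: {mask_resized.shape} (expected: {h}x{w})")
            # 4. Save the mask as a temporary PNG and reference it by path
            with NamedTemporaryFile(delete=False, suffix='.png') as temp_mask_file:
                mask_path = temp_mask_file.name
                cv2.imwrite(mask_path, mask_resized)
            # The box and mask are stored together so they describe the same instance
            detection = fo.Detection(
                label=label,
                confidence=confidence,
                bounding_box=bounding_box,
                mask_path=mask_path
            )
            detections.append(detection)
    return detections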
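
To make the normalization in step 2 concrete, here is a small worked example in plain Python. The helper name `normalized_bbox` and the sample polygon are made up for illustration; the inclusive pixel extents (the `+ 1`) mirror how `cv2.boundingRect()` treats integer points.

```python
def normalized_bbox(points, img_width, img_height):
    """Compute a relative [x, y, width, height] box around polygon points.

    The result uses the top-left corner plus width and height, all divided
    by the image size so that every value lies in [0, 1].
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x, y = min(xs), min(ys)
    # Inclusive pixel extents, hence the +1
    w = max(xs) - x + 1
    h = max(ys) - y + 1
    return [x / img_width, y / img_height, w / img_width, h / img_height]

# A rectangle-shaped polygon in a 100x200 (width x height) image
polygon = [(10, 20), (50, 20), (50, 80), (10, 80)]
print(normalized_bbox(polygon, img_width=100, img_height=200))
```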

For educational purposes, check what happens on the first or last sample before applying this to the whole dataset.

In [None]:
from openvino.runtime import Core

ie = Core()
devices = ie.available_devices

for device in devices:
    device_name = ie.get_property(device, "FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")

In [None]:
# Update the folder path to match the location where the model was downloaded and unzipped
deployment_inference = Deployment.from_folder("geti_sdk-deployment_90")
deployment_inference.load_inference_models(device="CPU")

In [None]:
# Test on one image
sample = dataset.first()
image_path = sample.filepath
image_data = PILImage.open(image_path)
image_data = np.array(image_data)
prediction = deployment_inference.infer(image_data)
detections = generate_mask_from_polygon_and_bboxes(sample, prediction)
sample['predicted_segmentations_test'] = fo.Detections(detections=detections)
sample.save()
dataset.reload()
print(dataset)
print(sample)

Tip: If your model returns a different structure, replace prediction.annotations with your model's actual output structure and masks.

## Run the prediction in the whole dataset

This loop processes each sample in the dataset by loading the image, running inference using Geti SDK, and generating instance segmentation masks. The function extracts detections with both bounding boxes and masks, ensuring they belong to the same instance. These predictions are then stored in the sample under "predictions_geti_sdk" using fo.Detections. Finally, the dataset is reloaded to reflect the updates.

In [None]:
# Iterate over the samples in the dataset
for sample in dataset:
    # Load the image as a NumPy array using PIL or OpenCV
    image_path = sample.filepath  # Path to the image file
    image_data = PILImage.open(image_path)
    image_data = np.array(image_data)  # Convert the image to NumPy array

    # Run inference on the sample (using Geti SDK's inference)
    prediction = deployment_inference.infer(image_data)

    # Generate the segmentation mask and detections using the annotations from the prediction
    detections = generate_mask_from_polygon_and_bboxes(sample, prediction)

    # Add the detections as predicted segmentations
    sample["predictions_geti_sdk"] = fo.Detections(detections=detections)

    # Save the updated sample
    sample.save()

# Reload the dataset to reflect the changes
dataset.reload()

## Compare Predictions in FiftyOne App

Toggle between ground_truth_segmentations, sam2_predictions, and predictions_geti_sdk in the App to explore and compare different segmentations side-by-side!

In [None]:
session = fo.launch_app(dataset)
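
Alongside the visual comparison, a quick numeric check is the intersection over union (IoU) of two masks. The sketch below uses plain NumPy and assumes both masks are full-image boolean arrays of the same shape; it is an illustration, not a FiftyOne API, and the helper name `mask_iou` is hypothetical.

```python
import numpy as np

def mask_iou(mask_a, mask_b):
    """Intersection over union of two same-shaped binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Both masks are empty; define the IoU as 0 to avoid dividing by zero
        return 0.0
    return float(np.logical_and(a, b).sum() / union)

# Two 2x2 squares that overlap in one column (2 shared pixels, 6 in total)
gt = np.zeros((4, 4), dtype=bool)
gt[0:2, 0:2] = True
pred = np.zeros((4, 4), dtype=bool)
pred[0:2, 1:3] = True
print(round(mask_iou(gt, pred), 3))
```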

# Step 3: Using SAM 2

Segment Anything 2 (SAM 2) is a powerful segmentation model released in July 2024 that pushes the boundaries of image and video segmentation. It brings new capabilities to computer vision applications, including the ability to generate precise masks and track objects across frames in videos using just simple prompts.

In this notebook, you’ll learn how to:

- Understand the key innovations in SAM 2
- Apply SAM 2 to image datasets using bounding boxes, keypoints, or no prompts at all
- Leverage SAM 2’s video segmentation and mask tracking capabilities with a single-frame prompt

## What is SAM 2?

SAM 2 is the next generation of the Segment Anything Model, originally introduced by Meta in 2023. While SAM was designed for zero-shot segmentation on still images, SAM 2 adds robust video segmentation and tracking capabilities. With just a bounding box or a set of keypoints on a single frame, SAM 2 can segment and track objects across entire video sequences.

## Using SAM 2 for Images

SAM 2 integrates directly with the FiftyOne Model Zoo, allowing you to apply segmentation to image datasets with minimal code. Whether you’re working with ground truth bounding boxes, keypoints, or want to explore automatic mask generation, FiftyOne makes the process seamless.

In [2]:
import fiftyone as fo
import fiftyone.zoo as foz

# Load dataset
dataset = foz.load_zoo_dataset("quickstart", max_samples=25, shuffle=True, seed=51)

# Load SAM 2 image model
model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")

# Prompt with bounding boxes
dataset.apply_model(model, label_field="segmentations", prompt_field="ground_truth")

# Launch app to view segmentations
session = fo.launch_app(dataset)

Dataset already downloaded
Loading existing dataset 'quickstart-25'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use
 100% |███████████████████| 25/25 [4.0s elapsed, 0s remaining, 8.7 samples/s]      


## Using a custom segmentation dataset

We will use a segmentation dataset of coffee beans, available on Hugging Face as a FiftyOne dataset: pjramg/my_colombian_coffe_FO

In [None]:
import fiftyone as fo # base library and app
import fiftyone.utils.huggingface as fouh # Hugging Face integration
dataset_ = fouh.load_from_hub("pjramg/my_colombian_coffe_FO", persistent=True, overwrite=True)

# Define the new dataset name
dataset_name = "coffee_FO_SAM2"

# Check if the dataset exists
if dataset_name in fo.list_datasets():
    print(f"Dataset '{dataset_name}' exists. Loading...")
    dataset = fo.load_dataset(dataset_name)
else:
    print(f"Dataset '{dataset_name}' does not exist. Creating a new one...")
    # Clone the dataset with a new name and make it persistent
    dataset = dataset_.clone(dataset_name, persistent=True)

## Prompting with ground truth information in the 100 unique samples in the dataset

In [None]:
import fiftyone.brain as fob

results = fob.compute_similarity(dataset, brain_key="img_sim2")
results.find_unique(100)

In [None]:
unique_view = dataset.select(results.unique_ids)
session.view = unique_view

## Apply SAM2 to just the 100 unique samples

SAM 2 can also segment entire images without needing any bounding boxes or keypoints. This zero-input mode is useful for generating segmentation masks for general visual analysis or bootstrapping annotation workflows.

In [None]:
import fiftyone.zoo as foz
model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")

# Full automatic segmentations
unique_view.apply_model(model, label_field="sam2_results")

In case you run out of memory, you can free up GPU space by clearing the cache with:

In [None]:
import torch
torch.cuda.empty_cache()

## Bonus with SAM2

### Prompting with Keypoints

Keypoint prompts are a great alternative to bounding boxes when working with articulated objects like people. Here, we filter images to include only people, generate keypoints using a keypoint model, and then use those keypoints to prompt SAM 2 for segmentation.

In [None]:
from fiftyone import ViewField as F

# Filter persons only
dataset = foz.load_zoo_dataset("quickstart")
dataset = dataset.filter_labels("ground_truth", F("label") == "person")

# Apply keypoint detection
kp_model = foz.load_zoo_model("keypoint-rcnn-resnet50-fpn-coco-torch")
dataset.default_skeleton = kp_model.skeleton
dataset.apply_model(kp_model, label_field="gt_keypoints")
session = fo.launch_app(dataset)

In [None]:
# Apply SAM 2 with keypoints
model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")
dataset.apply_model(model, label_field="segmentations", prompt_field="gt_keypoints_keypoints")
session = fo.launch_app(dataset)

## Using SAM 2 for Video

SAM 2 brings game-changing capabilities to video understanding. It can track segmentations across frames from a single bounding box or keypoint prompt provided on the first frame. With this, you can propagate high-quality segmentation masks through entire sequences automatically.

In [None]:
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart-video", max_samples=2)

# Remove boxes after first frame
(
    dataset
    .match_frames(F("frame_number") > 1)
    .set_field("frames.detections", None)
    .save()
)
session = fo.launch_app(dataset)

In [None]:
# Apply video model with first-frame prompt
model = foz.load_zoo_model("segment-anything-2-hiera-tiny-video-torch")
dataset.apply_model(model, label_field="segmentations", prompt_field="frames.detections")
session = fo.launch_app(dataset)

## Available SAM 2 Models in FiftyOne

Image Models:
- segment-anything-2-hiera-tiny-image-torch
- segment-anything-2-hiera-small-image-torch
- segment-anything-2-hiera-base-plus-image-torch
- segment-anything-2-hiera-large-image-torch

Video Models:
- segment-anything-2-hiera-tiny-video-torch
- segment-anything-2-hiera-small-video-torch
- segment-anything-2-hiera-base-plus-video-torch
- segment-anything-2-hiera-large-video-torch
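
All of the names above follow the same pattern, so a small helper can build them from a size and a media type. This is a convenience sketch based only on the list above; the function name `sam2_model_name` is hypothetical.

```python
SAM2_SIZES = ("tiny", "small", "base-plus", "large")
SAM2_MEDIA = ("image", "video")

def sam2_model_name(size="tiny", media="image"):
    """Build a SAM 2 zoo model name such as the ones listed above."""
    if size not in SAM2_SIZES:
        raise ValueError(f"size must be one of {SAM2_SIZES}, got {size!r}")
    if media not in SAM2_MEDIA:
        raise ValueError(f"media must be one of {SAM2_MEDIA}, got {media!r}")
    return f"segment-anything-2-hiera-{size}-{media}-torch"

print(sam2_model_name("base-plus", "video"))
```

The returned string can be passed to `foz.load_zoo_model()` in the same way as the hard-coded names used earlier.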