# Load COCO Dataset from FiftyOne

Use FiftyOne's dataset zoo to import a sample of the MS COCO dataset into Pixeltable tables.

**What's in this recipe:**
- Load a random 1% sample of the COCO-2017 training split (1,183 of 118,287 images) with FiftyOne
- Store each image with its ground-truth detection count in a Pixeltable table
- Add image dimensions, CLIP embeddings, and BLIP captions as computed columns
- Publish the table to Pixeltable Cloud


## Problem

MS COCO is a large-scale object detection and segmentation dataset with 118,287 training images. You need a representative sample of this dataset in Pixeltable to apply AI models, create embeddings, or run analysis without downloading the entire dataset.


## Solution

You can use FiftyOne's dataset zoo to efficiently download specific subsets of COCO, then import them into Pixeltable tables. This allows you to work with exactly the data you need without downloading the entire dataset.

You can iterate on transformations before adding them to your table. Use `.select()` with `.collect()` to preview results on sample data; nothing is stored in your table. If you want to collect only the first few rows, use `.head(n)` instead of `.collect()`. Once you're satisfied, use `.add_computed_column()` to apply transformations to all rows in your table.


### Setup


In [1]:
!uv add pixeltable fiftyone transformers torch accelerate pillow openai



In [2]:
import pixeltable as pxt
import fiftyone as fo
import fiftyone.zoo as foz

In [None]:
# List existing Pixeltable tables
pxt.list_tables()

### Load COCO Dataset from FiftyOne

Load the [COCO-2017 dataset](https://docs.voxel51.com/dataset_zoo/datasets/coco_2017.html) from FiftyOne's dataset zoo. We'll download 1,183 random samples from the training split (1% of 118,287 total training images).
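
The 1,183 figure is 1% of the training split, rounded to the nearest image. Loading with `shuffle=True` and `max_samples` amounts to drawing a uniform random subset without replacement. The plain-Python sketch below reproduces the arithmetic and that kind of draw; it is only a model and will not select the same images FiftyOne does. The seed value `51` is an arbitrary assumption chosen for reproducibility.

```python
import random

TRAIN_SIZE = 118_287  # images in the COCO-2017 training split
FRACTION = 0.01       # 1% sample

# 1% of 118,287 is 1,182.87, which rounds to the 1,183 passed as max_samples
max_samples = round(TRAIN_SIZE * FRACTION)

# Uniform random sample of image indices without replacement;
# the fixed seed (an arbitrary 51) makes the draw reproducible
rng = random.Random(51)
sample_indices = rng.sample(range(TRAIN_SIZE), max_samples)

print(max_samples, len(sample_indices))
```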


In [None]:
# Load 1,183 random samples from COCO-2017 training split (1% of 118,287)
# FiftyOne only downloads the specific images needed
coco_dataset = foz.load_zoo_dataset(
    'coco-2017',
    split='train',
    max_samples=1183,
    shuffle=True
)

In [None]:
# Start fresh: drop the 'coco_images' directory (and any tables in it) if it exists, then recreate it
pxt.drop_dir('coco_images', force=True)
pxt.create_dir('coco_images')

### Create Pixeltable Table

Now create a table and insert the sampled data. Each row holds an image, its position in the sampled dataset (`coco_id`), and the number of ground-truth detections in that image.

In [None]:
# Create a table for images, their sample index, and detection counts
t = pxt.create_table(
    'coco_images.samples',
    schema={
        'image': pxt.Image,
        'coco_id': pxt.Int,
        'num_detections': pxt.Int
    }
)

In [None]:
# Prepare rows for insertion from FiftyOne dataset
rows = []
for idx, sample in enumerate(coco_dataset):
    # Count ground-truth boxes: the COCO zoo dataset stores detections
    # in the 'ground_truth' field, and unlabeled samples count as 0
    num_dets = 0
    if hasattr(sample, 'ground_truth') and sample.ground_truth:
        num_dets = len(sample.ground_truth.detections)
    
    rows.append({
        'image': sample.filepath,
        'coco_id': idx,  # position in this sample, not the official COCO image ID
        'num_detections': num_dets
    })

t.insert(rows)
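
The loop's defensive label handling can be factored into small functions and checked without downloading anything. In this sketch, the `SimpleNamespace` objects are hypothetical stand-ins for FiftyOne samples that mimic only the two attributes the loop reads (`filepath` and `ground_truth.detections`); the helper names `count_detections` and `build_rows` and the file paths are illustrative, not part of FiftyOne or Pixeltable.

```python
from types import SimpleNamespace


def count_detections(sample) -> int:
    """Return the number of ground-truth detections, or 0 if the sample has no labels."""
    labels = getattr(sample, 'ground_truth', None)
    if not labels:
        return 0
    return len(labels.detections)


def build_rows(samples) -> list:
    """Build one row dict per sample, keyed by the table's column names."""
    return [
        {
            'image': sample.filepath,
            'coco_id': idx,  # position in the sample, as in the loop above
            'num_detections': count_detections(sample),
        }
        for idx, sample in enumerate(samples)
    ]


# Hypothetical stand-ins: one labeled sample with two boxes, one without labels
fake_samples = [
    SimpleNamespace(
        filepath='/tmp/coco/img_a.jpg',
        ground_truth=SimpleNamespace(detections=['box_1', 'box_2']),
    ),
    SimpleNamespace(filepath='/tmp/coco/img_b.jpg', ground_truth=None),
]

rows = build_rows(fake_samples)
print(rows)
```

Assuming real FiftyOne samples expose the same attributes as the stand-ins, `build_rows(coco_dataset)` should produce the same rows as the loop, ready for `t.insert(...)`.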

In [None]:
# View sample data
t.select(t.image, t.coco_id, t.num_detections).head(5)

In [None]:
# Check total count
t.count()

### Extract Image Metadata

Add computed columns to extract metadata from the images.


In [None]:
# Add computed columns for image dimensions
t.add_computed_column(width=t.image.width)
t.add_computed_column(height=t.image.height)

In [None]:
# View images with their dimensions and detection counts
t.select(t.image, t.num_detections, t.width, t.height).head(10)

### Add CLIP Embeddings for Image Search

Create vector embeddings for the images using OpenAI's CLIP model. These embeddings enable semantic image search and similarity comparisons.
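
An embedding index answers similarity queries by comparing vectors. Assuming the index uses Pixeltable's default cosine metric, the ranking works like the toy sketch below. The vectors are made-up 3-dimensional stand-ins (CLIP ViT-B/16 actually produces 512-dimensional embeddings), and the `query` vector plays the role of an embedded text prompt, which CLIP maps into the same space as images.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for CLIP embeddings (hypothetical values)
query = [0.9, 0.1, 0.0]  # e.g. the embedding of a text query
images = {
    'img_dog': [0.8, 0.2, 0.1],
    'img_car': [0.1, 0.9, 0.2],
    'img_pizza': [0.0, 0.2, 0.9],
}

# Rank images from most to least similar to the query
ranked = sorted(images, key=lambda k: cosine_similarity(query, images[k]), reverse=True)
print(ranked)
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common choice for comparing CLIP embeddings.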


In [None]:
# Reconnect to the table (only needed if you restarted the kernel)
t = pxt.get_table('coco_images.samples')


In [4]:
# Add image embeddings using HuggingFace CLIP model
from pixeltable.functions.huggingface import clip

# Index the image column with CLIP ViT-B/16 embeddings from Hugging Face
t.add_embedding_index(
    'image',
    embedding=clip.using(model_id='openai/clip-vit-base-patch16')
)



### Generate Image Captions with BLIP

Use BLIP (Bootstrapping Language-Image Pre-training), an efficient open-source image captioning model from Salesforce. BLIP generates natural, descriptive captions and runs locally without API keys.


In [6]:
# Generate image captions using BLIP
from pixeltable.functions.huggingface import image_captioning

t.add_computed_column(
    caption=image_captioning(t.image, model_id='Salesforce/blip-image-captioning-base')
)



Added 1183 column values with 0 errors.


1183 rows updated, 2366 values computed.

In [None]:
# Add OpenAI GPT-4o-mini captions for comparison (requires the OPENAI_API_KEY environment variable)
from pixeltable.functions import openai

t.add_computed_column(
    openai_caption=openai.vision(
        prompt="Describe this image in one sentence, focusing on the main objects, their actions, and the setting. Use clear, factual language similar to COCO dataset captions.",
        image=t.image,
        model='gpt-4o-mini'
    )
)



In [None]:
# Compare BLIP and OpenAI captions side-by-side
t.select(t.image, t.caption, t.openai_caption, t.num_detections).head(5)



In [None]:
# View images with their BLIP-generated captions
t.select(t.image, t.caption, t.num_detections).head(5)

### Publish to Pixeltable Cloud

Publish the table to make it available on Pixeltable Cloud.


In [None]:
# Publish the table to Pixeltable Cloud
pxt.publish(
    t,
    'pxt://pixeltable:fiftyone/coco_mini_2017',
    access='public'
)

In [None]:
t.push()


In [None]:
# Display the table's schema
t

In [None]:
t.push()