# Load image datasets from Hugging Face

Import image datasets from Hugging Face Hub directly into Pixeltable tables.

**What's in this recipe:**
- Import Hugging Face datasets with one function call
- Infer the schema automatically from the dataset structure
- Work with image classification datasets in Pixeltable

## Problem

Hugging Face hosts thousands of multimodal datasets including images. You need these datasets in Pixeltable to apply AI models, create embeddings, or run analysis.

## Solution

Import Hugging Face datasets directly into Pixeltable tables by passing the dataset to `pxt.create_table()` through the `source` parameter. Pixeltable infers the schema and loads the data, so you can immediately apply AI models and query the results.

You can iterate on transformations before adding them to your table. Use `.select()` with `.collect()` to preview results on sample data; nothing is stored in your table. To fetch only the first few rows, use `.head(n)` instead of `.collect()`. Once you're satisfied, use `.add_computed_column()` to apply the transformation to all rows in your table.

For more on this workflow, see [Get fast feedback on transformations](./dev-iterative-workflow.ipynb).

### Setup

In [None]:
!uv add pixeltable datasets

Load the [MNIST handwritten digit dataset](https://huggingface.co/datasets/ylecun/mnist) from Hugging Face with `datasets.load_dataset()`, as described in the [Hugging Face documentation](https://huggingface.co/docs/datasets/en/package_reference/loading_methods).

In [5]:
import datasets

# Load the training split of MNIST
mnist_dataset = datasets.load_dataset('ylecun/mnist', split='train')
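
While experimenting, it can help to load only a slice of the split and inspect its structure before importing all 60,000 rows. The sketch below uses the split-slicing syntax of the `datasets` library (`'train[:1000]'`); the helper names `sample_split` and `load_mnist_sample` are hypothetical and exist only for this illustration.

```python
def sample_split(split: str, n_rows: int) -> str:
    """Build a Hugging Face split-slicing string such as 'train[:1000]'."""
    return f'{split}[:{n_rows}]'


def load_mnist_sample(n_rows: int = 1000):
    """Load the first `n_rows` training examples and print their structure."""
    # Imported here so that defining the helper does not require the package.
    import datasets

    sample = datasets.load_dataset('ylecun/mnist', split=sample_split('train', n_rows))
    print(sample.features)  # column names and feature types
    print(sample.num_rows)  # number of rows actually loaded
    return sample
```

Calling `load_mnist_sample(1000)` returns a smaller `Dataset` that can be passed as `source` in place of `mnist_dataset` for a quicker first run.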

### Create Pixeltable table

Create a directory and table, then let Pixeltable automatically infer the schema and import the data.

In [6]:
import pixeltable as pxt

# Create a fresh directory (drop existing if present)
pxt.drop_dir('mnist_data', force=True)
pxt.create_dir('mnist_data')

Created directory 'mnist_data'.


<pixeltable.catalog.dir.Dir at 0x3126ec250>

In [7]:
# Passing `source` lets Pixeltable infer the schema and import the data
t = pxt.create_table(
    'mnist_data.digits',
    source=mnist_dataset,
    comment='Classic machine learning dataset of hand-drawn digits from Hugging Face',
)

Created table 'digits'.
Inserting rows into `digits`: 60000 rows [00:23, 2562.03 rows/s] 
Inserted 60000 rows with 0 errors.
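
A quick sanity check after the import is to compare the row count with the log above and to look at how the rows are distributed across labels. This is a minimal sketch, assuming the imported table exposes a `label` column as in this recipe; `summarize_import` is a hypothetical helper name.

```python
def summarize_import(t):
    """Return the total row count and the number of rows per label."""
    # Imported here so that defining the helper does not require the package.
    import pixeltable as pxt

    total = t.count()  # expected to match the "Inserted ... rows" log line
    per_label = (
        t.group_by(t.label)
        .select(t.label, n=pxt.functions.count(t.label))
        .collect()
    )
    return total, per_label
```

Call it as `total, per_label = summarize_import(t)` with the table created above.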


In [8]:
# View imported data
t.select(t.image, t.label).head(5)

| image | label |
|---|---|
| [image] | 5 |
| [image] | 0 |
| [image] | 4 |
| [image] | 1 |
| [image] | 9 |
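
The preview-then-persist workflow described in the Solution section could look like the sketch below for this table. It assumes the `image` and `label` columns shown above; the helper name `preview_then_persist`, the column name `thumbnail`, and the 14x14 target size are illustrative choices, not part of the original recipe.

```python
def preview_then_persist(t, size=(14, 14)):
    """Preview a resize on a few rows, then store it for every row."""
    # Preview: the expression is evaluated for 3 rows only and nothing is stored.
    preview = t.select(t.image, t.label, thumbnail=t.image.resize(size)).head(3)

    # Persist: the same expression becomes a computed column for all rows.
    t.add_computed_column(thumbnail=t.image.resize(size))
    return preview
```

In practice you would inspect the preview first and add the computed column only once the result looks right; the two steps are combined here to keep the example short.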


## Publish to Pixeltable Cloud

Share your dataset publicly on Pixeltable Cloud for others to replicate and use.

In [9]:
# Publish the table to Pixeltable Cloud
pxt.publish(
    'mnist_data.digits',
    'pxt://pixeltable:hugging-face/mnist',
    access='public'
)

Creating a replica of 'mnist_data.digits' at: pxt://pixeltable:hugging-face/mnist
Uploading: 100%|███████████████████████████████████████████████| 19.4M/19.4M [00:01<00:00, 11.2MB/s]
Finalizing replica ...
The published table is now available at: pxt://pixeltable:hugging-face/mnist
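
Someone consuming the published table would typically pull it into their own Pixeltable environment. The sketch below assumes that `pxt.replicate()` accepts the remote URI followed by a local table path; check the signature against the Pixeltable version you have installed. The local name `mnist_replica` and the helper `replicate_mnist` are hypothetical.

```python
def replicate_mnist(local_path='mnist_replica'):
    """Create a local replica of the published MNIST table."""
    # Imported here so that defining the helper does not require the package.
    import pixeltable as pxt

    # Assumption: replicate(remote_uri, local_path) returns the replica table.
    return pxt.replicate('pxt://pixeltable:hugging-face/mnist', local_path)
```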


## See also

- [Working with Hugging Face](../notebooks/integrations/working-with-hugging-face.ipynb)
- [Load audio datasets from Hugging Face](./data-load-huggingface-dataset.ipynb)
- *Dataset from [ylecun/mnist](https://huggingface.co/datasets/ylecun/mnist)*