# Multimodal Data Processing with Pixeltable and Backblaze B2

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pixeltable/pixeltable/blob/master/docs/release/tutorials/working-with-external-files.ipynb)

## Extract video frames and store in Backblaze B2

Learn how to process video files with **[Pixeltable](https://www.pixeltable.com/)** and store the results in **Backblaze B2** cloud storage.

**What you'll build:**
- Set up [Pixeltable](https://www.pixeltable.com/) with Backblaze B2 integration
- Create a video table and load video files
- Extract frames from video at specific intervals
- Convert frames to grayscale
- Store grayscale frames in Backblaze B2 with automatic URL generation
- (Bonus) Use AI to edit frames with [Reve](https://reve.com/) and store the edited images in B2

## About Pixeltable

**[Pixeltable](https://www.pixeltable.com/)** is open-source AI data infrastructure that provides:
- **Computed Columns:** Automatically process data through AI models and transformations ([docs](https://docs.pixeltable.com/overview/pixeltable))
- **Multimodal Support:** Native handling of images, video, audio, and documents
- **Persistent Storage:** Everything is stored in a database that survives restarts
- **Declarative Storage:** Simply specify where to store data—Pixeltable handles uploads and URL generation

Learn more in the [Pixeltable documentation](https://docs.pixeltable.com/overview/pixeltable).

## About Backblaze B2

**Backblaze B2** is S3-compatible cloud storage that's cost-effective and simple. In this notebook, we use it to store processed outputs like extracted frames and transformed images. Pixeltable automatically detects B2 endpoints when you use `https://s3.{region}.backblazeb2.com/` URLs, making it seamless to integrate B2 into your data pipelines.

**Key benefits of using B2 with Pixeltable:**
- **S3-compatible API** - Works seamlessly with Pixeltable's storage system
- **Cost-effective** - Competitive pricing for cloud storage
- **Simple setup** - Just provide your B2 credentials and Pixeltable handles the rest
- **Automatic URL generation** - Pixeltable generates servable URLs for all stored files

**Prerequisites:** Backblaze B2 account (free tier available), Python 3.9+

## Setup

### Set up Backblaze B2

Because Backblaze B2 is S3-compatible, you supply your B2 credentials through the standard AWS environment variables. Pixeltable then detects the B2 endpoint whenever a destination uses a `https://s3.{region}.backblazeb2.com/` URL.

**Step 1: Get your B2 credentials**
- Go to Backblaze B2 dashboard → Account → Application Keys
- Click "Add a New Application Key"
- Select your bucket with read/write permissions
- Copy both values immediately (the applicationKey is only shown once):
  - keyID (identifier, starts with numbers)
  - applicationKey (secret key, starts with K00)

**Step 2: Run the cell below** to enter your credentials and set them as AWS environment variables.

In [1]:
# Enter your Backblaze B2 credentials
import os
from getpass import getpass

print('Enter your Backblaze B2 credentials:')
print('1. First, enter your keyID (identifier, starts with numbers)')
KEY_ID = os.getenv('B2_KEY_ID') or input('   keyID: ').strip()

print('2. Now, enter your applicationKey (secret key, starts with K00)')
APPLICATION_KEY = os.getenv('B2_APPLICATION_KEY') or getpass('   applicationKey: ').strip()

print('3. Enter your B2 region (e.g., us-west-004, press Enter for default)')
B2_REGION = os.getenv('B2_REGION') or input('   region: ').strip() or 'us-west-004'

if not KEY_ID or not APPLICATION_KEY:
    raise ValueError('Both keyID and applicationKey must be provided')

# Set as AWS credentials (Backblaze B2 is S3-compatible)
# Pixeltable will automatically detect B2 endpoints from https:// URLs
os.environ['AWS_ACCESS_KEY_ID'] = KEY_ID
os.environ['AWS_SECRET_ACCESS_KEY'] = APPLICATION_KEY
os.environ['AWS_DEFAULT_REGION'] = B2_REGION

# Build the S3-compatible B2 endpoint used to construct destination URLs
B2_ENDPOINT = f'https://s3.{B2_REGION}.backblazeb2.com'

print(f'✓ Backblaze B2 configured (endpoint: {B2_ENDPOINT})')
print(f'  Use https:// URLs like: {B2_ENDPOINT}/your-bucket/path/ for destinations')

Enter your Backblaze B2 credentials:
1. First, enter your keyID (identifier, starts with numbers)
2. Now, enter your applicationKey (secret key, starts with K00)
3. Enter your B2 region (e.g., us-west-004, press Enter for default)
✓ Backblaze B2 configured (endpoint: https://s3.us-west-004.backblazeb2.com)
  Use https:// URLs like: https://s3.us-west-004.backblazeb2.com/your-bucket/path/ for destinations
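
If you run this notebook non-interactively, it can help to confirm that the three environment variables set above are present before any upload is attempted. The following is a minimal sketch, not part of Pixeltable; the helper name `missing_credentials` and the placeholder values are assumptions made for illustration.

```python
import os

# The three variables the setup cell above writes to os.environ.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    `env` defaults to os.environ; pass a plain dict to check other settings.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Placeholder values only: the secret is empty and the region is absent.
example_env = {"AWS_ACCESS_KEY_ID": "placeholder-key-id", "AWS_SECRET_ACCESS_KEY": ""}
print(missing_credentials(example_env))
# → ['AWS_SECRET_ACCESS_KEY', 'AWS_DEFAULT_REGION']
```

Calling `missing_credentials()` with no argument checks the live environment and returns an empty list once the setup cell has run successfully.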


### Dependencies

Install Pixeltable, then import the modules used in this notebook.

In [2]:
# Install required packages
%pip install -qU pixeltable

Note: you may need to restart the kernel to use updated packages.


In [3]:
import pixeltable as pxt
from getpass import getpass
from datetime import datetime
import os

## Create video table

We start with a blank slate in the local Pixeltable instance, which we can confirm with `pxt.list_tables()`:

In [None]:
pxt.list_tables()

In [None]:
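# Clean slate - only do this if you want to start fresh.
# Drop the view before its base table; if_not_exists='ignore' skips tables that don't exist yet.
pxt.drop_table('octopus-teacher-frames', force=True, if_not_exists='ignore')
pxt.drop_table('octopus-teacher-trailer', force=True, if_not_exists='ignore')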

Create an empty Pixeltable table with a single video column.

In [7]:
# Create a Pixeltable table with a video column
octo_vid = pxt.create_table(
    'octopus-teacher-trailer',
    {'video': pxt.Video}
)

Created table 'octopus-teacher-trailer'.


Insert octopus.mp4 as a video into our table.

In [8]:
octo_vid.insert([{'video': 'sources/octopus.mp4'}])

Inserting rows into `octopus-teacher-trailer`: 1 rows [00:00, 299.85 rows/s]
Inserted 1 row with 0 errors.


1 row inserted, 2 values computed.

Now we can see the video, ready for processing:

In [9]:
octo_vid.collect()

video


## From video to frames

Extract frames from the video, sampling one frame approximately every 15 seconds.

In [10]:
from pixeltable.iterators import FrameIterator

# Extract frames approximately every 15 seconds
octo_frames_v = pxt.create_view(
    'octopus-teacher-frames',
    octo_vid,
    iterator=FrameIterator.create(
        video=octo_vid.video,
        fps=0.0667  # Extract 1 frame every 15 seconds (1 / 15)
    )
)

Inserting rows into `octopus-teacher-frames`: 10 rows [00:00, 3396.75 rows/s]


Iterators in Pixeltable are table-generating functions, and the view created above is the table this one generated. We started with a single video in a single row; the iterator split that video into frames, using the `fps` parameter to set how many frames to sample per second. Each row of the view is one frame. The view is also implicitly joined to its base table, so every frame row carries the `video` column as context. The source video is not copied per row: Pixeltable references the same media file across all rows.

In [11]:
octo_frames_v.head()

pos,frame_idx,pos_msec,pos_frame,frame,video
0,0,0.0,0,,
1,1,15000.0,375,,
2,2,30000.0,750,,
3,3,44960.0,1124,,
4,4,59960.0,1499,,
5,5,74960.0,1874,,
6,6,89960.0,2249,,
7,7,104960.0,2624,,
8,8,119960.0,2999,,
9,9,134920.0,3373,,
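
To see why a sampling rate of roughly 1/15 fps yields ten rows, here is a small sketch of the arithmetic. It does not use Pixeltable; the function name and the 140-second duration are assumptions chosen for illustration (the output above only tells us the clip runs at least about 135 seconds).

```python
def expected_sample_times_ms(duration_s: int, interval_s: int) -> list[int]:
    """Ideal sample timestamps in milliseconds, one every `interval_s` seconds."""
    count = duration_s // interval_s + 1  # +1 includes the frame at t = 0
    return [i * interval_s * 1000 for i in range(count)]


# Hypothetical 140-second clip sampled every 15 seconds.
times = expected_sample_times_ms(duration_s=140, interval_s=15)
print(len(times), times[:4])
# → 10 [0, 15000, 30000, 45000]
```

The actual `pos_msec` values can differ slightly from these ideal timestamps (44960.0 rather than 45000 in the output above), presumably because each sample is taken from a real frame of the source video.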


### Convert frames to grayscale and store in Backblaze B2

Convert each frame to grayscale with Pixeltable's built-in `.convert()` method (mode `'L'` is single-channel, 8-bit grayscale) and store the results in Backblaze B2. In Pixeltable, an image transformation like this is just another computed column.

Use the `https://` URL format so Pixeltable automatically detects the B2 endpoint. The destination should follow this pattern:
```
https://s3.{region}.backblazeb2.com/your-bucket/your-path/
```

For example: `https://s3.us-west-004.backblazeb2.com/pxt-b2/output/frames/`. The cells below use a bucket named `pxt-b2`; replace it with the name of your own bucket.

**This is the power of declarative infrastructure:** Instead of writing upload code, managing file paths, and handling errors, you just specify where data should go. Pixeltable orchestrates everything.
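
If you write to several prefixes, a small helper can keep destination URLs consistent with the pattern described above. This is a sketch only; `b2_destination` is a hypothetical name, not a Pixeltable function, and it simply formats the string.

```python
def b2_destination(region: str, bucket: str, prefix: str = "") -> str:
    """Build a path-style B2 S3 URL that always ends with a trailing slash."""
    # Drop empty segments so stray or doubled slashes in `prefix` are harmless.
    parts = [bucket.strip("/")] + [p for p in prefix.split("/") if p]
    return f"https://s3.{region}.backblazeb2.com/" + "/".join(parts) + "/"


print(b2_destination("us-west-004", "pxt-b2", "output/frames"))
# → https://s3.us-west-004.backblazeb2.com/pxt-b2/output/frames/
```

The result can be passed as the `destination` argument in place of a hand-written f-string.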


In [12]:
# Convert frames to grayscale and store in B2 - Pixeltable handles upload, versioning, and URL generation
octo_frames_v.add_computed_column(
    frame_bw=octo_frames_v.frame.convert('L'),
    stored=True,
    destination=f"{B2_ENDPOINT}/pxt-b2/output/frames/"
)

Added 10 column values with 0 errors.


10 rows updated, 10 values computed.
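
For intuition about what mode `'L'` does to each pixel, the sketch below applies the ITU-R 601-2 luma weights that Pillow documents for RGB-to-`L` conversion. It is a plain-Python illustration with a hypothetical function name, not the code Pixeltable or Pillow runs; the library's rounding may differ by one level.

```python
def to_luma(r: int, g: int, b: int) -> int:
    """Approximate 8-bit grayscale value: L = R*299/1000 + G*587/1000 + B*114/1000."""
    return (r * 299 + g * 587 + b * 114) // 1000


# Pure red, green, and blue map to quite different gray levels.
print(to_luma(255, 0, 0), to_luma(0, 255, 0), to_luma(0, 0, 255))
# → 76 149 29
```

Green contributes the most to perceived brightness and blue the least, which is why the weights are unequal.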

### View the grayscale frames

Let's see what we just created. The `head(5)` method (a shortcut for `limit(5).collect()`) shows the first 5 rows of our view, now including the grayscale frames stored in B2.

In [13]:
octo_frames_v.head(5)

pos,frame_idx,pos_msec,pos_frame,frame,frame_bw,video
0,0,0.0,0,,,
1,1,15000.0,375,,,
2,2,30000.0,750,,,
3,3,44960.0,1124,,,
4,4,59960.0,1499,,,


### Query frames with servable URLs

Query the frames to display them along with their servable file URLs from Backblaze B2.


In [15]:
# Query grayscale frames with their B2 URLs
octo_frames_v.select(
    octo_frames_v.pos,
    octo_frames_v.frame_bw,
    octo_frames_v.frame_bw.fileurl
).collect()

pos,frame_bw,frame_bw_fileurl
0,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/frames/da913b99de1c4e2d828bb50232479986/48/48b2/da913b99de1c4e2d828bb50232479986_5_1_48b2f9483b714cacaaa021639498e9b7.jpeg
1,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/frames/da913b99de1c4e2d828bb50232479986/62/625b/da913b99de1c4e2d828bb50232479986_5_1_625ba0e673f24a3586176f2a554f342f.jpeg
2,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/frames/da913b99de1c4e2d828bb50232479986/4d/4da7/da913b99de1c4e2d828bb50232479986_5_1_4da7be9b274a4fb5b2b2182f516650e3.jpeg
3,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/frames/da913b99de1c4e2d828bb50232479986/58/58a8/da913b99de1c4e2d828bb50232479986_5_1_58a8a73d8f4a448b96c5a1c5da2e1a86.jpeg
4,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/frames/da913b99de1c4e2d828bb50232479986/73/735e/da913b99de1c4e2d828bb50232479986_5_1_735ecbb70d7a425798725d59d4abbb3b.jpeg
5,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/frames/da913b99de1c4e2d828bb50232479986/0c/0c46/da913b99de1c4e2d828bb50232479986_5_1_0c46642290e7481c807afa71c61d018a.jpeg
6,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/frames/da913b99de1c4e2d828bb50232479986/29/2912/da913b99de1c4e2d828bb50232479986_5_1_29126bb77d7c48d09b4ad85c6d50efb9.jpeg
7,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/frames/da913b99de1c4e2d828bb50232479986/b5/b5f1/da913b99de1c4e2d828bb50232479986_5_1_b5f1d484424a47e8ac69c1f1cd3828c7.jpeg
8,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/frames/da913b99de1c4e2d828bb50232479986/71/71e2/da913b99de1c4e2d828bb50232479986_5_1_71e2ee2c47e946dabd2b18a4f1e40c48.jpeg
9,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/frames/da913b99de1c4e2d828bb50232479986/b8/b86a/da913b99de1c4e2d828bb50232479986_5_1_b86a2584abb943ffaef58a894b37d98d.jpeg
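
The URLs above are path-style: the first path segment is the bucket and the remainder is the object key. If you need those parts separately, for example to pass them to another S3-compatible tool, the standard library is enough. This is a minimal sketch; `split_b2_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def split_b2_url(url: str) -> tuple[str, str]:
    """Split a path-style B2 URL into (bucket, object_key)."""
    path = urlparse(url).path.lstrip("/")
    bucket, _, key = path.partition("/")
    return bucket, key


# First URL from the query output above.
url = (
    "https://s3.us-west-004.backblazeb2.com/pxt-b2/output/frames/"
    "da913b99de1c4e2d828bb50232479986/48/48b2/"
    "da913b99de1c4e2d828bb50232479986_5_1_48b2f9483b714cacaaa021639498e9b7.jpeg"
)
bucket, key = split_b2_url(url)
print(bucket)
# → pxt-b2
```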


## Bonus: Working with Reve in Pixeltable

Pixeltable's Reve integration lets you call Reve's `create`, `edit`, and `remix` endpoints directly from tables, so you can iterate on visuals without leaving your data workflow. Here we use `edit` to apply the same prompt to every frame. Pixeltable also lets you build a different prompt for each row.

### Documentation

- [Pixeltable Reve Functions](https://docs.pixeltable.com/sdk/latest/reve#module-pixeltable-functions-reve)
- [Reve API Reference](https://api.reve.com/console/docs)

### Prerequisites

- A Reve account with an API key ([https://app.reve.com/](https://app.reve.com/) → Settings → API Keys)

### Important Notes

- Reve usage incurs costs according to your plan—keep an eye on credits.
- Images you send to Reve leave your environment; avoid uploading sensitive or private data.

### Set up Reve API key

In [16]:
import os
from getpass import getpass  # same import style as the B2 setup cell

if 'REVE_API_KEY' not in os.environ:
    os.environ['REVE_API_KEY'] = getpass('Reve API Key: ')

In [17]:
from pixeltable.functions import reve

### Edit frames with Reve

Use Reve's `edit` function to transform the grayscale frames into vibrant underwater scenes. This demonstrates how you can integrate AI image editing directly into your Pixeltable workflows, using B2 storage as your generated media destination.

In [18]:
octo_frames_v.add_computed_column(
    frame_reve=reve.edit(
        octo_frames_v.frame_bw,
        'Convert every image into a majestic underwater scene with vibrant colors, rich textures, and if present, focus on the octopus. If not present, the image should be misty, wistful, and focus on a sense of reflection. All images should still look realistic, not imaginary.'
    ),
    destination=f"{B2_ENDPOINT}/pxt-b2/output/reve/"
)

Added 10 column values with 0 errors.


10 rows updated, 10 values computed.
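
The cell above sends the same prompt for every frame. As a sketch of how a per-row prompt might be built instead, the plain-Python function below derives a prompt from a frame's timestamp. The function name, the 60-second threshold, and the prompt wording are all illustrative assumptions; to use something like it in a table, you would likely wrap it as a Pixeltable UDF (for example with `@pxt.udf`) and pass the resulting expression to `reve.edit`.

```python
def prompt_for_frame(pos_msec: float) -> str:
    """Pick a prompt based on where the frame falls in the video (hypothetical rule)."""
    base = "Convert this image into a realistic underwater scene with vibrant colors."
    if pos_msec < 60_000:
        # Early frames: favor atmosphere over a specific subject.
        return base + " Keep the mood misty and reflective."
    return base + " If an octopus is present, make it the focal point."


print(prompt_for_frame(15_000.0))
# → Convert this image into a realistic underwater scene with vibrant colors. Keep the mood misty and reflective.
```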

### View the Reve-edited frames

Compare the original grayscale frames with the Reve-edited versions.

In [19]:
octo_frames_v.select(octo_frames_v.frame_bw, octo_frames_v.frame_reve).head(5)

frame_bw,frame_reve
,
,
,
,
,


### Query Reve-edited frames with servable URLs

View the Reve-edited frames along with their servable file URLs from Backblaze B2.


In [20]:
# Query Reve-edited frames with their B2 URLs
octo_frames_v.select(
    octo_frames_v.pos,
    octo_frames_v.frame_reve,
    octo_frames_v.frame_reve.fileurl
).collect()

pos,frame_reve,frame_reve_fileurl
0,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/reve/da913b99de1c4e2d828bb50232479986/76/76c6/da913b99de1c4e2d828bb50232479986_6_2_76c63722835a47ffb8b413cdf750ae3a.jpeg
1,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/reve/da913b99de1c4e2d828bb50232479986/c7/c748/da913b99de1c4e2d828bb50232479986_6_2_c748d8ab2984409a8634a700b85aae93.jpeg
2,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/reve/da913b99de1c4e2d828bb50232479986/be/bea1/da913b99de1c4e2d828bb50232479986_6_2_bea1b38a0eb74a7e94bcb373538dbd12.jpeg
3,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/reve/da913b99de1c4e2d828bb50232479986/da/da54/da913b99de1c4e2d828bb50232479986_6_2_da5492b31a3944d493094c849a2287e2.jpeg
4,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/reve/da913b99de1c4e2d828bb50232479986/0a/0a69/da913b99de1c4e2d828bb50232479986_6_2_0a69bbebbe17452a89d7ca0f58621a5e.jpeg
5,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/reve/da913b99de1c4e2d828bb50232479986/74/746e/da913b99de1c4e2d828bb50232479986_6_2_746e72971dc4445b9eee6a28e2f5938a.jpeg
6,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/reve/da913b99de1c4e2d828bb50232479986/f1/f1d7/da913b99de1c4e2d828bb50232479986_6_2_f1d7a0c0ee45437ba979271cb347ce43.jpeg
7,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/reve/da913b99de1c4e2d828bb50232479986/c5/c5d8/da913b99de1c4e2d828bb50232479986_6_2_c5d834c95d624af4a223eccfdbd7387a.jpeg
8,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/reve/da913b99de1c4e2d828bb50232479986/3f/3f68/da913b99de1c4e2d828bb50232479986_6_2_3f684fe1760846259d234fc61780d0ea.jpeg
9,,https://s3.us-west-004.backblazeb2.com/pxt-b2/output/reve/da913b99de1c4e2d828bb50232479986/d4/d45d/da913b99de1c4e2d828bb50232479986_6_2_d45dae528244455f85a0d364d42e4d28.jpeg
