# Act 3: Generate from Data

In this notebook, you'll generate new data from existing data. We'll be using AI models to get creative:

✂️ Extract frames from video scenes  
🧪 Compose scene descriptions using Gemini with frames and transcripts  
🧞‍♀️ Generate images with Imagen using Gemini scene descriptions  
🍿 (Optional) Generate videos from images and prompts using Veo

In a typical workflow, this means: loading video files, extracting frames with OpenCV or ffmpeg, encoding images to base64, constructing API requests, managing authentication, parsing JSON responses, handling rate limits and errors, writing batch processing loops, saving outputs to disk, and tracking which items have been processed.

In Pixeltable, we can express this workflow without any explicit data handling. It is still the same work and the same conceptual flow, but you don't have to hand-craft the data operations. Instead, you can focus on the logic while Pixeltable handles the rest.

## In this notebook

The techniques you'll learn apply to any workflow that chains transformations and applies them across data. We'll use Pixeltable to:

1. **Extract Frames** - Pull a frame from each video scene
2. **Generate Creative Prompts** - Use multimodal AI to create trailer descriptions from frames and transcripts
3. **Create Visual Content** - Generate images and videos from AI prompts
4. **Apply Across All Scenes** - Run the workflow on your entire scene library automatically

**Prerequisites:** You'll need a Gemini API key from [aistudio.google.com](https://aistudio.google.com/apikey). See [Pixeltable's API key configuration guide](https://docs.pixeltable.com/howto/cookbooks/core/workflow-api-keys) for setup instructions.

**Models and Pixeltable UDFs used in this notebook:**

| Google Model | Pixeltable UDF | Purpose |
|--------------|----------------|----------|
| `gemini-2.0-flash` | `pxtf.gemini.generate_content()` | Generate text descriptions from images |
| `imagen-4.0-fast-generate-001` | `pxtf.gemini.generate_images()` | Generate images from text prompts |
| `veo-3.1-generate-preview` | `pxtf.gemini.generate_videos()` | Generate videos from images and text |

**Here's what you'll build:**

```
┌────────────────────────────────────────────────────────────────────────────┐
│  SCENE             PROMPT             GENERATE            ANIMATE          │
│                                                                            │
│  ┌────────┐       ┌────────┐        ┌──────────┐       ┌──────────┐        │
│  │ Frame  │──────▶│ Gemini │───────▶│  Imagen  │──────▶│   Veo    │        │
│  │ + Text │       │        │        │          │       │          │        │
│  └────────┘       └────────┘        └──────────┘       └──────────┘        │
│      │                │                   │                  │             │
│  Frame +          Scene              Diorama            Animated           │
│  metadata         prompt             image              video              │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
```

In [1]:
import pixeltable as pxt
import pixeltable.functions as pxtf

As we've been doing, we'll take a look at the tables we can "get":

In [2]:
pxt.list_tables('primetime-workshop')

Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/alison-pxt/.pixeltable/pgdata


['primetime-workshop/video-frame-view',
 'primetime-workshop/primetime_vids',
 'primetime-workshop/sentences',
 'primetime-workshop/scene_view']

We'll be working with our `scene_view` table. 

In [3]:
scene_view = pxt.get_table('primetime-workshop/scene_view')

You can uncomment the code below if you have already gone through this notebook and want to start fresh with the state of `scene_view` before adding any generative AI model inputs or outputs.

In [None]:
#scene_view.drop_column(scene_view.scene_image)
#scene_view.drop_column(scene_view.scene_prompt)
#scene_view.drop_column(scene_view.prompt_response)
#scene_view.drop_column(scene_view.beginning_frame)
#scene_view.drop_column(scene_view.prompt_text)

We can also check out the history of our table, to revisit where we've been!

In [4]:
# Yours may look different from mine
scene_view.history()

Unnamed: 0,version,created_at,user,change_type,inserts,updates,deletes,errors,schema_change
0,4,2026-01-29 03:43:21.765418+00:00,,data,101,0,0,0,
1,3,2026-01-29 03:42:34.977887+00:00,,schema,0,10,0,0,Added: transcript_text
2,2,2026-01-29 03:39:48.513358+00:00,,schema,0,10,0,0,Added: transcription
3,1,2026-01-29 03:39:44.071679+00:00,,schema,0,10,0,0,Added: audio
4,0,2026-01-29 03:39:36.073935+00:00,,schema,10,0,0,0,Initial Version


In addition to viewing table history, you can time travel, roll back to a previous version, and revert changes. See the version control page in our docs for more: https://docs.pixeltable.com/platform/version-control

As a reminder, here is our view schema:

In [5]:
scene_view

0
view 'primetime-workshop/scene_view' (of 'primetime-workshop/primetime_vids')

Column Name,Type,Computed With
pos,Required[Int],
segment_start,Float,
segment_start_pts,Int,
segment_end,Float,
segment_end_pts,Int,
video_segment,Required[Video],
audio,Required[Audio],extract_audio(video_segment)
transcription,Required[Json],"transcribe(audio, model='base')"
transcript_text,String,transcription.text.astype(String)
video,Video,


## 01 - Extract Frames

You already have `scene_view` from Act 2 with video segments for each scene. Now you'll add a computed column to extract a frame from each segment.

As a reminder, `scene_view` is a view built on top of `primetime_vids`, which means it automatically has access to all the parent table's columns (like `title` and `promo_text`).

Computed columns work on views the same way they work on tables - they're automatically applied to every row:

- `extract_frame()` is a Pixeltable UDF that pulls a single frame at a specific timestamp from each video segment - we used this UDF already in Act 1
- Output frames are stored persistently
- If you add new rows to `primetime_vids`, only the new scenes are processed (incremental updates)
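
To make the "incremental updates" idea concrete, here is a minimal plain-Python sketch. `ToyTable` is a hypothetical stand-in written for this illustration; it is not Pixeltable's implementation or API. It only mimics the behavior described above: adding a computed column backfills existing rows, and later inserts compute values only for the new row.

```python
from typing import Any, Callable


class ToyTable:
    """Toy stand-in for a table with computed columns (NOT Pixeltable's implementation)."""

    def __init__(self) -> None:
        self.rows: list = []
        self.computed: dict = {}
        self.calls = 0  # counts how many times a computed function runs

    def add_computed_column(self, name: str, fn: Callable[[dict], Any]) -> None:
        # Backfill: compute the new column once for every existing row.
        self.computed[name] = fn
        for row in self.rows:
            row[name] = self._run(fn, row)

    def insert(self, row: dict) -> None:
        # Incremental update: only the newly inserted row is computed.
        new_row = dict(row)
        for name, fn in self.computed.items():
            new_row[name] = self._run(fn, new_row)
        self.rows.append(new_row)

    def _run(self, fn: Callable[[dict], Any], row: dict) -> Any:
        self.calls += 1
        return fn(row)


table = ToyTable()
table.insert({'segment': 'scene-0'})
table.insert({'segment': 'scene-1'})

# Adding the column computes it for the two rows already present.
table.add_computed_column('frame', lambda r: f"frame-of-{r['segment']}")
calls_after_backfill = table.calls

# Inserting one more row triggers exactly one more computation.
table.insert({'segment': 'scene-2'})
calls_after_insert = table.calls
```

In this toy model the existing rows are never recomputed on insert, which is the property that keeps expensive steps such as model calls from being repeated.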

In [6]:
scene_view.add_computed_column(
    beginning_frame=scene_view.video_segment.extract_frame(timestamp=5),
    if_exists='replace'
)

Added 18 column values with 0 errors in 1.54 s (11.69 rows/s)


18 rows updated.

Let's look at the frames we just extracted:

In [7]:
scene_view.select(scene_view.pos, scene_view.video_segment, scene_view.beginning_frame).tail(3)

pos,video_segment,beginning_frame
5,,
6,,
7,,


## 02 - Build Scene Prompts

Now let's build scene descriptions that we can use in later sections to generate images and videos with AI models. We'll once again use a built-in UDF (User-Defined Function) to do this. Pixeltable provides a `format()` function in `pixeltable.functions.string` that works like Python's `str.format()`, but operates on table columns. 


We'll use this UDF to compose a custom text prompt for each scene from a template and data in our view's columns. Here's the workflow:

1. Add a computed column called `prompt_text` to compose the text prompt (no AI needed here)
2. We pass the text prompt + frame image to Gemini in another computed column
3. Gemini returns a JSON response with the scene description
4. We extract the text from the JSON response

We are using a prompt template based on Google's [guide to prompting Gemini for image generation](https://developers.googleblog.com/en/how-to-prompt-gemini-2-5-flash-image-generation-for-the-best-results/). 
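
As a rough mental model of row-wise formatting, the sketch below applies Python's built-in `str.format()` once per row in plain Python. The rows and the shortened template are invented for illustration (they are assumptions, not data from the workshop tables); only the placeholder names match the notebook's template.

```python
# Hypothetical rows standing in for the view's columns; values are invented.
rows = [
    {'title': 'Movie A', 'promo_text': 'A chess prodigy rises', 'transcript_text': 'What will she do now?'},
    {'title': 'Movie B', 'promo_text': 'Three neighbors investigate', 'transcript_text': 'Come on.'},
]

# A shortened template that uses the same {placeholder} names as the notebook's template.
short_template = 'Movie: "{title}". Plot: {promo_text}. This frame shows a key moment where: {transcript_text}.'

# Apply str.format() once per row, filling placeholders from that row's values.
prompts = [short_template.format(**row) for row in rows]
print(prompts[0])
```

Because the template adds its own period after each placeholder, a value that already ends in punctuation produces doubled punctuation (for example `Come on..`), which also shows up in the composed prompts in this notebook.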

In [8]:
# Define the template string
prompt_template = (
    'You are creating a miniature diorama scene. '
    'Movie: "{title}". '
    'Plot: {promo_text}. '
    'This frame shows a key moment where: {transcript_text}. '
    'Create a detailed prompt for an image generator following this structure: '
    'An intricate miniature diorama of [describe the scene], '
    'photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. '
    'Describe the shot type, lighting (emphasize dramatic miniature lighting), '
    'tiny handcrafted details, materials (felt, clay, wood), color palette, and mood. '
    'The scene should feel both cinematic and charmingly handmade. '
    'Write only the prompt, no preamble or explanation. '
    'Do not include text, typography, or words in the image.'
)

Now we add the computed column:

In [9]:
# Add computed column using string.format()
scene_view.add_computed_column(
    prompt_text=pxtf.string.format(
        prompt_template,
        title=scene_view.title,
        promo_text=scene_view.promo_text,
        transcript_text=scene_view.transcript_text
    ),
    if_exists='replace'
)

Added 18 column values with 0 errors in 0.05 s (382.97 rows/s)


18 rows updated.

In [10]:
scene_view

0
view 'primetime-workshop/scene_view' (of 'primetime-workshop/primetime_vids')

Column Name,Type,Computed With
pos,Required[Int],
segment_start,Float,
segment_start_pts,Int,
segment_end,Float,
segment_end_pts,Int,
video_segment,Required[Video],
audio,Required[Audio],extract_audio(video_segment)
transcription,Required[Json],"transcribe(audio, model='base')"
transcript_text,String,transcription.text.astype(String)
beginning_frame,Image,video_segment.extract_frame(timestamp=5)


`prompt_text` is now a new column in our view. Let's look at two of the prompts and notice the boilerplate elements they share.

In [11]:
scene_view.select(scene_view.beginning_frame, scene_view.prompt_text).head(1)

beginning_frame,prompt_text
,"You are creating a miniature diorama scene. Movie: ""The Queens Gambit"". Plot: Set during the Cold War era, orphaned chess prodigy Beth Harmon struggles with addiction in a quest to become the greatest chess player in the world.. This frame shows a key moment where: That check has been the whole point of the sequence beginning with the bishop cutting down the slope of the book by forcing it to a last threatening left. Question is... What will she do now?. Create a detailed prompt for an imag ...... g this structure: An intricate miniature diorama of [describe the scene], photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Describe the shot type, lighting (emphasize dramatic miniature lighting), tiny handcrafted details, materials (felt, clay, wood), color palette, and mood. The scene should feel both cinematic and charmingly handmade. Write only the prompt, no preamble or explanation. Do not include text, typography, or words in the image."


In [12]:
scene_view.select(scene_view.beginning_frame, scene_view.prompt_text).tail(1)

beginning_frame,prompt_text
,"You are creating a miniature diorama scene. Movie: ""Only Murders in the Building"". Plot: Three strangers share an obsession with true crime and suddenly find themselves wrapped up in one. When a grisly death occurs inside their exclusive Upper West Side apartment building, the trio suspects murder and employs their precise knowledge of true crime to investigate the truth. Perhaps even more explosive are the lies they tell one another. Soon, the endangered trio comes to realize a killer might ...... g this structure: An intricate miniature diorama of [describe the scene], photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Describe the shot type, lighting (emphasize dramatic miniature lighting), tiny handcrafted details, materials (felt, clay, wood), color palette, and mood. The scene should feel both cinematic and charmingly handmade. Write only the prompt, no preamble or explanation. Do not include text, typography, or words in the image."


## 03 - Compose Scene-Specific Prompts

Now we'll actually use these text prompts to generate scene-specific prompts for AI image/video models. 

Recall from Act 1 that writing a query using `select()` does not change the underlying table - you can think of this like a purely "in memory" operation that is great for experimentation and transparency in your workflow. Here, we are calling the Gemini model but just for two scenes:

In [13]:
# Test on one scene per video (pos == 6 matches one row for each title)
scene_view.where(scene_view.pos == 6).select(
    scene_view.beginning_frame,
    scene_view.prompt_text,
    scene_prompt=pxtf.gemini.generate_content(
        contents=[
            scene_view.prompt_text,
            scene_view.beginning_frame
        ],
        model='gemini-2.0-flash'
    )
).collect()

beginning_frame,prompt_text,scene_prompt
,"You are creating a miniature diorama scene. Movie: ""The Queens Gambit"". Plot: Set during the Cold War era, orphaned chess prodigy Beth Harmon struggles with addiction in a quest to become the greatest chess player in the world.. This frame shows a key moment where: You're gonna miss the flood? Come on. Come on. Lisa, come on.. Create a detailed prompt for an image generator following this structure: An intricate miniature diorama of [describe the scene], photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Describe the shot type, lighting (emphasize dramatic miniature lighting), tiny handcrafted details, materials (felt, clay, wood), color palette, and mood. The scene should feel both cinematic and charmingly handmade. Write only the prompt, no preamble or explanation. Do not include text, typography, or words in the image.","{""sdk_http_response"": {""headers"": {""Content-Type"": ""application/json; charset=UTF-8"", ""Vary"": ""Origin, X-Origin, Referer"", ""Content-Encoding"": ""gzip"", ""Date"": ""Thu, 29 Jan 2026 03:46:55 GMT"", ""Server"": ""scaffolding on HTTPServer2"", ""X-XSS-Protection"": ""0"", ""X-Frame-Options"": ""SAMEORIGIN"", ""X-Content-Type-Options"": ""nosniff"", ""Server-Timing"": ""gfet4t7; dur=3248"", ""Alt-Svc"": ""h3=\"":443\""; ma=2592000,h3-29=\"":443\""; ma=2592000"", ""Transfer-Encoding"": ""chunked""}, ""body"": null}, ""candidates"": [{""content"": {""parts"": [{""media_resolution"": null, ""code_execution_result"": null, ""executable_code"": null, ""file_data"": null, ""function_call"": null, ""function_response"": null, ""inline_data"": null, ""text"": ""An intricate miniature diorama of Beth Harmon, dressed in a pristine white coat and hat, standing confidently on a cobblestone street next to a gl ...... s, whites, and blacks, with a subtle pop of red on the vintage car; desaturated tones. 
Mood: confident, stylish, slightly melancholic, cinematic.\n"", ""thought"": null, ""thought_signature"": null, ""video_metadata"": null}], ""role"": ""model""}, ""citation_metadata"": null, ""finish_message"": null, ""token_count"": null, ""finish_reason"": ""STOP"", ""avg_logprobs"": -0.615, ""grounding_metadata"": null, ""index"": null, ""logprobs_result"": null, ""safety_ratings"": null, ""url_context_metadata"": null}], ""create_time"": null, ""model_version"": ""gemini-2.0-flash"", ""prompt_feedback"": null, ""response_id"": ""LNh6adXMPMfS_uMPi5uJqAg"", ""usage_metadata"": {""cache_tokens_details"": null, ""cached_content_token_count"": null, ""candidates_token_count"": 202, ""candidates_tokens_details"": [{""modality"": ""TEXT"", ""token_count"": 202}], ""prompt_token_count"": 1988, ""prompt_tokens_details"": [{""modality"": ""TEXT"", ""token_count"": 182}, {""modality"": ""IMAGE"", ""token_count"": 1806}], ""thoughts_token_count"": null, ""tool_use_prompt_token_count"": null, ""tool_use_prompt_tokens_details"": null, ""total_token_count"": 2190, ""traffic_type"": null}, ""automatic_function_calling_history"": [], ""parsed"": null}"
,"You are creating a miniature diorama scene. Movie: ""Only Murders in the Building"". Plot: Three strangers share an obsession with true crime and suddenly find themselves wrapped up in one. When a grisly death occurs inside their exclusive Upper West Side apartment building, the trio suspects murder and employs their precise knowledge of true crime to investigate the truth. Perhaps even more explosive are the lies they tell one another. Soon, the endangered trio comes to realize a killer might ...... g this structure: An intricate miniature diorama of [describe the scene], photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Describe the shot type, lighting (emphasize dramatic miniature lighting), tiny handcrafted details, materials (felt, clay, wood), color palette, and mood. The scene should feel both cinematic and charmingly handmade. Write only the prompt, no preamble or explanation. Do not include text, typography, or words in the image.","{""sdk_http_response"": {""headers"": {""Content-Type"": ""application/json; charset=UTF-8"", ""Vary"": ""Origin, X-Origin, Referer"", ""Content-Encoding"": ""gzip"", ""Date"": ""Thu, 29 Jan 2026 03:46:58 GMT"", ""Server"": ""scaffolding on HTTPServer2"", ""X-XSS-Protection"": ""0"", ""X-Frame-Options"": ""SAMEORIGIN"", ""X-Content-Type-Options"": ""nosniff"", ""Server-Timing"": ""gfet4t7; dur=2768"", ""Alt-Svc"": ""h3=\"":443\""; ma=2592000,h3-29=\"":443\""; ma=2592000"", ""Transfer-Encoding"": ""chunked""}, ""body"": null}, ""candidates"": [{""content"": {""parts"": [{""media_resolution"": null, ""code_execution_result"": null, ""executable_code"": null, ""file_data"": null, ""function_call"": null, ""function_response"": null, ""inline_data"": null, ""text"": ""An intricate miniature diorama of an elevator interior from \""Only Murders in the Building,\"" featuring three miniature figures resembling the main ...... nd golds, with a hint of sparkle from the dress. 
The overall mood is slightly mysterious and humorous, like a scene from a miniature crime drama.\n"", ""thought"": null, ""thought_signature"": null, ""video_metadata"": null}], ""role"": ""model""}, ""citation_metadata"": null, ""finish_message"": null, ""token_count"": null, ""finish_reason"": ""STOP"", ""avg_logprobs"": -0.731, ""grounding_metadata"": null, ""index"": null, ""logprobs_result"": null, ""safety_ratings"": null, ""url_context_metadata"": null}], ""create_time"": null, ""model_version"": ""gemini-2.0-flash"", ""prompt_feedback"": null, ""response_id"": ""MNh6abX6BrLP_uMPi_fPgQ0"", ""usage_metadata"": {""cache_tokens_details"": null, ""cached_content_token_count"": null, ""candidates_token_count"": 179, ""candidates_tokens_details"": [{""modality"": ""TEXT"", ""token_count"": 179}], ""prompt_token_count"": 2079, ""prompt_tokens_details"": [{""modality"": ""IMAGE"", ""token_count"": 1806}, {""modality"": ""TEXT"", ""token_count"": 273}], ""thoughts_token_count"": null, ""tool_use_prompt_token_count"": null, ""tool_use_prompt_tokens_details"": null, ""total_token_count"": 2258, ""traffic_type"": null}, ""automatic_function_calling_history"": [], ""parsed"": null}"


You can see that Gemini returns JSON output - we only want the text. We'll take care of that in the next section.

Above, we've called the Gemini API for two scenes (one from each video). Our view table remains unchanged:

In [14]:
scene_view

0
view 'primetime-workshop/scene_view' (of 'primetime-workshop/primetime_vids')

Column Name,Type,Computed With
pos,Required[Int],
segment_start,Float,
segment_start_pts,Int,
segment_end,Float,
segment_end_pts,Int,
video_segment,Required[Video],
audio,Required[Audio],extract_audio(video_segment)
transcription,Required[Json],"transcribe(audio, model='base')"
transcript_text,String,transcription.text.astype(String)
beginning_frame,Image,video_segment.extract_frame(timestamp=5)


Now let's add our AI-generated scene prompt as a computed column, and extract just the text we need from the Gemini response.

- This calls the Gemini API for each scene and saves the output in a column in our table persistently. 
- This will take longer (>1 minute) because we are generating across all rows in our table. 
- We'll parse the response to get just the text we need as a separate computed column.
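
The JSON path used for parsing can be tried out on an ordinary Python dict first. The sketch below uses a trimmed-down dict with the same nesting as the Gemini output shown above; the text value is a placeholder, and `first_candidate_text` is a hypothetical helper written for this illustration, not part of Pixeltable or the Gemini SDK.

```python
from typing import Optional

# Trimmed-down response with the same nesting as the Gemini output shown above.
# The text value is a placeholder, not real model output.
sample_response = {
    'candidates': [
        {
            'content': {
                'parts': [{'text': 'An intricate miniature diorama of ... (placeholder)'}],
                'role': 'model',
            },
            'finish_reason': 'STOP',
        }
    ],
    'model_version': 'gemini-2.0-flash',
}


def first_candidate_text(response: dict) -> Optional[str]:
    """Return the first candidate's first text part, or None if the path is missing."""
    try:
        return response['candidates'][0]['content']['parts'][0]['text']
    except (KeyError, IndexError, TypeError):
        return None


text = first_candidate_text(sample_response)
missing = first_candidate_text({'candidates': []})  # no candidates, so no text
print(text)
```

The guarded lookup is a design choice for plain Python, where a response without candidates would otherwise raise an exception.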

In [15]:
scene_view.add_computed_column(
    prompt_response=pxtf.gemini.generate_content(
        contents=[
            scene_view.prompt_text,
            scene_view.beginning_frame
        ],
        model='gemini-2.0-flash'
    ),
    if_exists='replace'
)

scene_view.add_computed_column(
    scene_prompt=scene_view.prompt_response['candidates'][0]['content']['parts'][0]['text'],
    if_exists='replace'
)

Added 18 column values with 0 errors in 6.52 s (2.76 rows/s)
Added 18 column values with 0 errors in 0.03 s (676.62 rows/s)


18 rows updated.

In [16]:
scene_view

0
view 'primetime-workshop/scene_view' (of 'primetime-workshop/primetime_vids')

Column Name,Type,Computed With
pos,Required[Int],
segment_start,Float,
segment_start_pts,Int,
segment_end,Float,
segment_end_pts,Int,
video_segment,Required[Video],
audio,Required[Audio],extract_audio(video_segment)
transcription,Required[Json],"transcribe(audio, model='base')"
transcript_text,String,transcription.text.astype(String)
beginning_frame,Image,video_segment.extract_frame(timestamp=5)


In [17]:
scene_view.select(scene_view.beginning_frame, scene_view.scene_prompt).limit(3).collect()

beginning_frame,scene_prompt
,"An intricate miniature diorama of the ""Only Murders in the Building"" trio – Charles, Oliver, and Mabel – inside the elevator of The Arconia. Charles is depicted partially exiting, holding the golden elevator door open slightly, while Mabel stands in the center, looking down at her phone. Oliver's back is to the viewer, obscuring part of the scene. Photographed with a tilt-shift lens creating selective focus, emphasizing the figures and blurring the background, creating a dreamlike toy-world ...... is dramatic miniature lighting, with a single, concentrated warm light source coming from the elevator's overhead light, casting long, soft shadows and highlighting the textures of the handcrafted details. Tiny handcrafted details include felt clothing, clay faces with expressive features, and a wooden elevator interior with carved details. The color palette is muted and earthy, with creams, browns, and golds dominating. The mood is mysterious and slightly ominous, with a hint of dark humor."
,"An intricate miniature diorama of Charles' Upper West Side apartment at night, recreating the scene after he finishes his podcast. The shot is a medium shot, focusing on an overstuffed armchair cluttered with a dark tailored blazer, scattered papers, and a magazine featuring Ben Glenroy, capturing a sense of cozy disarray; to the right, an antique silhouette portrait in a gilt frame leans against a stack of books. Photographed with a tilt-shift lens creating selective focus and a dreamlike t ...... andcrafted table lamp with a patterned shade casting long shadows, highlighting the dust motes in the air and the miniature details. Tiny handcrafted details include miniature books on shelves, a tiny ceramic bowl filled with miniature cookies, and a meticulously crafted silhouette portrait with distressed edges. The diorama is constructed from felt, clay, and wood. The color palette is muted browns, deep greens, and warm ambers. The mood is cozy, slightly cluttered, with a hint of mystery."
,"An intricate miniature diorama of Beth Harmon arriving at the Central Chess Club in Moscow, surrounded by reporters and onlookers. Photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Medium shot, cold dramatic lighting emphasizing the stone architecture of the building and the small crowd, handcrafted details like tiny felt coats, clay faces with subtle expressions, and wooden accents on the building's facade. The color palette is muted, primarily greys, blues, and browns, with a touch of warmth on Beth's face. Mood: Intrigued and hopeful but with an undercurrent of tension and anticipation. The scene should feel both cinematic and charmingly handmade."


Now we have a `scene_prompt` column that contains the generated text descriptions. This prompt will be reused for both image and video generation in the next sections.

Let's look at our table history again:

In [18]:
scene_view.history()

Unnamed: 0,version,created_at,user,change_type,inserts,updates,deletes,errors,schema_change
0,8,2026-01-29 03:47:29.797296+00:00,,schema,0,18,0,0,Added: scene_prompt
1,7,2026-01-29 03:47:23.266118+00:00,,schema,0,18,0,0,Added: prompt_response
2,6,2026-01-29 03:46:42.999632+00:00,,schema,0,18,0,0,Added: prompt_text
3,5,2026-01-29 03:46:34.468099+00:00,,schema,0,18,0,0,Added: beginning_frame
4,4,2026-01-29 03:43:21.765418+00:00,,data,101,0,0,0,
5,3,2026-01-29 03:42:34.977887+00:00,,schema,0,10,0,0,Added: transcript_text
6,2,2026-01-29 03:39:48.513358+00:00,,schema,0,10,0,0,Added: transcription
7,1,2026-01-29 03:39:44.071679+00:00,,schema,0,10,0,0,Added: audio
8,0,2026-01-29 03:39:36.073935+00:00,,schema,10,0,0,0,Initial Version


## 04 - Generate Scene Images

Now let's put these prompts from `gemini-2.0-flash` to the test. We will use these scene prompts to generate visual images, using `imagen-4.0-fast-generate-001` as our text-to-image model.

The example below uses `.where().select()` to test generation on one scene per video (`pos == 7` matches one row for each title). This is a **query** - it runs once and returns results without storing anything. This is perfect for testing expensive operations before applying them to all rows. Later, you'll see how to add image generation as a **computed column** to process all scenes automatically and store the results persistently.

Here, we are creating a variable `scene_image` to hold the result of the query. This is an in-memory result and does not change your stored table. It is also limited: you can pull this result back up only within your current Python session. We'll persist outputs in the next section.

In [19]:
scene_image = scene_view.where(scene_view.pos == 7).select(
    scene_view.pos,
    scene_view.scene_prompt,
    scene_image=pxtf.gemini.generate_images(
        prompt=scene_view.scene_prompt,
        model='imagen-4.0-fast-generate-001'
    )
).collect()

In [20]:
scene_image

pos,scene_prompt,scene_image
7,"An intricate miniature diorama of an outdoor chess scene in 1960s Berlin. A friendly, elderly man with glasses and a wool cap, made of clay, offers his hand for a handshake. He wears a dark wool coat and striped scarf, meticulously crafted from felt. Behind him, another elderly gentleman made of wood sits at a tiny chess table made of balsa wood, deeply engrossed in a game. A backdrop of a weathered concrete building with red window frames made of clay, lines the scene. Tiny, individually placed cobblestones cover the ground. The scene is photographed with a tilt-shift lens creating selective focus on the elderly man's outstretched hand. Dramatic miniature lighting casts long shadows and highlights the details of the cobblestones. Color palette: muted grays, browns, and reds, evoking a Cold War atmosphere. The mood is hopeful and subtly nostalgic, capturing a moment of connection in a miniature world.",
7,"An intricate miniature diorama of the hallway of the Arconia from ""Only Murders in the Building,"" featuring Charles, Oliver, and Mabel standing near an elevator door. Oliver gestures dramatically with open hands, Mabel looks concerned holding a cell phone and wearing a sequin dress, while Charles examines a small gold object with his fingers. The hallway is wood paneled and dimly lit, with an elevator door visible in the background. Photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Close-up shot, dramatic miniature lighting with warm highlights and deep shadows emphasizing the handcrafted textures. Tiny handcrafted details include felt clothing, clay faces with expressive features, and wooden paneling. Materials: felt, clay, wood, and tiny sequins. Color palette: muted golds, browns, and dark blues. Mood: Suspenseful, quirky, and charmingly handmade, with a hint of impending doom.",


## 05 - Add Image Generation to All Scenes

You've tested image generation on individual scenes using `.where().select()`. Now you'll add image generation as a computed column to process all scenes automatically.

**What changes when you use a computed column:**

1. **Processes all rows**: Generates images for all rows (here, we have 1 row per scene for each video) with a unique prompt per row
2. **Parallel execution**: Pixeltable parallelizes API calls automatically
3. **Persistent storage**: Results are stored in the table, not just in memory
4. **Incremental updates**: If you add new videos to the base table, the scene detection and image generation cascade automatically

Note on rate limits:

Many model providers, including Gemini, limit how many requests you can send per minute (RPM = requests per minute), per day (RPD = requests per day), or both; these limits may also vary by model. Pixeltable gives you a few options for working within those rate limits:

- First, you can do nothing. Pixeltable defaults to 600 requests per minute, but dynamically listens to model return messages to walk that pace back. For example, Google will tell you at some point to wait for 50+ seconds before submitting another API request. Pixeltable listens to these messages from Google and schedules requests appropriately.

- Second, you can set rate limits globally across all your Pixeltable projects, per provider and per model, in `~/.pixeltable/config.toml`. This is the best way to "cap" Pixeltable's API request pace based on known rate limits or budget constraints.

- Third, you can set rate limits as environment variables. This is a good option if you are working in a notebook environment. 
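
To get a feel for what a requests-per-minute cap means for run time, here is a small arithmetic sketch. It assumes requests are spaced evenly under the cap, which is a simplification chosen for illustration and not a description of Pixeltable's scheduler. `paced_send_times` is a hypothetical helper; 18 matches the row count in this notebook's outputs, and 10 RPM is an example cap.

```python
def paced_send_times(num_requests: int, requests_per_minute: float) -> list:
    """Return send offsets (in seconds) when requests are spaced evenly under an RPM cap."""
    if requests_per_minute <= 0:
        raise ValueError('requests_per_minute must be positive')
    interval = 60.0 / requests_per_minute  # seconds between consecutive requests
    return [i * interval for i in range(num_requests)]


# 18 rows under an example cap of 10 requests per minute.
times = paced_send_times(num_requests=18, requests_per_minute=10)
last_send = times[-1]
print(f'{len(times)} requests, last one sent at {last_send:.0f} s')
```

Under this even-spacing assumption, the final request goes out a little under two minutes after the first, before counting the time the model takes to respond.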

We are using the `imagen-4.0-fast-generate-001` model to generate images. At the time of this workshop, Gemini limits this model to 10 requests per minute with a max of 70 images per day, so you can run this code on all rows within the free tier. Optionally, you can use the following code to explicitly set a rate limit for your notebook session:

In [None]:
# Uncomment to set rate limits in your notebook
#import os
#os.environ['IMAGEN_RATE_LIMITS'] = '{"imagen-4.0-fast-generate-001": 10}'

This computed column will automatically generate a "diorama" image for each scene - it takes about 2 minutes to execute.

In [21]:
scene_view.add_computed_column(
    scene_image=pxtf.gemini.generate_images(
        prompt=scene_view.scene_prompt,
        model='imagen-4.0-fast-generate-001'
    ),
    if_exists='replace',
    on_error='ignore' # Continue processing despite individual row-wise failures
)

Added 18 column values with 0 errors in 37.99 s (0.47 rows/s)


18 rows updated.

**What happens when you execute this code:**

We've created a **declarative workflow** in which computed columns form a processing pipeline with three main stages:

1. `beginning_frame` - Extracts a frame from each video segment
2. `scene_prompt` - Holds the text description generated from the prompt text and frame (multimodal AI), via the intermediate `prompt_text` and `prompt_response` columns
3. `scene_image` - Generates an image from the text description (text-to-image AI)

Each computed column builds on the previous one. Pixeltable handles the orchestration - it knows the dependencies and executes them in the correct order automatically.
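
One way to picture dependency-ordered execution is a topological sort over the column dependencies. The sketch below uses Python's standard-library `graphlib` with the dependencies read off the computed-column definitions in this notebook. It illustrates the general idea only and makes no claim about how Pixeltable schedules work internally.

```python
from graphlib import TopologicalSorter

# Column -> columns it depends on, as defined by the computed columns in this notebook.
dependencies = {
    'beginning_frame': {'video_segment'},
    'prompt_text': {'title', 'promo_text', 'transcript_text'},
    'prompt_response': {'prompt_text', 'beginning_frame'},
    'scene_prompt': {'prompt_response'},
    'scene_image': {'scene_prompt'},
}

# static_order() yields every column after all of the columns it depends on.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Several valid orderings exist (for example, `prompt_text` and `beginning_frame` are independent of each other), but in every one of them `scene_image` comes after `scene_prompt`, which comes after `prompt_response`.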

Let's take a look at our collection of generated images:

In [22]:
scene_view.where(
    scene_view.title.contains('Murders')
).select(
    scene_view.pos,
    scene_view.scene_prompt,
    scene_view.scene_image
).order_by(scene_view.title, scene_view.pos).collect()

pos,scene_prompt,scene_image
0,"An intricate miniature diorama of Mabel, Charles, and Oliver standing in Mabel's shadowy, cluttered apartment, a dead body outlined in chalk on the floorboards, miniature police evidence markers scattered around, photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Wide shot, dramatic backlighting from a tiny handcrafted lamp with a paper lampshade casting long, ominous shadows, tiny handcrafted details on the miniature furniture and the characters' felt clothes, materials of felt, clay, wood, color palette of dark greens, browns, and grays with pops of red from the evidence markers, mood of suspense and mystery.",
1,"An intricate miniature diorama of a home movie flashback scene from ""Only Murders in the Building"", featuring a young man running joyfully toward the camera outside of a brick building, photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Medium shot, dramatic miniature lighting with soft spotlights accentuating the man's face and the brick building, the background slightly blurred. Tiny handcrafted details like individual clay bricks, miniature felt clothing on the man, and small wooden building siding, creating a realistic, handcrafted feel. Materials include: felt, clay, wood, and miniature paint. The color palette is slightly desaturated with warm, nostalgic tones, reminiscent of old film stock; primarily brick red, white, and muted greens. The mood is one of innocent joy mixed with a hint of mystery due to the context of the show, feeling both cinematic and charmingly handmade.",
2,"An intricate miniature diorama of a young girl, crafted from felt and clay, hysterically crying and holding a vintage rotary phone receiver to her ear, against a simple pale green wall made of painted balsa wood, photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Medium shot. Overexposed, grainy, dramatic miniature lighting, shadows suggesting the scene is a memory, tiny handcrafted details emphasizing the girl's distraught expression, pastel color palette, deeply unsettling and nostalgic mood.",
3,"An intricate miniature diorama of a young man in a round above-ground swimming pool, submerged to his shoulders, with water splashed against his body, in a backyard setting with trees and overgrown shrubbery in the background, photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Medium shot. Gloomy, nostalgic lighting, with a slightly bluish-gray tint mimicking old home movie footage, and a bright spot shining on the young man. Tiny handcrafted details, the pool made of a thin corrugated metal-like material, the figure sculpted from clay, and the trees made of felt and miniature foliage. Materials: felt, clay, wood, corrugated metal. Color palette: muted blues, greens, grays, and browns, with a washed-out feel. Mood: reflective, melancholy, and slightly unsettling.",
4,"An intricate miniature diorama of Charles' Upper West Side apartment at night, recreating the scene after he finishes his podcast. The shot is a medium shot, focusing on an overstuffed armchair cluttered with a dark tailored blazer, scattered papers, and a magazine featuring Ben Glenroy, capturing a sense of cozy disarray; to the right, an antique silhouette portrait in a gilt frame leans against a stack of books. Photographed with a tilt-shift lens creating selective focus and a dreamlike t ...... andcrafted table lamp with a patterned shade casting long shadows, highlighting the dust motes in the air and the miniature details. Tiny handcrafted details include miniature books on shelves, a tiny ceramic bowl filled with miniature cookies, and a meticulously crafted silhouette portrait with distressed edges. The diorama is constructed from felt, clay, and wood. The color palette is muted browns, deep greens, and warm ambers. The mood is cozy, slightly cluttered, with a hint of mystery.",
5,"An intricate miniature diorama of the ""Only Murders in the Building"" trio – Charles, Oliver, and Mabel – inside the elevator of The Arconia. Charles is depicted partially exiting, holding the golden elevator door open slightly, while Mabel stands in the center, looking down at her phone. Oliver's back is to the viewer, obscuring part of the scene. Photographed with a tilt-shift lens creating selective focus, emphasizing the figures and blurring the background, creating a dreamlike toy-world ...... is dramatic miniature lighting, with a single, concentrated warm light source coming from the elevator's overhead light, casting long, soft shadows and highlighting the textures of the handcrafted details. Tiny handcrafted details include felt clothing, clay faces with expressive features, and a wooden elevator interior with carved details. The color palette is muted and earthy, with creams, browns, and golds dominating. The mood is mysterious and slightly ominous, with a hint of dark humor.",
6,"An intricate miniature diorama of three figures in an elevator, recreated from felt, clay, and wood. A woman in a sparkly dress stares intently at a miniature smartphone in her hands; to her left, a man in a miniature suit glances off to the side. To the right, a third man stares upwards toward a faux-wood ceiling with inlaid lighting panels. The elevator walls are crafted from faux-wood and a miniature hand rail goes around the elevator. Photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Medium shot, with dramatic miniature lighting emanating from overhead panels casting soft shadows. The color palette is warm and inviting, with dark wood tones, touches of gold sequins, and soft, diffused light. The mood is suspenseful, yet charmingly handmade, evoking a sense of mystery and whimsical intrigue.",
7,"An intricate miniature diorama of the hallway of the Arconia from ""Only Murders in the Building,"" featuring Charles, Oliver, and Mabel standing near an elevator door. Oliver gestures dramatically with open hands, Mabel looks concerned holding a cell phone and wearing a sequin dress, while Charles examines a small gold object with his fingers. The hallway is wood paneled and dimly lit, with an elevator door visible in the background. Photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Close-up shot, dramatic miniature lighting with warm highlights and deep shadows emphasizing the handcrafted textures. Tiny handcrafted details include felt clothing, clay faces with expressive features, and wooden paneling. Materials: felt, clay, wood, and tiny sequins. Color palette: muted golds, browns, and dark blues. Mood: Suspenseful, quirky, and charmingly handmade, with a hint of impending doom.",


In [23]:
scene_view.where(
    scene_view.title.contains('Queen')
).select(
    scene_view.pos,
    scene_view.scene_prompt,
    scene_view.scene_image
).order_by(scene_view.title, scene_view.pos).collect()

pos,scene_prompt,scene_image
0,"An intricate miniature diorama of a tense moment at a chess tournament in Moscow, 1968; a vintage VEF 206 transistor radio sits prominently in the foreground, slightly off-center, perched precariously on the edge of a dark green xylophone. Behind the radio, blurred but present, a large crowd of tiny, silhouetted figures observe the game with palpable anticipation. The diorama is photographed with a tilt-shift lens, creating selective focus; the radio is razor-sharp while the crowd fades into ...... the radio, highlighting its metallic details and the subtle glow of its dial; soft, diffused backlight emanates from the blurry crowd. Tiny handcrafted details include meticulously crafted knobs on the radio, individual figures in the crowd made from felt and clay, and a realistic wood grain on the xylophone. The color palette is muted and desaturated, dominated by blacks, grays, and dark greens, with hints of metallic silver on the radio. The mood is suspenseful, quiet, and contemplative.",
1,"An intricate miniature diorama of a tense late-night chess tournament scene from ""The Queen's Gambit,"" featuring a focused Russian chess grandmaster, Borgov, mid-game. He is hunched slightly forward over a chessboard, surrounded by a dimly lit hall populated by miniature, blurry figures representing the audience and officials. The hall's architecture reflects Soviet-era design, with miniature felt curtains and wood-paneled walls. A tilt-shift lens creates selective focus on Borgov and the ch ...... s of the hall, achieved through tiny LED spotlights. The tiny handcrafted details include meticulously painted chess pieces made of clay, miniature fabric details on Borgov’s suit, and textured felt representing the surrounding environment. The color palette is muted and warm, with shades of brown, beige, and deep red dominating the scene, reflecting the Cold War era. The mood is intense, contemplative, and slightly melancholic, capturing the high stakes and psychological depth of the game.",
2,"An intricate miniature diorama of Beth Harmon in a tense chess match, poised to make a winning move, the black queen centered prominently on the board, captured with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Medium close-up shot, dramatic miniature lighting highlighting the polished chessboard and hand-painted chess pieces, made of tiny handcrafted details using felt, clay, and wood. Color palette of deep browns, blacks, and creamy whites, emphasizing the contrasting chess pieces and the antique feel of the room. The overall mood is one of intense focus, suspense, and the quiet thrill of impending victory.",
3,"An intricate miniature diorama of Beth Harmon, in a plain beige shirt, standing inside the darkened bathroom of a roadside motel. She is looking down at a tile with faint light. The walls are dirty beige. Photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Medium shot, eye-level. The scene is lit with a single, harsh, directional light source from the upper right, creating long, dramatic shadows emphasizing the grime. Tiny handcrafted details include miniature ceramic tiles, a tiny, clay sink with a faux-rust stain, and a miniature felt rendition of Beth's Afro. The color palette is muted and desaturated, dominated by shades of beige, brown, and grey. The mood is somber and isolating, capturing Beth's vulnerability.",
4,"An intricate miniature diorama of Beth Harmon arriving at the Central Chess Club in Moscow, surrounded by reporters and onlookers. Photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Medium shot, cold dramatic lighting emphasizing the stone architecture of the building and the small crowd, handcrafted details like tiny felt coats, clay faces with subtle expressions, and wooden accents on the building's facade. The color palette is muted, primarily greys, blues, and browns, with a touch of warmth on Beth's face. Mood: Intrigued and hopeful but with an undercurrent of tension and anticipation. The scene should feel both cinematic and charmingly handmade.",
5,"An intricate miniature diorama of Beth Harmon, a young woman with auburn hair, standing on a desolate road at dusk, suitcases at her feet; she's bundled in a stylish wool coat, gazing towards a tiny airplane on a distant tarmac. A vintage, slightly dilapidated Checker Cab stands parked on the shoulder of the road, its headlights casting long, dramatic shadows. Photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Medium shot, dramatic miniature lighting with a low key and shadows that emphasize the figure's isolation, handcrafted details like tiny felt luggage, meticulously detailed clay facial features on Beth, and a wooden base meticulously painted to resemble cracked asphalt. The color palette is muted and desaturated, dominated by grays, browns, and a hint of lavender in the twilight sky. Mood: melancholic, introspective, subtly hopeful.",
6,"An intricate miniature diorama of Beth Harmon, in a white wool coat and hat, walking on a cold, overcast day in Moscow with a black car parked behind her and a white one in the distance, in front of a drab soviet apartment building, photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Medium shot, dramatic miniature lighting highlighting Beth's face and the reflective chrome of the car, tiny handcrafted details like cobblestone sidewalk texture and bare winter tree branches made of wire and clay, materials including felt, clay, and balsa wood, muted color palette of grays, whites, and blacks, with a slight desaturation. The scene should feel both cinematic and charmingly handmade.",
7,"An intricate miniature diorama of an outdoor chess scene in 1960s Berlin. A friendly, elderly man with glasses and a wool cap, made of clay, offers his hand for a handshake. He wears a dark wool coat and striped scarf, meticulously crafted from felt. Behind him, another elderly gentleman made of wood sits at a tiny chess table made of balsa wood, deeply engrossed in a game. A backdrop of a weathered concrete building with red window frames made of clay, lines the scene. Tiny, individually placed cobblestones cover the ground. The scene is photographed with a tilt-shift lens creating selective focus on the elderly man's outstretched hand. Dramatic miniature lighting casts long shadows and highlights the details of the cobblestones. Color palette: muted grays, browns, and reds, evoking a Cold War atmosphere. The mood is hopeful and subtly nostalgic, capturing a moment of connection in a miniature world.",
8,"An intricate miniature diorama of Beth Harmon, in a cream-colored coat and hat, surrounded by smiling, chattering older Russian men in drab winter coats and hats. They stand outside a Moscow building. A tilt-shift lens is used, creating selective focus and a dreamlike toy-world atmosphere. Medium shot. Overcast, diffused daylight, with subtle miniature spotlighting on faces to enhance expression. Tiny handcrafted details: Individual strands of felt hair, realistically textured clay faces, miniature wool coats, tiny carved wooden buildings in the background. The color palette is muted grays, browns, and creams, capturing a Cold War aesthetic. The mood is warm, excited, and charmingly handmade.",
9,"An intricate miniature diorama of Beth Harmon, portrayed in clay, surrounded by blurred figures in felt coats, her face exhibiting a calm focus as she envisions chess moves, clasping her hands thoughtfully; tilt-shift lens creating selective focus on Beth, blurring the background into a dreamlike toy-world atmosphere. Close-up shot, low-key dramatic miniature lighting emphasizing the contours of Beth's face and the texture of her wool coat and beret, tiny handcrafted details including her makeup and watch, materials including soft felt for the crowd, smooth clay for Beth, and a backdrop of painted wood, a color palette of muted greys, blacks, and creams, with Beth's hair a striking red; the mood is pensive and intensely focused, creating a cinematic and charmingly handmade scene.",


Because these images are stored persistently, each one has a location in your local file cache. Read more about external files in our docs: https://docs.pixeltable.com/platform/external-files

In [24]:
scene_view.select(
    scene_view.scene_image,
    scene_view.scene_image.localpath
).limit(1).collect()

scene_image,scene_image_localpath
,/Users/alison-pxt/.pixeltable/media/dbff05ac987d4d389aca49c50dd087eb/a9/a9bf/dbff05ac987d4d389aca49c50dd087eb_17_9_a9bf0f192ba647f59706b92a5305015a.jpeg


If any rows encountered errors during processing (rate limits, API issues, etc.), you can recompute just those rows where Pixeltable logged errors:

In [25]:
# Recompute any rows that had errors
scene_view.recompute_columns(
    scene_view.scene_image,
    errors_only=True
)

No rows affected.
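The pattern behind `on_error='ignore'` followed by `errors_only=True` can be sketched in plain Python: record failures per row instead of stopping, then retry only the rows that failed. This is a conceptual illustration, not Pixeltable's implementation. The `generate` function, the `rows` list, and the `attempt` argument are all hypothetical stand-ins.

```python
def generate(prompt, attempt):
    """Hypothetical stand-in for an API call: one prompt fails on the first attempt."""
    if prompt == 'scene 2' and attempt == 0:
        raise RuntimeError('rate limit exceeded')
    return f'image for {prompt}'

def compute(rows, attempt, errors_only=False):
    """Fill in 'image' for each row, recording failures instead of raising."""
    attempted = 0
    for row in rows:
        if errors_only and row.get('error') is None:
            continue  # skip rows that already succeeded
        try:
            row['image'] = generate(row['prompt'], attempt)
            row['error'] = None
        except RuntimeError as exc:
            row['image'] = None
            row['error'] = str(exc)
        attempted += 1
    return attempted

rows = [{'prompt': f'scene {i}'} for i in range(4)]

first_pass = compute(rows, attempt=0)  # processes every row
failed = [r['prompt'] for r in rows if r['error'] is not None]
second_pass = compute(rows, attempt=1, errors_only=True)  # retries failures only
print(first_pass, failed, second_pass)
```

The second pass touches only the single failed row, which is why retrying errors is cheap compared with recomputing the whole column.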

## 06 - (Optional) Video Generation

This section demonstrates two ways to generate video through the Gemini API. Each approach runs as a query on just two scenes (one from each video in our base table), because video generation is slower and more expensive than text or image generation.

- **Image Input: Generate Videos from Image Only** - Slower, simpler approach (`veo-3.1-generate-preview`)
- **Multimodal Inputs: Generate Videos from Prompt + Image** - Slower, more tokens, production quality (`veo-3.1-generate-preview`)

At the time of this workshop, the code we provide here should keep you safely in the free tier of four videos per day, but again we provide code to set rate limits.

In [None]:
# Uncomment to set rate limits in your notebook
#os.environ['VEO_RATE_LIMITS'] = '{"veo-3.1-generate-preview": 2}'

Some things to note about the following code examples:

1. Each generated image is resized to 640x640 before being sent to Gemini. Gemini's API limit for combined text and image data is 100MB per request (recently increased from 20MB). We resize to stay well under this limit.

1. In Pixeltable, you can chain image operations like [`.resize()`](https://docs.pixeltable.com/sdk/latest/image#udf-resize) directly on image columns - you don't need to save the resized image to disk or create a separate column to include it in your API call.
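As a rough back-of-the-envelope check on request size, the sketch below compares a 1920x1080 image with the 640x640 size passed to `.resize()` in the cells below. It assumes uncompressed 8-bit RGB pixels and base64 encoding, so it gives a worst-case upper bound. Real requests send compressed images, which are typically much smaller. Reading "100MB" as 100 MiB is also an assumption.

```python
def raw_rgb_bytes(width, height):
    """Uncompressed 8-bit RGB size: 3 bytes per pixel."""
    return width * height * 3

def base64_size(num_bytes):
    """Length of the padded base64 encoding of num_bytes raw bytes."""
    return 4 * ((num_bytes + 2) // 3)

LIMIT_BYTES = 100 * 1024 * 1024  # assumption: "100MB" read as 100 MiB

full = base64_size(raw_rgb_bytes(1920, 1080))
small = base64_size(raw_rgb_bytes(640, 640))

print(f'1920x1080: {full:,} bytes ({full / LIMIT_BYTES:.1%} of limit)')
print(f'640x640:   {small:,} bytes ({small / LIMIT_BYTES:.1%} of limit)')
print(f'reduction: {full / small:.2f}x')
```

Even in this worst case, the smaller image uses only a small fraction of the limit, and less data per request also means less to upload for each video generation call.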

### Image Input: Animate an Image (No Text Prompt)

This simpler approach animates the generated image without a text prompt, using just the image itself.

In [31]:
scene_animation = scene_view.where(scene_view.pos == 7).select(
    scene_view.beginning_frame,
    scene_view.scene_image,
    animated_video=pxtf.gemini.generate_videos(
        image=scene_view.scene_image.resize((640, 640)),
        model='veo-3.1-generate-preview',
        config={'duration_seconds': 4}
    )
).collect()

In [32]:
scene_animation

beginning_frame,scene_image,animated_video
,,
,,


### Multimodal Inputs: Generate a Video from Prompt + Image

Test video generation using both the `scene_prompt` and the `scene_image`. Save the results of this query in memory as `scene_trailer`.

In [33]:
scene_trailer = scene_view.where(scene_view.pos == 7).select(
    scene_view.scene_image,
    scene_view.scene_prompt,
    trailer_video=pxtf.gemini.generate_videos(
        prompt=scene_view.scene_prompt,
        image=scene_view.scene_image.resize((640, 640)),
        model='veo-3.1-generate-preview',
        config={'duration_seconds': 4}
    )
).collect()

In [34]:
scene_trailer

scene_image,scene_prompt,trailer_video
,"An intricate miniature diorama of the hallway of the Arconia from ""Only Murders in the Building,"" featuring Charles, Oliver, and Mabel standing near an elevator door. Oliver gestures dramatically with open hands, Mabel looks concerned holding a cell phone and wearing a sequin dress, while Charles examines a small gold object with his fingers. The hallway is wood paneled and dimly lit, with an elevator door visible in the background. Photographed with a tilt-shift lens creating selective focus and a dreamlike toy-world atmosphere. Close-up shot, dramatic miniature lighting with warm highlights and deep shadows emphasizing the handcrafted textures. Tiny handcrafted details include felt clothing, clay faces with expressive features, and wooden paneling. Materials: felt, clay, wood, and tiny sequins. Color palette: muted golds, browns, and dark blues. Mood: Suspenseful, quirky, and charmingly handmade, with a hint of impending doom.",
,"An intricate miniature diorama of an outdoor chess scene in 1960s Berlin. A friendly, elderly man with glasses and a wool cap, made of clay, offers his hand for a handshake. He wears a dark wool coat and striped scarf, meticulously crafted from felt. Behind him, another elderly gentleman made of wood sits at a tiny chess table made of balsa wood, deeply engrossed in a game. A backdrop of a weathered concrete building with red window frames made of clay, lines the scene. Tiny, individually placed cobblestones cover the ground. The scene is photographed with a tilt-shift lens creating selective focus on the elderly man's outstretched hand. Dramatic miniature lighting casts long shadows and highlights the details of the cobblestones. Color palette: muted grays, browns, and reds, evoking a Cold War atmosphere. The mood is hopeful and subtly nostalgic, capturing a moment of connection in a miniature world.",


## Wrap-Up

You built a content generation pipeline using Pixeltable:

```
┌────────────────────────────────────────────────────────────────────────────┐
│  SCENE             PROMPT             GENERATE            ANIMATE          │
│                                                                            │
│  ┌────────┐       ┌────────┐        ┌──────────┐       ┌──────────┐        │
│  │ Frame  │──────▶│ Gemini │───────▶│  Imagen  │──────▶│   Veo    │        │
│  │ + Text │       │        │        │          │       │          │        │
│  └────────┘       └────────┘        └──────────┘       └──────────┘        │
│      │                │                   │                  │             │
│  Frame +          Scene              Diorama            Animated           │
│  metadata         prompt             image              video              │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
```

**What you built:**

You added content generation to your video pipeline. 

- Act 1 gave you visual search.  
- Act 2 gave you audio search.  
- Act 3 gives you generative AI: create new images and videos from any scene.

**Each step is declarative:**
- **Scene**: Frame and metadata from `scene_view` (built in Act 2)
- **Prompt**: Gemini generates scene descriptions from multimodal inputs
- **Generate**: Imagen creates images from prompts
- **Animate**: Veo turns images into video clips

**The complete pipeline:**

Across all three acts, you built a system that can search video by visual content, search by what's being said, and generate new content, all using declarative computed columns that automatically process new data.

**The final act:**

Save and share the scenes you generate with Pixeltable Cloud. Docs here: https://docs.pixeltable.com/platform/data-sharing

In [None]:
pxt.publish(
    source='primetime-workshop/scene_view', # replace with your own table name
    destination_uri='pxt://pixeltable:demos/primetime-scenes' # replace with your own username and dataset name
)

Creating a replica of 'primetime-workshop/scene_view' at: pxt://pixeltable:demos/primetime-scenes


Output()

Finalizing replica ...
The published table is now available at: pxt://pixeltable:demos/primetime-scenes


Or you can replicate mine!

In [None]:
import pixeltable as pxt

# Replicate this table to your local environment
local_table = pxt.replicate(
    remote_uri='pxt://pixeltable:demos/primetime-scenes',
    local_path='local_copy'  # Your local table name
)

# Preview the first 5 rows of the replicated table
local_table.limit(5).collect()