# Act 2: Iterating with Data


In Act 1, we built visual search by extracting frames at regular intervals. But what if we want to work with **semantic units** of content instead of arbitrary slices? Here we'll practice **iterating with data**: starting with basic media, then building up layers of enrichment (scenes, audio, transcripts, embeddings) as we refine the pipeline.

**The Challenge:** Multimodal content (videos, documents, audio) is organized into meaningful segments - scenes in video, chapters in documents, sections in audio. To search, analyze, or generate from content effectively, we need to identify these natural boundaries and process each segment individually. Without proper infrastructure, this requires managing separate systems for segmentation, extraction, and indexing.

**Pixeltable's Approach:**
- **Content-aware segmentation** - Automatically detect boundaries based on content changes (scenes, chapters, sections)
- **Views with iterators** - Expand segments into rows, enabling per-segment processing
- **Multimodal enrichment** - Extract and process audio, transcripts, and embeddings from each segment
- **Semantic search** - Build searchable embeddings for text-based retrieval across segments

**What you'll accomplish:**
- Detect meaningful boundaries using content-aware algorithms
- Create views that expand segments into processable units
- Extract audio and generate transcripts for each segment
- Build semantic search on transcripts for text-based content retrieval

**Documentation:** [Views in Pixeltable](https://docs.pixeltable.com/platform/views)


In [1]:
import pixeltable as pxt

In [2]:
pxt.list_tables()

Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/alison-pxt/.pixeltable/pgdata


['chess_vids']

In [None]:
v = pxt.get_table('chess_vids')

In [4]:
v

table 'chess_vids'

Column Name,Type,Computed With
video,Video,
duration,Float,video.get_duration()


## 01 - Detect Scene Boundaries

In Act 1, we extracted frames at regular intervals to get a specific number of frames. However, those frames might not reflect meaningful scene breaks - they're just evenly spaced snapshots of the video.

Here in Act 2, we'll use **content-aware scene detection** to find actual scene boundaries: places where the content changes significantly, not just arbitrary time intervals. As described in [PySceneDetect's features](https://www.scenedetect.com/features/), this approach detects breaks between pieces of content, so it finds the points where meaningful scene changes occur.

We'll use Pixeltable's histogram-based scene detector, `scene_detect_histogram`, with tuned parameters to find these scene breaks:

In [19]:
# Add scene detection with tuned parameters
v.add_computed_column(
    scenes=v.video.scene_detect_histogram(
        fps=10,
        threshold=0.6,
        min_scene_len=100
    ),
    if_exists='replace'
)

Added 1 column value with 0 errors.


1 row updated, 1 value computed.

**What happens during processing:**

This operation samples the video at 10 frames per second and looks for visual changes between successive frames. The algorithm:

- Compares the brightness histograms of successive frames (PySceneDetect's histogram detector works on the luma channel) to measure visual differences
- Marks a scene boundary when that difference exceeds the threshold of 0.6
- Requires at least 100 frames to pass after a cut before accepting another one (nominally 10 seconds at 10 fps), which filters out brief flashes; the results below still include one 7.1-second scene, so treat this figure as approximate

This typically takes about a minute for a 6-minute video. The output will be a JSON array with scene boundaries, where each scene has `start_time`, `start_pts`, and `duration` properties.

**Understanding the parameters:**
- `fps=10`: Analyzes 10 frames per second (balancing speed vs. accuracy)
- `threshold=0.6`: Sensitivity for detecting scene changes (lower = more sensitive)
- `min_scene_len=100`: Minimum scene length in frames (prevents very short false positives)
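To make the roles of `threshold` and `min_scene_len` concrete, here is a toy, pure-Python sketch of histogram-based cut detection. It is only an illustration under simplifying assumptions (frames are flat lists of brightness values, and the helper names `histogram`, `histogram_distance`, and `detect_cuts` are made up for this example); it is not the code that Pixeltable or PySceneDetect actually runs.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8, max_value: int = 256) -> List[float]:
    """Normalized brightness histogram of one frame (pixel values are ints in 0..255)."""
    counts = [0] * bins
    for px in frame:
        counts[px * bins // max_value] += 1
    total = len(frame)
    return [c / total for c in counts]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames: Sequence[Sequence[int]], threshold: float, min_scene_len: int) -> List[int]:
    """Return the frame indices where a new scene starts (index 0 is always included)."""
    cuts = [0]
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        big_change = histogram_distance(prev, cur) > threshold
        far_enough = i - cuts[-1] >= min_scene_len
        if big_change and far_enough:
            cuts.append(i)
        prev = cur
    return cuts


# Synthetic "video": 2 dark frames, a 1-frame bright flash, 5 dark frames, 6 bright frames.
dark, bright = [20] * 16, [230] * 16
frames = [dark] * 2 + [bright] + [dark] * 5 + [bright] * 6

print(detect_cuts(frames, threshold=0.6, min_scene_len=4))  # [0, 8]
print(detect_cuts(frames, threshold=0.6, min_scene_len=1))  # [0, 2, 3, 8]
```

With `min_scene_len=4` the one-frame flash is ignored and only the real change at frame 8 is reported; with `min_scene_len=1` the flash produces two spurious cuts. Lowering `threshold` would make the detector react to smaller histogram differences.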


The `scenes` column contains a JSON array with scene boundaries. Check the updated table schema:


In [20]:
v

table 'chess_vids'

Column Name,Type,Computed With
video,Video,
duration,Float,video.get_duration()
scenes,Json,"video.scene_detect_histogram(fps=10,  threshold=0.6,  min_scene_len=100)"


In [21]:
v.collect()

video,duration,scenes
,377.043,"[{""duration"": 28.779, ""start_pts"": 0, ""start_time"": 0.}, {""duration"": 16.85, ""start_pts"": 690690, ""start_time"": 28.779}, {""duration"": 80.122, ""start_pts"": 1095094, ""start_time"": 45.629}, {""duration"": 28.737, ""start_pts"": 3018015, ""start_time"": 125.751}, {""duration"": 25.734, ""start_pts"": 3707704, ""start_time"": 154.488}, {""duration"": 51.301, ""start_pts"": 4325321, ""start_time"": 180.222}, {""duration"": 70.32, ""start_pts"": 5556551, ""start_time"": 231.523}, {""duration"": 7.132, ""start_pts"": 7244237, ""start_time"": 301.843}, {""duration"": 41.458, ""start_pts"": 7415408, ""start_time"": 308.975}, {""duration"": 11.261, ""start_pts"": 8410402, ""start_time"": 350.433}]"


## 02 - Create a View from Scenes

To create a view with video segments, we need the scene start times from the `scenes` column. The `video_splitter` iterator takes an array of split times, which we can extract using JSON indexing: `v.scenes[1:].start_time` (skipping the first scene, which typically starts at 0).

In [25]:
from pixeltable.functions.video import video_splitter

scenes = pxt.create_view(
    'scene_view',
    v,
    iterator=video_splitter(
        video=v.video,
        segment_times=v.scenes[1:].start_time,
        mode='fast',
    ),
    if_exists='replace'
)

Inserting rows into `scene_view`: 10 rows [00:00, 3923.21 rows/s]


The view now has one row per scene segment. Let's check the schema and see some sample rows:


In [26]:
scenes

view 'scene_view' (of 'chess_vids')

Column Name,Type,Computed With
pos,Required[Int],
segment_start,Float,
segment_start_pts,Int,
segment_end,Float,
segment_end_pts,Int,
video_segment,Required[Video],
video,Video,
duration,Float,video.get_duration()
scenes,Json,"video.scene_detect_histogram(fps=10,  threshold=0.6,  min_scene_len=100)"


The `video_splitter` iterator adds the following columns to the view:

- `pos`: Position/index of the segment
- `video_segment`: The actual video segment file
- `segment_start`: Start time of the segment in seconds
- `segment_end`: End time of the segment in seconds
- `segment_start_pts` / `segment_end_pts`: Presentation timestamps (for advanced use)

In [27]:
scenes.select(scenes.pos, scenes.segment_start, scenes.segment_end, scenes.video_segment).tail()

pos,segment_start,segment_end,video_segment
0,0.0,28.779,
1,28.779,45.629,
2,45.629,125.751,
3,125.751,154.488,
4,154.488,180.222,
5,180.222,231.524,
6,231.524,301.843,
7,301.843,308.975,
8,308.975,350.433,
9,350.433,377.043,
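One way to read the table above: the nine split times passed as `segment_times` cut the video into ten consecutive segments, with the first segment starting at 0 and the last one ending at the video's duration. The sketch below reproduces that bookkeeping in plain Python. The helper `segments_from_split_times` is hypothetical and only illustrates the arithmetic; it is not how `video_splitter` is implemented.

```python
from typing import List, Tuple


def segments_from_split_times(split_times: List[float], duration: float) -> List[Tuple[float, float]]:
    """Turn N split points into N + 1 consecutive (start, end) segments."""
    boundaries = [0.0] + sorted(split_times) + [duration]
    return list(zip(boundaries[:-1], boundaries[1:]))


# Start times of scenes 1..9 from the `scenes` column, and the video's duration.
split_times = [28.779, 45.629, 125.751, 154.488, 180.222, 231.523, 301.843, 308.975, 350.433]
segments = segments_from_split_times(split_times, duration=377.043)

for pos, (start, end) in enumerate(segments):
    print(pos, start, end)
```

The printed boundaries match the `segment_start` and `segment_end` columns, apart from a one-millisecond difference at 231.523 vs. 231.524, since the view reports the timestamps of the segments it actually wrote.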


## 03 - Add Audio & Transcripts

Let's enrich our scene view with audio and transcripts. Since dialogue is typically sparse, scene-level transcription and embeddings are sufficient for search.


### Extract & Transcribe Audio

Extract audio from each scene's video segment and transcribe it using Whisper.


In [28]:
# Extract audio from video segments
scenes.add_computed_column(
    audio=scenes.video_segment.extract_audio(),
    if_exists='replace'
)

Added 10 column values with 0 errors.


10 rows updated, 10 values computed.

Transcribe audio using OpenAI's Whisper model. There are two options:

- **Local Whisper:** Free, no API key needed, but slower
- **OpenAI API:** Faster, but requires an API key and costs money

We'll use local Whisper for this example:


In [29]:
from pixeltable.functions import whisper

scenes.add_computed_column(
    transcription=whisper.transcribe(scenes.audio, model='base'),
    if_exists='replace'
)



Added 10 column values with 0 errors.


10 rows updated, 10 values computed.

In [None]:
# Alternative: use the OpenAI API for transcription (faster, but requires an API key)
# Uncomment and use this instead of the local whisper function above
# 
# First, install the openai package: pip install openai
# Set your API key: export OPENAI_API_KEY='your-api-key-here'
#
# from pixeltable.functions.openai import transcriptions
# 
# scenes.add_computed_column(
#     api_tx=transcriptions(scenes.audio, model='whisper-1'),
#     if_exists='replace'
# )

In [35]:
scenes.select(scenes.pos, scenes.video_segment, scenes.transcription).tail()

pos,video_segment,transcription
0,,"{""text"": "" That check has been the whole point of the sequence beginning with the bishop cutting down the slope of the book by forcing it to a last threatening left. Question is... What will she do now?"", ""language"": ""en"", ""segments"": [{""id"": 0, ""end"": 4.96, ""seek"": 0, ""text"": "" That check has been the whole point of the sequence beginning with the bishop"", ""start"": 0., ""tokens"": [50364, 663, 1520, 575, 668, 264, ..., 8310, 2863, 365, 264, 34470, 50612], ""avg_logprob"": -0.556, ""temperature"": 0., ""no_speech_prob"": 0.364, ""compression_ratio"": 1.384}, {""id"": 1, ""end"": 8.16, ""seek"": 0, ""text"": "" cutting down the slope of the book by forcing it to a last threatening left."", ""start"": 4.96, ""tokens"": [50612, 6492, 760, 264, 13525, 295, ..., 257, 1036, 20768, 1411, 13, 50772], ""avg_logprob"": -0.556, ""temperature"": 0., ""no_speech_prob"": 0.364, ""compression_ratio"": 1.384}, {""id"": 2, ""end"": 10.16, ""seek"": 0, ""text"": "" Question is..."", ""start"": 8.16, ""tokens"": [50772, 14464, 307, 485, 50872], ""avg_logprob"": -0.556, ""temperature"": 0., ""no_speech_prob"": 0.364, ""compression_ratio"": 1.384}, {""id"": 3, ""end"": 12.16, ""seek"": 0, ""text"": "" What will she do now?"", ""start"": 10.16, ""tokens"": [50872, 708, 486, 750, 360, 586, 30, 50972], ""avg_logprob"": -0.556, ""temperature"": 0., ""no_speech_prob"": 0.364, ""compression_ratio"": 1.384}]}"
1,,"{""text"": "" \u041e\u0431 1918 \u0440\u044f\u0434\u0430 \u041d\u0435\u0442immer \u0432 positioned"", ""language"": ""ru"", ""segments"": [{""id"": 0, ""end"": 3.2, ""seek"": 0, ""text"": "" \u041e\u0431 1918 \u0440\u044f\u0434\u0430"", ""start"": 0.6, ""tokens"": [50394, 22853, 36588, 1475, 681, 3444, 50524], ""avg_logprob"": -5.472, ""temperature"": 1., ""no_speech_prob"": 0.311, ""compression_ratio"": 0.8}, {""id"": 1, ""end"": 10.34, ""seek"": 0, ""text"": "" \u041d\u0435\u0442immer \u0432 positioned"", ""start"": 7.62, ""tokens"": [50745, 21249, 14477, 740, 24889, 50881], ""avg_logprob"": -5.472, ""temperature"": 1., ""no_speech_prob"": 0.311, ""compression_ratio"": 0.8}]}"
2,,"{""text"": "" It's your game. Take it. It's your game."", ""language"": ""en"", ""segments"": [{""id"": 0, ""end"": 2., ""seek"": 0, ""text"": "" It's your game."", ""start"": 0., ""tokens"": [50364, 467, 311, 428, 1216, 13, 50464], ""avg_logprob"": -0.841, ""temperature"": 0., ""no_speech_prob"": 0.025, ""compression_ratio"": 0.75}, {""id"": 1, ""end"": 8., ""seek"": 0, ""text"": "" Take it."", ""start"": 6., ""tokens"": [50664, 3664, 309, 13, 50764], ""avg_logprob"": -0.841, ""temperature"": 0., ""no_speech_prob"": 0.025, ""compression_ratio"": 0.75}, {""id"": 2, ""end"": 62., ""seek"": 6000, ""text"": "" It's your game."", ""start"": 60., ""tokens"": [50364, 467, 311, 428, 1216, 13, 50464], ""avg_logprob"": -0.874, ""temperature"": 0.2, ""no_speech_prob"": 0.263, ""compression_ratio"": 0.652}]}"
3,,"{""text"": "" Thank you. Good for you, Crackle. Good for you."", ""language"": ""en"", ""segments"": [{""id"": 0, ""end"": 2., ""seek"": 0, ""text"": "" Thank you."", ""start"": 0., ""tokens"": [50364, 1044, 291, 13, 50464], ""avg_logprob"": -0.614, ""temperature"": 0., ""no_speech_prob"": 0.004, ""compression_ratio"": 1.175}, {""id"": 1, ""end"": 21.2, ""seek"": 0, ""text"": "" Good for you, Crackle."", ""start"": 19.2, ""tokens"": [51324, 2205, 337, 291, 11, 4779, 501, 306, 13, 51424], ""avg_logprob"": -0.614, ""temperature"": 0., ""no_speech_prob"": 0.004, ""compression_ratio"": 1.175}, {""id"": 2, ""end"": 25.8, ""seek"": 0, ""text"": "" Good for you."", ""start"": 23.8, ""tokens"": [51554, 2205, 337, 291, 13, 51654], ""avg_logprob"": -0.614, ""temperature"": 0., ""no_speech_prob"": 0.004, ""compression_ratio"": 1.175}]}"
4,,"{""text"": """", ""language"": ""en"", ""segments"": []}"
5,,"{""text"": "" The President has invited you to the White House. There'll be a chess board set up in the Oval Office, and of course a photo op of you kicking hi ...... a list of talking points. It's a big deal beating the Soviets at the wrong game. Could you stop the car, please? I'd like to walk. To the airport."", ""language"": ""en"", ""segments"": [{""id"": 0, ""end"": 3., ""seek"": 0, ""text"": "" The President has invited you to the White House."", ""start"": 0., ""tokens"": [50364, 440, 3117, 575, 9185, 291, 281, 264, 5552, 4928, 13, 50514], ""avg_logprob"": -0.417, ""temperature"": 0., ""no_speech_prob"": 0.225, ""compression_ratio"": 2.148}, {""id"": 1, ""end"": 5., ""seek"": 0, ""text"": "" There'll be a chess board set up in the Oval Office,"", ""start"": 3., ""tokens"": [50514, 821, 603, 312, 257, 24122, ..., 264, 422, 3337, 8935, 11, 50614], ""avg_logprob"": -0.417, ""temperature"": 0., ""no_speech_prob"": 0.225, ""compression_ratio"": 2.148}, {""id"": 2, ""end"": 8., ""seek"": 0, ""text"": "" and of course a photo op of you kicking his ass."", ""start"": 5., ""tokens"": [50614, 293, 295, 1164, 257, 5052, 999, 295, 291, 19137, 702, 1256, 13, 50764], ""avg_logprob"": -0.417, ""temperature"": 0., ""no_speech_prob"": 0.225, ""compression_ratio"": 2.148}, {""id"": 3, ""end"": 11., ""seek"": 0, ""text"": "" Texas being more of a checker state."", ""start"": 8., ""tokens"": [50764, 7885, 885, 544, 295, 257, 1520, 260, 1785, 13, 50914], ""avg_logprob"": -0.417, ""temperature"": 0., ""no_speech_prob"": 0.225, ""compression_ratio"": 2.148}, {""id"": 4, ""end"": 15., ""seek"": 0, ""text"": "" There's a dinner tonight after the reception"", ""start"": 13., ""tokens"": [51014, 821, 311, 257, 6148, 4440, 934, 264, 21682, 51114], ""avg_logprob"": -0.417, ""temperature"": 0., ""no_speech_prob"": 0.225, ""compression_ratio"": 2.148}, {""id"": 5, ""end"": 18., ""seek"": 0, ""text"": "" at the Russian chess club in Georgetown."", ""start"": 15., ""tokens"": [51114, 412, 264, 7220, 24122, 6482, 294, 34848, 13, 51264], ""avg_logprob"": -0.417, ""temperature"": 0., ""no_speech_prob"": 0.225, ""compression_ratio"": 2.148}, ..., {""id"": 13, ""end"": 33., ""seek"": 2800, ""text"": "" The visitor has been asked to be prepared to leave the residence at the White House."", ""start"": 31., ""tokens"": [50514, 440, 28222, 575, 668, 2351, ..., 412, 264, 5552, 4928, 13, 50614], ""avg_logprob"": -0.748, ""temperature"": 0.8, ""no_speech_prob"": 0.042, ""compression_ratio"": 1.878}, {""id"": 14, ""end"": 38., ""seek"": 2800, ""text"": "" This and its belong, so we've prepared a list of talking points."", ""start"": 33., ""tokens"": [50614, 639, 293, 1080, 5784, 11, ..., 1329, 295, 1417, 2793, 13, 50864], ""avg_logprob"": -0.748, ""temperature"": 0.8, ""no_speech_prob"": 0.042, ""compression_ratio"": 1.878}, {""id"": 15, ""end"": 43., ""seek"": 2800, ""text"": "" It's a big deal beating the Soviets at the wrong game."", ""start"": 38., ""tokens"": [50864, 467, 311, 257, 955, 2028, 13497, 264, 41354, 412, 264, 2085, 1216, 13, 51114], ""avg_logprob"": -0.748, ""temperature"": 0.8, ""no_speech_prob"": 0.042, ""compression_ratio"": 1.878}, {""id"": 16, ""end"": 46., ""seek"": 2800, ""text"": "" Could you stop the car, please?"", ""start"": 43., ""tokens"": [51114, 7497, 291, 1590, 264, 1032, 11, 1767, 30, 51264], ""avg_logprob"": -0.748, ""temperature"": 0.8, ""no_speech_prob"": 0.042, ""compression_ratio"": 1.878}, {""id"": 17, ""end"": 49., ""seek"": 2800, ""text"": "" I'd like to walk."", ""start"": 46., ""tokens"": [51264, 286, 1116, 411, 281, 1792, 13, 51414], ""avg_logprob"": -0.748, ""temperature"": 0.8, ""no_speech_prob"": 0.042, ""compression_ratio"": 1.878}, {""id"": 18, ""end"": 50., ""seek"": 2800, ""text"": "" To the airport."", ""start"": 49., ""tokens"": [51414, 1407, 264, 10155, 13, 51464], ""avg_logprob"": -0.748, ""temperature"": 0.8, ""no_speech_prob"": 0.042, ""compression_ratio"": 1.878}]}"
6,,"{""text"": "" You're gonna miss the flood? Come on. Come on. Lisa, come on."", ""language"": ""en"", ""segments"": [{""id"": 0, ""end"": 2., ""seek"": 0, ""text"": "" You're gonna miss the flood?"", ""start"": 0., ""tokens"": [50364, 509, 434, 799, 1713, 264, 10481, 30, 50464], ""avg_logprob"": -0.608, ""temperature"": 0., ""no_speech_prob"": 0.074, ""compression_ratio"": 0.778}, {""id"": 1, ""end"": 62., ""seek"": 6000, ""text"": "" Come on."", ""start"": 60., ""tokens"": [50364, 2492, 322, 13, 50464], ""avg_logprob"": -0.666, ""temperature"": 0., ""no_speech_prob"": 0.121, ""compression_ratio"": 1.143}, {""id"": 2, ""end"": 68., ""seek"": 6000, ""text"": "" Come on."", ""start"": 66., ""tokens"": [50664, 2492, 322, 13, 50764], ""avg_logprob"": -0.666, ""temperature"": 0., ""no_speech_prob"": 0.121, ""compression_ratio"": 1.143}, {""id"": 3, ""end"": 71., ""seek"": 6000, ""text"": "" Lisa, come on."", ""start"": 69., ""tokens"": [50814, 12252, 11, 808, 322, 13, 50914], ""avg_logprob"": -0.666, ""temperature"": 0., ""no_speech_prob"": 0.121, ""compression_ratio"": 1.143}]}"
7,,"{""text"": "" Da! Hormuz!"", ""language"": ""tr"", ""segments"": [{""id"": 0, ""end"": 1.44, ""seek"": 0, ""text"": "" Da!"", ""start"": 0., ""tokens"": [50364, 3933, 0, 50436], ""avg_logprob"": -1.719, ""temperature"": 1., ""no_speech_prob"": 0.072, ""compression_ratio"": 0.579}, {""id"": 1, ""end"": 5.14, ""seek"": 0, ""text"": "" Hormuz!"", ""start"": 3.94, ""tokens"": [50561, 389, 687, 3334, 0, 50621], ""avg_logprob"": -1.719, ""temperature"": 1., ""no_speech_prob"": 0.072, ""compression_ratio"": 0.579}]}"
8,,"{""text"": "" ..."", ""language"": ""fr"", ""segments"": [{""id"": 0, ""end"": 26.12, ""seek"": 0, ""text"": "" ..."", ""start"": 0., ""tokens"": [50364, 1097, 51670], ""avg_logprob"": -2.352, ""temperature"": 1., ""no_speech_prob"": 0.217, ""compression_ratio"": 0.273}]}"
9,,"{""text"": "" \u041c\u043e\u0436\u0435\u043c \u0438\u8000\u043a\u0430\u0442\u044c\u0441\u044f \u043d\u0430acz\u043a\u0438\u62d2\u0436\u0435\u043c"", ""language"": ""ru"", ""segments"": [{""id"": 0, ""end"": 0.74, ""seek"": 0, ""text"": "" \u041c\u043e\u0436\u0435\u043c \u0438"", ""start"": 0., ""tokens"": [50364, 3493, 2161, 1504, 1006, 50401], ""avg_logprob"": -4.671, ""temperature"": 1., ""no_speech_prob"": 0.063, ""compression_ratio"": 0.891}, {""id"": 1, ""end"": 2.9, ""seek"": 0, ""text"": ""\u8000\u043a\u0430\u0442\u044c\u0441\u044f \u043d\u0430"", ""start"": 1.52, ""tokens"": [50440, 4450, 222, 755, 8525, 1470, 50509], ""avg_logprob"": -4.671, ""temperature"": 1., ""no_speech_prob"": 0.063, ""compression_ratio"": 0.891}, {""id"": 2, ""end"": 4.22, ""seek"": 0, ""text"": ""acz\u043a\u0438"", ""start"": 3.12, ""tokens"": [50520, 14875, 2241, 50575], ""avg_logprob"": -4.671, ""temperature"": 1., ""no_speech_prob"": 0.063, ""compression_ratio"": 0.891}, {""id"": 3, ""end"": 9.7, ""seek"": 0, ""text"": ""\u62d2\u0436\u0435\u043c"", ""start"": 8.4, ""tokens"": [50784, 6852, 240, 1820, 1504, 50849], ""avg_logprob"": -4.671, ""temperature"": 1., ""no_speech_prob"": 0.063, ""compression_ratio"": 0.891}]}"


We can extract the text and language from the transcription JSON:

In [40]:
# Extract the text and language from the transcription JSON
# Cast to String type so we can use it for embedding index
scenes.add_computed_column(
    transcript_text=scenes.transcription.text.astype(pxt.String),
    if_exists='replace'
)

scenes.add_computed_column(
    transcript_lang=scenes.transcription.language,
    if_exists='replace'
)

Added 10 column values with 0 errors.
Added 10 column values with 0 errors.


10 rows updated, 10 values computed.

In [41]:
scenes.select(scenes.pos, scenes.video_segment, scenes.transcript_text, scenes.transcript_lang).tail()

pos,video_segment,transcript_text,transcript_lang
0,,That check has been the whole point of the sequence beginning with the bishop cutting down the slope of the book by forcing it to a last threatening left. Question is... What will she do now?,en
1,,Об 1918 ряда Нетimmer в positioned,ru
2,,It's your game. Take it. It's your game.,en
3,,"Thank you. Good for you, Crackle. Good for you.",en
4,,,en
5,,"The President has invited you to the White House. There'll be a chess board set up in the Oval Office, and of course a photo op of you kicking his ass. Texas being more of a checker state. There's a dinner tonight after the reception at the Russian chess club in Georgetown. A lot of prominent visitors belong, so I'm going to have to go back to the White House. I'm going to have to go back to the White House. I'm going to have to go back to the White House. I'm going to have to go back to the White House. The visitor has been asked to be prepared to leave the residence at the White House. The visitor has been asked to be prepared to leave the residence at the White House. This and its belong, so we've prepared a list of talking points. It's a big deal beating the Soviets at the wrong game. Could you stop the car, please? I'd like to walk. To the airport.",en
6,,"You're gonna miss the flood? Come on. Come on. Lisa, come on.",en
7,,Da! Hormuz!,tr
8,,...,fr
9,,Можем и耀каться наaczки拒жем,ru


### Create Embedding Index & Search

Now let's create an embedding index on the transcript text to enable semantic search.

We'll use a sentence transformer model from Hugging Face. Sentence transformers are models trained to convert text into dense vector representations (embeddings) that capture semantic meaning. Pixeltable integrates these models from Hugging Face's model hub - in this case, we're using `sentence-transformers/all-mpnet-base-v2`, which is optimized for semantic similarity tasks and works well for search.

See the [Sentence Transformers documentation](https://www.sbert.net/) for more details.

In [42]:
from pixeltable.functions.huggingface import sentence_transformer

scenes.add_embedding_index(
    scenes.transcript_text,
    embedding=sentence_transformer.using(model_id='sentence-transformers/all-mpnet-base-v2')
)

Now we can perform text-based semantic search on the transcripts:

In [46]:
# Search for scenes similar to a query string
query_text = "playing a chess game"
sim = scenes.transcript_text.similarity(query_text)

results = (
    scenes.where(sim >= 0.3)  # Minimum similarity threshold
    .order_by(sim, asc=False)  # Order by similarity (highest first)
    .select(scenes.transcript_text, scenes.pos, scenes.video_segment, similarity=sim)
    .limit(5)
    .collect()
)
results

transcript_text,pos,video_segment,similarity
"The President has invited you to the White House. There'll be a chess board set up in the Oval Office, and of course a photo op of you kicking his ass. Texas being more of a checker state. There's a dinner tonight after the reception at the Russian chess club in Georgetown. A lot of prominent visitors belong, so I'm going to have to go back to the White House. I'm going to have to go back to the White House. I'm going to have to go back to the White House. I'm going to have to go back to the White House. The visitor has been asked to be prepared to leave the residence at the White House. The visitor has been asked to be prepared to leave the residence at the White House. This and its belong, so we've prepared a list of talking points. It's a big deal beating the Soviets at the wrong game. Could you stop the car, please? I'd like to walk. To the airport.",5,,0.336
It's your game. Take it. It's your game.,2,,0.303
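Conceptually, the query above scores every row by the cosine similarity between its transcript embedding and the query embedding, drops rows below the threshold, sorts the rest in descending order, and keeps the top few. The sketch below shows that logic in plain Python with made-up 3-dimensional vectors; real sentence-transformer embeddings have hundreds of dimensions, and Pixeltable evaluates the search through its embedding index rather than with a loop like this.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query_vec: Sequence[float],
    rows: Dict[int, List[float]],
    min_sim: float,
    limit: int,
) -> List[Tuple[int, float]]:
    """Filter by a minimum similarity, sort highest first, and keep at most `limit` rows."""
    scored = [(pos, cosine_similarity(query_vec, vec)) for pos, vec in rows.items()]
    kept = [(pos, sim) for pos, sim in scored if sim >= min_sim]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]


# Made-up embeddings keyed by scene position (illustrative values only).
toy_embeddings = {
    0: [1.0, 0.0, 0.0],
    1: [0.0, 1.0, 0.0],
    2: [1.0, 1.0, 0.0],
    3: [0.0, 0.0, 1.0],
}
query = [1.0, 0.2, 0.0]

hits = search(query, toy_embeddings, min_sim=0.3, limit=5)
for pos, sim in hits:
    print(pos, round(sim, 3))
```

Only positions 0 and 2 pass the 0.3 threshold here, in that order, which mirrors how the real query returned just two of the ten scenes.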


🎉 **You just built semantic search for your video transcripts!**

With just a few lines of code, you:
- Detected meaningful scene boundaries using content-aware scene detection
- Created a view with one row per scene segment
- Extracted audio and transcribed each scene
- Built a searchable embedding index on transcripts
- Found relevant scenes by text similarity

This demonstrates the power of iterating with data in Pixeltable - scenes, views, and semantic search all working together to make your video content searchable.


## 04 - Persistence Demo

Pixeltable stores everything persistently: your data, computed columns, and views survive kernel restarts!

**Try it yourself:** Clear all your outputs, restart your kernel, then run the code below (the `%reset -f` cell clears every variable from the notebook namespace):


In [47]:
%reset -f

In [48]:
import pixeltable as pxt
pxt.list_tables()

['chess_vids', 'scene_view']

All of these tables and views are retrievable because Pixeltable persistently stores our data and computed columns. Let's get the scene view we created:

In [49]:
scenes = pxt.get_table('scene_view')

Check the schema - notice all our computed columns (audio, transcription, transcript_text, etc.) are still there:


In [50]:
scenes

view 'scene_view' (of 'chess_vids')

Column Name,Type,Computed With
pos,Required[Int],
segment_start,Float,
segment_start_pts,Int,
segment_end,Float,
segment_end_pts,Int,
video_segment,Required[Video],
audio,Required[Audio],video_segment.extract_audio()
transcription,Required[Json],"transcribe(audio, model='base')"
transcript_text,String,transcription.text.astype(String)
transcript_lang,Json,transcription.language

Index Name,Column,Metric,Embedding
idx1,transcript_text,cosine,"sentence_transformer(transcript_text, model_id='sentence-transformers/all-mpnet-base-v2', normalize_embeddings=False)"


All the data persists too - you can query it immediately without recomputing:


In [51]:
scenes.select(scenes.pos, scenes.transcript_text).head()

pos,transcript_text
0,That check has been the whole point of the sequence beginning with the bishop cutting down the slope of the book by forcing it to a last threatening left. Question is... What will she do now?
1,Об 1918 ряда Нетimmer в positioned
2,It's your game. Take it. It's your game.
3,"Thank you. Good for you, Crackle. Good for you."
4,
5,"The President has invited you to the White House. There'll be a chess board set up in the Oval Office, and of course a photo op of you kicking his ass. Texas being more of a checker state. There's a dinner tonight after the reception at the Russian chess club in Georgetown. A lot of prominent visitors belong, so I'm going to have to go back to the White House. I'm going to have to go back to the White House. I'm going to have to go back to the White House. I'm going to have to go back to the White House. The visitor has been asked to be prepared to leave the residence at the White House. The visitor has been asked to be prepared to leave the residence at the White House. This and its belong, so we've prepared a list of talking points. It's a big deal beating the Soviets at the wrong game. Could you stop the car, please? I'd like to walk. To the airport."
6,"You're gonna miss the flood? Come on. Come on. Lisa, come on."
7,Da! Hormuz!
8,...
9,Можем и耀каться наaczки拒жем


## Appendix - JSON Parsing Examples


This section covers JSON parsing in more detail, including step-by-step exploration of the JSON structure and different ways to access query results.


### Exploring JSON Structure

To create a view with video segments, we need to extract the scene start times from our `scenes` column. Let's build up to this step by step, exploring the JSON structure along the way.

**Step 1:** First, let's see what the `scenes` column contains:


In [None]:
# Step 1: Look at the scenes column structure
v.select(v.scenes).collect()

**Step 2:** The `scenes` column contains a JSON array. Let's access the first scene to see its structure:

In [None]:
# Step 2: Access the first scene element
v.select(v.scenes[0]).collect()

**Step 3:** Each scene has the properties `start_time`, `start_pts`, and `duration`. Let's access the `start_time` of the first scene:


In [None]:
# Step 3: Access the start_time property of the first scene
v.select(v.scenes[0].start_time).collect()


**Step 4:** Now let's slice the array to get all scenes from index 1 onwards (skipping the first scene, which typically starts at 0):


In [None]:
# Step 4: Slice to get scenes from index 1 onwards
v.select(v.scenes[1:]).collect()


**Step 5:** Now access the `start_time` property for all scenes in the slice. Here, we'll also name the column `times`:


In [None]:
# Step 5: Access start_time for all scenes from index 1 onwards
v.select(times=v.scenes[1:].start_time).collect()
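For comparison, here is what the five expressions above correspond to in plain Python for a single row, using the first three scenes from the output in section 01. This is only an analogy to show the shape of each result; Pixeltable evaluates the JSON path expressions inside the query, not with Python list operations like these.

```python
# First three entries of the `scenes` JSON array shown earlier.
scenes_json = [
    {"duration": 28.779, "start_pts": 0, "start_time": 0.0},
    {"duration": 16.85, "start_pts": 690690, "start_time": 28.779},
    {"duration": 80.122, "start_pts": 1095094, "start_time": 45.629},
]

whole_array = scenes_json                   # like v.scenes
first_scene = scenes_json[0]                # like v.scenes[0]
first_start = scenes_json[0]["start_time"]  # like v.scenes[0].start_time
tail = scenes_json[1:]                      # like v.scenes[1:]
times = [scene["start_time"] for scene in scenes_json[1:]]  # like v.scenes[1:].start_time

print(first_start)  # 0.0
print(times)        # [28.779, 45.629]
```

The last line is the key idea: applying `.start_time` to a slice yields one value per element of the slice, which is the list of split times that `video_splitter` needs.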

### Accessing Query Results

With `select()` we compose a query, and `collect()` runs it and returns a result set. Pixeltable gives you a few ways to interact with that result set:

1. You can access rows as dictionaries and columns as lists
2. You can index by row/column `[0,0]` and by column name

**Example 1:** Access rows as dictionaries and columns as lists


In [None]:
result = v.select(times=v.scenes[1:].start_time).collect()

In [None]:
result  # Displays the full result set as a table

In [None]:
result[0]  # Returns first row as dict

In [None]:
result['times']  # Returns list of times values

**Example 2:** Index by row/column `[0,0]` and by column name

In [None]:
# Index by position [row, column]
first_value = result[0, 0]  # First row, first column
first_value

In [None]:
# Index by column name
first_time = result[0, "times"]  # First row, "times" column
first_time
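To summarize the indexing forms shown above, the following toy class mimics them on an in-memory list of rows. `ToyResultSet` is a hypothetical stand-in written for this illustration; it is not Pixeltable's result set class and only covers the four access patterns used in this appendix.

```python
from typing import Any, Dict, List


class ToyResultSet:
    """Minimal stand-in that supports result[i], result['col'], and result[i, col]."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows
        self._columns = list(rows[0].keys()) if rows else []

    def __getitem__(self, key: Any) -> Any:
        if isinstance(key, int):
            # result[0] -> the row as a dict
            return self._rows[key]
        if isinstance(key, str):
            # result['times'] -> that column's values, one per row
            return [row[key] for row in self._rows]
        if isinstance(key, tuple) and len(key) == 2:
            # result[0, 0] or result[0, 'times'] -> a single cell
            row_idx, col = key
            col_name = self._columns[col] if isinstance(col, int) else col
            return self._rows[row_idx][col_name]
        raise TypeError(f"unsupported index: {key!r}")


# One row with one column, like the `times` query above (values shortened).
result = ToyResultSet([{"times": [28.779, 45.629]}])

print(result[0])           # {'times': [28.779, 45.629]}
print(result["times"])     # [[28.779, 45.629]]
print(result[0, 0])        # [28.779, 45.629]
print(result[0, "times"])  # [28.779, 45.629]
```

Note that indexing by column name returns one entry per row, so a single-row result yields a list containing one list of times.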