# 📸🗣️ Scene Index: What Was on the Screen When X Was Spoken

<a href="https://colab.research.google.com/github/video-db/videodb-cookbook/blob/main/examples/What_was_on_the_screen_when_x_was_spoken.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction

This tutorial explores an advanced technique for retrieving visual information from video content based on spoken word queries. 

Specifically, we'll focus on a use case of finding information on slides in a video recording of a speech. This approach combines VideoDB's powerful scene indexing capabilities with spoken word search to create a robust, multimodal search pipeline.

## Setup 
---

### 📦  Installing packages 

In [None]:
!pip install videodb

### 🔑 API keys
Before proceeding, make sure you have access to [VideoDB](https://videodb.io). If you don't have an account yet, sign up for API access.

> Get your API key from the [VideoDB Console](https://console.videodb.io). (Free for the first 50 uploads, **no credit card required**) 🎉

In [1]:
import os

os.environ["VIDEO_DB_API_KEY"] = ""
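
Because the key above is read from the environment, an empty value tends to surface only later as a confusing error. A minimal sketch of an early check follows; `require_api_key` is a hypothetical helper written for this notebook and is not part of the VideoDB SDK.

```python
import os


def require_api_key(env=os.environ, name="VIDEO_DB_API_KEY"):
    """Return the API key stored in `env`, or raise if it is missing or blank.

    Hypothetical helper for this notebook; not part of the VideoDB SDK.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add your key before connecting.")
    return key
```

Calling `require_api_key()` just before connecting would fail fast with a readable message instead of an authentication error further down.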

## Tutorial Walkthrough

---

### 📋 Step 1: Connect to VideoDB

Establish a connection to VideoDB. With no arguments, `connect()` uses the API key from the `VIDEO_DB_API_KEY` environment variable set above.

In [2]:
from videodb import connect

# Connect to VideoDB using your API key
conn = connect()
coll = conn.get_collection()

### 🎬 Step 2: Upload the Video 

In [3]:
video = coll.upload(url="https://www.youtube.com/watch?v=IEe-5VOv0Js")

### 📸🗣️ Step 3: Index the Video on Different Modalities

#### 🗣️ Indexing Spoken Content
---

In [4]:
# Index spoken content

video.index_spoken_words()


#### 📸 Find the Right Configuration for Scene Indexing
---

To learn more about Scene Index, explore the following guides:

- The [Quickstart Guide](https://github.com/video-db/videodb-cookbook/blob/main/quickstart/Scene%20Index%20QuickStart.ipynb) provides a step-by-step introduction to Scene Index. It's ideal for getting started quickly and understanding the primary functions.

- The [Scene Extraction Options Guide](https://github.com/video-db/videodb-cookbook/blob/main/guides/scene-index/playground_scene_extraction.ipynb) delves deeper into the options available for scene extraction within Scene Index. It covers advanced settings, customization features, and tips for tuning scene extraction to different needs.





1. **Finding the Right Configuration for Scene Extraction**



In [None]:
import requests
from IPython.display import display
from PIL import Image


# Helper to preview each scene's time range and, optionally, its extracted frames
def display_scenes(scenes, images=True):
    for scene in scenes:
        print(f"{scene.id} : {scene.start}-{scene.end}")
        if images:
            for frame in scene.frames:
                # Download the frame image and render it inline in the notebook
                im = Image.open(requests.get(frame.url, stream=True).raw)
                display(im)
        print("----")


scene_collection_default = video.extract_scenes()
display_scenes(scene_collection_default.scenes)

**Adjusting the threshold to get the required frames**

In [None]:
from videodb import SceneExtractionType

scene_collection = video.extract_scenes(
    extraction_type=SceneExtractionType.shot_based,
    extraction_config={
        "threshold": 10,
    },
)
display_scenes(scene_collection.scenes)
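
Comparing two extraction configurations by scrolling through frames gets slow on a long talk. A rough sketch of a numeric comparison follows: it counts the scenes and averages their duration. It assumes only that each scene exposes numeric `start` and `end` attributes, as used in `display_scenes`; `summarize_scenes` is a hypothetical helper, and the sample data is a stand-in.

```python
from types import SimpleNamespace


def summarize_scenes(scenes):
    """Return (scene_count, mean_duration_seconds) for objects with .start/.end."""
    durations = [float(s.end) - float(s.start) for s in scenes]
    count = len(durations)
    mean = sum(durations) / count if count else 0.0
    return count, mean


# Stand-in data for illustration only; in the notebook you would pass
# scene_collection_default.scenes and scene_collection.scenes instead.
sample = [
    SimpleNamespace(start=0.0, end=4.0),
    SimpleNamespace(start=4.0, end=10.0),
    SimpleNamespace(start=10.0, end=12.0),
]
count, mean = summarize_scenes(sample)
print(f"{count} scenes, {mean:.1f}s average")  # prints: 3 scenes, 4.0s average
```

Running it on both collections gives a quick sense of whether a threshold change split the video into more, shorter scenes or fewer, longer ones.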

2. ✍️ **Finding the Right Prompt for Indexing**


We will test a prompt designed to focus on slide content against a few sample scenes.

In [None]:
for scene in scene_collection.scenes[20:23]:
    description = scene.describe(
        "Give the content written on the slides, output None if it isn't the slides."
    )
    print(f"{scene.id} : {scene.start}-{scene.end}")
    print(description)
    print("-----")
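
Since the prompt asks the model to answer `None` for non-slide scenes, one way to judge a prompt on a sample is to count how many descriptions carry real slide text. The sketch below does that; `is_slide_text` and `slide_hit_rate` are hypothetical helpers, and the sample strings are made up for illustration.

```python
def is_slide_text(description):
    """True when the description is real slide content rather than 'None'."""
    text = (description or "").strip()
    return bool(text) and text.lower() != "none"


def slide_hit_rate(descriptions):
    """Fraction of descriptions that contain slide content (0.0 if empty)."""
    if not descriptions:
        return 0.0
    return sum(is_slide_text(d) for d in descriptions) / len(descriptions)


# Made-up descriptions for illustration; real ones come from scene.describe(...).
samples = ["Slide title\n- first bullet", "None", " none ", "Agenda\n1. Intro"]
print(slide_hit_rate(samples))  # prints: 0.5
```

A very low rate on scenes that visibly show slides would suggest the prompt or the extraction threshold needs another pass.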

This output looks good enough, right?
Now that we have found the right configuration for our scene indexing, it's like we've found the perfect match, so let's commit to indexing those scenes ✨!

### 🎥 Index Scenes With The Finalized Config and Prompt
---

In [None]:
# Helper function to view the scene index
def display_scene_index(scene_index):
    for scene in scene_index:
        print(f"{scene['start']} - {scene['end']}")
        print(scene["description"])
        print("----")


scene_index_id = video.index_scenes(
    prompt="Give the content written on the slides, output None if it isn't the slides.",
    name="slides_index",
    extraction_type=SceneExtractionType.shot_based,
    extraction_config={
        "threshold": 10,
    },
)
print(scene_index_id)
scene_index = video.get_scene_index(scene_index_id)

In [None]:
display_scene_index(scene_index)

### 🔍 Step 4: Search Pipeline

---

The heart of this approach is the search pipeline, which combines spoken word search with scene indexing.

This pipeline does the following:

1. Performs a keyword search on the spoken word index.
2. Extracts time ranges from the search results.
3. Retrieves the scene index.
4. Filters scenes based on the time ranges from the spoken word search.
5. Returns the filtered scenes, which should contain relevant slide content.

In [16]:
def simple_filter_scenes(time_ranges, scene_dicts):
    """Return the scenes in scene_dicts that overlap any (start, end) in time_ranges."""

    def is_in_range(scene, range_start, range_end):
        scene_start = scene["start"]
        scene_end = scene["end"]
        return (
            (range_start <= scene_start <= range_end)
            or (range_start <= scene_end <= range_end)
            or (scene_start <= range_start and scene_end >= range_end)
        )

    filtered_scenes = []
    for start, end in time_ranges:
        filtered_scenes.extend(
            [scene for scene in scene_dicts if is_in_range(scene, start, end)]
        )

    # Remove duplicates while preserving order
    seen = set()
    return [
        scene
        for scene in filtered_scenes
        if not (tuple(scene.items()) in seen or seen.add(tuple(scene.items())))
    ]
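
The three conditions in `is_in_range` amount to the standard closed-interval overlap test: two intervals overlap when neither one ends before the other starts. The sketch below reproduces the three-condition check next to the shorter form and compares them by brute force over small integer intervals. `overlaps` and `three_way_check` are names introduced here for illustration.

```python
from itertools import product


def overlaps(scene_start, scene_end, range_start, range_end):
    """Closed-interval overlap: neither interval ends before the other starts."""
    return scene_start <= range_end and scene_end >= range_start


def three_way_check(scene_start, scene_end, range_start, range_end):
    """The three-condition test from simple_filter_scenes, reproduced here."""
    return (
        (range_start <= scene_start <= range_end)
        or (range_start <= scene_end <= range_end)
        or (scene_start <= range_start and scene_end >= range_end)
    )


# Compare both checks on every well-formed pair of small integer intervals.
points = range(6)
mismatches = [
    (s0, s1, r0, r1)
    for s0, s1, r0, r1 in product(points, repeat=4)
    if s0 <= s1 and r0 <= r1
    and overlaps(s0, s1, r0, r1) != three_way_check(s0, s1, r0, r1)
]
print(mismatches)  # prints: []
```

Both forms treat touching endpoints as an overlap, so a scene that ends exactly when a spoken-word match starts is still included in the results.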

In [19]:
from videodb import IndexType, SearchType


def search_pipeline(query, video):

    # Search Query in Spoken Word Index
    search_result = video.search(
        query=query, index_type=IndexType.spoken_word, search_type=SearchType.keyword
    )
    time_ranges = [(shot.start, shot.end) for shot in search_result.get_shots()]

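    # Reuse the scene index fetched earlier in the notebook (global `scene_index`)
    scenes = scene_index

    # The index returns start/end as strings, so cast them to float before comparing.
    # TODO: Check why it is string, server should send float only.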
    for scene in scenes:
        scene["start"] = float(scene["start"])
        scene["end"] = float(scene["end"])

    # Filter Scene on the basis of Spoken results
    final_result = simple_filter_scenes(time_ranges, scenes)

    # Return Scene descriptions and Video Timelines of result
    result_text = "\n\n".join(
        result_entry["description"]
        for result_entry in final_result
        if result_entry.get("description", "").lower().strip() != "none"
    )
    result_timeline = [
        (result_entry.get("start"), result_entry.get("end"))
        for result_entry in final_result
    ]

    return result_text, result_timeline
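
When several neighbouring scenes match, `result_timeline` contains back-to-back ranges. An optional tidy-up step is to merge ranges that overlap or touch before building a stream. The sketch below shows one way to do that; `merge_ranges` is a hypothetical helper, and whether `generate_stream` benefits from merged input is an assumption rather than documented behaviour.

```python
def merge_ranges(ranges, gap=0.0):
    """Merge (start, end) pairs that overlap or lie within `gap` seconds."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + gap:
            # Extend the previous range instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Made-up ranges for illustration; in the notebook, pass result_timeline.
timeline = [(30.0, 42.5), (10.0, 18.0), (18.0, 25.0), (40.0, 50.0)]
print(merge_ranges(timeline))  # prints: [(10.0, 25.0), (30.0, 50.0)]
```

Note that the helper sorts by start time, so the merged timeline is chronological even if the search results were ranked by relevance.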

### 👀 Viewing the Search Results
---

This will return scenes where the spoken words match your query, along with the content of any slides visible in those scenes.

In [None]:
from videodb import play_stream

query = "hard and fast rule"

result_text, result_timeline = search_pipeline(query, video)

stream_link = video.generate_stream(result_timeline)
play_stream(stream_link)

print(result_text)

In [None]:
query = "stripe api review"

result_text, result_timeline = search_pipeline(query, video)

stream_link = video.generate_stream(result_timeline)
play_stream(stream_link)

print(result_text)

In [None]:
query = "friction log"

result_text, result_timeline = search_pipeline(query, video)

stream_link = video.generate_stream(result_timeline)
play_stream(stream_link)

print(result_text)
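
Alongside the stream and the slide text, it can help to print where in the talk each match sits. A small sketch that formats `(start, end)` pairs in seconds as readable timestamps follows; `format_timestamp` and `describe_timeline` are hypothetical helpers, and the sample ranges are made up.

```python
def format_timestamp(seconds):
    """Format a number of seconds as M:SS (or H:MM:SS past one hour)."""
    total = int(round(float(seconds)))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def describe_timeline(timeline):
    """Return one 'start - end' line per (start, end) pair."""
    return [f"{format_timestamp(s)} - {format_timestamp(e)}" for s, e in timeline]


# Made-up ranges for illustration; in the notebook, pass result_timeline.
for line in describe_timeline([(75.0, 132.4), (3605, 3671)]):
    print(line)
```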

## Further Steps
---

In this tutorial, we combined a spoken word index with a scene index to retrieve the slide content that was on screen when a given phrase was spoken.

To learn more about Scene Index, explore the following guides:

- [Quickstart Guide](https://github.com/video-db/videodb-cookbook/blob/main/quickstart/Scene%20Index%20QuickStart.ipynb) 
- [Scene Extraction Options](https://github.com/video-db/videodb-cookbook/blob/main/guides/scene-index/playground_scene_extraction.ipynb)
- [Advanced Visual Search](https://github.com/video-db/videodb-cookbook/blob/main/guides/scene-index/advanced_visual_search.ipynb)
- [Custom Annotation Pipelines](https://github.com/video-db/videodb-cookbook/blob/main/guides/scene-index/custom_annotations.ipynb)


If you have any questions or feedback, feel free to reach out to us 🙌🏼

* [Discord](https://discord.gg/py9P639jGz)
* [GitHub](https://github.com/video-db)
* [VideoDB](https://videodb.io)
* [Email](mailto:ashu@videodb.io)