# ✂️🎬️ Scene Index: Scene Extraction

<a href="https://colab.research.google.com/github/video-db/videodb-cookbook/blob/main/guides/video/scene-index/playground_scene_extraction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

> This guide assumes you are already familiar with the concept of Scene Indexing. If you are not, please refer to our [Scene Index: QuickStart](https://github.com/video-db/videodb-cookbook/blob/main/guides/video/scene-index/quickstart.ipynb) Guide to get up to speed.  

The number of scenes needed to describe a video varies with the type of video. For instance, a podcast with two hosts tends to be far less dynamic than a sports video, so it can usually be described with fewer scenes.

If you want to extract scenes from the video without indexing them, you can use the `Video.extract_scenes()` function. 

## Setup
---

### 📦  Installing packages   

In [None]:
!pip install videodb

### 🔑 API Keys

In [4]:
import os

os.environ["VIDEO_DB_API_KEY"] = ""

### 🌐 Connect to VideoDB

In [5]:
from videodb import connect

conn = connect()
coll = conn.get_collection()

### 🎥  Upload Video

In [6]:
video = coll.upload(url="https://www.youtube.com/watch?v=LejnTJL173Y")

## ✂️🎬 Extracting Scenes without Indexing
---

`Video.extract_scenes()` extracts scenes and their frames while skipping the indexing step. This function is useful if you want to experiment with the configuration of your scene extraction pipeline.


### ⚙️ Extract Scenes Parameters 

- `extraction_type`  - Choose a scene extraction algorithm.
- `extraction_config`  - Configuration settings for the chosen scene extraction algorithm.


Depending on your Scene Extraction pipeline, you will need to provide `extraction_config` according to the specific requirements of that pipeline.

Let’s delve into the details of each pipeline and the respective configurations needed for them.


### **⚙️ Time-Based Extraction**

First, set `extraction_type` to `SceneExtractionType.time_based`.    
Then, to configure it, pass a Python dict to the `extraction_config` argument with the following keys.

* `time`: The interval (in seconds) at which the video is segmented into scenes. The default value is `10`, so every 10 seconds forms one scene.
* `select_frames`: A list of frames to select from each segment. The list can contain any of the strings `"first"`, `"middle"`, or `"last"`, each of which selects the respective frame. The default value is `["first"]`.

>Note: This algorithm may not perform well with static videos, because it cuts at fixed intervals regardless of content. More advanced methods can segment a video into fewer, more meaningful scenes and frames. One such method is based on shot detection 👇
<br>



In [None]:
from videodb import SceneExtractionType

scene_collection_time = video.extract_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 30, "select_frames": ["first", "middle", "last"]},
)
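
To build intuition for what `time` and `select_frames` control, here is a minimal sketch of fixed-interval segmentation in plain Python. It is not VideoDB's implementation: the helper `time_based_segments` and its return shape are hypothetical, and it treats the segment end as the timestamp of the `"last"` frame, which is only an approximation.

```python
from typing import Dict, List, Sequence


def time_based_segments(
    duration: float, time: float = 10, select_frames: Sequence[str] = ("first",)
) -> List[Dict]:
    """Split [0, duration) into fixed-length segments and pick frame timestamps.

    Hypothetical illustration only; the real service may pick frames differently.
    """
    if time <= 0:
        raise ValueError("time must be positive")
    segments = []
    start = 0.0
    while start < duration:
        # The final segment is shorter when duration is not a multiple of `time`.
        end = min(start + time, duration)
        picks = {"first": start, "middle": (start + end) / 2, "last": end}
        segments.append(
            {
                "start": start,
                "end": end,
                "frame_times": [picks[name] for name in select_frames],
            }
        )
        start = end
    return segments


# A 95-second video cut every 30 seconds, keeping three frames per segment.
segments = time_based_segments(95, time=30, select_frames=["first", "middle", "last"])
for seg in segments:
    print(seg)
```

Under these assumptions, a larger `time` yields fewer scenes, and each extra entry in `select_frames` adds one frame per scene.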

### **⚙️ Shot-Based Extraction**  

Videos share context between timestamps. A scene is a logical segment of a video that completes a concept. There are many ways to describe a scene. One way is to identify scene changes based on visual content within the video. Key factors are: <u>significant changes in the visual content</u>, such as **transitions, lighting changes, and movement**.

First, set `extraction_type` to `SceneExtractionType.shot_based`.    
To configure it, pass a Python dict to the `extraction_config` argument with the following keys.

* `threshold`: Determines the sensitivity of the model to scene changes within the video. The default value is `20`, which is known to work well for detecting camera shot changes.
* `frame_count`: The number of frames to pick from each shot. The default value is `1`. Increasing this number selects more frames from each shot, which can provide a more detailed analysis of the scene.  
<br>

In [None]:

from videodb import SceneExtractionType

scene_collection_shot = video.extract_scenes(
    extraction_type=SceneExtractionType.shot_based,
    extraction_config={"threshold": 15, "frame_count": 5},
)
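
The idea behind shot-based extraction can be sketched on synthetic data. The example below is an illustration, not VideoDB's algorithm: it assumes each frame has a hypothetical "content change" score relative to the previous frame, and that a cut happens whenever the score exceeds `threshold` (so a lower threshold is more sensitive). The helpers `detect_shots` and `pick_frames` are made up for this sketch.

```python
from typing import List, Sequence, Tuple


def detect_shots(frame_scores: Sequence[float], threshold: float = 20) -> List[Tuple[int, int]]:
    """Return shots as (start_index, end_index_exclusive) pairs.

    frame_scores[i] is a hypothetical change score between frame i-1 and
    frame i; frame_scores[0] is ignored because it has no predecessor.
    """
    if not frame_scores:
        return []
    shots = []
    shot_start = 0
    for i in range(1, len(frame_scores)):
        if frame_scores[i] > threshold:
            # A large change starts a new shot at frame i.
            shots.append((shot_start, i))
            shot_start = i
    shots.append((shot_start, len(frame_scores)))
    return shots


def pick_frames(shot: Tuple[int, int], frame_count: int = 1) -> List[int]:
    """Pick up to frame_count evenly spaced frame indices from a shot."""
    start, end = shot
    length = end - start
    count = min(frame_count, length)
    return [start + (k * length) // count for k in range(count)]


scores = [0, 2, 3, 18, 1, 2, 40, 3, 2, 1]
print(detect_shots(scores, threshold=20))
print(detect_shots(scores, threshold=15))
print(pick_frames((0, 6), frame_count=3))
```

In this toy model, lowering `threshold` from 20 to 15 turns the moderate change at frame 3 into an extra cut, which mirrors why a lower value tends to produce more, shorter scenes.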

## Viewing, Inspecting, and Deleting Your SceneCollections
---

For every scene extraction pipeline that you run on a video, a `SceneCollection` object is created.

You can use the following functions to view, inspect, and delete your `SceneCollection`s.

**Viewing all `SceneCollection`s for a Video**:

In [None]:
scene_collections = video.list_scene_collection()
for scene_collection in scene_collections:
    print("Scene Collection Id:", scene_collection["scene_collection_id"])

**Get `SceneCollection` by ID**:

In [None]:
first_coll_id = scene_collections[0]["scene_collection_id"]
scene_collection = video.get_scene_collection(first_coll_id)

print("This is Scene Collection", scene_collection)

**Inspecting `SceneCollection`**:

In [None]:
print("This is scene collection id", scene_collection_shot.id)
print("This is scene collection config", scene_collection_shot.config)
scenes = scene_collection_shot.scenes
for scene in scenes:
    print(f"Scene Duration {scene.start}-{scene.end}")
    for frame in scene.frames:
        print(f"- Frame at {frame.frame_time} {frame.url}")
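
When comparing extraction configurations, a few summary numbers are often easier to read than a full listing. The sketch below assumes only what the loop above already uses: each scene has `start`, `end`, and `frames`. The function `summarize_scenes` and the `StubScene`/`StubFrame` classes are hypothetical; the stubs exist only so the example runs without an API key.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class StubFrame:
    """Stand-in for a real frame (illustration only)."""
    frame_time: float


@dataclass
class StubScene:
    """Stand-in for a real scene (illustration only)."""
    start: float
    end: float
    frames: List[StubFrame] = field(default_factory=list)


def summarize_scenes(scenes: Iterable) -> dict:
    """Summarize any objects exposing .start, .end, and .frames."""
    scenes = list(scenes)
    durations = [scene.end - scene.start for scene in scenes]
    return {
        "scene_count": len(scenes),
        "frame_count": sum(len(scene.frames) for scene in scenes),
        "mean_duration": sum(durations) / len(durations) if durations else 0.0,
        "longest": max(durations, default=0.0),
    }


demo = [
    StubScene(0, 30, [StubFrame(0), StubFrame(15)]),
    StubScene(30, 40, [StubFrame(30)]),
]
summary = summarize_scenes(demo)
print(summary)
```

With real data, you could call `summarize_scenes(scene_collection_time.scenes)` and `summarize_scenes(scene_collection_shot.scenes)` and compare the two results side by side.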

**Delete a `SceneCollection`**:

In [None]:
video.delete_scene_collection(scene_collection_shot.id)

## ✍️ Playground: Play with Prompt
---

Before finalizing your prompt, experiment with a few different ones to see how search performs for your use case. Start by describing only a few scenes, refine the prompt, and then test it again after indexing.

The right prompt goes a long way toward surfacing information that matches your domain knowledge and experience. To support this, we provide the following describe functions at the frame and scene level.

### `Frame.describe()`

In [None]:
frame_prompt = """
You will be provided with an image. Your task is to identify and describe the objects in the image.
1.	Identify Objects: List distinct objects in the image.
2.	Describe Objects: Provide a brief description of each object, including shape, color, and any notable features.

Output should be a list of objects.
Expected Output:
[{"name": "book", "context": "a person wearing a white shirt is holding a book"}]
"""

# Fetch the first frame of the first scene
frame = scene_collection_time.scenes[0].frames[0]

# Describe the frame
frame.describe(prompt=frame_prompt)

print(frame)
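
Because the prompt above asks for a JSON list, it can be worth checking whether the model actually returned valid JSON. The helper below is a minimal sketch that works on any string; `parse_description` is a hypothetical name, and reading the text from something like `frame.description` is an assumption about where the SDK stores the output.

```python
import json
from typing import Any, Optional

# Built at runtime so this example does not contain a literal code fence.
FENCE = "`" * 3


def parse_description(text: Optional[str]) -> Optional[Any]:
    """Try to parse model output as JSON; return None if it is not valid JSON."""
    if not text:
        return None
    cleaned = text.strip()
    # Models sometimes wrap JSON in a Markdown code fence; strip it if present.
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        cleaned = "\n".join(lines)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None


sample = '[{"name": "book", "context": "a person is holding a book"}]'
objects = parse_description(sample)
print(objects[0]["name"])
print(parse_description("The image shows a book."))
```

If the helper returns `None` for many frames, that is a hint that the prompt needs a stricter instruction about the output format.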

### `Scene.describe()`

In [None]:
scene_prompt = """
You will be provided with a series of images. Your task is to view all images together and describe the overall story or scene in the best possible way.

Expected Output:
- A detailed story or scene description.
- A list of objects and actions in each image.

Example Output:
{
  "scene_story": "A person is cooking in the kitchen and then someone rings the doorbell.",
  "images": [
    {"description": "Someone is cooking in the kitchen."},
    {"description": "Someone rings the doorbell."}
  ]
}
"""


# Fetch the first scene
scene = scene_collection_time.scenes[0]

# Describe the scene
scene.describe(prompt=scene_prompt)

print(scene)
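
To try a prompt on just a handful of scenes before describing a whole collection, a small helper like the one below may be convenient. It is a sketch under assumptions: `describe_sample` is a hypothetical helper, it assumes `describe()` stores its output on a `description` attribute, and `FakeScene` is a stand-in so the example runs without an API key.

```python
from itertools import islice
from typing import Iterable, List, Optional, Tuple


def describe_sample(
    scenes: Iterable, prompt: str, limit: int = 3
) -> List[Tuple[float, float, Optional[str]]]:
    """Describe only the first `limit` scenes and collect (start, end, description)."""
    results = []
    for scene in islice(scenes, limit):
        scene.describe(prompt=prompt)
        # Assumption: describe() stores its output on a `description` attribute.
        results.append((scene.start, scene.end, getattr(scene, "description", None)))
    return results


class FakeScene:
    """Stand-in for a real scene (illustration only)."""

    def __init__(self, start: float, end: float):
        self.start, self.end, self.description = start, end, None

    def describe(self, prompt: str) -> None:
        self.description = f"described {self.start}-{self.end}"


demo_scenes = [FakeScene(i * 10, (i + 1) * 10) for i in range(10)]
sample = describe_sample(demo_scenes, prompt="Describe the scene.", limit=3)
for start, end, description in sample:
    print(start, end, description)
```

With real data, the call would look like `describe_sample(scene_collection_time.scenes, scene_prompt, limit=3)`, which keeps each prompt iteration quick.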


### 🗂️ Describe All Scenes & Index
---

Once you are confident about your prompt, you can use `Frame.describe()` or `Scene.describe()` on the whole `SceneCollection` and index it.

In [None]:
# Get the scenes from the collection
scenes = scene_collection.scenes

# Describe every scene and each of its frames
for scene in scenes:
    scene.describe(prompt=scene_prompt)
    for frame in scene.frames:
        frame.describe(prompt=frame_prompt)

# Index the described scenes
index_id = video.index_scenes(scenes=scenes, name="My Custom Annotations#1")

custom_scene_index = video.get_scene_index(index_id)
print(custom_scene_index)

### 🔍 Search 
---

> Note: It might take an additional 5-10 seconds for your index to become available for search.

In [None]:

from videodb import IndexType

# search using the index_id
res = video.search(query="guns", index_type=IndexType.scene, index_id=index_id)
res.play()
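
Since a fresh index can take a few seconds to become searchable, one option is to retry the search a few times. The helper below is a generic sketch, not part of the VideoDB SDK: `retry` is a hypothetical name, and it assumes that searching an index that is not ready raises an exception (if the call instead returns empty results, the check would need to change).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action until it succeeds or attempts run out; re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay)


# Demo with a fake search that fails twice before succeeding.
calls = {"count": 0}


def flaky_search() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("index not ready yet")
    return "results"


print(retry(flaky_search, attempts=5, delay=0))
```

Applied to the search above, this could look like `res = retry(lambda: video.search(query="guns", index_type=IndexType.scene, index_id=index_id), delay=5)`.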

## 🧑‍💻 Next Steps
---


Check out the other resources and tutorials that use Scene Indexing:
* If you want to bring your own scene descriptions and annotations, explore the [Custom Annotations Pipeline](https://github.com/video-db/videodb-cookbook/blob/main/guides/video/scene-index/custom_annotations.ipynb)
* Check out our open and flexible [Advanced Visual Search Pipelines](https://github.com/video-db/videodb-cookbook/blob/main/guides/video/scene-index/advanced_visual_search.ipynb)


If you have any questions or feedback, feel free to reach out to us 🙌🏼

* [Discord](https://colab.research.google.com/corgiredirector?site=https%3A%2F%2Fdiscord.gg%2Fpy9P639jGz)
* [GitHub](https://github.com/video-db)
* [VideoDB](https://colab.research.google.com/corgiredirector?site=https%3A%2F%2Fvideodb.io)
* [Email](mailto:ashu@videodb.io)