# VideoDB Retriever

### RAG: Instantly Search and Stream Video Results 📺


> [VideoDB](https://videodb.io) is a serverless database designed to streamline the storage, search, editing, and streaming of video content. VideoDB offers random access to sequential video data by building indexes and developing interfaces for querying and browsing video content. Learn more at [docs.videodb.io](https://docs.videodb.io).

Constructing a RAG pipeline for text is relatively straightforward, thanks to the tools developed for parsing, indexing, and retrieving text data. However, adapting RAG models for video content presents a greater challenge. Videos combine visual, auditory, and textual elements, requiring more processing power and sophisticated video pipelines.

While Large Language Models (LLMs) excel with text, they fall short in helping you consume or create video clips. VideoDB provides a sophisticated database abstraction for your MP4 files, enabling the use of LLMs on your video data. With VideoDB, you can not only analyze but also `instantly watch video streams` of your search results.

In this notebook, we introduce `VideoDBRetriever`, a tool designed to simplify the creation of RAG pipelines for video content without the hassle of managing complex video infrastructure.

&nbsp;
## 🛠️️ Setup connection

###  Requirements

To connect to VideoDB, get an API key and create a connection. The key is supplied by setting the `VIDEO_DB_API_KEY` environment variable. You can get one from 👉🏼 [VideoDB Console](https://console.videodb.io) (free for the first 50 uploads, **no credit card required!**).

Get your `OPENAI_API_KEY` from the OpenAI platform; the `llama_index` response synthesizer uses it.


In [1]:
import os
os.environ["OPENAI_API_KEY"] = ""
os.environ['VIDEO_DB_API_KEY'] = ""
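
Because both keys above are left blank, a quick check before connecting can catch a missing key early. The following is a minimal sketch; the helper name `missing_keys` is hypothetical and is not part of the VideoDB or LlamaIndex SDKs.

```python
import os

# The two variables this notebook relies on.
REQUIRED_KEYS = ("OPENAI_API_KEY", "VIDEO_DB_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or blank.

    Hypothetical helper for illustration; pass a dict as `env` to check
    something other than the real process environment.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_keys()
if missing:
    print("Set these environment variables before continuing:", ", ".join(missing))
```

Running the check right after the cell above makes it obvious when a key was left empty, instead of failing later during upload or querying.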

### Installing Dependencies

To get started, we'll need to install the following packages:

- `llama-index`
- `llama-index-retrievers-videodb`
- `videodb`

In [1]:
%pip install llama-index 
%pip install videodb


In [3]:
%pip install llama-index-retrievers-videodb


### Data Ingestion

Let's upload a few video files first. You can use any `public URL`, `YouTube link`, or `local file` on your system. The first 50 uploads are free!

In [2]:
from videodb import connect

# connect to VideoDB
conn = connect()

# upload videos to default collection in VideoDB 
print("uploading first video")
video1 = conn.upload(url="https://www.youtube.com/watch?v=lsODSDmY4CY")
print("uploading second video")
video2 = conn.upload(url="https://www.youtube.com/watch?v=vZ4kOr38JhY")

> * `coll = conn.get_collection()`: Returns the default collection object.
> * `coll.get_videos()`: Returns a list of all the videos in a collection.
> * `coll.get_video(video_id)`: Returns the Video object for the given `video_id`.

### Indexing

To search for moments inside a video, you have to index the video first. Two types of indexing are available for a video:


- `index_spoken_words`: Indexes spoken words in the video.
- `index_scenes`: Indexes the visuals of the video. Note: this feature is currently available only to beta users; join our [Discord](https://discord.gg/py9P639jGz) for early access.

In [3]:
print("Indexing the videos...")
video1.index_spoken_words()
video2.index_spoken_words()

Indexing the videos...


100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:39<00:00,  2.56it/s]                                                
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:39<00:00,  2.51it/s]                                                


### Querying

Now that the videos are indexed, we can use `VideoDBRetriever` to fetch relevant nodes from VideoDB.

In [4]:
from llama_index.retrievers.videodb import VideoDBRetriever
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine

In [5]:
# VideoDBRetriever by default uses the default collection in the VideoDB 
retriever = VideoDBRetriever()

# use your llama_index response_synthesizer on search results.
response_synthesizer = get_response_synthesizer()

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

In [6]:
# query across all uploaded videos to get the text answer.
response = query_engine.query("What is Dopamine?")
print(response)

Dopamine is a neurotransmitter that plays a key role in various brain functions, including motivation, reward, and pleasure. It is involved in regulating mood, movement, and cognitive function.


In [7]:
response = query_engine.query("What's the benefit of morning sunlight?")
print(response)

Morning sunlight can help trigger a cortisol pulse shift, allowing individuals to capture a morning work block by waking up early and exposing themselves to sunlight. This exposure to morning sunlight, along with brief high-intensity exercise, can assist in adjusting the cortisol levels and potentially enhancing productivity during the early hours of the day.
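
The answers above are synthesized from specific video segments, so it can help to show where each answer came from. The sketch below formats a node's metadata as a short citation. It assumes the metadata keys `title`, `start`, and `end` that appear in the node output later in this notebook; the helper names `to_timestamp` and `describe_source` are hypothetical.

```python
def to_timestamp(seconds):
    """Format a number of seconds as MM:SS, or H:MM:SS past one hour."""
    total = int(seconds)  # drop fractional seconds
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def describe_source(metadata):
    """Build a one-line citation from a node's metadata dict."""
    start = to_timestamp(metadata["start"])
    end = to_timestamp(metadata["end"])
    return f"{metadata['title']} [{start} - {end}]"


# Usage with the query engine response (left commented so the sketch runs offline):
# for source in response.source_nodes:
#     print(describe_source(source.node.metadata))
```

`response.source_nodes` is the list of scored nodes that LlamaIndex attaches to a query response, so the loop prints one citation per segment used for the answer.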


&nbsp;
## Watch Video Stream of Search Result

Although the `Nodes` returned by the retriever are of type `TextNode`, they also carry metadata that lets you `watch the video stream` of the results. You can create a compilation of all the nodes using VideoDB's [Programmable video streams](https://docs.videodb.io/version-0-0-3-timeline-and-assets-44). You can also easily add audio and image overlays to it.

![Timeline](https://codaio.imgix.net/docs/_s5lUnUCIU/blobs/bl-n4vT_dFztl/e664f43dbd4da89c3a3bfc92e3224c8a188eb19d2d458bebe049e780f72506ca6b19421c7168205f7ad307187e73da60c73cdbb9a0ef3fec77cc711927ad26a29a92cd13691fa9375c231f1c006853bacf28e09b3bf0bbcb5f7b76462b354a180fb437ad?auto=format%2Ccompress&fit=max "Programmable Video Streams")



In [11]:
from videodb import connect, play_stream 
from videodb.timeline import Timeline
from videodb.asset import VideoAsset

In [8]:
# create video stream of search results
conn = connect()
timeline = Timeline(conn)

relevant_nodes = retriever.retrieve("What's the benefit of morning sunlight?")

for node_obj in relevant_nodes:
    node = node_obj.node
    # create a video asset for each node
    node_asset = VideoAsset(asset_id=node.metadata["video_id"], start=node.metadata["start"], end=node.metadata["end"])
    # add the asset to timeline
    timeline.add_inline(node_asset)

# generate stream for the compiled timeline
stream_url = timeline.generate_stream()
play_stream(stream_url)

'https://console.videodb.io/player?url=https://stream.videodb.io/v3/published/manifests/9c39c8a9-62a2-4b5e-b15d-8565cc58c8ae.m3u8'
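
Retrieved nodes can overlap or come from the same stretch of a video, which would repeat footage in the compiled stream. One possible way to handle this, sketched below under that assumption, is to merge overlapping segments before building the timeline. The function `merge_segments` is a hypothetical helper, and the example metadata is made up; only the keys `video_id`, `start`, and `end` come from the node metadata used above.

```python
def merge_segments(segments):
    """Merge overlapping (video_id, start, end) segments per video.

    Returns a list sorted by video_id, then by start time.
    """
    merged = []
    for video_id, start, end in sorted(segments):
        if merged and merged[-1][0] == video_id and start <= merged[-1][2]:
            # Overlaps or touches the previous segment of the same video: extend it.
            merged[-1] = (video_id, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((video_id, start, end))
    return merged


# Made-up metadata dicts shaped like the ones attached to retrieved nodes.
example = [
    {"video_id": "m-1", "start": 10.0, "end": 20.0},
    {"video_id": "m-1", "start": 15.0, "end": 30.0},
    {"video_id": "m-2", "start": 5.0, "end": 8.0},
]
segments = [(m["video_id"], m["start"], m["end"]) for m in example]
print(merge_segments(segments))
```

Each merged tuple can then be turned into a `VideoAsset` exactly as in the loop above. Note that sorting trades the retriever's relevance order for chronological order within each video.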

&nbsp;
### Configuring `VideoDBRetriever`

**1. Retriever for only one Video**:
Pass the `id` of a video object to restrict the search to that video.
```python
# video1 is the Video object returned by conn.upload() earlier in this notebook
VideoDBRetriever(video=video1.id)
```

**2. Retriever for different types of indexes**:
```python
# VideoDBRetriever that uses keyword search - matches exact occurrences of words and sentences. It supports only a single video.
keyword_retriever = VideoDBRetriever(search_type="keyword", video=video1.id)

# VideoDBRetriever that uses semantic search - well suited to question-answering queries.
semantic_retriever = VideoDBRetriever(search_type="semantic")

# [only for beta users of VideoDB] VideoDBRetriever that uses scene search - Search visual information in the videos.
visual_retriever = VideoDBRetriever(search_type="scene")
```

**3. Configure threshold parameters**:  
- `result_threshold`: the threshold for the number of results returned by the retriever; the default value is `5`
- `score_threshold`: only nodes with a score higher than `score_threshold` are returned by the retriever; the default value is `0.2`  

```python
custom_retriever = VideoDBRetriever(result_threshold=2, score_threshold=0.5)
```
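
To make the two parameters concrete, here is a conceptual sketch in plain Python of what they describe: a score cutoff plus a cap on the number of results. It is only an illustration of the description above, not VideoDB's implementation; the function `apply_thresholds` and the sample scores are hypothetical.

```python
def apply_thresholds(scored_results, result_threshold=5, score_threshold=0.2):
    """Keep results scoring above score_threshold, best first, capped at result_threshold."""
    kept = [(text, score) for text, score in scored_results if score > score_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:result_threshold]


# Made-up (text, score) pairs standing in for search hits.
hypothetical_hits = [("a", 0.9), ("b", 0.55), ("c", 0.3), ("d", 0.1)]

# Defaults: "d" falls below the 0.2 score cutoff.
print(apply_thresholds(hypothetical_hits))

# Stricter settings, matching the custom_retriever example above.
print(apply_thresholds(hypothetical_hits, result_threshold=2, score_threshold=0.5))
```

Raising `score_threshold` favors precision, while raising `result_threshold` gives the response synthesizer more context to work with.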

### View Specific Node

To watch the stream of a single retrieved node, you can generate a stream for just that segment directly from the VideoDB `video` object.


In [10]:
relevant_nodes

[NodeWithScore(node=TextNode(id_='6ca84002-49df-4091-901d-48248dbe0977', embedding=None, metadata={'collection_id': 'c-33978c87-33e6-4259-9e27-a9edc79be9ad', 'video_id': 'm-f201ff7c-88ec-47ca-938b-a4e968676ba0', 'length': '1496.711837', 'title': 'AMA #1: Leveraging Ultradian Cycles, How to Protect Your Brain, Seed Oils Examined and More', 'start': 906.01, 'end': 974.59}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text=" So for somebody that wants to learn an immense amount of material, or who has the opportunity to capture another Altradian cycle, the other time where that tends to occur is also early days. So some people, by waking up early and using stimulants like caffeine and hydration or some brief high intensity city exercise, can trigger that cortisol pulse to shift a little bit earlier so that they can capture a morning work block that occurs somewhere, let's say between six and 07:30 a.m. So let's think about our typical person, at least 

In [9]:
from videodb import connect, play_stream

# retriever = VideoDBRetriever()
# relevant_nodes = retriever.retrieve("What is Dopamine?")

video_node = relevant_nodes[0].node
conn = connect()
coll = conn.get_collection()

video = coll.get_video(video_node.metadata["video_id"])
start = video_node.metadata["start"]
end = video_node.metadata["end"]

stream_url = video.generate_stream(timeline=[(start, end)])
play_stream(stream_url)

'https://console.videodb.io/player?url=https://stream.videodb.io/v3/published/manifests/b7201145-7302-4ec5-b87c-d1a4c6592f69.m3u8'

## 🧹 Cleanup

In [None]:
video1.delete()
video2.delete()

## 👨‍👩‍👧‍👦 Support & Community

Leveraging the capabilities of automation and AI-driven content understanding, the possibilities for creation and repurposing of your content are boundless with VideoDB.

If you have any questions or feedback, feel free to reach out to us 🙌🏼

- [Discord](https://discord.gg/py9P639jGz)  
- [GitHub](https://github.com/video-db)  
- [VideoDB](https://videodb.io)  
- [Email](mailto:contact@videodb.io)  