In this blog post, we will look at how to build an AI chatbot that lets you chat with YouTube videos.

The main problem I wanted to solve: sometimes I have to watch a long video just to find the one piece of information I need. So I thought, why not build a chatbot that answers my questions about the video? This saves time and gets me the information I need quickly.

Check out the video version of this blog post:

[![AI Workshop - Chat with videos](https://img.youtube.com/vi/ISCLsXS9Sns/0.jpg)](https://www.youtube.com/watch?v=ISCLsXS9Sns "AI Workshop - Chat with videos")

Let's install the `pytubefix` library, which we will use to download the audio from the video.


```python
!pip install pytubefix
```



I have chosen a video that explains pointers in the C language, and I will use it to build the chatbot.

You can watch the video here:

[![#23 C Pointers | C Programming For Beginners](https://img.youtube.com/vi/KGhacRRMnDw/0.jpg)](https://www.youtube.com/watch?v=KGhacRRMnDw "#23 C Pointers | C Programming For Beginners")

Let's download the audio from the video.

```python
from pytubefix import YouTube

yt = YouTube("https://www.youtube.com/watch?v=KGhacRRMnDw")

# Select the audio-only stream and save it to the current directory
ys = yt.streams.get_audio_only()
ys.download()
```

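`ys.download()` returns the path of the saved file, `/Users/rajesh/Documents/Work/personal/ai/notebooks/video-chat/23 C Pointers  C Programming For Beginners.mp4`. Note that the `#` and `|` characters from the video title are dropped from the filename.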

Next, let's install the `openai` library. We will first use it to transcribe the audio to text, and later use an LLM to generate coherent answers from that text.

```python
!pip install openai
```


Next, I set the OpenAI API key inside the Jupyter notebook. If you are not using a Jupyter notebook, you can set the key in the `OPENAI_API_KEY` environment variable instead.

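```python
import getpass
import os

openai_key = getpass.getpass("Enter your OpenAI key: ")

# `!export` would only set the variable in a throwaway subshell,
# so set it on the notebook's own process instead
os.environ["OPENAI_API_KEY"] = openai_key
```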
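As an alternative to typing the key on every run, the key can be read from the environment and the prompt used only as a fallback. The OpenAI Python SDK also reads `OPENAI_API_KEY` by itself when no `api_key` argument is passed. A minimal sketch; `load_openai_key` is a hypothetical helper name, not part of any library:

```python
import getpass
import os


def load_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, prompting only if it is unset.

    Hypothetical helper for this post, not part of the openai package.
    """
    key = os.environ.get(env_var)
    if not key:
        # Prompt without echoing the key, then cache it for this process
        key = getpass.getpass("Enter your OpenAI key: ")
        os.environ[env_var] = key
    return key
```

With this in place, `openai_key = load_openai_key()` replaces the `getpass` call, and the prompt only appears when the variable is missing.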

Now, let's transcribe the audio to text.

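```python
from openai import OpenAI

openai_client = OpenAI(api_key=openai_key)

# Upload the downloaded audio to the Whisper transcription endpoint;
# the `with` block closes the file once the request is done
with open("./23 C Pointers  C Programming For Beginners.mp4", "rb") as audio_file:
    transcription = openai_client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

# Keep only the transcript text
transcription = transcription.text
```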

The first part of the resulting transcript (truncated here) reads:

> What's up guys? Welcome back to this series on C programming. In this video, we'll learn about pointers in C. More specifically, we'll learn to work directly with computer memory address with the help of pointers. So let's get started. Pointer is one of the main features that make C programming so powerful. It allows us to work directly with the computer memory. Before we learn about pointers, let's first learn about addresses. In C programming, whenever we declare a variable, a space will be allocated in the memory for the variable and C allows us to access the address of the variable. We use the ampersand symbol with the variable name to access the memory address. Let's see an example. You might be familiar with the basic structure of C program. Now I'll create an integer variable, age, so int age, and I'll assign value 25 to this. Now let's use ampersand symbol to access the address where the age variable is stored. So printf %p, ampersand age. Here I have used %p format specifier t…
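The transcription endpoint limits how large an uploaded file can be: OpenAI documents a 25 MB limit per audio file. This short video fits, but a longer one may not, so it can be worth checking before uploading. A minimal sketch; `fits_whisper_limit` is a hypothetical helper, and treating 25 MB as 25 × 1024 × 1024 bytes is an assumption (staying comfortably below the limit is safer):

```python
import os

# Assumed byte value of the documented 25 MB upload limit
WHISPER_MAX_BYTES = 25 * 1024 * 1024


def fits_whisper_limit(path: str, max_bytes: int = WHISPER_MAX_BYTES) -> bool:
    """Return True if the file at `path` is small enough to upload in one request."""
    return os.path.getsize(path) <= max_bytes
```

If it returns `False`, the audio has to be split into smaller pieces (for example with a tool such as ffmpeg), each piece transcribed separately, and the texts joined afterwards.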

Now that we have the transcript, we need to split it into smaller chunks. Most LLMs can only take a limited amount of text in a single prompt (their context window), so instead of sending the whole transcript we will store the chunks and pass only the ones relevant to a question to the LLM. We will use the `nltk` library to split the text into sentences.

```python
!pip install nltk
```


```python
import nltk
from nltk.tokenize.punkt import PunktTokenizer

# NLTK 3.9+ ships the Punkt sentence model as the "punkt_tab" resource
nltk.download("punkt_tab")
tokenizer = PunktTokenizer("english")
```



Using the tokenizer, we split the transcript into a list of sentences.

```python
lines = tokenizer.tokenize(transcription)
```

Let's group the sentences into chunks of five sentences each.

```python
# Group the sentences into chunks of 5 sentences each
grouped_lines = []
for i in range(0, len(lines), 5):
    grouped_lines.append(" ".join(lines[i : i + 5]))
```
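A common variation, not used in this notebook, is to let neighbouring chunks overlap by a sentence or two, so an explanation that straddles a chunk boundary still appears whole in at least one chunk. A minimal sketch; `chunk_sentences` is a hypothetical helper, and with `overlap=0` it produces the same grouping as the loop above:

```python
def chunk_sentences(sentences, size=5, overlap=1):
    """Group sentences into chunks of `size`, repeating `overlap` sentences
    between neighbouring chunks (sliding window)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start : start + size]))
        # Stop once this window already reaches the last sentence
        if start + size >= len(sentences):
            break
    return chunks
```

For example, `grouped_lines = chunk_sentences(lines, size=5, overlap=1)`. Overlap trades a little extra storage and some duplicated context for fewer answers that get cut in half at a boundary.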

Next, we will install ChromaDB, a popular vector database that is convenient to use from Python. We will run ChromaDB in in-memory mode, so nothing is persisted to disk.

```python
!pip install chromadb
```


With ChromaDB installed, let's initialize an in-memory client and create a collection.

```python
import chromadb

# In-memory (ephemeral) client: handy for prototyping, persistence can be added later
chroma_client = chromadb.Client()

# Create a collection for the transcript chunks
# (get_collection, get_or_create_collection and delete_collection are also available)
collection = chroma_client.create_collection("all-my-documents")
```

Let's feed the chunks to ChromaDB. When we add documents without our own embeddings, ChromaDB converts each one to a vector with its default embedding function, a small sentence-transformer model (`all-MiniLM-L6-v2`), and stores the vectors in the collection. This small model works well for semantic search over short passages like ours, but for production use you might want a larger or domain-specific embedding model.

```python
# Store the chunk text as metadata too (metadata fields can be used in filters)
grouped_lines_metadata = [{"text": line} for line in grouped_lines]
# Use each chunk's index in grouped_lines as its unique document ID
grouped_lines_ids = [str(i) for i in range(len(grouped_lines))]

collection.add(
    documents=grouped_lines,  # Chroma embeds and indexes these automatically; you can pass your own embeddings instead
    metadatas=grouped_lines_metadata,
    ids=grouped_lines_ids,
)
```
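To make "convert the text to vectors and find the closest ones" concrete, here is a toy stand-in in pure Python. It swaps the neural embedding model for plain bag-of-words counts and ranks chunks by cosine similarity to the question. This only illustrates the idea; it is not how ChromaDB is implemented, and word counts miss synonyms and paraphrases that real embeddings capture. `embed`, `cosine` and `top_k` are hypothetical names:

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stands in for a neural model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, documents, k=2):
    """Return the k documents most similar to the query, best first."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

`top_k(question, grouped_lines, k=2)` plays the same role as the `n_results=2` query below, just with a much cruder notion of similarity and a linear scan instead of an index.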

Now let's define a user question and ask ChromaDB for the chunks most relevant to it.

```python
question = "how to read a pointers value?"

# Return the 2 chunks whose embeddings are closest to the question's embedding
results = collection.query(
    query_texts=[question],
    n_results=2,
    # where={"metadata_field": "is_equal_to_this"},  # optional metadata filter
    # where_document={"$contains": "search_string"},  # optional full-text filter
)
```

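```python
# results["documents"] holds one list of matching chunks per query text;
# we sent a single question, so join its chunks into one context string
context = "\n\n".join(results["documents"][0])
```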

Now let's install LangChain, a framework for chaining prompts and LLM calls together.

```python
!pip install langchain
```



We also need `langchain-openai`, the package that provides LangChain's OpenAI integration.

```python
!pip install langchain-openai
```



Now let's prompt the LLM with the user question and the retrieved context. We are using a very basic prompt here; a more carefully written prompt can produce better answers.
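`PromptTemplate.from_template` uses Python's `str.format`-style `{placeholder}` syntax by default, so "stuffing" the retrieved chunks into the prompt is essentially string formatting. A minimal stdlib sketch of what the model ends up receiving; `TEMPLATE` is a simplified copy of the LangChain template, and `build_prompt` is a hypothetical helper that assumes the chunks are joined with blank lines:

```python
# Simplified copy of the LangChain prompt, using the same {placeholders}
TEMPLATE = """You are a C programming instructor. You can only answer from the context that is given to you.
Question: {question}

Context: {context}

Based on the question and the context, answer the question. Do not provide any information that is not present in the context."""


def build_prompt(question, chunks):
    """Fill the template with the question and the retrieved chunks, one per paragraph."""
    return TEMPLATE.format(question=question, context="\n\n".join(chunks))
```

Only the braces in `TEMPLATE` are treated as placeholders; braces inside the substituted chunks (common in C code) are inserted as-is.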

```python
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

# Completion-style OpenAI LLM wrapper from langchain-openai
llm = OpenAI(api_key=openai_key)

prompt = PromptTemplate.from_template(
    """You are a C programming instructor. You can only answer from the context that is given to you.
Question: {question}

Context: {context}

Based on the question and the context, answer the question. Do not provide any information that is not present in the context.
"""
)

# Pipe the filled-in prompt into the LLM and run the chain
chain = prompt | llm
chain.invoke({"question": question, "context": context})
```

The LLM answers:

> To read a pointer's value, you need to use the asterisk symbol before the pointer variable name. This will give you the value stored at the memory address pointed by the pointer.

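So far we have answered a single, hard-coded question. To actually chat with the video, the retrieval and generation steps can be wrapped in a small loop. A minimal sketch, assuming the retriever and the LLM call are passed in as plain functions so the loop does not depend on a specific vector store or client; `make_ask` and `chat` are hypothetical names:

```python
from typing import Callable, List


def make_ask(
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, str], str],
) -> Callable[[str], str]:
    """Build an ask(question) function: retrieve relevant chunks, then generate an answer."""

    def ask(question: str) -> str:
        chunks = retrieve(question)  # e.g. the top-k transcript chunks
        context = "\n\n".join(chunks)  # one context string for the prompt
        return generate(question, context)

    return ask


def chat(ask: Callable[[str], str]) -> None:
    """Simple input loop; an empty line, 'quit' or 'exit' ends the conversation."""
    while True:
        question = input("You: ").strip()
        if question.lower() in {"", "quit", "exit"}:
            break
        print("Bot:", ask(question))
```

In this notebook, `retrieve` could be `lambda q: collection.query(query_texts=[q], n_results=2)["documents"][0]` and `generate` could be `lambda q, ctx: chain.invoke({"question": q, "context": ctx})`, after which `chat(make_ask(retrieve, generate))` starts the conversation.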
## Conclusion

In this blog post, we saw how to build a chatbot that answers questions about the content of a video. We used the OpenAI API (Whisper) to transcribe the audio to text, ChromaDB to store the transcript chunks and retrieve the ones most relevant to a question, and an LLM via LangChain to generate an answer from those chunks. This is a very basic implementation of the chatbot; more elaborate prompts and stronger models can produce better answers.