# Multimodal AI: Q&A on a YouTube Video with LangChain & the OpenAI API

## Goals 

Videos can be full of useful information, but getting hold of that info can be slow, since you need to watch the whole thing or try skipping through it. It can be much faster to use a bot to ask questions about the contents of the transcript.

In this project, you'll download a video from YouTube, transcribe its audio, and create a simple Q&A bot to ask questions about the content.

## Task 0: Setup

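The project requires several packages that need to be installed into Workspace.

- `openai` is the client library for the OpenAI API.
- `langchain` is a framework for developing generative AI applications.
- `langchain-openai` provides LangChain's integrations with OpenAI models.
- `yt_dlp` lets you download YouTube videos.
- `tiktoken` converts text into tokens.
- `docarray` is a library for working with multimodal data. Here it backs the in-memory vector store that holds the transcript text and its embeddings.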

### Instructions

Run the following code to install the packages.

```python
# Install the openai package, locked to version 1.27
!pip install openai==1.27

# Install the langchain package, locked to version 0.1.19
!pip install langchain==0.1.19

# Install the langchain-openai package, locked to version 0.1.6
!pip install langchain-openai==0.1.6

# Install the yt_dlp package, locked to version 2024.4.9
!pip install yt_dlp==2024.4.9

# Install the tiktoken package, locked to version 0.6.0
!pip install tiktoken==0.6.0

# Install the docarray package, locked to version 0.40.0
!pip install docarray==0.40.0
```

Defaulting to user installation because normal site-packages is not writeable
Collecting openai==1.27
  Using cached openai-1.27.0-py3-none-any.whl.metadata (21 kB)
Using cached openai-1.27.0-py3-none-any.whl (314 kB)
Installing collected packages: openai
  Attempting uninstall: openai
    Found existing installation: openai 1.35.10
    Uninstalling openai-1.35.10:
      Successfully uninstalled openai-1.35.10
Successfully installed openai-1.27.0

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m
Defaulting to user installation because normal site-packages is not writeable

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, r

## Task 1: Import The Required Libraries 

For this project we need the `os`, `glob`, and `yt_dlp` packages to download the YouTube video of your choosing, convert it to an `.mp3`, save the file, and locate it afterwards. We will also use the `openai` package to call the OpenAI models.

### Instructions

Import the following packages.

- Import `os`. 
- Import `glob`.
- Import `openai`.
- Import `yt_dlp` with the alias `youtube_dl`.
- From the `yt_dlp` package, import `DownloadError`.
- Assign `openai_api_key` to `os.getenv("OPENAI_API_KEY")`.

```python
# Import the os package
import os

# Import the glob package
import glob

# Import the openai package
import openai

# Import the yt_dlp package as youtube_dl
import yt_dlp as youtube_dl

# Import DownloadError from yt_dlp
from yt_dlp import DownloadError

# Import DocArray
import docarray
```


We will also assign the variable `openai_api_key` to the value of the environment variable `OPENAI_API_KEY`. This keeps our key secure and removes the need to write it in the code. The `openai` client created later also reads `OPENAI_API_KEY` automatically.

```python
openai_api_key = os.getenv("OPENAI_API_KEY")
```

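If `OPENAI_API_KEY` is not set, `os.getenv` quietly returns `None`. The problem then only surfaces later, when the OpenAI client is created or the first API call is made. A small guard can fail earlier and give a clearer message. This is a sketch, and `require_env_var` is a hypothetical helper that is not part of the `openai` package.

```python
import os


def require_env_var(name: str = "OPENAI_API_KEY") -> str:
    """Return the value of an environment variable, or fail early if it is unset or empty.

    Hypothetical helper for this notebook; not part of the openai package.
    """
    value = os.getenv(name)
    if not value:
        # Raise now instead of waiting for the OpenAI client or an API call to fail.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Example (assumes the key was exported before the notebook started):
# openai_api_key = require_env_var("OPENAI_API_KEY")
```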
## Task 2: Download the YouTube Video

With the setup complete, the first step is to download the video from YouTube and convert it to an audio file (.mp3).

You can download any YouTube video you are interested in.

We will do this by setting variables to store the `youtube_url` and the `output_dir` where we want the file to be stored.

The `yt_dlp` package lets us download and convert the video in a few steps, but it does require some configuration. This code is provided for you.

Lastly, we will look in the `output_dir` for any .mp3 files and store them in a list called `audio_files`. This list will be used later to send the audio to the Whisper model for transcription.

```python
# An example YouTube video
youtube_url = "https://www.youtube.com/watch?v=G86V5loIsSM"

# Directory to store the downloaded audio
output_dir = "files/audio/"

# Options for yt_dlp: take the best audio stream and convert it to a 192 kbps MP3
ydl_config = {
    "format": "bestaudio/best",
    "postprocessors": [
        {
            "key": "FFmpegExtractAudio",
            "preferredcodec": "mp3",
            "preferredquality": "192",
        }
    ],
    "outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s"),
    "verbose": True
}
```

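The `outtmpl` option above names the file after the video title. Titles often contain spaces and punctuation, so the resulting file names can be awkward. One variation uses yt-dlp's `%(id)s` template field instead, on the assumption that you prefer predictable names. `make_audio_config` is a hypothetical helper, and leaving out `verbose` just silences the debug log.

```python
import os


def make_audio_config(output_dir: str, codec: str = "mp3", quality: str = "192") -> dict:
    """Build a yt_dlp options dict that extracts audio into output_dir.

    Hypothetical helper: same options as ydl_config, except the output
    template uses the video ID instead of the title and verbose is off.
    """
    return {
        "format": "bestaudio/best",
        "postprocessors": [
            {
                "key": "FFmpegExtractAudio",
                "preferredcodec": codec,
                "preferredquality": quality,
            }
        ],
        # %(id)s expands to the YouTube video ID, e.g. G86V5loIsSM,
        # which avoids the spaces and symbols that titles often contain.
        "outtmpl": os.path.join(output_dir, "%(id)s.%(ext)s"),
    }


config = make_audio_config("files/audio/")
```

With this template, the MP3 for the example video would be saved as `files/audio/G86V5loIsSM.mp3`.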

```python
# Create the output directory if it doesn't already exist
os.makedirs(output_dir, exist_ok=True)

# Print a message indicating which video is being downloaded
print(f"Downloading video from {youtube_url}")

# Try to download the video using the specified configuration.
# If a DownloadError occurs, retry the download once.
try:
    with youtube_dl.YoutubeDL(ydl_config) as ydl:
        ydl.download([youtube_url])
except DownloadError:
    with youtube_dl.YoutubeDL(ydl_config) as ydl:
        ydl.download([youtube_url])
```

[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out UTF-8 (No ANSI), error UTF-8 (No ANSI), screen UTF-8 (No ANSI)
[debug] yt-dlp version stable@2024.04.09 from yt-dlp/yt-dlp [ff0779267] (pip) API
[debug] params: {'format': 'bestaudio/best', 'postprocessors': [{'key': 'FFmpegExtractAudio', 'preferredcodec': 'mp3', 'preferredquality': '192'}], 'outtmpl': 'files/audio/%(title)s.%(ext)s', 'verbose': True, 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4556.0 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}}
[debug] Python 3.8.10 (CPython x86_64 64bit) - Linux-5.10.216-204.855.amzn2.x86_64-x86_64-with-glibc2.29 (OpenSSL 1.1.1f  31 Mar 2020, glibc 2.31)


Downloading video from https://www.youtube.com/watch?v=G86V5loIsSM


[debug] exe versions: ffmpeg 4.2.7, ffprobe 4.2.7
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2019.11.28, mutagen-1.47.0, requests-2.31.0, secretstorage-3.3.3, sqlite3-3.31.1, urllib3-1.25.8, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, websockets
[debug] Loaded 1810 extractors


[youtube] Extracting URL: https://www.youtube.com/watch?v=G86V5loIsSM
[youtube] G86V5loIsSM: Downloading webpage
[youtube] G86V5loIsSM: Downloading ios player API JSON
[youtube] G86V5loIsSM: Downloading android player API JSON
[youtube] G86V5loIsSM: Downloading player 5352eb4f


[debug] Saving youtube-nsig.5352eb4f to cache
[debug] [youtube] Decrypted nsig 02qkW5ZAjEBB9ZdLZFI => 60ce3BpqnQvFFw
[debug] Loading youtube-nsig.5352eb4f from cache
[debug] [youtube] Decrypted nsig 1bkc32LWvPXESqNi25M => IjYSal_ZaohCOQ


[youtube] G86V5loIsSM: Downloading m3u8 information


[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id


[info] G86V5loIsSM: Downloading 1 format(s): 251



[download] Destination: files/audio/DEMON TANJIRO SLAYS NEZUKO!？ Full Cinematic End of Demon Slayer： Kimetsu no Yaiba.webm
[download] 100% of   14.38MiB in 00:00:00 at 19.97MiB/s  


[debug] ffmpeg command line: ffprobe -show_streams 'file:files/audio/DEMON TANJIRO SLAYS NEZUKO!？ Full Cinematic End of Demon Slayer： Kimetsu no Yaiba.webm'


[ExtractAudio] Destination: files/audio/DEMON TANJIRO SLAYS NEZUKO!？ Full Cinematic End of Demon Slayer： Kimetsu no Yaiba.mp3


[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:files/audio/DEMON TANJIRO SLAYS NEZUKO!？ Full Cinematic End of Demon Slayer： Kimetsu no Yaiba.webm' -vn -acodec libmp3lame -b:a 192.0k -movflags +faststart 'file:files/audio/DEMON TANJIRO SLAYS NEZUKO!？ Full Cinematic End of Demon Slayer： Kimetsu no Yaiba.mp3'


Deleting original file files/audio/DEMON TANJIRO SLAYS NEZUKO!？ Full Cinematic End of Demon Slayer： Kimetsu no Yaiba.webm (pass -k to keep)
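The download cell retries exactly once after a `DownloadError`. If you want several retries with a pause between attempts, a minimal sketch could look like the following. `download_with_retries` and its parameters are hypothetical names, and the linear back-off is an arbitrary choice.

```python
import time


def download_with_retries(download, url, attempts=3, delay=2.0, retry_on=(Exception,)):
    """Call download(url), retrying up to `attempts` times on the given exception types.

    Hypothetical helper. In the notebook, `download` could wrap
    youtube_dl.YoutubeDL(ydl_config).download and `retry_on` could be (DownloadError,).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return download(url)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay * attempt)  # wait a little longer after each failure


# Example wiring with yt_dlp (not executed here):
# def fetch(url):
#     with youtube_dl.YoutubeDL(ydl_config) as ydl:
#         return ydl.download([url])
# download_with_retries(fetch, youtube_url, retry_on=(DownloadError,))
```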


To find the audio files, we will use the `glob` module to look in the `output_dir` for any .mp3 files. It returns the matching file paths as a list, which we store in `audio_files`. The selected file will later be sent to the Whisper model for transcription.

### Instructions

Find the audio file in the output directory.

- Find all the MP3 audio files in the output directory by joining the output directory to the pattern `*.mp3` and using glob to list them.
- Select the first file in the list and assign it to `audio_filename`.
- _Check your work._ Print `audio_filename`.

```python
# Find all the MP3 files in the output directory
audio_files = glob.glob(os.path.join(output_dir, "*.mp3"))

# Select the first audio file in the list
audio_filename = audio_files[0]

# Print the name of the selected audio file
print(audio_filename)
```

files/audio/DEMON TANJIRO SLAYS NEZUKO!？ Full Cinematic End of Demon Slayer： Kimetsu no Yaiba.mp3
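Taking the first element of the `glob` result has two weaknesses. It fails with a bare `IndexError` when nothing matches. It can also pick an old file from an earlier run, because `glob` does not promise any particular order. A sketch that picks the most recently modified match instead is shown below. `latest_audio_file` is a hypothetical helper.

```python
import glob
import os


def latest_audio_file(directory: str, pattern: str = "*.mp3") -> str:
    """Return the most recently modified file in `directory` matching `pattern`.

    Hypothetical helper: raises a clear error when nothing matches instead of
    an IndexError, and does not depend on glob's (unspecified) ordering.
    """
    matches = glob.glob(os.path.join(directory, pattern))
    if not matches:
        raise FileNotFoundError(f"No files matching {pattern!r} in {directory!r}")
    return max(matches, key=os.path.getmtime)


# Example: audio_filename = latest_audio_file(output_dir)
```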


## Task 3: Transcribe the Video using Whisper

In this step we will take the downloaded and converted YouTube audio and send it to the Whisper model to be transcribed. To do this, we will create variables for the `audio_file`, the `output_file`, and the `model`.

Using these variables we will:
- Create an OpenAI client.
- Open the audio file in binary mode.
- Send the file to the Whisper model using the OpenAI package.

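OpenAI's documentation states that uploads to the transcription endpoint are limited to 25 MB, so longer videos may need to be split before transcription. A rough pre-check for a constant-bitrate MP3 multiplies the bitrate by the duration. In this sketch, treating 25 MB as 25,000,000 bytes is an assumption, and `estimated_mp3_bytes` and `fits_upload_limit` are hypothetical helpers.

```python
import os

# Assumed upload limit: 25 MB interpreted as 25 * 1000 * 1000 bytes.
WHISPER_MAX_BYTES = 25 * 1000 * 1000


def estimated_mp3_bytes(duration_s: float, bitrate_kbps: int = 192) -> int:
    """Approximate constant-bitrate MP3 size: kilobits/s * 1000 / 8 bytes/s * seconds."""
    return round(bitrate_kbps * 1000 / 8 * duration_s)


def fits_upload_limit(path: str, limit: int = WHISPER_MAX_BYTES) -> bool:
    """True if the file on disk is within the (assumed) upload limit."""
    return os.path.getsize(path) <= limit
```

At the 192 kbps used in `ydl_config`, a 15-minute (900 s) video comes to about 21.6 MB, which is under the assumed limit. A 20-minute video (about 28.8 MB) would already exceed it.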
```python
# Use these settings
audio_file = audio_filename
output_file = "files/transcripts/transcript.txt"
model = "whisper-1"

# Transcribe the audio file to text using the OpenAI API
print("Converting audio to text...")

# Create an OpenAI client. Assign to client.
# By default it reads the API key from the OPENAI_API_KEY environment variable.
client = openai.OpenAI()

# Open the audio file in read-binary mode
with open(audio_file, "rb") as audio:
    # Send the file to the Whisper model to create a transcription
    response = client.audio.transcriptions.create(model=model, file=audio)

# Extract the transcript text from the response
transcript = response.text

# Print the transcript
print(transcript)
```


Converting audio to text...
The Price of Victory As the sun climbs over the horizon, the massive form that Muzan had taken to defend himself chips away piece by piece. His body dissolves as these seconds pass, leaving behind a chunk of flesh that has yet to dissipate on the ground. Meanwhile, Anosuke lies elsewhere unwilling to accept his treatment calmly. He chomps on a Kakashi member's hand to the point of blood, convincing the others taking care of him that he'll be just fine. However, that's when he also spews out blood from his boar mask and they start to doubt that he'll be okay after all. Not far from there, Zenitsu dramatically requests the Kakashi let his wife know that he loves her, his wife Nezuko that is, adding that she should be made aware of how bravely he fought while thinking of her. The core members can't help but say, this guy never shuts up, but that's where the comedy ends. Gyu wanders around looking for Tanjiro, while the Kakashi try to calm him down so they can t

```python
# Create the directory for the output file if it doesn't exist
os.makedirs(os.path.dirname(output_file), exist_ok=True)

# Write the transcript to the output file as UTF-8 (transcripts can contain non-ASCII text)
with open(output_file, "w", encoding="utf-8") as file:
    file.write(transcript)
```

## Task 4: Create a TextLoader using LangChain 

In order to use text or other types of data with LangChain, we must first convert that data into Documents. This is done with loaders. In this tutorial, we will use the `TextLoader`, which takes the text from our transcript and loads it into a Document.

### Instructions

Load the documents from the text file using a TextLoader.

- From the `langchain.document_loaders` module, import `TextLoader`.
- Create a `TextLoader`, passing it the path of the transcript file, `"./files/transcripts/transcript.txt"`. Assign to `loader`.
- Use the TextLoader to load the documents. Assign to `docs`.

```python
# From the langchain.document_loaders module, import TextLoader
from langchain.document_loaders import TextLoader

# Create a TextLoader, passing the path of the transcript file. Assign to loader.
loader = TextLoader("./files/transcripts/transcript.txt")

# Use the TextLoader to load the documents. Assign to docs.
docs = loader.load()
```

```python
# Show the first element of docs to verify it has been loaded
docs[0]
```

Document(page_content="The Price of Victory As the sun climbs over the horizon, the massive form that Muzan had taken to defend himself chips away piece by piece. His body dissolves as these seconds pass, leaving behind a chunk of flesh that has yet to dissipate on the ground. Meanwhile, Anosuke lies elsewhere unwilling to accept his treatment calmly. He chomps on a Kakashi member's hand to the point of blood, convincing the others taking care of him that he'll be just fine. However, that's when he also spews out blood from his boar mask and they start to doubt that he'll be okay after all. Not far from there, Zenitsu dramatically requests the Kakashi let his wife know that he loves her, his wife Nezuko that is, adding that she should be made aware of how bravely he fought while thinking of her. The core members can't help but say, this guy never shuts up, but that's where the comedy ends. Gyu wanders around looking for Tanjiro, while the Kakashi try to calm him down so they can treat 
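Conceptually, a loader such as `TextLoader` reads the file and wraps its contents in a Document. A Document has a `page_content` string and a `metadata` dict. The plain-Python stand-in below imitates that shape so you can see what `docs` holds. `SimpleDocument` and `load_text_file` are hypothetical names, not LangChain's implementation. To our knowledge, `TextLoader` also records the file path under the `"source"` metadata key, but check the documentation for your LangChain version.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Stand-in for LangChain's Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path: str, encoding: str = "utf-8") -> list:
    """Read one text file into a single-element list of SimpleDocument,
    recording the file path under the "source" metadata key."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": path})]
```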

## Task 5: Create an In-Memory Vector Store 

Now that we have turned the transcript into a Document, we will store that Document in a vector store. A vector store holds embeddings, which are numeric vectors that represent the meaning of the text. It finds the data most similar to a query by measuring the distance between vectors in that space.

For large amounts of data, it is best to use a dedicated vector database. Since we are only using one transcript for this tutorial, we can create an in-memory vector store using the `docarray` package.

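Under the hood, a vector store keeps one vector per document. It answers a query by ranking the stored vectors by their similarity to the query's vector, commonly using cosine similarity. The sketch below shows the idea with a deliberately crude word-count "embedding" in place of OpenAI's learned embeddings. `embed`, `InMemoryVectorStore`, and the sample sentences are hypothetical, and this is not how `DocArrayInMemorySearch` is implemented internally.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A stand-in for OpenAIEmbeddings."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (1.0 means same direction)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Keeps (text, vector) pairs in a Python list and ranks them by similarity."""

    def __init__(self, texts):
        self.items = [(text, embed(text)) for text in texts]

    def search(self, query: str, k: int = 1):
        """Return the k stored texts most similar to the query."""
        query_vector = embed(query)
        ranked = sorted(
            self.items,
            key=lambda item: cosine_similarity(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = InMemoryVectorStore([
    "Zenitsu asks the others to tell Nezuko that he loves her",
    "Muzan's body dissolves as the sun rises",
    "Tanjiro is turned into a demon",
])
print(store.search("what did zenitsu ask"))
# prints ['Zenitsu asks the others to tell Nezuko that he loves her']
```

Real embeddings capture meaning rather than exact word overlap, so "ask" and "asks" would land close together. The ranking step is the same idea.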
We will also need the `tiktoken` package, which is OpenAI's tokenizer. It splits text into tokens (common sequences of characters such as whole words or word fragments) and maps each token to an integer ID. LangChain's `OpenAIEmbeddings` uses it by default to tokenize text, so that what is sent for embedding stays within the model's input limit.

### Instructions

- Import the `tiktoken` package. 

```python
# Import the tiktoken package
import tiktoken
```

## Task 6: Create the Document Search 

We will now use LangChain to complete some important operations to create the question-and-answer experience. Let's import the following:

- Import `RetrievalQA` from `langchain.chains` - this chain first retrieves documents from an assigned retriever and then runs a QA chain to answer questions over those documents.
- Import `ChatOpenAI` from `langchain_openai` - this is the chat model that we will use to query the data.
- Import `DocArrayInMemorySearch` from `langchain.vectorstores` - this gives us the ability to search over the in-memory vector store we create.
- Import `OpenAIEmbeddings` from `langchain_openai` - this creates embeddings for the data stored in the vector store.
- Optionally, import `display` and `Markdown` from `IPython.display` - these render responses as formatted Markdown (the cells below display the raw responses instead).

```python
# From the langchain.chains module, import RetrievalQA
from langchain.chains import RetrievalQA

# From the langchain_openai package, import ChatOpenAI and OpenAIEmbeddings
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# From the langchain.vectorstores module, import DocArrayInMemorySearch
from langchain.vectorstores import DocArrayInMemorySearch
```

Now we will create a vector store with `DocArrayInMemorySearch.from_documents`. It embeds each Document using `OpenAIEmbeddings` and keeps the resulting vectors in memory so they can be searched.

```python
# Create a new DocArrayInMemorySearch instance from the specified documents and embeddings
db = DocArrayInMemorySearch.from_documents(
    docs,
    OpenAIEmbeddings()
)
```

We will now create a retriever from the `db` we created in the last step. Given a query, the retriever returns the stored documents whose embeddings are most similar to the query's embedding. Since we are using the `ChatOpenAI` model, we will assign it as our LLM.

Recall that the temperature of an LLM refers to how random the results are. Setting the temperature to zero makes the results more repeatable.
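Temperature divides the model's raw scores (logits) before they are turned into probabilities for the next token. A lower value concentrates probability on the highest-scoring token. The sketch below illustrates the arithmetic with made-up logits. Treating a temperature of 0 as always picking the top token is a common convention that we assume here. Even at 0, the API is not guaranteed to be perfectly deterministic, which is why the results are described as "more repeatable" rather than identical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution.

    temperature == 0 is treated as greedy decoding: all probability on the top score.
    """
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
print(softmax_with_temperature(logits, 1.5))  # flatter: top token about 0.53
print(softmax_with_temperature(logits, 0.5))  # sharper: top token about 0.84
print(softmax_with_temperature(logits, 0))    # greedy: [1.0, 0.0, 0.0]
```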

```python
# Convert the DocArrayInMemorySearch instance to a retriever
retriever = db.as_retriever()

# Create a new ChatOpenAI instance with a temperature of 0.0
llm = ChatOpenAI(temperature=0.0)
```

Our last step before asking questions is to create the `RetrievalQA` chain. This chain takes:
- The `llm` we want to use.
- The `chain_type`, which sets how the retrieved documents are passed to the model. Here we will use a _stuff_ chain, where all the documents are stuffed into a single prompt. It is the simplest type, but it only works when the documents are few and small enough to fit in the model's context window.
- The `retriever` that we have created.
- An option called `verbose` that prints details of each step of the chain.

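Conceptually, the _stuff_ strategy concatenates all retrieved documents into one prompt and makes a single LLM call. A minimal sketch of that step is shown below. The prompt template is hypothetical and is not the prompt LangChain uses, and `build_stuff_prompt` is an illustrative name.

```python
def build_stuff_prompt(question: str, documents: list) -> str:
    """Stuff every retrieved document's text into one prompt for a single LLM call.

    The template wording is hypothetical; LangChain ships its own default prompt.
    """
    context = "\n\n".join(documents)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say \"I don't know.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_stuff_prompt(
    "What was zenitsu doing?",
    ["First retrieved passage...", "Second retrieved passage..."],
)
print(prompt)
```

Because everything goes into one prompt, the combined text must fit in the model's context window. In this notebook the whole transcript was loaded as a single Document, so the retriever has only that one document to return, and the full transcript is stuffed into every prompt.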
```python
# Create a new RetrievalQA instance with the specified parameters
qa_stuff = RetrievalQA.from_chain_type(
    # The ChatOpenAI instance to use for generating responses
    llm=llm,
    # The type of chain to use for the QA system
    chain_type="stuff",
    # The retriever to use for retrieving relevant documents
    retriever=retriever,
    # Whether to print verbose output during retrieval and generation
    verbose=True
)
```

## Task 7: Create the Queries 

Now we are ready to ask questions about the YouTube video and read the responses from the LLM. To do this, we create a query and then pass it to the `RetrievalQA` chain we set up in the last step.

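Each question below follows the same invoke-and-display pattern, so you may prefer to loop over a list of questions. This is a small sketch that assumes `qa_stuff.invoke` returns a dict whose `"result"` key holds the answer, as the first output below shows. `ask_all` is a hypothetical helper.

```python
def ask_all(chain, queries):
    """Invoke `chain` once per query and collect the answer text.

    Hypothetical helper. RetrievalQA.invoke returns a dict with "query" and
    "result" keys; a plain-string response is also accepted.
    """
    answers = {}
    for query in queries:
        response = chain.invoke(query)
        answers[query] = response["result"] if isinstance(response, dict) else str(response)
    return answers


# Example with the chain built above (requires the OpenAI API):
# questions = ["What is this story about?", "What was zenitsu doing?"]
# for question, answer in ask_all(qa_stuff, questions).items():
#     print(f"Q: {question}\nA: {answer}\n")
```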
```python
# Set the query to be used for the QA system
query = "What is this story about?"

# Invoke the query through the RetrievalQA instance. Assign to response.
response = qa_stuff.invoke(query)

# Display the response (the last expression in a cell is shown as its output)
response
```




[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


{'query': 'What is this story about?',
 'result': "This story is about a battle against Muzan, the King of Demons, and the aftermath of that battle where Tanjiro, one of the main characters, is affected by Muzan's powers and turns into a demon. The story follows the struggles of Tanjiro's friends and family as they try to bring him back to his human self and the emotional journey they all go through in the process."}

We can keep asking questions, including ones we know the video does not answer, to see how the model responds.

```python
# Set the query to be used for the QA system
query = "What is the difference between muzan and tanjiro?"

# Invoke the query through the RetrievalQA instance and store the response
response = qa_stuff.invoke(query)

# Display the response (the last expression in a cell is shown as its output)
response
```



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


{'query': 'What is the difference between muzan and tanjiro?',
 'result': 'Muzan and Tanjiro are characters from the anime "Demon Slayer." Muzan is the main antagonist, a powerful demon who seeks to become the ultimate demon and rule over all others. Tanjiro, on the other hand, is the protagonist, a kind-hearted demon slayer who fights against demons to protect humanity. The main difference between them is their goals and moral compass. Muzan is driven by power and selfish desires, while Tanjiro fights for justice and to protect others.'}

```python
# Set the query to be used for the QA system
query = "Does tanjiro die after the battle?"

# Invoke the query through the RetrievalQA instance and store the response
response = qa_stuff.invoke(query)

# Display the response (the last expression in a cell is shown as its output)
response
```



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


'No, Tanjiro does not die after the battle. He is turned back to normal from being a demon and is reunited with his sister, Nezuko, and his friends.'

```python
# Set the query to be used for the QA system
query = "Who is One punch man?"

# Invoke the query through the RetrievalQA instance and store the response
response = qa_stuff.invoke(query)

# Display the response (the last expression in a cell is shown as its output)
response
```



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


"I don't know."

```python
# Set the query to be used for the QA system
query = "What was zenitsu doing?"

# Invoke the query through the RetrievalQA instance and store the response
response = qa_stuff.invoke(query)

# Display the response (the last expression in a cell is shown as its output)
response
```



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


{'query': 'What was zenitsu doing?',
 'result': 'Zenitsu was dramatically requesting the Kakashi to let his wife Nezuko know that he loves her and that she should be made aware of how bravely he fought while thinking of her.'}