# RAG from YouTube video

Let's start by loading the environment variables we need.

In [29]:
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Is it wrong to kill Harold?, Fallout 3 Lore.
YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=3Tf0Ld62844"
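
`os.getenv` returns `None` when a variable is missing, so a typo in the `.env` file would only surface later as an authentication error. A minimal sketch of a fail-fast check, assuming a hypothetical helper named `require_env` (it is not part of `python-dotenv`):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Covers both a missing variable and an empty string.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical usage, assuming load_dotenv() has already run:
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```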

## Setting up the model
Let's define the LLM that we'll use throughout the workflow.

In [30]:
from langchain_openai.chat_models import ChatOpenAI

model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model="gpt-3.5-turbo")

We can test the model by asking a simple question.

In [31]:
model.invoke("What MLB team won the World Series during the COVID-19 pandemic?")

AIMessage(content='The Los Angeles Dodgers won the World Series during the COVID-19 pandemic in 2020. They defeated the Tampa Bay Rays in a six-game series to claim their first championship since 1988.', response_metadata={'token_usage': {'completion_tokens': 40, 'prompt_tokens': 21, 'total_tokens': 61}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-e364c240-e6e1-432a-a058-e2e83d16eb2c-0')

In [32]:
model.invoke("How do you make enchiladas?")

AIMessage(content="To make enchiladas, you will need the following ingredients:\n\n- 1 pound of cooked and shredded chicken, beef, or pork\n- 1 can of enchilada sauce\n- 1 small diced onion\n- 1 minced clove of garlic\n- 1 can of diced green chilies\n- 1 cup of shredded cheese\n- 8-10 corn tortillas\n- Salt and pepper to taste\n- Optional toppings such as sliced avocado, sour cream, chopped cilantro, and sliced jalapenos\n\nHere's how to make enchiladas:\n\n1. Preheat your oven to 350°F (175°C).\n\n2. In a skillet, sauté the diced onion and minced garlic until they are soft and fragrant.\n\n3. Add in the shredded chicken, beef, or pork, diced green chilies, salt, and pepper. Cook until everything is heated through.\n\n4. Pour a small amount of enchilada sauce into a baking dish to coat the bottom.\n\n5. Take a corn tortilla and fill it with a spoonful of the meat mixture. Roll up the tortilla and place it seam-side down in the baking dish.\n\n6. Continue filling and rolling the rest of

In [33]:
model.invoke("How much is 2+2?")

AIMessage(content='2 + 2 equals 4.', response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 15, 'total_tokens': 23}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-90cf854f-97d0-4a49-984d-9b32d86a873f-0')

The result from the model is an `AIMessage` instance containing the answer. We can extract this answer by chaining the model with an [output parser](https://python.langchain.com/docs/modules/model_io/output_parsers/).

For this example, we'll use a simple `StrOutputParser` to extract the answer as a string.

In [34]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser
chain.invoke("What MLB team won the World Series during the COVID-19 pandemic?")

'The Los Angeles Dodgers won the World Series during the COVID-19 pandemic in 2020. They defeated the Tampa Bay Rays in a six-game series to clinch their first championship since 1988.'

In [35]:
chain.invoke("How do you make enchiladas?")

'1. Preheat your oven to 350°F (175°C).\n\n2. In a skillet, cook and crumble 1 pound of ground beef or turkey over medium heat until fully cooked. Add 1 diced onion and 2 minced garlic cloves and cook until the onion is translucent.\n\n3. Stir in 1 can of diced green chilies, 1 can of black beans (drained and rinsed), and 1 cup of enchilada sauce. Cook for a few minutes until heated through.\n\n4. In a separate skillet, heat a small amount of oil over medium heat. Lightly fry each corn tortilla for a few seconds on each side until softened.\n\n5. Spread a small amount of enchilada sauce on the bottom of a baking dish. Spoon some of the meat mixture onto each tortilla, roll it up, and place it seam side down in the baking dish.\n\n6. Pour the remaining enchilada sauce over the rolled tortillas, making sure they are all coated.\n\n7. Sprinkle shredded cheese over the top of the enchiladas.\n\n8. Cover the baking dish with foil and bake in the preheated oven for 20-25 minutes, until the c

In [36]:
chain.invoke("How much is 2+2?")

'2 + 2 equals 4.'
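
To build intuition for what `model | parser` does, here is a toy sketch of pipe-style composition in plain Python. `Step` and `FakeMessage` are hypothetical stand-ins invented for illustration; this is not how LangChain implements its runnables, only the general idea of feeding one step's output into the next.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


class FakeMessage:
    """Hypothetical message object exposing a `content` attribute."""

    def __init__(self, content: str):
        self.content = content


# A fake "model" that wraps its answer in a message object,
# and a "parser" that extracts the text from that message.
fake_model = Step(lambda question: FakeMessage(f"Answer to: {question}"))
to_string = Step(lambda message: message.content)

toy_chain = fake_model | to_string
print(toy_chain.invoke("How much is 2+2?"))  # prints "Answer to: How much is 2+2?"
```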

## Introducing prompt templates

We want to provide the model with some context and the question. [Prompt templates](https://python.langchain.com/docs/modules/model_io/prompts/quick_start) are a simple way to define and reuse prompts.

In [37]:
from langchain.prompts import ChatPromptTemplate

template = """
Answer the question based on the context below. If you can't 
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt.format(context="Mary's sister is Susana", question="Who is Mary's sister?")

'Human: \nAnswer the question based on the context below. If you can\'t \nanswer the question, reply "I don\'t know".\n\nContext: Mary\'s sister is Susana\n\nQuestion: Who is Mary\'s sister?\n'

We can now chain the prompt with the model and the output parser.

In [38]:
chain = prompt | model | parser
chain.invoke({
    "context": "Mary's sister is Susana",
    "question": "Who is Mary's sister?"
})

'Susana'
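
Conceptually, a prompt template is a string with named placeholders that must all be supplied before the prompt is sent. The sketch below illustrates that idea using only the standard library; `placeholders` and `fill` are hypothetical helpers, and `ChatPromptTemplate` does considerably more (message roles, for instance).

```python
from string import Formatter

toy_template = "Context: {context}\n\nQuestion: {question}\n"


def placeholders(template: str) -> list:
    """Return the sorted names of the `{placeholders}` found in a template."""
    return sorted({name for _, name, _, _ in Formatter().parse(template) if name})


def fill(template: str, **variables: str) -> str:
    """Fill the template, failing early if a placeholder has no value."""
    missing = [name for name in placeholders(template) if name not in variables]
    if missing:
        raise KeyError(f"Missing template variables: {missing}")
    return template.format(**variables)


print(placeholders(toy_template))  # prints "['context', 'question']"
print(fill(toy_template, context="Mary's sister is Susana", question="Who is Mary's sister?"))
```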

## Transcribing the YouTube Video

The context we want to send the model comes from a YouTube video. Let's download the video and transcribe it using [OpenAI's Whisper](https://openai.com/research/whisper).

In [39]:
import tempfile
import whisper
from pytube import YouTube


# Let's do this only if we haven't created the transcription file yet.
if not os.path.exists("transcription.txt"):
    youtube = YouTube(YOUTUBE_VIDEO)
    audio = youtube.streams.filter(only_audio=True).first()

    # Let's load the base model. This is not the most accurate
    # model but it's fast.
    whisper_model = whisper.load_model("base")

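    with tempfile.TemporaryDirectory() as tmpdir:
        # Download the audio track into a temporary directory and transcribe it.
        audio_path = audio.download(output_path=tmpdir)
        transcription = whisper_model.transcribe(audio_path, fp16=False)["text"].strip()

    # Cache the transcription so later runs skip the download and transcription.
    with open("transcription.txt", "w") as file:
        file.write(transcription)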

Let's read the transcription and display the first few characters to ensure everything works as expected.

In [40]:
with open("transcription.txt") as file:
    transcription = file.read()

transcription[:100]

'We recently completed a series on Oasis that presented us with one of what in my opinion is one of t'
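
The `os.path.exists` guard in the transcription cell is an instance of a general "compute once, cache to disk" pattern. A minimal sketch of that pattern as a reusable function, where `load_or_create` and its `produce` callback are hypothetical names introduced for illustration:

```python
import os
from typing import Callable


def load_or_create(path: str, produce: Callable[[], str]) -> str:
    """Return the cached text at `path`, creating it with `produce()` if absent."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as cached:
            return cached.read()

    # Cache miss: run the expensive step once and store the result.
    text = produce()
    with open(path, "w", encoding="utf-8") as out:
        out.write(text)
    return text
```

With this helper, the slow download-and-transcribe step would be passed in as `produce`, and only run when `transcription.txt` does not exist yet.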

## Splitting the transcription

We can't pass the entire transcription to the model as context, so one solution is to split it into smaller chunks and invoke the model with only the chunks relevant to a particular question.

Let's start by loading the transcription in memory:

In [41]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("transcription.txt")
text_documents = loader.load()
text_documents

[Document(page_content="We recently completed a series on Oasis that presented us with one of what in my opinion is one of the most difficult ethical dilemmas in all of Fallout 3. This video will make more sense if you first watch that series by clicking here. When we arrived at Oasis, we discovered that a group called the Tree Minders were worshipping a talking tree. It was only later we realized that the talking tree is actually herald, a human being who was mutated by FEV nearly 200 years ago. During the events of Fallout 2, a tree-like structure began to sprout out of his head, and it grew over the ensuing decades as herald wandered the continent. It finally became too much for him when he reached the capital wasteland, and when he stopped here on the Oasis, his mutation took root, stopping herald in place permanently for the rest of his life. His mutation, which he called Bob, then went on to produce seeds, that the winds scattered all over the place, which miraculously took root 

There are many different ways to split a document. For this example, we'll use a simple splitter that splits the document into chunks of a fixed size. Check [Text Splitters](https://python.langchain.com/docs/modules/data_connection/document_transformers/) for more information about different approaches to splitting documents.

For illustration purposes, let's split the transcription into chunks of 100 characters with an overlap of 20 characters and display the first few chunks:

In [42]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
text_splitter.split_documents(text_documents)[:5]

[Document(page_content='We recently completed a series on Oasis that presented us with one of what in my opinion is one of', metadata={'source': 'transcription.txt'}),
 Document(page_content='opinion is one of the most difficult ethical dilemmas in all of Fallout 3. This video will make', metadata={'source': 'transcription.txt'}),
 Document(page_content='video will make more sense if you first watch that series by clicking here. When we arrived at', metadata={'source': 'transcription.txt'}),
 Document(page_content='When we arrived at Oasis, we discovered that a group called the Tree Minders were worshipping a', metadata={'source': 'transcription.txt'}),
 Document(page_content='were worshipping a talking tree. It was only later we realized that the talking tree is actually', metadata={'source': 'transcription.txt'})]

For our specific application, let's use 1000 characters instead:

In [43]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
documents = text_splitter.split_documents(text_documents)
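
To make `chunk_size` and `chunk_overlap` concrete, here is a naive sliding-window splitter in plain Python. It is a simplification written for illustration: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and spaces, whereas this hypothetical `split_fixed` cuts at exact character offsets.

```python
from typing import List


def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into chunks of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


print(split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1))
# prints "['abcd', 'defg', 'ghij']"
```

Each chunk repeats the last character of the previous one, which is the role the overlap plays: text cut at a chunk boundary still appears intact in at least one neighbor.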

## Loading transcription into the vector store

Let's create a vector store from the chunks of the video transcription. Each chunk is converted into an embedding so that we can later retrieve the chunks most similar to a question.

In [53]:
from langchain_openai.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()


In [54]:
from langchain_community.vectorstores import DocArrayInMemorySearch
vectorstore2 = DocArrayInMemorySearch.from_documents(documents, embeddings)
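
The core operation of a vector store is a nearest-neighbor lookup over embeddings. The sketch below shows that idea with cosine similarity and made-up three-dimensional vectors; the vectors, the `toy_store` contents, and the `top_k` helper are all assumptions for illustration, not real OpenAI embeddings or the `DocArrayInMemorySearch` implementation.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], store: List[Tuple[str, Sequence[float]]], k: int = 2) -> List[str]:
    """Return the texts of the k stored vectors most similar to the query."""
    scored = [(cosine(query, vector), text) for text, vector in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up vectors standing in for chunk embeddings.
toy_store = [
    ("chunk about Harold", [0.9, 0.1, 0.0]),
    ("chunk about Oasis", [0.1, 0.9, 0.0]),
    ("chunk about seeds", [0.0, 0.2, 0.9]),
]

print(top_k([1.0, 0.0, 0.0], toy_store, k=2))
# prints "['chunk about Harold', 'chunk about Oasis']"
```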

Let's set up a chain that retrieves its context from this vector store. Here a plain dictionary takes the place of an explicit [`RunnableParallel`](https://python.langchain.com/docs/expression_language/how_to/map); LangChain converts it automatically when it is piped into the next step:

In [55]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# Equivalent explicit form of the dictionary step used below:
# setup = RunnableParallel(context=vectorstore2.as_retriever(), question=RunnablePassthrough())
# chain = setup | prompt | model | parser

chain = (
    {"context": vectorstore2.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)
chain.invoke("Who is Harold?")

'Harold is a character mentioned in the context, who is described as living in a state of suffering and facing the possibility of being enslaved for eternity by either Birch or Laurel. Harold is also mentioned as the guardian of a grove where only he and Bob can produce seeds for new life. Additionally, Harold is described as having a close relationship with a child named sapling you, who considers Harold her best friend.'
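
The dictionary at the head of the chain hands the same input to every value and collects the results under the matching keys, which is exactly the shape the prompt expects. A plain-Python sketch of that behavior follows; `run_parallel`, `toy_retriever`, and `passthrough` are hypothetical names, and unlike the real `RunnableParallel` this toy version simply runs the steps one after another.

```python
from typing import Any, Callable, Dict


def run_parallel(steps: Dict[str, Callable[[Any], Any]], value: Any) -> Dict[str, Any]:
    """Call every step with the same input and collect results by key."""
    return {key: step(value) for key, step in steps.items()}


def toy_retriever(question: str) -> list:
    # Stand-in for a retriever: returns fake "relevant" chunks.
    return [f"chunk relevant to: {question}"]


def passthrough(value: Any) -> Any:
    # Stand-in for RunnablePassthrough: returns the input unchanged.
    return value


inputs = run_parallel({"context": toy_retriever, "question": passthrough}, "Who is Harold?")
print(inputs)
# prints "{'context': ['chunk relevant to: Who is Harold?'], 'question': 'Who is Harold?'}"
```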

## Setting up Pinecone

So far we've used an in-memory vector store. In practice, we need a vector store that can handle large amounts of data and perform similarity searches at scale. For this example, we'll use [Pinecone](https://www.pinecone.io/).

The first step is to create a Pinecone account, set up an index (named `fallout` in this example), get an API key, and store the key in the `PINECONE_API_KEY` environment variable.

Then, we can load the transcription documents into Pinecone:

In [56]:
from langchain_pinecone import PineconeVectorStore

index_name = "fallout"

pinecone = PineconeVectorStore.from_documents(
    documents, embeddings, index_name=index_name
)

Let's now run a similarity search against the Pinecone index to make sure everything works:

In [57]:
pinecone.similarity_search("Who is Harold?")[:3]

[Document(page_content="who has ever lived really knows what kind of hell Harold is living with, but Harold. Burning Harold to death is one of the evilest things we could possibly do. But once he's dead, he's dead. He is no longer in pain. Choosing to side with either Birch or Laurel, Doom's Harold to an eternity of being a slave for someone else. By citing with Birch, we doom Harold to an eternity of being a slave to the Treeminders. He becomes their unwilling God forever. By citing with Treemother Laurel, we doom Harold to an eternity of being a slave to the people of the capital wasteland, revitalizing their world, providing them with resources. Honestly, those two options are worse than killing Harold with fire. The ends justify the means argument would likely side with Treemother Laurel and say that sure, we're sacrificing Harold, but it's a necessary sacrifice. After all, the needs of the many outweigh the needs of the few. But I wonder what that same person would say if faced wi

Let's set up the new chain using Pinecone as the vector store:

In [61]:
chain = (
    {"context": pinecone.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)

chain.invoke("Who is Harold?")

'Harold is a character who is described as living in a kind of hell, facing the possibility of being burned to death or becoming a slave for someone else for eternity. He is also connected to the production of seeds and is mentioned as having a close relationship with a character referred to as "sapling you".'