In [1]:
import os
google_api_key = os.getenv("GOOGLE_API_KEY")
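
Checking early that the key is actually set can save a confusing authentication error at the generation step. The helper below is a minimal sketch; `require_env` is a hypothetical name introduced here, not part of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Possible use in this notebook (assumes the key was exported before Jupyter started):
# google_api_key = require_env("GOOGLE_API_KEY")
```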

In [2]:
%pip install youtube-transcript-api

Collecting youtube-transcript-api
  Downloading youtube_transcript_api-1.1.1-py3-none-any.whl (485 kB)
Collecting defusedxml<0.8.0,>=0.7.1
  Downloading defusedxml-0.7.1-py2.py3-none-any.whl (25 kB)
Installing collected packages: defusedxml, youtube-transcript-api
Successfully installed defusedxml-0.7.1 youtube-transcript-api-1.1.1
Note: you may need to restart the kernel to use updated packages.


In [9]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_community.embeddings import SentenceTransformerEmbeddings

Step 1: Indexing

In [5]:
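video_id = "T-D1OfcDW1M"  # YouTube video ID (the value after "v=" in the URL)
try:
    # Fetch the English captions as a list of {"text", "start", "duration"} entries.
    # Note: get_transcript is deprecated in the 1.x releases in favor of
    # YouTubeTranscriptApi().fetch(...).
    transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])
    # Flatten the entries into one plain-text string
    transcript = " ".join(chunk["text"] for chunk in transcript_list)
    print(transcript)
except TranscriptsDisabled:
    print("No captions available")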

Large language models. They are everywhere. They get some things amazingly right and other things very interestingly wrong. My name is Marina Danilevsky. I am a Senior Research Scientist here at IBM Research. And I want to tell you about a framework to help large language models be more accurate and more up to date: Retrieval-Augmented Generation, or RAG. Let's just talk about the "Generation" part for a minute. So forget the "Retrieval-Augmented". So the generation, this refers to large language models, or LLMs, that generate text in response to a user query, referred to as a prompt. These models can have some undesirable behavior. I want to tell you an anecdote to illustrate this. So my kids, they recently asked me this question: "In our solar system, what planet has the most moons?" And my response was, “Oh, that's really great that you're asking this question. I loved space when I was your age.” Of course, that was like 30 years ago. But I know this! I read an article and the artic
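
To make the join step concrete: each transcript entry is a dict with `text`, `start`, and `duration` keys, and only the `text` values are kept. The sketch below uses two made-up entries (the timings are invented for illustration) and a hypothetical helper name, `join_transcript`.

```python
# Two illustrative entries in the same shape the transcript API returns.
# The start/duration values are invented for this example.
sample_entries = [
    {"text": "Large language models.", "start": 0.0, "duration": 1.5},
    {"text": "They are everywhere.", "start": 1.5, "duration": 1.2},
]


def join_transcript(entries):
    """Concatenate the text of each caption entry, separated by single spaces."""
    return " ".join(entry["text"] for entry in entries)


joined = join_transcript(sample_entries)
print(joined)
```

The timestamps are dropped here, which is fine for question answering over the text but means answers cannot point back to a moment in the video.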

In [6]:
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,     # maximum characters per chunk
    chunk_overlap=100,  # characters shared between neighboring chunks
)
chunks = splitter.create_documents([transcript])

In [7]:
len(chunks)

15
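
To build intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified fixed-width sliding window. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to break on separators such as paragraphs and spaces); `sliding_window_chunks` is a hypothetical helper for illustration only.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-width windows where neighbors share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


demo = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo)  # ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of some duplicated text in the index.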

In [8]:
chunks[0]

Document(metadata={}, page_content='Large language models. They are everywhere. They get some things amazingly right and other things very interestingly wrong. My name\xa0is Marina Danilevsky. I am a Senior Research Scientist here at IBM Research. And I want\xa0to tell you about a framework to help large language models be more accurate and more up to\xa0date: Retrieval-Augmented Generation, or RAG. Let\'s just talk about the "Generation" part for a\xa0minute. So forget the "Retrieval-Augmented". So the\xa0generation, this refers to large')
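
The `\xa0` sequences in the chunk above are non-breaking spaces carried over from the captions. They are harmless, but normalizing them before splitting keeps the chunks and the final prompt cleaner. A minimal sketch, with `normalize_whitespace` as a hypothetical helper name:

```python
import re


def normalize_whitespace(text: str) -> str:
    """Replace non-breaking spaces and collapse whitespace runs into single spaces."""
    text = text.replace("\xa0", " ")
    return re.sub(r"\s+", " ", text).strip()


cleaned = normalize_whitespace("My name\xa0is Marina.\xa0\xa0 And I want\xa0to tell you")
print(cleaned)  # My name is Marina. And I want to tell you
```

If used, it would be applied to `transcript` before calling `splitter.create_documents`.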

In [10]:
embeddings = SentenceTransformerEmbeddings(
    model_name="all-MiniLM-L6-v2"
)


In [11]:
vector_store = FAISS.from_documents(chunks, embeddings)

In [12]:
vector_store.index_to_docstore_id

{0: '1dfbb345-001d-433b-b1c5-ecb63d8569bf',
 1: 'b524b8cb-5855-4676-88bd-28cb3483b04b',
 2: '37f0831d-c389-4ead-bbc3-73d2695a5d10',
 3: '1eb80633-5bd4-4cb0-a0be-c2234c3ab1a4',
 4: 'c3a0d5fb-695b-4f0a-b3dc-0ed33f3c9a30',
 5: '123c6cf3-bc66-46fd-a959-cfb1e3432ba1',
 6: '6d2b03ce-36d9-4baa-bdb4-b3ec8ebd1d03',
 7: '26cd9609-cbbe-4622-8a70-6de111a14e15',
 8: 'e25b3b82-310f-4815-ae2a-f7f0276bed6f',
 9: '6a51bae1-1c9d-4ca8-ba11-af5b31e9df7f',
 10: '490c822c-48b5-4c2d-b2aa-a1ae53cfdc29',
 11: '5401b0d4-73db-42a9-8315-b3b3a5d2f235',
 12: '7660e850-de52-4184-9b55-999ba0f330bb',
 13: '34b3067d-cebf-4d97-9044-eb36abb08b87',
 14: 'e582efec-2ae1-47cc-99f4-dcf4b1b02b2b'}

In [14]:
vector_store.get_by_ids(["1eb80633-5bd4-4cb0-a0be-c2234c3ab1a4"])

[Document(id='1eb80633-5bd4-4cb0-a0be-c2234c3ab1a4', metadata={}, page_content="though I confidently said “I read an article, I know the answer!”, I'm not\xa0sourcing it. I'm giving the answer off the top of my head. And also, I actually haven't kept up with\xa0this for awhile, and my answer is out of date. So we have two problems here. One is no source.\xa0And the second problem is that I am out of date.\xa0\xa0 And these, in fact, are two behaviors that are\xa0often observed as problematic when interacting with large language models. They’re LLM\xa0challenges. Now, what would have happened")]

Step 2: Retrieval

In [15]:
retrieval = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 4})

In [16]:
retrieval

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x13fe2cc10>, search_kwargs={'k': 4})

In [18]:
retrieval.invoke("What is RAG?")

[Document(id='6a51bae1-1c9d-4ca8-ba11-af5b31e9df7f', metadata={}, page_content='that says, "No, no, no." "First, go and retrieve\xa0relevant content." "Combine that with the user\'s question and only then generate the\xa0answer." So the prompt now has three parts: the instruction to pay attention to, the retrieved\xa0content, together with the user\'s question. Now give a response. And in fact, now you can give\xa0evidence for why your response was what it was.\xa0\xa0 So now hopefully you can see, how does RAG help the two LLM challenges that I had mentioned before?\xa0\xa0 So first of all, I\'ll'),
 Document(id='e582efec-2ae1-47cc-99f4-dcf4b1b02b2b', metadata={}, page_content='to give the large language model the best quality data on which\xa0to ground its response, and also the generative part so that the LLM can give the richest, best\xa0response finally to the user when it generates the answer. Thank you for learning more about RAG\xa0and like and subscribe to the channel. Thank y
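
Conceptually, a similarity search with `k` results embeds the query, measures its distance to every stored chunk vector, and returns the `k` closest. The toy sketch below uses 2-D vectors and Euclidean (L2) distance, which, as far as I know, is the default metric of LangChain's FAISS wrapper; real `all-MiniLM-L6-v2` embeddings have 384 dimensions. `top_k_by_l2` is a hypothetical helper, not a FAISS or LangChain function.

```python
import math


def top_k_by_l2(query, vectors, k):
    """Return the indices of the k vectors closest to the query by Euclidean distance."""
    distances = [(math.dist(query, vec), idx) for idx, vec in enumerate(vectors)]
    distances.sort()  # smallest distance first
    return [idx for _, idx in distances[:k]]


# Made-up 2-D "embeddings" standing in for chunk vectors.
toy_vectors = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
nearest = top_k_by_l2((1.0, 1.0), toy_vectors, k=2)
print(nearest)  # [1, 3]
```

FAISS does the same job far more efficiently, but the ranking idea is the same: nearby vectors are treated as semantically related text.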

Step 3: Augmentation

In [19]:
prompt = PromptTemplate(
    template="""
    You are a helpful assistant.
    Answer ONLY from the provided transcript context.
    If the context is insufficient, just say you do not know.

    {context}
    Question: {question}

    """,
    input_variables=['context', 'question']
)

In [23]:
question = "Is the topic of RAG discussed in this video? If yes explain"
retrieved_docs = retrieval.invoke(question)
retrieved_docs

[Document(id='6a51bae1-1c9d-4ca8-ba11-af5b31e9df7f', metadata={}, page_content='that says, "No, no, no." "First, go and retrieve\xa0relevant content." "Combine that with the user\'s question and only then generate the\xa0answer." So the prompt now has three parts: the instruction to pay attention to, the retrieved\xa0content, together with the user\'s question. Now give a response. And in fact, now you can give\xa0evidence for why your response was what it was.\xa0\xa0 So now hopefully you can see, how does RAG help the two LLM challenges that I had mentioned before?\xa0\xa0 So first of all, I\'ll'),
 Document(id='e25b3b82-310f-4815-ae2a-f7f0276bed6f', metadata={}, page_content='Jupiter anymore. We know that\xa0it is Saturn. What does this look like? Well, first user prompts the LLM\xa0with their question. They say, this is what my question was. And originally,\xa0if we\'re just talking to a generative model, the generative model says, “Oh, okay, I know\xa0the response. Here it is. Her

In [21]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)

In [24]:
final_prompt = prompt.invoke({"context": context_text, "question": question})
final_prompt

StringPromptValue(text='\n    You are a helpful assistant.\n    Answer ONLY from the provided transcript context.\n    If the context is insufficient, just say you do not know.\n\n    that says, "No, no, no." "First, go and retrieve\xa0relevant content." "Combine that with the user\'s question and only then generate the\xa0answer." So the prompt now has three parts: the instruction to pay attention to, the retrieved\xa0content, together with the user\'s question. Now give a response. And in fact, now you can give\xa0evidence for why your response was what it was.\xa0\xa0 So now hopefully you can see, how does RAG help the two LLM challenges that I had mentioned before?\xa0\xa0 So first of all, I\'ll\n\nJupiter anymore. We know that\xa0it is Saturn. What does this look like? Well, first user prompts the LLM\xa0with their question. They say, this is what my question was. And originally,\xa0if we\'re just talking to a generative model, the generative model says, “Oh, okay, I know\xa0the r
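
Stripped of the LangChain wrapper, the augmentation step is string formatting: the retrieved chunks are joined and substituted into the template along with the question. The sketch below reproduces that with plain `str.format`; `TEMPLATE` and `build_prompt` are hypothetical names, and the chunk strings are placeholders.

```python
TEMPLATE = (
    "You are a helpful assistant.\n"
    "Answer ONLY from the provided transcript context.\n"
    "If the context is insufficient, just say you do not know.\n\n"
    "{context}\n"
    "Question: {question}\n"
)


def build_prompt(context_chunks, question):
    """Join the retrieved chunks and fill the template placeholders."""
    context = "\n\n".join(context_chunks)
    return TEMPLATE.format(context=context, question=question)


filled = build_prompt(["chunk one", "chunk two"], "What is RAG?")
print(filled)
```

`PromptTemplate` adds validation of the input variables and returns a prompt value object rather than a bare string, but the text it produces follows the same pattern.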

Step 4: Generation

In [25]:
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0.1
)

In [26]:
answer = llm.invoke(final_prompt)
print(answer)

content="Yes, the topic of RAG (Retrieval Augmented Generation) is discussed.  The video explains that in the RAG framework, the LLM receives an instruction to retrieve relevant content before answering a user's question.  This retrieved content is combined with the user's question to generate a response.  The speaker also explains how RAG helps address the challenges of LLMs having outdated information and generating poor-quality responses." additional_kwargs={} response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': []} id='run--a7c98501-1ae2-415a-9224-2e6fa31dd9b0-0' usage_metadata={'input_tokens': 490, 'output_tokens': 84, 'total_tokens': 574, 'input_token_details': {'cache_read': 0}}


Let's turn this into a chain.

In [27]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [28]:
def format_doc(retrieved_docs):
    """Join the page content of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in retrieved_docs)

In [31]:
parallel_chain = RunnableParallel({
    'context': retrieval | RunnableLambda(format_doc),
    'question': RunnablePassthrough()
})

In [32]:
parallel_chain.invoke("What is RAG?")

{'context': 'that says, "No, no, no." "First, go and retrieve\xa0relevant content." "Combine that with the user\'s question and only then generate the\xa0answer." So the prompt now has three parts: the instruction to pay attention to, the retrieved\xa0content, together with the user\'s question. Now give a response. And in fact, now you can give\xa0evidence for why your response was what it was.\xa0\xa0 So now hopefully you can see, how does RAG help the two LLM challenges that I had mentioned before?\xa0\xa0 So first of all, I\'ll\n\nto give the large language model the best quality data on which\xa0to ground its response, and also the generative part so that the LLM can give the richest, best\xa0response finally to the user when it generates the answer. Thank you for learning more about RAG\xa0and like and subscribe to the channel. Thank you.\n\nJupiter anymore. We know that\xa0it is Saturn. What does this look like? Well, first user prompts the LLM\xa0with their question. They say, 
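
The output above is a dict with one entry per branch because the parallel step feeds the same input to every branch and collects the results by key. A toy sketch of that fan-out, assuming plain callables in place of the retriever and passthrough (`run_parallel` is a hypothetical helper, and it runs the branches one after another rather than concurrently):

```python
def run_parallel(steps, value):
    """Apply every step to the same input and collect the results under each step's key."""
    return {name: step(value) for name, step in steps.items()}


fan_out = run_parallel(
    {
        "context": lambda q: f"<docs retrieved for: {q}>",  # stands in for retrieval + formatting
        "question": lambda q: q,                            # stands in for the passthrough
    },
    "What is RAG?",
)
print(fan_out)
```

The resulting keys match the prompt's input variables, which is what lets the dict flow straight into the prompt in the next step.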

In [33]:
parser = StrOutputParser()

In [34]:
main_chain = parallel_chain | prompt | llm | parser
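
The `|` operator reads left to right: each step's output becomes the next step's input. The toy class below mimics that behavior with Python's `__or__` hook. It is a sketch for intuition only, not how LangChain implements runnables, and `Step` and the stand-in steps are all hypothetical.

```python
class Step:
    """Wrap a function so steps can be composed with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # The combined step runs self first, then passes the result to other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Stand-ins for parallel_chain, prompt, llm, and parser.
build_inputs = Step(lambda q: {"context": "toy context", "question": q})
fill_prompt = Step(lambda d: f"{d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda p: {"content": p.upper()})
parse = Step(lambda m: m["content"])

toy_chain = build_inputs | fill_prompt | fake_llm | parse
toy_result = toy_chain.invoke("what is rag?")
print(toy_result)
```

Seen this way, the parser at the end of `main_chain` is simply the last step, turning the model's message object into a plain string.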

In [35]:
main_chain.invoke("can you summarize the video?")

"The video describes Retrieval Augmented Generation (RAG).  In RAG, a large language model (LLM) receives a user's question. Instead of immediately generating an answer, it first retrieves relevant content from a content store (which could be the internet or a closed collection of documents).  The LLM then combines this retrieved content with the user's question to generate a final, more accurate and evidence-based answer.  This addresses two challenges of LLMs: providing high-quality data for grounding the response and enabling richer, better responses.  The example given shows how a question about which planet has rings would be answered differently: a standard LLM might incorrectly say Jupiter, while a RAG-based LLM would retrieve relevant information and correctly identify Saturn."