# Youtube RAG

In [1]:
from dotenv import load_dotenv
load_dotenv()

True

## Installing Dependencies 

In [2]:
!pip install -q youtube-transcript-api langchain-community langchain-openai faiss-cpu tiktoken python-dotenv

## Importing Libraries

In [26]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled, NoTranscriptFound
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from urllib.parse import urlparse, parse_qs

## Utility Functions 

In [27]:
# Url parser to get video_id 
def get_video_id(url):
    """
    Extract video ID from various YouTube URL formats:
    - https://www.youtube.com/watch?v=VIDEO_ID
    - https://youtu.be/VIDEO_ID
    - https://m.youtube.com/watch?v=VIDEO_ID
    """
    # Handle youtu.be short URLs
    if 'youtu.be/' in url:
        return url.split('youtu.be/')[-1].split('?')[0]
    
    # Handle standard YouTube URLs
    parsed_url = urlparse(url)
    if parsed_url.hostname in ['www.youtube.com', 'youtube.com', 'm.youtube.com']:
        query = parsed_url.query
        video_id = parse_qs(query).get("v", [None])[0]
        return video_id
    
    # Try regex as fallback for embedded URLs or other formats
    pattern = r'(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=)|youtu\.be\/)([^"&?\/\s]{11})'
    match = re.search(pattern, url)
    return match.group(1) if match else None

## Steps

### Indexing 

#### Document Ingestion

In [51]:
def get_transcript(url):
    video_id = get_video_id(url)
    if not video_id: 
        return "Cannot fetch video ID from the URL"
    print(video_id)
    try:
        transcript_list = YouTubeTranscriptApi.get_transcript(video_id=video_id, languages=['en'])
        
        # Flatten into plain text
        transcript = ' '.join(chunk['text'] for chunk in transcript_list)
        return transcript
    except TranscriptsDisabled:
        return "Caption disabled."
    except NoTranscriptFound:
        return "No transcript found for this video."
    

In [52]:
transcript = get_transcript("https://www.youtube.com/watch?v=aircAruvnKk")
print(transcript)

aircAruvnKk
This is a 3. It's sloppily written and rendered at an extremely low resolution of 28x28 pixels, but your brain has no trouble recognizing it as a 3. And I want you to take a moment to appreciate how crazy it is that brains can do this so effortlessly. I mean, this, this and this are also recognizable as 3s, even though the specific values of each pixel is very different from one image to the next. The particular light-sensitive cells in your eye that are firing when you see this 3 are very different from the ones firing when you see this 3. But something in that crazy-smart visual cortex of yours resolves these as representing the same idea, while at the same time recognizing other images as their own distinct ideas. But if I told you, hey, sit down and write for me a program that takes in a grid of 28x28 pixels like this and outputs a single number between 0 and 10, telling you what it thinks the digit is, well the task goes from comically trivial to dauntingly difficult. 

#### Text Splitting

In [54]:
splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])

In [55]:
print(len(chunks))

23


#### Embedding & Storage in Vector Store

In [56]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = FAISS.from_documents(chunks, embedding=embeddings)

In [57]:
vector_store.index_to_docstore_id

{0: '9cfc5a3f-6542-436f-9e5b-e5e7de60731e',
 1: '17a367f4-b0c8-4a31-aa50-8b55d47a8212',
 2: '8d059dd9-f06a-48d3-b41a-9d077acab981',
 3: '80e05153-2069-4b98-a04b-d591a870ce0f',
 4: '607c6634-0486-4c26-a2fc-9047ca9895ad',
 5: '86c1df05-de9b-4e58-abeb-8704e72d2bce',
 6: '95a8f2ca-6281-473f-85ee-f71fd744c5f7',
 7: '42104ab7-4fb7-4932-9623-4052ba43fcdc',
 8: '4f070a1d-4c85-4878-88b9-182c3a377820',
 9: '1b4eaefb-5f98-4ae0-ba6c-432ffc2d92a0',
 10: '114eb6d9-44ff-4b6a-9136-2422cb6a80cf',
 11: 'bb52b53e-c7fd-451a-8dc1-f97517df1fb4',
 12: 'd3ed007b-1cc3-4cd6-a2c8-27bcd51f44a3',
 13: '3f90c79c-3eeb-40f9-8885-08415c4d5cc4',
 14: '622e6b38-fff2-4947-8e8f-67e62a51ac0f',
 15: '91438ca7-d574-4216-9c34-a1e8b5885f2c',
 16: 'e8920907-6047-4bfa-aaad-1698cc43df2d',
 17: '09dfeac3-265d-42ab-9305-d5e64783e4dd',
 18: '3f1185e4-e5e3-40be-8435-bc3604bf7d14',
 19: '9da78654-81c0-49a2-9da3-fb031e164742',
 20: '06be6d5b-f817-4c3d-9147-254f72422a01',
 21: 'daab12ab-68f9-4f7d-bd9a-6da86a730db5',
 22: '5f9bd6b7-e853-

## Retriever

In [58]:
retriever = vector_store.as_retriever(search_type='similarity', search_kwargs={"k":4})

In [59]:
retriever.invoke("What is deepmind?")

[Document(id='daab12ab-68f9-4f7d-bd9a-6da86a730db5', metadata={}, page_content="videos on Patreon. I've been a little slow to progress in the probability series this summer, but I'm jumping back into it after this project, so patrons you can look out for updates there. To close things off here I have with me Lisha Li who did her PhD work on the theoretical side of deep learning and who currently works at a venture capital firm called Amplify Partners who kindly provided some of the funding for this video. So Lisha one thing I think we should quickly bring up is this sigmoid function. As I understand it early networks use this to squish the relevant weighted sum into that interval between zero and one, you know kind of motivated by this biological analogy of neurons either being inactive or active. Exactly."),
 Document(id='80e05153-2069-4b98-a04b-d591a870ce0f', metadata={}, page_content="the more powerful modern variants, and trust me it still has plenty of complexity for us to wrap ou

## Augmentation

In [63]:
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)

In [64]:
prompt = PromptTemplate(
    template="""
        You are a helpful assistant.
        Answer ONLY from the provided transcript context.
        If the context is insufficient, just say I don't know.

        {context}
        Question: {question}
    """,
    input_variables = ['context', 'question']
)

In [65]:
question = "Is topics of Mars discussed in the video? If yes then what is discussed"
retriever_docs = retriever.invoke(question) 

In [66]:
context_text = "\n\n".join(doc.page_content for doc in retriever_docs)

In [68]:
final_prompt = prompt.invoke({'context': context_text, 'question': question})

## Generate

In [70]:
answer = llm.invoke(final_prompt)
print(answer.content)

I don't know.


# Building a Chain 

![Local Image](../static/images/youtube_rag_chain_architecture.png)

In [71]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [72]:
def format_docs(retrieve_docs):
    context_text = "\n\n".join(doc.page_content for doc in retriever_docs) 
    return context_text

In [75]:
parser = StrOutputParser() 

In [73]:
parallel_chain = RunnableParallel({
    'context': retriever | RunnableLambda(format_docs), 
    'question': RunnablePassthrough()
})

In [74]:
parallel_chain.invoke("Who is Neural Network?")

{'context': "videos on Patreon. I've been a little slow to progress in the probability series this summer, but I'm jumping back into it after this project, so patrons you can look out for updates there. To close things off here I have with me Lisha Li who did her PhD work on the theoretical side of deep learning and who currently works at a venture capital firm called Amplify Partners who kindly provided some of the funding for this video. So Lisha one thing I think we should quickly bring up is this sigmoid function. As I understand it early networks use this to squish the relevant weighted sum into that interval between zero and one, you know kind of motivated by this biological analogy of neurons either being inactive or active. Exactly.\n\nchallenge of recognizing digits? And how does it take on that challenge? How does this network learn the appropriate weights and biases just by looking at data? Well that's what I'll show in the next video, and I'll also dig a little more into wh

In [76]:
main_chain = parallel_chain | prompt | llm | parser 

In [77]:
main_chain.invoke("Can you summerize this video?")

'The video introduces the concept of neural networks, specifically focusing on their structure and how they can learn to recognize handwritten digits. It features Lisha Li, who has a background in deep learning and works at a venture capital firm. The video discusses the sigmoid function used in early networks to map outputs between zero and one, drawing an analogy to biological neurons. The presenter plans to explain how a neural network learns appropriate weights and biases from data in the next video. The aim is to provide a foundational understanding of neural networks, preparing viewers for more complex variants in the future. The video also mentions resources for further learning and encourages viewers to subscribe for updates.'