# 🎬 YouTube Video Chatbot

A Retrieval-Augmented Generation (RAG) chatbot that answers questions about a YouTube video using only its transcript.

## 📋 Overview

This notebook demonstrates how to:
1. **Extract transcripts** from YouTube videos
2. **Split text** into manageable chunks
3. **Create embeddings** using HuggingFace models
4. **Store vectors** in FAISS for similarity search
5. **Answer questions** using Groq LLM with retrieved context

## 🛠️ Tech Stack

| Component | Library |
|-----------|---------|
| Transcript Extraction | `youtube-transcript-api` |
| Text Splitting | `langchain-text-splitters` |
| Embeddings | `sentence-transformers/all-MiniLM-L6-v2` |
| Vector Store | `FAISS` |
| LLM | `Groq (openai/gpt-oss-120b)` |
| Framework | `LangChain` |

## 📦 Requirements

```bash
pip install youtube-transcript-api langchain langchain-text-splitters langchain-groq langchain-huggingface langchain-community sentence-transformers faiss-cpu
```

## 🔑 Environment Variables

Set your Groq API key before running the notebook; `ChatGroq` reads it from the `GROQ_API_KEY` environment variable:
```bash
export GROQ_API_KEY="your_api_key_here"
```
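
As a minimal sketch, the key can also be checked from Python before the LLM client is created, so that a missing key fails early with a clear message. The helper name `require_env` is hypothetical and not part of the notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the notebook")
    return value


# Example usage (commented out so the snippet runs without a real key):
# groq_api_key = require_env("GROQ_API_KEY")
```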

## 🚀 Workflow

```
YouTube URL → Extract Video ID → Fetch Transcript → Split into Chunks
     ↓
Create Embeddings → Store in FAISS → Retrieve Relevant Chunks
     ↓
Build Prompt with Context → Query LLM → Get Answer
```

---

In [2]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_groq import ChatGroq
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
import re



In [3]:

def extract_video_id(url):
    """Extract video ID from various YouTube URL formats"""
    patterns = [
        r'(?:v=|\/)([0-9A-Za-z_-]{11}).*',
        r'(?:embed\/)([0-9A-Za-z_-]{11})',
        r'(?:watch\?v=)([0-9A-Za-z_-]{11})',
        r'youtu\.be\/([0-9A-Za-z_-]{11})',
    ]
    
    for pattern in patterns:
        match = re.search(pattern, url)
        if match:
            return match.group(1)
    return None
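
The regex search above accepts any 11-character run that follows a `/`, so a non-video URL such as a feed page can yield a bogus ID. A stricter alternative, sketched here under the assumption that only the `watch?v=`, `embed/`, and `youtu.be/` forms named in the patterns need to be supported, parses the URL and checks the host first. The name `extract_video_id_strict` is hypothetical, and the URL must include a scheme such as `https://`.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

VIDEO_ID_RE = re.compile(r"^[0-9A-Za-z_-]{11}$")


def extract_video_id_strict(url: str) -> Optional[str]:
    """Return the 11-character video ID for watch, embed, and youtu.be URLs, else None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # Embed links: https://www.youtube.com/embed/<id>
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] == "embed":
                candidate = parts[1]

    if candidate and VIDEO_ID_RE.match(candidate):
        return candidate
    return None
```

Unlike the regex version, this returns `None` for look-alike URLs on other hosts and for YouTube pages that are not videos.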


In [4]:
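input_url = input("Enter the YouTube URL: ")
video_id1 = extract_video_id(input_url)
if video_id1 is None:
    raise ValueError(f"Could not extract a video ID from: {input_url}")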

In [37]:
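# Fetch the transcript; each snippet carries its text, start time, and duration in seconds
ytt_api = YouTubeTranscriptApi()
try:
    transcript_data = ytt_api.fetch(video_id1)
except TranscriptsDisabled:
    raise RuntimeError(f"Transcripts are disabled for video {video_id1}")
transcript = [
    {"text": snippet.text, "start": snippet.start, "duration": snippet.duration}
    for snippet in transcript_data
]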



In [38]:
transcript

[{'text': 'I think if people are like I want to',
  'start': 0.04,
  'duration': 3.879},
 {'text': 'accomplish this goal the very next',
  'start': 1.68,
  'duration': 4.719},
 {'text': 'question is why and then go why why why',
  'start': 3.919,
  'duration': 4.64},
 {'text': 'why why if your why is so strong how',
  'start': 6.399,
  'duration': 4.12},
 {'text': "it's going to be done it doesn't matter",
  'start': 8.559,
  'duration': 3.761},
 {'text': "you'll figure out you've got a book out",
  'start': 10.519,
  'duration': 4.2},
 {'text': 'called level up rub dial from the',
  'start': 12.32,
  'duration': 5.2},
 {'text': 'mindset Mentor podcast not everything',
  'start': 14.719,
  'duration': 3.841},
 {'text': "that's happened to you in your life is",
  'start': 17.52,
  'duration': 2.56},
 {'text': 'your fault but whatever it is that you',
  'start': 18.56,
  'duration': 2.959},
 {'text': 'have it is your responsibility to work',
  'start': 20.08,
  'duration': 2.359},
 {'tex

In [40]:
# Join all transcript text into a single string
full_text = " ".join([snippet["text"] for snippet in transcript])
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([full_text])
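
To make `chunk_size` and `chunk_overlap` concrete, the sketch below uses a simplified fixed-width sliding window. It is only an illustration: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph breaks, newlines, and spaces, so its actual chunk boundaries will differ. The function name `sliding_window_chunks` is hypothetical.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters, each sharing chunk_overlap characters with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


demo = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.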

In [42]:
len(chunks)

99

In [44]:
from langchain_huggingface import HuggingFaceEmbeddings
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

vector_store = FAISS.from_documents(chunks, embedding)

In [45]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 4})
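
Conceptually, a `similarity` retriever with `k=4` embeds the query and returns the four stored chunks whose vectors lie closest to it. The pure-Python sketch below illustrates that idea with toy 2-D vectors and Euclidean distance. The distance metric is an assumption about the vector store's default configuration, the vectors are made up, and the name `top_k_nearest` is hypothetical; FAISS itself uses optimized index structures rather than this linear scan.

```python
import math
from typing import List, Sequence, Tuple


def top_k_nearest(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k vectors closest to the query."""
    scored = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy "embeddings": index 1 equals the query, index 3 is close to it
toy_vectors = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
nearest = top_k_nearest((1.0, 1.0), toy_vectors, k=2)
print([index for index, _ in nearest])  # [1, 3]
```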

In [46]:
retriever.invoke('What is deepmind')

[Document(id='19006bcd-559d-4115-9f79-efb6d47630aa', metadata={}, page_content="is that we can project our minds into a future that doesn't exist yet and so what we can do right now is we can sit here you know we're in this studio everything is amazing we have food Water Shelter clothing everything but if I'm like as a lot of people do like if I'm starting business right I can sit here and I can think well I really want to start this business but then I start playing out all of the things that could go wrong which is what we tend to do not what goes right but what could go wrong when I project my mind into that future the crazy part is that my mind and my body are constantly connected so my my brain is sending neuropeptides to my body and creating hormones as if that moment is actually happening right now and so what happens is we tend to as humans as a protection mechanism we tend to project oursel into a future that is fearful that is scary and that fearful scary feeling usually Keep

In [47]:
llm = ChatGroq(model="openai/gpt-oss-120b")
prompt = PromptTemplate(
    template="""
      You are a helpful assistant.
      Answer ONLY from the provided transcript context.
      If the context is insufficient, just say you don't know.

      {context}
      Question: {question}
    """,
    input_variables = ['context', 'question']
)
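
`PromptTemplate` uses f-string-style placeholders by default, so the text the model will receive can be previewed with plain `str.format`. The sketch below does that with the same instructions as the template above (indentation removed); the sample context and question are made up, and `render_prompt` is a hypothetical helper.

```python
TEMPLATE = """
You are a helpful assistant.
Answer ONLY from the provided transcript context.
If the context is insufficient, just say you don't know.

{context}
Question: {question}
"""


def render_prompt(context: str, question: str) -> str:
    """Fill the two placeholders and trim the surrounding blank lines."""
    return TEMPLATE.format(context=context, question=question).strip()


preview = render_prompt(
    context="(retrieved transcript chunks would go here)",
    question="What is the video about?",
)
print(preview)
```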

In [48]:
question = "is the topic of nuclear fusion discussed in this video? if yes then what was discussed"
retrieved_docs = retriever.invoke(question)

In [49]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

"didn't see this and that's where they all are so it's like a lot of times we're trying to force things to happen we're trying to force creativity we're trying to force ideas but sometimes when I get there I'm like oh man yeah this is the thing I've been searching for and it's just I think every person just needs to give themsel just a little bit more space to not be on their phone to not be constantly you know and I a lot of people have children their their children are up can you wake up a little bit earlier not be on your phone and just sit in silence and see what comes up and and I think that it's really important for people because um what you'll notice is exactly what you're saying which is what we do like this life is so important but it is very insignificant and that's okay there's nothing wrong with that like I've I love watching videos on YouTube about how small the Earth is compared to the rest of the universe and it stresses my wife out she's like I'm going to have a panic\

In [50]:
final_prompt = prompt.invoke({"context": context_text, "question": question})

In [51]:
answer = llm.invoke(final_prompt)
print(answer.content)

I don't know.
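
The retrieve, build-prompt, and query steps in the cells above can be wrapped in one function so that each new question is a single call. This is a minimal sketch: `answer_question` is a hypothetical name, and it assumes only that the retriever, prompt, and LLM objects expose the `.invoke` methods used above.

```python
def answer_question(question, retriever, prompt, llm):
    """Retrieve context for a question, build the prompt, and return the LLM's text answer."""
    docs = retriever.invoke(question)
    # Separate chunks with a blank line, as in the cells above
    context_text = "\n\n".join(doc.page_content for doc in docs)
    final_prompt = prompt.invoke({"context": context_text, "question": question})
    return llm.invoke(final_prompt).content


# Example usage with the objects defined in the notebook:
# print(answer_question("What is the main message of the video?", retriever, prompt, llm))
```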
