### YouTube Chatbot using RAG

**Workflow Overview**

1. **Transcript Extraction**
  - Get transcript from YouTube video using either:
    - LangChain YouTube loader
    - YouTube API

2. **Text Splitting**
  - Divide transcript into manageable chunks.

3. **Embedding & Vector Store**
  - Generate embeddings for each chunk.
  - Store embeddings in a vector database.

4. **Retrieval**
  - User sends a query.
  - Query is embedded and a semantic search is performed in the vector store.

5. **Prompt Construction**
  - Merge retrieved chunks.
  - Create a prompt using the retrieved context and user query.

6. **LLM Response**
  - Send the prompt to a language model (LLM).
  - Return the generated response to the user.
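
The steps above can be sketched end to end with nothing but the standard library. This is only a toy illustration: a bag-of-words counter stands in for a real embedding model, a plain list stands in for the vector store, the chunk texts are made up, and the LLM call is left out. All names (`toy_embed`, `toy_retrieve`, `toy_build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def toy_cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def toy_retrieve(query, store, k=2):
    """Return the k chunks whose vectors are most similar to the query vector."""
    q = toy_embed(query)
    ranked = sorted(store, key=lambda item: toy_cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def toy_build_prompt(context_chunks, question):
    """Merge the retrieved chunks and the user question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return f"Answer ONLY from the context below.\n\n{context}\n\nQuestion: {question}"

# Steps 2-3: chunks (invented for this example) are embedded and stored
toy_chunks = [
    "The speaker explains how neural networks learn from data.",
    "Later the talk turns to protein folding research.",
    "The host asks about favourite books and hobbies.",
]
toy_store = [(chunk, toy_embed(chunk)) for chunk in toy_chunks]

# Steps 4-5: embed the query, search, and build the prompt
toy_question = "How do neural networks learn?"
toy_top = toy_retrieve(toy_question, toy_store, k=1)
toy_prompt = toy_build_prompt(toy_top, toy_question)
print(toy_prompt)
```

In the notebook itself these roles are played by `OllamaEmbeddings`, `FAISS`, the retriever, and `PromptTemplate`.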

#### Installation of libraries


In [None]:
!pip install -q youtube-transcript-api langchain-core langchain-community langchain-text-splitters langchain-ollama faiss-cpu tiktoken python-dotenv

In [None]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_community.vectorstores import FAISS

#### Step 1.1: Indexing - Document ingestion

In [None]:
video_id = "uhWzVdGmX2w"
try:
    yt_api = YouTubeTranscriptApi()
    transcript_list = yt_api.list(video_id=video_id)
    transcript = transcript_list.find_generated_transcript(['en'])
    print(transcript.fetch())
except TranscriptsDisabled:
    print("No captions available for this video")
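
The cell above passes a bare video ID. If you start from a full link instead, a small helper along these lines could pull the ID out first. This is a sketch under assumptions: `extract_video_id` is a hypothetical name, it only handles the common `watch?v=` and `youtu.be/` forms, and it expects URLs to include the scheme (`https://`).

```python
from urllib.parse import urlparse, parse_qs

# Hosts treated as YouTube watch pages (an assumption; extend as needed)
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}

def extract_video_id(url_or_id):
    """Return the video ID from a watch URL, a youtu.be short link, or a bare ID."""
    parsed = urlparse(url_or_id)
    host = (parsed.hostname or "").lower()
    if not host:
        return url_or_id  # no host part, so assume a bare ID was passed
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0]
    if host in YOUTUBE_HOSTS:
        ids = parse_qs(parsed.query).get("v")
        if ids:
            return ids[0]
    raise ValueError(f"Could not find a video ID in: {url_or_id}")

print(extract_video_id("https://www.youtube.com/watch?v=uhWzVdGmX2w"))
```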

#### Step 1.2: Text splitting

In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# Extract text from snippets
fetched = transcript.fetch()
full_text = " ".join([snippet.text for snippet in fetched.snippets])

# Now split
chunks = splitter.create_documents([full_text])
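
To build intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified sliding-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraph breaks, newlines, and spaces); it only shows how consecutive chunks share characters. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return pieces

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for piece in pieces:
    print(piece)
```

Each piece begins with the last four characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.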

In [None]:
# Use only the first 1000 characters for quick testing
test_text = full_text[:1000]
test_chunks = splitter.create_documents([test_text])
test_embeddings = OllamaEmbeddings(model="llama3:latest")
test_vector_store = FAISS.from_documents(test_chunks, test_embeddings)
print(f"Number of test chunks: {len(test_chunks)}")

In [None]:
len(chunks)

In [None]:
chunks[22]

#### Steps 1.3 & 1.4: Embedding generation and storing in the vector store

In [None]:
embeddings = OllamaEmbeddings(model="llama3:latest")
vector_store = FAISS.from_documents(chunks, embeddings)

In [None]:
vector_store.index_to_docstore_id

In [None]:
# IDs are regenerated on every run, so look one up rather than hard-coding a UUID
vector_store.get_by_ids([vector_store.index_to_docstore_id[0]])

#### Step 2: Retrieval

In [None]:
query = "what is deepmind"
retriever = vector_store.as_retriever(
  search_type="similarity",
  search_kwargs={"k": 4}  # return the 4 most similar chunks
)

In [None]:
retriever.invoke(query)

#### Step 3: Augmentation

In [None]:
llm = ChatOllama(model="llama3:latest", temperature=0.2)

In [None]:
from langchain_core.prompts import PromptTemplate
prompt = PromptTemplate(
  template="""
  You are a helpful AI assistant.
  Answer ONLY from the provided transcript context.
  If the context is insufficient, just say "I don't know".

  Context:
  {context}

  Question: {question}
  """,
  input_variables=["context", "question"]
)


In [None]:
question = "Is the topic of aliens discussed in this video? If yes, what was discussed?"
retrieved_docs = retriever.invoke(question)

In [None]:
retrieved_docs

In [None]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)

In [None]:
final_prompt = prompt.invoke({"context":context_text,"question":question})

#### Step 4: Generation

In [None]:
answer = llm.invoke(final_prompt)
print(answer.content)

#### Same steps using Chains

In [None]:
from langchain_core.runnables import RunnablePassthrough, RunnableLambda, RunnableParallel
from langchain_core.output_parsers import StrOutputParser

In [None]:
def format_docs(retrieved_docs):
  context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
  return context_text

In [None]:
parallel_chain = RunnableParallel({
  'context': retriever | RunnableLambda(format_docs),
  'question': RunnablePassthrough(),
})

In [None]:
parallel_chain.invoke("who is robert greene")


In [None]:
parser = StrOutputParser()
main_chain = parallel_chain | prompt | llm | parser

In [None]:
main_chain.invoke("summarize everything about human nature from the video")
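
For intuition about what `parallel_chain | prompt | llm | parser` does, the following toy re-creation shows the data flow with plain Python. It is a sketch of the idea only, not LangChain's implementation: `Step`, `parallel`, and the fake retriever and model are all hypothetical stand-ins.

```python
class Step:
    """Wraps a function so steps can be composed with | and run with .invoke()."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Run this step first, then feed its output into the next step
        return Step(lambda value: other.invoke(self.invoke(value)))

def parallel(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})

fake_retriever = Step(lambda q: ["chunk about " + q])  # stand-in for the retriever
join_docs = Step(lambda docs: "\n\n".join(docs))       # plays the role of format_docs
passthrough = Step(lambda value: value)                # plays the role of RunnablePassthrough
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda p: p.upper())                   # stand-in for the model

toy_chain = parallel(context=fake_retriever | join_docs, question=passthrough) | fill_prompt | fake_llm
toy_result = toy_chain.invoke("deepmind")
print(toy_result)
```

The same question string reaches both branches: one turns it into context via retrieval, the other passes it through unchanged, and the resulting dict fills the prompt.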

### Improvements for this Application

1. **UI Enhancements**
  - Streamlit-based interface
  - Chrome extension/plugin

2. **Evaluation**
  - Ragas
  - LangSmith

3. **Indexing**
  - Document ingestion for multiple languages
  - Semantic text splitting
  - Cloud-based vector store (e.g., Pinecone)

4. **Retrieval**
  - **Pre-Retrieval**
    - Query rewriting using LLM
    - Multi-query generation
    - Domain-aware routing (complex RAG systems)
  - **During Retrieval**
    - MMR (Maximal Marginal Relevance)
    - Hybrid retrieval
    - Re-ranking
  - **Post-Retrieval**
    - Contextual compression (retain only meaningful parts)

5. **Augmentation**
  - Prompt templating
  - Answer grounding
  - Context window optimization

6. **Generation**
  - Answers with citations
  - Guardrails

7. **System Design**
  - Multimodal RAG system
  - Agentic workflows
  - Memory-based architecture
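
As one concrete example from the list above, MMR (Maximal Marginal Relevance) picks chunks that are relevant to the query but not redundant with chunks already picked. The sketch below is a minimal pure-Python version on made-up 2-D vectors. The function names and the `lambda_mult` weighting (1.0 means pure relevance, lower values favour diversity) are assumptions for illustration.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Return the indices of k documents chosen by Maximal Marginal Relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine_sim(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet)
            redundancy = max((cosine_sim(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

mmr_query = [1.0, 0.0]
mmr_docs = [
    [1.0, 0.1],   # 0: very relevant
    [1.0, 0.12],  # 1: near-duplicate of 0
    [0.6, 0.8],   # 2: less relevant, but different
]
diverse = mmr(mmr_query, mmr_docs, k=2, lambda_mult=0.3)
relevance_only = mmr(mmr_query, mmr_docs, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With a low `lambda_mult` the near-duplicate is skipped in favour of the different chunk. In LangChain, passing `search_type="mmr"` to `as_retriever` should give comparable behaviour without custom code; check the vector store documentation for the exact keyword arguments.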