### YouTube Chatbot using RAG

**Workflow Overview**

1. **Transcript Extraction**
  - Get transcript from YouTube video using either:
    - LangChain YouTube loader
    - YouTube API

2. **Text Splitting**
  - Divide transcript into manageable chunks.

3. **Embedding & Vector Store**
  - Generate embeddings for each chunk.
  - Store embeddings in a vector database.

4. **Retrieval**
  - User sends a query.
  - Query is embedded and a semantic search is performed in the vector store.

5. **Prompt Construction**
  - Merge retrieved chunks.
  - Create a prompt using the retrieved context and user query.

6. **LLM Response**
  - Send the prompt to a language model (LLM).
  - Return the generated response to the user.
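
The six steps above can be sketched end to end in plain Python. This is only a toy illustration of the data flow: the word-count `embed` function stands in for a real embedding model, the list `store` stands in for a vector database, the sample text is made up, and the LLM call is omitted. All function names here are hypothetical helpers, not part of LangChain.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def split_text(text, chunk_size):
    """Step 2: split into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def retrieve(query, store, k=2):
    """Step 4: embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(context_chunks, question):
    """Step 5: merge retrieved chunks and the question into one prompt."""
    context = "\n\n".join(context_chunks)
    return f"Answer ONLY from the context.\n\nContext:\n{context}\n\nQuestion: {question}"

# Step 1 stand-in: a made-up sample transcript instead of a real download.
sample_transcript = (
    "we are all social creatures and knowing how people behave helps us "
    "the shadow is the dark side of our personality that we repress "
    "envy appears when we compare ourselves to friends and peers"
)
toy_chunks = split_text(sample_transcript, chunk_size=12)
store = [(chunk, embed(chunk)) for chunk in toy_chunks]  # Step 3
top = retrieve("what is the shadow", store, k=1)
toy_prompt = build_prompt(top, "what is the shadow")
print(toy_prompt)  # Step 6 would send this prompt to an LLM
```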

#### Installation of libraries


In [73]:
!pip install -q youtube-transcript-api langchain-core langchain-community langchain-text-splitters langchain-ollama faiss-cpu tiktoken python-dotenv

In [74]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_community.vectorstores import FAISS  # FAISS lives in langchain-community

#### Step 1.1: Indexing - Document ingestion

In [75]:
video_id = "uhWzVdGmX2w"
try:
    yt_api = YouTubeTranscriptApi()
    transcript_list = yt_api.list(video_id=video_id)
    transcript = transcript_list.find_generated_transcript(['en'])
    print(transcript.fetch())
except TranscriptsDisabled:
    print("No captions available for this video")

FetchedTranscript(snippets=[FetchedTranscriptSnippet(text='the laws of human nature by robert', start=2.0, duration=2.319), FetchedTranscriptSnippet(text='greene', start=3.6, duration=2.0), FetchedTranscriptSnippet(text="we're going to be doing a detailed", start=4.319, duration=3.121), FetchedTranscriptSnippet(text='breakdown of this book and by the end of', start=5.6, duration=3.52), FetchedTranscriptSnippet(text="this video you're going to have a super", start=7.44, duration=3.279), FetchedTranscriptSnippet(text='clear understanding of how you can use', start=9.12, duration=3.04), FetchedTranscriptSnippet(text='the laws of human nature', start=10.719, duration=3.84), FetchedTranscriptSnippet(text='to live a better life and avoid toxic', start=12.16, duration=3.599), FetchedTranscriptSnippet(text='relationships', start=14.559, duration=3.201), FetchedTranscriptSnippet(text='we are all social creatures and knowing', start=15.759, duration=3.761), FetchedTranscriptSnippet(text='why peo
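
The cell above hard-codes `video_id`. If the chatbot is meant to accept a pasted link, a small helper can pull the ID out of common URL shapes first. This is a sketch under assumptions: `extract_video_id` is a hypothetical helper (not part of `youtube-transcript-api`), and it only covers the `watch?v=`, `youtu.be/`, `/embed/` and `/shorts/` forms.

```python
from urllib.parse import urlparse, parse_qs

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}

def extract_video_id(url_or_id):
    """Return the video ID from a YouTube URL, or the input itself if it is not a URL."""
    parsed = urlparse(url_or_id)
    if not parsed.scheme:
        return url_or_id  # assume a bare video ID was passed
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0]
    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v")
            if ids:
                return ids[0]
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0]
    raise ValueError(f"Could not find a video ID in: {url_or_id}")

print(extract_video_id("https://www.youtube.com/watch?v=uhWzVdGmX2w&t=10s"))
```

The returned string can then be passed as `video_id` to the transcript call.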

#### Step 1.2: Text splitting

In [76]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# Extract text from snippets
fetched = transcript.fetch()
full_text = " ".join([snippet.text for snippet in fetched.snippets])

# Now split
chunks = splitter.create_documents([full_text])
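
To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified fixed-width sliding window. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break at separators such as newlines and spaces, so its chunks are at most `chunk_size` long rather than exactly that long); `sliding_window_chunks` is a hypothetical helper for illustration only.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-width windows where consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return windows

demo_text = "abcdefghijklmnopqrstuvwxyz"
demo_windows = sliding_window_chunks(demo_text, chunk_size=10, chunk_overlap=4)
for window in demo_windows:
    print(window)
# abcdefghij
# ghijklmnop
# mnopqrstuv
# stuvwxyz
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.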

In [None]:
# Use only the first 1000 characters for quick testing
test_text = full_text[:1000]
test_chunks = splitter.create_documents([test_text])
test_embeddings = OllamaEmbeddings(model="llama3:latest")
test_vector_store = FAISS.from_documents(test_chunks, test_embeddings)
print(f"Number of test chunks: {len(test_chunks)}")

In [None]:
len(chunks)

46

In [None]:
chunks[22]

Document(metadata={}, page_content="of desirability here are three strategies for stimulating desire one know when and how to withdraw be a little cold don't be needy never be too obvious with your opinions feelings values and tastes you want to allow for others to create their own picture of you in their imagination know when to withdraw maybe for a day maybe for a week create some sense of mystery around yourself two create rivalries of desire what we want almost always reflects what others want the man who has been single for five years gets into a relationship and all of a sudden has two other women interested in him or the child who wants his brother's new toy creating the impression that you or your work are desired by others will attract others in negotiation tactics always have another party that is also interested to create a rivalry three use induction associate yourself with something slightly illicit unconventional or defiant almost all people desire voyeurism which is seei

#### Step 1.3 & 1.4: Embedding generation and storing in vector store

In [None]:
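# Note: llama3 is a chat model; a dedicated embedding model pulled in Ollama
# (for example nomic-embed-text) may give better retrieval quality.
embeddings = OllamaEmbeddings(model="llama3:latest")
vector_store = FAISS.from_documents(chunks, embeddings)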

In [None]:
vector_store.index_to_docstore_id

{0: '35becfc6-7acd-429c-b4b7-df1764803ee5',
 1: 'fe0a3f2f-3301-41c3-bd71-f7b5ead3a0a6',
 2: 'dfcd0cb2-626a-4bf7-974f-baa7e5254fa3',
 3: '79917163-ea73-4b95-bd6c-93409decff88',
 4: '666e34c9-f8ad-45cc-9617-e9467f43ffbe',
 5: 'f498a409-edbd-4e1b-9773-1f8dfa2946d6',
 6: '5eddd65c-0a97-4a6b-8866-6133031e1231',
 7: '34a50c99-fc17-471b-babc-a83d7599bfec',
 8: '22558d02-5091-4e5b-8c6c-cb1ed7603c75',
 9: '2c7c618b-19f4-4caf-aa1e-5d66d3cce113',
 10: 'f9c76145-293b-48cc-ac14-6150c21c7987',
 11: '57d644b2-da0f-423d-a896-dba524234cdb',
 12: '9e4182d1-3afc-4442-95b0-22b89bbcc388',
 13: 'f7778f03-6c53-42eb-a77d-6ccc27c43b51',
 14: '5b72bac7-7c19-4bef-8f05-b1baecd57d7e',
 15: 'adf3ae2f-e464-46ed-9d06-72f265ae34e3',
 16: '3fb29580-6cf9-42cd-9e4b-f9a3e7fd6fcc',
 17: '9918068b-ba87-43f5-bc5f-dc8b6a1cf6fc',
 18: 'a7336de3-831c-40a9-b10c-199035e295ea',
 19: '8b770825-9108-4136-b1c6-f8cc0a7e15b3',
 20: 'f46ff19c-187e-4885-945d-d3bab0e5a32c',
 21: '5f9b62bb-f390-4a0c-90a0-07f10cf51190',
 22: 'e24a872b-20d4-

In [None]:
vector_store.get_by_ids(['5c82ad3d-8155-4479-a409-e5f2c1ec5982'])

[Document(id='5c82ad3d-8155-4479-a409-e5f2c1ec5982', metadata={}, page_content="you and finally don't be afraid to use the shadow at times to offend and even hurt people that have bad intentions or unjustly criticize you show your shadow proudly the more you repress the shadow the darker and uglier it becomes thank you for making it to the end of the video let me know in the comments below which law you found the most helpful and if you guys don't want to miss out on more videos like this one in the future please hit the like and subscribe button because it really helps out also this was only 9 of the 18 laws in the book so if you would like me to do the other laws which would be laws 10 to 18 please let me know down below so i can get an idea of if these laws were even helpful to you guys or not so thank you so much have a good day")]
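
Conceptually, the store keeps one vector per chunk plus a position-to-ID mapping like `index_to_docstore_id`, and a search compares the query vector with every stored vector. The sketch below shows that idea as a brute-force L2 search in plain Python. It assumes an exhaustive (flat) index, which to my understanding is what LangChain's FAISS wrapper builds by default; the vectors and `doc-n` IDs are made up, and real FAISS is far faster and may report squared distances.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flat_search(query_vec, vectors, k):
    """Return (position, distance) pairs for the k nearest vectors, closest first."""
    scored = [(i, l2_distance(query_vec, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Made-up 2-D "embeddings"; real embeddings have hundreds or thousands of dimensions.
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
toy_index_to_id = {i: f"doc-{i}" for i in range(len(toy_vectors))}

hits = flat_search([0.9, 0.1], toy_vectors, k=2)
print([(toy_index_to_id[i], round(d, 3)) for i, d in hits])
# [('doc-1', 0.141), ('doc-0', 0.906)]
```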

#### Step 2: Retrieval

In [None]:
query = "what is deepmind"
retriever = vector_store.as_retriever(
  search_type="similarity",
  search_kwargs={"k": 4}
)

In [None]:
retriever.invoke(query)

[Document(id='5b72bac7-7c19-4bef-8f05-b1baecd57d7e', metadata={}, page_content="schopenhauer if you want to present yourself in the most optimal way and become a master at reading others practice these three skills observational skills one when you are having a conversation with someone pay close attention to micro these are split-second expressions such as forced smiles changes in the tone of voice or body language which gives mixed signals 2. get to know how someone operates in normal situations or at their baseline then compare that to how they operate under conditions of stress or excitement three people watch take some time to watch how people behave in situations take notes and also observe yourself remember the cracks in the mask we spoke about earlier those flashes of micro expressions people can try to prevent these cracks from leaking and showing their true feelings but that would be futile as they are for the most part completely unconscious and uncontrollable in order to se
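
Note that the `what is deepmind` query still returns four chunks even though the video never mentions DeepMind: a plain similarity search always returns the `k` nearest chunks, however far away they are. One way to guard against this is to look at the distances and drop weak matches. The sketch below shows only the filtering logic in plain Python. The `(text, distance)` pairs and the threshold are made-up values; with the real store, scores could likely be obtained from `vector_store.similarity_search_with_score(query, k=4)`, and a sensible threshold has to be tuned per embedding model.

```python
def filter_by_distance(scored_chunks, max_distance):
    """Keep (text, distance) pairs whose distance is at most max_distance (lower is closer)."""
    return [(text, dist) for text, dist in scored_chunks if dist <= max_distance]

# Hypothetical (chunk text, L2 distance) pairs; the numbers are invented for illustration.
scored = [
    ("practice these three observational skills", 1.42),
    ("the complete control narcissist", 1.47),
    ("show your shadow proudly", 1.51),
]

relevant = filter_by_distance(scored, max_distance=1.0)
if not relevant:
    print("No sufficiently similar chunks; reply \"I don't know\" without calling the LLM.")
```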

#### Step 3: Augmentation

In [None]:
llm = ChatOllama(model="llama3:latest", temperature=0.2)

In [None]:
from langchain_core.prompts import PromptTemplate
prompt = PromptTemplate(
  template="""
  You are a helpful AI assistant.
  Answer ONLY from the provided transcript context.
  If the context is insufficient, just say "I don't know".

  Context:
  {context}

  Question: {question}
  """,
  input_variables=["context", "question"]
  )


In [None]:
question = "Is the topic of aliens discussed in this video? If yes, what was discussed?"
retrieved_docs = retriever.invoke(question)

In [None]:
retrieved_docs

[Document(id='5c82ad3d-8155-4479-a409-e5f2c1ec5982', metadata={}, page_content="you and finally don't be afraid to use the shadow at times to offend and even hurt people that have bad intentions or unjustly criticize you show your shadow proudly the more you repress the shadow the darker and uglier it becomes thank you for making it to the end of the video let me know in the comments below which law you found the most helpful and if you guys don't want to miss out on more videos like this one in the future please hit the like and subscribe button because it really helps out also this was only 9 of the 18 laws in the book so if you would like me to do the other laws which would be laws 10 to 18 please let me know down below so i can get an idea of if these laws were even helpful to you guys or not so thank you so much have a good day"),
 Document(id='f46ff19c-187e-4885-945d-d3bab0e5a32c', metadata={}, page_content="given situation the big talker someone who is always talking a big game 

In [None]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)

In [None]:
final_prompt = prompt.invoke({"context":context_text,"question":question})
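
Joining the chunks with blank lines works, but the model can no longer tell where one chunk ends and the next begins. A small variation is to label each chunk so the answer can point back to its sources. This is a sketch, not part of the notebook's pipeline: `format_context_with_ids` is a hypothetical helper, the sample strings are shortened stand-ins for `doc.page_content`, and whether the model actually cites the labels depends on the model and prompt.

```python
def format_context_with_ids(texts):
    """Join chunk texts, prefixing each with a [n] label the model can cite."""
    return "\n\n".join(f"[{i}] {text}" for i, text in enumerate(texts, start=1))

# Shortened stand-ins for the page_content of retrieved documents.
sample_texts = [
    "show your shadow proudly the more you repress the shadow the darker it becomes",
    "the big talker someone who is always talking a big game",
]

labeled_context = format_context_with_ids(sample_texts)
cited_prompt = (
    "Answer ONLY from the provided transcript context and cite the [n] labels you used.\n\n"
    f"Context:\n{labeled_context}\n\n"
    "Question: what is the shadow?"
)
print(cited_prompt)
```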

#### Step 4: Generation

In [None]:
answer = llm.invoke(final_prompt)
print(answer.content)

I don't know. The transcript does not mention aliens at all. It appears to be discussing various personality types or "toxic characters" and providing advice on how to deal with them.


#### Same steps using Chains

In [None]:
from langchain_core.runnables import RunnablePassthrough, RunnableLambda, RunnableParallel
from langchain_core.output_parsers import StrOutputParser

In [None]:
def format_docs(retrieved_docs):
  context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
  return context_text

In [None]:
parallel_chain = RunnableParallel({
  'context': retriever | RunnableLambda(format_docs),
  'question': RunnablePassthrough(),
})

In [None]:
parallel_chain.invoke("who is robert greene")


{'context': "of narcissist the complete control narcissist this type of narcissist in general has more ambition and higher energy levels than of an average deep narcissist as well as higher levels of insecurities control narcissists are hypersensitive to any criticism but they are good listeners so you need to be extra careful because they can mimic empathy well they do not do anything with the intention of connecting with people but rather to control and manipulate them the theoretical narcissists are masters of disguise they can play many different roles they will do anything to seem moral and altruistic and they love reveling in their status as the victim everything they do is for others to see and gain attention from the healthy narcissist healthy narcissists are excellent mood readers and can pick up on body language and tone of voice cues a healthy narcissist has high levels of optimism and confidence which they can use to lead a team and boost morale law three the law of role-pl
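
The output above is a dict because `RunnableParallel` passes the same input to every branch and collects each result under its key. The plain-Python analogue below mimics only that data flow. It is not how LangChain implements it (the real class can run branches concurrently and supports streaming and async), and `run_parallel`, `fake_retriever` and `format_texts` are hypothetical stand-ins.

```python
def run_parallel(branches, value):
    """Feed the same input to every branch and collect the results under each key."""
    return {key: fn(value) for key, fn in branches.items()}

def fake_retriever(query):
    """Stand-in for the real retriever: returns canned chunk texts."""
    return ["chunk about " + query, "another chunk"]

def format_texts(texts):
    """Same role as format_docs, but for plain strings."""
    return "\n\n".join(texts)

branches = {
    "context": lambda q: format_texts(fake_retriever(q)),  # retriever | format_docs
    "question": lambda q: q,                               # plays the role of RunnablePassthrough
}

result = run_parallel(branches, "who is robert greene")
print(result)
```

The resulting dict has exactly the keys the prompt template expects, which is why it can be piped straight into the prompt.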

In [None]:
parser = StrOutputParser()
main_chain = parallel_chain | prompt | llm | parser

In [None]:
main_chain.invoke("summarize everything about human nature from the video")

'Based on the provided transcript, here\'s a summary of what can be inferred about human nature:\n\n* Humans have depressive moments, and it\'s essential to channel that energy into work or creative activities.\n* Some people hold onto grudges and resentments, which can lead to vengeful behavior. It\'s crucial to learn to let go of these emotions.\n* People with a resentful attitude tend to stew on their emotions rather than expressing them openly.\n* Humans have a "shadow" aspect that can be used to defend against criticism or unjust treatment. Repressing this shadow can make it darker and more malevolent.\n* There are different types of narcissists, including the complete control narcissist, which has higher energy levels, ambition, and insecurities than average deep narcissists.\n* Humans have a natural tendency to form connections with others, but some people may use manipulation or control to achieve this connection.\n* Empathy is a crucial aspect of human nature, allowing us to u

### Improvements for this Application

1. **UI Enhancements**
  - Streamlit-based interface
  - Chrome extension/plugin

2. **Evaluation**
  - Ragas
  - LangSmith

3. **Indexing**
  - Document ingestion for multiple languages
  - Semantic text splitting
  - Cloud-based vector store (e.g., Pinecone)

4. **Retrieval**
  - **Pre-Retrieval**
    - Query rewriting using LLM
    - Multi-query generation
    - Domain-aware routing (complex RAG systems)
  - **During Retrieval**
    - MMR (Maximal Marginal Relevance)
    - Hybrid retrieval
    - Re-ranking
  - **Post-Retrieval**
    - Contextual compression (retain only meaningful parts)

5. **Augmentation**
  - Prompt templating
  - Answer grounding
  - Context window optimization

6. **Generation**
  - Answers with citations
  - Guardrails

7. **System Design**
  - Multimodal RAG system
  - Agentic workflows
  - Memory-based architecture
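
As one concrete example from the retrieval items above, MMR (Maximal Marginal Relevance) picks chunks that are relevant to the query but not near-duplicates of chunks already chosen. The sketch below is a minimal plain-Python version with made-up 2-D vectors. In the notebook itself the retriever could probably be switched with `search_type="mmr"` instead; the function and variable names here are illustrative only.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Return positions of k documents chosen greedily by Maximal Marginal Relevance.

    lambda_mult=1.0 ranks purely by relevance; lower values penalize redundancy more.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine_sim(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none selected yet).
            redundancy = max((cosine_sim(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Documents 0 and 1 are near-duplicates; document 2 covers a different direction.
toy_docs = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]
toy_query = [1.0, 0.1]

print(mmr(toy_query, toy_docs, k=2, lambda_mult=1.0))  # [1, 0]  (pure relevance)
print(mmr(toy_query, toy_docs, k=2, lambda_mult=0.5))  # [1, 2]  (diversity-aware)
```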