<a href="https://colab.research.google.com/github/Abhishek603124/YouTube_ChatBot/blob/main/RAG_based_Youtube_Chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [31]:
!pip install -q youtube-transcript-api \
               langchain-community \
               langchain-text-splitters \
               langchain-huggingface \
               faiss-cpu \
               sentence-transformers \
               transformers \
               python-dotenv


In [32]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import HuggingFaceEmbeddings, ChatHuggingFace, HuggingFaceEndpoint


# Step 1a - Indexing (Document Ingestion)

In [91]:
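from youtube_transcript_api import NoTranscriptFound

video_id = "Gfr50f6ZBvo"  # only the video ID, not the full URL
try:
    # Request English captions specifically (youtube-transcript-api >= 1.0;
    # releases before 1.0 used YouTubeTranscriptApi.get_transcript(video_id, languages=["en"]))
    transcript_list = YouTubeTranscriptApi().fetch(video_id, languages=["en"]).to_raw_data()

    # Flatten the list of {"text", "start", "duration"} snippets into plain text
    transcript = " ".join(chunk["text"] for chunk in transcript_list)
    trimmed_transcript = transcript[:1000]
    print(trimmed_transcript)

except TranscriptsDisabled:
    print("No captions available for this video.")
except NoTranscriptFound:
    print("No English transcript is available for this video.")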

the following is a conversation with demus hasabis ceo and co-founder of deepmind a company that has published and builds some of the most incredible artificial intelligence systems in the history of computing including alfred zero that learned all by itself to play the game of gold better than any human in the world and alpha fold two that solved protein folding both tasks considered nearly impossible for a very long time demus is widely considered to be one of the most brilliant and impactful humans in the history of artificial intelligence and science and engineering in general this was truly an honor and a pleasure for me to finally sit down with him for this conversation and i'm sure we will talk many times again in the future this is the lex friedman podcast to support it please check out our sponsors in the description and now dear friends here's demis hassabis let's start with a bit of a personal question am i an ai program you wrote to interview people until i get good enough 

In [92]:
trimmed_transcript_list = transcript_list[:50]
trimmed_transcript_list

[{'text': 'the following is a conversation with',
  'start': 0.08,
  'duration': 3.44},
 {'text': 'demus hasabis', 'start': 1.76, 'duration': 4.96},
 {'text': 'ceo and co-founder of deepmind', 'start': 3.52, 'duration': 5.119},
 {'text': 'a company that has published and builds',
  'start': 6.72,
  'duration': 4.48},
 {'text': 'some of the most incredible artificial',
  'start': 8.639,
  'duration': 4.561},
 {'text': 'intelligence systems in the history of',
  'start': 11.2,
  'duration': 4.8},
 {'text': 'computing including alfred zero that',
  'start': 13.2,
  'duration': 3.68},
 {'text': 'learned', 'start': 16.0, 'duration': 2.96},
 {'text': 'all by itself to play the game of gold',
  'start': 16.88,
  'duration': 4.559},
 {'text': 'better than any human in the world and',
  'start': 18.96,
  'duration': 5.6},
 {'text': 'alpha fold two that solved protein',
  'start': 21.439,
  'duration': 4.241},
 {'text': 'folding', 'start': 24.56, 'duration': 4.16},
 {'text': 'both tasks consider
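Joining the snippets with `" ".join(...)` throws away the `start` offsets shown above. If you later want answers to point back to a moment in the video, one option is to keep coarser, time-stamped blocks instead. This is a minimal sketch, assuming the snippets are plain dicts with `text`/`start`/`duration` keys as in the output above; `group_snippets`, `format_timestamp`, and the 30-second default window are hypothetical, illustrative choices, not part of youtube-transcript-api.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Render a start offset in seconds as H:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def group_snippets(snippets: List[Dict], window: float = 30.0) -> List[Dict]:
    """Hypothetical helper: merge consecutive snippets whose start times fall
    within `window` seconds of their block's first snippet."""
    blocks: List[Dict] = []
    for snip in snippets:
        # Open a new block when there is none yet or the current one is full
        if not blocks or snip["start"] - blocks[-1]["start"] >= window:
            blocks.append({"start": snip["start"], "texts": []})
        blocks[-1]["texts"].append(snip["text"])
    return [
        {"start": b["start"],
         "timestamp": format_timestamp(b["start"]),
         "text": " ".join(b["texts"])}
        for b in blocks
    ]


# The first three snippets from the output above
sample = [
    {"text": "the following is a conversation with", "start": 0.08, "duration": 3.44},
    {"text": "demus hasabis", "start": 1.76, "duration": 4.96},
    {"text": "ceo and co-founder of deepmind", "start": 3.52, "duration": 5.119},
]
for block in group_snippets(sample, window=2.0):
    print(block["timestamp"], block["text"])
```

Each block's text could then be passed to the splitter with its timestamp kept as metadata, though that is a design choice this notebook does not make.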

# Step 1b - Indexing (Text Splitting)

In [66]:
# Chunks of up to 1,000 characters; consecutive chunks share up to 150 characters
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.create_documents([transcript])

In [67]:
len(chunks)

158

In [68]:
chunks[0]

Document(metadata={}, page_content="the following is a conversation with demus hasabis ceo and co-founder of deepmind a company that has published and builds some of the most incredible artificial intelligence systems in the history of computing including alfred zero that learned all by itself to play the game of gold better than any human in the world and alpha fold two that solved protein folding both tasks considered nearly impossible for a very long time demus is widely considered to be one of the most brilliant and impactful humans in the history of artificial intelligence and science and engineering in general this was truly an honor and a pleasure for me to finally sit down with him for this conversation and i'm sure we will talk many times again in the future this is the lex friedman podcast to support it please check out our sponsors in the description and now dear friends here's demis hassabis let's start with a bit of a personal question am i an ai program you wrote to inter
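To see how `chunk_size` and `chunk_overlap` interact, here is a simplified fixed-window splitter. It is explicitly *not* the algorithm `RecursiveCharacterTextSplitter` uses: that splitter first breaks text on separators (by default `"\n\n"`, `"\n"`, `" "`, then `""`) and merges the pieces back up to `chunk_size`, so on this space-joined transcript it mostly cuts at word boundaries. The sketch, with the hypothetical name `sliding_window_split`, only illustrates the stride arithmetic: with size 1000 and overlap 150, each window starts 850 characters after the previous one.

```python
from typing import List


def sliding_window_split(text: str, chunk_size: int = 1000,
                         chunk_overlap: int = 150) -> List[str]:
    """Fixed-size character windows that overlap by `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # 850 with the default arguments
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# 2,000 characters of repeating a..z, so overlaps are easy to inspect
text = "".join(chr(ord("a") + i % 26) for i in range(2000))
parts = sliding_window_split(text)
print([len(p) for p in parts])  # [1000, 1000, 300]
```

The last 150 characters of each window reappear at the start of the next one, which is what keeps a sentence cut at a chunk boundary retrievable from either side.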

# Step 1c & 1d - Indexing (Embedding Generation and Storing in Vector Store)



In [69]:
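# Local sentence-transformers model producing 384-dimensional vectors
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

In [70]:
# Embed every chunk and build an in-memory FAISS index over the vectors
vector_store = FAISS.from_documents(chunks, embeddings)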

In [71]:
vector_store.index_to_docstore_id

{0: 'c0e4dd4b-db0d-45fe-a372-4996b270f619',
 1: '9639c855-100f-48ac-bb74-4a28bbe32a44',
 2: '14f7a30e-b7cf-471a-8e93-96020a77b5dd',
 3: '17da1df1-77ba-4401-abe7-24ad53175fb1',
 4: '1052f1aa-cba9-47dc-9acd-575f64e9c772',
 5: '416e09a3-aa15-47f9-8654-7a00ba7ba25c',
 6: '605c769c-92c2-4b3e-9edf-44e5003b607c',
 7: 'c3b683ef-7b54-40ea-8a1b-44d79dcbb3d3',
 8: 'cd835152-8125-4346-8842-6b1372b0ead7',
 9: 'a3bdff42-5085-49d9-8d20-b29b81076052',
 10: '54c5ef3b-b66a-4421-b800-34e72e5ae8a1',
 11: '3d895f06-fbd7-4c3e-8a08-b1c219d7d146',
 12: '8ae56fce-c210-4e25-98e7-ba4fa266e36e',
 13: '97296013-7fa4-4b57-b514-fc26a852677f',
 14: '47fbfd51-8d80-4925-b71b-3719d1f88f10',
 15: '4eea2817-b186-4aaf-8e74-686745485c6a',
 16: '28324842-831a-4f95-9cff-2965f7b86f60',
 17: 'def96075-ab7e-43db-a1e5-79a82bdc6cd9',
 18: '465d6960-e088-44a2-9af4-d5c6a4b780e9',
 19: '940b5a53-3743-4e66-8f76-4c5eb52fda19',
 20: 'f9f51b96-2bf5-4061-9739-358db62f3d9e',
 21: '5ccf9b27-b1d5-4e81-a91e-180e3c576540',
 22: '744fe97a-65fd-

In [72]:
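# Document IDs are random UUIDs generated when the index is built;
# substitute an ID from your own index_to_docstore_id output
vector_store.get_by_ids(['c0cc193e-bfc2-4329-9d33-fd2d4d46a131'])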

[Document(id='c0cc193e-bfc2-4329-9d33-fd2d4d46a131', metadata={}, page_content="physics of today exactly just here here's glimpses of no like there's a much uh a much more elaborate world or a much simpler world or something a much deeper maybe simpler explanation yes of things right than the standard model of physics which we know doesn't work but we still keep adding to so um and and that's how i think the beginning of an explanation would look and it would start encompassing many of the mysteries that we have wondered about for thousands of years like you know consciousness uh life and gravity all of these things yeah giving us a glimpses of explanations for those things yeah well um damas dear one of the special human beings in this giant puzzle of ours and it's a huge honor that you would take a pause from the bigger puzzle to solve this small puzzle of a conversation with me today it's truly an honor and a pleasure thank you thank you i really enjoyed it thanks lex thanks for lis
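The two cells above inspect two structures inside the store: `index_to_docstore_id`, which maps each row of the FAISS index to a document ID, and the docstore that `get_by_ids` reads from. Because LangChain appears to assign random UUIDs when no IDs are passed, a hard-coded ID only resolves in the session that built the index. Below is a toy, pure-Python model of those structures; `ToyVectorStore` and `toy_embed` are hypothetical names, the "embedding" is just keyword counts rather than a real model, and the real store keeps its vectors inside a FAISS index rather than a Python list.

```python
import uuid
from typing import Callable, Dict, List


class ToyVectorStore:
    """Illustrative stand-in for the structures FAISS.from_documents builds."""

    def __init__(self, embed: Callable[[str], List[float]]):
        self.embed = embed
        self.vectors: List[List[float]] = []            # row i = embedding of chunk i
        self.index_to_docstore_id: Dict[int, str] = {}  # row position -> document ID
        self.docstore: Dict[str, str] = {}              # document ID -> chunk text

    def add_texts(self, texts: List[str]) -> List[str]:
        ids = []
        for text in texts:
            doc_id = str(uuid.uuid4())  # random UUID, so IDs differ on every run
            self.index_to_docstore_id[len(self.vectors)] = doc_id
            self.vectors.append(self.embed(text))
            self.docstore[doc_id] = text
            ids.append(doc_id)
        return ids

    def get_by_ids(self, ids: List[str]) -> List[str]:
        # In this toy, unknown IDs are skipped rather than raising
        return [self.docstore[i] for i in ids if i in self.docstore]


def toy_embed(text: str) -> List[float]:
    """Hypothetical 3-d 'embedding': counts of three keywords."""
    words = text.lower().split()
    return [float(words.count(w)) for w in ("deepmind", "fusion", "games")]


store = ToyVectorStore(toy_embed)
store.add_texts(["deepmind uses games as a testing ground",
                 "controlling plasma shapes in fusion reactors"])
first_id = store.index_to_docstore_id[0]  # look the ID up instead of hard-coding it
print(store.get_by_ids([first_id]))
```

Looking the ID up through `index_to_docstore_id`, as in the last lines, avoids the stale-UUID problem entirely.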

# Step 2 - Retrieval

In [73]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 4})

In [74]:
retriever

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7929c074e010>, search_kwargs={'k': 4})

In [75]:
retriever.invoke('What is deepmind')

[Document(id='f9f51b96-2bf5-4061-9739-358db62f3d9e', metadata={}, page_content="of our vision at the start of deepmind was that we would use games very heavily uh as our main testing ground certainly to begin with um because it's super efficient to use games and also you know it's very easy to have metrics to see how well your systems are improving and what direction your ideas are going in and whether you're making incremental improvements and because those games are often rooted in something that humans did for a long time beforehand there's already a strong set of rules like it's already a damn good benchmark yes it's really good for so many reasons because you've got you've got you've got clear measures of how good humans can be at these things and in some cases like go we've been playing it for thousands of years um and and uh often they have scores or at least win conditions so it's very easy for reward learning systems to get a reward it's very easy to specify what that reward i
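With `search_type="similarity"` and `k=4`, the retriever embeds the query and returns the four chunks whose vectors are closest to it. As far as I know, LangChain's FAISS wrapper defaults to a flat (exact) index using L2 distance, and all-MiniLM-L6-v2 outputs unit-length vectors; for unit vectors a and b, ||a - b||^2 = 2 - 2(a . b), so ranking by L2 distance and by cosine similarity gives the same order. The brute-force sketch below, on toy 3-d vectors, demonstrates that equivalence; `top_k_l2` and `top_k_cosine` are hypothetical helpers, not FAISS calls.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def top_k_l2(query: Sequence[float], vectors: List[Sequence[float]],
             k: int = 4) -> List[Tuple[int, float]]:
    """Exact search: (row, squared L2 distance) pairs, nearest first."""
    scored = [(i, sum((q - x) ** 2 for q, x in zip(query, vec)))
              for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


def top_k_cosine(query: Sequence[float], vectors: List[Sequence[float]],
                 k: int = 4) -> List[Tuple[int, float]]:
    """Same search ranked by dot product (equal to cosine for unit vectors)."""
    scored = [(i, sum(q * x for q, x in zip(query, vec)))
              for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "chunk embeddings", normalized to unit length
vectors = [l2_normalize(v) for v in ([1, 0, 0], [0.9, 0.1, 0], [0, 1, 0],
                                     [0, 0, 1], [0.5, 0.5, 0])]
query = l2_normalize([1, 0.2, 0])

print([i for i, _ in top_k_l2(query, vectors, k=4)])      # [1, 0, 4, 2]
print([i for i, _ in top_k_cosine(query, vectors, k=4)])  # [1, 0, 4, 2]
```

A flat index compares the query against every stored vector, which is fine for the 158 chunks here; approximate FAISS indexes only become worthwhile at much larger scales.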

# Step 3 - Augmentation

In [76]:
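from dotenv import load_dotenv
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

# The endpoint needs a Hugging Face access token, e.g. HUGGINGFACEHUB_API_TOKEN,
# set in the environment or in a local .env file loaded here
load_dotenv()

# Model served through the Hugging Face Inference API
# (Meta-Llama-3 is gated: accept its license on the Hub first)
llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
)

# Wrap the endpoint so it accepts chat-style messages and returns a chat message
model = ChatHuggingFace(llm=llm)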

In [77]:
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="""
You are an expert assistant that answers questions based on a YouTube transcript.

Here is the transcript chunk:\n\n{context}

Answer the following question as clearly and specifically as possible:
{question}
"""
)
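`PromptTemplate` uses f-string-style placeholders by default, so `{context}` and `{question}` are replaced by the values passed to `invoke`. The stdlib sketch below approximates that substitution, including a check for missing variables; `input_variables` and `fill` are hypothetical helpers, not LangChain's implementation. One detail it illustrates: curly braces inside the substituted transcript text are inserted verbatim, because `str.format` does not re-parse the values it inserts.

```python
from string import Formatter
from typing import Dict, List

TEMPLATE = """
You are an expert assistant that answers questions based on a YouTube transcript.

Here is the transcript chunk:\n\n{context}

Answer the following question as clearly and specifically as possible:
{question}
"""


def input_variables(template: str) -> List[str]:
    """Names of the {placeholders} in an f-string-style template, in order."""
    names: List[str] = []
    for _, field, _, _ in Formatter().parse(template):
        if field is not None and field not in names:
            names.append(field)
    return names


def fill(template: str, values: Dict[str, str]) -> str:
    """Substitute values, failing loudly if any placeholder is missing."""
    missing = set(input_variables(template)) - set(values)
    if missing:
        raise KeyError(f"missing prompt variables: {sorted(missing)}")
    return template.format(**values)


print(input_variables(TEMPLATE))  # ['context', 'question']
print(fill(TEMPLATE, {"context": "plasma {shape} control",
                      "question": "Is nuclear fusion discussed?"}))
```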

In [78]:
question = "is the topic of nuclear fusion discussed in this video? if yes then what was discussed"
retrieved_docs = retriever.invoke(question)

In [79]:
retrieved_docs

[Document(id='69f2a6ba-d135-4c4b-9108-9cba9e0bc5e5', metadata={}, page_content="is we would like to learn that instead and they also had a simulator of these plasma so there were lots of criteria that matched what we we like to to to use so can ai eventually solve nuclear fusion well so we with this problem and we published it in a nature paper last year uh we held the fusion that we held the plasma in specific shapes so actually it's almost like carving the plasma into different shapes and control and hold it there for the record amount of time so um so that's one of the problems of of fusion sort of um solved so i have a controller that's able to no matter the shape uh contain it continue yeah contain it and hold it in structure and there's different shapes that are better for for the energy productions called droplets and and and so on so um so that was huge and now we're looking we're talking to lots of fusion startups to see what's the next problem we can tackle uh in the fusion a

In [80]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

"is we would like to learn that instead and they also had a simulator of these plasma so there were lots of criteria that matched what we we like to to to use so can ai eventually solve nuclear fusion well so we with this problem and we published it in a nature paper last year uh we held the fusion that we held the plasma in specific shapes so actually it's almost like carving the plasma into different shapes and control and hold it there for the record amount of time so um so that's one of the problems of of fusion sort of um solved so i have a controller that's able to no matter the shape uh contain it continue yeah contain it and hold it in structure and there's different shapes that are better for for the energy productions called droplets and and and so on so um so that was huge and now we're looking we're talking to lots of fusion startups to see what's the next problem we can tackle uh in the fusion area so another fascinating place in a paper title pushing the frontiers of\n\nh

In [81]:
final_prompt = prompt.invoke({"context": context_text, "question": question})

In [82]:
final_prompt

StringPromptValue(text="\nYou are an expert assistant that answers questions based on a YouTube transcript.\n\nHere is the transcript chunk:\n\nis we would like to learn that instead and they also had a simulator of these plasma so there were lots of criteria that matched what we we like to to to use so can ai eventually solve nuclear fusion well so we with this problem and we published it in a nature paper last year uh we held the fusion that we held the plasma in specific shapes so actually it's almost like carving the plasma into different shapes and control and hold it there for the record amount of time so um so that's one of the problems of of fusion sort of um solved so i have a controller that's able to no matter the shape uh contain it continue yeah contain it and hold it in structure and there's different shapes that are better for for the energy productions called droplets and and and so on so um so that was huge and now we're looking we're talking to lots of fusion startups

# Step 4 - Generation

In [83]:
answer = model.invoke(final_prompt)
print(answer.content)

Yes, the topic of nuclear fusion is discussed in this video. 

Specifically, the speaker talks about the challenges of achieving nuclear fusion and how they are using Artificial Intelligence (AI) and Deep Reinforcement Learning (DRL) to solve some of these challenges. 

The discussion includes topics such as:

1. The need to hold plasma in specific shapes and control it for a record amount of time, which is a problem in fusion that has been solved.
2. The collaboration with EPFL in Switzerland to use their test reactor and work on plasma control.
3. The use of AI and DRL to solve the bottleneck problems in fusion, such as plasma control.
4. The speaker's personal interest in solving fusion and its potential transformative impact on energy and climate challenges.

Overall, the speaker is discussing their work in using AI and DRL to tackle some of the challenges in achieving nuclear fusion.


# Building a Chain

In [84]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [85]:
def format_docs(retrieved_docs):
    """Join the retrieved chunks' text into one context string."""
    return "\n\n".join(doc.page_content for doc in retrieved_docs)

In [86]:
parallel_chain = RunnableParallel({
    'context': retriever | RunnableLambda(format_docs),
    'question': RunnablePassthrough()
})
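`RunnableParallel` sends the same input (here, the question string) to every branch and collects the results in a dict keyed by branch name, while `RunnablePassthrough` returns its input unchanged. The sketch below is a sequential, pure-Python approximation; `parallel`, `passthrough`, `fake_retriever`, and `format_texts` are hypothetical stand-ins, and LangChain itself may execute the branches concurrently.

```python
from typing import Any, Callable, Dict, List


def passthrough(x: Any) -> Any:
    """Stand-in for RunnablePassthrough: return the input unchanged."""
    return x


def parallel(branches: Dict[str, Callable[[Any], Any]]) -> Callable[[Any], Dict[str, Any]]:
    """Stand-in for RunnableParallel: feed one input to every branch."""
    def run(x: Any) -> Dict[str, Any]:
        return {key: fn(x) for key, fn in branches.items()}
    return run


def fake_retriever(question: str) -> List[str]:
    # Hypothetical retriever that ignores the query and returns fixed chunks
    return ["chunk about deepmind", "chunk about games"]


def format_texts(texts: List[str]) -> str:
    return "\n\n".join(texts)


step = parallel({
    "context": lambda q: format_texts(fake_retriever(q)),
    "question": passthrough,
})
print(step("who is Demis"))
```

The resulting dict has exactly the keys the prompt template expects, which is why `parallel_chain` can be piped straight into `prompt`.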

In [87]:
parallel_chain.invoke('who is Demis')

{'context': "the following is a conversation with demus hasabis ceo and co-founder of deepmind a company that has published and builds some of the most incredible artificial intelligence systems in the history of computing including alfred zero that learned all by itself to play the game of gold better than any human in the world and alpha fold two that solved protein folding both tasks considered nearly impossible for a very long time demus is widely considered to be one of the most brilliant and impactful humans in the history of artificial intelligence and science and engineering in general this was truly an honor and a pleasure for me to finally sit down with him for this conversation and i'm sure we will talk many times again in the future this is the lex friedman podcast to support it please check out our sponsors in the description and now dear friends here's demis hassabis let's start with a bit of a personal question am i an ai program you wrote to interview people until i get

In [88]:
parser = StrOutputParser()

In [89]:
main_chain = parallel_chain | prompt | model | parser

In [90]:
main_chain.invoke('what is the Conversation Going on in the video')

'The conversation in the video is about the future of Artificial Intelligence (AI), specifically the capabilities of a new AI model called Gato. The conversation delves into various topics, including:\n\n1. The limitations of language as a primary modality for human communication and how AI systems can interact with humans in multiple ways, such as through visual, robotic, and body language.\n2. The potential for AI to understand and predict human thoughts and behaviors, and the possibility of communication with other life forms or consciousness.\n3. The importance of language in AI systems, particularly in relation to the Turing Test and the ability of machines to mimic human cognitive capabilities.\n4. The potential for AI to provide explanations for complex mysteries, such as consciousness, life, and gravity, and how these explanations might be more profound than the current understanding of physics.\n5. The excitement and anticipation for the future of AI and its potential impact o
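`parallel_chain | prompt | model | parser` works because LangChain runnables overload `|` to mean "feed the left side's output into the right side". The sketch below reproduces that idea with a hypothetical `Step` class and stub stages, which can be handy for checking how data flows through the chain without calling the Hugging Face endpoint. It is not LangChain's `Runnable` implementation, which also supports batching, streaming, and async calls.

```python
from dataclasses import dataclass
from typing import Any, Callable


class Step:
    """Minimal stand-in for a LangChain Runnable: wraps a function, supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, x: Any) -> Any:
        return self.fn(x)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds a's output into b
        return Step(lambda x: other.invoke(self.invoke(x)))


@dataclass
class FakeMessage:
    content: str  # mimics the .content attribute of a chat model's reply


# Hypothetical offline stand-ins for each stage of main_chain
gather = Step(lambda q: {"context": "chunk A\n\nchunk B", "question": q})
fill_prompt = Step(lambda d: f"Context:\n{d['context']}\n\nQuestion: {d['question']}")
fake_model = Step(lambda prompt: FakeMessage(content=f"stub answer to: {prompt.splitlines()[-1]}"))
to_text = Step(lambda msg: msg.content)  # like StrOutputParser: message -> str

chain = gather | fill_prompt | fake_model | to_text
print(chain.invoke("what is discussed?"))  # stub answer to: Question: what is discussed?
```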