<a href="https://colab.research.google.com/github/beruscoder/gen-AI/blob/main/rag_langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [6]:
import os
os.environ["OPENAI_API_KEY"] = ""  # paste your key here before running; never commit a real key

In [7]:
!pip install -q youtube-transcript-api langchain-community langchain-openai \
               faiss-cpu tiktoken python-dotenv

In [8]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate

In [12]:
video_id = "Gfr50f6ZBvo" # only the ID, not full URL
try:
    # `languages` is a priority list; ["en"] requests the English transcript
    transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])

    # Flatten it to plain text
    transcript = " ".join(chunk["text"] for chunk in transcript_list)
    print(transcript)

except TranscriptsDisabled:
    print("No captions available for this video.")

the following is a conversation with demus hasabis ceo and co-founder of deepmind a company that has published and builds some of the most incredible artificial intelligence systems in the history of computing including alfred zero that learned all by itself to play the game of gold better than any human in the world and alpha fold two that solved protein folding both tasks considered nearly impossible for a very long time demus is widely considered to be one of the most brilliant and impactful humans in the history of artificial intelligence and science and engineering in general this was truly an honor and a pleasure for me to finally sit down with him for this conversation and i'm sure we will talk many times again in the future this is the lex friedman podcast to support it please check out our sponsors in the description and now dear friends here's demis hassabis let's start with a bit of a personal question am i an ai program you wrote to interview people until i get good enough 

In [13]:
transcript_list

[{'text': 'the following is a conversation with',
  'start': 0.08,
  'duration': 3.44},
 {'text': 'demus hasabis', 'start': 1.76, 'duration': 4.96},
 {'text': 'ceo and co-founder of deepmind', 'start': 3.52, 'duration': 5.119},
 {'text': 'a company that has published and builds',
  'start': 6.72,
  'duration': 4.48},
 {'text': 'some of the most incredible artificial',
  'start': 8.639,
  'duration': 4.561},
 {'text': 'intelligence systems in the history of',
  'start': 11.2,
  'duration': 4.8},
 {'text': 'computing including alfred zero that',
  'start': 13.2,
  'duration': 3.68},
 {'text': 'learned', 'start': 16.0, 'duration': 2.96},
 {'text': 'all by itself to play the game of gold',
  'start': 16.88,
  'duration': 4.559},
 {'text': 'better than any human in the world and',
  'start': 18.96,
  'duration': 5.6},
 {'text': 'alpha fold two that solved protein',
  'start': 21.439,
  'duration': 4.241},
 {'text': 'folding', 'start': 24.56, 'duration': 4.16},
 {'text': 'both tasks consider
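
Each entry in `transcript_list` carries `start` and `duration` fields, which the flattening step discards. A minimal sketch, assuming you want to keep a start time for each block of text: group entries into windows of roughly `max_chars` characters and remember when the first entry began. The helper `group_with_timestamps` and its `max_chars` parameter are hypothetical, not part of `youtube-transcript-api` or LangChain.

```python
def group_with_timestamps(entries, max_chars=1000):
    """Group transcript entries into text blocks, keeping each block's start time.

    `entries` is a list of dicts with 'text' and 'start' keys (the shape shown
    in the output above). Hypothetical helper, not a library function.
    """
    blocks = []
    current_texts, current_start, current_len = [], None, 0
    for entry in entries:
        if current_start is None:
            current_start = entry["start"]
        current_texts.append(entry["text"])
        current_len += len(entry["text"]) + 1  # +1 for the joining space
        if current_len >= max_chars:
            blocks.append({"start": current_start, "text": " ".join(current_texts)})
            current_texts, current_start, current_len = [], None, 0
    if current_texts:  # flush the final partial block
        blocks.append({"start": current_start, "text": " ".join(current_texts)})
    return blocks


sample = [
    {"text": "the following is a conversation with", "start": 0.08, "duration": 3.44},
    {"text": "demus hasabis", "start": 1.76, "duration": 4.96},
    {"text": "ceo and co-founder of deepmind", "start": 3.52, "duration": 5.119},
]
blocks = group_with_timestamps(sample, max_chars=40)
for block in blocks:
    print(block["start"], block["text"])
```

The `start` value could then be stored in each document's `metadata`, so an answer can point back to a moment in the video.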

In [14]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])

In [15]:
len(chunks)

168
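
The chunk count depends on the two splitter settings: each chunk holds at most 1000 characters and shares up to 200 characters with its neighbour. A simplified sketch of that sliding window, assuming a fixed stride of `chunk_size - chunk_overlap`. `RecursiveCharacterTextSplitter` splits on separators such as spaces rather than at fixed offsets, so its boundaries and counts will differ; `sliding_chunks` is a hypothetical helper for illustration only.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut `text` into fixed-size windows that overlap by `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    stride = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += stride
    return chunks


text = "".join(str(i % 10) for i in range(2500))  # 2500 synthetic characters
pieces = sliding_chunks(text)
print(len(pieces), [len(p) for p in pieces])
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk, at the cost of embedding some text twice.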

In [16]:
chunks[100]

Document(metadata={}, page_content="and and kind of come up with descriptions of the electron clouds where they're gonna go how they're gonna interact when you put two elements together uh and what we try to do is learn a simulation uh uh learner functional that will describe more chemistry types of chemistry so um until now you know you can run expensive simulations but then you can only simulate very small uh molecules very simple molecules we would like to simulate large materials um and so uh today there's no way of doing that and we're building up towards uh building functionals that approximate schrodinger's equation and then allow you to describe uh what the electrons are doing and all materials sort of science and material properties are governed by the electrons and and how they interact so have a good summarization of the simulation through the functional um but one that is still close to what the actual simulation would come out with so what um how difficult is that to ask w

In [17]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = FAISS.from_documents(chunks, embeddings)

In [18]:
vector_store.index_to_docstore_id

{0: '94467a1b-f744-4939-9c57-90a9bd721e26',
 1: 'b2809fb8-084c-4c1d-abef-c5fdbcb750ad',
 2: '3c844e5c-ce87-45c6-8a61-cd56c78d30fe',
 3: 'd09c2128-6c9d-414b-a1c1-1b04ce8486b2',
 4: '4fbbdf26-eb61-4754-ad19-dc46495fad99',
 5: '9767d74c-e2f9-4436-9bf5-943ca2f9ad3d',
 6: '8309b7f4-fade-4c6a-9ac4-47462b0cce8b',
 7: '5deb994b-8f56-46dc-a438-74d736c27542',
 8: '0a505928-84d5-4e08-8cb8-aff18beb83ec',
 9: '3774db94-55f7-40fc-b4dc-ae34f11c11a2',
 10: '62365edd-708d-4fe5-8304-f88e16428d8f',
 11: 'd7a8b29f-6df2-45e6-afde-dbed18a033d7',
 12: '6cdea144-5b59-4eee-9d36-f9d0c8ec8d7c',
 13: 'b1694a2e-6f9f-4f15-872a-344ee410c16d',
 14: 'fae08a51-703c-40c7-ad75-9ea13db38b27',
 15: '257cd33a-ba0a-407b-aba6-824d73880b87',
 16: 'a284f81f-7989-4841-93f2-c0175ed61f25',
 17: '33774982-b64e-4bf6-9e4b-c624cba2ae06',
 18: '2a93c57f-d44e-4740-9e9b-a94eae9b0c0c',
 19: '86778b0a-8e80-4591-ba79-ac18db581ecd',
 20: 'd9a3d294-e8dc-41b4-8948-95b4abbc2bb1',
 21: '85a2ec42-31fe-4502-9257-1bc5edf7535f',
 22: '5eb0015c-8e0b-

In [19]:
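# This ID is not in the current store (IDs are regenerated on every index build), so the result is empty
vector_store.get_by_ids(['2436bdb8-3f5f-49c6-8915-0c654c888700'])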

[]

# **Retrieval**

In [20]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 4})

In [21]:
retriever

VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7ece1a9fad90>, search_kwargs={'k': 4})

In [22]:
retriever.invoke('What is deepmind')

[Document(id='94467a1b-f744-4939-9c57-90a9bd721e26', metadata={}, page_content="the following is a conversation with demus hasabis ceo and co-founder of deepmind a company that has published and builds some of the most incredible artificial intelligence systems in the history of computing including alfred zero that learned all by itself to play the game of gold better than any human in the world and alpha fold two that solved protein folding both tasks considered nearly impossible for a very long time demus is widely considered to be one of the most brilliant and impactful humans in the history of artificial intelligence and science and engineering in general this was truly an honor and a pleasure for me to finally sit down with him for this conversation and i'm sure we will talk many times again in the future this is the lex friedman podcast to support it please check out our sponsors in the description and now dear friends here's demis hassabis let's start with a bit of a personal qu
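
With `search_type="similarity"` and `k=4`, the retriever embeds the query and returns the four stored chunks nearest to it. A toy sketch of that ranking step, assuming Euclidean distance and hand-made 3-dimensional vectors in place of real embeddings. The function `top_k` and the sample vectors are illustrative, not the FAISS implementation.

```python
import math


def top_k(query_vec, doc_vecs, k=4):
    """Return the ids of the k vectors closest to `query_vec` (smallest distance first)."""
    scored = [(math.dist(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort()
    return [doc_id for _, doc_id in scored[:k]]


# Made-up vectors standing in for chunk embeddings
doc_vecs = {
    "intro": [0.9, 0.1, 0.0],
    "fusion": [0.1, 0.9, 0.1],
    "chemistry": [0.0, 0.8, 0.3],
    "games": [0.7, 0.0, 0.4],
}
nearest = top_k([1.0, 0.0, 0.0], doc_vecs, k=2)
print(nearest)
```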

# **Augmentation**

In [23]:
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)

In [24]:
prompt = PromptTemplate(
    template="""
      You are a helpful assistant.
      Answer ONLY from the provided transcript context.
      If the context is insufficient, just say you don't know.

      {context}
      Question: {question}
    """,
    input_variables = ['context', 'question']
)

In [25]:
question = "is the topic of nuclear fusion discussed in this video? if yes then what was discussed"
retrieved_docs = retriever.invoke(question)

In [26]:
retrieved_docs

[Document(id='29f576c2-d6ab-48bb-b99a-c4bac2763e44', metadata={}, page_content="so we with this problem and we published it in a nature paper last year uh we held the fusion that we held the plasma in specific shapes so actually it's almost like carving the plasma into different shapes and control and hold it there for the record amount of time so um so that's one of the problems of of fusion sort of um solved so i have a controller that's able to no matter the shape uh contain it continue yeah contain it and hold it in structure and there's different shapes that are better for for the energy productions called droplets and and and so on so um so that was huge and now we're looking we're talking to lots of fusion startups to see what's the next problem we can tackle uh in the fusion area so another fascinating place in a paper title pushing the frontiers of density functionals by solving the fractional electron problem so you're taking on modeling and simulating the quantum mechanical 

In [27]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

"so we with this problem and we published it in a nature paper last year uh we held the fusion that we held the plasma in specific shapes so actually it's almost like carving the plasma into different shapes and control and hold it there for the record amount of time so um so that's one of the problems of of fusion sort of um solved so i have a controller that's able to no matter the shape uh contain it continue yeah contain it and hold it in structure and there's different shapes that are better for for the energy productions called droplets and and and so on so um so that was huge and now we're looking we're talking to lots of fusion startups to see what's the next problem we can tackle uh in the fusion area so another fascinating place in a paper title pushing the frontiers of density functionals by solving the fractional electron problem so you're taking on modeling and simulating the quantum mechanical behavior of electrons yes um can you explain this work and can ai model and\n\n

In [28]:
final_prompt = prompt.invoke({"context": context_text, "question": question})

In [29]:
final_prompt

StringPromptValue(text="\n      You are a helpful assistant.\n      Answer ONLY from the provided transcript context.\n      If the context is insufficient, just say you don't know.\n\n      so we with this problem and we published it in a nature paper last year uh we held the fusion that we held the plasma in specific shapes so actually it's almost like carving the plasma into different shapes and control and hold it there for the record amount of time so um so that's one of the problems of of fusion sort of um solved so i have a controller that's able to no matter the shape uh contain it continue yeah contain it and hold it in structure and there's different shapes that are better for for the energy productions called droplets and and and so on so um so that was huge and now we're looking we're talking to lots of fusion startups to see what's the next problem we can tackle uh in the fusion area so another fascinating place in a paper title pushing the frontiers of density functionals
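
`prompt.invoke` substitutes the two input variables into the template and wraps the result in a `StringPromptValue`. The same substitution can be sketched with plain `str.format`, assuming a dedented copy of the template and placeholder chunks. Nothing here calls LangChain, and `build_prompt` is a hypothetical helper.

```python
TEMPLATE = (
    "You are a helpful assistant.\n"
    "Answer ONLY from the provided transcript context.\n"
    "If the context is insufficient, just say you don't know.\n\n"
    "{context}\n"
    "Question: {question}\n"
)


def build_prompt(docs, question):
    """Join the retrieved chunk texts and fill both placeholders in the template."""
    context = "\n\n".join(docs)
    return TEMPLATE.format(context=context, question=question)


prompt_text = build_prompt(
    ["first retrieved chunk", "second retrieved chunk"],
    "is nuclear fusion discussed?",
)
print(prompt_text)
```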

# **Generation**

In [30]:
answer = llm.invoke(final_prompt)
print(answer.content)

Yes, the topic of nuclear fusion is discussed in the video. The discussion includes the following points:

1. The speaker mentions a problem in fusion that was published in a Nature paper, where they held plasma in specific shapes to control and contain it for a record amount of time. They describe this process as "carving the plasma into different shapes."

2. They talk about collaborating with EPFL in Switzerland, which has a test reactor that they used for their experiments. The speaker emphasizes the importance of identifying bottleneck problems in fusion and how their AI methods can address these challenges.

3. The speaker expresses the belief that AI can help accelerate solutions in energy and climate, particularly in the field of fusion, which faces various challenges in physics, material science, and engineering.

4. They mention a specific paper on "magnetic control of tokamak plasmas using deep reinforcement learning," indicating their work on controlling high-temperature pl

# **Building the chain**

In [31]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [32]:
def format_docs(retrieved_docs):
    """Join the retrieved chunks into one context string, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in retrieved_docs)

In [33]:
parallel_chain = RunnableParallel({
    'context': retriever | RunnableLambda(format_docs),
    'question': RunnablePassthrough()
})

In [34]:
parallel_chain.invoke('who is Demis')

{'context': "the following is a conversation with demus hasabis ceo and co-founder of deepmind a company that has published and builds some of the most incredible artificial intelligence systems in the history of computing including alfred zero that learned all by itself to play the game of gold better than any human in the world and alpha fold two that solved protein folding both tasks considered nearly impossible for a very long time demus is widely considered to be one of the most brilliant and impactful humans in the history of artificial intelligence and science and engineering in general this was truly an honor and a pleasure for me to finally sit down with him for this conversation and i'm sure we will talk many times again in the future this is the lex friedman podcast to support it please check out our sponsors in the description and now dear friends here's demis hassabis let's start with a bit of a personal question am i an ai program you wrote to interview people until i get

In [35]:
parser = StrOutputParser()

In [36]:
main_chain = parallel_chain | prompt | llm | parser
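
`main_chain` is defined here but not invoked in this run. Each `|` feeds one step's output into the next: the parallel step turns the raw question into a dict with `context` and `question`, the prompt fills the template, the model answers, and the parser extracts the answer string. A dependency-free sketch of that data flow, with stub functions standing in for the retriever and the model. Every name below is a hypothetical stand-in, not a LangChain object.

```python
def fake_retriever(question):
    """Stand-in for the vector-store retriever: returns fixed chunk texts."""
    return ["chunk about plasma control", "chunk about fusion startups"]


def format_docs(docs):
    return "\n\n".join(docs)


def parallel_step(question):
    """Mirror RunnableParallel: build both prompt inputs from the same question."""
    return {"context": format_docs(fake_retriever(question)), "question": question}


def prompt_step(inputs):
    return f"{inputs['context']}\nQuestion: {inputs['question']}"


def fake_llm(prompt_text):
    """Stand-in for the chat model: returns a message-like dict."""
    return {"content": f"(stub answer based on {len(prompt_text)} prompt characters)"}


def parse_step(message):
    """Mirror StrOutputParser: keep only the text content."""
    return message["content"]


def run_chain(question):
    value = question
    for step in (parallel_step, prompt_step, fake_llm, parse_step):
        value = step(value)  # each step consumes the previous step's output
    return value


result = run_chain("is fusion discussed?")
print(result)
```

With the real objects and a valid API key, the equivalent call is `main_chain.invoke(question)`, which returns a plain string.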

In [50]:
video_id = "VAzKqh00g3c"
try:
    transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])

    transcript = " ".join(chunk["text"] for chunk in transcript_list)
    print(transcript)

except TranscriptsDisabled:
    print("No captions available for this video.")

When Demis Hassabas won the Nobel Prize last year, he celebrated by playing poker with a world champion of chess. Habisas loves a game, which is how he became a pioneer of artificial intelligence. The 48-year-old British scientist is co-founder and CEO of Google's AI powerhouse called Deep Mind. We met two years ago when chatbots announced a new age. Now, Habisas and others are chasing what's called artificial general intelligence, a silicon intellect as versatile as a human, but with superhuman speed and knowledge. After his Nobel and a nighthood from King Charles, we hurried back to London to see what's next from a genius who may hold the cards of our future. What's always guided me and and and the passion I've always had is understanding the world around us. I've always been um since I was a kid fascinated by the biggest questions, you know, the the the meaning of of life, the the the nature of consciousness, the nature of reality itself. I've loved reading about all the great scien

In [51]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])

In [52]:
len(chunks)

92

In [53]:
chunks[30]

Document(metadata={}, page_content="in that process. You can ask a follow-up question. Salon hopes this new vision technology can be incorporated into Kmigo and available to students and teachers in 2 to 3 years, but he wants it to undergo more robust testing and meet strict guidelines for privacy and data security. I can imagine a lot of teachers watching this and thinking, okay, well, this is just going to replace me. Why would I want this in my classroom? It's like a Trojan horse. I'm pretty confident that teaching any job that is has a very human centric element of it is as long as it adapts reasonably well in this AI world they're going to be some of the safest jobs out there. You think there will always be a need for teachers in a classroom talking with the student looking the student in the eye. Oh yeah. I mean that's what I'll always want for my own children and frankly for anyone's children. And the hope here is that we can use artificial intelligence and other technologies to

In [54]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = FAISS.from_documents(chunks, embeddings)

In [55]:
vector_store.index_to_docstore_id

{0: 'e43bf5ce-5d8a-4f48-8630-d28a113b5efd',
 1: '91f21bc4-8108-48bb-a39f-529d85a08f0d',
 2: 'd003f05b-5457-4ae1-ae75-fe85c296f28a',
 3: '36f14a83-eb7e-46a2-931d-46bfc102c49e',
 4: '361ee6e2-81d4-4755-a731-fcdf90cb590c',
 5: '662c1809-5a1b-4315-8e40-01de7872080c',
 6: 'e457c372-ff3e-4577-a9bb-ffcd2e50772c',
 7: '64769cdb-1f94-48ac-8ac0-be8ced0511d5',
 8: 'f8be4e7e-d04e-42ed-9d34-c50d0f803ac2',
 9: '47e331fc-a72c-4c71-a60e-0d3314b5c275',
 10: '233d5f4b-26cc-47b1-99b9-d063a9e7afc5',
 11: '31857a61-287c-4249-951c-56c5f31fdd46',
 12: '4a743b6e-1afc-44c7-9457-0ddb510da6e7',
 13: '95efc86d-feba-44f4-a994-9d4f96acb03d',
 14: 'fc5555b5-9c00-4bde-85b2-57aea710da71',
 15: 'f5fb09c7-1db6-45b9-9155-95ab27ec221e',
 16: '4cf4ffa2-6e96-4180-9e2f-ab83848abfe2',
 17: 'bdf025c9-d72a-41dd-b757-b96aa71f2ea7',
 18: 'ba037930-d622-4c55-9076-105af0912ef0',
 19: '2e22d35b-4c20-4416-834e-52fbecceb038',
 20: '8c576a94-cb41-4080-816d-daee9ec1b1a7',
 21: 'fac15f48-a401-493a-9dec-ca0965621a4b',
 22: 'cd0eee8b-3d3d-

In [56]:
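# This ID is not in the current store (IDs are regenerated on every index build), so the result is empty
vector_store.get_by_ids(['c157bfdd-d529-48c8-b459-bdfb746d1591'])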

[]

In [57]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 2})

In [58]:
retriever

VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7ece1a851f50>, search_kwargs={'k': 2})

In [59]:
retriever.invoke('what is AIagent')

[Document(id='c0a7ce8e-05a5-4ecc-99f6-2f2fa0af4acd', metadata={}, page_content='between smart weapons and dumb weapons. Lucky showed us how those so-called smart weapons can be synchronized on Andrew\'s AI platform. It\'s called Lattis. Lattis collects data from various sensors and sources, including satellites, drones, radar, and cameras, allowing, he says, the AI to analyze, move assets, and execute missions faster than a human. If you were having to require the human operator to actually map every single action and say, "Hey, do this, if that, then this." It would take so long to manage it that you would be better off just remotely piloting it. If it\'s the AI on board all these weapons that makes it possible to make it so easy. There are lots of people who go, "Oh, AI, I don\'t know. I don\'t trust it. It\'s going to go rogue." I would say that it is something to be aware of, but in the grand scheme of things, things to be afraid of, there\'s things that I\'m much more terrified of

In [60]:
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)

In [61]:
prompt = PromptTemplate(
    template="""
      You are a computer scientist.
      Answer ONLY from the provided transcript context.
      If the context is insufficient, just say you don't know.

      {context}
      Question: {question}
    """,
    input_variables = ['context', 'question']
)

In [62]:
question = "is the topic of agents discussed in this video? if yes then what was discussed"
retrieved_docs = retriever.invoke(question)

In [63]:
retrieved_docs

[Document(id='95efc86d-feba-44f4-a994-9d4f96acb03d', metadata={}, page_content="limits built into the system. And I wonder if the race for AI dominance is a race to the bottom for safety. So that's one of my big worries actually is that of course all of this energy and racing and resources is great for progress but it might incentivize certain actors in in that to cut corners and one of the corners that can be shortcut would be safety and responsibility. Um so the question is is how can we uh coordinate more you know as leading players but also nation states even I think this is an international thing. AI is going to affect every country, everybody in the world. Um, so I think it's really important that the world uh and the international community has a say in this. Can you teach an AI agent morality? I think you can. They learn by demonstration. They learn by teaching. Um, and I think that's one of the things we have to do with these systems is to give them uh a value system and a and

In [64]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

"limits built into the system. And I wonder if the race for AI dominance is a race to the bottom for safety. So that's one of my big worries actually is that of course all of this energy and racing and resources is great for progress but it might incentivize certain actors in in that to cut corners and one of the corners that can be shortcut would be safety and responsibility. Um so the question is is how can we uh coordinate more you know as leading players but also nation states even I think this is an international thing. AI is going to affect every country, everybody in the world. Um, so I think it's really important that the world uh and the international community has a say in this. Can you teach an AI agent morality? I think you can. They learn by demonstration. They learn by teaching. Um, and I think that's one of the things we have to do with these systems is to give them uh a value system and a and a guidance and some guard rails around that much in the way that you would\n\n

In [65]:
final_prompt = prompt.invoke({"context": context_text, "question": question})

In [66]:
final_prompt

StringPromptValue(text="\n      You are a computer scientist.\n      Answer ONLY from the provided transcript context.\n      If the context is insufficient, just say you don't know.\n\n      limits built into the system. And I wonder if the race for AI dominance is a race to the bottom for safety. So that's one of my big worries actually is that of course all of this energy and racing and resources is great for progress but it might incentivize certain actors in in that to cut corners and one of the corners that can be shortcut would be safety and responsibility. Um so the question is is how can we uh coordinate more you know as leading players but also nation states even I think this is an international thing. AI is going to affect every country, everybody in the world. Um, so I think it's really important that the world uh and the international community has a say in this. Can you teach an AI agent morality? I think you can. They learn by demonstration. They learn by teaching. Um, a

In [67]:
answer = llm.invoke(final_prompt)
print(answer.content)

Yes, the topic of agents is discussed in the video. Specifically, it mentions an artificial companion called Astra, which is a new generation of chatbot that can see and hear, and interpret the world with its own eyes. The discussion includes how Astra was challenged with virtual paintings and how it responds to questions about the emotions exhibited by subjects in those paintings.


# **Building the chain**


In [68]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [69]:
def format_docs(retrieved_docs):
    """Join the retrieved chunks into one context string, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in retrieved_docs)

In [70]:
parallel_chain = RunnableParallel({
    'context': retriever | RunnableLambda(format_docs),
    'question': RunnablePassthrough()
})

In [71]:
parallel_chain.invoke('who is futurepedia')

{'context': "openly advertised, easy to use, and as Franchesca Mani found out, there isn't much that's been done to stop them. When you first heard the rumor, you didn't know that there were photos or a photo of you? No, we didn't know. I think that was like the most chaotic day I've ever witnessed. In a school, somebody gets an inkling of something and it just spreads. It's like rapid fire. It just goes through everyone. And so then when someone hears hears this, it's like, wait, like AI? Like no one thinks that could like happen to you. Franchesca Mani knew nothing about Nutify websites when she discovered she and several of the girls at Westfield High School in New Jersey had been targeted. According to a lawsuit later filed by one of the other girls through her parents, a boy at the school uploaded photos from Instagram to a site called Clothoff. We're naming the site to raise awareness of its potential dangers. There are more than a hundred of these Nutify websites. A quick search

In [72]:
parser = StrOutputParser()

In [73]:
main_chain = parallel_chain | prompt | llm | parser

In [74]:
main_chain.invoke('Can you summarize the video')

'The video discusses the excitement and potential of new vision technology being developed by Sal Khan and his team, in collaboration with Greg Brockman from OpenAI. This technology aims to enhance educational tools like Kmigo, making them available to students and teachers within 2 to 3 years. The technology allows for real-time interaction through live video, enabling AI to assist with lessons, such as anatomy quizzes, by recognizing drawings and providing feedback. The overall sentiment is one of wonder at the capabilities of AI and its application in education.'