## Install libraries

In [1]:
%pip install -q youtube-transcript-api langchain-community langchain-huggingface sentence-transformers \
                faiss-cpu tiktoken python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [None]:
import os
import re

from dotenv import load_dotenv, find_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.messages import SystemMessage, AIMessage, HumanMessage
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import ChatHuggingFace, HuggingFaceEmbeddings, HuggingFaceEndpoint
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled

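# Locate the .env file (raises an error if none is found) and load its
# variables, such as the Hugging Face API token, into the environment
dotenv_path = find_dotenv(filename=".env", raise_error_if_not_found=True)
load_dotenv(dotenv_path)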


True

## Step 1a - Indexing (Document Ingestion)

In [3]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled

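api = YouTubeTranscriptApi()

# ID of the video to index (the value of the `v` parameter in its URL)
video_id = "Gfr50f6ZBvo"

try:
    # fetch() returns the transcript as a sequence of snippets, each with a .text field
    transcript_list = api.fetch(video_id)

    # Flatten the snippets into one plain-text string
    transcript = " ".join(d.text for d in transcript_list)

    print("Transcript fetched successfully!")
    print(transcript[:1000] + "...")

except TranscriptsDisabled:
    print(f"Transcripts are disabled for video: {video_id}")
except Exception as e:
    print(f"An error occurred: {e}")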

Transcript fetched successfully!
the following is a conversation with demus hasabis ceo and co-founder of deepmind a company that has published and builds some of the most incredible artificial intelligence systems in the history of computing including alfred zero that learned all by itself to play the game of gold better than any human in the world and alpha fold two that solved protein folding both tasks considered nearly impossible for a very long time demus is widely considered to be one of the most brilliant and impactful humans in the history of artificial intelligence and science and engineering in general this was truly an honor and a pleasure for me to finally sit down with him for this conversation and i'm sure we will talk many times again in the future this is the lex friedman podcast to support it please check out our sponsors in the description and now dear friends here's demis hassabis let's start with a bit of a personal question am i an ai program you wrote to intervie
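
The cell above hard-codes the video ID. If you start from a full URL instead, a small helper can pull the ID out first. This is only a sketch: `extract_video_id` is a hypothetical helper (not part of `youtube-transcript-api`), and it covers just the common `watch?v=`, `youtu.be/`, `embed/`, `shorts/` and `live/` URL shapes.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url_or_id: str) -> str:
    """Return the video ID from a YouTube URL, or the input unchanged if it is not a recognised URL."""
    parsed = urlparse(url_or_id)
    host = (parsed.hostname or "").lower()

    # Short links: https://youtu.be/<id>
    if host in ("youtu.be", "www.youtu.be"):
        return parsed.path.lstrip("/").split("/")[0]

    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Standard links: https://www.youtube.com/watch?v=<id>
        query = parse_qs(parsed.query)
        if "v" in query:
            return query["v"][0]
        # Path-based links: /embed/<id>, /shorts/<id>, /live/<id>
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) >= 2 and parts[0] in ("embed", "shorts", "live"):
            return parts[1]

    # Assume the caller already passed a bare ID
    return url_or_id


video_id_from_url = extract_video_id("https://www.youtube.com/watch?v=Gfr50f6ZBvo")
```

The result can then be passed to `api.fetch(...)` in place of the literal string.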

## Step 1b - Indexing (Text Splitting)

In [5]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])

In [6]:
len(chunks)

168

In [7]:
chunks[100]

Document(metadata={}, page_content="and and kind of come up with descriptions of the electron clouds where they're gonna go how they're gonna interact when you put two elements together uh and what we try to do is learn a simulation uh uh learner functional that will describe more chemistry types of chemistry so um until now you know you can run expensive simulations but then you can only simulate very small uh molecules very simple molecules we would like to simulate large materials um and so uh today there's no way of doing that and we're building up towards uh building functionals that approximate schrodinger's equation and then allow you to describe uh what the electrons are doing and all materials sort of science and material properties are governed by the electrons and and how they interact so have a good summarization of the simulation through the functional um but one that is still close to what the actual simulation would come out with so what um how difficult is that to ask w
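
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified fixed-width sliding window in plain Python. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph breaks, newlines and spaces); the function name `sliding_window_chunks` and the toy string are made up for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into windows of chunk_size characters, each sharing chunk_overlap characters with the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
```

With these toy settings, the last 4 characters of each chunk reappear at the start of the next one, which is the role the 200-character overlap plays in the notebook: a sentence cut at a chunk boundary is still seen whole in at least one chunk.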

## Step 1c & 1d - Indexing (Embedding Generation and Storing in Vector Store)

In [8]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embeddings)

In [9]:
vector_store.index_to_docstore_id

{0: 'e1609d78-13de-40d3-bc35-7f4911679393',
 1: '208d7981-bfe5-4186-9130-2c1880d86a45',
 2: 'd68e78af-c3a7-4f6e-970f-0fdb91db9f61',
 3: '103bd24c-daeb-4566-9fcd-28b012bc9a2f',
 4: 'cd6f05ca-1518-44b0-bde4-8d8fd12f4a16',
 5: '947fbcf8-256a-4677-acee-7dc44554985b',
 6: 'e4b0d49c-0aad-412d-8ffd-6c6d3c081a31',
 7: '57916fa7-f1aa-4341-8994-11dc1504e0b3',
 8: '1565ade3-e430-4a6c-aed8-522151404a4c',
 9: '619b6a3f-8818-491e-8bd5-3c19871e717b',
 10: '030dd159-e999-45f1-8cfe-07be0ff87bfd',
 11: '04bbb67e-c282-4193-be49-d49855aa464d',
 12: 'e5a6b123-a93a-41d3-910b-8b1101ba66b6',
 13: 'b9ea1bce-4382-45d4-b881-f608ac82500e',
 14: '0d1ac1b3-e5c8-4175-bdeb-4adcff0b2743',
 15: '125e36fc-dcb3-4e26-bd88-22f0dfba9c96',
 16: '22171e95-ce50-4c9a-b6af-2a373b536111',
 17: '4f762880-797f-41ee-bb2c-7019a5a1a8de',
 18: 'fc6debb9-1839-49d6-b9c2-43c5de5a8d4c',
 19: '8937b4a8-a3e5-45ee-8063-6a17521078df',
 20: '9d51f309-dba4-4b37-a93b-a49920d69378',
 21: '58f84d3b-b776-4798-954c-ef6c05229b7b',
 22: '1fb0d2fc-149e-

In [10]:
vector_store.get_by_ids(['3b87214f-6e03-4d8b-aa95-eee12141f725'])

[Document(id='3b87214f-6e03-4d8b-aa95-eee12141f725', metadata={}, page_content="from the systems like all right how do i explain to the excuse me exactly all right let me i don't have time to explain uh maybe i'll draw you a picture that it is i mean how do you even begin um to answer that question well i think it would um what would you what would you think the answer could possibly look like i think it could it could start looking like uh uh more fundamental explanations of physics would be the beginning you know more careful specification of that taking you walking us through by the hand as to what one would do to maybe prove those things out maybe giving you glimpses of what things you totally missed in the physics of today exactly just here here's glimpses of no like there's a much uh a much more elaborate world or a much simpler world or something a much deeper maybe simpler explanation yes of things right than the standard model of physics which we know doesn't work but we still

## Step 2 - Retrieval

In [11]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 4})

In [12]:
retriever

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x0000028340C11C60>, search_kwargs={'k': 4})

In [13]:
retriever.invoke('What is deepmind')

[Document(id='13a69bb9-9ff3-4e13-9199-5ae1457e2036', metadata={}, page_content="and how it works this is tough to uh ask you this question because you probably will say it's everything but let's let's try let's try to think to this because you're in a very interesting position where deepmind is the place of some of the most uh brilliant ideas in the history of ai but it's also a place of brilliant engineering so how much of solving intelligence this big goal for deepmind how much of it is science how much is engineering so how much is the algorithms how much is the data how much is the hardware compute infrastructure how much is it the software computer infrastructure yeah um what else is there how much is the human infrastructure and like just the humans interact in certain kinds of ways in all the space of all those ideas how much does maybe like philosophy how much what's the key if um uh if if you were to sort of look back like if we go forward 200 years look back what was the key 
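
Conceptually, `search_type="similarity"` with `k=4` embeds the query and returns the 4 stored chunks whose vectors lie closest to it. The sketch below does this by brute force on hand-written 3-dimensional vectors. It assumes Euclidean (L2) distance, which, as far as I know, is the default for LangChain's FAISS wrapper; `top_k_by_l2` and the toy vectors are hypothetical, and real embeddings from `all-MiniLM-L6-v2` have far more dimensions.

```python
import math


def top_k_by_l2(query_vec, doc_vecs, k=4):
    """Return the indices of the k document vectors with the smallest L2 distance to the query."""
    distances = [
        (math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)
    ]
    distances.sort()  # smallest distance first
    return [idx for _, idx in distances[:k]]


# Toy "embeddings": documents 0 and 1 point in nearly the same direction as the query
toy_docs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query = [1.0, 0.05, 0.0]

nearest = top_k_by_l2(toy_query, toy_docs, k=2)
```

Here `nearest` holds the indices of documents 0 and 1, in that order.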

## Step 3 - Augmentation

In [14]:
endpoint = HuggingFaceEndpoint(
    repo_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
    temperature=0.7,
    max_new_tokens=512
)

llm = ChatHuggingFace(llm=endpoint)

In [15]:
prompt = PromptTemplate(
    template="""
      You are a helpful assistant.
      Answer ONLY from the provided transcript context.
      If the context is insufficient, just say you don't know.

      {context}
      Question: {question}
    """,
    input_variables=['context', 'question']
)

In [16]:
question = "is the topic of nuclear fusion discussed in this video? if yes then what was discussed"
retrieved_docs = retriever.invoke(question)

In [17]:
retrieved_docs

[Document(id='aa863932-145c-4972-86be-b174429ea2fb', metadata={}, page_content="in this case in fusion we we collaborated with epfl in switzerland the swiss technical institute who are amazing they have a test reactor that they were willing to let us use which you know i double checked with the team we were going to use carefully and safely i was impressed they managed to persuade them to let us use it and um and it's a it's an amazing test reactor they have there and they try all sorts of pretty crazy experiments on it and um the the the what we tend to look at is if we go into a new domain like fusion what are all the bottleneck problems uh like thinking from first principles you know what are all the bottleneck problems that are still stopping fusion working today and then we look at we you know we get a fusion expert to tell us and then we look at those bottlenecks and we look at the ones which ones are amenable to our ai methods today yes right and and and then and would be intere

In [18]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

"in this case in fusion we we collaborated with epfl in switzerland the swiss technical institute who are amazing they have a test reactor that they were willing to let us use which you know i double checked with the team we were going to use carefully and safely i was impressed they managed to persuade them to let us use it and um and it's a it's an amazing test reactor they have there and they try all sorts of pretty crazy experiments on it and um the the the what we tend to look at is if we go into a new domain like fusion what are all the bottleneck problems uh like thinking from first principles you know what are all the bottleneck problems that are still stopping fusion working today and then we look at we you know we get a fusion expert to tell us and then we look at those bottlenecks and we look at the ones which ones are amenable to our ai methods today yes right and and and then and would be interesting from a research perspective from our point of view from an ai point of\n\

In [19]:
final_prompt = prompt.invoke({"context": context_text, "question": question})

In [20]:
final_prompt

StringPromptValue(text="\n      You are a helpful assistant.\n      Answer ONLY from the provided transcript context.\n      If the context is insufficient, just say you don't know.\n\n      in this case in fusion we we collaborated with epfl in switzerland the swiss technical institute who are amazing they have a test reactor that they were willing to let us use which you know i double checked with the team we were going to use carefully and safely i was impressed they managed to persuade them to let us use it and um and it's a it's an amazing test reactor they have there and they try all sorts of pretty crazy experiments on it and um the the the what we tend to look at is if we go into a new domain like fusion what are all the bottleneck problems uh like thinking from first principles you know what are all the bottleneck problems that are still stopping fusion working today and then we look at we you know we get a fusion expert to tell us and then we look at those bottlenecks and we 
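
The augmentation step is plain string substitution: the retrieved chunks are joined and dropped into the `{context}` slot, and the user question into `{question}`. A rough standard-library equivalent is sketched below. `PLAIN_TEMPLATE` and `build_prompt_plain` are illustrative names, not LangChain APIs, and the template text is copied from the prompt cell without its leading indentation.

```python
PLAIN_TEMPLATE = (
    "You are a helpful assistant.\n"
    "Answer ONLY from the provided transcript context.\n"
    "If the context is insufficient, just say you don't know.\n"
    "\n"
    "{context}\n"
    "Question: {question}\n"
)


def build_prompt_plain(chunk_texts, question):
    """Join chunk texts with blank lines and substitute them into the template."""
    context = "\n\n".join(chunk_texts)
    return PLAIN_TEMPLATE.format(context=context, question=question)


demo_prompt = build_prompt_plain(
    ["first retrieved chunk", "second retrieved chunk"],
    "is nuclear fusion discussed?",
)
print(demo_prompt)
```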

## Step 4 - Generation

In [21]:
answer = llm.invoke(final_prompt)
print(answer.content)

 Yes, the topic of nuclear fusion is discussed in this video. The speakers collaborated with EPFL in Switzerland, who have a test reactor that they allowed the team to use. They focused on identifying bottleneck problems that are currently preventing fusion from working and looked for problems that are amenable to AI methods to help accelerate progress in this field. They published a paper in Nature on holding plasma in specific shapes for a record amount of time using a controller, which was a significant step in solving fusion challenges. Now, they are talking to fusion startups to determine the next problem they can tackle in the fusion area. Additionally, they mentioned a paper on magnetic control of tokamak plasmas using deep reinforcement learning to solve nuclear fusion.


## Building a Chain

In [22]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [23]:
def format_docs(retrieved_docs):
    # Join the retrieved chunks into one context string, separated by blank lines
    context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
    return context_text

In [24]:
parallel_chain = RunnableParallel({
    'context': retriever | RunnableLambda(format_docs),
    'question': RunnablePassthrough()
})

In [25]:
parallel_chain.invoke('who is Demis')

{'context': "to get world peace because there's also other corrupting things like wanting power over people and this kind of stuff which is not necessarily satisfied by by just abundance but i think it will help um and i think uh but i think ultimately ai is not going to be run by any one person or one organization i think it should belong to the world belong to humanity um and i think maybe many there'll be many ways this will happen and ultimately um everybody should have a say in that do you have advice for uh young people in high school and college maybe um if they're interested in ai or interested in having a big impact on the world what they should do to have a career they can be proud of her to have a life they can be proud of i love giving talks to the next generation what i say to them is actually two things i i think the most important things to learn about and to find out about when you're when you're young is what are your true passions is first of all there's two things on
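
The output above shows what `RunnableParallel` does: the same input string is fed to every branch, and the results are collected into a dict keyed by branch name. A plain-Python sketch of that data flow follows. It runs the branches one after another rather than concurrently, and `run_parallel`, `fake_retriever` and `format_docs_plain` are hypothetical stand-ins, not LangChain objects.

```python
def run_parallel(branches, value):
    """Apply every branch to the same input and collect the results by key."""
    return {key: branch(value) for key, branch in branches.items()}


def fake_retriever(query):
    # Stand-in for the real retriever: returns chunk texts for a query
    return [f"chunk about {query}", "another chunk"]


def format_docs_plain(docs):
    # Same joining rule as format_docs, applied to plain strings
    return "\n\n".join(docs)


branches = {
    "context": lambda q: format_docs_plain(fake_retriever(q)),  # retriever | format_docs
    "question": lambda q: q,                                    # RunnablePassthrough
}

result = run_parallel(branches, "who is Demis")
```

The resulting dict has exactly the two keys the prompt template expects, which is why `parallel_chain` can be piped straight into `prompt`.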

In [26]:
parser = StrOutputParser()

In [27]:
main_chain = parallel_chain | prompt | llm | parser

In [28]:
main_chain.invoke('Can you summarize the video')

" The speaker discusses the possibility of a more fundamental and simpler explanation of physics that goes beyond the standard model. This explanation would encompass mysteries that humans have wondered about for thousands of years, such as consciousness, life, and gravity. They also mention the importance of being able to explain things clearly and simply as a sign of intelligence, using Richard Feynman as an example. The conversation then shifts to a discussion of chess, computers, and AI, with a mention of Claude Shannon's first chess program and the eventual victory of IBM's Deep Blue over Garry Kasparov. The speaker expresses more admiration for Kasparov's mind, noting that he could play chess at a high level despite not having the computational power of Deep Blue."