In [1]:
%pip install -q youtube-transcript-api

Note: you may need to restart the kernel to use updated packages.


In [20]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from dotenv import load_dotenv

In [3]:
load_dotenv()

True

# 1. Indexing

- Document Ingestion

In [4]:
# video_id = "Vj2Q_11tol0"
video_id = "UclrVWafRAI"

try:
    # Try to fetch English transcript
    transcript_snippets = YouTubeTranscriptApi().fetch(video_id, languages=["en"])

    # Flatten into plain text
    transcript = " ".join(snippet.text for snippet in transcript_snippets)
    print(transcript[:500])

except TranscriptsDisabled:
    print("No captions available for this video.")
except Exception as e:
    print("Error:", e)

You've been working on AI safety for two decades at least. >> Yeah, I was convinced we can make safe AI, but the more I looked at it, the more I realized it's not something we can actually do. >> You have made a series of predictions about a variety of different states. So, what is your prediction for 2027? [Music] >> Dr. Roman Yimpolski is a globally recognized voice on AI safety and associate professor of computer science. He educates people on the terrifying truth of AI >> and what we need to


In [7]:
print(transcript_snippets[0])

FetchedTranscriptSnippet(text="You've been working on AI safety for two", start=0.16, duration=2.56)
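
Each snippet carries `start` and `duration` fields that the flattening step above discards. If timestamps are wanted later (for example, to point back to a moment in the video), one option is to keep them next to the text. The sketch below is only an illustration: `Snippet` is a hypothetical stand-in with the same three fields as the printed object, `format_timestamp` is a helper invented here, and the second snippet uses made-up values.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """Hypothetical stand-in with the text/start/duration fields printed above."""
    text: str
    start: float
    duration: float


def format_timestamp(seconds: float) -> str:
    """Render a start time in seconds as MM:SS (fractions are dropped)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


demo_snippets = [
    Snippet("You've been working on AI safety for two", start=0.16, duration=2.56),
    Snippet("example text", start=75.5, duration=2.0),  # made-up values for illustration
]

# Prefix each snippet with its timestamp instead of dropping the timing info.
labeled = [f"[{format_timestamp(s.start)}] {s.text}" for s in demo_snippets]
print("\n".join(labeled))
```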


- Text Splitting

In [8]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])

In [9]:
len(chunks)

106

In [11]:
chunks[0]

Document(metadata={}, page_content="You've been working on AI safety for two decades at least. >> Yeah, I was convinced we can make safe AI, but the more I looked at it, the more I realized it's not something we can actually do. >> You have made a series of predictions about a variety of different states. So, what is your prediction for 2027? [Music] >> Dr. Roman Yimpolski is a globally recognized voice on AI safety and associate professor of computer science. He educates people on the terrifying truth of AI >> and what we need to do to save humanity. >> In 2 years, the capability to replace most humans in most occupations will come very quickly. I mean, in 5 years, we're looking at a world where we have levels of unemployment we never seen before. Not talking about 10% but 99%. And that's without super intelligence. A system smarter than all humans in all domains. So, it would be better than us at making new AI. But it's worse than that. We don't know how to make them safe and yet we 
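
To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch of overlapping windows over a string. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter first tries to break on separators such as paragraphs, lines, and spaces); `split_with_overlap` is a hypothetical helper that only shows how consecutive chunks share their boundary text.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(demo_chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

The last four characters of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still fully present in at least one chunk.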

- Embedding Generation & Storing in Vector Store

In [12]:
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
vector_store = FAISS.from_documents(chunks, embeddings)



In [13]:
vector_store.index_to_docstore_id

{0: '96419f9a-0375-448b-ac11-326a8505ee3e',
 1: '0300c138-fa27-4f83-9ec6-64e505d04a3d',
 2: 'f5b83793-a3e8-48d7-8e47-38f03bee6ddc',
 3: '210a9989-43e7-4480-b2d1-8d82f0b6df34',
 4: '35c0474e-2223-4796-96ed-a3a28ec704bb',
 5: '52d050bb-286b-4852-a3b1-4c13d5b8d408',
 6: '065f6207-e804-4578-99d4-b18f39cf1aa6',
 7: '3cc8cf33-473c-4a8f-8be6-8b9133236e4b',
 8: '02508239-2d11-41ae-aa8a-061cf8cb7184',
 9: '0a406d5e-df92-4e1d-87b4-7318b306b920',
 10: 'a66469c3-1c34-4de6-9a24-44b040688223',
 11: '6e47418a-61ba-475c-864b-caa1771c9003',
 12: 'a7a8c9c9-45c2-435d-9c2e-118b2b74ac99',
 13: 'bf1a195b-c4a1-4d63-8442-d6096ab7728e',
 14: '37e4dab8-0432-4e80-b1cb-9ddda1fe289e',
 15: 'd20fecf6-a619-4965-843e-696e87b26c28',
 16: 'eb73ecc7-5ce9-4f96-a9df-a416385c014a',
 17: '703eddef-09d7-413a-9d2e-8410921cec19',
 18: '27431a60-5f24-47bd-9011-c9509e1f6ffd',
 19: '0cf030e7-165e-4233-b212-4fddf0bbc1f1',
 20: '4bd8dd35-b3d5-4115-a6b2-7e55a9a9a15a',
 21: 'aaf20a67-4894-4405-9d4c-f445457fb035',
 22: '915b1eba-be77-

In [14]:
vector_store.get_by_ids(['311ef5c3-7d76-4660-921d-83721cd99183'])

[Document(id='311ef5c3-7d76-4660-921d-83721cd99183', metadata={}, page_content="the fact that we are all one that there's a a divine creator and maybe also they all seem to consequence beyond this life. So maybe I should be thinking more about how I behave in this life and and where I might end up thereafter. Roman, thank you. >> Amen. [Music] [Music]")]

# 2. Retrieval

In [15]:
retriever = vector_store.as_retriever(search_type='similarity', search_kwargs={'k':4})

In [16]:
retriever

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000002E23223DBB0>, search_kwargs={'k': 4})

In [18]:
retriever.invoke('what are the positive impacts of AI ?')

[Document(id='02508239-2d11-41ae-aa8a-061cf8cb7184', metadata={}, page_content="You go in and you find 10 more problems and then 100 more problems. And all of them are not just difficult. They're impossible to solve. There is no seinal work in this field where like we solved this, we don't have to worry about this. There are patches. There are little fixes we put in place and quickly people find ways to work around them. They drill break whatever safety mechanisms we have. So while progress in AI capabilities is exponential or maybe even hyper exponential, progress in AI safety is linear or constant. The gap is increasing. >> The gap between the >> how capable the systems are and how well we can control them, predict what they're going to do, explain their decision making. >> I think this is quite an important point because you said that we're basically patching over the issues that we find. So, we're developing this this core intelligence and then to stop it doing things or to stop it
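
Conceptually, a similarity retriever embeds the query, compares it with every stored chunk vector, and returns the `k` closest chunks. The sketch below shows that idea with tiny hand-written vectors. It assumes cosine similarity as the scoring function, which may differ from the distance metric the FAISS index actually uses, and `cosine_similarity`, `top_k`, and the toy vectors are all invented for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
toy_vectors = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
    "chunk_d": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.2, 0.0]
top_two = top_k(toy_query, toy_vectors, k=2)
print(top_two)
# ['chunk_b', 'chunk_a']
```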

# 3. Augmentation

In [19]:
llm = HuggingFaceEndpoint(
    model="meta-llama/Llama-3.3-70B-Instruct",
    task="text-generation",
    temperature=0.2  # low temperature keeps answers close to the retrieved context
)

model = ChatHuggingFace(llm=llm)

In [21]:
prompt = PromptTemplate(
    template="""
        You are a helpful assistant.
        Answer ONLY from the provided transcript context.
        If the context is insufficient, just say you don't know.

        Context: {context} \n\n
        Question: {question}
""",
    input_variables=['context', 'question']
)

In [35]:
question = "Is the topic of aliens discussed in the video? If yes then what was discussed?"
# question = "Who was the first person to land on the moon ?"
retrieved_docs = retriever.invoke(question)
retrieved_docs

[Document(id='f5b83793-a3e8-48d7-8e47-38f03bee6ddc', metadata={}, page_content="simulation theory. >> I think we are in one. And there is a lot of agreement on this and this is what you should be doing in it so we don't shut it down. First, >> I see messages all the time in the comment section that some of you didn't realize you didn't subscribe. So, if you could do me a favor and double check if you're a subscriber to this channel, that would be tremendously appreciated. It's the simple, it's the free thing that anybody that watches this show frequently can do to help us here to keep everything going in this show in the trajectory it's on. So, please do double check if you've subscribed and uh thank you so much because in a strange way, you are you're part of our history and you're on this journey with us and I appreciate you for that. So, yeah, thank you, >> Dr. Roman Yimpolski. What is the mission that you're currently on? Cuz it's quite clear to me that you are on a bit of a missio

In [36]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

"simulation theory. >> I think we are in one. And there is a lot of agreement on this and this is what you should be doing in it so we don't shut it down. First, >> I see messages all the time in the comment section that some of you didn't realize you didn't subscribe. So, if you could do me a favor and double check if you're a subscriber to this channel, that would be tremendously appreciated. It's the simple, it's the free thing that anybody that watches this show frequently can do to help us here to keep everything going in this show in the trajectory it's on. So, please do double check if you've subscribed and uh thank you so much because in a strange way, you are you're part of our history and you're on this journey with us and I appreciate you for that. So, yeah, thank you, >> Dr. Roman Yimpolski. What is the mission that you're currently on? Cuz it's quite clear to me that you are on a bit of a mission and you've been on this mission for I think the best part of two decades at\n

In [37]:
final_prompt = prompt.invoke({'context':context_text, 'question':question})
final_prompt

StringPromptValue(text="\n        You are a helpful assistant.\n        Answer ONLY from the provided transcript context.\n        If the context is insufficient, just say you don't know.\n\n        Context: simulation theory. >> I think we are in one. And there is a lot of agreement on this and this is what you should be doing in it so we don't shut it down. First, >> I see messages all the time in the comment section that some of you didn't realize you didn't subscribe. So, if you could do me a favor and double check if you're a subscriber to this channel, that would be tremendously appreciated. It's the simple, it's the free thing that anybody that watches this show frequently can do to help us here to keep everything going in this show in the trajectory it's on. So, please do double check if you've subscribed and uh thank you so much because in a strange way, you are you're part of our history and you're on this journey with us and I appreciate you for that. So, yeah, thank you, >>
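
The printed prompt starts with `\n        You are a helpful assistant.`, which shows that the indentation inside the triple-quoted template is sent to the model as-is. That is usually harmless, but if a cleaner prompt is preferred, the template can be dedented first. The following is a plain-Python sketch of the augmentation step using `str.format`; `TEMPLATE` and `build_prompt` are names made up here, not part of LangChain.

```python
from textwrap import dedent

# Same instructions as the notebook's template, with the common indentation removed.
TEMPLATE = dedent("""\
    You are a helpful assistant.
    Answer ONLY from the provided transcript context.
    If the context is insufficient, just say you don't know.

    Context: {context}

    Question: {question}
""")


def build_prompt(context: str, question: str) -> str:
    """Fill the retrieved context and the user question into the template."""
    return TEMPLATE.format(context=context, question=question)


demo_prompt = build_prompt("chunk one\n\nchunk two", "What is discussed?")
print(demo_prompt)
```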

# 4. Generation

In [None]:
answer = model.invoke(final_prompt)
answer.content

'Yes, the topic of aliens is briefly discussed in the video. The speaker uses an analogy of aliens coming to Earth in three years to illustrate the urgency and potential threat of advanced AI, saying "If aliens were coming to earth and you have three years to prepare, you would be panicking right now." This is used to convey the idea that the development of advanced AI is a significant and potentially existential risk that should be taken seriously.'

# Using Runnables

In [39]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [40]:
def format_docs(retrieved_docs):
    context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
    return context_text

In [41]:
parallel_chain = RunnableParallel({
    'context': retriever | RunnableLambda(format_docs),
    'question': RunnablePassthrough()
})
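
In effect, the parallel step passes the same input (the question string) to every branch and collects the results in a dict whose keys match the prompt's input variables. A rough plain-Python equivalent, with no LangChain involved, might look like this. `fake_retriever`, `join_chunks`, and `run_parallel` are hypothetical stand-ins, and the canned chunks are placeholders.

```python
def fake_retriever(question: str) -> list[str]:
    """Hypothetical stand-in for the retriever: returns canned chunk texts."""
    return ["first chunk", "second chunk"]


def join_chunks(chunks: list[str]) -> str:
    """Plays the role of format_docs: join chunk texts with blank lines."""
    return "\n\n".join(chunks)


def run_parallel(branches: dict, value):
    """Feed the same input to every branch and collect the results by key."""
    return {key: branch(value) for key, branch in branches.items()}


branches = {
    "context": lambda q: join_chunks(fake_retriever(q)),  # retriever | format_docs
    "question": lambda q: q,                              # passthrough: input unchanged
}
result = run_parallel(branches, "What is discussed?")
print(result)
# {'context': 'first chunk\n\nsecond chunk', 'question': 'What is discussed?'}
```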

In [42]:
parser = StrOutputParser()

In [43]:
main_chain = parallel_chain | prompt | model | parser

In [44]:
main_chain.invoke('Can you summarize the video')

'The video appears to be a discussion about simulation theory, artificial general intelligence (AGI), and the potential implications of advanced AI on society. The speaker mentions the concept of the singularity, predicted by Ray Kurzweil to occur by 2045, where AI progress becomes so rapid that humans can no longer keep up. They also discuss the potential capabilities of large language models and the limitations of human understanding in predicting the outcomes of advanced AI systems. Additionally, the speaker touches on the idea that many jobs may become obsolete in a world with AGI, but some tasks may still require human involvement. However, the video does not provide a clear conclusion or summary, and the conversation seems to be an ongoing discussion.'