In [1]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled

# import text-splitter
from langchain_text_splitters import RecursiveCharacterTextSplitter

# import embedding model
from langchain_ollama import OllamaEmbeddings

# import vector_stores
from langchain_community.vectorstores import FAISS

# import prompt template
from langchain_core.prompts import PromptTemplate

# import the LLM library
from langchain_groq import ChatGroq

### Step 1 : Indexing

Part 1: Document Ingestion

In [8]:
# take the video ID of the video you want
video_id = "FE-hM1kRK4Y"

transcript_list = YouTubeTranscriptApi().fetch(video_id, languages=["en"])
# fetch() returns an iterable of snippet objects (not dictionaries)
# each snippet holds the caption text, when it appears (start), and how long it stays (duration)
for sentence in transcript_list:
    print(sentence.text)

I want to show you this simple simulation that I put together that has a mass on a
spring, but it's being influenced by an external force that oscillates back and forth.
Now, if there was no external force and you pull out this mass and you just let it go,
the spring has some kind of natural frequency that it wants to oscillate at.
But here, when I'm adding that external force, like a wind blowing back and forth,
it oscillates at a distinct, unrelated frequency.
What I want you to notice is how in the beginning,
you get this very irregular looking behavior.
It gets kind of stronger, and then weaker, and then stronger again,
before eventually it settles into a rhythm.
What specifically is going on there?
How could you mathematically analyze what exactly this weird,
wibbly startup trajectory is, and could you predict how long it takes before the
system hits its stride?
And when it does hit that stride, could you predict
exactly how big the swings back and forth are?
One of the most power

In [9]:
# now let's collect the transcript text into a single string
try:
    transcript_list = YouTubeTranscriptApi().fetch(video_id, languages=["en"])

    # keep only the text of each snippet and join everything into one string
    transcript = " ".join(snippet.text for snippet in transcript_list)
    print(transcript)
except TranscriptsDisabled:
    print("No captions available for this video.")

I want to show you this simple simulation that I put together that has a mass on a spring, but it's being influenced by an external force that oscillates back and forth. Now, if there was no external force and you pull out this mass and you just let it go, the spring has some kind of natural frequency that it wants to oscillate at. But here, when I'm adding that external force, like a wind blowing back and forth, it oscillates at a distinct, unrelated frequency. What I want you to notice is how in the beginning, you get this very irregular looking behavior. It gets kind of stronger, and then weaker, and then stronger again, before eventually it settles into a rhythm. What specifically is going on there? How could you mathematically analyze what exactly this weird, wibbly startup trajectory is, and could you predict how long it takes before the system hits its stride? And when it does hit that stride, could you predict exactly how big the swings back and forth are? One of the most power

Part 2: Text Splitting. Split the transcript into smaller chunks, so that each chunk's embedding captures a narrower, more specific piece of context.

In [18]:
splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
)

chunks = splitter.create_documents([transcript])
len(chunks)

31

In [19]:
# print the first five chunks
for i, doc in enumerate(chunks[:5], start=1):
    print(f"Doc {i} :")
    print(doc.page_content)

Doc 1 :
I want to show you this simple simulation that I put together that has a mass on a spring, but it's being influenced by an external force that oscillates back and forth. Now, if there was no external force and you pull out this mass and you just let it go, the spring has some kind of natural frequency that it wants to oscillate at. But here, when I'm adding that external force, like a wind blowing back and forth, it oscillates at a distinct, unrelated frequency. What I want you to notice is how in the beginning, you get this very irregular looking behavior. It gets kind of stronger, and then weaker, and then stronger again, before eventually it settles into a rhythm. What specifically is going on there? How could you mathematically analyze what exactly this weird, wibbly startup trajectory is, and could you predict how long it takes before the system hits its stride? And when it does hit that stride, could you predict exactly how big the swings back and forth are? One of the mo
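The cell above sets only `chunk_size`. `RecursiveCharacterTextSplitter` also accepts a `chunk_overlap` argument, which makes neighbouring chunks share some text so that a sentence cut at a boundary still appears whole in one of them. Below is a minimal sketch of that idea in plain Python. It is a fixed-window splitter, not the library's recursive algorithm (which prefers to break on separators such as paragraphs and spaces), and the function name `split_with_overlap` is made up for this illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size character windows; each window repeats
    the last `chunk_overlap` characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# toy input (an assumption for the demo, not the real transcript)
sample = "abcdefghij" * 5  # 50 characters
toy_chunks = split_with_overlap(sample, chunk_size=20, chunk_overlap=5)

for n, piece in enumerate(toy_chunks, start=1):
    print(n, piece)
```

With these toy numbers the window moves 15 characters at a time, so the last 5 characters of one chunk are the first 5 of the next.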

Part 3: Embed each chunk as a vector, and store the embeddings in a vector store along with the original text and metadata

In [21]:
# create the embedding model
embedding_model = OllamaEmbeddings(model="all-minilm")

# create the vector store
faiss_vector_store = FAISS.from_documents(
    documents=chunks,
    embedding=embedding_model
)

In [26]:
faiss_vector_store.index_to_docstore_id

{0: '50574a01-020a-4d23-a0b6-b9852d991149',
 1: '95da54fc-a83d-47ee-a409-7ff44c650965',
 2: '27f0e751-3627-4e4b-8b08-949d6b503af8',
 3: '304acbcd-7155-4668-b1fd-89e05ba3d69f',
 4: '426cac92-3919-4858-8eea-eff694a2d81e',
 5: '89f44cff-3591-4d3d-876c-322d4c96c3e5',
 6: 'f56aa21e-b529-4113-b522-fef4d21c9c60',
 7: '3d241796-1976-45d3-9acf-ffb111eeee2f',
 8: '33ca40a0-7282-4299-90c1-18bc32974f69',
 9: '10151786-3673-43ed-8e64-98e3d3c877d2',
 10: '9f268a7e-256b-4344-92b3-34e8e22f31a1',
 11: 'cd7dc967-cc1d-4342-bcbe-18031671972c',
 12: '0a9fd2c8-19ce-400a-9e8e-0e1c879dc936',
 13: 'cfe91f9b-7c9e-45e3-bc1b-790454df2e4c',
 14: '9908d1cf-9973-409b-a9e0-6cd9116abc6a',
 15: '81dc2fa3-99b4-401c-b0e0-9c3406462f65',
 16: '7da0e1b8-d3e7-4ded-bc07-75126ccbf104',
 17: '9bc4b9ed-2325-4efe-9a81-248fddb8ce52',
 18: '36387e61-5d04-4346-a3a9-a0d0ae3d364c',
 19: '8271c4a3-9e65-4b75-9f70-b71d89c14771',
 20: '59976c89-9325-44c6-95ea-ce5c46506419',
 21: 'fc69cc05-559b-4a72-b3d5-7ebf3e287a9b',
 22: '1a6e498e-9ce9-

In [27]:
faiss_vector_store.get_by_ids(["50574a01-020a-4d23-a0b6-b9852d991149"])

[Document(id='50574a01-020a-4d23-a0b6-b9852d991149', metadata={}, page_content="I want to show you this simple simulation that I put together that has a mass on a spring, but it's being influenced by an external force that oscillates back and forth. Now, if there was no external force and you pull out this mass and you just let it go, the spring has some kind of natural frequency that it wants to oscillate at. But here, when I'm adding that external force, like a wind blowing back and forth, it oscillates at a distinct, unrelated frequency. What I want you to notice is how in the beginning, you get this very irregular looking behavior. It gets kind of stronger, and then weaker, and then stronger again, before eventually it settles into a rhythm. What specifically is going on there? How could you mathematically analyze what exactly this weird, wibbly startup trajectory is, and could you predict how long it takes before the system hits its stride? And when it does hit that stride, could 
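The two outputs above suggest how the pieces fit together: `index_to_docstore_id` maps a row position in the index to a document ID, and `get_by_ids` looks that ID up to get the text and metadata back. Here is a rough sketch of that bookkeeping in plain Python. `ToyStore` and its attributes are hypothetical names for this illustration only; this is not how FAISS or LangChain implement it internally.

```python
import uuid


class ToyStore:
    """Hypothetical stand-in that keeps vectors, IDs and texts in sync."""

    def __init__(self):
        self.vectors = []      # row i holds the vector of the i-th document
        self.index_to_id = {}  # row position -> document ID
        self.docs = {}         # document ID -> (text, metadata)

    def add(self, vector, text, metadata=None):
        doc_id = str(uuid.uuid4())
        self.index_to_id[len(self.vectors)] = doc_id  # next free row
        self.vectors.append(vector)
        self.docs[doc_id] = (text, metadata or {})
        return doc_id

    def get_by_ids(self, ids):
        # unknown IDs are skipped rather than raising
        return [self.docs[i] for i in ids if i in self.docs]


store = ToyStore()
first_id = store.add([0.1, 0.2], "chunk text")
print(store.index_to_id)
print(store.get_by_ids([first_id]))
```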

### Step 2: Retriever

In [29]:
# create a retriever that returns the 5 most similar chunks
retriever = faiss_vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})

# test with a query
query = "What is Laplace transform?"

responses = retriever.invoke(query)

for i,res in enumerate(responses):
    print(f"Result {i+1}:")
    print(res.page_content)

Result 1:
one that explains how exactly Laplace transforms can convert differential equations into algebra, hence making them easier to solve. Here's how it looks. If you take the derivative of some function, little f of t, with respect to time, and then you take a Laplace transform of that derivative, the effect is the same as if you had first applied the transform to the original function, and then multiplied that result by s, at least almost. There's also this additional term where you subtract off the initial condition, subtracting the value of your original function, little f, at the time t equals zero. So in other words, the transform turns differentiation in the time domain into multiplication over in the s domain. Now, this should feel very reminiscent of the fact that, for exponential functions, differentiation in time is the same as multiplication by s, and it's no coincidence that ultimately the underlying reason is the same. Now, at first glance, when you look at this rule,
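To get a feel for what a similarity retriever does, here is a small self-contained sketch: embed the query, score every document against it, and keep the top `k`. Everything in it is an assumption made for illustration. The "embedding" is just a bag of word counts instead of a real model, the score is cosine similarity (the FAISS store may use a different distance), and the helper names `embed`, `cosine` and `top_k` are made up.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents that score highest against the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# toy documents (paraphrased for the demo, not the real chunks)
toy_docs = [
    "the laplace transform turns derivatives into multiplication by s",
    "a mass on a spring has a natural frequency",
    "the external force oscillates back and forth",
]
hits = top_k("what is the laplace transform", toy_docs, k=1)
print(hits)
```

A real embedding model matches on meaning rather than on shared words, which is why the retriever above can return relevant chunks even when the wording differs from the query.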

### Step 3: Augmentation

Take the relevant chunks (the context) and the user query, and insert both into a prompt template. The filled-in prompt is what will be sent to the LLM.

In [30]:
# create the prompt template
prompt = PromptTemplate(
    template="""
    You are a helpful assistant.
    Answer the question ONLY from the context provided.
    If the context is insufficient to provide the answer, just say you don't know.
    
    Context: \n {context}
    Question : {question}
    """,
    input_variables=["context","question"]
)

In [36]:
# get a question and context text
question = "What does the speaker say about Laplace transform?"

retrieved_docs = retriever.invoke(question)

# join the text of the retrieved documents, separated by blank lines
context_text = "\n\n".join([doc.page_content for doc in retrieved_docs])

In [37]:
# create the final prompt 
final_prompt = prompt.invoke({"context":context_text,"question":question})
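Conceptually, the augmentation step is string substitution: the retrieved chunks are joined into one block of text and dropped into the `{context}` slot, and the query goes into the `{question}` slot. A minimal sketch with plain `str.format` follows. The shortened template, the `build_prompt` helper and the toy inputs are assumptions for this demo, and it leaves out what `PromptTemplate` adds on top (such as validating input variables).

```python
# shortened version of the template, for illustration only
TEMPLATE = (
    "Answer the question ONLY from the context provided.\n"
    "Context:\n{context}\n"
    "Question: {question}\n"
)


def build_prompt(chunks: list[str], question: str) -> str:
    """Join the chunks with blank lines and fill both placeholders."""
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, question=question)


toy_prompt = build_prompt(["chunk one", "chunk two"], "What is s?")
print(toy_prompt)
```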

### Step 4 : Generation of the final answer

In [34]:
# create the llm instance
llm = ChatGroq(
    model="llama-3.3-70b-versatile"
)

In [38]:
# send the final prompt to the llm and obtain the llm response
llm_response = llm.invoke(final_prompt)
print(llm_response.content)

The speaker explains that the Laplace transform can convert differential equations into algebra, making them easier to solve. It turns differentiation in the time domain into multiplication in the s domain, with an additional term that subtracts the initial condition. The speaker also mentions that the Laplace transform is related to Fourier transforms and that it can break down functions into exponential pieces, and that it takes a function of time and translates it into a new language, turning it into a new function whose input is a complex number s.


# Now we will build a chain (pipeline)

In [39]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [40]:
# build the parallel step first:
# the question is fed to both the retriever and a passthrough
# the retriever searches for and returns the documents most similar to the query
# we then need to extract the text from those documents

def get_text_from_docs(retrieved_docs):
    context_text = "\n\n".join([doc.page_content for doc in retrieved_docs])
    return context_text

In [41]:
# now create the parallel chain
parallel_chain = RunnableParallel({
    "question": RunnablePassthrough(),
    "context": retriever | RunnableLambda(get_text_from_docs),
})

parallel_chain.invoke("What is frequency domain ?")

{'question': 'What is frequency domain ?',
 'context': "that you solve for, the exercise I want to leave you with as homework is to think deeply about how the amplitude of this final expression depends on the difference between the resonant frequency of the spring and the frequency of that external force. In particular, what happens as both of those frequencies get closer together? And how might this be relevant to anyone wishing to build a bridge that they don't want to wobble into ruin? Stepping out from the trees to look over the forest, you see what I mean about how Laplace transforms can turn a differential equation into algebra, and how it's all rooted in this third key property where a derivative in time turns into multiplication by s. So naturally, the burning question is, why is this property true in the first place? And like I said, I can think of three different ways to explain it. One that's elementary but limited, one that's general but a bit opaque, and then there's my fa
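The output above is a dictionary with one key per branch, each branch having received the same question. Here is a rough plain-Python picture of that fan-out. The helper `run_branches` and the canned `fake_retrieve` are hypothetical stand-ins, and the sketch runs the branches one after another, whereas `RunnableParallel` may run them concurrently.

```python
from typing import Any, Callable


def run_branches(branches: dict[str, Callable[[Any], Any]], value: Any) -> dict[str, Any]:
    """Feed the same input to every branch and collect the results by key."""
    return {name: branch(value) for name, branch in branches.items()}


def fake_retrieve(question: str) -> list[str]:
    # hypothetical stand-in for the retriever: returns canned chunks
    return [f"first chunk about: {question}", "second chunk"]


def join_chunks(chunks: list[str]) -> str:
    return "\n\n".join(chunks)


toy_inputs = run_branches(
    {
        "question": lambda q: q,                             # passthrough
        "context": lambda q: join_chunks(fake_retrieve(q)),  # retrieve, then join
    },
    "What is frequency domain?",
)
print(toy_inputs)
```

The resulting dictionary has exactly the two keys the prompt template expects, which is what lets it plug straight into the next step.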

In [42]:
# for the second chain:
# the question and context go into the prompt
# the prompt goes into the llm
# the llm output goes into the parser
# the parser returns the final answer as a plain string

parser = StrOutputParser()

chain_2 = prompt | llm | parser

# final chain
final_chain = parallel_chain | chain_2

In [46]:
final_chain.invoke(input="Did the author explain the properties of Laplace transform?")

'Yes, the author explained some properties of the Laplace transform, specifically how it converts differentiation in the time domain into multiplication in the s domain, and the effect of taking the Laplace transform of a derivative.'
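If the `|` syntax looks like magic, the sketch below shows one way such piping can work: each step wraps a function, and `|` builds a new step that calls the left side and passes its result to the right side. The `Step` class and the fake prompt, LLM and parser are all made up for this illustration. LangChain's own runnables offer much more (batching, streaming, async), so treat this only as a mental model.

```python
class Step:
    """Hypothetical minimal 'runnable': wraps a one-argument function."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # a | b  ->  new step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


# stand-ins for prompt, llm and parser (assumptions for the demo)
fill = Step(lambda d: f"Context: {d['context']} | Question: {d['question']}")
fake_llm = Step(lambda prompt_text: {"content": prompt_text.upper()})
parse = Step(lambda message: message["content"])

toy_chain = fill | fake_llm | parse
toy_answer = toy_chain.invoke({"context": "c", "question": "q"})
print(toy_answer)
```

Each step only needs to accept what the previous one returns, which is why the parallel step (a dict with `question` and `context`) can be piped directly into the prompt.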