In [100]:
from dotenv import load_dotenv
load_dotenv()

True

In [101]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled, NoTranscriptFound
from langchain_classic.text_splitter import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, GoogleGenerativeAI

# Step 1: Indexing (Transcript Loading)

In [102]:
video_id = 'FuqNluMTIR8'

youtube_api = YouTubeTranscriptApi()
transcript_list = youtube_api.fetch(video_id=video_id, languages=['en'])

In [103]:
# Print the text of every transcript snippet
for chunk in transcript_list:
    print(chunk.text)

hey there my name is Demitri and in this video 
we're going to dive into the three simple hacks  
that you can use to get a YouTube video transcript 
and summarize it and what I often struggle with  
is the fact that there's a lot of really great 
information in YouTube videos but unfortunately it  
can be kind of difficult to wrap your head around 
it when it's all verbal and visual and sometimes  
you just want to have different components written 
out so that you can actually give advice to other  
people or remind yourself of key specific points 
from a YouTube video that were valuable and with  
AI nowadays you can really summarize it pretty 
easily if you have the text but getting the text  
is kind of hard so the first way that you can 
actually get this transcript generator is with  
a Chrome extension called YouTube summary with 
chat GPT and Claude I have it installed on my  
account and you'll see on the right hand side 
all it is is a very simple and easy to interact  
with

In [104]:
video_id = 'sDv4f4s2SB8'
transcript = ''  # stays empty if no captions can be fetched
try:
    # Request the English transcript for this video
    youtube_api = YouTubeTranscriptApi()
    transcript_list = youtube_api.fetch(video_id, languages=['en'])

    # Flatten it to plain text
    transcript = ' '.join(chunk.text for chunk in transcript_list)
    print(transcript)

except (TranscriptsDisabled, NoTranscriptFound):
    # Captions are disabled or no English transcript exists
    print('No captions available for this video.')

Gradient Descent is decent at estimating parameters. StatQuest! Hello! I'm Josh Starmer and welcome to StatQuest. Today we're going to learn about Gradient Descent and we're going to go through the algorithm step by step. Note: this StatQuest assumes you already understand the basics of least squares and linear regression, so if you're not already down with that, check out the Quest. In statistics, machine learning, and other data science fields, we optimize a lot of stuff. When we fit a line with linear regression, we optimize the intercept and the slope. When we use logistic regression, we optimize a squiggle. And when we use t-SNE, we optimize clusters. These are just a few examples of the stuff we optimize, there are tons more. The cool thing is that Gradient Descent can optimize all these things, and much more. So, if we learn how to optimize this line using Gradient Descent, then we'll have learned the strategy that optimizes this squiggle, and these clusters, and many more of th
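
The snippets printed in the earlier cell carry trailing spaces and line breaks, so a plain `' '.join(...)` can leave doubled spaces in the flattened text. Below is a minimal sketch of a whitespace-normalizing join. `Snippet` and `flatten_snippets` are hypothetical names introduced only for this illustration; they are not part of `youtube_transcript_api`.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """Hypothetical stand-in for a fetched transcript snippet (assumption)."""
    text: str


def flatten_snippets(snippets):
    # str.split() with no arguments splits on any run of whitespace,
    # so newlines and doubled spaces collapse to single spaces.
    # Snippets that contain only whitespace are skipped.
    return ' '.join(' '.join(s.text.split()) for s in snippets if s.text.strip())


demo = [Snippet('hey there \nmy name'), Snippet('  is  Demitri  '), Snippet('')]
flat = flatten_snippets(demo)
print(flat)
```

The same idea could be applied to `transcript_list` before splitting, if cleaner chunk boundaries matter.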

# Step 2: Indexing (Text Splitting)

In [105]:
# RecursiveCharacterTextSplitter was already imported in an earlier cell
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])

In [106]:
len(chunks)

22
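
To build intuition for what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window splitter in plain Python. It is only a sketch: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, which this version does not attempt. `sliding_window_chunks` is a hypothetical helper name.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    # Each new chunk starts (chunk_size - chunk_overlap) characters after
    # the previous one, so neighbours share chunk_overlap characters.
    if chunk_overlap >= chunk_size:
        raise ValueError('chunk_overlap must be smaller than chunk_size')
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = 'abcdefghijklmnopqrstuvwxyz'
pieces = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(pieces)
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of storing some text twice.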

In [107]:
chunks

[Document(metadata={}, page_content="Gradient Descent is decent at estimating parameters. StatQuest! Hello! I'm Josh Starmer and welcome to StatQuest. Today we're going to learn about Gradient Descent and we're going to go through the algorithm step by step. Note: this StatQuest assumes you already understand the basics of least squares and linear regression, so if you're not already down with that, check out the Quest. In statistics, machine learning, and other data science fields, we optimize a lot of stuff. When we fit a line with linear regression, we optimize the intercept and the slope. When we use logistic regression, we optimize a squiggle. And when we use t-SNE, we optimize clusters. These are just a few examples of the stuff we optimize, there are tons more. The cool thing is that Gradient Descent can optimize all these things, and much more. So, if we learn how to optimize this line using Gradient Descent, then we'll have learned the strategy that optimizes this squiggle, an

# Step 3: Indexing (Embedding Generation and storing in Vector store)

In [108]:
from langchain_classic.vectorstores import FAISS
embedding = GoogleGenerativeAIEmbeddings(model='gemini-embedding-001')
vector_store = FAISS.from_documents(chunks, embedding=embedding)

In [109]:
vector_store.index_to_docstore_id

{0: '1432fbb2-2abf-402c-b9cb-e251397ae694',
 1: 'e81f3140-c38e-4494-9a85-037e685988de',
 2: '6690a87f-2c1e-43e8-b8f9-bf940e08b0d4',
 3: '78e0ad6c-64bb-459c-89cf-83415905a918',
 4: '6615cffd-797b-4f2f-bc41-53740afcdeb2',
 5: 'bc5ff951-efdf-4870-a4eb-60559d3b43a5',
 6: '7658c2a7-6262-442f-b838-4674ae0af4a1',
 7: '78b0b5ba-7e1e-41ba-8fcb-7b68e90ddc7a',
 8: '20ee93b5-b166-49b6-9ee8-51d6365b75ae',
 9: '8b01033d-f092-46f0-8282-c13b8db76ca6',
 10: '25bd94d0-16ce-4b27-b028-45eb753d1c57',
 11: '613881b6-9e7f-4fe3-821f-442598322310',
 12: '3df8cc61-6e71-4ec0-8064-a07217e8008a',
 13: '855540c4-bf8b-44c3-b572-993a9e7f6e52',
 14: '43c3fa27-39ca-418a-94c1-75ec3e8d0d60',
 15: 'ceabe6d6-8c08-4f1e-8e80-e9d3409a5a7b',
 16: '6bd335ae-0a84-41db-b841-30099ea4a4ab',
 17: 'b1c70390-6b66-4e34-9c4e-4a4c1bd01c9a',
 18: '326df5e9-2be6-40fc-81c3-d42ce3823015',
 19: '7fdf063a-f68f-49cf-9736-1b30474c8950',
 20: '854fd5ef-32b0-40a3-a25f-9acbc99fa01f',
 21: 'b153262a-e246-4f32-b306-710499470fb3'}

In [110]:
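# This ID is not among those listed above (IDs are generated anew each time
# the index is built), so the lookup returns an empty list
vector_store.get_by_ids(['79ca8801-2ba4-4234-a6fe-5aef87cd52ca'])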

[]

# Step 4: Retrieval

In [111]:
retriever = vector_store.as_retriever(search_type='similarity', search_kwargs={'k':4})

In [135]:
retriever.invoke('is the topic of Gradient Descent is discussed in this video')

[Document(id='1432fbb2-2abf-402c-b9cb-e251397ae694', metadata={}, page_content="Gradient Descent is decent at estimating parameters. StatQuest! Hello! I'm Josh Starmer and welcome to StatQuest. Today we're going to learn about Gradient Descent and we're going to go through the algorithm step by step. Note: this StatQuest assumes you already understand the basics of least squares and linear regression, so if you're not already down with that, check out the Quest. In statistics, machine learning, and other data science fields, we optimize a lot of stuff. When we fit a line with linear regression, we optimize the intercept and the slope. When we use logistic regression, we optimize a squiggle. And when we use t-SNE, we optimize clusters. These are just a few examples of the stuff we optimize, there are tons more. The cool thing is that Gradient Descent can optimize all these things, and much more. So, if we learn how to optimize this line using Gradient Descent, then we'll have learned th
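
Conceptually, a similarity retriever with `k=4` embeds the query and returns the `k` stored vectors closest to it. The sketch below does this by brute force with Euclidean distance on toy 2-D vectors. It is an illustration only: the real index works on high-dimensional embeddings, and the metric it uses depends on how the store is configured. `top_k` and the toy vectors are made up for this example.

```python
import math


def top_k(query_vec, doc_vecs, k):
    # Compute the distance from the query to every stored vector,
    # then keep the indices of the k smallest distances.
    distances = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    distances.sort()
    return [idx for _, idx in distances[:k]]


toy_vectors = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [5.0, 5.0]]
nearest = top_k([1.0, 1.0], toy_vectors, k=2)
print(nearest)
```

The returned indices play the same role as the keys of `index_to_docstore_id`: they are mapped back to stored documents.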

# Step 5: Augmentation

In [136]:
llm = GoogleGenerativeAI(model='gemini-2.5-flash', temperature=0.2)

In [137]:
from langchain_core.prompts import PromptTemplate
prompt = PromptTemplate(
    template="""
You are a helpful assistant. Use only the provided transcript context.

If the answer can be determined logically from the context, answer it.
If the context shows the topic was NOT discussed, answer "No".
If the context is truly insufficient to tell, say "I don't know."

Context:
{context}

Question: {question}
""",
    input_variables=['context', 'question']
)

In [143]:
question = 'Gradient Descent algorithm explanation'
retrieved_docs = retriever.invoke(question)

In [144]:
context_text = '\n'.join(doc.page_content for doc in retrieved_docs)
print(context_text)

first sample. And this is the derivative of the first part. So we plug it in. Likewise, we replace these terms with their derivatives. Again, 2.3 and 2.9 are in bold to remind us that they are the weights of the second and third samples. Here's the derivative of the sum of the squared residuals with respect to the intercept, and here's the derivative with respect to the slope. Note: when you have two or more derivatives of the same function they are called a gradient. We will use this gradient to descend to the lowest point in the loss function, which, in this case, is the sum of the squared residuals. Thus, this is why the algorithm is called Gradient Descent. Bam! Just like before, we'll start by picking a random number for the intercept. In this case, we'll set the intercept to be equal to zero, and we'll pick a random number for the slope. In this case we'll set the slope to be 1. Thus, this line, with intercept equals 0 and slope equals 1, is where we will start. Now, let's plug
G

In [145]:
final_prompt = prompt.invoke({'context':context_text, 'question':question})
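
The augmentation step is, at its core, string substitution: the retrieved chunks and the question are dropped into the template's placeholders. The sketch below reproduces that with plain `str.format` and a shortened template, so the result can be inspected without LangChain. `TEMPLATE` and `build_prompt` are hypothetical names for this illustration, and the shortened wording is not the notebook's full prompt.

```python
TEMPLATE = """
You are a helpful assistant. Use only the provided transcript context.

Context:
{context}

Question: {question}
"""


def build_prompt(chunks, question):
    # Join the retrieved chunks, then fill both placeholders
    context = '\n'.join(chunks)
    return TEMPLATE.format(context=context, question=question)


prompt_text = build_prompt(['chunk one', 'chunk two'], 'What is discussed?')
print(prompt_text)
```

Printing the filled prompt like this is a quick way to check that the context actually contains the passage the question needs.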

# Step 6: Generation

In [146]:
answer = llm.invoke(final_prompt)
print(answer)

The Gradient Descent algorithm is explained step by step:

1.  **Step 1:** Take the derivative of the loss function for each parameter in it (also called taking the gradient of the loss function).
2.  **Step 2:** Pick random values for the parameters. For example, the intercept might be set to zero and the slope to one.
3.  **Step 3:** Plug the parameter values into the derivatives (the gradient).
4.  **Step 4:** Calculate the step sizes.
5.  **Step 5:** Calculate the new parameters.
    Then, go back to step 3 and repeat until the step size is very small or the maximum number of steps is reached.

The algorithm uses the gradient to descend to the lowest point in the loss function, which is why it's called Gradient Descent. It can optimize various things like the intercept and slope in linear regression, a squiggle in logistic regression, or clusters in t-SNE.
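
The steps in the answer above can be sketched in a few lines of Python for a straight-line fit with the sum of squared residuals as the loss. This is a minimal sketch, not code from the video: the toy data, `learning_rate`, `max_steps`, and `min_step_size` are assumptions chosen for illustration. The starting values (intercept 0, slope 1) follow the transcript.

```python
def gradient_descent(xs, ys, learning_rate=0.01, max_steps=5000, min_step_size=1e-6):
    # Step 2: starting values for the parameters
    intercept, slope = 0.0, 1.0
    for _ in range(max_steps):
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        # Steps 1 and 3: derivatives of the sum of squared residuals,
        # evaluated at the current parameter values (the gradient)
        d_intercept = sum(-2 * r for r in residuals)
        d_slope = sum(-2 * x * r for x, r in zip(xs, residuals))
        # Step 4: step size = derivative * learning rate
        step_intercept = d_intercept * learning_rate
        step_slope = d_slope * learning_rate
        # Step 5: new parameter = old parameter - step size
        intercept -= step_intercept
        slope -= step_slope
        # Stop when both step sizes are very small
        if max(abs(step_intercept), abs(step_slope)) < min_step_size:
            break
    return intercept, slope


# Toy data lying exactly on y = 1 + 2x (made up for this example)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
fitted_intercept, fitted_slope = gradient_descent(xs, ys)
print(fitted_intercept, fitted_slope)
```

With this data the fitted values should land close to an intercept of 1 and a slope of 2. A learning rate that is too large makes the updates diverge instead.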


# Building a Chain

In [148]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [149]:
def format_docs(retrieved_docs):
    # Join the retrieved chunks into one context string, separated by blank lines
    return '\n\n'.join(doc.page_content for doc in retrieved_docs)

In [150]:
parallel_chain = RunnableParallel({
    'context':retriever | RunnableLambda(format_docs),
    'question' : RunnablePassthrough()
})

In [151]:
parallel_chain.invoke('What is the algorithm discussed')

{'context': "Gradient Descent is decent at estimating parameters. StatQuest! Hello! I'm Josh Starmer and welcome to StatQuest. Today we're going to learn about Gradient Descent and we're going to go through the algorithm step by step. Note: this StatQuest assumes you already understand the basics of least squares and linear regression, so if you're not already down with that, check out the Quest. In statistics, machine learning, and other data science fields, we optimize a lot of stuff. When we fit a line with linear regression, we optimize the intercept and the slope. When we use logistic regression, we optimize a squiggle. And when we use t-SNE, we optimize clusters. These are just a few examples of the stuff we optimize, there are tons more. The cool thing is that Gradient Descent can optimize all these things, and much more. So, if we learn how to optimize this line using Gradient Descent, then we'll have learned the strategy that optimizes this squiggle, and these clusters, and ma
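
The parallel step feeds the same input to two branches and collects the results in a dict: one branch retrieves and formats context, and the other passes the question through unchanged. Below is a plain-Python sketch of that data flow. `fake_retriever`, `format_chunks`, and `parallel_step` are hypothetical stand-ins, not LangChain APIs, and the fake retriever returns fixed strings instead of searching a vector store.

```python
def fake_retriever(question):
    # Stand-in for the real retriever: always returns the same two chunks
    return ['first chunk', 'second chunk']


def format_chunks(chunks):
    # Same idea as format_docs: join chunks with blank lines between them
    return '\n\n'.join(chunks)


def parallel_step(question):
    # Both branches receive the same input
    return {
        'context': format_chunks(fake_retriever(question)),  # retrieve, then format
        'question': question,                                # passthrough
    }


result = parallel_step('What is the algorithm discussed')
print(result)
```

The resulting dict has exactly the keys the prompt template expects, which is why it can be piped straight into the prompt.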

In [152]:
parser = StrOutputParser()

In [153]:
main_chain = parallel_chain | prompt | llm | parser

In [155]:
print(main_chain.invoke('Can you summarize the video'))

This StatQuest video introduces Gradient Descent, an algorithm used to optimize parameters in statistics, machine learning, and data science. It assumes prior knowledge of least squares and linear regression. Gradient Descent can optimize various elements, such as the intercept and slope in linear regression, squiggles in logistic regression, and clusters in t-SNE.

The algorithm's steps involve:
1.  Deciding on a loss function, such as the sum of the squared residuals, to evaluate how well a line fits the data.
2.  Taking the derivative of this loss function.
3.  Picking random initial values for parameters (e.g., intercept and slope).
4.  Calculating the derivative at the current parameter values.
5.  Plugging that derivative into a step size calculation.
6.  Calculating new parameter values by subtracting the step size from the old values.
7.  Repeating these steps until the step size is close to zero or a maximum number of steps (typically 1000 or more) is reached.

The term "gradi