In [None]:
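import os
from getpass import getpass
# Prompt for the Groq API key instead of hard-coding it in the notebook
os.environ["GROQ_API_KEY"] = os.environ.get("GROQ_API_KEY") or getpass("Groq API key: ")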

## Import libraries

In [26]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_groq import ChatGroq
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate

## Step 1a - Indexing (Document Ingestion)

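**Note:** The original video ID "lWxv6jIEtms" only has Hindi transcripts, not English ones. The improved code below therefore tries several video IDs in turn, prefers an English transcript (then regional English variants, then the first available language), and falls back to a built-in sample text if every video fails.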
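The language fallback order can be sketched in plain Python, independent of the YouTube API. `choose_transcript_language` is a hypothetical helper and the availability lists are made-up examples; it only mirrors the preference order used in the improved cell (English, then regional English variants, then whatever is listed first).

```python
def choose_transcript_language(available, preferred=("en", "en-US", "en-GB", "en-CA")):
    """Return the first preferred language code present in `available`,
    else the first available code, else None (no transcripts at all)."""
    for code in preferred:
        if code in available:
            return code
    return available[0] if available else None


# Hypothetical availability lists, in the order the video would report them
print(choose_transcript_language(["hi"]))              # Hindi only -> hi
print(choose_transcript_language(["ar", "en", "fr"]))  # -> en
print(choose_transcript_language([]))                  # -> None
```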

In [27]:
from youtube_transcript_api import NoTranscriptFound

video_id = "ZzaPdXTrSb8"  # only the ID, not the full URL
try:
    # Request the English transcript specifically (raises NoTranscriptFound if there is none)
    transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])

    # Flatten the timed snippets into plain text
    transcript = " ".join(chunk["text"] for chunk in transcript_list)
    print(transcript)

except TranscriptsDisabled:
    print("No captions available for this video.")
except NoTranscriptFound:
    print("No English transcript available for this video.")



In [28]:
# IMPROVED VERSION: Better error handling for YouTube transcripts
from youtube_transcript_api import NoTranscriptFound

def get_youtube_transcript(video_id):
    """Get YouTube transcript with better error handling"""
    try:
        # First, let's check what transcripts are available
        transcript_list_obj = YouTubeTranscriptApi.list_transcripts(video_id)
        print("Available transcripts:")
        for t in transcript_list_obj:
            print(f"- {t.language} ({t.language_code})")
        
        # Try to get English transcript first
        try:
            transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])
            print("\nUsing English transcript")
        except NoTranscriptFound:
            # If English is not available, try other English variants
            try:
                transcript_list = transcript_list_obj.find_transcript(['en-US', 'en-GB', 'en-CA']).fetch()
                print("\nUsing English variant transcript")
            except NoTranscriptFound:
                # If no English, get the first available transcript
                transcript_list = list(transcript_list_obj)[0].fetch()
                print("\nUsing first available transcript")

        # Flatten it to plain text
        transcript = " ".join(chunk["text"] for chunk in transcript_list)
        return transcript
        
    except Exception as e:
        print(f"Error getting transcript: {e}")
        return None

# Try different video IDs that are more likely to have English transcripts
video_options = [
    "aircAruvnKk",  # 3Blue1Brown - Neural Networks
    "bZQun8Y4L2A",  # Another educational video
    "R9OHn5ZF4Uo",  # TED talk
    "dQw4w9WgXcQ"   # Famous video (fallback)
]

transcript = None
for video_id in video_options:
    print(f"\nTrying video ID: {video_id}")
    transcript = get_youtube_transcript(video_id)
    if transcript:
        print(f"Success! Transcript length: {len(transcript)} characters")
        print(f"First 500 characters: {transcript[:500]}...")
        break
    else:
        print("Failed, trying next video...")

# If all videos fail, use sample text
if not transcript:
    print("\nAll videos failed. Using sample text for demonstration...")
    transcript = """Artificial intelligence is transforming the world around us. 
    Machine learning algorithms can now recognize images, understand speech, 
    and even generate human-like text. Deep learning, a subset of machine learning, 
    uses neural networks with multiple layers to learn complex patterns in data. 
    These systems have achieved remarkable success in various domains including 
    computer vision, natural language processing, and game playing. 
    The future of AI holds great promise for solving complex problems 
    and improving human lives. Neural networks are inspired by the human brain 
    and consist of interconnected nodes that process information. 
    Training these networks requires large amounts of data and computational power."""


Trying video ID: aircAruvnKk
Available transcripts:
- Arabic (ar)
- Bangla (bn)
- Chinese (zh)
- Chinese (China) (zh-CN)
- Chinese (Taiwan) (zh-TW)
- Czech (cs)
- English (en)
- Filipino (fil)
- French (fr)
- German (de)
- Greek (el)
- Hebrew (iw)
- Hindi (hi)
- Hungarian (hu)
- Italian (it)
- Japanese (ja)
- Korean (ko)
- Marathi (mr)
- Persian (fa)
- Persian (Iran) (fa-IR)
- Polish (pl)
- Portuguese (pt)
- Portuguese (Brazil) (pt-BR)
- Romanian (ro)
- Russian (ru)
- Spanish (es)
- Thai (th)
- Turkish (tr)
- Ukrainian (uk)
- Urdu (ur)
- English (auto-generated) (en)

Using English transcript
Success! Transcript length: 18430 characters
First 500 characters: This is a 3. It's sloppily written and rendered at an extremely low resolution of 28x28 pixels, but your brain has no trouble recognizing it as a 3. And I want you to take a moment to appreciate how crazy it is that brains can do this so effortlessly. I mean, this, this and this are also recognizable as 3s, even though the specifi

In [29]:
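# Note: this is the global `transcript_list` from the first transcript cell (video ZzaPdXTrSb8, a C++ course).
# The list fetched inside get_youtube_transcript() is local, while the global `transcript` now holds the 3Blue1Brown text.
transcript_list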

[{'text': 'welcome to the ultimate C++ course in',
  'start': 2.36,
  'duration': 4.399},
 {'text': "this course you're going to learn",
  'start': 5.359,
  'duration': 3.401},
 {'text': 'everything you need to know about C++',
  'start': 6.759,
  'duration': 3.721},
 {'text': 'from the basics to more advanced',
  'start': 8.76,
  'duration': 3.879},
 {'text': 'concepts so by the end of this course',
  'start': 10.48,
  'duration': 4.44},
 {'text': "you'll be able to write C++ code with",
  'start': 12.639,
  'duration': 4.161},
 {'text': "confidence if you're looking for a",
  'start': 14.92,
  'duration': 4.0},
 {'text': 'comprehensive easy to follow well',
  'start': 16.8,
  'duration': 4.36},
 {'text': 'organized and practical course that',
  'start': 18.92,
  'duration': 4.199},
 {'text': 'takes you from Zero to Hero this is the',
  'start': 21.16,
  'duration': 4.359},
 {'text': "right C++ course for you you don't need",
  'start': 23.119,
  'duration': 4.521},
 {'text': 'any pri
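Each entry in the list above is a dict with `text`, `start` and `duration` keys, with times in seconds. The sketch below takes the first three entries from that output, flattens them the same way the notebook does, and also computes the time span they cover; `flatten_transcript` is a hypothetical helper. Note that captions overlap in time (the first ends at 6.759 s, after the second starts at 5.359 s).

```python
def flatten_transcript(snippets):
    """Join snippet texts with spaces and return (text, covered_seconds)."""
    text = " ".join(s["text"] for s in snippets)
    if not snippets:
        return text, 0.0
    # Captions can overlap in time, so use the latest end time rather than assume order
    end = max(s["start"] + s["duration"] for s in snippets)
    return text, end - snippets[0]["start"]


# The first three entries of transcript_list shown above
snippets = [
    {"text": "welcome to the ultimate C++ course in", "start": 2.36, "duration": 4.399},
    {"text": "this course you're going to learn", "start": 5.359, "duration": 3.401},
    {"text": "everything you need to know about C++", "start": 6.759, "duration": 3.721},
]
text, seconds = flatten_transcript(snippets)
print(text)
print(round(seconds, 2))  # 8.12
```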

## Step 1b - Indexing (Text Splitting)

In [30]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])

In [31]:
len(chunks)

23
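With `chunk_size=1000` and `chunk_overlap=200`, each chunk holds at most 1,000 characters and neighboring chunks share up to 200. The simplified sketch below, a hypothetical `sliding_window_split` helper, cuts fixed windows that advance by `chunk_size - chunk_overlap` characters. Unlike `RecursiveCharacterTextSplitter`, it ignores separators such as paragraph breaks and spaces, so its boundaries (and, in general, its chunk count) differ.

```python
def sliding_window_split(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows; consecutive windows share
    `chunk_overlap` characters. Simplification: no separator-aware splitting."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(18430))  # 18,430 synthetic characters
windows = sliding_window_split(sample)
print(len(windows))  # 23
```

For an 18,430-character text the window advances 800 characters at a time, giving 23 windows; that happens to match the 23 chunks above, but the separator-aware splitter will not always agree.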

In [32]:
chunks[12]

Document(metadata={}, page_content="of the weight's value. Now if we made the weights associated with almost all of the pixels zero except for some positive weights in this region that we care about, then taking the weighted sum of all the pixel values really just amounts to adding up the values of the pixel just in the region that we care about. And if you really wanted to pick up on whether there's an edge here, what you might do is have some negative weights associated with the surrounding pixels. Then the sum is largest when those middle pixels are bright but the surrounding pixels are darker. When you compute a weighted sum like this, you might come out with any number, but for this network what we want is for activations to be some value between 0 and 1. So a common thing to do is to pump this weighted sum into some function that squishes the real number line into the range between 0 and 1. And a common function that does this is called the sigmoid function, also known as a logis

## Step 1c & 1d - Indexing (Embedding Generation and Storing in Vector Store)

In [33]:
# Using free HuggingFace embeddings instead of OpenAI
# all-MiniLM-L6-v2 is a lightweight, fast, and effective sentence transformer model
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embeddings)
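Conceptually, a similarity search over the stored chunks embeds the query and returns the stored vectors closest to it. The sketch below shows an exact (brute-force) k-nearest-neighbor search by Euclidean (L2) distance, assuming the wrapper's default flat L2 index; the toy 3-dimensional vectors are made up, whereas all-MiniLM-L6-v2 produces 384-dimensional embeddings. `knn_l2` is a hypothetical helper, not part of FAISS.

```python
import math


def knn_l2(query, vectors, k=4):
    """Exact k-nearest-neighbor search by Euclidean (L2) distance.
    Returns (index, distance) pairs sorted from closest to farthest."""
    scored = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy "embeddings"; each row would be one chunk's vector
vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
print(knn_l2([1.0, 0.0, 0.0], vectors, k=2))  # indices 0 and 2 are the closest
```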

In [34]:
vector_store.index_to_docstore_id

{0: '4f5fdfb1-d4e4-46e1-8cce-7321ae304ad8',
 1: '524ea9c2-fc61-475d-8eb5-bd72315336f4',
 2: 'ef122828-afd6-48ae-bbf9-5f8eecc1695f',
 3: 'f2ec454e-5331-41a0-b5c4-0b18051f2be0',
 4: '829d4b80-60f8-4310-96a7-7e17c9086a4d',
 5: '4ed932d8-d7ad-43fe-9bf4-8af14167bc78',
 6: 'c1e8de46-02a0-45c3-bcd7-f46c81b6ef1a',
 7: 'e98e759d-7337-4a95-94bb-15454a45d058',
 8: 'd672cda7-cd90-40ff-8f04-1329343246e0',
 9: '16b31b69-cba9-431c-857f-64f00e04da5d',
 10: '951aa993-3093-4375-8553-f84c09e9cbcf',
 11: '02780d9f-89e1-4046-8d4c-15bc27f38361',
 12: '4c2fe338-cf43-4b0f-b58b-477b2f6c57e5',
 13: '48b76bb4-b0ce-4fb1-b4e6-849b8324f0a4',
 14: '45e4342f-ac63-4fbd-af9a-b59534a1cb19',
 15: '35d1bcf3-7816-499e-a0d5-198b9f819dbb',
 16: '275fd578-bcdf-456a-a3ba-ae744b856726',
 17: 'a0212ee8-d5ee-494b-b04e-b8d0825afdcc',
 18: 'c0d8277c-76fa-4407-8a30-aa8098a92c23',
 19: 'c32f402b-5b18-4b54-8d58-95dbdf062129',
 20: 'f593923c-4b2d-4c1f-819b-e81f22a9e018',
 21: 'a4fce968-8506-4053-be0f-5b40b99e25fe',
 22: 'd9b514c6-486c-

In [35]:
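# This ID does not appear in index_to_docstore_id above (likely from an earlier run), so nothing is returned
vector_store.get_by_ids(['78a0b107-c31d-4cbc-90d5-e3080ec7b5a6'])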

[]
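The empty list follows from how the pieces are linked: the vector index stores positions, `index_to_docstore_id` maps each position to a document ID, and lookups by an unknown ID find nothing. The hypothetical `TinyDocstore` below mimics that layout in simplified form, assuming IDs are generated randomly whenever none are supplied (as the UUID-style values above suggest), so IDs copied from an earlier run will not match.

```python
import uuid


class TinyDocstore:
    """Hypothetical minimal store: vector position -> document ID -> text."""

    def __init__(self, texts):
        # Fresh random IDs on every build, so IDs from an earlier run won't match
        self.index_to_id = {i: str(uuid.uuid4()) for i in range(len(texts))}
        self.by_id = {self.index_to_id[i]: t for i, t in enumerate(texts)}

    def get_by_ids(self, ids):
        # Unknown IDs are skipped rather than raising, so the result may be empty
        return [self.by_id[i] for i in ids if i in self.by_id]


store = TinyDocstore(["chunk A", "chunk B"])
print(store.get_by_ids([store.index_to_id[1]]))                     # ['chunk B']
print(store.get_by_ids(["78a0b107-c31d-4cbc-90d5-e3080ec7b5a6"]))  # []
```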

## Step 2 - Retrieval

In [36]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 4})

In [37]:
retriever

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x0000012041374D70>, search_kwargs={'k': 4})

In [38]:
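# The index holds the neural-network transcript, so this still returns the 4 nearest chunks even though the video is not about C++
retriever.invoke('What is c++')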

[Document(id='d9b514c6-486c-430a-9a88-49ba4ce80ea2', metadata={}, page_content="But relatively few modern networks actually use sigmoid anymore. Yeah.\nIt's kind of old school right? Yeah or rather ReLU seems to be much easier to train. And ReLU, ReLU stands for rectified linear unit? Yes it's this kind of function where you're just taking a max of zero and a where a is given by what you were explaining in the video and what this was sort of motivated from I think was a partially by a biological analogy with how neurons would either be activated or not. And so if it passes a certain threshold it would be the identity function but if it did not then it would just not be activated so it'd be zero so it's kind of a simplification. Using sigmoids didn't help training or it was very difficult to train at some point and people just tried ReLU and it happened to work very well for these incredibly deep neural networks. All right thank you Lisha."),
 Document(id='f2ec454e-5331-41a0-b5c4-0b1805

## Step 3 - Augmentation

**Note about Groq models:** The originally used model `llama-3.1-70b-versatile` has been decommissioned, so this notebook uses `llama-3.1-8b-instant`. Models that were available when this notebook was written include:
- `llama-3.1-8b-instant` (fast, good for most tasks)
- `llama-3.2-1b-preview` (very fast, lighter model)
- `llama-3.2-3b-preview` (balanced speed and performance)
- `mixtral-8x7b-32768` (32K-token context window)

Groq retires models regularly, so check the current list before choosing one: https://console.groq.com/docs/models

In [39]:
# Initialize the Groq LLM with a currently supported model
llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0.2)

# Alternative models you can try (availability changes over time; see the note above):
# llm = ChatGroq(model="llama-3.2-3b-preview", temperature=0.2)  # Faster, smaller model
# llm = ChatGroq(model="mixtral-8x7b-32768", temperature=0.2)    # 32K-token context window

In [40]:
prompt = PromptTemplate(
    template="""
      You are a helpful assistant.
      Answer ONLY from the provided transcript context.
      If the context is insufficient, just say you don't know.

      {context}
      Question: {question}
    """,
    input_variables = ['context', 'question']
)
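With its default f-string format, `PromptTemplate` fills `{context}` and `{question}` much like Python's `str.format`, which is what the `StringPromptValue` output further down reflects. The sketch reproduces that substitution with plain `str.format`; `fill_prompt` and the sample context and question are hypothetical.

```python
TEMPLATE = """
      You are a helpful assistant.
      Answer ONLY from the provided transcript context.
      If the context is insufficient, just say you don't know.

      {context}
      Question: {question}
    """


def fill_prompt(context, question):
    """Substitute the two placeholders, as the prompt template does."""
    return TEMPLATE.format(context=context, question=question)


filled = fill_prompt("ReLU stands for rectified linear unit.", "What does ReLU stand for?")
print(filled)
```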

In [41]:
question          = "c++ discussed in this video? if yes then what was discussed"
retrieved_docs    = retriever.invoke(question)

In [42]:
retrieved_docs

[Document(id='d9b514c6-486c-430a-9a88-49ba4ce80ea2', metadata={}, page_content="But relatively few modern networks actually use sigmoid anymore. Yeah.\nIt's kind of old school right? Yeah or rather ReLU seems to be much easier to train. And ReLU, ReLU stands for rectified linear unit? Yes it's this kind of function where you're just taking a max of zero and a where a is given by what you were explaining in the video and what this was sort of motivated from I think was a partially by a biological analogy with how neurons would either be activated or not. And so if it passes a certain threshold it would be the identity function but if it did not then it would just not be activated so it'd be zero so it's kind of a simplification. Using sigmoids didn't help training or it was very difficult to train at some point and people just tried ReLU and it happened to work very well for these incredibly deep neural networks. All right thank you Lisha."),
 Document(id='c0d8277c-76fa-4407-8a30-aa8098

In [43]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

"But relatively few modern networks actually use sigmoid anymore. Yeah.\nIt's kind of old school right? Yeah or rather ReLU seems to be much easier to train. And ReLU, ReLU stands for rectified linear unit? Yes it's this kind of function where you're just taking a max of zero and a where a is given by what you were explaining in the video and what this was sort of motivated from I think was a partially by a biological analogy with how neurons would either be activated or not. And so if it passes a certain threshold it would be the identity function but if it did not then it would just not be activated so it'd be zero so it's kind of a simplification. Using sigmoids didn't help training or it was very difficult to train at some point and people just tried ReLU and it happened to work very well for these incredibly deep neural networks. All right thank you Lisha.\n\nwe represent it by organizing all those biases into a vector, and adding the entire vector to the previous matrix vector pr

In [44]:
final_prompt = prompt.invoke({"context": context_text, "question": question})

In [45]:
final_prompt

StringPromptValue(text="\n      You are a helpful assistant.\n      Answer ONLY from the provided transcript context.\n      If the context is insufficient, just say you don't know.\n\n      But relatively few modern networks actually use sigmoid anymore. Yeah.\nIt's kind of old school right? Yeah or rather ReLU seems to be much easier to train. And ReLU, ReLU stands for rectified linear unit? Yes it's this kind of function where you're just taking a max of zero and a where a is given by what you were explaining in the video and what this was sort of motivated from I think was a partially by a biological analogy with how neurons would either be activated or not. And so if it passes a certain threshold it would be the identity function but if it did not then it would just not be activated so it'd be zero so it's kind of a simplification. Using sigmoids didn't help training or it was very difficult to train at some point and people just tried ReLU and it happened to work very well for th

## Step 4 - Generation

In [46]:
answer = llm.invoke(final_prompt)
# The indexed video does not cover C++, so the grounded prompt leads the model to decline
print(answer.content)

I don't know.


## Building a Chain

In [47]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [48]:
def format_docs(docs):
    """Join the retrieved chunks into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)

In [49]:
parallel_chain = RunnableParallel({
    'context': retriever | RunnableLambda(format_docs),
    'question': RunnablePassthrough()
})
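`RunnableParallel` hands the same input to every branch and collects the results under their keys: here the question flows both through the retriever and `format_docs` (for `context`) and unchanged through `RunnablePassthrough` (for `question`). The sketch below reproduces that fan-out with plain functions; `run_parallel`, `fake_retriever` and `join_texts` are hypothetical stand-ins, and unlike LangChain the branches run one after another.

```python
def run_parallel(branches, value):
    """Apply every branch function to the same input and collect results by key."""
    return {key: fn(value) for key, fn in branches.items()}


def fake_retriever(question):
    # Hypothetical stand-in that always returns the same two chunks
    return ["chunk about ReLU", "chunk about sigmoid"]


def join_texts(texts):
    return "\n\n".join(texts)


branches = {
    "context": lambda q: join_texts(fake_retriever(q)),  # like retriever | format_docs
    "question": lambda q: q,                             # like RunnablePassthrough()
}
print(run_parallel(branches, "What is ReLU?"))
```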

In [50]:
parallel_chain.invoke('who is c++')

{'context': "But relatively few modern networks actually use sigmoid anymore. Yeah.\nIt's kind of old school right? Yeah or rather ReLU seems to be much easier to train. And ReLU, ReLU stands for rectified linear unit? Yes it's this kind of function where you're just taking a max of zero and a where a is given by what you were explaining in the video and what this was sort of motivated from I think was a partially by a biological analogy with how neurons would either be activated or not. And so if it passes a certain threshold it would be the identity function but if it did not then it would just not be activated so it'd be zero so it's kind of a simplification. Using sigmoids didn't help training or it was very difficult to train at some point and people just tried ReLU and it happened to work very well for these incredibly deep neural networks. All right thank you Lisha.\n\nwe represent it by organizing all those biases into a vector, and adding the entire vector to the previous matr

In [51]:
parser = StrOutputParser()

In [52]:
main_chain = parallel_chain | prompt | llm | parser
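The `|` operator chains runnables so that each step's output becomes the next step's input. The toy version below shows only that data flow, not LangChain's implementation: a hypothetical `Step` class overloads `__or__`, and the four steps are made-up stand-ins for `parallel_chain`, `prompt`, `llm` and `parser`.

```python
class Step:
    """Hypothetical wrapper: `a | b` builds a step that feeds a's output into b."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value):
        return self.fn(value)


# Stand-ins for parallel_chain, prompt, llm and parser (all hypothetical)
gather = Step(lambda q: {"context": "ReLU is max(0, a).", "question": q})
render = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt: {"content": "ReLU returns max(0, a)."})
to_text = Step(lambda message: message["content"])

chain = gather | render | fake_llm | to_text
print(chain.invoke("What does ReLU compute?"))  # ReLU returns max(0, a).
```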

In [53]:
main_chain.invoke('Can you summarize the video')

'The video is about explaining the structure of a neural network, specifically a network that can recognize handwritten digits. It starts by explaining how the task of recognizing handwritten digits is a classic challenge in machine learning. The speaker then explains that they will break down the structure of the network into layers of abstraction and focus on how the activations in one layer determine the activations in the next.\n\nThe speaker asks the question of what parameters the network should have to be expressive enough to capture different pixel patterns. They then discuss the sigmoid function, which is used in early networks to squish the weighted sum into an interval between zero and one, motivated by the biological analogy of neurons being either inactive or active.\n\nThe video is a precursor to explaining how the network learns to recognize handwritten digits, which will be covered in the next video. The speaker also mentions that they will be jumping back into a probab