## Installing Libraries

In [2]:
!pip install -q youtube-transcript-api langchain-community langchain-groq faiss-cpu tiktoken python-dotenv huggingface_hub langchain-huggingface

## Importing Libraries

In [4]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_groq import ChatGroq
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatMessagePromptTemplate, MessagesPlaceholder, ChatPromptTemplate
from langchain_huggingface import HuggingFaceEndpointEmbeddings

## Data Loading

In [6]:
video_id = 'eMlx5fFNoYc'

try:
    transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=['en'])
    transcript = " ".join(chunk["text"] for chunk in transcript_list)
    print(transcript)

except TranscriptsDisabled:
    print("No captions available for this video.")

In the last chapter, you and I started to step through the internal workings of a transformer. This is one of the key pieces of technology inside large language models, and a lot of other tools in the modern wave of AI. It first hit the scene in a now-famous 2017 paper called Attention is All You Need, and in this chapter you and I will dig into what this attention mechanism is, visualizing how it processes data. As a quick recap, here's the important context I want you to have in mind. The goal of the model that you and I are studying is to take in a piece of text and predict what word comes next. The input text is broken up into little pieces that we call tokens, and these are very often words or pieces of words, but just to make the examples in this video easier for you and me to think about, let's simplify by pretending that tokens are always just words. The first step in a transformer is to associate each token with a high-dimensional vector, what we call its embedding. The most i
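The join in the loading cell can be sketched on its own. This is a minimal illustration, assuming each caption snippet is a dict with a `"text"` key as the cell above iterates over; the helper name `snippets_to_text` and the sample data are made up for the example. It also collapses line breaks inside snippets, which the original one-liner does not do.

```python
def snippets_to_text(snippets):
    """Join caption snippets into one string, collapsing internal whitespace."""
    pieces = []
    for snippet in snippets:
        # Caption text often contains line breaks; split() with no arguments drops them.
        words = snippet["text"].split()
        if words:
            pieces.append(" ".join(words))
    return " ".join(pieces)


# Made-up sample in the list-of-dicts shape the loading cell iterates over.
sample = [
    {"text": "In the last chapter,\nyou and I", "start": 0.0, "duration": 2.5},
    {"text": "  ", "start": 2.5, "duration": 0.5},
    {"text": "started to step through", "start": 3.0, "duration": 2.0},
]
print(snippets_to_text(sample))
```

Skipping empty snippets avoids double spaces in the transcript, which would otherwise end up in the chunks.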

## Text Splitting

In [8]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])

In [9]:
len(chunks)

35
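To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window splitter. It is only a sketch: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and spaces, so its chunk boundaries and chunk count will differ from this fixed-width version. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-width windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


demo = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo)
```

Each window repeats the tail of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.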

## Embedding & Vector Store

In [11]:
!pip install -q google-genai langchain-google-genai
In [12]:
from google import genai

In [23]:
import os

# Read the key from the environment instead of hard-coding it in the notebook
GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]

In [27]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

In [35]:


# Pass the key defined above rather than repeating the literal value
embedding = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=GEMINI_API_KEY
)

# Build FAISS vector store
vector_store = FAISS.from_documents(chunks, embedding)


In [37]:
vector_store.index_to_docstore_id

{0: '86d6695a-b58b-44bf-8af9-707e535988ad',
 1: '41e2dfb6-6a13-4a25-9244-76bd278599bf',
 2: 'd280a5a8-4f5a-4981-8c15-47ee5b5aa52c',
 3: '309e015d-6177-4874-b355-36356970410d',
 4: '8b26cf7b-0d3a-4a21-880e-1ee383e162a9',
 5: '9defbc5c-08ba-4fb1-a2a2-07cd44edc1d0',
 6: 'f10bf758-b29d-4e2a-8416-74247bd61706',
 7: '4eb41c3d-acce-4953-b9e7-1a1bec19212d',
 8: 'ca20a918-2563-44c9-b605-9ad1dc400d44',
 9: 'b835aa18-0669-4864-8258-c185ec5794f1',
 10: '4449a26e-63a4-4521-8016-49e3e0f012d5',
 11: '2e0b73eb-9bc3-4433-9557-277d5f13b880',
 12: '467c5440-fbef-4941-8c89-8b80c5a05119',
 13: '3adcd761-6721-417b-8e5b-dcfbb0c2a693',
 14: 'd233a5c3-21c3-4cff-96f2-9402f90f6358',
 15: '6492d9c1-2a61-4823-ae2e-f645b99ff55a',
 16: '680fcce2-0841-4b35-84df-f069cfaa7609',
 17: '0d1aa66f-4b81-457a-9ced-cc3abb7eca33',
 18: '993ca2fc-5192-4994-a91b-05dc7c05c797',
 19: '7c66e43d-fd9c-4a67-841e-d7101c6e072a',
 20: 'b6f3d1c6-93f4-4307-807e-2d841aeb9ed7',
 21: 'ce8e710c-5ce6-4c2b-b8cb-12c989b0b654',
 22: '6d66dc8a-2c31-

## Retrieval

In [39]:
retriever = vector_store.as_retriever(search_type='similarity', search_kwargs={"k": 4})

In [41]:
retriever

VectorStoreRetriever(tags=['FAISS', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000001781C2A2480>, search_kwargs={'k': 4})

In [43]:
retriever.invoke('what is transformer?')

[Document(id='41e2dfb6-6a13-4a25-9244-76bd278599bf', metadata={}, page_content="about, let's simplify by pretending that tokens are always just words. The first step in a transformer is to associate each token with a high-dimensional vector, what we call its embedding. The most important idea I want you to have in mind is how directions in this high-dimensional space of all possible embeddings can correspond with semantic meaning. In the last chapter we saw an example for how direction can correspond to gender, in the sense that adding a certain step in this space can take you from the embedding of a masculine noun to the embedding of the corresponding feminine noun. That's just one example you could imagine how many other directions in this high-dimensional space could correspond to numerous other aspects of a word's meaning. The aim of a transformer is to progressively adjust these embeddings so that they don't merely encode an individual word, but instead they bake in some much, muc
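Conceptually, a `similarity` search with `k=4` embeds the query and returns the stored vectors closest to it. The sketch below does this by brute force on toy 2-D vectors. It assumes Euclidean (L2) distance, which is my understanding of the default for this FAISS wrapper; the function names and the vectors are invented for illustration, and real embeddings have hundreds of dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k stored vectors nearest to query_vec."""
    scored = [(l2_distance(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort()  # smallest distance first
    return [idx for _, idx in scored[:k]]


# Toy "embeddings" for five chunks.
doc_vecs = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.5, 0.5]]
nearest = top_k([0.0, 0.0], doc_vecs, k=4)
print(nearest)
```

The returned indices play the same role as the integer keys of `index_to_docstore_id`: they are looked up to recover the matching documents.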

## Augmentation

In [46]:
import os

with open("groqapi.txt", "r") as f:
    GROQ_API_KEY = f.read().strip()

os.environ["GROQ_API_KEY"] = GROQ_API_KEY
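A slightly more defensive way to load a key is to try an environment variable first and fall back to a file. This is a sketch, not part of the original notebook; `load_api_key` is a hypothetical helper and the fallback file is optional.

```python
import os
from pathlib import Path


def load_api_key(env_var, fallback_file=None):
    """Return the key from env_var, else from fallback_file, else raise."""
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    if fallback_file is not None and Path(fallback_file).is_file():
        # Strip the trailing newline that text editors usually add.
        value = Path(fallback_file).read_text(encoding="utf-8").strip()
        if value:
            return value
    raise RuntimeError(f"{env_var} is not set and no usable key file was found")
```

With this helper the cell above would become `GROQ_API_KEY = load_api_key("GROQ_API_KEY", "groqapi.txt")`, and a missing key fails with a clear message instead of an authentication error later.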


In [48]:
llm = ChatGroq(
    api_key=GROQ_API_KEY,
    model="llama-3.3-70b-versatile",
    temperature=0.2
)

In [49]:
# Confirm the key was loaded without echoing the secret itself
print("GROQ_API_KEY loaded:", bool(GROQ_API_KEY))


In [50]:
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer ONLY from the provided transcript context. "
               "If the context is insufficient, just say you don't know."),
    ("human", "Context:\n{context}\n\nQuestion: {question}")
])

In [54]:
question = "Is transformer discussed in this video? If yes then what was discussed?"
retrieved_docs = retriever.invoke(question)

In [55]:
retrieved_docs

[Document(id='41e2dfb6-6a13-4a25-9244-76bd278599bf', metadata={}, page_content="about, let's simplify by pretending that tokens are always just words. The first step in a transformer is to associate each token with a high-dimensional vector, what we call its embedding. The most important idea I want you to have in mind is how directions in this high-dimensional space of all possible embeddings can correspond with semantic meaning. In the last chapter we saw an example for how direction can correspond to gender, in the sense that adding a certain step in this space can take you from the embedding of a masculine noun to the embedding of the corresponding feminine noun. That's just one example you could imagine how many other directions in this high-dimensional space could correspond to numerous other aspects of a word's meaning. The aim of a transformer is to progressively adjust these embeddings so that they don't merely encode an individual word, but instead they bake in some much, muc

In [58]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

"about, let's simplify by pretending that tokens are always just words. The first step in a transformer is to associate each token with a high-dimensional vector, what we call its embedding. The most important idea I want you to have in mind is how directions in this high-dimensional space of all possible embeddings can correspond with semantic meaning. In the last chapter we saw an example for how direction can correspond to gender, in the sense that adding a certain step in this space can take you from the embedding of a masculine noun to the embedding of the corresponding feminine noun. That's just one example you could imagine how many other directions in this high-dimensional space could correspond to numerous other aspects of a word's meaning. The aim of a transformer is to progressively adjust these embeddings so that they don't merely encode an individual word, but instead they bake in some much, much richer contextual meaning. I should say up front that a lot of people find\n\
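One possible variation on the join above is to number each chunk and cap the total length, so the prompt stays within a budget and the chunks are easy to tell apart. This is a sketch under assumptions: `Doc` is a hypothetical stand-in exposing only `page_content`, and `format_numbered_context` and `max_chars` are names invented here.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document; only page_content is used."""
    page_content: str


def format_numbered_context(docs, max_chars=None):
    """Join chunks as '[1] ...', '[2] ...', stopping before max_chars is exceeded."""
    parts = []
    used = 0
    for i, doc in enumerate(docs, start=1):
        block = f"[{i}] {doc.page_content.strip()}"
        # The budget is approximate: separators between blocks are not counted.
        if max_chars is not None and used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)


docs = [Doc("alpha "), Doc("beta")]
print(format_numbered_context(docs))
```

Because chunks arrive in order of relevance, dropping from the end removes the least relevant ones first.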

In [60]:
final_prompt = prompt_template.invoke({'context': context_text, 'question': question})

In [62]:
final_prompt

ChatPromptValue(messages=[SystemMessage(content="You are a helpful assistant. Answer ONLY from the provided transcript context. If the context is insufficient, just say you don't know.", additional_kwargs={}, response_metadata={}), HumanMessage(content="Context:\nabout, let's simplify by pretending that tokens are always just words. The first step in a transformer is to associate each token with a high-dimensional vector, what we call its embedding. The most important idea I want you to have in mind is how directions in this high-dimensional space of all possible embeddings can correspond with semantic meaning. In the last chapter we saw an example for how direction can correspond to gender, in the sense that adding a certain step in this space can take you from the embedding of a masculine noun to the embedding of the corresponding feminine noun. That's just one example you could imagine how many other directions in this high-dimensional space could correspond to numerous other aspect

## Generation

In [65]:
answer = llm.invoke(final_prompt)
print(answer.content)

Yes, the transformer is discussed in this video. The discussion includes:

1. The first step in a transformer, which is to associate each token (in this case, words) with a high-dimensional vector, called its embedding.
2. How directions in this high-dimensional space can correspond to semantic meaning, such as gender or other aspects of a word's meaning.
3. The aim of a transformer is to progressively adjust these embeddings to encode richer contextual meaning.
4. The attention mechanism, a key piece of technology inside transformers, which enables the model to understand the context of a word and its different meanings in different phrases.
5. The process of how data flows through a transformer, including multiple attention blocks and other operations called multi-layer perceptrons.

The video aims to explain the internal workings of a transformer, specifically the attention mechanism, and how it processes data to predict the next word in a piece of text.


## Chain

In [67]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [68]:
def format_doc(retrieved_docs):
    context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
    return context_text

In [70]:
parallel_chain = RunnableParallel({
    'context': retriever | RunnableLambda(format_doc),
    'question': RunnablePassthrough()
})

In [74]:
parallel_chain.invoke("what is transformer?")

{'context': "about, let's simplify by pretending that tokens are always just words. The first step in a transformer is to associate each token with a high-dimensional vector, what we call its embedding. The most important idea I want you to have in mind is how directions in this high-dimensional space of all possible embeddings can correspond with semantic meaning. In the last chapter we saw an example for how direction can correspond to gender, in the sense that adding a certain step in this space can take you from the embedding of a masculine noun to the embedding of the corresponding feminine noun. That's just one example you could imagine how many other directions in this high-dimensional space could correspond to numerous other aspects of a word's meaning. The aim of a transformer is to progressively adjust these embeddings so that they don't merely encode an individual word, but instead they bake in some much, much richer contextual meaning. I should say up front that a lot of pe

In [75]:
parser = StrOutputParser()

In [78]:
main_chain = parallel_chain | prompt_template | llm | parser

In [80]:
main_chain.invoke('Can you summarize the video')

'The video discusses the internal workings of a transformer, a key piece of technology in large language models. It explains how the model takes in a piece of text, breaks it up into tokens (words or pieces of words), and associates each token with a high-dimensional vector (embedding). The video then delves into the attention mechanism, specifically the multi-headed attention block, and how it processes data to predict the next word in a sequence. It uses a simple example phrase to illustrate how the attention mechanism works, focusing on how adjectives adjust the meanings of their corresponding nouns. The video also touches on the technical nuances of implementing attention heads and the output matrix.'
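The `|` operator in `main_chain` reads as "feed the output of the left step into the right step". The toy classes below mimic that idea in plain Python. They are not LangChain's implementation; `Step`, `parallel`, and the fake retriever and model are all invented for this sketch.

```python
class Step:
    """Wrap a function so that steps can be composed with the | operator."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b yields a new step that runs a, then passes its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})


# Stand-ins for the retriever, formatter, prompt, and model.
fake_retriever = Step(lambda q: ["chunk about " + q, "another chunk"])
join_docs = Step(lambda docs: "\n\n".join(docs))
passthrough = Step(lambda value: value)
fill_prompt = Step(lambda d: f"Context:\n{d['context']}\n\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt: prompt.upper())

chain = parallel(context=fake_retriever | join_docs, question=passthrough) | fill_prompt | fake_llm
result = chain.invoke("attention")
print(result)
```

The structure mirrors the notebook's chain: one branch retrieves and formats context, the other passes the question through unchanged, and the resulting dict fills the prompt.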

## Run the App

In [91]:
!streamlit run app.py

^C
