In [3]:
!pip install pdfplumber langchain langchain_community langchain_core langchain_experimental faiss_cpu langchain_experimental



In [4]:
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains import RetrievalQA

### **Data Loading and Splitting**

In [5]:
loader = PDFPlumberLoader("attention.pdf")
docs = loader.load()

In [8]:
text_splitter = SemanticChunker(HuggingFaceEmbeddings())
documents = text_splitter.split_documents(docs)

  text_splitter = SemanticChunker(HuggingFaceEmbeddings())


### **Embedding and Vector Database**

In [9]:
embedder = HuggingFaceEmbeddings()

  embedder = HuggingFaceEmbeddings()


In [10]:
db = FAISS.from_documents(documents, embedder)
retriever = db.as_retriever(search_type = "similarity", search_kwargs = {"k":3})

In [11]:
!pip install langchain_groq

Collecting langchain_groq
  Downloading langchain_groq-0.2.4-py3-none-any.whl.metadata (3.0 kB)
Collecting groq<1,>=0.4.1 (from langchain_groq)
  Downloading groq-0.18.0-py3-none-any.whl.metadata (14 kB)
Collecting langchain-core<0.4.0,>=0.3.33 (from langchain_groq)
  Downloading langchain_core-0.3.33-py3-none-any.whl.metadata (6.3 kB)
Downloading langchain_groq-0.2.4-py3-none-any.whl (14 kB)
Downloading groq-0.18.0-py3-none-any.whl (121 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m121.9/121.9 kB[0m [31m2.8 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading langchain_core-0.3.33-py3-none-any.whl (412 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m412.7/412.7 kB[0m [31m11.8 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: groq, langchain-core, langchain_groq
  Attempting uninstall: langchain-core
    Found existing installation: langchain-core 0.3.32
    Uninstalling langchain-core-0.3.32:
      Successfully uninstalled langchai

In [21]:
from google.colab import userdata

In [24]:
from langchain_groq import ChatGroq

llm = ChatGroq(
    groq_api_key=userdata.get('groq'),
    temperature = 0,
    model_name = "deepseek-r1-distill-llama-70b"
)

In [26]:
print(llm.invoke("What is AI Agents").content)

<think>

</think>

AI agents are software programs that use artificial intelligence to perform tasks that typically require human intelligence. These agents are designed to operate autonomously, making decisions and taking actions based on their programming, the data they are trained on, and the inputs they receive from their environment. AI agents can be simple or complex, depending on their intended application.

### Key Characteristics of AI Agents:
1. **Autonomy**: They operate independently without constant human intervention.
2. **Reactivity**: They respond to changes in their environment or inputs.
3. **Proactivity**: They take initiative to achieve specific goals.
4. **Social Ability**: They can interact with other agents or humans to accomplish tasks.
5. **Learning**: Some agents can improve their performance over time through machine learning.

### Types of AI Agents:
1. **Simple Reflex Agents**: React to the current state of the environment without considering past events or

### **Prompt**

In [30]:
prompt = """
1. Use the following piece of context to answer the question at the end.
2. If you don't know the answer, just say that "I dont know" but don't make answer on your own.\n
3. Keep the answer crisp and limited to 3, 4 sentences

Context: {context}
Question: {question}
Helpful Answer: """

QA_CHAIN_PROMPT= PromptTemplate.from_template(prompt)

In [33]:
llm_chain = LLMChain(
    llm = llm,
    prompt = QA_CHAIN_PROMPT,
    callbacks = None,
    verbose = True
)

In [32]:
document_prompt = PromptTemplate(
    input_variables=["page_content", "source"],
    template = "Context: \ncontent: {page_content}\nsource: {source}"
)

In [34]:
combine_documents_chain = StuffDocumentsChain(
    llm_chain = llm_chain,
    document_variable_name="context",
    document_prompt = document_prompt,
    callbacks = None
)

  combine_documents_chain = StuffDocumentsChain(


In [36]:
qa = RetrievalQA(combine_documents_chain = combine_documents_chain,
            verbose = True,
            retriever = retriever,
            return_source_documents = True)

In [37]:
user_input = "What is Scaled dot product ?"
response = qa(user_input)["result"]

  response = qa(user_input)["result"]




[1m> Entering new RetrievalQA chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3m
1. Use the following piece of context to answer the question at the end. 
2. If you don't know the answer, just say that "I dont know" but don't make answer on your own.

3. Keep the answer crisp and limited to 3, 4 sentences

Context: Context: 
content: ScaledDot-ProductAttention Multi-HeadAttention
Figure 2: (left) Scaled Dot-Product Attention. (right) Multi-Head Attention consists of several
attentionlayersrunninginparallel. ofthevalues,wheretheweightassignedtoeachvalueiscomputedbyacompatibilityfunctionofthe
querywiththecorrespondingkey. 3.2.1 ScaledDot-ProductAttention
Wecallourparticularattention"ScaledDot-ProductAttention"(Figure2). Theinputconsistsof
queriesandkeysofdimensiond k,a√ndvaluesofdimensiond v. Wecomputethedotproductsofthe
querywithallkeys,divideeachby d ,andapplyasoftmaxfunctiontoobtaintheweightsonthe
k
values. Inpractice,wecomputetheattentio

In [38]:
print(response)

<think>
Okay, so I need to figure out what "Scaled dot product" is based on the provided context. Let me start by reading through the context carefully.

First, I see that the context mentions "ScaledDot-ProductAttention" and "Multi-HeadAttention." It explains that in the Scaled Dot-Product Attention, the input consists of queries, keys, and values with specific dimensions. The process involves computing dot products of the query with all keys, then dividing each by the square root of the dimension (d_k). After that, a softmax function is applied to get the weights, which are then used to compute the output by multiplying with the values matrix.

The context also compares this with additive attention, noting that dot-product attention is faster and more space-efficient because it uses matrix multiplications. However, without scaling, the dot products can become too large, causing issues with the softmax function's gradients. Scaling by the square root of d_k helps mitigate this problem