In [3]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("attention.pdf")
docs = loader.load()
docs[2]

Document(metadata={'source': 'attention.pdf', 'page': 2}, page_content='Figure 1: The Transformer - model architecture.\nThe Transformer follows this overall architecture using stacked self-attention and point-wise, fully\nconnected layers for both the encoder and decoder, shown in the left and right halves of Figure 1,\nrespectively.\n3.1 Encoder and Decoder Stacks\nEncoder: The encoder is composed of a stack of N = 6 identical layers. Each layer has two\nsub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position-\nwise fully connected feed-forward network. We employ a residual connection [11] around each of\nthe two sub-layers, followed by layer normalization [ 1]. That is, the output of each sub-layer is\nLayerNorm(x + Sublayer(x)), where Sublayer(x) is the function implemented by the sub-layer\nitself. To facilitate these residual connections, all sub-layers in the model, as well as the embedding\nlayers, produce outputs of dimension dmodel

In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
text_splitter.split_documents(docs)[:5]


[Document(metadata={'source': 'attention.pdf', 'page': 0}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗ ‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Tran
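To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a plain sliding-window chunker. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter tries to break on natural separators such as paragraph and line breaks first), and the helper name `chunk_with_overlap` is made up for this illustration.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. This is a
    simplified illustration, not LangChain's splitting logic.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=2)
print(chunks)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

The shared characters at each boundary (`ij`, `qr`) are what the overlap buys: a sentence cut at a chunk edge still appears partly in the next chunk.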

In [8]:
documents = text_splitter.split_documents(docs)

In [10]:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings

# Embed the first 20 chunks with the llama3.2 model served by Ollama and index them in FAISS
db = FAISS.from_documents(documents[:20], OllamaEmbeddings(model="llama3.2"))

In [13]:
from langchain_community.llms import Ollama

### load Ollama llama3.2 llm
llm = Ollama(model="llama3.2")
llm


Ollama(model='llama3.2')

### prompt template

In [15]:
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template(""" Answer the following question based only on the provided context. 
Think step by step before providing a detailed answer. 
<context> {context} </context> Question: {input}""")

### stuff document chain

In [23]:
from langchain.chains.combine_documents import create_stuff_documents_chain
document_chain = create_stuff_documents_chain(llm, prompt)

In [17]:
"""
Retrievers: A retriever is an interface that returns documents given
 an unstructured query. It is more general than a vector store.
 A retriever does not need to be able to store documents, only to 
 return (or retrieve) them. Vector stores can be used as the backbone
 of a retriever, but there are other types of retrievers as well. 
 https://python.langchain.com/docs/modules/data_connection/retrievers/   
"""

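The point that a retriever only has to return documents, not store them, can be sketched in plain Python. Everything below is hypothetical and only for illustration (the `Retriever` protocol, both classes, and `first_hit` are not LangChain classes): one retriever is backed by a stored list, the other holds no documents at all, and the calling code cannot tell the difference.

```python
from typing import Callable, Protocol


class Retriever(Protocol):
    """Anything that maps a query string to a list of texts."""

    def retrieve(self, query: str) -> list[str]: ...


class KeywordStoreRetriever:
    """Store-backed: keeps texts and ranks them by words shared with the query."""

    def __init__(self, texts: list[str], k: int = 2) -> None:
        self.texts = texts
        self.k = k

    def retrieve(self, query: str) -> list[str]:
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(t.lower().split())), t) for t in self.texts]
        # Keep only texts that share at least one word, best match first.
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [t for _, t in ranked[: self.k]]


class FunctionRetriever:
    """Storage-free: delegates every query to a caller-supplied function."""

    def __init__(self, fetch: Callable[[str], list[str]]) -> None:
        self.fetch = fetch

    def retrieve(self, query: str) -> list[str]:
        return self.fetch(query)


def first_hit(retriever: Retriever, query: str) -> str:
    """Works with any object that satisfies the Retriever protocol."""
    hits = retriever.retrieve(query)
    return hits[0] if hits else ""


store_backed = KeywordStoreRetriever([
    "The encoder is a stack of six identical layers.",
    "Scaled dot-product attention divides the dot products by the square root of dk.",
])
storage_free = FunctionRetriever(lambda query: [f"generated note about: {query}"])

print(first_hit(store_backed, "dot-product attention"))
print(first_hit(storage_free, "dot-product attention"))
```

A vector store such as FAISS plays the role of the store-backed variant here, with embedding similarity in place of the toy word-overlap score.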

In [20]:
retriever = db.as_retriever()
retriever

VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x0000020B656ECB00>, search_kwargs={})

In [21]:
"""
Retrieval chain: This chain takes in a user inquiry, which is then
passed to the retriever to fetch relevant documents. Those documents
(and original inputs) are then passed to an LLM to generate a response.
https://python.langchain.com/docs/modules/chains/
"""
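The flow described above (query, then retrieved documents, then a prompt with those documents stuffed in, then the LLM) can be sketched without LangChain. This is a simplified stand-in, not the library's implementation: `run_retrieval_chain`, `keyword_retrieve`, and `fake_llm` are hypothetical names, the LLM is a stub, and apart from the `answer` key used in this notebook the shape of the returned dictionary is an assumption of the sketch.

```python
from typing import Callable

PROMPT_TEMPLATE = (
    "Answer the following question based only on the provided context.\n"
    "<context> {context} </context> Question: {input}"
)


def run_retrieval_chain(
    query: str,
    retrieve: Callable[[str], list[str]],
    llm: Callable[[str], str],
) -> dict:
    """Retrieve texts for the query, stuff them into the prompt, call the LLM."""
    context_docs = retrieve(query)               # step 1: fetch relevant texts
    stuffed = "\n\n".join(context_docs)          # step 2: "stuff" them into one string
    prompt = PROMPT_TEMPLATE.format(context=stuffed, input=query)
    return {"input": query, "context": context_docs, "answer": llm(prompt)}


corpus = [
    "We scale the dot products by 1/sqrt(dk).",
    "The encoder has N = 6 identical layers.",
]


def keyword_retrieve(query: str) -> list[str]:
    """Toy retriever: keep texts that contain any word of the query."""
    words = query.lower().split()
    return [text for text in corpus if any(word in text.lower() for word in words)]


def fake_llm(prompt: str) -> str:
    """Stub LLM: echoes the context it was given instead of generating text."""
    context = prompt.split("<context>")[1].split("</context>")[0].strip()
    return "STUB ANSWER based on: " + context


response = run_retrieval_chain("scale dot products", keyword_retrieve, fake_llm)
print(response["answer"])
```

Because the stub only echoes its context, the printed answer shows exactly which retrieved text reached the model, which is a handy check when debugging a real chain as well.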

In [24]:
from langchain.chains import create_retrieval_chain
retrieval_chain = create_retrieval_chain(retriever,document_chain)

In [25]:
response = retrieval_chain.invoke({"input": "Scaled Dot-Product Attention"})

In [29]:
response['answer']

'To answer this question based on the provided context, I will break down the information step by step:\n\n1. The question asks about Scaled Dot-Product Attention, but the context actually discusses Additive Attention and Dot-Product Attention.\n\n2. However, there is a mention of Scaled Dot-Product Attention in the text: "While for small values of dk the two mechanisms perform similarly, additive attention outperforms dot product attention without scaling for larger values of dk [3]. We suspect that for large values of dk, the dot products grow large in magnitude, pushing the softmax function into regions where it has extremely small gradients 4. To counteract this effect, we scale the dot products by 1√dk."\n\n3. Based on this context, Scaled Dot-Product Attention refers to the scaling factor applied to the dot product attention mechanism when the dimensionality of the key and value vectors (dk) is large.\n\nIn summary, according to the provided context, Scaled Dot-Product Attention 