In [1]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("attention.pdf")
docs = loader.load()

In [2]:
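from langchain.text_splitter import RecursiveCharacterTextSplitter
# Split pages into chunks of at most ~1000 characters; consecutive chunks share up to
# 200 characters so that text spanning a chunk boundary is less likely to be lost.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(docs)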
documents[:5]

[Document(metadata={'source': 'attention.pdf', 'page': 0}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.comNoam Shazeer∗\nGoogle Brain\nnoam@google.comNiki Parmar∗\nGoogle Research\nnikip@google.comJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.comAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.eduŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗ ‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\n
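To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of plain sliding-window chunking. It is a simplification, not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to break on separators (by default paragraph breaks, then newlines, then spaces) before falling back to raw characters, whereas this hypothetical `chunk_text` helper cuts purely by character count.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified sketch: unlike RecursiveCharacterTextSplitter, it does not
    try to break on paragraph, line, or word boundaries first.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


for chunk in chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3):
    print(chunk)
```

Each window advances by `chunk_size - chunk_overlap` characters, so the last 3 characters of one chunk reappear at the start of the next.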

In [3]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

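# Embedding model served by a local Ollama instance. tinyllama is a chat model;
# a dedicated embedding model (e.g. nomic-embed-text) will likely retrieve more relevant chunks.
embeddings = OllamaEmbeddings(model="tinyllama")
# Only the first 10 chunks are embedded and indexed, so later parts of the paper are not searchable.
db = FAISS.from_documents(documents[:10], embeddings)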

In [4]:
query = "An attention function can be described as mapping a query"
result = db.similarity_search(query)
result[0].page_content

'textual entailment and learning task-independent sentence representations [4, 27, 28, 22].\nEnd-to-end memory networks are based on a recurrent attention mechanism instead of sequence-\naligned recurrence and have been shown to perform well on simple-language question answering and\nlanguage modeling tasks [34].\nTo the best of our knowledge, however, the Transformer is the first transduction model relying\nentirely on self-attention to compute representations of its input and output without using sequence-\naligned RNNs or convolution. In the following sections, we will describe the Transformer, motivate\nself-attention and discuss its advantages over models such as [17, 18] and [9].\n3 Model Architecture\nMost competitive neural sequence transduction models have an encoder-decoder structure [ 5,2,35].\nHere, the encoder maps an input sequence of symbol representations (x1, ..., x n)to a sequence\nof continuous representations z= (z1, ..., z n). Given z, the decoder then generates an
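Conceptually, `similarity_search` embeds the query and returns the stored chunks whose vectors are closest to it (4 by default). Assuming the default settings, LangChain's FAISS wrapper builds an exact index ranked by Euclidean (L2) distance. The sketch below reproduces that ranking with plain Python on hypothetical 3-dimensional vectors; real Ollama embeddings have many more dimensions, and FAISS performs the same comparison far more efficiently.

```python
import math


def l2_top_k(query_vec, doc_vecs, k=4):
    """Return (index, distance) pairs for the k stored vectors closest to query_vec."""
    scored = [(i, math.dist(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smaller L2 distance = more similar
    return scored[:k]


# Hypothetical toy "embeddings" standing in for the indexed chunks.
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]
query_vec = [1.0, 0.0, 0.0]
print(l2_top_k(query_vec, doc_vecs, k=2))
```

The search can only return chunks that were indexed, so a query about a section outside the indexed subset falls back to whatever indexed chunk happens to lie nearest.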

In [5]:
from langchain_community.llms import Ollama

llm = Ollama(model="tinyllama")

In [6]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    """
        Answer the following question based only on the provided context. 
        Think step by step before providing a detailed answer. 
        The user should find the answer helpful.
        
        <context>
            {context}
        </context>
        
        Question: {input}
    """
)


In [7]:
# Chain setup: a "stuff" documents chain inserts all retrieved documents
# into the {context} slot of the prompt and sends the result to the LLM.

from langchain.chains.combine_documents import create_stuff_documents_chain

document_chain = create_stuff_documents_chain(llm, prompt)
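The "stuff" strategy is simple: every document's text is concatenated into one string and substituted for `{context}`, with the question substituted for `{input}`. As far as we know, recent LangChain versions join documents with `"\n\n"` by default. The sketch below shows only the string assembly, using a shortened copy of the prompt above and a hypothetical `stuff_prompt` helper; the real chain then passes the filled prompt to the LLM.

```python
# Shortened version of the notebook's prompt template (assumption: same placeholders).
TEMPLATE = """Answer the following question based only on the provided context.

<context>
{context}
</context>

Question: {input}"""


def stuff_prompt(page_contents: list[str], question: str, separator: str = "\n\n") -> str:
    """Concatenate every document into one context string and fill the template."""
    context = separator.join(page_contents)
    return TEMPLATE.format(context=context, input=question)


print(stuff_prompt(["First chunk.", "Second chunk."], "What is attention?"))
```

Because everything is pasted into a single prompt, this strategy only works while the retrieved chunks fit in the model's context window.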

In [8]:
retriever = db.as_retriever()
retriever

VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x0000024F208ADB70>, search_kwargs={})

In [9]:
from langchain.chains import create_retrieval_chain
retrieval_chain = create_retrieval_chain(retriever, document_chain)

In [10]:
response = retrieval_chain.invoke({"input": "Scaled Dot-Product Attention"})
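`create_retrieval_chain` composes the two pieces: the `"input"` string goes to the retriever, the returned documents are stored under `"context"`, and both are handed to the document chain, whose output is stored under `"answer"`. The sketch below mimics that data flow with hypothetical stand-ins (`toy_retrieve`, `toy_answer`) in place of the FAISS retriever and the Ollama-backed chain; it illustrates the shape of the result, not LangChain's implementation.

```python
from typing import Callable


def make_retrieval_chain(
    retrieve: Callable[[str], list[str]],
    answer: Callable[[list[str], str], str],
) -> Callable[[dict], dict]:
    """Compose a retriever and a document chain into one callable."""

    def invoke(inputs: dict) -> dict:
        question = inputs["input"]
        context = retrieve(question)  # step 1: fetch relevant chunks
        return {
            "input": question,
            "context": context,
            "answer": answer(context, question),  # step 2: answer from those chunks
        }

    return invoke


# Hypothetical stand-ins for the vector-store retriever and the LLM chain.
def toy_retrieve(question: str) -> list[str]:
    corpus = ["Scaled dot-product attention divides by sqrt(d_k).", "Unrelated text."]
    return [doc for doc in corpus if "attention" in doc.lower()]


def toy_answer(context: list[str], question: str) -> str:
    return f"Based on {len(context)} chunk(s): {context[0] if context else 'no context'}"


chain = make_retrieval_chain(toy_retrieve, toy_answer)
result = chain({"input": "Scaled Dot-Product Attention"})
print(sorted(result))
```

In the notebook, inspecting `response["context"]` in the same way shows which chunks the answer was based on, which helps check whether the LLM's answer is actually grounded in the retrieved text.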

In [11]:
response['answer']

"The Transformer architecture proposed in the context article relies solely on attention mechanisms to generate output sequences. This means that the network is not recurrent and does not require continuous input from previous steps, making it more efficient for inference and visualizations. The design of the Transformer also improves results in a range of machine translation tasks, including those involving English-to-English and English-to-French translation, as demonstrated by experiments on the WMT 2014 task. In addition to its effectiveness at these tasks, the Transformer also achieves state-of-the-art results in a range of other machine translation tasks, including those involving German-to-English translation and constituency parsing with limited training data. The model's use of attention mechanisms allows it to draw global dependencies between input and output sequences, as demonstrated by the fact that it achieves improvements in computational efficiency compared to previous 