# Retriever and Chain with LangChain

In [5]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("attention.pdf")
docs = loader.load()

In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

## Chunks of up to 1000 characters; neighbouring chunks share up to 200 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(docs)
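To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping fixed-size windows. It is a simplification, not the splitter's actual algorithm: `RecursiveCharacterTextSplitter` first splits on separators such as paragraph and line breaks and then merges the pieces, so its chunk boundaries will differ. The helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Hypothetical helper: fixed-size character windows sharing `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
        start += step
    return chunks


chunks = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk begins with the last four characters of the previous one, which is the role the 200-character overlap plays above: a sentence cut at a chunk boundary still appears whole in one of the two neighbours.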

In [10]:
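from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.embeddings import OllamaEmbeddings  # local alternative, not used below
from langchain_community.vectorstores import FAISS

## Embed only the first 30 chunks and index them in FAISS.
## OpenAIEmbeddings reads the API key from the OPENAI_API_KEY environment variable.
db_faiss = FAISS.from_documents(documents[:30], OpenAIEmbeddings())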


In [11]:
query = "An attention function can be described as mapping a query "
result = db_faiss.similarity_search(query)
result[0].page_content

'3.2 Attention\nAn attention function can be described as mapping a query and a set of key-value pairs to an output,\nwhere the query, keys, values, and output are all vectors. The output is computed as a weighted sum\n3'
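Conceptually, `similarity_search` embeds the query and returns the stored chunks whose vectors lie closest to it (four by default). The sketch below illustrates that ranking with made-up three-dimensional vectors and Euclidean distance. The vectors, the `toy_index` list, and the helper names are assumptions for illustration only; real embeddings have many more dimensions and FAISS uses optimized index structures.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def toy_similarity_search(query_vec, index, k=4):
    """Return the texts of the k entries closest to query_vec (smallest distance first)."""
    ranked = sorted(index, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# (text, made-up embedding) pairs standing in for the indexed chunks
toy_index = [
    ("attention maps a query to an output", [0.9, 0.1, 0.0]),
    ("training used 8 GPUs", [0.0, 0.2, 0.9]),
    ("keys and values are vectors", [0.7, 0.3, 0.1]),
]

top = toy_similarity_search([1.0, 0.0, 0.0], toy_index, k=2)
print(top)
```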

In [13]:
from langchain_community.llms import Ollama
## Load the Llama 2 model through Ollama
llm = Ollama(model="llama2")

In [15]:
## Design ChatPrompt Template
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template("""
Answer the following question based only on the provided context. 
Think step by step before providing a detailed answer. 
I will tip you $1000 if the user finds the answer helpful. 
<context>
{context}
</context>
Question: {input}""")

In [16]:
## Chain Introduction
## Create Stuff Documents Chain

from langchain.chains.combine_documents import create_stuff_documents_chain

document_chain=create_stuff_documents_chain(llm, prompt)
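"Stuffing" means placing the text of every retrieved document into the single `{context}` slot of the prompt before one call to the model. The sketch below mimics that step with plain strings, assuming the documents are joined with a blank line between them. `stuff_documents` and the shortened `PROMPT_TEMPLATE` are hypothetical stand-ins, not LangChain code.

```python
# Shortened stand-in for the ChatPromptTemplate defined above
PROMPT_TEMPLATE = """Answer the following question based only on the provided context.
<context>
{context}
</context>
Question: {input}"""


def stuff_documents(page_contents, question, separator="\n\n"):
    """Join all chunk texts into one context string and fill the prompt with it."""
    context = separator.join(page_contents)
    return PROMPT_TEMPLATE.format(context=context, input=question)


filled = stuff_documents(
    ["An attention function maps a query and key-value pairs to an output.",
     "The output is computed as a weighted sum of the values."],
    "What does an attention function do?",
)
print(filled)
```

Because everything goes into one prompt, this approach only works while the combined chunks fit in the model's context window.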

In [18]:
"""
Retrievers: A retriever is an interface that returns documents given
an unstructured query. It is more general than a vector store.
A retriever does not need to be able to store documents, only to
return (or retrieve) them. Vector stores can be used as the backbone
of a retriever, but there are other types of retrievers as well.
https://python.langchain.com/docs/modules/data_connection/retrievers/
"""

retriever = db_faiss.as_retriever()
retriever

VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x77cd950a4a60>)
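To illustrate the point that a retriever only has to return documents for a query, here is a minimal sketch of a retriever that uses no vector store at all and ranks texts by shared words. The class `KeywordRetriever` is hypothetical and is not part of LangChain; it only mirrors the idea of an `invoke(query)` call that returns a list of documents.

```python
class KeywordRetriever:
    """Hypothetical retriever: ranks texts by how many query words they contain."""

    def __init__(self, texts, k=2):
        self.texts = texts
        self.k = k

    def invoke(self, query):
        query_words = set(query.lower().split())
        # Score each text by the number of words it shares with the query
        scored = [(len(query_words & set(text.lower().split())), text) for text in self.texts]
        matches = [pair for pair in scored if pair[0] > 0]
        matches.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps input order on ties
        return [text for _, text in matches[: self.k]]


keyword_retriever = KeywordRetriever([
    "Scaled dot-product attention divides by sqrt(dk)",
    "Multi-head attention runs layers in parallel",
    "The model was trained on 8 GPUs",
])
hits = keyword_retriever.invoke("scaled attention")
print(hits)
```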

In [19]:
from langchain.chains import create_retrieval_chain

retrieval_chain = create_retrieval_chain(retriever, document_chain)
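The retrieval chain wires the two pieces together: it passes the `input` string to the retriever, hands the retrieved documents to the document chain as `context`, and returns a dictionary with `input`, `context`, and `answer`. A minimal sketch of that data flow follows, with plain functions standing in for the retriever and the LLM-backed document chain. `make_retrieval_chain`, `fake_retrieve`, and `fake_combine` are hypothetical names.

```python
def make_retrieval_chain(retrieve, combine_docs):
    """Compose a retriever function and a document-combining function into one callable."""
    def invoke(inputs):
        docs = retrieve(inputs["input"])                  # step 1: fetch relevant documents
        answer = combine_docs({**inputs, "context": docs})  # step 2: answer from those documents
        return {**inputs, "context": docs, "answer": answer}
    return invoke


def fake_retrieve(query):
    # Stand-in for retriever.invoke(query)
    return ["We call our particular attention Scaled Dot-Product Attention."]


def fake_combine(inputs):
    # Stand-in for the stuff-documents chain; a real one would call the LLM
    return f"Answer to {inputs['input']!r} using {len(inputs['context'])} document(s)"


toy_chain = make_retrieval_chain(fake_retrieve, fake_combine)
toy_response = toy_chain({"input": "Scaled Dot-Product Attention"})
print(toy_response["answer"])
```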

In [20]:
response = retrieval_chain.invoke({"input":"Scaled Dot-Product Attention"})

In [22]:
response

{'input': 'Scaled Dot-Product Attention',
 'context': [Document(metadata={'source': 'attention.pdf', 'page': 3}, page_content='Scaled Dot-Product Attention\n Multi-Head Attention\nFigure 2: (left) Scaled Dot-Product Attention. (right) Multi-Head Attention consists of several\nattention layers running in parallel.\nof the values, where the weight assigned to each value is computed by a compatibility function of the\nquery with the corresponding key.\n3.2.1 Scaled Dot-Product Attention\nWe call our particular attention "Scaled Dot-Product Attention" (Figure 2). The input consists of\nqueries and keys of dimension dk, and values of dimension dv. We compute the dot products of the\nquery with all keys, divide each by√dk, and apply a softmax function to obtain the weights on the\nvalues.\nIn practice, we compute the attention function on a set of queries simultaneously, packed together\ninto a matrix Q. The keys and values are also packed together into matrices KandV. We compute\nthe matrix

In [21]:
response['answer']

'Scaled dot-product attention is a type of attention mechanism used in the Transformer architecture. It is called "scaled dot-product attention" because it uses the dot product of the query and key vectors, but scales the result by 1√dk before applying the softmax function to obtain the weights on the value vectors.\n\nThe formula for scaled dot-product attention is:\nAttention(Q, K, V) = softmax(QKT√dk)V (1)\n\nIn this formula, Q, K, and V are matrices of query, key, and value vectors, respectively. T is a matrix of dot products between the query and key vectors. The softmax function is applied to the dot product matrix to obtain the weights on the value vectors. The scaling factor of 1√dk is added to the dot products before applying the softmax function to prevent the gradients of the softmax function from becoming too small as the magnitude of the dot products grows.\n\nMulti-head attention is a variation of scaled dot-product attention that allows the model to jointly attend to inf
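The generated answer above renders equation (1) of the paper as flattened plain text. With the fraction and the transpose restored it reads $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$, where $K^T$ is the transpose of the key matrix and the dot products are divided by $\sqrt{d_k}$, matching the retrieved passage ("divide each by √dk"). The sketch below is one way to compute this with plain Python lists. The tiny `Q`, `K`, and `V` matrices are made-up values for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for matrices given as lists of row vectors."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Dot product of this query with every key, scaled by 1/sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output


Q = [[1.0, 0.0]]                # one query
K = [[1.0, 0.0], [0.0, 1.0]]    # two keys
V = [[10.0, 0.0], [0.0, 10.0]]  # two values
attended = scaled_dot_product_attention(Q, K, V)
print(attended)
```

The query matches the first key more closely, so the first value receives the larger weight in the output.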