Retriever and Chain Tutorial with LangChain

In [1]:
from langchain_community.document_loaders import PyPDFLoader

In [2]:
loader = PyPDFLoader("attention.pdf")
document = loader.load()

In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=64)
splitted_text = text_splitter.split_documents(documents=document)
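
To make the roles of chunk_size and chunk_overlap concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not the algorithm RecursiveCharacterTextSplitter uses (that splitter prefers to break on separators such as paragraph and line boundaries), and the name sliding_chunks is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


demo_chunks = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(demo_chunks)
```

Each chunk starts with the last three characters of the previous one, so a sentence cut at a boundary still appears whole in at least one neighbouring chunk when it is shorter than the overlap.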

In [10]:
# HuggingFaceEmbeddings is provided by langchain_community in current releases.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

In [12]:
# Sentence transformers from HF
# Link: https://huggingface.co/models?library=sentence-transformers&sort=downloads
langchain_embed_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)


In [14]:
# FAISS is a library for efficient similarity search and clustering of dense vectors.
# It contains algorithms that search in sets of vectors of any size,
# up to ones that possibly do not fit in RAM.
db = FAISS.from_documents(splitted_text[:40], langchain_embed_model)
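
The idea behind similarity search can be sketched without FAISS: compare the query vector with every stored vector and keep the closest ones. The sketch below uses toy 3-dimensional vectors instead of real embeddings and assumes Euclidean (L2) distance as the metric; l2_distance, nearest, and toy_store are hypothetical names. FAISS performs the same kind of search with far more efficient data structures.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_vec, stored, k=2):
    """Return the k (text, distance) pairs with the smallest distance to query_vec."""
    scored = [(text, l2_distance(query_vec, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy "embeddings": made-up vectors, not produced by any model.
toy_store = [
    ("attention", [1.0, 0.0, 0.0]),
    ("softmax", [0.9, 0.1, 0.0]),
    ("optimizer", [0.0, 0.0, 1.0]),
]

top_hits = nearest([1.0, 0.05, 0.0], toy_store, k=2)
print([text for text, _ in top_hits])
```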

In [17]:
query = "What are the differences between the k, q and the v?"
result = db.similarity_search(query)

In [18]:
result

[Document(page_content='output values. These are concatenated and once again projected, resulting in the ﬁnal values, as\ndepicted in Figure 2.\nMulti-head attention allows the model to jointly attend to information from different representation\nsubspaces at different positions. With a single attention head, averaging inhibits this.\n4To illustrate why the dot products get large, assume that the components of qandkare independent random\nvariables with mean 0and variance 1. Then their dot product, q·k=∑dk\ni=1qiki, has mean 0and variance dk.\n4', metadata={'source': 'attention.pdf', 'page': 3}),
 Document(page_content='Scaled Dot-Product Attention\n Multi-Head Attention\nFigure 2: (left) Scaled Dot-Product Attention. (right) Multi-Head Attention consists of several\nattention layers running in parallel.\nquery with all keys, divide each by√dk, and apply a softmax function to obtain the weights on the\nvalues.\nIn practice, we compute the attention function on a set of queries simultan

In [19]:
# Load the Llama 2 model, served locally by Ollama (pull it first with `ollama pull llama2`)
from langchain_community.llms import Ollama

In [21]:
llama2 = Ollama(model="llama2")

In [23]:
from langchain_core.prompts import ChatPromptTemplate

In [24]:
prompt = ChatPromptTemplate.from_template(
    template="""
Answer the question based only on the given context.
Think step by step before providing a detailed and accurate answer.
Your answer will be helpful for the user and not harm anyone.
<context>
{context}
</context>
Question: {input}
"""
)

In [25]:
# Chain introduction
# Create a Stuff Document Chain

In [None]:
# What is a chain?
# A chain is a sequence of calls, whether to an LLM, a tool, or a data
# preprocessing step.
# The primary supported way to build one is with LCEL (LangChain Expression Language).

# What is a stuff documents chain?
# This chain takes a list of documents, formats them all into a single prompt,
# and passes that prompt to an LLM.
# It passes ALL documents, so make sure the combined text fits within the context
# window of the LLM you are using.
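
The "stuffing" step described above can be sketched in plain Python: join the text of every document and substitute it into the prompt. ToyDocument, TOY_TEMPLATE, and stuff_prompt are hypothetical names used only for illustration, and no LLM is called here.

```python
from dataclasses import dataclass


@dataclass
class ToyDocument:
    # Hypothetical stand-in for LangChain's Document; only the text field is modeled.
    page_content: str


TOY_TEMPLATE = "<context>\n{context}\n</context>\nQuestion: {input}"


def stuff_prompt(docs, question, separator="\n\n"):
    """Join every document into one context string and fill the template with it."""
    context = separator.join(doc.page_content for doc in docs)
    return TOY_TEMPLATE.format(context=context, input=question)


toy_docs = [
    ToyDocument("Keys are compared with queries."),
    ToyDocument("Values are averaged by weight."),
]
stuffed = stuff_prompt(toy_docs, "What do keys do?")
print(stuffed)
```

Because every document ends up in the prompt, its length grows with the number and size of the documents, which is why the context window limit matters.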

In [27]:
from langchain.chains.combine_documents import create_stuff_documents_chain

document_chain = create_stuff_documents_chain(llama2, prompt)

In [28]:
# Retriever in LangChain
# A retriever is an interface that returns documents given an unstructured query.
# It is more general than a vector store.
# A retriever does not need to be able to store documents,
# only to return (or retrieve) them.
# Vector stores can be used as the backbone of a retriever,
# but there are other types of retrievers as well.

# Retrievers accept a string query as input and return a list of Document objects as output.
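
To illustrate that a retriever only needs to return relevant text for a query, here is a sketch of one that uses no vectors at all and ranks passages by shared words. KeywordRetriever and its retrieve method are hypothetical and are not part of LangChain; a real LangChain retriever returns Document objects rather than plain strings.

```python
class KeywordRetriever:
    """Rank passages by how many query words they contain (no embeddings involved)."""

    def __init__(self, texts):
        self._texts = list(texts)

    def retrieve(self, query, k=2):
        words = set(query.lower().split())
        # Score each passage by the number of distinct words it shares with the query.
        scored = [(len(words & set(text.lower().split())), text) for text in self._texts]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        # Drop passages with no overlap, then keep the top k.
        return [text for score, text in ranked if score > 0][:k]


toy_retriever = KeywordRetriever([
    "scaled dot-product attention uses queries and keys",
    "the optimizer is adam",
    "multi-head attention runs heads in parallel",
])
hits = toy_retriever.retrieve("multi-head attention")
print(hits)
```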

In [32]:
# db is our vector store;
# as_retriever() wraps it in the retriever interface
retriever = db.as_retriever()

In [29]:
from langchain.chains import create_retrieval_chain

In [33]:
retrieval_chain = create_retrieval_chain(retriever, document_chain)
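
Conceptually, the retrieval chain performs two steps in sequence: fetch context for the question, then hand the question and context to the document chain. The sketch below mimics that flow with hypothetical stand-ins (toy_retrieve, toy_answer, toy_retrieval_chain) and no LLM, and it assumes the result is a dictionary carrying the input, the retrieved context, and the answer.

```python
def toy_retrieve(query):
    # Hypothetical retriever: returns one canned passage regardless of the query.
    return ["Attention maps a query and key-value pairs to an output."]


def toy_answer(inputs):
    # Hypothetical document chain: a real one would prompt an LLM with the context.
    return f"Answered from {len(inputs['context'])} passage(s)."


def toy_retrieval_chain(inputs):
    """Step 1 fetches context for inputs['input']; step 2 answers using that context."""
    with_context = {**inputs, "context": toy_retrieve(inputs["input"])}
    return {**with_context, "answer": toy_answer(with_context)}


toy_response = toy_retrieval_chain({"input": "What is attention?"})
print(sorted(toy_response))
```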

In [39]:
response = retrieval_chain.invoke(input={"input": "What is Scaled Dot-Product Attention?"})

In [40]:
response["answer"]

'Scaled Dot-Product Attention (SDPA) is a type of attention mechanism used in neural networks, specifically in the Transformer architecture. It is called "scaled dot-product attention" because it computes the attention weights by taking the dot product of the query and key vectors, scaling each vector by 1/√dk, and applying a softmax function to obtain the final weights.\n\nIn SDPA, the input consists of queries, keys, and values, all of which are vectors in dk dimensions. The attention function is computed as:\n\nAttention(Q, K, V) = softmax(QKT√dk)V (1)\n\nwhere Q, K, and V are matrices of queries, keys, and values, respectively. The scaling factor of 1/√dk is introduced to stabilize the computation when dealing with large values of dk.\n\nThe two most common attention mechanisms are additive attention and dot-product (multi-plicative) attention. While both mechanisms perform similarly in small dimensions, dot-product attention outperforms additive attention without scaling for large