## Retriever and Chain with LangChain

In [1]:
# Load a PDF document
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("NIPS-2017-attention-is-all-you-need-Paper.pdf")
docs = loader.load()
docs


[Document(metadata={'producer': 'PyPDF2', 'creator': 'PyPDF', 'creationdate': '', 'subject': 'Neural Information Processing Systems http://nips.cc/', 'publisher': 'Curran Associates, Inc.', 'language': 'en-US', 'created': '2017', 'eventtype': 'Poster', 'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superiorin quality while being more parallelizable and requiring significantly less timeto train. Our single model with 165 million parameters, achieves 27.5 BLEU onEnglish-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On 

In [2]:
# Split the document into chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
text_splitter.split_documents(docs)[:5]

[Document(metadata={'producer': 'PyPDF2', 'creator': 'PyPDF', 'creationdate': '', 'subject': 'Neural Information Processing Systems http://nips.cc/', 'publisher': 'Curran Associates, Inc.', 'language': 'en-US', 'created': '2017', 'eventtype': 'Poster', 'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superiorin quality while being more parallelizable and requiring significantly less timeto train. Our single model with 165 million parameters, achieves 27.5 BLEU onEnglish-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On 
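
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line boundaries, so its chunks will differ. The helper name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. This is a simplified
    illustration, not the algorithm LangChain uses.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=2)
print(chunks)
```

Each chunk starts with the last two characters of the previous one, which is the role `chunk_overlap=20` plays at a larger scale in the cell above.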

In [3]:
print(text_splitter.split_documents(docs)[0].page_content)

Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.com
Noam Shazeer∗
Google Brain
noam@google.com
Niki Parmar∗
Google Research
nikip@google.com
Jakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.com
Aidan N. Gomez∗†
University of Toronto
aidan@cs.toronto.edu
Łukasz Kaiser ∗
Google Brain
lukaszkaiser@google.com
Illia Polosukhin∗‡
illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experiments on two machine translation tasks show these models to
be superior in quality while being more parallelizable and requiring signiﬁcantly


In [4]:
documents=text_splitter.split_documents(docs)
documents

[Document(metadata={'producer': 'PyPDF2', 'creator': 'PyPDF', 'creationdate': '', 'subject': 'Neural Information Processing Systems http://nips.cc/', 'publisher': 'Curran Associates, Inc.', 'language': 'en-US', 'created': '2017', 'eventtype': 'Poster', 'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superiorin quality while being more parallelizable and requiring significantly less timeto train. Our single model with 165 million parameters, achieves 27.5 BLEU onEnglish-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On 

In [5]:
len(documents)

39

In [6]:
# Imports
# If an OpenAI API key is set in your environment, OpenAIEmbeddings can be used instead:
# from langchain_community.embeddings import OpenAIEmbeddings
from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

# Step 1: Initialize embedding model
embedding_model = OllamaEmbeddings(model="nomic-embed-text")  # or "llama3"

# Step 2: Create a FAISS vector store from the first 30 documents
db = FAISS.from_documents(documents[:30], embedding_model)

print("Vector store created successfully!")
print("Number of documents embedded:", len(documents[:30]))




Vector store created successfully!
Number of documents embedded: 30


In [7]:
db

<langchain_community.vectorstores.faiss.FAISS at 0x2bc7aea9c40>

In [8]:
# Count the stored documents (note: _dict is a private attribute of the in-memory docstore)
len(db.docstore._dict)

30

In [9]:
# Inspect the first stored vector: its dimensionality, then its raw values
print(len(db.index.reconstruct(0)), db.index.reconstruct(0))

768 [-1.94168109e-02  3.13537978e-02 -1.81279927e-01 -3.46602090e-02
  6.71132505e-02 -6.85188174e-02  5.87481707e-02  4.52597141e-02
 -2.31436305e-02  1.88947860e-02 -5.47440536e-02  2.62301397e-02
  9.16537791e-02  4.68105674e-02  2.20408347e-02  1.91228688e-02
 -3.02617475e-02  8.07264820e-03 -4.69784327e-02 -1.60172153e-02
 -5.38251325e-02 -7.79333804e-03  2.26295646e-02 -6.96132798e-03
  4.99532968e-02  2.51040198e-02 -4.18427959e-02 -1.93873849e-02
 -5.34961037e-02  5.12158610e-02  6.00285046e-02 -7.97178075e-02
 -3.12147606e-02 -2.65110321e-02 -5.81589863e-02 -1.50357038e-02
  2.54552756e-02 -3.55681665e-02 -7.15214666e-03  9.27589610e-02
  1.22143198e-02  1.88296586e-02 -2.40291916e-02 -4.70993146e-02
  6.04599938e-02 -1.49125066e-02  1.35369478e-02  5.99929597e-03
  4.35428917e-02 -5.46701737e-02  8.21378268e-03  2.13072691e-02
 -5.12650609e-03  1.22928852e-02  1.03318684e-01  2.50043441e-02
 -1.32362694e-02  5.08860871e-03  1.00416699e-02 -4.30074707e-02
  6.05815798e-02  6.2

In [10]:
# Retrieve the chunks most similar to the query and show the best match
query = "An attention function can be described as mapping a query "
result = db.similarity_search(query)
result[0].page_content

'Scaled Dot-Product Attention\n Multi-Head Attention\nFigure 2: (left) Scaled Dot-Product Attention. (right) Multi-Head Attention consists of several\nattention layers running in parallel.\nquery with all keys, divide each by √dk, and apply a softmax function to obtain the weights on the\nvalues.\nIn practice, we compute the attention function on a set of queries simultaneously, packed together\ninto a matrix Q. The keys and values are also packed together into matrices Kand V. We compute\nthe matrix of outputs as:\nAttention(Q,K,V ) = softmax(QKT\n√dk\n)V (1)\nThe two most commonly used attention functions are additive attention [2], and dot-product (multi-\nplicative) attention. Dot-product attention is identical to our algorithm, except for the scaling factor\nof 1√dk\n. Additive attention computes the compatibility function using a feed-forward network with\na single hidden layer. While the two are similar in theoretical complexity, dot-product attention is'
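
Conceptually, `similarity_search` embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below shows that idea as a brute-force nearest-neighbour search over toy 3-dimensional vectors, assuming Euclidean (L2) distance as the metric. The vectors, texts, and the helper `nearest` are invented for illustration; the real index holds the 768-dimensional embeddings produced by the embedding model.

```python
import math


def nearest(query_vec, store, k=4):
    """Return the k (distance, text) pairs closest to query_vec."""
    scored = [(math.dist(query_vec, vec), text) for text, vec in store]
    return sorted(scored)[:k]  # smallest distance first


# Toy "vector store": (text, embedding) pairs with invented values.
store = [
    ("scaled dot-product attention", (0.9, 0.1, 0.0)),
    ("positional encoding", (0.1, 0.8, 0.2)),
    ("training schedule", (0.0, 0.2, 0.9)),
]

query_vec = (1.0, 0.0, 0.0)  # pretend embedding of the query
top = nearest(query_vec, store, k=2)
for distance, text in top:
    print(f"{distance:.3f}  {text}")
```

A library such as FAISS performs the same ranking, but with index structures that avoid comparing the query against every stored vector when the collection is large.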

In [11]:
dict(result[0])

{'id': '90f6f586-b757-43cd-827b-ed68d3032914',
 'metadata': {'producer': 'PyPDF2',
  'creator': 'PyPDF',
  'creationdate': '',
  'subject': 'Neural Information Processing Systems http://nips.cc/',
  'publisher': 'Curran Associates, Inc.',
  'language': 'en-US',
  'created': '2017',
  'eventtype': 'Poster',
  'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superiorin quality while being more parallelizable and requiring significantly less timeto train. Our single model with 165 million parameters, achieves 27.5 BLEU onEnglish-to-German translation, improving o

In [12]:
print(result)

[Document(id='90f6f586-b757-43cd-827b-ed68d3032914', metadata={'producer': 'PyPDF2', 'creator': 'PyPDF', 'creationdate': '', 'subject': 'Neural Information Processing Systems http://nips.cc/', 'publisher': 'Curran Associates, Inc.', 'language': 'en-US', 'created': '2017', 'eventtype': 'Poster', 'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superiorin quality while being more parallelizable and requiring significantly less timeto train. Our single model with 165 million parameters, achieves 27.5 BLEU onEnglish-to-German translation, improving over the existi

In [13]:
from langchain_community.llms import Ollama
# Load the Llama 2 model through Ollama
llm=Ollama(model="llama2")
llm


Ollama()

In [14]:
# Design a chat prompt template
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template(
"""
Answer the following question based only on the provided context. 
Think step by step before providing a detailed answer. 
I will tip you $1000 if the user finds the answer helpful.
<context>
{context}
</context>
Question: {input}
"""
)

In [15]:
## Chain introduction
## Create a stuff documents chain

from langchain.chains.combine_documents import create_stuff_documents_chain

document_chain=create_stuff_documents_chain(llm,prompt)
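
A "stuff" chain concatenates the retrieved documents and places the result into the prompt's `{context}` slot before calling the model. The sketch below illustrates only that formatting step with plain strings. `stuff_prompt`, the shortened template, and the sample texts are hypothetical, and the blank-line separator is an assumption about the default behaviour.

```python
TEMPLATE = """Answer the following question based only on the provided context.
<context>
{context}
</context>
Question: {input}"""


def stuff_prompt(docs: list[str], question: str, separator: str = "\n\n") -> str:
    """Join all document texts and substitute them into the template."""
    context = separator.join(docs)
    return TEMPLATE.format(context=context, input=question)


docs_sketch = [
    "Attention maps a query and key-value pairs to an output.",
    "Multi-head attention runs several attention layers in parallel.",
]
filled = stuff_prompt(docs_sketch, "What is multi-head attention?")
print(filled)
```

Because every document is placed in a single prompt, this approach only works while the combined text fits in the model's context window.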

In [16]:
"""
Retrievers: A retriever is an interface that returns documents given
an unstructured query. It is more general than a vector store.
A retriever does not need to be able to store documents, only to
return (or retrieve) them. Vector stores can be used as the backbone
of a retriever, but there are other types of retrievers as well.
https://python.langchain.com/docs/modules/data_connection/retrievers/
"""

retriever=db.as_retriever()
retriever

VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000002BC7AEA9C40>, search_kwargs={})
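
To illustrate the point that a retriever only needs to return documents, here is a minimal sketch of a retriever that is not backed by a vector store at all: it ranks texts by how many query words they contain. The class `KeywordRetriever` and its `retrieve` method are hypothetical names for illustration and are not part of LangChain.

```python
class KeywordRetriever:
    """Toy retriever: ranks texts by the number of query words they share."""

    def __init__(self, texts: list[str], k: int = 2):
        self.texts = texts
        self.k = k

    def retrieve(self, query: str) -> list[str]:
        query_words = set(query.lower().split())
        scored = []
        for text in self.texts:
            overlap = len(query_words & set(text.lower().split()))
            if overlap:  # skip texts that share no words with the query
                scored.append((overlap, text))
        # Highest overlap first; the sort is stable, so ties keep input order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[: self.k]]


retriever_sketch = KeywordRetriever([
    "the encoder maps an input sequence to representations",
    "attention maps a query and key value pairs to an output",
    "we trained on eight gpus",
])
hits = retriever_sketch.retrieve("attention query output")
print(hits)
```

The caller sees the same contract either way: a query goes in and a list of documents comes out, regardless of how they were found.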

In [17]:
"""
Retrieval chain: This chain takes a user query, which is passed
to the retriever to fetch relevant documents. Those documents
(and the original input) are then passed to an LLM to generate a response.
https://python.langchain.com/docs/modules/chains/
"""
from langchain.chains import create_retrieval_chain
retrieval_chain=create_retrieval_chain(retriever,document_chain)
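
The data flow described above can be sketched with plain functions: retrieve documents for the input, hand both to an answer-generating step, and return everything in one dictionary. The keys `input`, `context`, and `answer` mirror those visible in the chain's response; `fake_retriever`, `fake_llm_chain`, and `run_retrieval_chain` are hypothetical stand-ins, and no real model is called.

```python
def fake_retriever(query: str) -> list[str]:
    """Stand-in retriever returning a fixed passage."""
    return ["Attention weights come from a softmax over scaled dot products."]


def fake_llm_chain(inputs: dict) -> str:
    """Stand-in for the document chain: reports what it was given."""
    return f"Answered {inputs['input']!r} using {len(inputs['context'])} document(s)."


def run_retrieval_chain(query: str) -> dict:
    context = fake_retriever(query)                # step 1: fetch documents
    inputs = {"input": query, "context": context}  # step 2: bundle with the input
    return {**inputs, "answer": fake_llm_chain(inputs)}  # step 3: generate


result_sketch = run_retrieval_chain("Scaled Dot-Product Attention")
print(result_sketch["answer"])
```

Returning the retrieved context alongside the answer makes it possible to check which passages the response was based on.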

In [18]:
response=retrieval_chain.invoke({"input":"Scaled Dot-Product Attention"})

In [19]:
response['answer']

'The question is asking about the difference between two types of attention mechanisms used in Transformer models: Scaled Dot-Product Attention and Multi-Head Attention.\n\nScaled Dot-Product Attention is a type of attention mechanism that computes the attention weights by taking the dot product of the query and key vectors, scaling the result by 1√dk, and applying a softmax function to obtain the final attention weights. The input consists of queries, keys, and values, all of dimension dk, and the output is a vector of dimension dv.\n\nMulti-Head Attention, on the other hand, performs multiple attention functions in parallel, each with their own set of learnable linear projections of the queries, keys, and values to dimensions dk, dk, and dv, respectively. The outputs of these attention functions are then concatenated and projected to obtain the final output.\n\nThe question asks why Multi-Head Attention is faster and more space-efﬁcient than Scaled Dot-Product Attention in practice, 

In [20]:
print(response)

{'input': 'Scaled Dot-Product Attention', 'context': [Document(id='90f6f586-b757-43cd-827b-ed68d3032914', metadata={'producer': 'PyPDF2', 'creator': 'PyPDF', 'creationdate': '', 'subject': 'Neural Information Processing Systems http://nips.cc/', 'publisher': 'Curran Associates, Inc.', 'language': 'en-US', 'created': '2017', 'eventtype': 'Poster', 'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superiorin quality while being more parallelizable and requiring significantly less timeto train. Our single model with 165 million parameters, achieves 27.5 BLEU onEng