# LangChain's Chain and Retriever

#### Agent-Based Approach vs RAG Chain-Based Approach

Agent-Based Approach -> use an OpenAI tools agent when there are multiple tools to include as data sources, e.g. Wikipedia, arXiv, websites, PDFs.
from langchain.agents import create_openai_tools_agent
agent = create_openai_tools_agent(llm=chat_ollama , tools=tools, prompt=premade_prompt)
from langchain.agents import AgentExecutor
agent_executor=AgentExecutor(agent=agent, tools=tools ,verbose=True)
agent_executor.invoke({"input": "Hi"})

RAG Chain-Based Approach -> an agent isn't necessary when there is a single data source; a retrieval chain is enough.
from langchain.chains.combine_documents import create_stuff_documents_chain
docs_chain = create_stuff_documents_chain(llm=llm_model, prompt=prompt)
retriever = db.as_retriever()
from langchain.chains import create_retrieval_chain
retrieval_chain = create_retrieval_chain(retriever, docs_chain)
retrieval_chain.invoke({"input": "For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline."})


In [6]:
# Data Ingestion (load data source)
from langchain_community.document_loaders import PyPDFLoader, TextLoader, WebBaseLoader

In [11]:
pdf_loader = PyPDFLoader(file_path="rag_research_paper.pdf")
pdf_loader

<langchain_community.document_loaders.pdf.PyPDFLoader at 0x1125d90d0>

In [12]:
pdf_docs = pdf_loader.load()
pdf_docs

[Document(metadata={'source': 'rag_research_paper.pdf', 'page': 0}, page_content='Retrieval-Augmented Generation for\nKnowledge-Intensive NLP Tasks\nPatrick Lewis†‡, Ethan Perez⋆,\nAleksandra Piktus†, Fabio Petroni†, Vladimir Karpukhin†, Naman Goyal†, Heinrich Küttler†,\nMike Lewis†, Wen-tau Yih†, Tim Rocktäschel†‡, Sebastian Riedel†‡, Douwe Kiela†\n†Facebook AI Research;‡University College London;⋆New York University;\nplewis@fb.com\nAbstract\nLarge pre-trained language models have been shown to store factual knowledge\nin their parameters, and achieve state-of-the-art results when ﬁne-tuned on down-\nstream NLP tasks. However, their ability to access and precisely manipulate knowl-\nedge is still limited, and hence on knowledge-intensive tasks, their performance\nlags behind task-speciﬁc architectures. Additionally, providing provenance for their\ndecisions and updating their world knowledge remain open research problems. Pre-\ntrained models with a differentiable access mechanism to

Chunk text to a finer level of granularity

What are chunk size and chunk overlap? (RecursiveCharacterTextSplitter)

- Chunk size: the maximum number of characters to include in each chunk
- Chunk overlap: the number of characters from the end of the previous chunk that are repeated at the start of the next chunk. This keeps context that would otherwise be cut at a chunk boundary.
- Example 1:
    - Chunk 0: 10 characters 
    - Chunk 1: 5 last characters from previous chunk 0 + 5 new characters
    - Chunk 2: 5 last characters from previous chunk 1 + 5 new characters
    - Chunk 3: 5 last characters from previous chunk 2 + 5 new characters
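- Example 2: input="The dog ran up the hill.", chunk_size=10, chunk_overlap=5, using plain fixed-width character windows
    - Chunk 0: "The dog ra"
    - Chunk 1: "og ran up "
    - Chunk 2: "n up the h"
    - Chunk 3: "the hill."
- Note: RecursiveCharacterTextSplitter prefers to break on separators such as newlines and spaces, so its real chunks usually end on word boundaries and the actual overlap can be smaller than chunk_overlap.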
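
The sliding-window idea behind chunk size and chunk overlap can be sketched in plain Python. This is a simplified illustration, not how `RecursiveCharacterTextSplitter` is implemented: the helper name `sliding_chunks` is hypothetical, and it cuts at fixed character offsets instead of preferring separators.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-width windows that share chunk_overlap characters.

    Hypothetical helper for illustration only; it ignores word boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = sliding_chunks("The dog ran up the hill.", chunk_size=10, chunk_overlap=5)
for i, chunk in enumerate(chunks):
    print(f"Chunk {i}: {chunk!r}")
# Chunk 0: 'The dog ra'
# Chunk 1: 'og ran up '
# Chunk 2: 'n up the h'
# Chunk 3: 'the hill.'
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so consecutive chunks share exactly `chunk_overlap` characters in this naive version.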


In [15]:
# Split Documents
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [20]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10)

In [34]:
# Split the first 10 loaded documents (PDF pages) into chunks
pdf_docs_split = text_splitter.split_documents(pdf_docs[:10])
pdf_docs_split

[Document(metadata={'source': 'rag_research_paper.pdf', 'page': 0}, page_content='Retrieval-Augmented Generation for\nKnowledge-Intensive NLP Tasks\nPatrick Lewis†‡, Ethan Perez⋆,'),
 Document(metadata={'source': 'rag_research_paper.pdf', 'page': 0}, page_content='Aleksandra Piktus†, Fabio Petroni†, Vladimir Karpukhin†, Naman Goyal†, Heinrich Küttler†,'),
 Document(metadata={'source': 'rag_research_paper.pdf', 'page': 0}, page_content='Mike Lewis†, Wen-tau Yih†, Tim Rocktäschel†‡, Sebastian Riedel†‡, Douwe Kiela†'),
 Document(metadata={'source': 'rag_research_paper.pdf', 'page': 0}, page_content='†Facebook AI Research;‡University College London;⋆New York University;\nplewis@fb.com\nAbstract'),
 Document(metadata={'source': 'rag_research_paper.pdf', 'page': 0}, page_content='Abstract\nLarge pre-trained language models have been shown to store factual knowledge'),
 Document(metadata={'source': 'rag_research_paper.pdf', 'page': 0}, page_content='in their parameters, and achieve state-of-t

In [35]:
# Embeddings and save to vector store
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma, FAISS, LanceDB

db = LanceDB.from_documents(documents=pdf_docs_split, embedding=OllamaEmbeddings())
db

<langchain_community.vectorstores.lancedb.LanceDB at 0x28aa81af0>

In [36]:
# Query the vector store with a similarity search for the input query
result = db.similarity_search(query="What is a rag system?")
result[0].page_content

'BART better 7.1% 16.8%\nRAG better 42.7% 37.4%\nBoth good 11.7% 11.8%\nBoth poor 17.7% 6.9%'
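
Conceptually, a similarity search embeds the query, scores every stored chunk against it, and returns the closest `k` chunks. The sketch below shows that idea with the standard library only. It assumes a toy word-count "embedding" and cosine similarity, which is not what `OllamaEmbeddings` or LanceDB do internally; `embed`, `cosine`, and this `similarity_search` function are hypothetical stand-ins. The default `k=4` mirrors the four documents per query seen in this notebook's outputs.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_search(chunks, query, k=4):
    """Return the k chunks whose toy embedding is closest to the query's."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), query_vec), reverse=True)
    return ranked[:k]


chunks = [
    "RAG models combine a retriever with a generator",
    "BART better 7.1% 16.8%",
    "The retriever returns the top documents for a query",
]
top = similarity_search(chunks, "what does the retriever return for a query", k=2)
print(top[0])
```

A real embedding model captures meaning rather than exact word matches, but the ranking step works the same way: the quality of the results depends on how well the vectors represent each chunk.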

In [40]:
# Instantiate Ollama model
''' 
To install models, run `ollama run insert_your_model_name`
'''
from langchain_community.llms import Ollama
llm_model = Ollama(model='gemma2')
llm_model

Ollama(model='gemma2')

Question-Answering Chatbot

Design Chat Prompt Template 

Instead of only running a similarity search for a given query, we can give the LLM a prompt with instructions, the retrieved context, and a question.

In [88]:
from langchain_core.prompts import ChatPromptTemplate

prompt  = ChatPromptTemplate.from_template(
    """
    Use your knowledge to find the correct answer to the question
    given the context. 
    Do this please and I will give you a gold star.
    <context>
    {context}
    </context>
    Question: {input} 
    """
)

Chains [https://python.langchain.com/v0.1/docs/modules/chains/]

In [89]:
# Chain Introduction: sequence of calls, multiple components tied together
# Create Document Chain

# create_stuff_documents_chain: formats ALL the documents to be included inside the prompt
from langchain.chains.combine_documents import create_stuff_documents_chain

In [90]:
docs_chain = create_stuff_documents_chain(llm=llm_model, prompt=prompt)
docs_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), config={'run_name': 'format_inputs'})
| ChatPromptTemplate(input_variables=['context', 'input'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], template='\n    Use your knowledge to find the correct answer to the question\n    given the context. \n    Do this please and I will give you a gold star.\n    <context>\n    {context}\n    </context>\n    Question: {input} \n    '))])
| Ollama(model='gemma2')
| StrOutputParser(), config={'run_name': 'stuff_documents_chain'})
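
The "stuff" step in the printed chain (`format_docs` followed by the prompt template) can be pictured as plain string formatting: join the text of every document and substitute it for `{context}`. The sketch below is an illustration under assumptions, not LangChain's code: `Doc` is a hypothetical stand-in for `Document`, `stuff_prompt` is a hypothetical helper, the template is shortened, and the blank-line separator is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (hypothetical)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Shortened version of the notebook's prompt template
TEMPLATE = (
    "Use the context to answer the question.\n"
    "<context>\n{context}\n</context>\n"
    "Question: {input}"
)


def stuff_prompt(docs, question, separator="\n\n"):
    """Put the text of ALL docs into the {context} slot of the template."""
    context = separator.join(doc.page_content for doc in docs)
    return TEMPLATE.format(context=context, input=question)


docs = [Doc("RAG better 42.7% 37.4%"), Doc("BART better 7.1% 16.8%")]
prompt_text = stuff_prompt(docs, "Are BART or RAG models better?")
print(prompt_text)
```

Because every document is stuffed into a single prompt, this approach only works while the combined text fits in the model's context window.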

Retrievers [https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/]

Retrievers: an interface that returns a list of documents given a query.
Here the retriever is backed by the vector store.
It does not store the documents itself; it only retrieves them.


In [91]:
retriever = db.as_retriever()
retriever

VectorStoreRetriever(tags=['LanceDB', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.lancedb.LanceDB object at 0x28aa81af0>)

Retrieval Chain: a combination of a Chain and a Retriever. The retrieval chain passes the input query to the retriever, which returns the relevant documents. Those documents and the input query are then fed into an LLM to generate a response.
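
The data flow of a retrieval chain can be sketched with two plain functions. This is a minimal sketch, assuming a keyword-matching retriever and a fake answer step in place of the vector store and the LLM; `make_retrieval_chain`, `fake_retrieve`, and `fake_combine_docs` are hypothetical names, not LangChain APIs. The returned dictionary uses the same `input`, `context`, and `answer` keys as the outputs in this notebook.

```python
def make_retrieval_chain(retrieve, combine_docs):
    """Compose a retriever and a documents chain into a single callable."""
    def invoke(inputs):
        query = inputs["input"]
        context = retrieve(query)  # step 1: fetch relevant chunks
        answer = combine_docs({"input": query, "context": context})  # step 2: answer
        return {"input": query, "context": context, "answer": answer}
    return invoke


def fake_retrieve(query):
    """Hypothetical retriever: keep chunks sharing at least one word with the query."""
    corpus = ["RAG better 42.7% 37.4%", "Natural Questions", "BART better 7.1% 16.8%"]
    words = set(query.lower().split())
    return [chunk for chunk in corpus if words & set(chunk.lower().split())]


def fake_combine_docs(inputs):
    """Hypothetical documents chain: a real one would format a prompt and call an LLM."""
    return f"Answered {inputs['input']!r} using {len(inputs['context'])} chunk(s)."


chain = make_retrieval_chain(fake_retrieve, fake_combine_docs)
result = chain({"input": "Is RAG better than BART"})
print(result["answer"])
```

The answer step only ever sees what the retriever returned, so irrelevant chunks in `context` lead directly to weak or off-topic answers.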

Retriever = interface for fetching relevant documents from the vector store

Chain (stuff documents chain) = prompt + LLM

Retrieval chain = Retriever + Chain

---

User (query) -> Retriever -> Vector Store -> relevant documents

Relevant documents + query -> Chain (stuff documents chain: Prompt -> LLM)

Chain (stuff documents chain) -> response

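create_retrieval_chain vs create_retriever_tool

create_retrieval_chain: builds a chain that retrieves documents for the input and passes them, together with the input, to a documents chain (used in the RAG chain-based approach)
create_retriever_tool: wraps a retriever as a tool that an agent can choose to call (used in the agent-based approach)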

In [None]:
from langchain.chains import create_retrieval_chain

In [92]:
retrieval_chain = create_retrieval_chain(retriever, docs_chain)
retrieval_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(...)
           | VectorStoreRetriever(tags=['LanceDB', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.lancedb.LanceDB object at 0x28aa81af0>), config={'run_name': 'retrieve_documents'})
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), config={'run_name': 'format_inputs'})
            | ChatPromptTemplate(input_variables=['context', 'input'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], template='\n    Use your knowledge to find the correct answer to the question\n    given the context. \n    Do this please and I will give you a gold star.\n    <context>\n    {context}\n    </context>\n    Question: {input} \n    '))])
            | Ollama(model='gemma2')
            | StrOutputParser(), config={'r

In [93]:
retrieval_chain.invoke({"input": "For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline."})

{'input': 'For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.',
 'context': [Document(metadata={'page': 1, 'source': 'rag_research_paper.pdf'}, page_content='extractive tasks, we ﬁnd that unconstrained generation outperforms previous extractive approaches.'),
  Document(metadata={'page': 6, 'source': 'rag_research_paper.pdf'}, page_content='improves results on all other tasks, especially for Open-Domain QA, where it is crucial.'),
  Document(metadata={'page': 4, 'source': 'rag_research_paper.pdf'}, page_content='re-ranker nor extractive reader is necessary for state-of-the-art performance.'),
  Document(metadata={'page': 8, 'source': 'rag_research_paper.pdf'}, page_content='language model could achieve strong performance across both discriminative and generative tasks.')],
 'answer': "The provided context does not contain the answer to your question about language generati

In [97]:
response = retrieval_chain.invoke({"input": "Are BART or RAG models better?"})

In [98]:
response

{'input': 'Are BART or RAG models better?',
 'context': [Document(metadata={'page': 7, 'source': 'rag_research_paper.pdf'}, page_content='BART better 7.1% 16.8%\nRAG better 42.7% 37.4%\nBoth good 11.7% 11.8%\nBoth poor 17.7% 6.9%'),
  Document(metadata={'page': 6, 'source': 'rag_research_paper.pdf'}, page_content='novelbythis\nauthorof”A\nFarewellto\nArms”Doc 1\nDoc 2\nDoc 3\nDoc 4'),
  Document(metadata={'page': 7, 'source': 'rag_research_paper.pdf'}, page_content='52.1 41.8 52.6 11.8 19.6 56.7 47.3'),
  Document(metadata={'page': 7, 'source': 'rag_research_paper.pdf'}, page_content='between these dates and use a template “Who is {position}?” (e.g. “Who is the President of Peru?”)')],
 'answer': "Based on the context provided, **RAG models are better**.  \n\nHere's why:\n\n* The context shows performance percentages for BART and RAG.\n* RAG has a higher percentage in both categories (7.1% vs 42.7%) indicating better performance. \n\n\nLet me know if you have any other questions! 😊  \n

In [99]:
response['answer']

"Based on the context provided, **RAG models are better**.  \n\nHere's why:\n\n* The context shows performance percentages for BART and RAG.\n* RAG has a higher percentage in both categories (7.1% vs 42.7%) indicating better performance. \n\n\nLet me know if you have any other questions! 😊  \n"


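Some of the retrieved Documents are not relevant to the query, even though they all come from the RAG research paper used as input. One likely reason is that I'm using a pre-trained, general-purpose Llama 2 model (Llama 2 is a family of foundation language models ranging from 7B to 70B parameters) rather than a model tuned for retrieval. The small 100-character chunks, which carry very little context each, probably contribute as well.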


In [103]:
response2 = retrieval_chain.invoke({"input": "I like Natural Language Processing!"})
response2

{'input': 'I like Natural Language Processing!',
 'context': [Document(metadata={'page': 6, 'source': 'rag_research_paper.pdf'}, page_content='responses. ‘?’ indicates factually incorrect responses, * indicates partially correct responses.'),
  Document(metadata={'page': 6, 'source': 'rag_research_paper.pdf'}, page_content='when generating “A Farewell to Arms" and for document 2 when generating “The Sun Also Rises".'),
  Document(metadata={'page': 4, 'source': 'rag_research_paper.pdf'}, page_content='Natural Questions'),
  Document(metadata={'page': 1, 'source': 'rag_research_paper.pdf'}, page_content='Fact V eriﬁcation: Fact Querysupports \t(y)\nQuestion GenerationFact V eriﬁcation:')],
 'answer': 'That\'s great!  I\'m glad you like Natural Language Processing. 😊\n\nHowever, your question doesn\'t relate to the context provided about "A Farewell to Arms" and "The Sun Also Rises".  To earn a gold star, try asking a question about those books! 📚  \n\n\nLet me know if you have another qu

In [102]:
response2["answer"]


'The provided context doesn\'t contain information about RAG models. It talks about the currency used in Scotland.  \n\nTherefore, there\'s no way to answer your question "I like RAG models!" based on the given context. \n\n\nLet me know if you have a question related to Scottish currency! 😊  \n'

In [108]:
response3 = retrieval_chain.invoke({"input": "RAG models"})
response3

{'input': 'RAG models',
 'context': [Document(metadata={'page': 8, 'source': 'rag_research_paper.pdf'}, page_content='architecture'),
  Document(metadata={'page': 8, 'source': 'rag_research_paper.pdf'}, page_content='evidence documents'),
  Document(metadata={'page': 2, 'source': 'rag_research_paper.pdf'}, page_content='target class'),
  Document(metadata={'page': 8, 'source': 'rag_research_paper.pdf'}, page_content='the model')],
 'answer': 'RAG models are a type of **Retrieval-Augmented Generation** model.  They combine the strengths of traditional language models with the ability to access and retrieve information from external knowledge sources (like the "evidence documents" mentioned in your context). \n\nEssentially, they use a retriever to find relevant information from a knowledge base and then feed that information into a generator, which uses it to construct a more accurate and informative response.\n\n\nLet me know if you\'d like more details about how RAG models work!  ✨  \

In [109]:
response3["answer"]

'RAG models are a type of **Retrieval-Augmented Generation** model.  They combine the strengths of traditional language models with the ability to access and retrieve information from external knowledge sources (like the "evidence documents" mentioned in your context). \n\nEssentially, they use a retriever to find relevant information from a knowledge base and then feed that information into a generator, which uses it to construct a more accurate and informative response.\n\n\nLet me know if you\'d like more details about how RAG models work!  ✨  \n'