<a href="https://colab.research.google.com/github/arnabd64/Aadhar-Card-Entity-Extract/blob/main/Langchain_Day_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Langchain Document Retrieval Query

In this notebook we will build a chatbot capable of Retrieval Augmented Generation (RAG): a process in which we supply the Large Language Model with data it has not seen during its training and ask questions based on that unseen data.

There are a lot of components involved in building the chain and we will be covering only the important ones.

1. Document Loaders
2. Text Splitter
3. Embeddings & Document Embeddings
4. Vector Store
5. Retriever

# Install Libraries

In [2]:
! pip install --progress-bar=off --no-cache-dir \
    langchain==0.2.10 \
    langchain-community==0.2.10 \
    langchain-chroma \
    langchain-text-splitters \
    chromadb \
    pypdf \
    python-dotenv \
> install.log

In [3]:
import os
import dotenv
assert dotenv.load_dotenv('./.env'), 'Unable to load ./.env'

# Load the Components

In [52]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.chat_message_histories.file import FileChatMessageHistory
from langchain_chroma import Chroma
from langchain_community.embeddings.ollama import OllamaEmbeddings
from langchain_community.chat_models.ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
import chromadb

# 1. Document Loader

A document loader is a Langchain module that loads and processes documents. There are several document loaders, covering PDF, plain text, Markdown, HTML web pages and more. A `Document`, according to Langchain, is a piece of text along with optional metadata.

Langchain Documentation: [Document Loaders](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/)

Here we are going to query the famous 2017 academic paper [Attention Is All You Need](https://arxiv.org/pdf/1706.03762). You can download the PDF from the link or run the following command in Google Colab:

```bash
wget -O Attention-is-all-you-need.pdf https://arxiv.org/pdf/1706.03762
```

In [5]:
pdf_file = PyPDFLoader('/content/Attention-is-all-you-need.pdf')

# 2. Text Splitter

Feeding an entire document to a Large Language Model raises two issues. First, computation times grow with the large amount of text sent as input. Second, the input text may be longer than the model's context window, so the model cannot take all of it into account and may hallucinate.

The solution to this issue is to split the document into smaller chunks. Instead of feeding the entire document to the LLM, we feed only the chunks that contain the information needed to answer the user's question.

Langchain Documentation: [Text Splitter](https://python.langchain.com/v0.2/docs/concepts/#text-splitters)

We will be using the `RecursiveCharacterTextSplitter`, which allows the user to choose the character separators. Each chunk will be at most $256$ characters long, with up to $16$ characters of overlap. The overlap lets a chunk begin with text from the end of the previous chunk; the number of characters carried over is specified by the `chunk_overlap` parameter.
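
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal plain-Python sketch of a sliding character window. It is a simplification and not how `RecursiveCharacterTextSplitter` is implemented: the real splitter prefers to cut at separators such as paragraph breaks and spaces, so its chunks are usually shorter than the maximum. The function name `split_with_overlap` and the sample text are made up for this illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 256, chunk_overlap: int = 16) -> list[str]:
    """Cut `text` into fixed-size windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical sample: 600 characters of repeating alphabet.
sample = "".join(chr(ord("a") + i % 26) for i in range(600))
chunks = split_with_overlap(sample)

print([len(c) for c in chunks])            # lengths of the windows
print(chunks[0][-16:] == chunks[1][:16])   # the tail of one chunk opens the next
```

A larger overlap reduces the chance that a sentence is cut off from the context needed to understand it, at the cost of storing and embedding more duplicated text.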

In [6]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=256, chunk_overlap=16)
documents = pdf_file.load_and_split(text_splitter)

print(f"Total Chunks: {len(documents)}")

Total Chunks: 193


# 3. Embeddings

Once the document has been split into smaller chunks, it is time to generate embeddings. _Embeddings are a vector representation of a text_.

1. For each chunk of text, an embedding is generated using an embedding model.
2. Each text chunk and its corresponding embedding are stored as a key-value pair in an __Embedding Store__ or __Vector Store__.
3. An embedding is generated for the user's query.
4. A similarity search is performed between the user's query and the text chunks based on their embeddings, and each text chunk is assigned a similarity score.
5. All chunks that score below a threshold are rejected and the top $k$ remaining text chunks are used as context for the LLM input.
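
Steps 3 to 5 can be sketched with nothing but the standard library. The three-dimensional vectors below are hand-made toy values standing in for real embeddings, cosine similarity is assumed as the scoring function, and the names `cosine_similarity`, `top_k_chunks` and `toy_store` are hypothetical helpers for this illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, store, k=2, score_threshold=0.2):
    """Score every chunk, drop those below the threshold, keep the best k."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store.items()]
    kept = [pair for pair in scored if pair[0] >= score_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:k]


# Toy "vector store": chunk text -> made-up embedding.
toy_store = {
    "attention weights": [1.0, 0.0, 0.0],
    "encoder stack": [0.6, 0.8, 0.0],
    "training schedule": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.0, 0.0]  # made-up embedding of the user's query

results = top_k_chunks(toy_query, toy_store)
print([text for _, text in results])
```

The chunk whose vector is orthogonal to the query scores zero and is rejected by the threshold, so it never reaches the LLM.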

Langchain Documentation: [Embeddings Model](https://python.langchain.com/v0.2/docs/concepts/#embedding-models)

Ollama provides the embedding model [`nomic-embed-text`](https://ollama.com/library/nomic-embed-text), which we will use to generate our embeddings. To download the model, run `ollama pull nomic-embed-text`.

In [7]:
embedding_model = OllamaEmbeddings(
    base_url = os.getenv('HOST'),
    model = os.getenv('EMBED')
)

# 4. Vector Store

A vector store is a database that stores the pairs of text chunks and their embeddings. Langchain provides a variety of vector stores, ranging from local in-memory vector stores to vector databases hosted in the cloud.

Langchain Documentation: [Vector Stores](https://python.langchain.com/v0.2/docs/concepts/#vector-stores)

We will be using [Chroma](https://python.langchain.com/v0.2/docs/integrations/vectorstores/chroma/) DB with persistent storage, which means that all the key-value pairs will be stored on local disk as a `sqlite3` database.

In [8]:
DOCUMENT_STORE_NAME = 'my_documents'

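# create the vector store, backed by a persistent client on local disk
vector_store = Chroma(
    collection_name = DOCUMENT_STORE_NAME,
    embedding_function = embedding_model,
    client = chromadb.PersistentClient(path=DOCUMENT_STORE_NAME)
)

# add documents to this collection; `Chroma.from_documents` is a classmethod
# that would build a separate store and bypass the persistent client above
vector_store.add_documents(documents)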

# 5. Retriever

A retriever, as the name suggests, retrieves relevant documents from the vector store. It is tasked with performing a similarity search and assigning a score to each text chunk using its embedding.

Langchain Documentation: [Retriever](https://python.langchain.com/v0.2/docs/concepts/#retrievers)

In [37]:
search_settings = {
    'search_type': 'mmr',
    'search_kwargs': {
        'k': 5,
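        # note: `score_threshold` is only applied when search_type is
        # 'similarity_score_threshold'; with 'mmr' it is expected to have no effect
        'score_threshold': 0.2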
    }
}
retriever = vector_store.as_retriever(**search_settings)
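
The `mmr` search type stands for maximal marginal relevance: instead of returning the $k$ most similar chunks, it picks chunks one at a time, rewarding similarity to the query and penalising similarity to chunks already picked. The sketch below illustrates the idea in plain Python with toy two-dimensional vectors. It is not the Langchain or Chroma implementation; `cos_sim`, `mmr_select` and the toy data are hypothetical, and `lambda_mult` is borrowed as a name for the relevance/diversity trade-off.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query_vec, candidates, k=2, lambda_mult=0.5):
    """Greedy MMR over `candidates`, a list of (text, vector) pairs."""
    remaining = list(candidates)
    selected = []

    def mmr_score(item):
        _, vec = item
        relevance = cos_sim(query_vec, vec)
        # Similarity to the closest chunk already picked (0 if none yet).
        redundancy = max((cos_sim(vec, chosen) for _, chosen in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [text for text, _ in selected]


toy_query = [1.0, 0.0]
toy_candidates = [
    ("attention paragraph", [0.9, 0.3]),
    ("attention paragraph, near duplicate", [0.9, 0.31]),
    ("positional encoding paragraph", [0.8, -0.6]),
]

diverse = mmr_select(toy_query, toy_candidates, k=2, lambda_mult=0.5)
relevance_only = mmr_select(toy_query, toy_candidates, k=2, lambda_mult=1.0)
print(diverse)
print(relevance_only)
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking and returns both near-duplicate chunks; with `0.5` the second pick is the less similar but non-redundant chunk.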

# Conversational Chain

Once the document has been loaded, split, embedded and stored in a vector store, it is time to build the chain that will carry out the conversation between the documents and the LLM. The chain will have basic conversation capabilities along with a chat message history, which will be stored as a JSON file.

## 1. Initialize Large Language Model

In [33]:
chat_llm = ChatOllama(
    base_url = os.getenv('HOST'),
    model = os.getenv('LLM'),
    temperature = 0.8,
    timeout = 600,
    keep_alive = 3600
)

## 2. History Aware Retriever

First we need a chain that takes in the user's query and the previous chat history and, based on that information, generates a query that will be used for similarity search on the vector store.
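
The control flow of such a chain can be sketched without any LLM. As we understand the documented behaviour of `create_history_aware_retriever`, an empty chat history sends the input straight to the retriever, while a non-empty history triggers a reformulation step first. The functions `fake_reformulate` and `fake_retrieve` below are hypothetical stand-ins for the LLM call and the vector store lookup.

```python
def history_aware_retrieve(user_input, chat_history, reformulate, retrieve):
    """Rewrite the question only when there is history to resolve it against."""
    if not chat_history:
        search_query = user_input
    else:
        search_query = reformulate(chat_history, user_input)
    return retrieve(search_query)


def fake_reformulate(chat_history, user_input):
    # Stand-in for the LLM: attach the last human message as the missing context.
    last_human_message = chat_history[-1][1]
    return f"{user_input} (regarding: {last_human_message})"


def fake_retrieve(search_query):
    # Stand-in for the retriever: echo the query it was asked to search for.
    return [f"chunk matching: {search_query}"]


first_turn = history_aware_retrieve(
    "How many heads does it use?", [], fake_reformulate, fake_retrieve
)
follow_up = history_aware_retrieve(
    "How many heads does it use?",
    [("human", "What is multi-head attention?")],
    fake_reformulate,
    fake_retrieve,
)
print(first_turn)
print(follow_up)
```

Without the reformulation step, a follow-up such as "How many heads does it use?" would be embedded on its own, and the similarity search would have no way of knowing what "it" refers to.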

In [64]:
# Chain to contextualize the question
system_prompt_question_contextualize_template = """
Given a chat history and the latest user question, which might reference context
from the chat history, formulate a standalone question which can be understood
without the chat history. Do NOT answer the question, just reformulate it if needed
and otherwise return it as is.
"""
contextualize_prompt = ChatPromptTemplate.from_messages([
    # collapse newlines into single spaces so that words on adjacent lines do not run together
    ('system', ' '.join(system_prompt_question_contextualize_template.split())),
    ('placeholder', '{chat_history}'),
    ('human', '{input}')
])

history_aware_retriever = create_history_aware_retriever(chat_llm, retriever, contextualize_prompt)

## 3. Question Answer chain

The QA chain is our main chain which deals with answering the user's questions using the documents from the previous chain as an input.
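
The "stuff" strategy simply concatenates the retrieved chunks and substitutes them for the `{context}` placeholder of the prompt. A minimal plain-Python sketch of that step follows; the helper `stuff_documents`, the template and the chunk texts are made up for illustration, and the blank-line separator is an assumption about a sensible default rather than a statement about the library's internals.

```python
def stuff_documents(system_template, chunks, separator="\n\n"):
    """Join all retrieved chunks and place them into the prompt template."""
    context = separator.join(chunks)
    return system_template.format(context=context)


template = (
    "Use the following pieces of retrieved context to answer the question.\n\n"
    "{context}"
)
retrieved = ["Chunk one text.", "Chunk two text."]  # hypothetical retriever output

prompt_text = stuff_documents(template, retrieved)
print(prompt_text)
```

Because every retrieved chunk ends up in a single prompt, the chunk size and the number of retrieved chunks together determine how much of the context window the prompt consumes.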

In [65]:
# Chain to answer the question
system_prompt = """
You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.

{context}
"""
qa_prompt = ChatPromptTemplate.from_messages([
    ('system', system_prompt),
    ('placeholder', '{chat_history}'),
    ('human', '{input}')
])

combine_documents = create_stuff_documents_chain(chat_llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, combine_documents)

## 4. Add chat memory to the chain

We will be using the `FileChatMessageHistory` class, which stores the chat history as a `JSON` file.
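
The idea behind a file-backed history is that every turn is appended to a file on disk, so a later session with the same ID can pick the conversation up again. The sketch below shows this with the standard library only. `JsonFileHistory` is a hypothetical stand-in, and its `role`/`content` layout is an assumption; the actual file written by `FileChatMessageHistory` uses Langchain's own message serialization.

```python
import json
import tempfile
from pathlib import Path


class JsonFileHistory:
    """Hypothetical stand-in for a file-backed chat history."""

    def __init__(self, file_path):
        self.path = Path(file_path)
        if not self.path.exists():
            self.path.write_text("[]", encoding="utf-8")  # start with no messages

    @property
    def messages(self):
        # Always read from disk so that every instance sees the same history.
        return json.loads(self.path.read_text(encoding="utf-8"))

    def add_message(self, role, content):
        messages = self.messages
        messages.append({"role": role, "content": content})
        self.path.write_text(json.dumps(messages, ensure_ascii=False), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp_dir:
    history_file = Path(tmp_dir) / "3481.json"  # one file per session ID
    history = JsonFileHistory(history_file)
    history.add_message("human", "What is attention?")
    history.add_message("ai", "A way to weight input tokens.")
    # A new object pointing at the same file sees the earlier turns.
    reloaded = JsonFileHistory(history_file).messages

print(len(reloaded), reloaded[0]["role"])
```

Keying the file name on the session ID, as `get_chat_history` does, keeps separate conversations from leaking into each other.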

In [66]:
def get_chat_history(session_id: str):
    filename = f"{session_id}.json"
    return FileChatMessageHistory(filename, encoding='utf-8')

In [67]:
conversational_chain = RunnableWithMessageHistory(
    rag_chain,
    get_chat_history,
    input_messages_key='input',
    history_messages_key='chat_history',
    output_messages_key='answer'
)

conversational_chain = (
    {'input': RunnablePassthrough()}
    | conversational_chain
    | RunnableLambda(lambda x: x['answer'])
)

# Run the Final Chain

In [74]:
question = 'Explain the attention mechanism.'

In [None]:
response = conversational_chain.invoke(
    question,
    config={'configurable': {'session_id': '3481'}}
)

In [None]:
print(response)