## RAG using MergerRetriever and LongContextReorder

**Introduction to RAG using MergerRetriever and LongContextReorder**

This notebook extends a basic Retrieval-Augmented Generation (RAG) pipeline with two LangChain components: `MergerRetriever` and `LongContextReorder`. `MergerRetriever` merges the results of several retrievers into a single list (here, two dense vector-store retrievers, one per PDF), so a query can draw on multiple sources. `LongContextReorder` reorders the retrieved documents so that the most relevant ones sit at the beginning and end of the context, where language models tend to use them most reliably (the "lost in the middle" effect). Together they aim to improve the accuracy and coherence of generated responses when many documents are passed to the model.

## Key Differences Between "RAG using MergerRetriever and LongContextReorder" and "RAG using Hybrid Retrievers and Reranking"

### 1. Focus:
- **MergerRetriever and LongContextReorder**: Focuses on `ordering a long retrieved context` and ensuring `diverse retrieval` across sources.
- **Hybrid Retrievers and Reranking**: Focuses on improving `retrieval accuracy` and `document relevance` through `reranking`.

### 2. Document Management:
- **MergerRetriever and LongContextReorder**: Uses `LongContextReorder` to reorder the `retrieved documents`, placing the most relevant ones at the start and end of the context passed to the model.
- **Hybrid Retrievers and Reranking**: Uses `reranking` to refine the choice of documents after retrieval, enhancing the quality of the final input based on relevance.

### 3. Retrieval Process:
- **MergerRetriever and LongContextReorder**: Merges the results of multiple retrievers and reorders the combined context before generation.
- **Hybrid Retrievers and Reranking**: Combines multiple retrieval methods and uses an additional `reranking step` to prioritize the most relevant documents.

### 4. Objective:
- **MergerRetriever and LongContextReorder**: Primarily aims to manage `long and diverse contexts` for improved output quality.
- **Hybrid Retrievers and Reranking**: Primarily aims to refine the retrieval process and prioritize `relevant documents` for the best possible generation output.

### Step-1: Required Package Installation

These dependencies set up the environment for a RAG system using MergerRetriever and LongContextReorder.

In [1]:
#!pip install langchain langchain_community huggingface-hub langchain-huggingface langchain-text-splitters chromadb langchain_groq pypdf

In [2]:
#!pip install sentence-transformers==2.2.2 InstructorEmbedding==1.0.1

### Step-2: Imports

These imports set up the environment for this walkthrough.

In [None]:
import os
import pypdf
import langchain
from dotenv import load_dotenv
from langchain_groq import ChatGroq
# Community integrations live in langchain_community; the old langchain.* paths are deprecated
from langchain_community.document_loaders import PyPDFLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.retrievers.merger_retriever import MergerRetriever
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.document_transformers import LongContextReorder
from langchain_community.document_transformers import EmbeddingsClusteringFilter, EmbeddingsRedundantFilter

### Step-3: LLM Setup

In this step, we use the `llama-3.1-8b-instant` model from Groq. To access the model, you need to create an API key.
For the steps to generate your API key, see: [GROQ API_Key_Generation](https://github.com/AryanKarumuri/Gen-AI-Projects/blob/main/README.md#api-key-generation-guide)

In [None]:
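load_dotenv()  # read variables from a local .env file, if present
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
if GROQ_API_KEY:
    llm = ChatGroq(groq_api_key=GROQ_API_KEY, model_name="llama-3.1-8b-instant")
    print("Groq API key loaded")  # confirm without printing the secret itself
else:
    print("Add Groq API Key")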

### Step-4: Documents Loading

#### First document loading

In [15]:
orca_loader = PyPDFLoader("./data/Orca_paper.pdf")
orca_document = orca_loader.load()

In [16]:
print(len(orca_document))

51


#### Second document loading

In [17]:
ssra_loader = PyPDFLoader("./data/semantic_search_&_recommendation_algorithms.pdf")
ssra_document = ssra_loader.load()

In [18]:
print(len(ssra_document))

6


### Step-5: Documents Splitting

In [19]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500,chunk_overlap=100)
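To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line breaks); `sliding_window_chunks` is a hypothetical helper written only for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Toy input: 1200 characters of repeating digits
sample = "".join(str(i % 10) for i in range(1200))
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])
```

The last 100 characters of each chunk reappear at the start of the next one, which is why overlapping chunks can later show up as near-duplicates at retrieval time.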

#### First document splitting

In [20]:
text_orca = text_splitter.split_documents(orca_document)
print(len(text_orca))

382


#### Second document splitting

In [21]:
text_ssra = text_splitter.split_documents(ssra_document)
print(len(text_ssra))

67


### Step-6: Loading Embedding Model

- **`hf_embeddings`** is an instance of `HuggingFaceEmbeddings` using the `sentence-transformers/all-MiniLM-L6-v2` model, which generates dense vector representations for sentences. This model is efficient, offering high-quality embeddings for tasks like semantic similarity and clustering.

In [22]:
#Embeddings
hf_embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
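As a rough illustration of what "semantic similarity" means for these embeddings, the sketch below compares vectors with cosine similarity. The three-dimensional vectors are made-up stand-ins (the real model produces 384-dimensional vectors), and `cosine_similarity` is a hypothetical helper, not part of LangChain.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


# Toy vectors standing in for sentence embeddings (assumed values)
query = [1.0, 0.0, 1.0]
close = [0.9, 0.1, 0.8]   # points in nearly the same direction as the query
far = [0.0, 1.0, 0.0]     # orthogonal to the query

print(cosine_similarity(query, close))
print(cosine_similarity(query, far))
```

In practice, `hf_embeddings.embed_query(...)` and `hf_embeddings.embed_documents(...)` would supply the vectors.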

### Step-7: Database Creation and DB Retriever

- **Database Creation** (`Chroma.from_documents()`): Creates a vector store using **Chroma**, which indexes the document chunks and stores their embeddings in a collection.
- **DB Retriever** (`vector_store.as_retriever()`): Converts the vector store into a retriever, allowing for efficient querying and retrieval of relevant documents based on vector similarity.

#### db_collection-1

In [40]:
orca_vector_store = Chroma.from_documents(
    documents=text_orca,       
    embedding=hf_embeddings,    
    collection_name="db_collection-1"  
)

#### retriever-1 for db_collection-1

In [43]:
retriever_orca = orca_vector_store.as_retriever(search_type="mmr",search_kwargs={"k": 5})

#### db_collection-2

In [41]:
ssra_vector_store = Chroma.from_documents(
    documents=text_ssra,       
    embedding=hf_embeddings,    
    collection_name="db_collection-2"  
)

#### retriever-2 for db_collection-2

In [44]:
retriever_ssra = ssra_vector_store.as_retriever(search_type="mmr",search_kwargs={"k": 5})
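Both retrievers use `search_type="mmr"` (maximal marginal relevance), which trades off relevance to the query against similarity to documents already selected. The sketch below is a minimal pure-Python version of that idea under assumed toy vectors; `mmr_select` and `lambda_mult=0.5` are illustrative choices, and the real vector store first fetches a larger candidate pool before applying MMR.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query_vec, doc_vecs, k=5, lambda_mult=0.5):
    """Greedily pick up to k document indices, balancing relevance and diversity."""
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalize similarity to the closest already-selected document
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 0.0]
doc_vecs = [
    [0.9, 0.1],    # 0: most relevant
    [0.9, 0.12],   # 1: near-duplicate of document 0
    [0.7, -0.7],   # 2: less relevant, but different from document 0
]
print(mmr_select(query_vec, doc_vecs, k=2))                   # diversity-aware pick
print(mmr_select(query_vec, doc_vecs, k=2, lambda_mult=1.0))  # pure relevance pick
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking; lower values favour diversity, which is why the near-duplicate is skipped in the first call.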

### Step-8: [LOTR (Lord of the Retrievers)](https://python.langchain.com/docs/integrations/retrievers/merger_retriever/)

Lord of the Retrievers (LOTR), also known as `MergerRetriever`, takes a list of retrievers and merges their results into a single list. Querying several sources at once can reduce the bias of relying on any single retriever and gives the model a more diverse set of candidate documents.

In [45]:
lotr = MergerRetriever(retrievers=[retriever_orca, retriever_ssra])
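As far as I understand the current implementation, `MergerRetriever` interleaves the result lists in round-robin order rather than re-scoring them. The sketch below illustrates that merge pattern with plain strings; `round_robin_merge` and the sample hit names are hypothetical and only stand in for the real `Document` lists.

```python
from itertools import zip_longest


def round_robin_merge(result_lists):
    """Interleave several ranked lists: first hit of each, then second hit of each, ..."""
    missing = object()  # sentinel marking "this list has run out"
    merged = []
    for group in zip_longest(*result_lists, fillvalue=missing):
        merged.extend(doc for doc in group if doc is not missing)
    return merged


# Toy stand-ins for the documents returned by each retriever
orca_hits = ["orca-1", "orca-2", "orca-3"]
ssra_hits = ["ssra-1", "ssra-2"]
print(round_robin_merge([orca_hits, ssra_hits]))
```

Because nothing is re-ranked or deduplicated at this stage, chunks from the unrelated PDF still appear in the merged list, which matches the mixed outputs shown for the queries in this step.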

#### Let's ask a question about `Orca_paper.pdf` and check whether the relevant content is retrieved

In [47]:
for chunks in lotr.invoke("what is Instruction Tuning?"):
    print(chunks.page_content)

Figure 4: Instruction-tuning with GPT-49. Given user instructions for a task and an input,
the system generates a response. Existing works like Alpaca [7], Vicuna [9] and variants
follow a similar template to train small models with⟨{user instruction, input}, output⟩.
2 Preliminaries
2.1 Instruction Tuning
Instruction tuning [22] is a technique that allows pre-trained language models to learn
from input (natural language descriptions of the task) and response pairs, for example,
latency of the search process, enabling real-time data retrieval
even with large-scale datasets.
E. Evaluation and Testing
To assess the performance of the system, several key metrics
are used to evaluate both the accuracy of the results and the
efficiency of the retrieval process. These metrics include:
Precision: Measures the accuracy of relevant results re-
trieved. Recall: Evaluates the completeness of the retrieved
results. F1-Score: Combines precision and recall to provide a
Contents
1 Introduction 4
1.1 

#### Let's ask a question about `semantic_search_&_recommendation_algorithms.pdf` and check whether the relevant content is retrieved

In [49]:
for chunks in lotr.invoke("Explain about Word2Vec Model for Semantic Understanding?"):
    print(chunks.page_content)

Data Biases:Large language models, trained on extensive data, can inadvertently carry
biases present in the source data. Consequently, the models may generate outputs that could
be potentially biased or unfair.
Lack of Contextual Understanding:Despite their impressive capabilities in language un-
derstanding and generation, these models exhibit limited real-world understanding, resulting
in potential inaccuracies or nonsensical responses.
vector representations.
B. Word2Vec Model for Semantic Understanding
Word2Vec is used to convert textual data into vector em-
beddings, capturing the semantic relationships between words
based on their usage in context. By training the model on a
large corpus of data, Word2Vec allows the semantic meaning
of terms to be represented in a dense vector space, where
similar words are clustered closer together.
from gensim.models import Word2Vec from
nltk.tokenize import word_tokenize
[33] Tommaso Caselli, Valerio Basile, Jelena Mitrovic, and M. Granitzer. 

### Step-9: Retriever Setup

1. **Embedding Filter:**
   `EmbeddingsRedundantFilter` embeds the retrieved documents and drops those whose embeddings are nearly identical to another document's. This removes duplicate or heavily overlapping chunks (for example, those produced by `chunk_overlap`), so the context is not padded with repeated information.

2. **Reordering:**
   `LongContextReorder` reorders the list of retrieved documents; it does not change the text inside any document. It places the most relevant documents at the beginning and end of the list and the least relevant ones in the middle, because models tend to overlook information buried in the middle of a long context.

3. **Document Compression Pipeline:**
   A `DocumentCompressorPipeline` chains the `embedding_filter` and `reordering` steps. Documents pass through the steps in order: redundant documents are filtered out first, and the remaining ones are then reordered.

4. **Contextual Compression Retriever:**
   `ContextualCompressionRetriever` wraps the base retriever (`lotr`) with the pipeline: it fetches documents from `lotr`, then runs them through the pipeline before returning them. The code also passes `search_kwargs={"k": 3}`, intended to limit the output to the top 3 results; note that this does not appear to be a documented parameter of `ContextualCompressionRetriever`, so the number of documents returned is more likely governed by the `k` of each underlying retriever and by the redundancy filter.
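The redundancy-filtering idea from item 1 can be sketched in a few lines. This is a simplified greedy version under assumed toy vectors, not the library's implementation; `drop_redundant` and the `0.95` threshold are illustrative choices.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def drop_redundant(docs, vectors, similarity_threshold=0.95):
    """Keep a document only if it is not too similar to any document already kept."""
    kept_docs, kept_vecs = [], []
    for doc, vec in zip(docs, vectors):
        if all(cosine(vec, kept) <= similarity_threshold for kept in kept_vecs):
            kept_docs.append(doc)
            kept_vecs.append(vec)
    return kept_docs


# Toy documents with made-up 2-d embeddings
docs = ["chunk A", "chunk A (overlapping copy)", "chunk B"]
vectors = [[1.0, 0.0], [0.999, 0.02], [0.0, 1.0]]
print(drop_redundant(docs, vectors))
```

The second document is dropped because its vector is almost parallel to the first; the third survives because it points in a different direction.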

In [52]:
embedding_filter = EmbeddingsRedundantFilter(embeddings=hf_embeddings)
reordering = LongContextReorder()
pipeline = DocumentCompressorPipeline(transformers=[embedding_filter, reordering])
compression_retriever_reordered = ContextualCompressionRetriever(
            base_compressor=pipeline, base_retriever=lotr,search_kwargs={"k": 3}
)
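To show what the reordering step does to a ranked list, here is a small sketch of the "most relevant at the edges" pattern. It mirrors my understanding of how `LongContextReorder` behaves, but `reorder_for_long_context` is a hypothetical stand-in and the input is assumed to be sorted from most to least relevant.

```python
def reorder_for_long_context(docs_by_relevance):
    """Place the most relevant items at the start and end, the least relevant in the middle."""
    reordered = []
    # Walk from least to most relevant, alternating between the front and the back
    for i, doc in enumerate(reversed(docs_by_relevance)):
        if i % 2 == 1:
            reordered.append(doc)
        else:
            reordered.insert(0, doc)
    return reordered


ranked = ["rank-1", "rank-2", "rank-3", "rank-4", "rank-5"]
print(reorder_for_long_context(ranked))
# ['rank-1', 'rank-3', 'rank-5', 'rank-4', 'rank-2']
```

Note that in this notebook the list reaching the reorder step comes from the merged retrievers, so it is interleaved across sources rather than strictly sorted by a single relevance score.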

### Step-10: Chain Setup
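The code uses a **RetrievalQA** chain to perform question answering over the compression retriever. With `chain_type="stuff"`, all retrieved documents are inserted into a single prompt, and `return_source_documents=True` returns the supporting chunks alongside the answer.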

In [56]:
qa = RetrievalQA.from_chain_type(
      llm=llm,
      chain_type="stuff",
      retriever = compression_retriever_reordered,
      return_source_documents = True
)
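Conceptually, the "stuff" strategy concatenates every retrieved chunk into one prompt. The sketch below shows that idea only; `build_stuff_prompt` and the template wording are assumptions made for illustration and are not the prompt LangChain actually uses.

```python
def build_stuff_prompt(question, chunks):
    """Join all retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Toy chunks standing in for the reordered source documents
chunks = ["Precision measures the accuracy of relevant results.",
          "Recall evaluates the completeness of the results."]
prompt = build_stuff_prompt("What is recall?", chunks)
print(prompt)
```

Because everything lands in one prompt, the order of the chunks matters, which is where the reordering step earns its place.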

In [57]:
query = "Explain about Comparison of Search Accuracy?"
results = qa.invoke({"query": query})  # invoke() replaces the deprecated qa(query) call style
print(results['result'])


In the given context, the Comparison of Search Accuracy refers to the assessment of the proposed search method against traditional search algorithms. The search accuracy was evaluated by measuring precision, recall, and F1-scores across datasets of increasing complexity.

Here's a breakdown of the key points related to the Comparison of Search Accuracy:

1. **Assessment Metrics**: The search accuracy was assessed using precision, recall, and F1-scores. Precision measures the accuracy of relevant results retrieved, recall evaluates the completeness of the retrieved results, and F1-score combines precision and recall to provide a comprehensive measure of accuracy.
2. **Dataset Complexity**: The assessment was conducted across datasets of increasing complexity, indicating that the proposed method was tested on a range of data scenarios.
3. **Comparison with Traditional Algorithms**: The proposed method outperformed traditional algorithms in terms of search accuracy, precision, recall, and

In [None]:
print(results["source_documents"])