# **Optimizing RAG System Performance with Ensemble Retriever**

BLOG:
https://amitvkulkarni.medium.com/optimizing-rag-system-performance-with-ensemble-retriever-0e39e91bed7b

The performance of Retrieval-Augmented Generation (RAG) systems is crucial in the evolving AI landscape. These systems combine retrieval with generation, so they must be both accurate and efficient across a wide range of queries. The challenge is to balance retrieving precise information with understanding complex questions, which calls for more capable retrieval methods.

LangChain's EnsembleRetriever is one way to improve RAG performance by combining the strengths of several retrieval algorithms. It merges the results of different retrievers and reranks them with the Reciprocal Rank Fusion (RRF) algorithm, so that documents ranked highly by several retrievers rise to the top. Pairing a sparse (keyword-based) retriever with a dense (embedding-based) one in this way makes the system more robust than either retriever alone.
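To make the fusion step concrete, here is a minimal, self-contained sketch of weighted Reciprocal Rank Fusion. It assumes each retriever returns an ordered list of document IDs and uses the constant c = 60 from the original RRF paper. To our understanding, LangChain's EnsembleRetriever applies a similar weighted formula (each document scores the sum of weight / (rank + c) over the lists it appears in), but this is an illustration, not the library's code; the `weighted_rrf` name and the document IDs are hypothetical.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Each document scores sum(weight / (rank + c)) over the lists that contain it,
    with ranks starting at 1. Documents are returned best-first.
    """
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical top-3 results from a dense and a sparse retriever
dense = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_d", "doc_c", "doc_e"]
print(weighted_rrf([dense, sparse], weights=[0.5, 0.5]))
# → ['doc_c', 'doc_a', 'doc_d', 'doc_b', 'doc_e']
```

`doc_c` is only third and second in the individual lists, yet it comes out on top because both retrievers found it; this agreement bonus is what RRF rewards.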

In [63]:
# ! pip -q install tiktoken pypdf sentence-transformers InstructorEmbedding langchain_community huggingface_hub langchain-huggingface chromadb rank_bm25

# **Import the necessary libraries**

In [2]:
import os
from langchain_huggingface import HuggingFaceEndpoint
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough



In [3]:
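import os
from google.colab import userdata

# Export the Hugging Face token stored in Colab secrets so that
# HuggingFaceEndpoint can authenticate against the Inference API
os.environ["HUGGINGFACEHUB_API_TOKEN"] = userdata.get("HF_TOKEN")

print("API keys loaded")

API keys loaded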


# **Loading the data**

In [4]:
loader = DirectoryLoader("/content/data", glob="*.txt", loader_cls=TextLoader)
documents = loader.load()
documents

[Document(metadata={'source': '/content/data/Politics.txt'}, page_content='The G7 is a club of Western nations (with Japan given that status as an ally of the West and a major economy) that have dominated the world and its institutions, in some cases for centuries, and retain the ambition to maintain that position by policy coordination amongst themselves and by co-opting rising powers, including India, given the shifts in global power in recent decades.\n\nThe G7 recognised that they could not manage the 2008 financial crisis on their own and needed a wider international partnership, but one under their aegis. With this in mind, the G20 forum hitherto at the finance minister level was raised to the summit level. The G20 agenda is, however, shifting increasingly towards the interests and priorities of the developing countries (now being referred to as the Global South). During India’s G20 presidency, with India holding the Voice of the Global South summits before presiding over the G20

# **Splitting the document using RecursiveCharacterTextSplitter**

In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10)
texts = text_splitter.split_documents(documents)
texts

[Document(metadata={'source': '/content/data/Politics.txt'}, page_content='The G7 is a club of Western nations (with Japan given that status as an ally of the West and a major'),
 Document(metadata={'source': '/content/data/Politics.txt'}, page_content='a major economy) that have dominated the world and its institutions, in some cases for centuries,'),
 Document(metadata={'source': '/content/data/Politics.txt'}, page_content='and retain the ambition to maintain that position by policy coordination amongst themselves and by'),
 Document(metadata={'source': '/content/data/Politics.txt'}, page_content='and by co-opting rising powers, including India, given the shifts in global power in recent'),
 Document(metadata={'source': '/content/data/Politics.txt'}, page_content='in recent decades.'),
 Document(metadata={'source': '/content/data/Politics.txt'}, page_content='The G7 recognised that they could not manage the 2008 financial crisis on their own and needed a'),
 Document(metadata={'sourc

In [68]:
# ! pip install sentence-transformers==2.2.2

In [6]:
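from langchain_community.embeddings import HuggingFaceInstructEmbeddings

# Instructor-XL embeddings; the model is downloaded from the Hugging Face Hub on first use
instructor_embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")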

load INSTRUCTOR_Transformer
max_seq_length  512


# **Using Chroma as Vector DB**

We'll use Chroma to store and retrieve the document chunks, and wrap it in a retriever object that returns the top 5 chunks for each question.

First, we use the standard approach, i.e., semantic search over the embeddings, to fetch the 5 most relevant chunks.
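Under the hood, semantic search ranks chunks by how similar each chunk's embedding is to the query's embedding. The sketch below shows that ranking with cosine similarity over tiny hand-made vectors; the vectors, chunk names, and the `top_k_by_cosine` helper are hypothetical stand-ins for the real Instructor embeddings. Chroma's actual distance metric depends on how the collection is configured (its default is L2), so treat this as an illustration of the idea rather than Chroma's internals.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def top_k_by_cosine(query_vec, chunk_vecs, k=5):
    """Return the names of the k chunks most similar to the query, best first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]


# Hypothetical 3-d embeddings for three chunks and one query
chunks = {
    "g20_agenda": [0.9, 0.1, 0.0],
    "global_south": [0.7, 0.6, 0.1],
    "sanctions": [0.0, 0.2, 0.9],
}
query = [0.8, 0.4, 0.0]
print(top_k_by_cosine(query, chunks, k=2))
# → ['global_south', 'g20_agenda']
```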

In [7]:
from langchain_community.vectorstores import Chroma
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever

vectorstore = Chroma.from_documents(texts, instructor_embeddings)
vectorstore_retreiver = vectorstore.as_retriever(search_kwargs={"k": 5})

Next, we set up a second retriever based on BM25, a sparse, keyword-matching algorithm. Sparse retrievers are precise for simple queries that share exact terms with the text, whereas dense retrievers, which rely on semantic similarity, capture the context and deeper meaning of a query and suit more complex information needs. Using both lets each cover the other's blind spots.
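For intuition about what the sparse side computes, here is a minimal Okapi BM25 scorer. `BM25Retriever` delegates scoring to the `rank_bm25` package installed above; this from-scratch version uses the common defaults k1 = 1.5 and b = 0.75 and a non-negative IDF variant, log(1 + (N - n + 0.5) / (n + 0.5)), which differs slightly from `rank_bm25`'s IDF, so treat it as an illustration. The lowercase whitespace tokenization, the `bm25_scores` name, and the sample chunks are assumptions for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # exact-match only: no credit for synonyms or paraphrases
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


sample_chunks = [
    "the g7 is a club of western nations",
    "india held the voice of the global south summits",
    "brics is expanding beyond its founding members",
]
print(bm25_scores("global south summits", sample_chunks))
```

Only the chunk that literally contains the query words gets a non-zero score. That is the precision, and the brittleness with paraphrased questions, that the dense retriever compensates for.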

In [None]:
keyword_retriever = BM25Retriever.from_documents(texts)
keyword_retriever.k = 5  # return the top 5 keyword matches, like the dense retriever

# **Ensemble Retriever**

In [8]:
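# Fuse dense and sparse results with Reciprocal Rank Fusion; equal weights give
# both retrievers the same influence on the final ranking
ensemble_retriever = EnsembleRetriever(retrievers=[vectorstore_retreiver,
                                                   keyword_retriever],
                                       weights=[0.5, 0.5])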

In [9]:
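def format_docs(docs):
    # Join the retrieved chunks into one plain-text context string
    return "\n\n".join(doc.page_content for doc in docs)


def get_rag_response(retriever_choice, question):
    repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
    # max_new_tokens caps the length of the generated answer; no token argument is
    # passed, so the endpoint reads the Hugging Face token from the environment
    llm = HuggingFaceEndpoint(repo_id=repo_id, max_new_tokens=512, temperature=0.7)

    template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

    # Create a prompt from the template
    prompt = ChatPromptTemplate.from_template(template)

    # Retrieve chunks -> format them as context -> fill the prompt -> call the LLM -> parse text
    chain = (
        {"context": retriever_choice | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

    return chain.invoke(question)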

In [10]:
question = "Why are all the countries meeting and what is it about?"

# **Basic retrieval with semantic search**

In [11]:
doc_basic = vectorstore_retreiver.invoke(question)
doc_basic


[Document(metadata={'source': '/content/data/Politics.txt'}, page_content='for promoting multipolarity, a greater role of developing countries in global governance, more'),
 Document(metadata={'source': '/content/data/Politics.txt'}, page_content='India holding the Voice of the Global South summits before presiding over the G20 and at the'),
 Document(metadata={'source': '/content/data/Politics.txt'}, page_content='countries (now being referred to as the Global South). During India’s G20 presidency, with India'),
 Document(metadata={'source': '/content/data/Politics.txt'}, page_content='permanent member at India’s initiative, the pro-Global South content of the G20 agenda has got'),
 Document(metadata={'source': '/content/data/Politics.txt'}, page_content='needed a wider international partnership, but one under their aegis. With this in mind, the G20')]

In [12]:
response = get_rag_response(vectorstore_retreiver, question)
print(response)

Answer: The countries mentioned in the context are referred to as the Global South. They are meeting to promote multipolarity, which means a balance of power among multiple entities. They aim for a greater role of developing countries in global governance. India, one of these countries, has been hosting the Voice of the Global South summits before presiding over the G20. During India's G20 presidency, the pro-Global South content of the G20 agenda has gained significance. These countries are seeking a wider international partnership, but under their leadership. Therefore, the meetings are about strengthening the collective voice and influence of the Global South in global governance.


The response looks good: the model has picked up the context from the retrieved chunks and produced a relevant answer. Let's see whether a hybrid search can improve it.

# **Building with Ensemble Retriever**

In [13]:
doc_ensemble = ensemble_retriever.invoke(question)
doc_ensemble

[Document(metadata={'source': '/content/data/Politics.txt'}, page_content='countries (now being referred to as the Global South). During India’s G20 presidency, with India'),
 Document(metadata={'source': '/content/data/Politics.txt'}, page_content='for promoting multipolarity, a greater role of developing countries in global governance, more'),
 Document(metadata={'source': '/content/data/Politics.txt'}, page_content='India holding the Voice of the Global South summits before presiding over the G20 and at the'),
 Document(metadata={'source': '/content/data/Politics.txt'}, page_content='the hegemony of the West that is still expressing itself in the form of sanctions, the weaponising'),
 Document(metadata={'source': '/content/data/Politics.txt'}, page_content='The G7 is a club of Western nations (with Japan given that status as an ally of the West and a major'),
 Document(metadata={'source': '/content/data/Politics.txt'}, page_content='permanent member at India’s initiative, the pro-Gl

In [14]:
# Example usage:
response = get_rag_response(ensemble_retriever, question)
print(response)

Answer: The countries, now referred to as the Global South, are meeting under the framework of the G20 during India's presidency. The primary focus of the meetings is to promote multipolarity, a greater role of developing countries in global governance, and to resist the hegemony of the West. India has been organizing Voice of the Global South summits before presiding over the G20. The G20 agenda has been influenced by pro-Global South content on global issues. Additionally, BRICS, a group of non-Western countries, is expanding to further this cause. The G7, a club of Western nations, is also involved, with Japan as a permanent member at India's initiative.


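The ensemble approach produces a response with the right context and picks up points the basic version missed, such as BRICS and the G7. Its last sentence, however, appears to conflate two unrelated 100-character chunks ("with Japan given that status as an ally of the West" and "permanent member at India's initiative"), a reminder that very small chunks can mislead the generator. Even though the source document is short, the two approaches give noticeably different answers. In real-world projects, it is important to run such experiments to decide which approach to implement.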
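One lightweight way to run such experiments is to hand-label a few questions with the chunks that should be retrieved and compute the hit rate at k for each retriever. The sketch below works on plain lists of chunk IDs so it is self-contained; in the notebook you would build those lists from the `page_content` of the documents each retriever returns. The `hit_rate_at_k` name, the questions, the labels, and both result sets are hypothetical.

```python
def hit_rate_at_k(results, relevant, k=5):
    """Fraction of questions whose top-k results contain at least one relevant chunk.

    results:  {question: ranked list of retrieved chunk IDs}
    relevant: {question: set of chunk IDs judged relevant}
    """
    hits = sum(1 for q, ranked in results.items() if set(ranked[:k]) & relevant[q])
    return hits / len(results)


# Hypothetical labels and results from two retrievers, A and B
relevant = {
    "why are the countries meeting?": {"g20_agenda", "global_south"},
    "what is brics doing?": {"brics_expansion"},
}
results_a = {
    "why are the countries meeting?": ["global_south", "g20_summit"],
    "what is brics doing?": ["g20_agenda", "sanctions"],
}
results_b = {
    "why are the countries meeting?": ["global_south", "g7_club"],
    "what is brics doing?": ["sanctions", "brics_expansion"],
}
print(hit_rate_at_k(results_a, relevant, k=2))  # 0.5
print(hit_rate_at_k(results_b, relevant, k=2))  # 1.0
```

With only a handful of labeled questions the numbers are noisy, so a larger labeled set gives a more trustworthy comparison.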

# **Conclusion**
The EnsembleRetriever enhances RAG performance by combining the precision of sparse retrievers with the contextual understanding of dense retrievers, giving a balanced and robust retrieval process. This hybrid approach can improve both the accuracy and the relevance of generated responses, and as AI evolves, techniques like this will matter more for information retrieval and generation. Consider the EnsembleRetriever when your RAG system needs to handle complex queries.