# RAG with Langchain, Reranking and MergerRetriever (LOTR)

This notebook discusses two enhancements to RAG:
1. When you have different kinds of documents, a single index is often not enough: you end up with multiple indexes and have to query each of them separately to perform RAG. We will see how to do better RAG in this case using `MergerRetriever` (LOTR).

2. When the context is long, we place the high-ranked (most similar) documents at the top and bottom of the context and the low- and average-ranked documents in the middle, using `LongContextReorder`.

## Install dependencies

In [1]:
!pip install langchain==0.1 langchain_openai python-dotenv chromadb huggingface_hub sentence-transformers -q




## Environment setup
Before executing the following cells, make sure to set the following environment variables in the `.env` file or export them:
* `AZURE_OPENAI_KEY`
* `AZURE_OPENAI_ENDPOINT`
* `MODEL_DEPLOYMENT_NAME`
* `EMBEDDING_DEPLOYMENT_NAME`
* `COHERE_API_KEY` (create one [here](https://dashboard.cohere.com/api-keys))

<br/>
<img src="../assets/keys_endpoint.png" width="800"/>

In [2]:
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # read local .env file

## Merge Retriever - Lord of All the Retrievers (LOTR)

Lord of the Retrievers, also known as `MergerRetriever`, takes a list of retrievers as input and merges the results of their `get_relevant_documents()` methods into a single list. The merged list contains the documents that each retriever considered relevant to the query, interleaved in the order in which each retriever returned them.

The `MergerRetriever` class can improve document retrieval in two ways.

First, it combines the results of multiple retrievers, which reduces the risk that a single index or embedding model biases the results. Second, because each retriever returns its own results in ranked order, the top hits from every retriever end up near the front of the merged list. The merger itself does not re-score documents across retrievers; that is left to later steps such as filtering and reordering.
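To make the merging step concrete, here is a minimal plain-Python sketch of round-robin interleaving of several ranked result lists. It is an illustration of the idea only, not the library's code: the function name `merge_round_robin` and the toy document labels are hypothetical.

```python
from itertools import zip_longest


def merge_round_robin(result_lists):
    """Interleave several ranked lists, taking one item from each in turn."""
    missing = object()  # sentinel marking lists that have run out of items
    merged = []
    for group in zip_longest(*result_lists, fillvalue=missing):
        merged.extend(item for item in group if item is not missing)
    return merged


# Toy ranked results from two hypothetical retrievers.
docs_a = ["a1", "a2", "a3"]
docs_b = ["b1", "b2"]

merged = merge_round_robin([docs_a, docs_b])
print(merged)  # ['a1', 'b1', 'a2', 'b2', 'a3']
```

With this scheme, the best hit from every retriever appears before any retriever's second-best hit, but nothing compares scores across retrievers.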

In [34]:
import os
import torch
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain_openai import AzureOpenAIEmbeddings


openai_api_version = "2023-08-01-preview"

openai_embedding = AzureOpenAIEmbeddings(
    deployment=os.getenv('EMBEDDING_DEPLOYMENT_NAME'),
    openai_api_version=openai_api_version,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
hf_bge_embeddings = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-large-en",
    model_kwargs={"device": device},
    encode_kwargs = {'normalize_embeddings': False}
)

## Create the Data Stores
We will be using two data sources:
- LangChain documentation, scraped in the following cell
- LangChain blog posts (the same ones used in the previous notebook)

The first one will be embedded using `text-embedding-ada-002` (OpenAI), and the second one using an open-source model (`bge-large-en`).

In [4]:
from langchain_docs_loader import load_langchain_docs_splitted
# This might be slow the very first time you run it. Hang in there!
langchain_docs = load_langchain_docs_splitted()

In [5]:
print(len(langchain_docs))

3139


### First retriever

In [6]:
from langchain.vectorstores import Chroma

persist_directory = '../data/embeddings_aoi_docs/'

# This is going to be slow the first time you run it (it is embedding 3139 chunks, in batches of 16).
if not os.path.exists(persist_directory):
    vectordb_aoi = Chroma.from_documents(
        documents=langchain_docs,
        embedding=openai_embedding,
        persist_directory=persist_directory
    )
    vectordb_aoi.persist()
else:
    # load from disk
    vectordb_aoi = Chroma(
        embedding_function=openai_embedding,
        persist_directory=persist_directory
    )

### Second retriever

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import DirectoryLoader, TextLoader

text_loader_kwargs={'autodetect_encoding': True}

loader = DirectoryLoader(
    '../data/langchain_blog_posts/',
    glob="**/*.txt",
    loader_cls=TextLoader,
    loader_kwargs=text_loader_kwargs,
    #silent_errors=True
)

documents = loader.load()

# Split the documents into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
)
texts = text_splitter.split_documents(documents)
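To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window sketch in plain Python. It is only an approximation of the idea: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, which this hypothetical `split_with_overlap` helper does not do.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(chunks)  # ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of embedding some text twice.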

In [35]:
len(texts)

429

In [8]:
persist_directory = '../data/embeddings_hf_bge_docs/'

# This is going to be slow the first time you run it, unless you have a GPU supporting CUDA.
# This is encoding the same texts used for the reranking notebook, but with a different embedding model.
if not os.path.exists(persist_directory):
    vectordb_hf = Chroma.from_documents(
        documents=texts,
        embedding=hf_bge_embeddings,
        persist_directory=persist_directory
    )
    vectordb_hf.persist()
else:
    # load from disk
    vectordb_hf = Chroma(
        embedding_function=hf_bge_embeddings,
        persist_directory=persist_directory
    )

## Merge the retrievers


In [9]:
from langchain.retrievers import MergerRetriever

n_docs = 10
retriever_aoi = vectordb_aoi.as_retriever(
    search_type = "similarity",
    search_kwargs = {"k": n_docs}
)

retriever_hf = vectordb_hf.as_retriever(
    search_type = "similarity",
    search_kwargs = {"k": n_docs}
)
lotr = MergerRetriever(retrievers=[retriever_aoi, retriever_hf])

In [None]:
query = "What is the difference between a MergerRetriever and EnsembleRetriever in langchain?"
docs = lotr.get_relevant_documents(query)
docs

## Remove duplicate information and reorder with LITM (Lost in the Middle)

After merging the retriever results, we can apply the following three transformations:
- Remove duplicate information
- Reorder the results with `LongContextReorder`, which implements LITM (Lost in the Middle)[^1]
- Rerank the results with the Cohere reranker

The next cell builds a pipeline from the first two.
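The first transformation in the list above, removing duplicate information, can be pictured as dropping any document whose embedding is almost identical to one already kept. The following is a minimal sketch under stated assumptions: the greedy strategy, the `drop_near_duplicates` name, the toy two-dimensional vectors, and the 0.95 threshold are all illustrative and may differ from what `EmbeddingsRedundantFilter` does internally.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def drop_near_duplicates(items, threshold=0.95):
    """Keep a (text, vector) item only if it is not too close to a kept one."""
    kept = []
    for text, vector in items:
        if all(cosine_similarity(vector, v) < threshold for _, v in kept):
            kept.append((text, vector))
    return [text for text, _ in kept]


# Toy embeddings: the first two texts point in almost the same direction.
toy = [
    ("MergerRetriever merges results", [1.0, 0.0]),
    ("MergerRetriever merges the results", [0.99, 0.05]),
    ("EnsembleRetriever weights scores", [0.0, 1.0]),
]

unique_texts = drop_near_duplicates(toy)
print(unique_texts)
```

Here the second text is dropped because its similarity to the first is above the threshold, while the third is kept.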

[^1]: "Lost in the Middle" refers to the finding that language models use information best when it appears at the beginning or end of a long context, and that performance degrades when the relevant information sits in the middle. `LongContextReorder` therefore reorders the retrieved documents so that the most relevant ones are at the beginning and end of the list, and the least relevant ones are in the middle. ([paper](https://arxiv.org/abs/2307.03172))

In [31]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain_community.document_transformers import EmbeddingsRedundantFilter, LongContextReorder

# We can remove redundant results from both retrievers by comparing their embeddings
# (here we reuse the Azure OpenAI embedding model).
# Using multiple embeddings in different steps could help reduce biases.
filter_emb = EmbeddingsRedundantFilter(embeddings=openai_embedding)
reorder = LongContextReorder()

pipeline = DocumentCompressorPipeline(transformers=[filter_emb, reorder])

compression_retriever = ContextualCompressionRetriever(
    base_compressor=pipeline, base_retriever=lotr
)
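The reordering step in the pipeline above places the strongest documents at the edges of the context and the weakest in the middle. A minimal sketch of that idea in plain Python follows; it assumes the input list is sorted from most to least relevant, and the function name `reorder_for_long_context` is hypothetical rather than part of LangChain.

```python
def reorder_for_long_context(docs_by_relevance):
    """Put the most relevant items at both ends and the least in the middle.

    Expects the input sorted from most relevant to least relevant.
    """
    front, back = [], []
    for i, doc in enumerate(docs_by_relevance):
        # Alternate between the front half and the back half.
        if i % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    # Reverse the back half so its strongest item ends up last.
    return front + back[::-1]


ranked = ["d1", "d2", "d3", "d4", "d5"]  # d1 is the most relevant
reordered = reorder_for_long_context(ranked)
print(reordered)  # ['d1', 'd3', 'd5', 'd4', 'd2']
```

The two most relevant documents, `d1` and `d2`, end up first and last, while the least relevant one, `d5`, sits in the middle.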

## Question-Answering

In [32]:
from langchain_core.prompts import PromptTemplate
from langchain_openai import AzureChatOpenAI
from langchain.chains import RetrievalQA, LLMChain

llm = AzureChatOpenAI(
    deployment_name=os.getenv('MODEL_DEPLOYMENT_NAME'),
    openai_api_version=openai_api_version,
    temperature=0.,
    max_tokens=1024
)


qa_endpoint = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=compression_retriever
)

template = """User question: {question}

possible Answer: Provide the best answer refering to the context."""

prompt = PromptTemplate(
    template=template,
    input_variables=["question"]
)

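# Note: this chain sends only the question to the LLM. It does not call the
# retriever; `qa_endpoint` above is the chain that uses `compression_retriever`.
llm_chain = LLMChain(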
    prompt=prompt,
    llm=llm,
    verbose=True
)
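The `chain_type='stuff'` setting used for `qa_endpoint` means that the retrieved chunks are concatenated ("stuffed") into a single prompt together with the question. The sketch below shows that idea in plain Python. The function name, the prompt wording, and the example chunks are hypothetical and are not LangChain's actual template.

```python
def build_stuffed_prompt(question, chunks):
    """Concatenate retrieved chunks into one context block above the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical retrieved chunks, standing in for the retriever's output.
example_chunks = ["chunk one (hypothetical)", "chunk two (hypothetical)"]
stuffed = build_stuffed_prompt("What does MergerRetriever do?", example_chunks)
print(stuffed)
```

In the notebook, a retrieval-backed answer would come from the `qa_endpoint` chain, for example via `qa_endpoint.invoke({"query": query})`, assuming the default `query` input key of `RetrievalQA`.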

In [33]:
llm_chain.invoke(query)



> Entering new LLMChain chain...
Prompt after formatting:
User question: What is the difference between a MergerRetriever and EnsembleRetriever in langchain?

possible Answer: Provide the best answer refering to the context.

> Finished chain.


{'question': 'What is the difference between a MergerRetriever and EnsembleRetriever in langchain?',
 'text': "In Langchain, a MergerRetriever and EnsembleRetriever are both retrievers used for information retrieval tasks, but they have different functionalities.\n\nA MergerRetriever is a retriever that combines multiple retrievers into a single retriever. It takes the outputs of multiple retrievers and merges them to provide a unified set of results. This can be useful when you want to leverage the strengths of different retrievers and get a more comprehensive set of information.\n\nOn the other hand, an EnsembleRetriever is a retriever that combines multiple retrievers by aggregating their scores. Instead of merging the results, it assigns weights to each retriever's scores and combines them to produce a final ranking. This approach allows you to benefit from the different strengths of each retriever while still maintaining a single ranked list of results.\n\nIn summary, a MergerRetr