# Merger Retriever a.k.a. Lord of the Retrievers (LOTR)

https://python.langchain.com/v0.2/docs/integrations/retrievers/merger_retriever/

* Takes a list of retrievers
* Merges the results into a single list

The startegy improves the accuracy.

* Reduces the risk of bias
* Ranks the results

#### Demo:
* ChromaDB vector store
* WikipediaRetriever
* RedundantFilterCompressor
* LongContet

**PS:** Requires the Wikipedia package
https://python.langchain.com/v0.1/docs/integrations/retrievers/wikipedia/

## Import packages

In [1]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_core.documents import Document
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Retrievers & transformers
from langchain_community.document_transformers import EmbeddingsRedundantFilter
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers import MergerRetriever
from langchain_community.document_transformers import EmbeddingsClusteringFilter
from langchain_community.document_transformers import LongContextReorder

# Embeddings
from langchain_cohere import CohereEmbeddings
from langchain_community.document_loaders import DirectoryLoader
from langchain_core.documents import Document
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

## 1. Create an LLM
The LLM will be used by the retrievers

* Cohere command model
* Cohere embedding model

#### Note
* You must adjust the location of the API key file

In [2]:
from dotenv import load_dotenv
import sys
import json

# Load the file that contains the API keys - OPENAI_API_KEY
load_dotenv('C:\\Users\\raj\\.jupyter\\.env')

# setting path
sys.path.append('../')

from utils.create_chat_llm import create_gpt_chat_llm, create_cohere_chat_llm

# Try with GPT
llm = create_cohere_chat_llm()

llm_embeddings = CohereEmbeddings()

## 2. Utility function

* Prints the size information
* Pretty prints the documents

In [3]:
def print_documents(docs):
    for i, doc in enumerate(docs):
        print("#",i)
        print(doc.page_content)

def dump_results_info(result):
    print("Doc count = ", len(result))
    page_content_length=0
    for doc in result:
        page_content_length = page_content_length + len(doc.page_content)
    print("Context size = ", page_content_length)
    print_documents(result)

## 3. Setup VectorDB retrievers

* Create 2 vector stores with different chunks sizes (200 & 500)
* using ChromaDB as a retriever 

#### Chunk size = 200, search_type=similarity

In [4]:
# Create the Chroma vector store #1 
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store_1 = Chroma(collection_name="rag_documents", embedding_function=embedding_function) 

# Load sample docs
loader = DirectoryLoader('./util', glob="**/*.txt")
docs = loader.load()

# Chunking
doc_splitter_1 = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
chunked_documents_1 = doc_splitter_1.split_documents(docs)

# Add to vector DB
vector_store_1.add_documents(chunked_documents_1)

# Base retrievers
vector_store_retriever_1 = vector_store_1.as_retriever(search_type="similarity", search_kwargs={"k": 5})

#### Chunk size = 500, search_type=mmr

In [5]:
# Create the Chroma vector store #2 
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store_2 = Chroma(collection_name="rag_documents", embedding_function=embedding_function) 

# Chunking
doc_splitter_2 = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunked_documents_2 = doc_splitter_2.split_documents(docs)

# Add to vector DB
vector_store_2.add_documents(chunked_documents_2)

# Base retrievers
vector_store_retriever_2 = vector_store_2.as_retriever(search_type="mmr", search_kwargs={"k": 5})

## 4. Wikipedia Retriever

* Use Wikipedia to get the documents of interest

In [6]:
from langchain_community.retrievers import WikipediaRetriever

wikipedia_retriever = WikipediaRetriever(search_kwargs={"k": 5}) 

## 5. Combine retrievers using Merger Retriver

https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.merger_retriever.MergerRetriever.html


In [7]:

# The 2 vector retriever and 1 wiki retriever
merger_retriever = MergerRetriever(retrievers=[vector_store_retriever_1,vector_store_retriever_2, wikipedia_retriever])


## 6. Apply document compressor

https://api.python.langchain.com/en/latest/document_transformers/langchain_community.document_transformers.embeddings_redundant_filter.EmbeddingsClusteringFilter.html

In [12]:

# Create embedding clustering filter
filter_ordered_by_retriever = EmbeddingsClusteringFilter(
    embeddings=llm_embeddings,
    num_clusters=5,
    num_closest=1,
    sorted=True,
)

# Create document compressor pipeline
pipeline = DocumentCompressorPipeline(transformers=[filter_ordered_by_retriever])

# Create compression retriever
compression_retriever = ContextualCompressionRetriever(
    base_compressor=pipeline, base_retriever=merger_retriever
)

## 7. Test

##### Output from merger_retriever

In [13]:
question = "what is rag in generative ai?"
bef = merger_retriever.invoke(question)
dump_results_info(bef)

Doc count =  13
Context size =  9672
# 0
Retrieval augmented generation (RAG)

Retrieval augmented generation, or RAG, helps ensure model outputs are grounded on your data. Instead of relying on the model’s training knowledge, AI apps architected for RAG can search your data for information relevant to a query, then pass that information into the prompt. This is similar to prompt engineering, except that the system can find and retrieve new context from your data with each interaction.
# 1
Retrieval augmented generation (RAG)

Retrieval augmented generation, or RAG, helps ensure model outputs are grounded on your data. Instead of relying on the model’s training knowledge, AI apps architected for RAG can search your data for information relevant to a query, then pass that information into the prompt. This is similar to prompt engineering, except that the system can find and retrieve new context from your data with each interaction.
# 2
Prompt engineering is the process of structuring an

##### Output from pipeline

In [14]:
aft = compression_retriever.invoke(question)

dump_results_info(aft)

Doc count =  5
Context size =  4836
# 0
Prompt engineering is the process of structuring an instruction that can be interpreted and understood by a generative AI model. A prompt is natural language text describing the task that an AI should perform.
A prompt for a text-to-text language model can be a query such as "what is Fermat's little theorem?", a command such as "write a poem about leaves falling", or a longer statement including context, instructions, and conversation history. Prompt engineering may involve phrasing a query, specifying a style, providing relevant context or assigning a role to the AI such as "Act as a native French speaker". A prompt may include a few examples for a model to learn from, such as asking the model to complete "maison → house, chat → cat, chien →" (the expected response being dog), an approach called few-shot learning.
When communicating with a text-to-image or a text-to-audio model, a typical prompt is a description of a desired output such as "a hi

