### 📖 Where We Are

**In the previous notebooks**, we explored powerful retrieval strategies like **Hybrid Search** (combining semantic and keyword search) and **Reranking** (using an LLM to refine results). These techniques help improve the *relevance* and *precision* of our retrieved documents.

**In this notebook**, we'll tackle another common challenge in retrieval: **redundancy**. We will learn how to use **Maximal Marginal Relevance (MMR)**, a retrieval method that trades a little relevance for more **diversity** among the retrieved documents, giving the LLM a broader context to answer from.

### 1. Maximal Marginal Relevance (MMR)

**MMR (Maximal Marginal Relevance)** is a diversity-aware retrieval technique. It addresses the case where a standard similarity search returns several chunks that are near-duplicates of one another, so each additional chunk adds little new information.

MMR works by selecting documents that are both:
1.  **Relevant** to the user's query.
2.  **Diverse** (dissimilar) from the documents that have already been selected.

As a result, the final context passed to the LLM is more likely to cover different aspects of the topic, rather than repeating the same point.
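
The two criteria above are often combined into a single greedy selection rule. The form below follows the usual textbook statement of MMR; the symbols are notation introduced here for illustration: $q$ is the query, $C$ the candidate set, $S$ the set of documents already selected, $\mathrm{sim}$ a similarity function (for example cosine similarity), and $\lambda \in [0, 1]$ the relevance/diversity trade-off.

$$
d^{*} = \arg\max_{d_i \in C \setminus S} \Big[ \lambda \, \mathrm{sim}(d_i, q) \;-\; (1 - \lambda) \max_{d_j \in S} \mathrm{sim}(d_i, d_j) \Big]
$$

At each step the document $d^{*}$ with the highest score is moved from $C$ into $S$. With $\lambda = 1$ the penalty term vanishes and the rule reduces to plain similarity ranking; with $\lambda = 0$ only dissimilarity to the already selected documents counts.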

### 2. Setting Up the Data and Vector Store

In [None]:
# --- LangChain Imports ---
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.document_loaders import TextLoader
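# Splitters and prompts are imported from their dedicated packages (langchain-text-splitters, langchain-core).
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chat_models import init_chat_model
from langchain_core.prompts import PromptTemplate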
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

In [None]:
# --- Environment Setup ---
import os
from dotenv import load_dotenv
load_dotenv()
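# load_dotenv() has already copied values from .env into os.environ,
# so we only check that the key exists and fail early with a clear message.
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file.")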

In [None]:
# --- 1. Load and Chunk Documents ---
# This dataset contains several related but distinct points about LangChain.
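# Assumes the file is UTF-8 encoded; setting it explicitly avoids platform-dependent defaults.
loader = TextLoader("langchain_rag_dataset.txt", encoding="utf-8")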
raw_docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = splitter.split_documents(raw_docs)

In [None]:
# --- 2. Create Vector Store ---
# We'll use a FAISS vector store for its speed in this example.
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embedding_model)

### 3. Creating a Standard vs. MMR Retriever
To see the effect of MMR, we will create two retrievers: a standard similarity retriever and an MMR retriever. This will allow us to directly compare their outputs.

In [None]:
# --- Standard Similarity Retriever ---
standard_retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# --- MMR Retriever ---
mmr_retriever = vectorstore.as_retriever(
    # To enable MMR, we set the search_type to "mmr".
    search_type="mmr",
    search_kwargs={
        "k": 4,              # The final number of documents to return.
        "fetch_k": 10,       # Size of the candidate pool fetched by similarity; should be >= k.
        "lambda_mult": 0.5,  # Relevance/diversity trade-off (0 for max diversity, 1 for max relevance).
    }
)
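
To build intuition for how `k`, `fetch_k`, and `lambda_mult` interact, here is a minimal pure-Python sketch of an MMR-style greedy selection. It is not LangChain's implementation: `cosine`, `mmr_select`, and the toy 2-D vectors are hypothetical names and data made up for illustration, and cosine similarity is assumed as the similarity measure.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_select(query_vec, doc_vecs, k=4, fetch_k=10, lambda_mult=0.5):
    """Return indices of up to k documents picked greedily by an MMR-style score."""
    # Step 1: keep only the fetch_k candidates most similar to the query.
    by_relevance = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_relevance[:fetch_k]

    # Step 2: repeatedly pick the candidate with the best relevance/redundancy trade-off.
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy = similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data: documents 0 and 1 are near-duplicates; 2 and 3 point in other directions.
query_vec = [1.0, 0.0]
doc_vecs = [[1.0, 0.1], [1.0, 0.12], [0.8, 0.6], [0.6, -0.8]]

print(mmr_select(query_vec, doc_vecs, k=2, fetch_k=4, lambda_mult=1.0))  # [0, 1]
print(mmr_select(query_vec, doc_vecs, k=2, fetch_k=4, lambda_mult=0.5))  # [0, 3]
print(mmr_select(query_vec, doc_vecs, k=2, fetch_k=2, lambda_mult=0.5))  # [0, 1]
```

With `lambda_mult=1.0` the two near-duplicates are returned. With `lambda_mult=0.5` the second pick is penalized for resembling the first, so a different document wins. The last call shows why `fetch_k` matters: if the candidate pool contains only the near-duplicates, MMR has nothing diverse to choose from.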

### 4. Comparing the Retrieval Results

In [None]:
# Define our query.
query = "How does LangChain support agents and memory?"

# Retrieve with standard similarity search.
standard_results = standard_retriever.invoke(query)

# Retrieve with MMR search.
mmr_results = mmr_retriever.invoke(query)

print("--- STANDARD SIMILARITY RESULTS ---")
for i, doc in enumerate(standard_results):
    print(f"{i+1}. {doc.page_content}\n")

print("\n--- MMR RESULTS ---")
for i, doc in enumerate(mmr_results):
    print(f"{i+1}. {doc.page_content}\n")
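
Reading the two lists side by side is somewhat subjective. One rough way to put a number on redundancy is the average word overlap between every pair of retrieved chunks. The sketch below is an illustration only: `jaccard` and `avg_pairwise_overlap` are hypothetical helpers, and the toy strings stand in for `doc.page_content` values.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard overlap (0 = no shared words, 1 = identical word sets)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def avg_pairwise_overlap(texts) -> float:
    """Mean Jaccard overlap over all pairs of texts; higher means more redundancy."""
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Toy stand-ins for retrieved chunk texts.
redundant = [
    "agents choose tools to call",
    "agents choose which tools to call",
    "memory stores chat history",
]
diverse = [
    "agents choose tools to call",
    "memory stores chat history",
    "retrievers fetch relevant documents",
]

print("redundant set:", avg_pairwise_overlap(redundant))
print("diverse set:  ", avg_pairwise_overlap(diverse))
```

In the notebook, the same helper could be applied to `[doc.page_content for doc in standard_results]` and `[doc.page_content for doc in mmr_results]`. Keep in mind that the splitter's `chunk_overlap=50` makes adjacent chunks share some text by construction, so a nonzero overlap is expected even for distinct chunks.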

### 5. Building the RAG Pipeline with the MMR Retriever
Now, we can simply plug our `mmr_retriever` into our standard RAG chain to leverage its diverse results for answer generation.

In [None]:
# --- Prompt and LLM Setup ---
prompt = PromptTemplate.from_template("""
Answer the question based on the context provided.

Context:
{context}

Question: {input}
""")
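# The "groq:" prefix selects the Groq provider (requires the langchain-groq package and GROQ_API_KEY).
llm = init_chat_model("groq:gemma2-9b-it")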

In [None]:
# --- RAG Pipeline with MMR Retriever ---
document_chain = create_stuff_documents_chain(llm=llm, prompt=prompt)
# We use our `mmr_retriever` here.
rag_chain = create_retrieval_chain(retriever=mmr_retriever, combine_docs_chain=document_chain)
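
For intuition about what the "stuff" step does with the retrieved documents, here is a minimal plain-Python sketch. It is not LangChain's implementation; it assumes the chunk texts are simply joined with blank lines and substituted into `{context}`. `build_stuffed_prompt` and the sample chunk texts are hypothetical.

```python
# Same wording as the notebook's prompt, written as a plain format string.
TEMPLATE = """
Answer the question based on the context provided.

Context:
{context}

Question: {input}
"""

def build_stuffed_prompt(question, chunk_texts, separator="\n\n"):
    """Join all chunk texts into one context string and fill the template."""
    context = separator.join(chunk_texts)
    return TEMPLATE.format(context=context, input=question)

# Hypothetical chunk texts standing in for doc.page_content values.
chunk_texts = [
    "Agents let an LLM decide which tools to call.",
    "Memory components keep track of earlier conversation turns.",
]

final_prompt = build_stuffed_prompt(
    "How does LangChain support agents and memory?", chunk_texts
)
print(final_prompt)
```

Because every retrieved chunk is pasted into the prompt as-is, near-duplicate chunks spend context tokens without adding information, which is the cost MMR tries to reduce.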

In [None]:
# --- Query the RAG Chain ---
response = rag_chain.invoke({"input": query})

print("✅ Final Answer:\n", response["answer"])
print("\n📚 Context Used:")
for doc in response['context']:
    print(f"- {doc.page_content}\n")

### 🔑 Key Takeaways

* **MMR Fights Redundancy**: The primary goal of Maximal Marginal Relevance (MMR) is to reduce redundancy in search results and increase the diversity of the retrieved documents.
* **Balances Relevance and Diversity**: MMR works by iteratively selecting documents that are both relevant to the query and different from the documents already selected, providing a more comprehensive context.
* **Easy to Implement in LangChain**: For vector stores that support it (FAISS does), you can enable MMR by setting `search_type="mmr"` in `as_retriever`.
* **Key Parameters to Tune**: You can control MMR's behavior with `k` (the final number of documents), `fetch_k` (the number of candidates fetched for MMR to choose from, which should be at least `k`), and `lambda_mult` (the trade-off: 0 for maximum diversity, 1 for maximum relevance).
* **Improves Answer Quality**: By providing the LLM with a broader, less repetitive set of facts, MMR can lead to more complete and insightful answers in a RAG system.