# **Maximal Marginal Relevance**
MMR (Maximal Marginal Relevance) is a diversity-aware retrieval technique used in information retrieval and RAG pipelines to balance relevance and novelty when selecting documents.

In [None]:
# Vector store, embeddings, loader, and splitter (current package paths)
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Chat model, prompt, and chain helpers
from langchain.chat_models import init_chat_model
from langchain_core.prompts import PromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

In [None]:
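import os
from dotenv import load_dotenv

load_dotenv()  # reads API keys from a local .env file into the environment
# Re-export each key only if it is set; assigning None to os.environ raises TypeError.
for key in ("OPENAI_API_KEY", "GROQ_API_KEY"):
    value = os.getenv(key)
    if value:
        os.environ[key] = value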

In [9]:
# Step 1: Load and chunk the document
loader = TextLoader("langchain_rag_dataset.txt")
raw_docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = splitter.split_documents(raw_docs)
chunks

[Document(metadata={'source': 'langchain_rag_dataset.txt'}, page_content='LangChain is an open-source framework designed to simplify the development of applications using large language models (LLMs).\nLangChain provides abstractions for working with prompts, chains, memory, and agents, making it easier to build complex LLM-based systems.'),
 Document(metadata={'source': 'langchain_rag_dataset.txt'}, page_content='The framework supports integration with various vector databases like FAISS and Chroma for semantic retrieval.\nLangChain enables Retrieval-Augmented Generation (RAG) by allowing developers to fetch relevant context before generating responses.'),
 Document(metadata={'source': 'langchain_rag_dataset.txt'}, page_content='Memory in LangChain helps models retain previous interactions, making multi-turn conversations more coherent.\nAgents in LangChain can use tools like calculators, search APIs, or custom functions based on the instructions they receive.'),
 Document(metadata={'

In [None]:
# Step 2: FAISS Vector Store with HuggingFace Embeddings
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embedding_model)

In [11]:
# Step 3: Create MMR retriever
# Only k is set here; fetch_k and lambda_mult fall back to LangChain's defaults (20 and 0.5).
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3}
)

In [None]:
# Step 4: Prompt and LLM
prompt = PromptTemplate.from_template("""
Answer the question based on the context provided.

Context:
{context}

Question: {input}
""")
llm=init_chat_model("groq:gemma2-9b-it")

In [12]:
# Step 5: RAG Pipeline
document_chain = create_stuff_documents_chain(llm=llm, prompt=prompt)
rag_chain = create_retrieval_chain(retriever=retriever, combine_docs_chain=document_chain)

In [13]:
# Step 6: Query
query = {"input": "How does LangChain support agents and memory?"}
response = rag_chain.invoke(query)

print("✅ Answer:\n", response["answer"])

✅ Answer:
 LangChain supports agents by allowing LLMs to act as decision-makers that choose and use tools like calculators, search APIs, or custom functions based on given instructions. 

It supports memory in two ways:

* **Conversational memory:** Using `ConversationBufferMemory`, agents can retain previous interactions within a conversation, enabling more coherent multi-turn dialogues.
* **Summarization memory:**  `ConversationSummaryMemory` allows agents to remember key points from past interactions, providing a condensed view of the conversation history. 


LangChain also enables agents to interact with external APIs and databases, further expanding their capabilities. 




In [14]:
response

{'input': 'How does LangChain support agents and memory?',
 'context': [Document(id='27a1531f-1d37-4619-ad03-70b21cdf56b2', metadata={'source': 'langchain_rag_dataset.txt'}, page_content='Memory in LangChain helps models retain previous interactions, making multi-turn conversations more coherent.\nAgents in LangChain can use tools like calculators, search APIs, or custom functions based on the instructions they receive.'),
  Document(id='fcd7afd5-b72a-4822-8ebe-c75d4e9250e0', metadata={'source': 'langchain_rag_dataset.txt'}, page_content='LangChain agents can interact with external APIs and databases, enhancing the capabilities of LLM-powered applications.\nRAG pipelines in LangChain involve document loading, splitting, embedding, retrieval, and LLM-based response generation.'),
  Document(id='e755e7c1-f863-4f8c-b97b-7b6d7281e29f', metadata={'source': 'langchain_rag_dataset.txt'}, page_content='LangChain allows LLMs to act as agents that decide which tool to call and in what order duri

## **Maximal Marginal Relevance (MMR)**


**Maximal Marginal Relevance (MMR)** is a greedy selection algorithm used in **information retrieval** and **RAG (Retrieval-Augmented Generation)** systems to return results that are both **relevant and diverse**.
It favors documents that are **highly relevant to the query** while being **minimally redundant** with the documents already selected.

In essence, MMR trades off **query relevance** against **result diversity**, which helps avoid repetition and yields a **more informative subset of documents** for downstream tasks like question answering or summarization.

---

**1. The Intuition Behind MMR**

Imagine you retrieve 100 similar documents for the query:

> “What are the effects of climate change on agriculture?”

Many of them might contain overlapping information (e.g., about crop yields). MMR favors **a variety of perspectives**, for example:

* Crop yield impacts,
* Soil degradation,
* Pest proliferation,
* Water scarcity.

By doing so, MMR reduces redundancy and gives **broader coverage** of the topic.

---

**2. The MMR Formula**

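At each step, MMR selects the next document from the candidates not yet in *S* by maximizing a score that rewards relevance to the query and penalizes similarity to the documents already selected (when *S* is empty, the penalty term is taken as 0):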

$$\text{MMR} = \arg\max_{D_i \in R \setminus S} \left[ \lambda \cdot \text{Sim}(D_i, Q) - (1 - \lambda) \cdot \max_{D_j \in S} \text{Sim}(D_i, D_j) \right]$$

Where:

| Symbol          | Meaning                                                                                               |
| --------------- | ----------------------------------------------------------------------------------------------------- |
| **Q**           | The query or question                                                                                 |
| **R**           | The set of all retrieved documents                                                                    |
| **S**           | The set of already selected documents                                                                 |
| **Sim(Dᵢ, Q)**  | Similarity between document *Dᵢ* and query *Q* (relevance score)                                      |
| **Sim(Dᵢ, Dⱼ)** | Similarity between document *Dᵢ* and another document *Dⱼ* (redundancy)                               |
| **λ (lambda)**  | Trade-off parameter (0 ≤ λ ≤ 1):<br>• High λ → prioritize relevance<br>• Low λ → prioritize diversity |
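
As a quick worked check of the formula, assume λ = 0.7 and two hypothetical candidates (the similarity values are invented for illustration): *A* is a near-duplicate of an already selected document, with Sim(A, Q) = 0.90 and a maximum similarity of 0.95 to the selected set; *B* is slightly less relevant, with Sim(B, Q) = 0.80, but its maximum similarity to the selected set is only 0.20.

$$\begin{aligned}
\text{score}(A) &= 0.7 \cdot 0.90 - 0.3 \cdot 0.95 = 0.63 - 0.285 = 0.345 \\
\text{score}(B) &= 0.7 \cdot 0.80 - 0.3 \cdot 0.20 = 0.56 - 0.06 = 0.50
\end{aligned}$$

Under these assumed numbers *B* is selected next, even though *A* is more relevant to the query on its own.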

---

**3. How MMR Works Step-by-Step**

1. **Compute relevance scores** between all documents and the query.
2. **Select the first document** with the highest relevance score.
3. **Iteratively select the next document** by maximizing the MMR expression, i.e., choosing the candidate that is still relevant but least similar to the documents already selected.
4. Repeat until *K* documents have been selected.
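
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than LangChain's actual implementation: it assumes cosine similarity for both relevance and redundancy, and the names `mmr_select` and `cosine_sim_matrix` as well as the toy 2-D vectors are hypothetical.

```python
import numpy as np


def cosine_sim_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of `a` and every row of `b`."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return the indices of up to `k` documents chosen greedily by MMR."""
    doc_vecs = np.asarray(doc_vecs, dtype=float)
    query_vec = np.asarray(query_vec, dtype=float).reshape(1, -1)

    relevance = cosine_sim_matrix(doc_vecs, query_vec)[:, 0]  # Sim(D_i, Q)
    doc_sim = cosine_sim_matrix(doc_vecs, doc_vecs)           # Sim(D_i, D_j)

    selected = []
    remaining = list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -np.inf
        for i in remaining:
            # Redundancy is 0 while nothing is selected, so the first pick
            # is simply the most relevant document.
            redundancy = max((doc_sim[i, j] for j in selected), default=0.0)
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Toy embeddings (assumed): docs 0 and 1 are near-duplicates close to the query,
# doc 2 is relevant but points in a different direction, doc 3 is unrelated.
query = [1.0, 0.0]
docs = [[1.0, 0.2], [1.0, 0.25], [1.0, -0.6], [0.0, 1.0]]

print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # pure relevance: [0, 1]
print(mmr_select(query, docs, k=2, lambda_mult=0.5))  # balanced: [0, 2]
```

With `lambda_mult=1.0` the near-duplicate is returned second; with `lambda_mult=0.5` it is displaced by the less similar document.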

---

**4. Benefits of Using MMR**

* ✅ **Reduces Redundancy:** Prevents repeating nearly identical documents.
* ✅ **Improves Context Variety:** Useful for summarization or RAG contexts where multiple viewpoints matter.
* ✅ **Balances Relevance and Novelty:** Adjustable via λ.
* ✅ **Lightweight and Model-Agnostic:** Works with any embedding model or retriever.

---

**5. Example: Implementing MMR with LangChain**

Here’s how MMR is used in a LangChain retriever:

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings  # requires the langchain-openai package

# Create embeddings and vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(
    ["Doc 1 text...", "Doc 2 text...", "Doc 3 text..."],
    embedding=embeddings
)

# Create an MMR retriever
retriever = vectorstore.as_retriever(
    search_type="mmr",  # Use Maximal Marginal Relevance
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.7}
)

# Query the retriever (invoke() replaces the deprecated get_relevant_documents())
results = retriever.invoke("climate change and agriculture")
for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content}")
```

**Parameters Explained:**

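* `fetch_k`: Number of candidates fetched by plain similarity search before MMR is applied (LangChain's default is 20).
* `k`: Number of documents returned after MMR selection; it should not exceed `fetch_k`.
* `lambda_mult`: The λ of the MMR formula, between 0 and 1. A value of 1 gives minimum diversity (pure relevance) and 0 gives maximum diversity (LangChain's default is 0.5).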

---

**6. MMR in RAG Pipelines**

In **Retrieval-Augmented Generation (RAG)**, MMR helps select documents that:

* Are **semantically relevant** to the user’s query.
* Provide **non-redundant context** to the language model.

This can improve the **completeness and factual quality** of generated responses by covering multiple subtopics or perspectives.

Example LangChain RAG setup:

```python
from langchain.chains import RetrievalQA  # legacy chain; newer code uses create_retrieval_chain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

response = qa_chain.invoke({"query": "Impacts of climate change on agriculture"})
print(response["result"])
```

---

**7. Tuning the λ (Lambda) Parameter**

| λ Value       | Effect                    | Use Case                         |
| ------------- | ------------------------- | -------------------------------- |
| **0.9 – 1.0** | Almost entirely relevance | FAQ search, exact query matching |
| **0.6 – 0.8** | Balanced trade-off        | General RAG or QA systems        |
| **0.3 – 0.5** | Emphasize diversity       | Summarization, topic exploration |
| **0.0 – 0.2** | Almost entirely novelty   | Brainstorming, idea generation   |
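
To get a feel for how λ changes the outcome, the sketch below runs a small MMR loop on precomputed similarity scores for several λ values. Everything here is an assumption made for illustration: the function name `mmr_order`, the four toy documents, and their relevance and pairwise similarity numbers (docs 0 and 1 are near-duplicates, doc 3 is the least relevant but also the least similar to the rest).

```python
def mmr_order(relevance, doc_sim, k, lam):
    """Greedy MMR over precomputed scores; returns selected document indices."""
    selected = []
    remaining = list(range(len(relevance)))
    while remaining and len(selected) < k:
        def score(i):
            # Penalty is the highest similarity to anything already selected.
            redundancy = max((doc_sim[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Assumed toy scores: Sim(D_i, Q) and a symmetric Sim(D_i, D_j) matrix.
relevance = [0.95, 0.93, 0.80, 0.60]
doc_sim = [
    [1.00, 0.98, 0.40, 0.10],
    [0.98, 1.00, 0.42, 0.12],
    [0.40, 0.42, 1.00, 0.30],
    [0.10, 0.12, 0.30, 1.00],
]

for lam in (1.0, 0.7, 0.3):
    print(lam, mmr_order(relevance, doc_sim, k=3, lam=lam))
# 1.0 [0, 1, 2]  -> ranked by relevance only; the near-duplicate comes second
# 0.7 [0, 2, 1]  -> the near-duplicate is pushed down to third place
# 0.3 [0, 3, 2]  -> the near-duplicate is dropped in favor of dissimilar docs
```

On real data the right value depends on how redundant the corpus is, so it is usually worth inspecting the retrieved chunks for a few λ values before fixing one.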

---

**8. Comparison: MMR vs. Reranking**

| Feature        | MMR                             | Reranking                              |
| -------------- | ------------------------------- | -------------------------------------- |
| **Purpose**    | Promote diversity among results | Improve accuracy of relevance          |
| **Algorithm**  | Balances relevance & redundancy | Re-scores candidates using deep models |
| **Efficiency** | Lightweight and fast            | Computationally expensive              |
| **Use Case**   | RAG, summarization              | Precision-focused retrieval            |

The two can also be **combined**: first use MMR to pick diverse candidates, then apply **reranking** (e.g., a cross-encoder) to refine the final order.
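
A minimal sketch of the second stage is shown below. It assumes the passages have already been selected by an MMR retriever and only reorders them. The `overlap_score` function is a hypothetical stand-in for a real cross-encoder (it just measures query-term overlap), and `rerank` and the sample passages are likewise invented for illustration.

```python
from typing import Callable, List


def rerank(query: str, docs: List[str], score_fn: Callable[[str, str], float]) -> List[str]:
    """Return `docs` sorted by descending score_fn(query, doc)."""
    return sorted(docs, key=lambda d: score_fn(query, d), reverse=True)


def overlap_score(query: str, doc: str) -> float:
    """Stand-in scorer (not a cross-encoder): fraction of query terms present in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


# Pretend these passages came out of an MMR retriever (diverse, arbitrary order).
mmr_docs = [
    "Drought reduces water available for irrigation",
    "Climate change lowers crop yields in agriculture",
    "Warmer winters let pests spread to new regions",
]
query = "climate change agriculture crop yields"

reranked = rerank(query, mmr_docs, overlap_score)
print(reranked[0])  # the passage sharing the most terms with the query
```

In a real pipeline, `score_fn` would wrap a cross-encoder or another relevance model; the selection of diverse candidates and the final ordering stay separate steps either way.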

---

**9. Real-World Applications**

* **Conversational AI:** Prevents the model from repeating similar context passages.
* **Document Summarization:** Ensures multiple aspects of a topic are represented.
* **Recommendation Systems:** Promotes content diversity for better user engagement.
* **Legal and Research Domains:** Retrieves unique precedents or study cases.

---

**10. Key Takeaways**

* MMR score = **λ · Relevance − (1 − λ) · Redundancy**
* It’s ideal for systems that need **variety + relevance** (like RAG).
* It enhances both **user experience** and **model performance** by presenting the most **useful, non-repetitive** information.
* Can be tuned easily and combined with reranking or hybrid retrieval pipelines for **state-of-the-art retrieval quality**.

---

**Summary Formula:**

$$\boxed{\text{MMR} = \lambda \cdot \text{Relevance} - (1-\lambda) \cdot \text{Redundancy}}$$


