# Maximal Marginal Relevance (MMR) Retriever

Purpose: Demonstrate using MMR to retrieve documents from a vector store that are both relevant and diverse. MMR balances relevance to the query against similarity to the results already selected, so the top results are not near-duplicates of one another.
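MMR builds the result list greedily, one document per step. Let $\lambda$ stand for the `lambda_mult` parameter used later in this notebook, $q$ for the query, $R$ for the candidate documents, and $S$ for the documents selected so far. The standard formulation picks, at each step:

$$
d^{*} = \arg\max_{d_i \in R \setminus S} \Big[ \lambda \, \mathrm{sim}(d_i, q) \;-\; (1 - \lambda) \max_{d_j \in S} \mathrm{sim}(d_i, d_j) \Big]
$$

When $S$ is empty, the redundancy term is taken as zero, so the first document selected is simply the most relevant one. The formula is shown as a reference sketch; the similarity measure and tie-breaking LangChain uses internally may differ in detail.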

When to use MMR:
- You want diverse results (e.g., supporting documents from different angles).  
- You want to avoid multiple near-duplicate passages in the top-k results.

## Prerequisites & install

- Install required packages (first cell).  
- For FAISS, `faiss-cpu` is required for CPU-only environments.

In [None]:
!pip install langchain chromadb openai tiktoken pypdf langchain_google_genai langchain-community wikipedia

In [74]:
from google.colab import userdata
gemini_api_key = userdata.get('GEMINI_API_KEY')

In [None]:
!pip install faiss-cpu

## Imports & data

The next cell imports the FAISS vector store, the Google Generative AI embeddings class, and the `Document` type. The cell after it defines the sample documents to index.

In [76]:
from langchain_community.vectorstores import FAISS
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_core.documents import Document

In [77]:
# Step 1: Your source documents
documents = [
    Document(page_content="Football is the most popular sport in the world, played by over 250 million players across more than 200 countries."),
    Document(page_content="Lionel Messi is known for his incredible dribbling, vision, and goal-scoring ability, earning multiple Ballon d'Or awards."),
    Document(page_content="The FIFA World Cup is held every four years and is the most prestigious international football tournament."),
    Document(page_content="Tactics in football involve formations, pressing strategies, and player roles that determine how a team controls the game."),
    Document(page_content="Cristiano Ronaldo is a legendary footballer celebrated for his athleticism, goal-scoring, and leadership on the field."),
    Document(page_content="Football clubs like FC Barcelona, Real Madrid, and Manchester United have millions of fans worldwide."),
    Document(page_content="The UEFA Champions League is an annual club competition that brings together the best European teams."),
    Document(page_content="Youth development programs and football academies play a key role in nurturing the next generation of football stars."),
]


## Create vector store

Build the vector index (FAISS) from the documents using embeddings. This index will be used with MMR-enabled retrieval.
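As we understand it, LangChain's `FAISS.from_documents` builds an exact ("flat") index that compares vectors with Euclidean (L2) distance by default. The pure-Python sketch below illustrates what exact search over such an index amounts to: compare the query with every stored vector and keep the `k` closest. The helper names (`l2_distance`, `exact_search`) and the 2-D toy vectors are hypothetical. Real Gemini embeddings have many more dimensions, and FAISS performs this search in optimized native code.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_search(index: list[list[float]], query: list[float], k: int) -> list[int]:
    """Return the positions of the k stored vectors closest to `query`.

    Every stored vector is compared against the query (no approximation),
    which is what a flat index does.
    """
    ranked = sorted(range(len(index)), key=lambda i: l2_distance(index[i], query))
    return ranked[:k]


# Hypothetical 2-D "embeddings" standing in for real document vectors.
toy_index = [
    [1.0, 0.0],   # doc 0
    [0.9, 0.1],   # doc 1, close to doc 0
    [0.0, 1.0],   # doc 2
    [-1.0, 0.0],  # doc 3
]
print(exact_search(toy_index, [1.0, 0.05], k=2))  # → [0, 1]
```

Plain similarity search returns whatever is closest, even if the closest vectors are nearly identical to each other. MMR is designed to address exactly that.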

In [78]:
# Step 2: Initialize embedding model
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/gemini-embedding-001",
    google_api_key=gemini_api_key
)


In [79]:
# Step 3: Create a FAISS vector store in memory
vectorstore = FAISS.from_documents(
    documents=documents,
    embedding=embeddings
)

## Enable MMR on the retriever

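Set `search_type="mmr"` and pass `search_kwargs` (`k` and `lambda_mult`) to tune the tradeoff between relevance and diversity. `k` is the number of documents returned. `lambda_mult` ranges from 0 to 1: values closer to 1 favour relevance, and values closer to 0 favour diversity. MMR re-ranks a larger candidate pool of `fetch_k` documents (20 by default in LangChain) before returning the final `k`. With only eight documents here, every document is a candidate.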
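To make `k`, `fetch_k`, and `lambda_mult` concrete, here is a minimal pure-Python sketch of greedy MMR re-ranking with cosine similarity. It is our own illustration, assumed to roughly mirror what LangChain does internally, and is not its actual implementation. The function names (`cosine`, `mmr`) and the 2-D toy vectors are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query: list[float], docs: list[list[float]], k: int = 3,
        fetch_k: int = 20, lambda_mult: float = 0.5) -> list[int]:
    """Greedy MMR selection; returns indices into `docs`."""
    # Stage 1: keep only the fetch_k documents most similar to the query.
    candidates = sorted(
        range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True
    )[:fetch_k]
    selected: list[int] = []

    def score(i: int) -> float:
        relevance = cosine(query, docs[i])
        # Redundancy: similarity to the closest already-selected doc (0 if none yet).
        redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    # Stage 2: greedily add the candidate with the best MMR score.
    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


toy_docs = [
    [1.0, 0.1],   # doc 0: most relevant
    [1.0, 0.15],  # doc 1: near-duplicate of doc 0
    [0.8, 0.6],   # doc 2
    [0.6, -0.8],  # doc 3: points in a different direction
]
toy_query = [1.0, 0.0]
print(mmr(toy_query, toy_docs, k=2, lambda_mult=1.0))  # pure relevance → [0, 1]
print(mmr(toy_query, toy_docs, k=2, lambda_mult=0.5))  # relevance + diversity → [0, 3]
```

With `lambda_mult=1.0` the near-duplicate doc 1 comes second. At `0.5` it is passed over in favour of doc 3. Passing a small `fetch_k` (for example 2) restricts MMR to the two nearest candidates and brings the duplicate back, which is why `fetch_k` should be comfortably larger than `k`.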

In [80]:
# Enable MMR in the retriever
retriever = vectorstore.as_retriever(
    search_type="mmr",                   # <-- This enables MMR
    search_kwargs={"k": 3, "lambda_mult": 0.5}  # k = number of results returned; lambda_mult in [0, 1] (1 = pure relevance, 0 = maximum diversity)
)

## Example query & results

Run a query and compare MMR results (`retriever.invoke`) with standard similarity search (`vectorstore.similarity_search`). Observe diversity differences.

In [81]:
query = "Who are some of the most famous football players and what are their achievements?"

In [82]:
results = retriever.invoke(query)

In [83]:
for i, doc in enumerate(results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)


--- Result 1 ---
Youth development programs and football academies play a key role in nurturing the next generation of football stars.

--- Result 2 ---
Cristiano Ronaldo is a legendary footballer celebrated for his athleticism, goal-scoring, and leadership on the field.

--- Result 3 ---
Football clubs like FC Barcelona, Real Madrid, and Manchester United have millions of fans worldwide.


## Compare with similarity search

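Use `vectorstore.similarity_search` to retrieve the top-k results without MMR, and compare them with the MMR results above for redundancy versus diversity. Note that the cell below uses `k=2`, while the MMR retriever uses `k=3`. Pass `k=3` to compare result lists of equal size.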
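To go beyond eyeballing the printed outputs, one option is to score how redundant a result list is. The sketch below is our own assumption and is not part of LangChain. It averages the pairwise cosine similarity of the result vectors, so higher values mean more redundant results. With real results you would first embed the returned texts, for example with `embeddings.embed_documents([doc.page_content for doc in results])`. The vectors below are hypothetical stand-ins.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_pairwise_similarity(vectors: list[list[float]]) -> float:
    """Average cosine similarity over all pairs of result vectors.

    Higher values mean the results are more alike (more redundant).
    Returns 0.0 when there are fewer than two vectors.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


similar_results = [[1.0, 0.1], [1.0, 0.15]]  # hypothetical near-duplicates
diverse_results = [[1.0, 0.1], [0.6, -0.8]]  # hypothetical diverse pair
print(round(mean_pairwise_similarity(similar_results), 3))
print(round(mean_pairwise_similarity(diverse_results), 3))
```

Lower scores for the MMR results than for the similarity-search results would indicate that MMR is adding diversity for this query.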

In [84]:
results = vectorstore.similarity_search(query, k=2)

In [85]:
for i, doc in enumerate(results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)


--- Result 1 ---
Youth development programs and football academies play a key role in nurturing the next generation of football stars.

--- Result 2 ---
Football is the most popular sport in the world, played by over 250 million players across more than 200 countries.
