# Multi-Query Retriever (LangChain)

Purpose: show how to use `MultiQueryRetriever` together with a vector store to retrieve more diverse results by having an LLM rewrite a query into several variants and merging what each variant retrieves.

This notebook demonstrates:
- Preparing documents with metadata
- Creating embeddings and a FAISS index
- Building similarity and multi-query retrievers
- Comparing results

Run cells top-to-bottom.

In [None]:
!pip install langchain chromadb openai tiktoken pypdf langchain_google_genai langchain-community wikipedia

## Prerequisites

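- Install packages (first cell).
- Provide `GEMINI_API_KEY` as a Colab secret (read with `userdata.get` in the next cell), or replace it with your preferred embeddings/LLM provider.
- Install `faiss-cpu` (separate cell below) for CPU-only environments; the first cell does not include it.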

In [100]:
from google.colab import userdata
gemini_api_key = userdata.get('GEMINI_API_KEY')

In [None]:
!pip install faiss-cpu

## Imports & libraries

This cell imports the FAISS vector store, the Gemini embeddings class, `Document`, `MultiQueryRetriever`, and the chat model used to generate query variants for the multi-query retriever.

In [102]:
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain.retrievers.multi_query import MultiQueryRetriever

## Step 1 — Documents with metadata

Store metadata-rich documents (topic, source, year) so you can filter or rank by metadata in downstream flows.
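
As a rough illustration of what filtering or ranking by metadata can look like, the sketch below uses plain dictionaries as stand-ins for `Document` objects so that it runs without an API key. The helper name `filter_by_metadata` is hypothetical and not part of LangChain.

```python
# Plain-dict stand-ins for Document objects (illustration only).
docs = [
    {"page_content": "Football is the most popular sport in the world.",
     "metadata": {"topic": "General", "year": 2023}},
    {"page_content": "Lionel Messi is known for his incredible dribbling.",
     "metadata": {"topic": "Player", "player": "Lionel Messi", "year": 2022}},
    {"page_content": "Tactics in football involve formations and pressing.",
     "metadata": {"topic": "Tactics", "year": 2021}},
    {"page_content": "Cristiano Ronaldo is a legendary footballer.",
     "metadata": {"topic": "Player", "player": "Cristiano Ronaldo", "year": 2022}},
]


def filter_by_metadata(docs, **conditions):
    """Keep docs whose metadata matches every key=value condition."""
    return [
        d for d in docs
        if all(d["metadata"].get(key) == value for key, value in conditions.items())
    ]


# Filter: only documents about players.
players = filter_by_metadata(docs, topic="Player")

# Rank: newest documents first (sorted is stable, so ties keep input order).
newest_first = sorted(docs, key=lambda d: d["metadata"]["year"], reverse=True)

print([d["metadata"]["player"] for d in players])
print([d["metadata"]["year"] for d in newest_first])
```

With the real vector store, LangChain's FAISS wrapper is understood to accept a metadata `filter` entry in `search_kwargs`; check the documentation for your installed version before relying on it.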

In [103]:
# Step 1: Your source documents with metadata
documents = [
    Document(
        page_content="Football is the most popular sport in the world, played by over 250 million players across more than 200 countries.",
        metadata={"topic": "General", "source": "FIFA Stats", "year": 2023}
    ),
    Document(
        page_content="Lionel Messi is known for his incredible dribbling, vision, and goal-scoring ability, earning multiple Ballon d'Or awards.",
        metadata={"topic": "Player", "player": "Lionel Messi", "source": "Sports Illustrated", "year": 2022}
    ),
    Document(
        page_content="The FIFA World Cup is held every four years and is the most prestigious international football tournament.",
        metadata={"topic": "Tournament", "name": "FIFA World Cup", "source": "FIFA", "year": 2022}
    ),
    Document(
        page_content="Tactics in football involve formations, pressing strategies, and player roles that determine how a team controls the game.",
        metadata={"topic": "Tactics", "source": "Coaching Manual", "year": 2021}
    ),
    Document(
        page_content="Cristiano Ronaldo is a legendary footballer celebrated for his athleticism, goal-scoring, and leadership on the field.",
        metadata={"topic": "Player", "player": "Cristiano Ronaldo", "source": "ESPN", "year": 2022}
    ),
    Document(
        page_content="Football clubs like FC Barcelona, Real Madrid, and Manchester United have millions of fans worldwide.",
        metadata={"topic": "Club", "source": "BBC Sport", "year": 2023}
    ),
    Document(
        page_content="The UEFA Champions League is an annual club competition that brings together the best European teams.",
        metadata={"topic": "Tournament", "name": "UEFA Champions League", "source": "UEFA", "year": 2023}
    ),
    Document(
        page_content="Youth development programs and football academies play a key role in nurturing the next generation of football stars.",
        metadata={"topic": "Development", "source": "The Guardian", "year": 2021}
    ),
]


## Step 2 — Embeddings & index

Create embeddings and build a FAISS index from the documents. Replace with your provider if required.
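
To make the idea of an index concrete, here is a toy nearest-neighbour search over hand-made vectors. It is a sketch only: the three-dimensional vectors are invented stand-ins for real embeddings, Euclidean distance is assumed as the metric, and `top_k` is a hypothetical helper rather than a FAISS or LangChain function.

```python
import math

# Hand-made 3-dimensional vectors standing in for real embeddings.
# (Assumption: real embedding vectors are far larger and come from the model.)
toy_index = {
    "messi": [0.9, 0.1, 0.0],
    "ronaldo": [0.8, 0.2, 0.1],
    "world_cup": [0.1, 0.9, 0.2],
    "tactics": [0.0, 0.3, 0.9],
}


def top_k(query_vec, index, k=2):
    """Return the k keys whose vectors are closest to query_vec (Euclidean)."""
    ranked = sorted(index, key=lambda name: math.dist(query_vec, index[name]))
    return ranked[:k]


# A query vector pointing in the "player" direction of this toy space.
print(top_k([1.0, 0.0, 0.0], toy_index))
```

A real index does the same kind of ranking, but over vectors produced by the embedding model and with data structures built for speed.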

In [104]:
# Step 2: Initialize embedding model
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/gemini-embedding-001",
    google_api_key=gemini_api_key
)

In [105]:
# Step 2 (continued): Build an in-memory FAISS vector store
vectorstore = FAISS.from_documents(
    documents=documents,
    embedding=embeddings
)

## Step 3 — Build similarity retriever

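Create a standard similarity retriever as a baseline to compare with the multi-query retriever. The second cell in this step instantiates the chat model that Step 4 uses to generate query variants.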

In [106]:
# Baseline: plain similarity retriever returning the top 5 matches
similarity_retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})

In [107]:
# LLM that MultiQueryRetriever (Step 4) uses to generate query variants
model = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    google_api_key=gemini_api_key
)

## Step 4 — Create MultiQueryRetriever

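`MultiQueryRetriever` uses an LLM to generate several rephrasings of the input query, runs each one against the base retriever, and returns the de-duplicated union of the results for broader coverage. The quality of the variants depends on the LLM, so use a capable one.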
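
The merge step can be sketched in plain Python. This is not LangChain's implementation: `fake_generate_variants` and `fake_retrieve` are hypothetical stand-ins for the LLM call and the vector-store lookup, and the canned results are invented for illustration.

```python
# Canned results per query variant (invented data, stands in for the vector store).
CANNED_RESULTS = {
    "Who are the most famous football players?": ["Ronaldo doc", "Clubs doc"],
    "What skills make top footballers stand out?": ["Messi doc", "Ronaldo doc"],
    "Which footballers are considered legends?": ["Ronaldo doc", "Messi doc", "Academies doc"],
}


def fake_generate_variants(query):
    """Stand-in for the LLM: returns hard-coded rephrasings of the query."""
    return list(CANNED_RESULTS)


def fake_retrieve(query):
    """Stand-in for the base retriever: looks up canned results."""
    return CANNED_RESULTS.get(query, [])


def multi_query_retrieve(query):
    """Run every variant and keep each document once, in first-seen order."""
    seen = set()
    merged = []
    for variant in fake_generate_variants(query):
        for doc in fake_retrieve(variant):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


print(multi_query_retrieve("Who are the legends of the game?"))
```

Because each variant can surface documents the others miss, the merged list may be longer than the `k` requested from the base retriever.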

In [108]:
multiquery_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    llm=model
)
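
To see which query variants the LLM produces, one option is to raise the log level for the retriever's module. This sketch assumes the retriever logs its generated queries at INFO level under a logger named after the import path used in this notebook; the name may differ in other LangChain versions.

```python
import logging

# Send log records to the console with the default format.
logging.basicConfig()

# Assumption: the logger name mirrors the module path imported earlier.
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
```

Run this before calling `multiquery_retriever.invoke(...)` so the generated queries appear alongside the cell output.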

## Query & compare

Run both the similarity retriever and multi-query retriever on the same query and compare results for coverage and diversity.
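
One way to make the comparison less manual is to summarise the overlap between the two result lists. The helper below is a hypothetical sketch that works on lists of text. The placeholder strings are not real retrieval output; with the notebook's results you would pass `[d.page_content for d in similarity_results]` and the equivalent for `multiquery_results`.

```python
def compare_results(baseline, candidate):
    """Summarise which texts are shared and which appear in only one list."""
    baseline_set, candidate_set = set(baseline), set(candidate)
    return {
        "shared": [t for t in baseline if t in candidate_set],
        "only_baseline": [t for t in baseline if t not in candidate_set],
        "only_candidate": [t for t in candidate if t not in baseline_set],
    }


# Placeholder inputs for illustration only.
baseline_texts = ["doc A", "doc B", "doc C"]
candidate_texts = ["doc A", "doc D"]

summary = compare_results(baseline_texts, candidate_texts)
for label, texts in summary.items():
    print(f"{label}: {texts}")
```

Entries under `only_candidate` are the documents that the multi-query retriever found and the baseline missed, which is the coverage gain this notebook is looking for.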

In [109]:
query = "Who are the legends of the game and what makes them famous?"

In [110]:
# Retrieve results with both retrievers
similarity_results = similarity_retriever.invoke(query)
multiquery_results = multiquery_retriever.invoke(query)

In [111]:
for i, doc in enumerate(similarity_results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)

print("*"*150)

for i, doc in enumerate(multiquery_results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)


--- Result 1 ---
Cristiano Ronaldo is a legendary footballer celebrated for his athleticism, goal-scoring, and leadership on the field.

--- Result 2 ---
Youth development programs and football academies play a key role in nurturing the next generation of football stars.

--- Result 3 ---
Football is the most popular sport in the world, played by over 250 million players across more than 200 countries.

--- Result 4 ---
The FIFA World Cup is held every four years and is the most prestigious international football tournament.

--- Result 5 ---
Football clubs like FC Barcelona, Real Madrid, and Manchester United have millions of fans worldwide.
******************************************************************************************************************************************************

--- Result 1 ---
Cristiano Ronaldo is a legendary footballer celebrated for his athleticism, goal-scoring, and leadership on the field.

--- Result 2 ---
Youth development programs and football ac