In [None]:
!pip install langchain chromadb faiss-cpu tiktoken langchain-community wikipedia sentence-transformers langchain-huggingface


## WikipediaRetriever

# 📚 LangChain WikipediaRetriever – Theoretical Overview

The **WikipediaRetriever** in LangChain is a built-in tool that allows your application to fetch content from **Wikipedia** dynamically using its API. It is especially useful in RAG (Retrieval-Augmented Generation) systems where factual, up-to-date knowledge is needed on-the-fly.

---

## 🚀 What is WikipediaRetriever?

- A retriever that **fetches live data from Wikipedia** instead of relying on static documents.
- It uses the `wikipedia` Python package internally to interact with Wikipedia’s search and content.
- Returns relevant summaries or full articles as **LangChain Document** objects.

---

## 🧠 Key Features

- ✅ No need to store documents or build vector indexes.
- 🌐 Real-time retrieval from Wikipedia’s current content.
- 💡 Great for answering **open-domain**, **factual**, or **historical** questions.

---

## 🛠️ How It Works (Theoretical Steps)

### 1. **Receive User Query**
- A natural language question or keyword (e.g., "What is LangChain?") is passed to the retriever.

### 2. **Search Wikipedia**
- The retriever sends the query to Wikipedia’s API.
- Wikipedia returns a list of **most relevant page titles** matching the query.

### 3. **Select Best Match**
- The retriever picks the top result (or multiple, depending on config).
- It fetches the **summary** or **full content** of the Wikipedia article.

### 4. **Create Document Objects**
- The content is formatted into LangChain `Document` objects:
  - `page_content`: the fetched text
  - `metadata`: includes `title`, `summary`, and `source` (the page URL)

### 5. **Return to Application**
- The document is returned to the chain (e.g., `RetrievalQA`) to be used as context for answering questions.
- Can be combined with LLMs to generate responses grounded in Wikipedia content.
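
The five steps above can be sketched without any network access. This is a minimal illustration, not the retriever's actual implementation: `FAKE_WIKI`, `Doc`, `search_titles`, and `retrieve` are hypothetical names, and the word-overlap ranking is an assumed stand-in for Wikipedia's real search.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Hypothetical offline stand-in for Wikipedia: title -> article text.
FAKE_WIKI = {
    "Yeti": "The Yeti is an ape-like creature purported to inhabit the Himalayan mountain range.",
    "Bigfoot": "Bigfoot is a creature said to inhabit forests in North America.",
    "Himalayas": "The Himalayas are a mountain range in Asia.",
}


def tokens(text: str) -> set:
    """Lowercase word set used for the toy ranking."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search_titles(query: str, top_k: int) -> list:
    """Step 2: rank titles by word overlap between the query and the page."""
    q = tokens(query)
    scores = {t: len(q & tokens(t + " " + body)) for t, body in FAKE_WIKI.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [t for t in ranked if scores[t] > 0][:top_k]


def retrieve(query: str, top_k: int = 2, max_chars: int = 4000) -> list:
    """Steps 3-5: fetch each selected page and wrap it in a Doc."""
    return [
        Doc(
            page_content=FAKE_WIKI[title][:max_chars],
            metadata={"title": title, "source": f"https://en.wikipedia.org/wiki/{title}"},
        )
        for title in search_titles(query, top_k)
    ]


docs = retrieve("Yeti Himalayan creature", top_k=2)
for d in docs:
    print(d.metadata["title"], "->", d.page_content[:40])
```

The real retriever delegates search and page fetching to the `wikipedia` package; only the shape of the returned objects (`page_content` plus `metadata`) is meant to carry over from this sketch.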

---

## 🧩 Integration with LangChain Chains

The WikipediaRetriever is often used with:

- `RetrievalQA`: A chain that takes a retriever and an LLM to answer user queries.
- `ConversationalRetrievalChain`: For chatbot-style interfaces with memory.

---

## 📌 Use Cases

| Use Case                        | Description                                                                 |
|----------------------------------|-----------------------------------------------------------------------------|
| Chatbots with live facts         | Answering real-time questions using Wikipedia data                         |
| Educational assistants           | Providing historical or scientific explanations                           |
| Lightweight RAG applications     | No vector DB needed; works out-of-the-box                                 |
| Research tools                   | Quickly retrieve Wikipedia content for study or writing support            |

---

## ⚖️ Pros and Cons

| Pros                              | Cons                                          |
|----------------------------------|-----------------------------------------------|
| ✅ No setup needed                | ❌ Requires internet access                   |
| ✅ Reflects current Wikipedia content | ❌ Limited to what Wikipedia contains |
| ✅ Fast and simple                | ❌ May return irrelevant or too general data  |

---

## 🔚 Summary

- The WikipediaRetriever is a **powerful, plug-and-play retriever** for accessing general knowledge from Wikipedia.
- It is perfect for building **quick prototypes**, **chatbots**, or **Q&A systems** without needing to maintain a document store.
- By combining it with LangChain LLM chains, you can **enhance factual accuracy** and provide informative answers.

---


In [None]:
from langchain_community.retrievers import WikipediaRetriever

In [None]:
# Initialize the retriever (optional: set language and top_k)
retriever = WikipediaRetriever(top_k_results=2, lang="en")

In [None]:

# Define your query
query = "Existence of Yeti in nepal and himalayas"

# Get relevant Wikipedia documents
docs = retriever.invoke(query)

In [None]:
docs

[Document(metadata={'title': 'Yeti', 'summary': 'The Yeti () is an ape-like creature purported to inhabit the Himalayan mountain range in Asia. In Western popular culture, the creature is commonly referred to as the Abominable Snowman. Many dubious articles have been offered in an attempt to prove the existence of the Yeti, including anecdotal visual sightings, disputed video recordings, photographs, and plaster casts of large footprints. Some of these are speculated or known to be hoaxes.\nFolklorists trace the origin of the Yeti to a combination of factors, including Sherpa folklore and misidentified fauna such as bear or yak. The Yeti is commonly compared to Bigfoot of North America, as the two subjects often have similar physical descriptions.\n\n', 'source': 'https://en.wikipedia.org/wiki/Yeti'}, page_content='The Yeti () is an ape-like creature purported to inhabit the Himalayan mountain range in Asia. In Western popular culture, the creature is commonly referred to as the Abomin

In [None]:
# Print retrieved content
for i, doc in enumerate(docs, start=1):
    print(f"\n--- Result {i} ---")
    print(f"Content:\n{doc.page_content[:500]}...")  # Truncate to the first 500 characters


--- Result 1 ---
Content:
The Yeti () is an ape-like creature purported to inhabit the Himalayan mountain range in Asia. In Western popular culture, the creature is commonly referred to as the Abominable Snowman. Many dubious articles have been offered in an attempt to prove the existence of the Yeti, including anecdotal visual sightings, disputed video recordings, photographs, and plaster casts of large footprints. Some of these are speculated or known to be hoaxes.
Folklorists trace the origin of the Yeti to a combinat...

--- Result 2 ---
Content:
The 1960–61 Silver Hut expedition, formally known as the Himalayan Scientific and Mountaineering Expedition, was initiated by Edmund Hillary and Griffith Pugh with John Dienhart of World Books in America (producers of a children’s encyclopedia). The expedition lasted from September 1960 to June 1961.
In 1958 Hillary and Pugh had discussed whether Everest could be climbed without oxygen; with improved acclimatising by wintering at, say, 20

## Vector Store Retriever

# 📦 LangChain VectorStoreRetriever – Theoretical Overview

The **VectorStoreRetriever** in LangChain is a powerful component that enables your application to retrieve the most relevant documents based on **vector similarity search**. It is typically used in **RAG pipelines** where the documents are first embedded and stored in a **vector database**.

---

## 🧠 What is VectorStoreRetriever?

- A retriever that searches **pre-embedded documents** in a vector database (e.g., FAISS, Chroma, Pinecone).
- It finds the most semantically similar chunks to a given query using **cosine similarity**, **dot product**, or other distance metrics.
- Ideal for use cases where you have a **custom document store** such as PDFs, notes, or transcripts.

---

## 🔧 How It Works – Step-by-Step

### 1. **Document Ingestion**
- Raw documents (PDFs, text, websites, etc.) are loaded.
- These are split into chunks (using `TextSplitters`) for better granularity.
- Each chunk is embedded into a **vector representation** using an **embedding model** (e.g., OpenAI, HuggingFace, etc.).
- These vectors are stored in a **VectorStore** like FAISS, Chroma, or others.

---

### 2. **Retriever Initialization**
- A `VectorStoreRetriever` is created from the vector store.
- This retriever knows how to **search for vectors** most similar to a query.

---

### 3. **Query Execution**
- When a query is made (e.g., "Explain reinforcement learning"), the retriever:
  - Embeds the query using the same embedding model used during storage.
  - Searches the vector store for the top-k most similar document vectors.

---

### 4. **Return Relevant Documents**
- The top-k documents are returned as LangChain `Document` objects.
- These can then be fed into an LLM as context for generating accurate answers.
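
As a rough illustration of steps 1 to 4, the sketch below indexes a few chunks and returns the top-k by cosine similarity. It assumes a toy bag-of-words `embed` function in place of a real embedding model and a plain Python list in place of a vector store; `embed`, `cosine`, `index`, and `retrieve` are illustrative names, not LangChain APIs.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (assumed stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Step 1: ingest chunks and store (chunk, vector) pairs.
chunks = [
    "LangChain helps developers build LLM applications easily.",
    "Chroma is a vector database optimized for LLM-based search.",
    "Embeddings convert text into high-dimensional vectors.",
    "HuggingFace provides free embedding models.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(query: str, k: int = 2) -> list:
    """Steps 3-4: embed the query with the same function, return the top-k chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


top = retrieve("What is Chroma used for?", k=1)
print(top)
```

Using the same `embed` function for both indexing and querying is the point to notice: vectors from different embedding models are not comparable.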

---

## 🧩 Integration with LangChain Chains

VectorStoreRetriever is usually plugged into:

- `RetrievalQA` – For question-answering from documents.
- `ConversationalRetrievalChain` – For chatbots with memory + vector search.
- `MultiQueryRetriever`, `ContextualCompressionRetriever` – For smarter retrieval.

---

## 🧰 Commonly Used Vector Stores

| Vector Store | Description                      |
|--------------|----------------------------------|
| FAISS        | Fast, local vector DB (free)     |
| Chroma       | Lightweight open-source DB       |
| Pinecone     | Scalable cloud vector DB         |
| Weaviate     | AI-native vector DB with schema  |
| Qdrant       | Open-source, REST-ready vector DB|

---

## ✅ Use Cases

| Use Case                 | Description                                                    |
|--------------------------|----------------------------------------------------------------|
| Custom RAG chatbot       | Querying your own PDFs, notes, or scraped content              |
| Private document search  | Answering from internal reports, legal docs, etc.              |
| Semantic search systems  | Searching by meaning, not keywords                             |
| Domain-specific QA       | Grounding LLMs in industry-specific knowledge                  |

---

## ⚖️ Pros and Cons

| Pros                              | Cons                                         |
|----------------------------------|----------------------------------------------|
| ✅ Fast and scalable retrieval    | ❌ Requires preprocessing and setup           |
| ✅ Supports custom document data | ❌ Needs consistent embedding usage           |
| ✅ Works offline (with FAISS)     | ❌ No live knowledge like WikipediaRetriever  |

---

## 🧠 Summary

- `VectorStoreRetriever` is essential for building **custom RAG applications**.
- It provides semantic retrieval from your **own curated knowledge base**.
- Best suited for use cases where **privacy**, **custom content**, or **offline use** is required.
- Can be combined with summarization, filtering, compression, or re-ranking for improved performance.

---


In [None]:
!pip install langchain_chroma



In [None]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_core.documents import Document

In [None]:
# Step 1: Your source documents
documents = [
    Document(page_content="LangChain helps developers build LLM applications easily."),
    Document(page_content="Chroma is a vector database optimized for LLM-based search."),
    Document(page_content="Embeddings convert text into high-dimensional vectors."),
    Document(page_content="HuggingFace provides free embedding models."),
]

In [None]:
# Step 2: Initialize embedding model
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Step 3: Create Chroma vector store in memory
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embedding_model,
    collection_name="my_collection"
)


In [None]:
# Step 4: Convert vectorstore into a retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

In [None]:
query = "What is Chroma used for?"
results = retriever.invoke(query)

In [None]:
for i, doc in enumerate(results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)


--- Result 1 ---
Chroma is a vector database optimized for LLM-based search.

--- Result 2 ---
LangChain helps developers build LLM applications easily.


In [None]:
results = vectorstore.similarity_search(query, k=2)

In [None]:
for i, doc in enumerate(results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)


--- Result 1 ---
Chroma is a vector database optimized for LLM-based search.

--- Result 2 ---
LangChain helps developers build LLM applications easily.


# 🤔 Retriever vs similarity_search – When and Why to Use Each?

In LangChain, you might encounter two ways to retrieve relevant documents from a vector store:

```python
# Option 1 - Using the raw vector store
results = vectorstore.similarity_search(query, k=2)

# Option 2 - Using a Retriever (Runnable-compatible)
results = retriever.invoke(query)
```

Both approaches return similar results (top-k documents), but they serve different purposes depending on your application.

---

## ✅ Option 1: similarity_search()

```python
results = vectorstore.similarity_search(query, k=2)
```

- **What it is**: A low-level method provided by vector stores like FAISS, Chroma, etc.
- **How it works**: Converts the query to an embedding and performs vector similarity search.
- **Returns**: Top `k` similar documents.

### ✔️ When to use:
- You need quick and direct results.
- You're not building chains or pipelines.
- You're experimenting with one-off retrieval logic.

---

## ✅ Option 2: Retriever (e.g., retriever.invoke(query))

```python
results = retriever.invoke(query)
```

- **What it is**: A high-level abstraction in LangChain that wraps vector store retrieval logic.
- **Based on**: LangChain's `Runnable` interface.
- **Returns**: The same type of document results, but allows more flexibility.
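
The relationship between the two options can be sketched with toy classes. This is an assumption-laden illustration rather than LangChain's actual code: `ToyStore`, `ToyRetriever`, and `answer_pipeline` are hypothetical names, and the word-overlap ranking stands in for real vector search.

```python
class ToyStore:
    """Hypothetical vector store exposing a direct search method."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query, k=2):
        # Rank texts by the number of words shared with the query.
        words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda t: len(words & set(t.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class ToyRetriever:
    """Hypothetical wrapper: a single invoke(query) entry point,
    with search settings fixed once at construction time."""

    def __init__(self, store, k=2):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


def answer_pipeline(retriever, query):
    """Works with any object that offers invoke(query)."""
    context = retriever.invoke(query)
    return f"Q: {query} | Context: {' / '.join(context)}"


store = ToyStore([
    "chroma stores embeddings",
    "langchain builds llm apps",
    "faiss is a local index",
])
retriever = ToyRetriever(store, k=1)
print(answer_pipeline(retriever, "chroma embeddings"))
```

Because `answer_pipeline` depends only on `invoke`, the retrieval strategy behind it can change without touching the pipeline code, which is the main argument for the retriever abstraction.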

---

## 🎯 Why Use a Retriever Instead of Direct Similarity Search?

| Reason                          | Explanation                                                                 |
|----------------------------------|-----------------------------------------------------------------------------|
| 🔁 Runnable-compatible           | Integrates directly with `LCEL` (LangChain Expression Language) pipelines. |
| 🔄 Retrieval strategy switching  | Switch `search_type` between `similarity`, `mmr`, and `similarity_score_threshold`, or wrap the retriever for contextual compression. |
| 🧩 Chain-friendly                | Ideal for combining with LLMs, tools, memory, and output parsers.         |
| 🔧 Configurable                  | Supports advanced options like metadata filtering, re-ranking, etc.       |
| 📦 Standardized interface        | Makes your retrieval logic consistent and modular.                        |

---

## ⚖️ Comparison Table

| Feature / Use Case             | `similarity_search()`            | `retriever.invoke()`               |
|--------------------------------|----------------------------------|------------------------------------|
| Simple & fast                 | ✅                               | ✅                                 |
| Works in Runnable chains      | ❌                               | ✅                                 |
| Supports advanced strategies  | ❌ (separate methods, e.g., `max_marginal_relevance_search()`) | ✅ (e.g., MMR, compression)         |
| Easily extensible             | ❌                               | ✅                                 |
| Best for production pipelines | ❌                               | ✅                                 |
| Ideal for experimentation     | ✅                               | ✅                                 |

---

## 🧠 Final Thoughts

- **Use `similarity_search()`**:
  - For direct, lightweight queries.
  - When you're not building a full pipeline.
  - During testing or quick prototypes.

- **Use `retriever.invoke()`**:
  - When building full LangChain pipelines.
  - For chatbots, agents, or Retrieval-Augmented Generation (RAG).
  - When flexibility, modularity, and future extensibility matter.

LangChain is built to scale from prototypes to production — and `Retriever` is your bridge to go from "just searching" to "building intelligent apps."

---


## MMR

# 🔄 Maximal Marginal Relevance (MMR) Retriever in LangChain

**Maximal Marginal Relevance (MMR)** is an advanced retrieval technique that aims to balance **relevance** and **diversity** in the results retrieved from a vector store.

It is an alternative retrieval strategy to `similarity_search()` and helps avoid redundancy when retrieving multiple chunks or documents in response to a query.

---

## 🧠 What is MMR?

MMR stands for **Maximal Marginal Relevance**.

- It selects documents not only based on how **relevant** they are to the query but also how **different** they are from each other.
- This means it avoids retrieving **nearly identical chunks** that provide repetitive information.
- It is commonly used when you want **top-k diverse yet relevant results**.

---

## 🔧 How MMR Works – Step-by-Step

1. **Query Embedding**: Convert the user query into a vector using the embedding model.
2. **Candidate Pool**: Retrieve a larger pool of top-N most relevant documents using traditional similarity search.
3. **Diversity Filtering**:
   - Iteratively select documents that are both:
     - **Highly relevant to the query**
     - **Minimally similar to previously selected documents**
4. **Final Selection**: Return the top-k diverse results to avoid redundancy.
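
The selection loop in step 3 can be sketched in plain Python. This minimal version assumes cosine similarity and the commonly used scoring rule `lambda_mult * relevance - (1 - lambda_mult) * redundancy`; the function names and the 2-D toy vectors are illustrative only, and the candidate pool is simply the full document list.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return the indices of k documents chosen by Maximal Marginal Relevance."""
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:

        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [
    [1.0, 0.10],   # 0: highly relevant
    [1.0, 0.12],   # 1: near-duplicate of 0
    [0.8, -0.60],  # 2: less relevant, points in a different direction
]

print(mmr(query, docs, k=2, lambda_mult=1.0))  # relevance only
print(mmr(query, docs, k=2, lambda_mult=0.5))  # relevance balanced with diversity
```

With `lambda_mult=1.0` the redundancy term vanishes and the two near-duplicates are returned; with `lambda_mult=0.5` the second pick switches to the less similar document.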

---

## 🟢 Why Use MMR Instead of Normal Retriever?

### ✅ Problem with Normal Retrieval:
- If your documents are split into similar or overlapping chunks, plain similarity search (`similarity_search()`, or a retriever left on the default `search_type="similarity"`) may return **very similar chunks**.
- This causes:
  - Redundant context
  - Wasted token space for LLM
  - Less informative responses

---

### ✅ MMR Fixes This By:
| Factor         | Description                                                                 |
|----------------|-----------------------------------------------------------------------------|
| Relevance      | Ensures each result is strongly related to the query.                      |
| Diversity      | Avoids repeating similar content, increasing the range of coverage.        |
| Efficiency     | Reduces overlapping information, giving LLM more unique input per token.   |

---

## 📚 Real-World Example

**Query**: "Explain the benefits of machine learning in healthcare."

### 🔴 Without MMR (Traditional Retrieval):
- Returns 3 chunks from the same paragraph of a document.
- All say similar things like “ML helps with diagnostics...”

### 🟢 With MMR:
- Returns:
  - One chunk about ML in diagnostics
  - One about ML in treatment planning
  - One about ML in administrative efficiency

Result: LLM now sees **diverse use cases** — more contextually rich response.

---

## 🧩 When to Use MMR Retrieval

| Use Case                              | Why MMR Helps                                      |
|---------------------------------------|----------------------------------------------------|
| Long documents split into chunks      | Prevents repeated retrieval of overlapping chunks  |
| Multi-aspect queries                  | Surfaces different perspectives or subtopics       |
| Reducing token cost                   | Less repetition → better token usage               |
| Better answers from LLM               | LLM gets richer, non-redundant context             |

---

## ⚖️ Summary: MMR vs Normal Retrieval

| Feature                    | Normal Retriever             | MMR Retriever                        |
|----------------------------|------------------------------|--------------------------------------|
| Simple similarity          | ✅                           | ✅                                   |
| Supports diversity         | ❌                           | ✅                                   |
| Avoids redundant chunks    | ❌                           | ✅                                   |
| Better for long docs       | ❌                           | ✅                                   |
| Token-efficient            | ❌                           | ✅                                   |
| Ideal for chaining         | ✅                           | ✅                                   |

---

## 🧠 Final Thoughts

- **MMR retrieval** (enabled with `search_type="mmr"`) is useful when your document base is large, overlapping, or varied.
- It helps retrieve **non-redundant**, **diverse**, and **informative chunks** — perfect for **chatbots**, **Q&A apps**, or **summarization** tools.
- MMR makes your retrieval **smarter**, not just **similar**.

> 🔍 *Think of MMR as a way to get “more of what matters, less of what repeats.”*

---


In [None]:
# Sample documents
docs = [
    Document(page_content="LangChain makes it easy to work with LLMs."),
    Document(page_content="LangChain is used to build LLM based applications."),
    Document(page_content="Chroma is used to store and search document embeddings."),
    Document(page_content="Embeddings are vector representations of text."),
    Document(page_content="MMR helps you get diverse results when doing similarity search."),
    Document(page_content="LangChain supports Chroma, FAISS, Pinecone, and more."),
]

In [None]:
from langchain_community.vectorstores import FAISS

# Step 1: Initialize the Hugging Face embedding model
embedding_model = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')

# Step 2: Create the FAISS vector store from the documents
vectorstore = FAISS.from_documents(
    documents=docs,
    embedding=embedding_model
)

In [None]:
# Enable MMR in the retriever
retriever = vectorstore.as_retriever(
    search_type="mmr",                   # <-- This enables MMR
    search_kwargs={"k": 3, "lambda_mult": 0.5}  # k = number of results; lambda_mult: 1 = least diversity, 0 = most diversity
)

In [None]:
query = "What is langchain?"
results = retriever.invoke(query)

In [None]:
for i, doc in enumerate(results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)


--- Result 1 ---
LangChain supports Chroma, FAISS, Pinecone, and more.

--- Result 2 ---
LangChain is used to build LLM based applications.

--- Result 3 ---
Embeddings are vector representations of text.


## Multiquery Retriever

# 🔁 MultiQueryRetriever in LangChain

The **MultiQueryRetriever** is a smart retriever in LangChain that improves search quality by **generating multiple diverse versions of the user’s query** and combining their retrieval results.

It’s especially useful when a single query might miss relevant documents or when you want broader context and more complete answers.

---

## 🧠 What is MultiQueryRetriever?

- A **wrapper retriever** that takes your original query and uses an LLM (like GPT) to generate **multiple semantically different queries**.
- These rewritten queries are then used to retrieve documents separately.
- The results are **combined and deduplicated** to give a richer set of context.

---

## 🔧 How MultiQueryRetriever Works – Step-by-Step

1. **User Query Input**: e.g., “What is the use of AI in medicine?”
2. **LLM Rewrites**: The LLM generates several different phrasings or perspectives of the same query, like:
   - “How is artificial intelligence applied in healthcare?”
   - “Examples of AI in medical diagnostics.”
   - “AI use cases in hospitals.”
3. **Retrieval per Query**:
   - Each rewritten query is passed to the underlying retriever (e.g., one backed by FAISS or Chroma).
4. **Merge Results**:
   - Retrieved documents from all queries are combined.
   - Duplicates are removed.
5. **Return to Application**:
   - Final, diverse, and complete set of relevant documents is returned.
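
The rewrite, retrieve, and merge steps can be sketched as follows. This is only an illustration under stated assumptions: `rewrite_query` returns hard-coded rephrasings where the real retriever would call an LLM, and `base_retrieve` uses word overlap instead of vector search. All names here are hypothetical.

```python
import re

CORPUS = [
    "Drinking water helps maintain metabolism and energy.",
    "Deep sleep is crucial for cellular repair.",
    "Regular walking boosts heart health.",
    "Solar panels balance electricity demand.",
]


def tokens(text):
    """Lowercase word set used for the toy matching."""
    return set(re.findall(r"[a-z]+", text.lower()))


def base_retrieve(query, k=1):
    """Toy base retriever: rank by shared words, drop documents with no overlap."""
    q = tokens(query)
    scored = [(len(q & tokens(doc)), doc) for doc in CORPUS]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:k] if score > 0]


def rewrite_query(query):
    """Hypothetical stand-in for the LLM rewrite step (fixed rephrasings)."""
    return [
        "Which habits maintain energy?",
        "Does sleep help the body repair?",
        "Walking and heart health benefits",
        "What helps maintain energy?",  # retrieves the same document as the first rewrite
    ]


def multi_query_retrieve(query, k=1):
    """Run every rewritten query, then merge results and drop duplicates."""
    merged, seen = [], set()
    for rewritten in rewrite_query(query):
        for doc in base_retrieve(rewritten, k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


original = "How can I feel less tired?"
print(base_retrieve(original))         # the single query matches nothing here
print(multi_query_retrieve(original))  # the rewrites surface three distinct documents
```

The fourth rewrite retrieves a document already returned by the first, so the merge step keeps only one copy.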

---

## ✅ Why Use MultiQueryRetriever?

### 🚫 Problem with Single Query Retrieval:
- May **miss important context** if query is too narrow.
- Retrieval quality depends on **exact wording**.
- Doesn’t cover **different angles** or subtopics.

### ✅ MultiQueryRetriever Solves This By:
| Benefit          | Explanation                                                             |
|------------------|-------------------------------------------------------------------------|
| 📌 Broader Coverage | Captures more relevant documents from different query perspectives.     |
| 💬 LLM-Augmented     | Uses language model to creatively and semantically rephrase the query. |
| 🧠 Smarter Search   | Retrieves content that may not match the original query wording.        |
| 🧹 Deduplication     | Combines results while removing redundant chunks.                      |

---

## 📚 Real-World Example

**Original Query**: “Applications of AI in education”

### 🔴 Normal Retriever:
- Returns only the documents closest to the single phrasing of the query.
- Might miss use cases like grading, tutoring, or content generation.

### 🟢 MultiQueryRetriever:
- Uses LLM to expand:
   - “How is AI used in teaching?”
   - “AI in student performance evaluation”
   - “Benefits of AI for online learning”
- Retrieves a **broader and richer** set of documents → better LLM answer.

---

## 🧩 When to Use MultiQueryRetriever

| Use Case                             | Why It Helps                                                   |
|--------------------------------------|----------------------------------------------------------------|
| Open-ended or vague queries          | Ensures more relevant results even when query is ambiguous     |
| Complex or multi-dimensional topics  | Surfaces different aspects or angles of the topic              |
| Users may phrase queries differently | Handles natural language variability via LLM rewrites          |
| Low recall in vector search          | Increases coverage by expanding the semantic search net        |

---

## ⚖️ Summary: MultiQuery vs Normal Retriever

| Feature                     | Normal Retriever     | MultiQueryRetriever           |
|-----------------------------|----------------------|-------------------------------|
| Uses single query          | ✅                   | ❌                            |
| Uses multiple LLM queries  | ❌                   | ✅                            |
| Broader document coverage  | ❌                   | ✅                            |
| Handles query ambiguity    | ❌                   | ✅                            |
| Deduplication              | ❌ (manual)          | ✅ (automatic)                |
| Chain/Runnable compatible  | ✅                   | ✅                            |

---

## 🧠 Final Thoughts

- **MultiQueryRetriever** boosts your app’s retrieval intelligence by asking the search engine **multiple smart questions** instead of just one.
- It's especially powerful when dealing with **complex**, **ambiguous**, or **open-domain** user queries.
- Great for **chatbots**, **research assistants**, **educational tools**, or any RAG system where **depth and coverage** matter.

> 🔍 *“If one question can miss the answer, ask five better ones.” – that’s the MultiQueryRetriever in action.*

---


In [None]:
!pip install langchain-google-genai



In [None]:
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.documents import Document
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.retrievers.multi_query import MultiQueryRetriever

In [None]:
# Relevant health & wellness documents
all_docs = [
    Document(page_content="Regular walking boosts heart health and can reduce symptoms of depression.", metadata={"source": "H1"}),
    Document(page_content="Consuming leafy greens and fruits helps detox the body and improve longevity.", metadata={"source": "H2"}),
    Document(page_content="Deep sleep is crucial for cellular repair and emotional regulation.", metadata={"source": "H3"}),
    Document(page_content="Mindfulness and controlled breathing lower cortisol and improve mental clarity.", metadata={"source": "H4"}),
    Document(page_content="Drinking sufficient water throughout the day helps maintain metabolism and energy.", metadata={"source": "H5"}),
    Document(page_content="The solar energy system in modern homes helps balance electricity demand.", metadata={"source": "I1"}),
    Document(page_content="Python balances readability with power, making it a popular system design language.", metadata={"source": "I2"}),
    Document(page_content="Photosynthesis enables plants to produce energy by converting sunlight.", metadata={"source": "I3"}),
    Document(page_content="The 2022 FIFA World Cup was held in Qatar and drew global energy and excitement.", metadata={"source": "I4"}),
    Document(page_content="Black holes bend spacetime and store immense gravitational energy.", metadata={"source": "I5"}),
]

In [None]:
# Initialize the Hugging Face embedding model
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Create FAISS vector store
vectorstore = FAISS.from_documents(documents=all_docs, embedding=embedding_model)



In [None]:
# Create retrievers
similarity_retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})

In [None]:
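import os
from getpass import getpass
# Prompt for the key rather than hard-coding a secret in the notebook
os.environ["GOOGLE_API_KEY"] = os.environ.get("GOOGLE_API_KEY") or getpass("Google API key: ")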

In [None]:
multiquery_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),  # MMR could also be used here
    llm=ChatGoogleGenerativeAI(model="gemini-2.0-flash"),  # model that generates the query variants
)

In [None]:
# Query
query = "How to improve energy levels and maintain balance?"

In [None]:
# Retrieve results
similarity_results = similarity_retriever.invoke(query)
multiquery_results = multiquery_retriever.invoke(query)

In [None]:
for i, doc in enumerate(similarity_results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)

print("*"*150)

for i, doc in enumerate(multiquery_results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)


--- Result 1 ---
Drinking sufficient water throughout the day helps maintain metabolism and energy.

--- Result 2 ---
The solar energy system in modern homes helps balance electricity demand.

--- Result 3 ---
Consuming leafy greens and fruits helps detox the body and improve longevity.

--- Result 4 ---
Mindfulness and controlled breathing lower cortisol and improve mental clarity.

--- Result 5 ---
Photosynthesis enables plants to produce energy by converting sunlight.
******************************************************************************************************************************************************

--- Result 1 ---
The solar energy system in modern homes helps balance electricity demand.

--- Result 2 ---
Drinking sufficient water throughout the day helps maintain metabolism and energy.

--- Result 3 ---
Consuming leafy greens and fruits helps detox the body and improve longevity.

--- Result 4 ---
Regular walking boosts heart health and can reduce symptoms of de
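
`MultiQueryRetriever` runs the base retriever once per generated query variant and returns the unique union of the hits, which is one reason its ordering can differ from plain similarity search. The sketch below illustrates that merge step in pure Python. It is not LangChain's implementation: `toy_retrieve` is a hypothetical word-overlap ranker standing in for the vector store, and the `variants` list is hand-written where the real retriever would ask the LLM.

```python
import re

# Small toy corpus reusing a few sentences from the results above
corpus = [
    "Drinking sufficient water throughout the day helps maintain metabolism and energy.",
    "The solar energy system in modern homes helps balance electricity demand.",
    "Mindfulness and controlled breathing lower cortisol and improve mental clarity.",
    "Photosynthesis enables plants to produce energy by converting sunlight.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def toy_retrieve(query, k=2):
    """Hypothetical stand-in for vector search: rank by shared-word count."""
    q = tokenize(query)
    scored = [(len(q & tokenize(doc)), doc) for doc in corpus]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:k] if score > 0]

def multi_query_retrieve(variants, k=2):
    """Run every query variant and keep the unique union, in first-seen order."""
    seen, merged = set(), []
    for variant in variants:
        for doc in toy_retrieve(variant, k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

# Hand-written variants (assumption: the real retriever generates these with an LLM)
variants = [
    "How to improve energy levels and maintain balance?",
    "Which breathing habits improve mental clarity?",
    "What helps metabolism throughout the day?",
]
merged = multi_query_retrieve(variants)
for doc in merged:
    print(doc)
```

With these toy inputs, the second variant surfaces the mindfulness sentence that the original query alone does not rank in its top two, which is the kind of recall gain multi-query retrieval aims for.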

## ContextualCompressionRetriever

# 🧠 ContextualCompressionRetriever in LangChain

The **ContextualCompressionRetriever** is a wrapper in LangChain that improves retrieval quality by **compressing or filtering the retrieved documents** against the user's query before they are passed to the LLM.

It acts as a retrieval post-processor: **only the parts of the retrieved documents that are relevant to the query** are sent to the model, which sharpens the answer and reduces the number of tokens in the final prompt.

---

## 🚀 What Is ContextualCompressionRetriever?

- It wraps around any **base retriever** (e.g., vector retriever, MMR, multi-query).
- Once the base retriever returns a set of documents, it uses a **compression mechanism** (typically an LLM) to:
  - Filter out irrelevant documents.
  - Summarize or extract relevant portions from documents.

---

## ⚙️ How It Works – Step-by-Step

1. **Query Input**: User provides a query (e.g., "Explain Einstein's theory of relativity").
2. **Initial Retrieval**:
   - Base retriever (e.g., FAISS) fetches top-k relevant documents.
3. **Contextual Compression**:
   - Each document is passed through a **Document Compressor**, such as:
     - LLM-based relevance filter
     - Summarizer
     - Embedding-based chunk selector
   - Only **relevant parts or summaries** of the documents are kept.
4. **Return Compressed Context**:
   - Final filtered/summarized content is returned as input to the LLM or pipeline.
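
The four steps above can be sketched without any LangChain dependency. This is a minimal illustration under stated assumptions: `keyword_compressor` is a hypothetical stand-in for an LLM-based extractor (it keeps sentences that share a content word with the query), `return_all` plays the role of the base retriever, and `toy_docs` is made-up sample data.

```python
import re

def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def keyword_compressor(document, query):
    """Hypothetical compressor: keep sentences sharing a content word with the query."""
    stop_words = {"what", "is", "the", "a", "of", "in", "how", "does"}
    terms = set(re.findall(r"[a-z]+", query.lower())) - stop_words
    kept = [
        sentence
        for sentence in split_sentences(document)
        if terms & set(re.findall(r"[a-z]+", sentence.lower()))
    ]
    return " ".join(kept)

def compress_retrieve(query, base_retrieve, compressor):
    """Steps 2-4: retrieve, compress each document, drop documents left empty."""
    compressed = []
    for doc in base_retrieve(query):
        text = compressor(doc, query)
        if text:
            compressed.append(text)
    return compressed

# Made-up sample data for illustration only
toy_docs = [
    "Castles were built for defense. Chlorophyll captures sunlight during photosynthesis.",
    "Basketball was invented by James Naismith. The NBA is a global league.",
]

def return_all(query):
    """Stand-in base retriever that returns every document."""
    return list(toy_docs)

results = compress_retrieve("What is photosynthesis?", return_all, keyword_compressor)
print(results)
```

The first document is reduced to its single photosynthesis sentence and the second is dropped entirely, mirroring the "compress or filter" behaviour described in step 3.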

---

## 🧩 Why Use ContextualCompressionRetriever?

### 🔥 Problems It Solves:
| Problem                            | Solution via Compression                           |
|------------------------------------|-----------------------------------------------------|
| ❌ Too many irrelevant docs        | 🧹 Removes unrelated documents                      |
| ❌ Overlapping or long documents   | ✂️ Extracts only the relevant parts                |
| ❌ High token cost                 | 💸 Reduces input size while preserving meaning      |
| ❌ Confusing context for LLM       | 🎯 Focuses only on context that matches the query   |

---

## 🔄 Compression Strategies Supported

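| Type                          | Description                                                                                              |
|-------------------------------|----------------------------------------------------------------------------------------------------------|
| LLM Relevance Filtering       | Uses an LLM to judge whether each document is relevant (e.g., `LLMChainFilter`)                          |
| LLM Summarization             | Uses an LLM to condense each document to its query-relevant content (e.g., `LLMChainExtractor`, which extracts passages verbatim) |
| Embedding-based Chunk Filter  | Keeps chunks whose embedding is close to the query vector (e.g., `EmbeddingsFilter`)                     |

*You can mix and match these based on your performance and latency needs. LLM-based compressors make additional LLM calls for the retrieved documents, so they add latency and cost; the embedding-based filter is usually the cheaper option.*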
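
The embedding-based strategy comes down to a cosine-similarity threshold. Below is a minimal sketch, assuming hand-made 3-dimensional vectors in place of real embeddings and a hypothetical `embedding_filter` helper; LangChain's `EmbeddingsFilter` follows the same general idea, but this is not its implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def embedding_filter(query_vec, chunks, threshold=0.75):
    """Keep chunks whose vector is at least `threshold` similar to the query."""
    return [text for text, vec in chunks if cosine(query_vec, vec) >= threshold]

# Hand-made vectors (assumption): a real system would use an embedding model
chunks = [
    ("Chlorophyll captures sunlight during photosynthesis.", [0.9, 0.1, 0.0]),
    ("Knights wore armor made of metal.", [0.0, 0.2, 0.9]),
    ("Plants convert sunlight into energy.", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]

kept = embedding_filter(query_vec, chunks)
print(kept)
```

Because no LLM call is involved, this kind of filter is cheap to run; the trade-off is that it can only keep or drop whole chunks, not rewrite them.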

---

## 📚 Real-World Example

**Query**: "What is transfer learning in machine learning?"

### 🔴 Without Compression
- 5 documents retrieved.
- Only 1 really focuses on transfer learning.
- Others contain generic ML content → noise for LLM.

### 🟢 With ContextualCompressionRetriever
- Same 5 docs retrieved initially.
- 4 irrelevant ones filtered out.
- Only the one with transfer learning is passed to LLM → better and focused answer.
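
The scenario above is document-level filtering: each retrieved document gets a yes/no relevance verdict. Here is a small sketch of that pattern, assuming a hypothetical `stub_judge` that checks for a key phrase where a real pipeline would ask an LLM. The five `retrieved` strings are illustrative sample data, not output from an actual retriever.

```python
def relevance_filter(query, docs, judge):
    """Keep only the documents the judge marks as relevant to the query."""
    return [doc for doc in docs if judge(query, doc)]

def stub_judge(query, doc):
    """Stand-in for an LLM yes/no call: looks for a hard-coded key phrase."""
    return "transfer learning" in doc.lower()

# Illustrative sample data: one on-topic document and four generic ML ones
retrieved = [
    "Transfer learning reuses a model trained on one task as the starting point for another.",
    "Machine learning models learn patterns from data.",
    "Overfitting happens when a model memorizes its training set.",
    "Gradient descent minimizes a loss function step by step.",
    "A validation set is used to tune hyperparameters.",
]

query = "What is transfer learning in machine learning?"
kept = relevance_filter(query, retrieved, stub_judge)
print(f"{len(retrieved)} retrieved, {len(kept)} kept")
```

Passing the judge in as a function keeps the filtering logic separate from the model call, so the stub can be swapped for a real LLM-backed check without touching `relevance_filter`.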

---

## ✅ When to Use ContextualCompressionRetriever

| Use Case                              | Why It Helps                                                   |
|---------------------------------------|----------------------------------------------------------------|
| Large or noisy knowledge bases        | Filters noise and keeps only what's needed                     |
| Token-limited environments            | Saves cost and stays under limits                              |
| Need for precision in QA              | Provides more accurate, high-quality answers                   |
| Long document retrieval               | Extracts and condenses long chunks into usable input           |

---

## ⚖️ Summary Table

| Feature / Goal                | Base Retriever Only | With ContextualCompressionRetriever |
|-------------------------------|---------------------|-------------------------------------|
| Simple similarity retrieval   | ✅                  | ✅                                  |
| Removes irrelevant context    | ❌                  | ✅                                  |
| Compresses long documents     | ❌                  | ✅                                  |
| Reduces token cost            | ❌                  | ✅                                  |
| Improves LLM input quality    | ❌                  | ✅                                  |
| Runnable & chain-compatible   | ✅                  | ✅                                  |

---

## 🧠 Final Thoughts

- `ContextualCompressionRetriever` adds a **filtering and compression layer** to your retrieval system.
- Ideal for building **efficient**, **accurate**, and **cost-effective** RAG pipelines.
- Combine it with any retriever: **vector, MMR, multi-query**, etc.
- Think of it as a **smart gatekeeper** that only lets the most relevant context through.

> 💡 *"Don’t feed your LLM everything — just feed it what matters."*

---


In [None]:
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_core.documents import Document
from langchain_google_genai import ChatGoogleGenerativeAI

In [None]:
# Sample documents: Doc1, Doc2 and Doc4 each bury one photosynthesis sentence in unrelated text; Doc3 has none
docs = [
    Document(page_content=(
        """The Grand Canyon is one of the most visited natural wonders in the world.
        Photosynthesis is the process by which green plants convert sunlight into energy.
        Millions of tourists travel to see it every year. The rocks date back millions of years."""
    ), metadata={"source": "Doc1"}),

    Document(page_content=(
        """In medieval Europe, castles were built primarily for defense.
        The chlorophyll in plant cells captures sunlight during photosynthesis.
        Knights wore armor made of metal. Siege weapons were often used to breach castle walls."""
    ), metadata={"source": "Doc2"}),

    Document(page_content=(
        """Basketball was invented by Dr. James Naismith in the late 19th century.
        It was originally played with a soccer ball and peach baskets. NBA is now a global league."""
    ), metadata={"source": "Doc3"}),

    Document(page_content=(
        """The history of cinema began in the late 1800s. Silent films were the earliest form.
        Thomas Edison was among the pioneers. Photosynthesis does not occur in animal cells.
        Modern filmmaking involves complex CGI and sound design."""
    ), metadata={"source": "Doc4"})
]

In [None]:
# Create a FAISS vector store from the documents
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(docs, embedding_model)

In [None]:
# k=5 exceeds the 4 stored documents, so the base retriever returns all of them
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

In [None]:
# Set up the compressor using an LLM
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
compressor = LLMChainExtractor.from_llm(llm)

In [None]:
# Create the contextual compression retriever
compression_retriever = ContextualCompressionRetriever(
    base_retriever=base_retriever,
    base_compressor=compressor
)

In [None]:
# Query the retriever
query = "What is photosynthesis?"
compressed_results = compression_retriever.invoke(query)

In [None]:
for i, doc in enumerate(compressed_results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)



--- Result 1 ---
Photosynthesis is the process by which green plants convert sunlight into energy.

--- Result 2 ---
The chlorophyll in plant cells captures sunlight during photosynthesis.

--- Result 3 ---
Photosynthesis does not occur in animal cells.
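
To get a rough sense of how much context the compressor removed, the four input documents can be compared with the three extracted sentences shown above. The sketch below uses whitespace-separated word counts as a crude proxy for tokens, which is an assumption: real token counts depend on the model's tokenizer, and the figure ignores the tokens spent on the compressor's own LLM calls.

```python
# The four documents indexed above, with indentation collapsed
original_docs = [
    "The Grand Canyon is one of the most visited natural wonders in the world. "
    "Photosynthesis is the process by which green plants convert sunlight into energy. "
    "Millions of tourists travel to see it every year. The rocks date back millions of years.",
    "In medieval Europe, castles were built primarily for defense. "
    "The chlorophyll in plant cells captures sunlight during photosynthesis. "
    "Knights wore armor made of metal. Siege weapons were often used to breach castle walls.",
    "Basketball was invented by Dr. James Naismith in the late 19th century. "
    "It was originally played with a soccer ball and peach baskets. NBA is now a global league.",
    "The history of cinema began in the late 1800s. Silent films were the earliest form. "
    "Thomas Edison was among the pioneers. Photosynthesis does not occur in animal cells. "
    "Modern filmmaking involves complex CGI and sound design.",
]

# The three compressed results printed above
compressed_docs = [
    "Photosynthesis is the process by which green plants convert sunlight into energy.",
    "The chlorophyll in plant cells captures sunlight during photosynthesis.",
    "Photosynthesis does not occur in animal cells.",
]

def word_count(texts):
    """Total number of whitespace-separated words across all texts."""
    return sum(len(text.split()) for text in texts)

original_words = word_count(original_docs)
compressed_words = word_count(compressed_docs)
ratio = compressed_words / original_words
print(f"{original_words} words retrieved, {compressed_words} after compression ({ratio:.0%} kept)")
```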
