# LangChain Retrievers Guide

This notebook demonstrates different types of retrievers in LangChain for document retrieval and search.

---

## 1. Wikipedia Retriever

Retrieves documents directly from Wikipedia based on a query.

```python
from langchain_community.retrievers import WikipediaRetriever

retriever = WikipediaRetriever(top_k_results=2, lang="en")
query = "the geopolitical history of india and pakistan from the perspective of a chinese"
docs = retriever.invoke(query)
```

**Key Parameters:**
- `top_k_results`: Number of Wikipedia articles to retrieve
- `lang`: Language of Wikipedia to search

---

## 2. Vector Store Retriever

Uses vector embeddings to find semantically similar documents.

### Setup
```python
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Create documents
docs = [
    Document(page_content="LangChain makes it easy to work with LLMs."),
    Document(page_content="Embeddings are vector representations of text."),
    # ... more documents
]

# Create embeddings and vector store
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(documents=docs, embedding=embedding)
```

### Retriever with MMR (Maximal Marginal Relevance)
```python
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "lambda_mult": 0.5}
)
```

**Key Parameters:**
| Parameter | Description |
|-----------|-------------|
| `search_type` | `"similarity"` or `"mmr"` |
| `k` | Number of documents to retrieve |
| `lambda_mult` | Diversity factor (0 = max diversity, 1 = max relevance) |

---

## 3. MultiQuery Retriever

Generates multiple query variations using an LLM to improve retrieval coverage.

```python
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_groq import ChatGroq

model = ChatGroq(model="llama3-8b-8192")

multiquery_retriever = MultiQueryRetriever.from_llm(
    retriever=vector_store.as_retriever(search_kwargs={"k": 5}),
    llm=model
)

results = multiquery_retriever.invoke("How to improve energy levels?")
```

**How it works:**
1. Takes the original query
2. Uses LLM to generate multiple query variations
3. Retrieves documents for each variation
4. Combines and deduplicates results

---

## 4. Contextual Compression Retriever

Compresses retrieved documents to extract only the relevant portions.

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Create base retriever
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Create compressor
compressor = LLMChainExtractor.from_llm(llm=model)

# Create compression retriever
compression_retriever = ContextualCompressionRetriever(
    base_retriever=base_retriever,
    base_compressor=compressor
)

results = compression_retriever.invoke("What is photosynthesis?")
```

**How it works:**
1. Retrieves documents using base retriever
2. Uses LLM to extract only query-relevant content
3. Returns compressed documents with focused content

---

## Comparison Summary

| Retriever | Use Case | Pros | Cons |
|-----------|----------|------|------|
| **Wikipedia** | General knowledge queries | No setup needed | Limited to Wikipedia |
| **Vector Store** | Custom document search | Fast, semantic search | Requires embeddings |
| **MultiQuery** | Complex queries | Better coverage | Slower, uses LLM |
| **Contextual Compression** | Precise extraction | Focused results | Slower, uses LLM |

---

## Requirements

```bash
pip install langchain langchain-community langchain-huggingface langchain-groq faiss-cpu wikipedia
```

---

## Environment Variables

```bash
GROQ_API_KEY=your_groq_api_key
```

>> practice code 

In [20]:
from langchain_community.retrievers import WikipediaRetriever
retriever = WikipediaRetriever(top_k_results=2, lang="en")

In [21]:
# Define your query
query = "the geopolitical history of india and pakistan from the perspective of a chinese"

# Get relevant Wikipedia documents
docs = retriever.invoke(query)

In [22]:
# Print retrieved content
for i, doc in enumerate(docs):
    print(f"\n--- Result {i+1} ---")
    print(f"Content:\n{doc.page_content}...")  # truncate for display


--- Result 1 ---
Content:
India, officially the Republic of India, has full diplomatic relations with 201 states, including Palestine, the Holy See, Niue and the Cook Islands. The Ministry of External Affairs (MEA) is the government agency responsible for the conduct of foreign relations of India. With the world's third largest military expenditure, second largest armed force, fourth largest economy by GDP nominal rates and third largest economy in terms of purchasing power parity, India is a prominent regional power and a potential superpower.
According to the MEA, the main purposes of Indian diplomacy include protecting India's national interests, promoting friendly relations with other states, and providing consular services to "foreigners and Indian nationals abroad." In recent decades, India has pursued an expansive foreign policy, including the neighborhood-first policy embodied by SAARC as well as the Look East policy to forge more extensive economic and strategic relationships

# vector store retriever

In [23]:
from langchain_core.documents import Document
docs = [
    Document(page_content="LangChain makes it easy to work with LLMs."),
    Document(page_content="LangChain is used to build LLM based applications."),
    Document(page_content="Chroma is used to store and search document embeddings."),
    Document(page_content="Embeddings are vector representations of text."),
    Document(page_content="MMR helps you get diverse results when doing similarity search."),
    Document(page_content="LangChain supports Chroma, FAISS, Pinecone, and more."),
]

In [24]:
from langchain_huggingface import HuggingFaceEmbeddings
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

In [25]:
from langchain_community.vectorstores import FAISS
vector_store=FAISS.from_documents(documents=docs,
                                  embedding=embedding)


In [26]:
retriever=vector_store.as_retriever(  search_type="mmr", search_kwargs={"k": 3, "lambda_mult": 0.5,}
    )
#lambda_mult= how distributed  data you want

In [27]:
query = "What is langchain?"
results = retriever.invoke(query)

In [28]:
for i, doc in enumerate(results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)


--- Result 1 ---
LangChain supports Chroma, FAISS, Pinecone, and more.

--- Result 2 ---
LangChain is used to build LLM based applications.

--- Result 3 ---
Embeddings are vector representations of text.


# multiquery retriever


In [37]:

from langchain_classic.retrievers.multi_query import MultiQueryRetriever
from langchain_groq import ChatGroq
from dotenv import load_dotenv
load_dotenv()
model=ChatGroq(model="openai/gpt-oss-120b")



In [None]:
all_docs = [
    Document(page_content="Regular walking boosts heart health and can reduce symptoms of depression.", metadata={"source": "H1"}),
    Document(page_content="Consuming leafy greens and fruits helps detox the body and improve longevity.", metadata={"source": "H2"}),
    Document(page_content="Deep sleep is crucial for cellular repair and emotional regulation.", metadata={"source": "H3"}),
    Document(page_content="Mindfulness and controlled breathing lower cortisol and improve mental clarity.", metadata={"source": "H4"}),
    Document(page_content="Drinking sufficient water throughout the day helps maintain metabolism and energy.", metadata={"source": "H5"}),
    Document(page_content="The solar energy system in modern homes helps balance electricity demand.", metadata={"source": "I1"}),
    Document(page_content="Python balances readability with power, making it a popular system design language.", metadata={"source": "I2"}),
    Document(page_content="Photosynthesis enables plants to produce energy by converting sunlight.", metadata={"source": "I3"}),
    Document(page_content="The 2022 FIFA World Cup was held in Qatar and drew global energy and excitement.", metadata={"source": "I4"}),
    Document(page_content="Black holes bend spacetime and store immense gravitational energy.", metadata={"source": "I5"}),
]

In [35]:
from langchain_huggingface import HuggingFaceEmbeddings
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


In [36]:
vector_store=FAISS.from_documents(embedding=embedding,documents=all_docs)

In [53]:
multiquery_retriever = MultiQueryRetriever.from_llm(
    retriever=vector_store.as_retriever(search_kwargs={"k": 5}),
    llm=model
)

In [54]:

similarity_retriever=vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5, "lambda_mult": 0.25})

In [55]:
# Query
query = "How to improve energy levels and maintain balance?"

In [56]:
# Retrieve results
similarity_results = similarity_retriever.invoke(query)
multiquery_results= multiquery_retriever.invoke(query)

In [57]:
for i, doc in enumerate(similarity_results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)

print("*"*150)

for i, doc in enumerate(multiquery_results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)


--- Result 1 ---
Drinking sufficient water throughout the day helps maintain metabolism and energy.

--- Result 2 ---
The solar energy system in modern homes helps balance electricity demand.

--- Result 3 ---
Consuming leafy greens and fruits helps detox the body and improve longevity.

--- Result 4 ---
Mindfulness and controlled breathing lower cortisol and improve mental clarity.

--- Result 5 ---
Photosynthesis enables plants to produce energy by converting sunlight.
******************************************************************************************************************************************************

--- Result 1 ---
Drinking sufficient water throughout the day helps maintain metabolism and energy.

--- Result 2 ---
The solar energy system in modern homes helps balance electricity demand.

--- Result 3 ---
Mindfulness and controlled breathing lower cortisol and improve mental clarity.

--- Result 4 ---
Consuming leafy greens and fruits helps detox the body and impr

# contextual retriever

In [65]:
from langchain_classic.retrievers import ContextualCompressionRetriever
from langchain_classic.retrievers.document_compressors import LLMChainExtractor


In [61]:
# Recreate the document objects from the previous data
docs = [
    Document(page_content=(
        """The Grand Canyon is one of the most visited natural wonders in the world.
        Photosynthesis is the process by which green plants convert sunlight into energy.
        Millions of tourists travel to see it every year. The rocks date back millions of years."""
    ), metadata={"source": "Doc1"}),

    Document(page_content=(
        """In medieval Europe, castles were built primarily for defense.
        The chlorophyll in plant cells captures sunlight during photosynthesis.
        Knights wore armor made of metal. Siege weapons were often used to breach castle walls."""
    ), metadata={"source": "Doc2"}),

    Document(page_content=(
        """Basketball was invented by Dr. James Naismith in the late 19th century.
        It was originally played with a soccer ball and peach baskets. NBA is now a global league."""
    ), metadata={"source": "Doc3"}),

    Document(page_content=(
        """The history of cinema began in the late 1800s. Silent films were the earliest form.
        Thomas Edison was among the pioneers. Photosynthesis does not occur in animal cells.
        Modern filmmaking involves complex CGI and sound design."""
    ), metadata={"source": "Doc4"})
]

In [62]:
embedding_model=HuggingFaceEmbeddings(model="sentence-transformers/all-MiniLM-L6-v2")

In [63]:
vectorstore = FAISS.from_documents(docs, embedding_model)

In [66]:
base_retriever=vectorstore.as_retriever( search_kwargs={"k": 5,})
compressor=LLMChainExtractor.from_llm(llm=model)

In [68]:
compressor_retriever=ContextualCompressionRetriever(base_retriever=base_retriever,base_compressor=compressor)

In [70]:
# Query the retriever
query = "What is photosynthesis?"
compressed_results = compressor_retriever.invoke(query)

In [71]:
for i, doc in enumerate(compressed_results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)



--- Result 1 ---
Photosynthesis is the process by which green plants convert sunlight into energy.

--- Result 2 ---
The chlorophyll in plant cells captures sunlight during photosynthesis.

--- Result 3 ---
Photosynthesis does not occur in animal cells.
