# Lesson 3.5: Retrievers

---

In previous lessons, we learned how to load documents, split them into chunks, create embeddings, and store them in a Vector Store. Now, we need a mechanism to retrieve relevant text segments from the Vector Store when a user asks a question. This is the role of **Retrievers**.

## 1. Concept of Retrievers

### 1.1. What are Retrievers?

**Retrievers** are tools in LangChain designed to query relevant text segments (or documents) from a data source (typically a Vector Store) based on an input query. They act as a bridge between the user's question and your external knowledge base.

* **Relationship:** Retrievers are a core component of the **Retrieval-Augmented Generation (RAG)** architecture. They receive the user's query and return the relevant documents from your knowledge base. The surrounding chain then passes these documents, together with the query, to the LLM to generate the final answer.



### 1.2. The Role of Retrievers in RAG

In a RAG system, the processing flow typically goes as follows:
1.  The user poses a **query**.
2.  The **Retriever** receives this query.
3.  The Retriever uses a search mechanism (e.g., similarity search in a Vector Store) to identify the text segments (chunks) most relevant to the query.
4.  The Retriever returns these relevant text segments.
5.  These retrieved text segments are then provided as **context** to the LLM along with the original query for the LLM to generate the final answer.
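
The flow above can be sketched in plain Python. This is only an illustration, not LangChain code: `toy_retrieve`, `build_prompt`, and the word-overlap scoring are hypothetical stand-ins for a real Retriever and prompt template, and the LLM call itself is left out.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Steps 2-4: rank chunks by word overlap with the query, return the top k."""
    query_words = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(query_words & tokenize(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Step 5: combine the retrieved chunks and the query into one prompt."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

knowledge_base = [
    "Chroma is an open-source vector database.",
    "A retriever returns the chunks most relevant to a query.",
    "Agents allow LLMs to make decisions and use tools.",
]

query = "What does a retriever return?"  # Step 1: the user's query
context_chunks = toy_retrieve(query, knowledge_base, k=1)
prompt = build_prompt(query, context_chunks)
print(prompt)  # this string is what would be sent to the LLM
```

A real Retriever replaces the word-overlap ranking with a search over embeddings, but the shape of the pipeline stays the same: query in, chunks out, chunks plus query into the prompt.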


---

## 2. `VectorStoreRetriever`: Basic Retriever based on Vector Store Similarity Search

`VectorStoreRetriever` is the most common and basic type of Retriever in LangChain. It works by performing a similarity search on a configured Vector Store.

### 2.1. How it Works

When you create a `VectorStoreRetriever` from a `VectorStore` (like FAISS or Chroma), it will use the embedding model associated with that Vector Store to:
1.  Convert the user's query into an embedding vector.
2.  Perform a similarity search within the Vector Store to find the text segments (chunks) whose embeddings are closest to the query's embedding.
3.  Return the `Document` objects corresponding to those chunks.
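
The three steps above can be sketched without any external service. The sketch below makes two assumptions: `toy_embed` is a hypothetical bag-of-words embedding over a tiny fixed vocabulary (a real embedding model returns dense vectors), and cosine similarity is used as the closeness measure (the actual metric depends on how the Vector Store is configured).

```python
import math

# Tiny fixed vocabulary for the toy embedding (an assumption for illustration).
VOCAB = ["retriever", "vector", "store", "agent", "tool", "query"]

def toy_embed(text: str) -> list[float]:
    """Step 1 stand-in: count vocabulary words to get a fixed-length vector."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_search(query: str, chunks: list[str], k: int) -> list[str]:
    """Steps 2-3: score every chunk against the query and return the k closest."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

chunks = [
    "an agent picks a tool",
    "a retriever sends the query to the vector store",
    "the vector store holds embeddings",
]
top_chunks = similarity_search("how does a retriever query a vector store", chunks, k=2)
print(top_chunks)
```

A production Vector Store avoids scoring every chunk by using an index, but the ranking idea is the same.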

### 2.2. Configuring `k` (Number of Documents to Return)

The `k` parameter is one of the most important configurations for `VectorStoreRetriever`.

* **Concept:** `k` determines the number of top relevant text segments (chunks) that the Retriever will return from the Vector Store.
* **Impact:**
    * **Small `k`:** The LLM will receive less context. This can be good if you want the LLM to focus on very specific information or if you are dealing with strict token limits. However, there's a risk of missing important information.
    * **Large `k`:** The LLM will receive more context. This helps ensure all relevant information is included, but can lead to the LLM being "overwhelmed" with information or exceeding token limits.
* **Selection:** The optimal value of `k` depends on the length of your chunks, the token limit of the LLM you are using, and the complexity of the questions. Typically, a `k` value between 3 and 5 is a good starting point.
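
To make the trade-off concrete, here is a rough back-of-the-envelope helper. It assumes about 4 characters per token, which is only a common rule of thumb for English text, and both function names are hypothetical. Use your model's tokenizer when you need an exact count.

```python
def estimate_context_tokens(k: int, chunk_size_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough upper bound on the tokens consumed by k retrieved chunks."""
    return round(k * chunk_size_chars / chars_per_token)

def max_k_for_budget(token_budget: int, chunk_size_chars: int, chars_per_token: float = 4.0) -> int:
    """Largest k whose chunks still fit in the tokens reserved for context."""
    tokens_per_chunk = chunk_size_chars / chars_per_token
    return int(token_budget // tokens_per_chunk)

# With 500-character chunks, see how the context grows with k.
for k in (2, 4, 8):
    print(k, estimate_context_tokens(k, chunk_size_chars=500))

# How many 500-character chunks fit if 1000 tokens are reserved for context?
print(max_k_for_budget(token_budget=1000, chunk_size_chars=500))
```

Remember to leave room in the token limit for the question, the prompt instructions, and the generated answer, not just the retrieved chunks.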

### 2.3. Practical Example with `VectorStoreRetriever` (using Chroma)

We will use Chroma (from Lesson 3.4) to create a Vector Store, then turn it into a Retriever and perform a query.

**Preparation:**
* Ensure you have `langchain`, `langchain-community`, `langchain-openai`, and `chromadb` installed (the example imports from all of them).
* Set the `OPENAI_API_KEY` environment variable.

```python
# Install libraries if not already installed
# pip install langchain langchain-community langchain-openai openai chromadb

import os
import shutil
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Set environment variable for OpenAI API key
# os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

# 1. Initialize Embeddings model
embeddings_model = OpenAIEmbeddings(model="text-embedding-ada-002")

# 2. Prepare sample documents
long_text_content = """
LangChain is an open-source framework designed to help developers build applications powered by Large Language Models (LLMs) more easily and efficiently. It provides a set of tools, components, and abstractions to simplify complex processes related to LLMs, from prompt management to connecting LLMs with external data sources and tools.

Key components of LangChain include:
- Models: Interfaces for LLMs and Chat Models.
- Prompts: Tools for constructing and managing prompts for LLMs.
- Chains: Connecting components together to form a processing flow.
- Agents: Allowing LLMs to make decisions and use tools.
- Retrieval: Tools for loading, splitting, embedding, and retrieving documents.
- Memory: Maintaining state and context in conversations.

Retrieval-Augmented Generation (RAG) is a popular architecture in LLM applications. RAG allows an LLM to retrieve information from an external knowledge base (often a Vector Store) before generating a response. This helps improve accuracy and reduce LLM "hallucinations."

To build a RAG system, the main steps typically include:
1. Document Loading.
2. Text Splitting into chunks.
3. Creating embeddings for text chunks.
4. Storing embeddings in a Vector Store.
5. When queried, searching for relevant chunks in the Vector Store.
6. Passing relevant chunks and the query to the LLM to generate an answer.

Chroma is an open-source vector database, easy to use, supporting local and client/server storage.
"""

doc = Document(page_content=long_text_content, metadata={"source": "langchain_guide"})

# Split the document into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len,
    add_start_index=True
)
chunks = text_splitter.split_documents([doc])

# 3. Create Chroma Vector Store from chunks
# Start from a clean directory so results from earlier runs do not leak in
persist_directory = "./chroma_db_retriever_demo"
if os.path.exists(persist_directory):
    shutil.rmtree(persist_directory)

print(f"Creating Chroma Vector Store at: {persist_directory}...")
vector_store = Chroma.from_documents(
    chunks,
    embeddings_model,
    persist_directory=persist_directory
)
print("Chroma Vector Store created successfully.")

# 4. Convert Vector Store to Retriever
# Configure k=2 to retrieve the 2 most relevant text segments
retriever = vector_store.as_retriever(search_kwargs={"k": 2})

print("\nRetriever created with k=2.")

# 5. Perform query with Retriever
query = "What are the steps to build a RAG system?"
print(f"\nPerforming query with Retriever: '{query}'")

retrieved_docs = retriever.invoke(query)

print("\n--- Documents retrieved by Retriever ---")
for i, doc_retrieved in enumerate(retrieved_docs):
    print(f"Document {i+1} (Source: {doc_retrieved.metadata.get('source', 'Unknown')}):")
    print(f"  Content: {doc_retrieved.page_content[:200]}...")
    print(f"  Metadata: {doc_retrieved.metadata}")
    print("-" * 30)

# Clean up Chroma directory
if os.path.exists(persist_directory):
    shutil.rmtree(persist_directory)
    print(f"\nChroma directory '{persist_directory}' removed.")
```

**Explanation:**
* `vector_store.as_retriever(search_kwargs={"k": 2})`: This is how you convert a Vector Store (Chroma here) into a Retriever. `search_kwargs` is a dictionary to pass custom parameters for the search process, in this case `k=2` to specify the number of documents to retrieve.
* `retriever.invoke(query)`: Invokes the Retriever with the user's question. The Retriever will perform internal steps (embedding the query, searching the Vector Store) and return a list of relevant `Document`s.


---

## 3. Advanced Retriever Types (Introduction)

LangChain provides various advanced Retriever types to address more complex challenges in information retrieval. Here's an introduction to some common ones:

### 3.1. `ContextualCompressionRetriever`

* **Concept:** The `ContextualCompressionRetriever` works by retrieving a larger initial set of documents (from a base Retriever), and then uses an LLM or another model to "compress" or "filter" the information within those documents, keeping only the most important parts directly relevant to the query.
* **When to Use:** When you are concerned that retrieved documents might contain a lot of noise or irrelevant information, "diluting" the context for the main LLM. It helps ensure the LLM receives only concise and high-quality context.
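* **How it Works:** It combines a base Retriever (any `BaseRetriever`, e.g., a `VectorStoreRetriever`) with a document compressor (a `BaseDocumentCompressor`, e.g., `LLMChainExtractor` or `LLMChainFilter`). The base Retriever fetches the candidate documents, and the compressor then shortens or discards them based on the query.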
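
The retrieve-then-compress idea can be sketched in plain Python. This is a conceptual illustration, not the LangChain API: `toy_compress` is a hypothetical stand-in that keeps sentences sharing a keyword with the query, whereas a real compressor would typically ask an LLM which parts are relevant. The stopword list is also an assumption made for the example.

```python
import re

STOPWORDS = {"where", "does", "the", "a", "is", "it"}

def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_compress(query: str, document: str) -> str:
    """Keep only the sentences that share a non-stopword with the query."""
    query_words = tokenize(query) - STOPWORDS
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    kept = [s for s in sentences if query_words & tokenize(s)]
    return " ".join(kept)

def compressed_retrieve(query: str, base_results: list[str]) -> list[str]:
    """Compress each document from the base retriever; drop documents left empty."""
    compressed = [toy_compress(query, doc) for doc in base_results]
    return [doc for doc in compressed if doc]

# Pretend these came back from a base retriever with a generous k.
base_results = [
    "Chroma stores embeddings on disk. It was covered in Lesson 3.4. "
    "Chroma also supports client/server mode.",
    "Agents use tools. Agents make decisions.",
]
result = compressed_retrieve("Where does Chroma keep embeddings?", base_results)
print(result)
```

The first document is shortened to its relevant sentences and the second is dropped entirely, which loosely mirrors the two behaviors of extracting and filtering.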



### 3.2. `MultiQueryRetriever`

* **Concept:** Instead of searching based on a single query, the `MultiQueryRetriever` uses an LLM to generate several alternative phrasings of the user's original query. It then runs a search for each variant and returns the combined, de-duplicated set of results.
* **When to Use:** When the user's query might be ambiguous or have multiple interpretations, or when you want to ensure a more comprehensive search to avoid missing relevant information.
* **Benefit:** Increases the likelihood of finding relevant documents by exploring different facets of the query.
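
The merge step can be illustrated with a small self-contained sketch. It is not the LangChain implementation: the query variants are hand-written here (in practice an LLM would generate them), and `toy_retrieve` is a hypothetical word-overlap search standing in for a Vector Store query.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_retrieve(query: str, chunks: list[str], k: int) -> list[str]:
    """Return up to k chunks that share at least one word with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(c)), c) for c in chunks]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in hits[:k]]

def multi_query_retrieve(query_variants: list[str], chunks: list[str], k: int) -> list[str]:
    """Search with every variant and return the unique union in first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for variant in query_variants:
        for chunk in toy_retrieve(variant, chunks, k):
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged

chunks = [
    "RAG retrieves documents before generating an answer.",
    "Retrieval-augmented generation reduces hallucinations.",
    "Chroma stores vectors.",
]
# Hand-written variants standing in for LLM-generated rephrasings.
variants = ["What is RAG?", "What is retrieval-augmented generation?"]

single = toy_retrieve(variants[0], chunks, k=1)
merged = multi_query_retrieve(variants, chunks, k=1)
print(single)
print(merged)
```

The single query finds only the chunk that uses the abbreviation, while the merged result also picks up the chunk that spells the term out.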



### 3.3. `ParentDocumentRetriever`

* **Concept:** The `ParentDocumentRetriever` addresses the challenge where you want to search on small text segments (chunks) for high precision, but also want to provide the LLM with broader context from the "parent" document containing those segments. It stores both the small chunks and larger documents. During retrieval, it searches on the small chunks, but then returns the corresponding parent document.
* **When to Use:** When you need a balance between the precision of searching on small chunks and the need for full context for the LLM. For example, when you search for a small detail in a book, but want the LLM to have the entire chapter or relevant section to answer.
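* **How it Works:** Uses two stores: a Vector Store that indexes the small chunks, and a document store (docstore, e.g., an in-memory store) that holds the parent documents. Each chunk records the ID of its parent, so a match on a chunk can be traced back to the parent document that is returned.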
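
The chunk-to-parent lookup can be sketched as follows. This is a conceptual illustration with hypothetical names: a plain dictionary plays the role of the parent document store, a list of `(chunk, parent_id)` pairs plays the role of the chunk index, and word overlap replaces the embedding search.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_child_index(parents: dict[str, str]) -> list[tuple[str, str]]:
    """Split each parent into sentences and tag every sentence with its parent id."""
    index = []
    for parent_id, text in parents.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            index.append((sentence, parent_id))
    return index

def parent_retrieve(query: str, child_index: list[tuple[str, str]],
                    parents: dict[str, str], k: int = 2) -> list[str]:
    """Search the small chunks, then return the unique parents of the top k hits."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(child)), pid) for child, pid in child_index]
    hits = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)[:k]
    parent_ids: list[str] = []
    for _, pid in hits:
        if pid not in parent_ids:  # several chunks may share one parent
            parent_ids.append(pid)
    return [parents[pid] for pid in parent_ids]

parents = {
    "ch1": "LangChain provides models and prompts. Agents let an LLM call tools.",
    "ch2": "Documents are split into chunks. Chunks are embedded and stored in Chroma.",
}
child_index = build_child_index(parents)
result = parent_retrieve("Where are chunks stored?", child_index, parents, k=2)
print(result)
```

Both matching sentences belong to the same parent, so the caller receives that parent once, with its full text as context.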



These advanced Retriever types allow you to build more sophisticated RAG systems, optimizing the information retrieval process to provide the best possible context to the LLM, thereby improving answer quality.


---

## Lesson Summary

This lesson introduced **Retrievers** in LangChain, essential tools for querying and fetching relevant text segments from a Vector Store. We delved into **`VectorStoreRetriever`**, the most basic Retriever type, and how to configure the `k` parameter to control the number of documents returned. Through a practical example with Chroma, you saw how to create and use a `VectorStoreRetriever` to retrieve information. Finally, the lesson provided an overview of advanced Retriever types such as **`ContextualCompressionRetriever`** (context compression), **`MultiQueryRetriever`** (multi-query generation), and **`ParentDocumentRetriever`** (parent document retrieval), opening the door to building more complex and effective RAG systems.