# Vector Stores and Retrievers: The Foundation of RAG

This video tutorial introduces LangChain's `VectorStore` and `Retriever` abstractions. These components retrieve data from vector databases and other sources so that it can be used in Large Language Model (LLM) workflows. They are essential for applications that fetch external data for the model to reason over, as in Retrieval-Augmented Generation (RAG) systems.

We will cover:
* **Documents**: The basic unit of data in LangChain.
* **Vector Stores**: Databases optimized for storing and querying vector embeddings.
* **Retrievers**: LangChain's standardized interface for fetching relevant documents.
* **A Simple RAG Example**: Combining these components into an end-to-end RAG chain.

## Key Concepts:

* **Retrieval-Augmented Generation (RAG)**: A technique where an LLM's response is improved by first retrieving relevant information from an external knowledge base (often a vector store) and then using that information as context for generation.
* **`Document`**: LangChain's abstraction for a unit of text, comprising `page_content` (the text itself) and `metadata` (a dictionary of arbitrary information about the document).
* **`VectorStore`**: A database (like Chroma, FAISS, Pinecone) that stores numerical vector representations (embeddings) of text, allowing for efficient similarity searches.
* **`Embedding`**: A dense numerical vector that captures the semantic meaning of text.
* **`HuggingFaceEmbeddings`**: LangChain's integration to generate embeddings using models hosted on Hugging Face (e.g., `all-MiniLM-L6-v2`).
* **`Chroma`**: A popular open-source, AI-native vector database often used for local development and persistence.
* **`Retriever`**: A LangChain component that defines a standard way to retrieve documents based on a query. Retrievers are `Runnable`s, meaning they can be easily integrated into LCEL chains.
* **`RunnableLambda`**: An LCEL construct to turn any Python function into a `Runnable`.
* **`as_retriever()`**: A common method on `VectorStore` objects to convert them into a standard LangChain `Retriever`.
* **`search_type` and `search_kwargs`**: Parameters used when creating a `VectorStoreRetriever` to specify how the underlying vector store should perform its search (e.g., "similarity", "mmr") and with what parameters (e.g., `k` for number of results).
* **LangChain Expression Language (LCEL)**: The `|` operator for chaining components (`Runnable`s) together, facilitating complex workflows.
* **`RunnablePassthrough`**: An LCEL component that simply passes its input through to the next component in a chain. Used in RAG to pass the original question.
* **`ChatPromptTemplate`**: Used to structure the input to a chat model, incorporating both the question and the retrieved context.

## Setup: API Keys and Environment Variables

Ensure your `.env` file is configured with `GROQ_API_KEY` and optionally `HF_TOKEN`.


### Documents
LangChain implements a `Document` abstraction that represents a unit of text and its associated metadata. It has two attributes:

- `page_content`: a string holding the content;
- `metadata`: a dict of arbitrary metadata.

The `metadata` attribute can capture the document's source, its relationship to other documents, and other information. Note that an individual `Document` object often represents a chunk of a larger document.

Before creating sample documents, load the API keys and initialize the chat model that the RAG example uses later:

In [2]:
# Install necessary libraries (uncomment and run if not already installed)
# !pip install langchain langchain-chroma langchain_groq langchain_huggingface python-dotenv

import os
from dotenv import load_dotenv

# Load environment variables from the .env file.
load_dotenv()

# --- API Keys ---
# Get Groq API key from environment variables.
groq_api_key = os.getenv("GROQ_API_KEY")
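# Re-export the Hugging Face token only if it is set (it is optional for some models).
# Assigning None to os.environ would raise a TypeError when HF_TOKEN is missing.
hf_token = os.getenv("HF_TOKEN")
if hf_token:
    os.environ["HF_TOKEN"] = hf_token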

# Import ChatGroq for using Groq's fast inference models.
from langchain_groq import ChatGroq

# Initialize a ChatGroq model.
# "Llama3-8b-8192" is a fast open-weight model served via Groq.
# Model availability on Groq changes over time; if this ID is rejected, check Groq's current model list.
llm = ChatGroq(groq_api_key=groq_api_key, model="Llama3-8b-8192")

# Display the initialized LLM object.
print("--- Initialized Groq LLM ---")
print(llm)

--- Initialized Groq LLM ---
client=<groq.resources.chat.completions.Completions object at 0x114302ba0> async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x114303770> model_name='Llama3-8b-8192'


### Documents: The Basic Unit of Data

With the model initialized, we can create a few sample `Document` objects. Each one pairs its text with metadata recording the source, which lets us trace retrieved text back to where it came from.

In [3]:
# Import the Document class from langchain_core.documents.
from langchain_core.documents import Document

# Create a list of sample Document objects.
# Each Document has `page_content` (the actual text) and `metadata` (a dictionary).
documents = [
    Document(
        page_content="Dogs are great companions, known for their loyalty and friendliness.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Cats are independent pets that often enjoy their own space.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Goldfish are popular pets for beginners, requiring relatively simple care.",
        metadata={"source": "fish-pets-doc"},
    ),
    Document(
        page_content="Parrots are intelligent birds capable of mimicking human speech.",
        metadata={"source": "bird-pets-doc"},
    ),
    Document(
        page_content="Rabbits are social animals that need plenty of space to hop around.",
        metadata={"source": "mammal-pets-doc"},
    ),
]

# Display the list of created documents.
print("--- Sample Documents ---")
print(documents)

--- Sample Documents ---
[Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.'), Document(metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'), Document(metadata={'source': 'fish-pets-doc'}, page_content='Goldfish are popular pets for beginners, requiring relatively simple care.'), Document(metadata={'source': 'bird-pets-doc'}, page_content='Parrots are intelligent birds capable of mimicking human speech.'), Document(metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that need plenty of space to hop around.')]
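Because a `Document` often represents a chunk of a larger text, it can help to see how such chunks might be produced. The sketch below is only an illustration under stated assumptions: `SimpleDocument` and `chunk_text` are hypothetical names rather than LangChain APIs, and the splitting rule (break on spaces, at most `chunk_size` characters) is deliberately naive. For real use, LangChain ships its own text splitters.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in mirroring the two attributes of LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def chunk_text(text: str, source: str, chunk_size: int = 40) -> list[SimpleDocument]:
    """Split text on spaces into chunks of at most chunk_size characters."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > chunk_size and current:
            # Adding this word would overflow the chunk, so start a new one.
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Record the source and the chunk position in the metadata.
    return [
        SimpleDocument(page_content=chunk, metadata={"source": source, "chunk": i})
        for i, chunk in enumerate(chunks)
    ]


text = "Dogs are great companions. Cats are independent pets. Goldfish are popular pets."
docs = chunk_text(text, source="pets-guide")
for doc in docs:
    print(doc.metadata, repr(doc.page_content))
```

Each chunk keeps the same `source` plus its position, so a retrieved chunk can be traced back to the document it came from.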


### Embeddings: Converting Text to Vectors

Before storing documents in a vector store, they need to be converted into numerical vector embeddings. We'll use a Hugging Face embedding model for this.

In [4]:
# Install langchain_huggingface if not already installed.
# !pip install langchain_huggingface

# Import HuggingFaceEmbeddings for creating embeddings using Hugging Face models.
from langchain_huggingface import HuggingFaceEmbeddings

# Initialize HuggingFaceEmbeddings with a pre-trained model.
# "all-MiniLM-L6-v2" is a good general-purpose embedding model.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Display the embeddings object.
print("--- Initialized HuggingFace Embeddings ---")
print(embeddings)



### Vector Stores: Storing and Querying Embeddings

A vector store is a database designed to store vector embeddings and efficiently perform similarity searches (find vectors close to a given query vector). Here, we'll use Chroma, an open-source vector database.

In [5]:
# Import Chroma as our vector store.
from langchain_chroma import Chroma

# Create a Chroma vector store from our documents and embedding function.
# Chroma will embed each document using `embeddings` and store the resulting vectors.
vectorstore = Chroma.from_documents(documents, embedding=embeddings)

# Display the created vector store object.
print("--- Chroma Vector Store Created ---")
print(vectorstore)


--- Chroma Vector Store Created ---
<langchain_chroma.vectorstores.Chroma object at 0x3171f9400>
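To make the idea of a vector store concrete, here is a minimal brute-force sketch in plain Python. It is not how Chroma works internally (Chroma uses an approximate nearest-neighbor index); `TinyVectorStore` is a hypothetical class, and the 2-D vectors are hand-made stand-ins for real embeddings.

```python
import math


class TinyVectorStore:
    """Hypothetical brute-force vector store: keeps (vector, text) pairs in a list."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self._items.append((vector, text))

    def similarity_search(self, query: list[float], k: int = 4) -> list[str]:
        """Return the k texts whose vectors are closest (Euclidean distance) to the query."""
        ranked = sorted(self._items, key=lambda item: math.dist(item[0], query))
        return [text for _, text in ranked[:k]]


store = TinyVectorStore()
# Hand-made 2-D vectors; think of the axes as "mammal-ness" and "fish-ness".
store.add([0.9, 0.1], "Cats are independent pets.")
store.add([0.8, 0.2], "Dogs are great companions.")
store.add([0.1, 0.9], "Goldfish are popular pets for beginners.")

top_two = store.similarity_search([1.0, 0.0], k=2)
print(top_two)
```

The query vector sits near the two mammal vectors, so those texts are returned first. A real vector store adds persistence, metadata filtering, and an index that avoids scanning every vector.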


#### Similarity Search

You can query the vector store directly to find documents similar to a given text query.

In [6]:
# Perform a similarity search for the query "cat".
# The store embeds "cat" and returns the documents with the closest embeddings.
# `k` defaults to 4, which is why four documents come back below.
results = vectorstore.similarity_search("cat")

# Print the results. These are Document objects sorted by similarity.
print("--- Similarity Search Results for 'cat' ---")
print(results)


--- Similarity Search Results for 'cat' ---
[Document(id='9dc76707-687c-4ae1-90a0-90b6cda55f72', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'), Document(id='e88e6e6b-b762-4231-897a-beb00c6b862b', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.'), Document(id='c397515b-ef86-4ac6-a89d-3f3c5cc4c682', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that need plenty of space to hop around.'), Document(id='45da9439-feb0-4982-a46a-8a604f216e03', metadata={'source': 'bird-pets-doc'}, page_content='Parrots are intelligent birds capable of mimicking human speech.')]


#### Asynchronous Query

LangChain vector stores expose asynchronous counterparts of their search methods, prefixed with `a` (for example, `asimilarity_search`), so a query can be awaited without blocking other work in a concurrent application.

In [7]:
# Top-level `await` works here because Jupyter runs each cell inside an event loop.
# In a Python script, you'd define an async function and run it with `asyncio.run()`.
print("--- Asynchronous Similarity Search (Awaiting Result) ---")
async_results = await vectorstore.asimilarity_search("cat")
print(async_results)

--- Asynchronous Similarity Search (Awaiting Result) ---
[Document(id='9dc76707-687c-4ae1-90a0-90b6cda55f72', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'), Document(id='e88e6e6b-b762-4231-897a-beb00c6b862b', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.'), Document(id='c397515b-ef86-4ac6-a89d-3f3c5cc4c682', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that need plenty of space to hop around.'), Document(id='45da9439-feb0-4982-a46a-8a604f216e03', metadata={'source': 'bird-pets-doc'}, page_content='Parrots are intelligent birds capable of mimicking human speech.')]
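Outside a notebook there is no running event loop, so the `await` has to live inside a coroutine. The sketch below shows that script pattern with the standard library only. The blocking `similarity_search` function is a hypothetical stand-in for the vector store call; in a real script you would await `vectorstore.asimilarity_search(...)` instead.

```python
import asyncio
import time


def similarity_search(query: str) -> list[str]:
    """Hypothetical blocking search standing in for a vector store query."""
    time.sleep(0.1)  # simulate I/O latency
    return [f"result for {query!r}"]


async def asimilarity_search(query: str) -> list[str]:
    """Run the blocking search in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(similarity_search, query)


async def main() -> list[list[str]]:
    # Both searches run concurrently; gather returns results in input order.
    return await asyncio.gather(
        asimilarity_search("cat"),
        asimilarity_search("dog"),
    )


results = asyncio.run(main())
print(results)
```

Because the two searches overlap, the total wait is close to one search's latency rather than the sum of both.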


#### Similarity Search with Score

Many vector stores can return a score alongside each document. Whether a higher or lower score is better depends on the metric the vector store is configured with (for example, cosine similarity versus L2 distance). Chroma returns a distance (squared L2 by default, per its documentation, rather than cosine), so a lower score indicates higher similarity and 0 means identical.

In [8]:
# Perform a similarity search and also return the similarity score.
# The result is a list of tuples: (Document, score).
results_with_score = vectorstore.similarity_search_with_score("cat")

# Print the results, including the scores.
print("--- Similarity Search Results with Score for 'cat' ---")
print(results_with_score)

--- Similarity Search Results with Score for 'cat' ---
[(Document(id='9dc76707-687c-4ae1-90a0-90b6cda55f72', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'), 0.9351055026054382), (Document(id='e88e6e6b-b762-4231-897a-beb00c6b862b', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.'), 1.574090600013733), (Document(id='c397515b-ef86-4ac6-a89d-3f3c5cc4c682', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that need plenty of space to hop around.'), 1.595691204071045), (Document(id='45da9439-feb0-4982-a46a-8a604f216e03', metadata={'source': 'bird-pets-doc'}, page_content='Parrots are intelligent birds capable of mimicking human speech.'), 1.6657930612564087)]
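To interpret these numbers, it helps to relate distance to cosine similarity. The sketch below assumes two things that are not shown in this notebook: that the scores are squared L2 distances, and that the embedding vectors are normalized to unit length. Under those assumptions, squared L2 distance equals `2 - 2 * cosine_similarity`, so the scores fall between 0 and 4. The helper names are illustrative only.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

d2 = squared_l2(a, b)
cos = cosine_similarity(a, b)
# For unit vectors these two values agree up to floating-point error.
print(d2, 2 - 2 * cos)

# Under the same assumptions, the top score above maps back to a cosine similarity.
top_score = 0.9351055026054382
implied_cosine = 1 - top_score / 2
print(implied_cosine)
```

If the assumptions hold, the "cat" query and the cats document have a cosine similarity of roughly 0.53, while the other documents, with scores near 1.6, are close to 0.2.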


### Retrievers: Standardized Data Fetching

LangChain `VectorStore` objects do not subclass `Runnable`, so they cannot be chained directly with LCEL's `|` operator. LangChain `Retriever`s are `Runnable`s: they implement a standard set of methods (synchronous and asynchronous `invoke` and `batch`) and are designed to be incorporated into LCEL chains.

#### Creating a Retriever Manually (using `RunnableLambda`)

We can build a simple retriever ourselves, without subclassing `Retriever`, by choosing a search method and wrapping it in a `RunnableLambda`. Below we wrap the vector store's `similarity_search` method:

In [9]:
# Import RunnableLambda for creating a runnable from a function.
from langchain_core.runnables import RunnableLambda

# Create a retriever using RunnableLambda.
# We wrap `vectorstore.similarity_search` and bind the `k` parameter to 1 (return 1 result).
retriever_lambda = RunnableLambda(vectorstore.similarity_search).bind(k=1)

# Use the `batch` method on the retriever to get results for multiple queries.
# This demonstrates its `Runnable` capabilities.
batch_results_lambda = retriever_lambda.batch(["cat", "dog"])

# Print the batched results.
print("--- Batch Results using RunnableLambda Retriever ---")
print(batch_results_lambda)

--- Batch Results using RunnableLambda Retriever ---
[[Document(id='9dc76707-687c-4ae1-90a0-90b6cda55f72', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.')], [Document(id='e88e6e6b-b762-4231-897a-beb00c6b862b', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.')]]
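As a rough mental model, `.bind(k=1)` behaves like pre-filling a keyword argument, and `batch` behaves like calling the runnable once per input and collecting the results in order. The plain-Python analogue below is a sketch of that idea, not LangChain's implementation; `similarity_search` and `batch` here are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def similarity_search(query: str, k: int = 4) -> list[str]:
    """Hypothetical stand-in for a vector store search over a fixed corpus."""
    corpus = ["cat facts", "dog facts", "cat toys", "dog toys", "fish facts"]
    matches = [text for text in corpus if query in text]
    return matches[:k]


# Analogue of `.bind(k=1)`: fix the keyword argument ahead of time.
search_top1 = partial(similarity_search, k=1)


def batch(func, inputs):
    """Analogue of `batch`: call func on every input in parallel, preserving order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(func, inputs))


batch_results = batch(search_top1, ["cat", "dog"])
print(batch_results)
```

As in the notebook output, the result is a list of lists: one list of hits per query.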


#### Creating a Retriever from `as_retriever()`

Vector stores implement an `as_retriever()` method that returns a `Retriever`, specifically a `VectorStoreRetriever`. This is the idiomatic way to create a retriever from a vector store. The retriever's `search_type` and `search_kwargs` attributes determine which method of the underlying vector store is called and how it is parameterized. For instance, we can replicate the `RunnableLambda` retriever above with the following:

In [10]:
# Create a retriever using the `as_retriever` method of the vector store.
# search_type: Specifies the type of search (e.g., "similarity", "mmr").
# search_kwargs: A dictionary of parameters to pass to the underlying search method (e.g., "k" for top_k results).
retriever_as_retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1} # Retrieve only the top 1 most similar document
)

# Use the `batch` method to test the retriever.
batch_results_as_retriever = retriever_as_retriever.batch(["cat", "dog"])

# Print the batched results.
print("--- Batch Results using as_retriever() ---")
print(batch_results_as_retriever)

--- Batch Results using as_retriever() ---
[[Document(id='9dc76707-687c-4ae1-90a0-90b6cda55f72', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.')], [Document(id='e88e6e6b-b762-4231-897a-beb00c6b862b', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.')]]
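The `"mmr"` search type mentioned in the comments stands for maximal marginal relevance, which trades off relevance to the query against redundancy among the results already chosen. The sketch below is a minimal pure-Python version of that selection rule, written to illustrate the idea rather than to mirror any library's code. The `mmr` function and the toy 2-D vectors are assumptions for illustration, with `lambda_mult` named after the corresponding LangChain search parameter.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr(query, candidates, k=2, lambda_mult=0.5):
    """Return indices of k candidates chosen by maximal marginal relevance."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Highest similarity to anything already selected (0 if nothing is).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.1],    # 0: very close to the query
    [1.0, 0.12],   # 1: near-duplicate of candidate 0
    [0.8, -0.6],   # 2: less relevant, but different from candidate 0
]

diverse = mmr(query, candidates, k=2, lambda_mult=0.5)
relevance_only = mmr(query, candidates, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With `lambda_mult=0.5` the near-duplicate is skipped in favor of the more distinct candidate, while `lambda_mult=1.0` reduces to plain relevance ranking.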


### Simple RAG Application

Now, let's combine all these components to build a basic Retrieval-Augmented Generation (RAG) chain. This chain will:
1.  Take a user question.
2.  Retrieve relevant documents using the `retriever_as_retriever` retriever created above.
3.  Combine the question and context into a prompt.
4.  Send the prompt to the LLM to generate an answer.

In [11]:
# Import necessary components for building the RAG chain.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Define the prompt template for the LLM.
# It explicitly tells the LLM to answer "using the provided context only".
# "{question}" is for the user's query.
# "{context}" is for the retrieved documents.
message = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""
prompt = ChatPromptTemplate.from_messages([("human", message)])

# Build the RAG chain using LCEL.
# The chain takes the question as a plain string, not a dictionary.
# The dict {"context": ..., "question": ...} is coerced into a parallel step
# that sends the same input string to both branches:
#   - "context": the retriever receives the question and returns a list of Documents.
#   - "question": `RunnablePassthrough()` forwards the question unchanged.
# The resulting dict is piped to `prompt`, which fills in the template
# (the Document list is inserted as its string representation).
# The formatted messages are then piped to the `llm`.
rag_chain = {"context": retriever_as_retriever, "question": RunnablePassthrough()} | prompt | llm

# Invoke the RAG chain with a question.
response = rag_chain.invoke("tell me about dogs")

# Print the content of the LLM's response.
# The answer should be based on the dog document we provided earlier.
print("--- RAG Chain Response ---")
print(response.content)

--- RAG Chain Response ---
According to the provided context, dogs are great companions, known for their loyalty and friendliness.
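To see what the chain does before the model is called, the data flow can be traced in plain Python. This is a sketch of the flow only: `retrieve`, `passthrough`, and `fan_out` are hypothetical stand-ins for the retriever, `RunnablePassthrough`, and the dict step, and the keyword-overlap retrieval is a toy replacement for embedding search.

```python
TEMPLATE = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""

CORPUS = [
    "Dogs are great companions, known for their loyalty and friendliness.",
    "Cats are independent pets that often enjoy their own space.",
]


def retrieve(question: str) -> list[str]:
    """Toy retriever: return texts that share at least one word with the question."""
    words = set(question.lower().split())
    return [
        text for text in CORPUS
        if words & set(text.lower().rstrip(".").split())
    ]


def passthrough(value):
    """Analogue of RunnablePassthrough: return the input unchanged."""
    return value


def fan_out(branches: dict, value) -> dict:
    """Analogue of the dict step: feed the same input to every branch."""
    return {key: branch(value) for key, branch in branches.items()}


question = "tell me about dogs"
inputs = fan_out({"context": retrieve, "question": passthrough}, question)
prompt_text = TEMPLATE.format(**inputs)
print(prompt_text)
```

The printed prompt contains the original question and the retrieved text, which is what the LLM would receive in the final step of the chain.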
