# RAG Pipeline Optimization with RAGExperiment

This notebook demonstrates:
1. Setting up a document dataset and evaluation set
2. Using `RAGExperiment` to test multiple RAG pipeline configurations
3. Automatic evaluation with Precision@K, Recall@K, and MRR metrics
4. Ranking and selecting the best pipeline configuration
5. Integrating the optimized pipeline with LangChain for Q&A

In [None]:
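import os
# Replace the placeholder with your key, or export OPENAI_API_KEY before starting the notebook
os.environ.setdefault('OPENAI_API_KEY', '<your_openai_api_key>')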

In [24]:
from typing import List, Dict, Any
import json

# RAGExperiment for automated pipeline optimization
from rag_select import RAGExperiment, ExperimentResults

# Component implementations for search space
from rag_select.parameter_impls.chunking_impls import (
    SlidingWindowChunking,
    LangChainRecursiveChunking,
)
from rag_select.parameter_impls.embedding_impls import (
    HuggingFaceEmbedding,
)
from rag_select.parameter_impls.reranking_impls import (
    CrossEncoderReranker,
)

# LangChain imports for Q&A (consume RAGArtifact as a retriever)
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

## 1. Setup: Create Sample Document Dataset

We first define the search space of pipeline components to test, then create a small sample corpus of RAG-related documents together with an evaluation set.

In [25]:
# Define the search space - lists of component instances to test
# RAGExperiment will run the Cartesian product of all combinations

chunking_variants = [
    SlidingWindowChunking(chunk_size=128, chunk_overlap=20),
    SlidingWindowChunking(chunk_size=256, chunk_overlap=50),
    SlidingWindowChunking(chunk_size=512, chunk_overlap=100),
    LangChainRecursiveChunking(chunk_size=512, chunk_overlap=50),
]

embedding_variants = [
    HuggingFaceEmbedding(model_name='sentence-transformers/all-MiniLM-L6-v2'),
]

# Retriever config: pass dicts with params (storage/embedding injected automatically)
retriever_variants = [
    {"top_k": 3},
    {"top_k": 5},
    {"top_k": 5, "reranker": CrossEncoderReranker(model_name='cross-encoder/ms-marco-MiniLM-L-6-v2')},
]

print(f"Chunking variants: {len(chunking_variants)}")
print(f"Embedding variants: {len(embedding_variants)}")
print(f"Retriever variants: {len(retriever_variants)}")
print(f"\nTotal configurations: {len(chunking_variants) * len(embedding_variants) * len(retriever_variants)}")

Chunking variants: 4
Embedding variants: 1
Retriever variants: 3

Total configurations: 12
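
The total of 12 is the product of the three list lengths (4 × 1 × 3). As a rough illustration of how such a grid can be enumerated, the sketch below uses `itertools.product` over placeholder labels. The labels are made up for this example, and this is not `RAGExperiment`'s internal code.

```python
from itertools import product

# Placeholder labels standing in for the component instances defined above
chunking_labels = ["sliding-128", "sliding-256", "sliding-512", "recursive-512"]
embedding_labels = ["all-MiniLM-L6-v2"]
retriever_labels = ["top_k=3", "top_k=5", "top_k=5+reranker"]

# One tuple per (chunking, embedding, retriever) combination
grid = list(product(chunking_labels, embedding_labels, retriever_labels))

print(len(grid))  # 12
for config_id, combo in enumerate(grid[:3], start=1):
    print(config_id, combo)
```

With this ordering the retriever varies fastest, so every third entry uses the reranker. Whether `RAGExperiment` assigns its config IDs in the same order is an assumption, not something the notebook confirms.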


In [26]:
# -----------------------------
# Define corpus documents
# -----------------------------

documents = [
    {
        "doc_id": "doc_1",
        "text": "Retrieval-Augmented Generation (RAG) combines information retrieval with text generation to improve factual accuracy."
    },
    {
        "doc_id": "doc_2",
        "text": "Vector databases store embeddings and enable efficient similarity search for retrieval systems."
    },
    {
        "doc_id": "doc_3",
        "text": "Chunking strategies such as sliding windows affect recall and precision in RAG pipelines."
    },
    {
        "doc_id": "doc_4",
        "text": "Embedding models map text into dense vector representations used for semantic search."
    },
]


# -----------------------------
# Define evaluation dataset
# -----------------------------

eval_dataset = [
    {
        "query": "What is RAG?",
        "relevant_doc_ids": ["doc_1"]
    },
    {
        "query": "What do vector databases do?",
        "relevant_doc_ids": ["doc_2"]
    },
    {
        "query": "How does chunking affect RAG systems?",
        "relevant_doc_ids": ["doc_3"]
    },
    {
        "query": "What are embeddings used for?",
        "relevant_doc_ids": ["doc_4"]
    },
]

print(f"Documents: {len(documents)}")
print(f"Evaluation queries: {len(eval_dataset)}")

Documents: 4
Evaluation queries: 4
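
Because the ground truth refers to documents by `doc_id`, a mistyped ID would silently lower every metric. A small sanity check along the following lines can catch that before running the experiment. `find_unknown_doc_ids` is a hypothetical helper written for this example, not part of `rag_select`.

```python
def find_unknown_doc_ids(documents, eval_dataset):
    """Return (query, doc_id) pairs whose doc_id is missing from the corpus."""
    known_ids = {doc["doc_id"] for doc in documents}
    return [
        (item["query"], doc_id)
        for item in eval_dataset
        for doc_id in item["relevant_doc_ids"]
        if doc_id not in known_ids
    ]


# Tiny stand-in data using the same schema as the cell above
sample_documents = [
    {"doc_id": "doc_1", "text": "..."},
    {"doc_id": "doc_2", "text": "..."},
]
sample_eval = [
    {"query": "What is RAG?", "relevant_doc_ids": ["doc_1"]},
    {"query": "Typo example", "relevant_doc_ids": ["doc_9"]},
]

unknown = find_unknown_doc_ids(sample_documents, sample_eval)
print(unknown)  # [('Typo example', 'doc_9')]
```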


## 2. Create Evaluation Set with Ground Truth

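The queries and their relevant document IDs (the ground truth) were defined in the previous cell. Here we pass them to `RAGExperiment` together with the documents and the search space, then run every configuration.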

In [27]:
# Create and run the experiment using RAGExperiment
experiment = RAGExperiment(
    dataset=eval_dataset,
    documents=documents,
    search_space={
        "chunking": chunking_variants,
        "embedding": embedding_variants,
        "retriever": retriever_variants,
    },
    metrics=["precision@3", "precision@5", "recall@5", "mrr"],
)

# Run all experiments (Cartesian product of all combinations)
results = experiment.run(verbose=True)

Running 12 experiments...

[1/12] SimpleIngestion | SlidingWindowChunking | HuggingFaceEmbedding | SimpleStorage | SimpleRetriever
  precision@3: 0.4167
  precision@5: 0.2500
  recall@5: 1.2500
  mrr: 0.8750

[2/12] SimpleIngestion | SlidingWindowChunking | HuggingFaceEmbedding | SimpleStorage | SimpleRetriever
  precision@3: 0.4167
  precision@5: 0.2500
  recall@5: 1.2500
  mrr: 0.8750

[3/12] SimpleIngestion | SlidingWindowChunking | HuggingFaceEmbedding | SimpleStorage | SimpleRetriever
  precision@3: 0.4167
  precision@5: 0.2500
  recall@5: 1.2500
  mrr: 1.0000

[4/12] SimpleIngestion | SlidingWindowChunking | HuggingFaceEmbedding | SimpleStorage | SimpleRetriever
  precision@3: 0.3333
  precision@5: 0.2000
  recall@5: 1.0000
  mrr: 0.8750

[5/12] SimpleIngestion | SlidingWindowChunking | HuggingFaceEmbedding | SimpleStorage | SimpleRetriever
  precision@3: 0.3333
  precision@5: 0.2000
  recall@5: 1.0000
  mrr: 0.8750

[6/12] SimpleIngestion | SlidingWindowChunking | HuggingFaceEmb

## 3. Define Evaluation Metrics

`RAGExperiment` computes the retrieval metrics automatically; `MetricsCalculator` exposes the same metrics for manual use.

In [28]:
# Metrics are automatically calculated by RAGExperiment:
# - precision@k: Fraction of retrieved docs that are relevant (at cutoff k)
# - recall@k: Fraction of relevant docs that were retrieved (at cutoff k)
# - mrr: Mean Reciprocal Rank - measures how high the first relevant doc ranks

# The MetricsCalculator class is available for manual use if needed:
from rag_select.experiment import MetricsCalculator

# Example usage:
retrieved = ["doc_1", "doc_3", "doc_2"]
relevant = ["doc_1"]

print(f"Precision@3: {MetricsCalculator.precision_at_k(retrieved, relevant, 3):.4f}")
print(f"Recall@3: {MetricsCalculator.recall_at_k(retrieved, relevant, 3):.4f}")
print(f"MRR: {MetricsCalculator.mrr(retrieved, relevant):.4f}")

Precision@3: 0.3333
Recall@3: 1.0000
MRR: 1.0000
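
To make the definitions concrete, here is a minimal pure-Python sketch of the three metrics for a single query. It is written for illustration and is not `MetricsCalculator`'s source. It also shows one plausible reason (an assumption, not confirmed by the output) for the `recall@5` of 1.2500 in the experiment log: if hits are counted per retrieved chunk, two chunks from the same relevant document count twice. For example, per-query hit counts of 2, 1, 1, 1 against one relevant document each would average to (2 + 1 + 1 + 1) / 4 = 1.25.

```python
def precision_at_k(retrieved, relevant, k):
    """Relevant hits among the top-k retrieved IDs, divided by k."""
    relevant_set = set(relevant)
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant_set)
    return hits / k


def recall_at_k_per_chunk(retrieved, relevant, k):
    """Counts every retrieved chunk as a hit, so duplicates can push this above 1."""
    relevant_set = set(relevant)
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant_set)
    return hits / len(relevant_set)


def recall_at_k_unique(retrieved, relevant, k):
    """Counts each relevant document at most once, so the result stays in [0, 1]."""
    relevant_set = set(relevant)
    return len(set(retrieved[:k]) & relevant_set) / len(relevant_set)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant ID, or 0.0 if none was retrieved."""
    relevant_set = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant_set:
            return 1.0 / rank
    return 0.0


# Two of the three retrieved chunks come from the same relevant document
retrieved_chunks = ["doc_1", "doc_3", "doc_1"]
relevant_docs = ["doc_1"]

print(f"{precision_at_k(retrieved_chunks, relevant_docs, 3):.4f}")         # 0.6667
print(f"{recall_at_k_per_chunk(retrieved_chunks, relevant_docs, 3):.4f}")  # 2.0000
print(f"{recall_at_k_unique(retrieved_chunks, relevant_docs, 3):.4f}")     # 1.0000
print(f"{reciprocal_rank(retrieved_chunks, relevant_docs):.4f}")           # 1.0000
```

MRR is the mean of `reciprocal_rank` over all evaluation queries. If recall values above 1 are undesirable, deduplicating retrieved document IDs before scoring (as in `recall_at_k_unique`) is one way to bound the metric.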


## 4. Define Pipeline Component Variations

The component variants were defined in Section 1; this cell recaps the search space and how `RAGExperiment` uses it.

In [29]:
# The search space was defined above with component instances:
# - chunking_variants: Different chunking strategies
# - embedding_variants: Different embedding models  
# - retriever_variants: Different retrieval configurations (top_k, reranker)

# RAGExperiment automatically:
# - Uses SimpleIngestion and SimpleStorage as defaults
# - Creates a fresh storage instance per config (storage is stateful)
# - Injects storage/embedding into retrievers

print("Search space:")
print(f"  Chunking: {len(chunking_variants)} variants")
print(f"  Embedding: {len(embedding_variants)} variants")
print(f"  Retriever: {len(retriever_variants)} variants")
print(f"\nTotal configs tested: {len(experiment.all_configs)}")

Search space:
  Chunking: 4 variants
  Embedding: 1 variants
  Retriever: 3 variants

Total configs tested: 12


## 5. Run Optimization Experiments

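The experiments already ran when `experiment.run()` was called above; here we rank the collected results by MRR, print the top configurations, and retrieve the best pipeline.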

In [30]:
# Rank results by MRR and get the best pipeline
results.rank(by="mrr")

# Print summary of top results
print(results.summary(top_k=5))

# Get the best pipeline as a LangChain-compatible retriever
best_artifact = results.get_best_pipeline()
print("\nBest pipeline retrieved and ready for use!")

Top 5 Results (ranked by mrr):

1. SimpleIngestion | SlidingWindowChunking | HuggingFaceEmbedding | SimpleStorage | SimpleRetriever
   Config ID: 3
   mrr: 1.0000
   precision@3: 0.4167
   precision@5: 0.2500
   recall@5: 1.2500

2. SimpleIngestion | SlidingWindowChunking | HuggingFaceEmbedding | SimpleStorage | SimpleRetriever
   Config ID: 6
   mrr: 1.0000
   precision@3: 0.3333
   precision@5: 0.2000
   recall@5: 1.0000

3. SimpleIngestion | SlidingWindowChunking | HuggingFaceEmbedding | SimpleStorage | SimpleRetriever
   Config ID: 9
   mrr: 1.0000
   precision@3: 0.3333
   precision@5: 0.2000
   recall@5: 1.0000

4. SimpleIngestion | LangChainRecursiveChunking | HuggingFaceEmbedding | SimpleStorage | SimpleRetriever
   Config ID: 12
   mrr: 1.0000
   precision@3: 0.3333
   precision@5: 0.2000
   recall@5: 1.0000

5. SimpleIngestion | SlidingWindowChunking | HuggingFaceEmbedding | SimpleStorage | SimpleRetriever
   Config ID: 1
   mrr: 0.8750
   precision@3: 0.4167
   precision@5: 
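
Four configurations tie at an MRR of 1.0000 in the summary above, so ranking by MRR alone leaves their relative order to the library. The sketch below breaks ties with a secondary metric. It uses plain dicts filled with values copied from the summary, not the `ExperimentResults` API, and `rank_rows` is a hypothetical helper.

```python
def rank_rows(rows, primary, secondary):
    """Sort descending by the primary metric, then by the secondary metric."""
    return sorted(rows, key=lambda row: (row[primary], row[secondary]), reverse=True)


# Values copied from the summary output above
rows = [
    {"config_id": 1, "mrr": 0.8750, "precision@3": 0.4167},
    {"config_id": 6, "mrr": 1.0000, "precision@3": 0.3333},
    {"config_id": 3, "mrr": 1.0000, "precision@3": 0.4167},
    {"config_id": 12, "mrr": 1.0000, "precision@3": 0.3333},
]

ranked = rank_rows(rows, primary="mrr", secondary="precision@3")
print([row["config_id"] for row in ranked])  # [3, 6, 12, 1]
```

Configs 6 and 12 remain tied on both metrics; Python's sort is stable, so they keep their input order.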

## 6. Use Best Pipeline for Q&A

Query the best pipeline directly, then plug it into a LangChain retrieval chain for Q&A.

In [31]:
# Test retrieval using the best artifact (RAGArtifact is a LangChain BaseRetriever)
test_query = "What is RAG?"
print(f"Query: {test_query}\n")

docs = best_artifact.invoke(test_query)
print("Retrieved Context:")
for i, doc in enumerate(docs, 1):
    print(f"{i}. {doc.page_content}")

Query: What is RAG?

Retrieved Context:
1. Retrieval-Augmented Generation (RAG) combines information retrieval with text generation to improve factual accuracy.
2. Chunking strategies such as sliding windows affect recall and precision in RAG pipelines.
3. accuracy.
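
The third result, `accuracy.`, looks like a short tail chunk left over after splitting the first document. The following character-based sketch shows how a sliding window can produce such a fragment. It is an illustration only: `SlidingWindowChunking` may split on tokens or words and may handle the tail differently.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap=0):
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=0))
# ['abcd', 'efgh', 'ij']  <- short tail fragment
```

A fragment like `'ij'` carries little meaning on its own but is still embedded and can be retrieved, which is one reason chunk size and overlap are worth including in the search space.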


In [32]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_template(
    """You are a helpful assistant. Use the provided context to answer the question.

Context:
{context}

Question: {input}
"""
)

# Create Q&A chain using the best artifact as retriever
doc_chain = create_stuff_documents_chain(llm, prompt)
qa_chain = create_retrieval_chain(best_artifact, doc_chain)

# Test with sample questions from the evaluation dataset
sample_questions = [
    eval_dataset[0]['query'],
    eval_dataset[1]['query'],
]

for q in sample_questions:
    out = qa_chain.invoke({"input": q})
    print("Q:", q)
    print("A:", out["answer"])
    print("-" * 80)

Q: What is RAG?
A: Retrieval-Augmented Generation (RAG) is a method that combines information retrieval with text generation to enhance the factual accuracy of generated content. It leverages external information sources to provide more reliable and contextually relevant responses.
--------------------------------------------------------------------------------
Q: What do vector databases do?
A: Vector databases store embeddings and enable efficient similarity search for retrieval systems.
--------------------------------------------------------------------------------


## 7. Save Best Configuration

Save the best pipeline configuration for later use.

In [33]:
# Save the best configuration info as JSON and the artifact itself with pickle
import pickle

# Get the best result from ranked results
best_result = list(results)[0]  # First item after ranking

best_info = {
    'config_name': best_result.config_name,
    'metrics': best_result.metrics,
    'component_params': best_result.component_params,
}

with open('best_rag_config.json', 'w') as f:
    json.dump(best_info, f, indent=2)
print("Saved best configuration info to best_rag_config.json")

with open('best_artifact.pkl', 'wb') as f:
    pickle.dump(best_artifact, f)
print("Saved best artifact to best_artifact.pkl")

Saved best configuration info to best_rag_config.json
Saved best artifact to best_artifact.pkl
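
Loading the files back mirrors the save step. The round trip below uses stand-in objects and a temporary directory so that it runs on its own; in the notebook, the same `json` and `pickle` calls would be applied to `best_rag_config.json` and `best_artifact.pkl`. Unpickling the real artifact assumes that `rag_select` and its dependencies are importable in the loading environment.

```python
import json
import pickle
import tempfile
from pathlib import Path

# Stand-in objects; in the notebook these would be best_info and best_artifact
stand_in_info = {"config_name": "example", "metrics": {"mrr": 1.0}}
stand_in_artifact = {"note": "placeholder for a picklable retriever"}

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "best_rag_config.json"
    artifact_path = Path(tmp_dir) / "best_artifact.pkl"

    # Save, as in the cell above
    config_path.write_text(json.dumps(stand_in_info, indent=2))
    artifact_path.write_bytes(pickle.dumps(stand_in_artifact))

    # Load back
    loaded_info = json.loads(config_path.read_text())
    # Only unpickle files you created yourself: pickle can execute arbitrary code
    loaded_artifact = pickle.loads(artifact_path.read_bytes())

print(loaded_info["metrics"])  # {'mrr': 1.0}
```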


## 8. Example: Creating a Custom Pipeline

This section shows how to assemble a pipeline manually from specific component instances, without `RAGExperiment`.

In [34]:
# Example: Creating a custom pipeline manually (without RAGExperiment)
from rag_select import RAGClient
from rag_select.parameter_impls.ingestion_impls import SimpleIngestion
from rag_select.parameter_impls.storage_impls import SimpleStorage
from rag_select.parameter_impls.retriever_impls import SimpleRetriever

# Create custom components
custom_chunking = SlidingWindowChunking(chunk_size=256, chunk_overlap=30)
custom_embedding = HuggingFaceEmbedding(model_name='sentence-transformers/all-MiniLM-L6-v2')
custom_storage = SimpleStorage()
custom_retriever = SimpleRetriever(
    storage=custom_storage,
    embedding=custom_embedding,
    top_k=5,
    reranker=CrossEncoderReranker(model_name='cross-encoder/ms-marco-MiniLM-L-6-v2')
)

# Build custom client
custom_client = RAGClient(
    ingestion=SimpleIngestion(),
    chunking=custom_chunking,
    embedding=custom_embedding,
    storage=custom_storage,
    retriever=custom_retriever,
)

# Ingest documents
custom_client.upload_documents(documents)

# Test retrieval
query = "What is RAG?"
results_manual = custom_client.retrieve(query)
print(f"Query: {query}\n")
print("Results:")
for i, result in enumerate(results_manual, 1):
    print(f"  {i}. {result['text'][:80]}...")

Query: What is RAG?

Results:
  1. Retrieval-Augmented Generation (RAG) combines information retrieval with text ge...
  2. Chunking strategies such as sliding windows affect recall and precision in RAG p...
  3. Vector databases store embeddings and enable efficient similarity search for ret...
  4. Embedding models map text into dense vector representations used for semantic se...
