# RAGLab Tutorial: Setup and Environment Configuration

**Welcome to RAGLab!** This notebook series walks you through RAGLab's RAG evaluation framework, including the new component registry, which lets you test multiple implementations of a component against each other.

## What You'll Learn

This tutorial series covers:

1. **Setup & Configuration** (this notebook) - Environment setup and component registry introduction
2. **Ingest & Index** - Document chunking and FAISS indexing with component comparison
3. **Retrieval Evaluation** - BEIR-style retrieval metrics across different implementations  
4. **Agent Evaluation** - Complete RAG pipeline evaluation with LLM-as-Judge
5. **Analysis & Comparison** - Results analysis and component performance comparison

## RAGLab Overview

RAGLab is a **local-first** RAG evaluation framework featuring:

- **🏛️ Component Registry**: Compare multiple implementations side-by-side
- **⚖️ LLM-as-Judge**: Multi-stage evaluation with insurance risk semantics
- **📊 BEIR Metrics**: Standard retrieval evaluation (Recall@K, Precision@K, nDCG@K)
- **🔍 Meta-Evaluation**: Judge reliability assessment and bias detection
- **📓 Notebook-Driven**: Interactive development and analysis workflow
- **🏠 Local Storage**: No cloud dependencies, complete data ownership
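
To make the retrieval metrics named above concrete, here is a minimal sketch of Recall@K, Precision@K, and nDCG@K with binary relevance. The function names and the example chunk IDs are illustrative assumptions, not RAGLab's API; the framework's own implementation may differ (for example in how it handles graded relevance).

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k slots that are filled by relevant IDs."""
    if k <= 0:
        return 0.0
    relevant = set(relevant)
    return sum(1 for cid in retrieved[:k] if cid in relevant) / k

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    relevant = set(relevant)
    # A hit at 0-based rank r contributes 1 / log2(r + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, cid in enumerate(retrieved[:k])
        if cid in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical ranking: the single relevant chunk is returned in second place.
retrieved = ["chunk_2", "chunk_0", "chunk_4"]
relevant = ["chunk_0"]

print("Recall@3:   ", recall_at_k(retrieved, relevant, k=3))
print("Precision@3:", precision_at_k(retrieved, relevant, k=3))
print("nDCG@3:     ", ndcg_at_k(retrieved, relevant, k=3))
```

Recall rewards finding the relevant chunk anywhere in the top K, precision penalizes the irrelevant chunks that came with it, and nDCG additionally penalizes the relevant chunk for not being ranked first.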

## Key Innovation: Component Registry

RAGLab's registry system allows you to:
- Register multiple implementations of each component type
- Switch between implementations with simple name changes
- Compare performance across different approaches
- Test new implementations against established baselines
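
The registry idea in the list above can be sketched in a few lines. This is a hypothetical illustration of the pattern only: `ComponentRegistry`, its methods, and the two toy chunkers are assumptions made for this example and are not RAGLab's actual classes.

```python
from typing import Callable, Dict, List

class ComponentRegistry:
    """Hypothetical registry mapping (component type, name) to a factory."""

    def __init__(self) -> None:
        self._factories: Dict[str, Dict[str, Callable]] = {}

    def register(self, component_type: str, name: str) -> Callable:
        """Decorator that stores a factory under the given type and name."""
        def decorator(factory: Callable) -> Callable:
            self._factories.setdefault(component_type, {})[name] = factory
            return factory
        return decorator

    def create(self, component_type: str, name: str, **kwargs):
        """Build a component by name; switching implementations is a name change."""
        try:
            factory = self._factories[component_type][name]
        except KeyError:
            raise KeyError(f"No {component_type!r} component named {name!r}") from None
        return factory(**kwargs)

    def names(self, component_type: str) -> List[str]:
        """List the registered implementation names for one component type."""
        return sorted(self._factories.get(component_type, {}))

registry = ComponentRegistry()

@registry.register("chunker", "fixed")
def fixed_chunker(size: int = 20):
    # Baseline: split text into fixed-size character windows.
    return lambda text: [text[i:i + size] for i in range(0, len(text), size)]

@registry.register("chunker", "sentence")
def sentence_chunker():
    # Alternative: split on periods and drop empty fragments.
    return lambda text: [s.strip() for s in text.split(".") if s.strip()]

text = "Deductibles come first. Coinsurance applies afterwards."
for name in registry.names("chunker"):
    chunker = registry.create("chunker", name)
    print(f"{name}: {len(chunker(text))} chunks")
```

Because every implementation is reached through the same `create` call, a comparison loop like the one at the end needs no changes when a new implementation is registered.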

Let's get started!

In [None]:
# Import required libraries
import sys
import os
from pathlib import Path
import numpy as np
import pandas as pd
import logging

# Add src to path
sys.path.append('../src')

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

print("✅ Basic imports successful")

In [None]:
# Import raglab modules
from core.io import DataLoader, RunManager
from core.interfaces import EvaluationExample, Query, Chunk

print("✅ RAGLab modules imported successfully")

In [None]:
# Check system requirements
try:
    import faiss
    print(f"✅ FAISS version: {faiss.__version__}")
except ImportError:
    print("❌ FAISS not installed. Install with: pip install faiss-cpu")

try:
    import yaml
    print("✅ YAML support available")
except ImportError:
    print("❌ PyYAML not installed. Install with: pip install pyyaml")

# Check directory structure
base_path = Path('..')
required_dirs = ['data', 'artifacts', 'runs', 'src', 'docs']

for dir_name in required_dirs:
    dir_path = base_path / dir_name
    if dir_path.exists():
        print(f"✅ {dir_name}/ directory exists")
    else:
        print(f"❌ {dir_name}/ directory missing")
        dir_path.mkdir(parents=True, exist_ok=True)
        print(f"  → Created {dir_name}/ directory")
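
If the list of dependencies grows, the repeated `try`/`except ImportError` blocks can be replaced by one table-driven check. The sketch below uses the standard library's `importlib.util.find_spec`, which locates a module without importing it. The helper name `missing_packages` and the exact package list are assumptions for illustration.

```python
import importlib.util

def missing_packages(requirements):
    """Return the pip names of top-level modules that cannot be found.

    `requirements` maps an importable module name to its pip package name.
    """
    return [
        pip_name
        for module_name, pip_name in requirements.items()
        if importlib.util.find_spec(module_name) is None
    ]

# Module name -> pip package name (the two can differ, as with faiss and yaml).
requirements = {
    "faiss": "faiss-cpu",
    "yaml": "pyyaml",
    "numpy": "numpy",
    "pandas": "pandas",
}

missing = missing_packages(requirements)
if missing:
    print("Missing packages. Install with: pip install " + " ".join(missing))
else:
    print("All required packages are available")
```

Unlike the cell above, this check does not report version numbers, so it complements the explicit imports rather than replacing them.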

## Configuration

Set up your LLM and embedding providers here.

In [None]:
# Example LLM function (replace with your actual provider)
def example_llm_function(prompt: str, temperature: float = 0.1, max_tokens: int = 500) -> str:
    """
    Replace this with your actual LLM API call.
    
    Example providers:
    - OpenAI: openai.OpenAI().chat.completions.create(...)
    - Azure OpenAI: openai.AzureOpenAI(...).chat.completions.create(...)
    - Anthropic: anthropic.Anthropic().messages.create(...)
    """
    # This is a mock implementation
    return f"Mock LLM response to: {prompt[:50]}..."

# Example embedding function (replace with your actual provider) 
def example_embedding_function(texts: list) -> np.ndarray:
    """
    Replace this with your actual embedding API call.
    
    Example providers:
    - OpenAI: openai.embeddings.create(model="text-embedding-ada-002", input=texts)
    - Sentence Transformers: model.encode(texts)
    - Cohere: co.embed(texts=texts, model="embed-english-v3.0")
    """
    # This is a mock implementation - returns random embeddings
    return np.random.random((len(texts), 768))

print("✅ Configuration functions defined")
print("⚠️  Remember to replace mock functions with real API calls")
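
The mock embedding function above returns fresh random vectors on every call, so the same text gets a different embedding each time and results are not reproducible between runs. If you want to keep a mock while developing, one option is a deterministic variant that seeds a generator from a hash of each text. This is a sketch under that assumption; `deterministic_mock_embeddings` is a hypothetical helper, not part of RAGLab. It returns `float32` vectors because FAISS indexes operate on `float32` arrays.

```python
import hashlib
import numpy as np

def deterministic_mock_embeddings(texts, dim=768):
    """Return one unit-length float32 vector per text, stable across calls.

    The vectors carry no semantic meaning; they only make mock runs repeatable.
    """
    if not texts:
        return np.empty((0, dim), dtype=np.float32)
    vectors = []
    for text in texts:
        # Derive a 64-bit seed from the text so equal texts get equal vectors.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        seed = int.from_bytes(digest[:8], "big")
        rng = np.random.default_rng(seed)
        vec = rng.standard_normal(dim).astype(np.float32)
        vectors.append(vec / np.linalg.norm(vec))
    return np.vstack(vectors)

first = deterministic_mock_embeddings(["test text", "another test"])
second = deterministic_mock_embeddings(["test text"])
print(first.shape, first.dtype)
print("Stable across calls:", np.allclose(first[0], second[0]))
```

Retrieval metrics computed with any mock embedding are meaningless, so treat this only as a way to exercise the pipeline end to end before a real provider is configured.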

## Sample Data Creation

Create some sample data for testing the evaluation pipeline.

In [None]:
# Create sample corpus
sample_documents = [
    "Health insurance copayments are fixed amounts you pay for covered services. For example, you might pay $20 for a doctor visit.",
    "Deductibles are amounts you must pay before your insurance begins to pay. A $1,000 deductible means you pay the first $1,000 of covered services.",
    "Coinsurance is the percentage you pay after meeting your deductible. With 20% coinsurance, you pay 20% and insurance pays 80%.",
    "Out-of-pocket maximums limit your yearly costs. Once you reach this limit, insurance pays 100% of covered services.",
    "Prior authorization requires approval before certain services. Emergency services typically don't require prior authorization."
]

# Save sample corpus
corpus_df = pd.DataFrame({
    'doc_id': [f'doc_{i}' for i in range(len(sample_documents))],
    'text': sample_documents,
    'source': ['sample'] * len(sample_documents)
})

loader = DataLoader(base_path='..')
loader.save_corpus(corpus_df, 'data/corpus.parquet')

print(f"✅ Created sample corpus with {len(sample_documents)} documents")
print("📄 Saved to data/corpus.parquet")

In [None]:
# Create sample evaluation tasks
sample_tasks = [
    {
        "example_id": "task_001",
        "question": "What is a copayment in health insurance?",
        "reference_answer": "A copayment is a fixed amount you pay for covered services, such as $20 for a doctor visit.",
        "ground_truth_chunk_ids": ["chunk_0"],
        "beir_failure_scale_factor": 1.0
    },
    {
        "example_id": "task_002", 
        "question": "How does a deductible work?",
        "reference_answer": "A deductible is an amount you must pay before insurance begins to pay. For example, with a $1,000 deductible, you pay the first $1,000 of covered services.",
        "ground_truth_chunk_ids": ["chunk_1"],
        "beir_failure_scale_factor": 1.0
    },
    {
        "example_id": "task_003",
        "question": "What happens after I reach my out-of-pocket maximum?",
        "reference_answer": "Once you reach your out-of-pocket maximum, insurance pays 100% of covered services for the rest of the year.",
        "ground_truth_chunk_ids": ["chunk_3"],
        "beir_failure_scale_factor": 1.0
    }
]

# Save sample tasks
loader.save_tasks(sample_tasks, 'data/tasks.jsonl')

print(f"✅ Created {len(sample_tasks)} sample evaluation tasks")
print("📄 Saved to data/tasks.jsonl")
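
The `.jsonl` extension conventionally means JSON Lines: one JSON object per line. The sketch below shows that format with the standard library only. It assumes, without confirmation from the RAGLab source, that `loader.save_tasks` and `loader.load_tasks` follow this convention; `write_jsonl` and `read_jsonl` are hypothetical helpers, and the file is written to a temporary directory rather than `data/`.

```python
import json
import tempfile
from pathlib import Path

def write_jsonl(records, path):
    """Write each record as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]

example_tasks = [
    {
        "example_id": "task_001",
        "question": "What is a copayment in health insurance?",
        "ground_truth_chunk_ids": ["chunk_0"],
        "beir_failure_scale_factor": 1.0,
    }
]

with tempfile.TemporaryDirectory() as tmp_dir:
    tasks_path = Path(tmp_dir) / "tasks.jsonl"
    write_jsonl(example_tasks, tasks_path)
    round_tripped = read_jsonl(tasks_path)

print("Round trip preserved the tasks:", round_tripped == example_tasks)
```

One object per line keeps the file appendable and easy to inspect or diff, which suits evaluation sets that grow over time.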

## Validation

Validate that everything is set up correctly.

In [None]:
# Test loading data
loaded_corpus = loader.load_corpus('data/corpus.parquet')
loaded_tasks = loader.load_tasks('data/tasks.jsonl')

print(f"✅ Loaded corpus: {len(loaded_corpus)} documents")
print(f"✅ Loaded tasks: {len(loaded_tasks)} evaluation examples")

# Display sample data
print("\n📊 Sample corpus:")
print(loaded_corpus.head())

print("\n📊 Sample tasks:")
for task in loaded_tasks[:2]:
    print(f"  - {task['example_id']}: {task['question'][:50]}...")
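
Counting the loaded records confirms the files are readable, but not that each task has the fields later notebooks rely on. A structural check can catch that early. The sketch below takes its field names from the sample tasks above; whether RAGLab requires all of them is an assumption, and `validate_tasks` is a hypothetical helper.

```python
# Field name -> accepted Python type(s), taken from the sample tasks above.
REQUIRED_TASK_FIELDS = {
    "example_id": str,
    "question": str,
    "reference_answer": str,
    "ground_truth_chunk_ids": list,
    "beir_failure_scale_factor": (int, float),
}

def validate_tasks(tasks):
    """Return a list of problems; an empty list means the tasks look well-formed."""
    problems = []
    seen_ids = set()
    for index, task in enumerate(tasks):
        label = task.get("example_id", f"<task at index {index}>")
        for field, expected_type in REQUIRED_TASK_FIELDS.items():
            if field not in task:
                problems.append(f"{label}: missing field '{field}'")
            elif not isinstance(task[field], expected_type):
                problems.append(f"{label}: field '{field}' has unexpected type")
        if label in seen_ids:
            problems.append(f"{label}: duplicate example_id")
        seen_ids.add(label)
    return problems

good_task = {
    "example_id": "task_001",
    "question": "What is a copayment in health insurance?",
    "reference_answer": "A fixed amount you pay for covered services.",
    "ground_truth_chunk_ids": ["chunk_0"],
    "beir_failure_scale_factor": 1.0,
}
bad_task = {"example_id": "task_002", "question": "How does a deductible work?"}

for problem in validate_tasks([good_task, bad_task]):
    print("Problem:", problem)
```

In the notebook you would call `validate_tasks(loaded_tasks)` and expect an empty list for the sample data.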

In [None]:
# Test basic functionality
print("🧪 Testing basic functions...")

# Test LLM function
test_response = example_llm_function("Test prompt", temperature=0.1, max_tokens=50)
print(f"✅ LLM function: {test_response[:50]}...")

# Test embedding function
test_embeddings = example_embedding_function(["test text", "another test"])
print(f"✅ Embedding function: shape {test_embeddings.shape}")

print("\n🎉 Setup complete! Ready for notebook 01_ingest_and_index.ipynb")