# RAG Playground: Dynamic Strategy Lab

This notebook lets you experiment with different RAG indexing and retrieval strategies using the generalized framework's **Dynamic Boolean Chaining**, where each strategy is switched on or off with a boolean flag.

## Objectives
1. **Load Documents**: Use the robust multi-format loader.
2. **Index**: Create a VectorStore (testing chunking strategies).
3. **Retrieve**: Test different strategies by toggling boolean flags.
4. **Chain**: Combine strategies (e.g. MultiQuery + Compression).

In [None]:
%load_ext autoreload
%autoreload 2

import sys
import os
from pathlib import Path
import logging

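# Setup paths: make the project root importable so that `src.*` resolves
project_root = Path("..").resolve()
if str(project_root) not in sys.path:  # avoid duplicate entries when the cell is re-run
    sys.path.append(str(project_root))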

# Logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

from src.config import get_config, RAGConfig
from src.document_loader import CVDocumentLoader
from src.embeddings import EmbeddingsManager
from src.vector_store import VectorStoreManager
from src.retriever_factory import RetrieverFactory
from src.parent_retriever import CVParentRetriever
from langchain_openai import AzureChatOpenAI

print("Libraries loaded.")

## 1. Setup & Data Loading

In [None]:
# Load Config
config = get_config()
print(f"Data Directory: {config.rag.data_directory_ntb}")

# Initialize Loader
loader = CVDocumentLoader(config.rag.data_directory_ntb)
candidates = loader.load_all_cvs()
documents = loader.convert_to_langchain_documents(candidates)

print(f"Loaded {len(documents)} documents.")

## 2. Infrastructure: Embeddings & VectorStore

In [None]:
# Embeddings
embeddings_mgr = EmbeddingsManager(config.azure)
embedding_model = embeddings_mgr.get_embeddings()

# VectorStore
vs_manager = VectorStoreManager(config.rag, embedding_model)
vectorstore = vs_manager.create_or_load_vectorstore()
print("VectorStore ready.")

# LLM (needed for advanced strategies)
llm = AzureChatOpenAI(
    azure_deployment=config.azure.llm_deployment,
    openai_api_version=config.azure.api_version,
    temperature=0
)

# Initialize Parent/Child Retriever (the original/default indexer)
parent_retriever_indexer = CVParentRetriever(config.rag, vectorstore, config.azure)

# Reuse the existing store instead of rebuilding it; this fails if nothing has been indexed yet
try:
    parent_retriever_indexer.load_from_existing_store()
except Exception as e:
    print(f"Could not load existing store: {e}. You may need to run training/indexing first.")
else:
    # Only report success when the store was actually loaded
    print("Parent Retriever (Indexer) ready.")

## 3. Experiment: Dynamic Strategy Chaining

We now use boolean flags to compose our retriever pipeline.
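
Before wiring the flags into the real config, it can help to see the idea in isolation. The following is a minimal sketch, assuming a frozen dataclass stands in for the real `RAGConfig` (whose definition is not shown in this notebook). `ToyRAGConfig` and `apply_flags` are hypothetical names used only for illustration. Flags that are not supplied fall back to their defaults, and a misspelled flag name raises an error instead of being silently ignored.

```python
from dataclasses import dataclass, fields

# Hypothetical stand-in for the real RAGConfig; field names mirror the flags
# used in this notebook, but the actual class may differ.
@dataclass(frozen=True)
class ToyRAGConfig:
    use_parent_document_retrieval: bool = True
    use_hybrid_search: bool = False
    use_multi_query: bool = False
    use_contextual_compression: bool = False
    use_self_query: bool = False
    top_k: int = 3

def apply_flags(flags, k=3):
    """Return a fresh config built from defaults plus the given flags."""
    # Only boolean strategy flags may be set through `flags`; top_k comes from `k`.
    allowed = {f.name for f in fields(ToyRAGConfig)} - {"top_k"}
    unknown = set(flags) - allowed
    if unknown:
        raise ValueError(f"Unknown flag(s): {sorted(unknown)}")
    return ToyRAGConfig(**flags, top_k=k)

cfg = apply_flags({"use_multi_query": True}, k=5)
print(cfg)
```

Building a new config object per experiment, rather than mutating a shared one, keeps runs independent of each other. The cells below keep the simpler mutate-in-place approach.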

In [None]:
def test_retriever(retriever, query, name="Custom"):
    """Run `query` through `retriever` and print a short summary of each hit."""
    print(f"\n--- Testing Strategy: {name} ---")
    try:
        docs = retriever.invoke(query)
        print(f"Query: '{query}'")
        print(f"Retrieved {len(docs)} docs:")
        for i, doc in enumerate(docs, 1):
            source = doc.metadata.get('candidate_name', 'Unknown')
            # Build the preview outside the f-string: a backslash inside an
            # f-string expression is a SyntaxError before Python 3.12.
            preview = doc.page_content[:150].replace('\n', ' ')
            print(f"  {i}. {source} (Length: {len(doc.page_content)})")
            print(f"     Preview: {preview}...")
    except Exception as e:
        print(f"Error: {e}")

def create_and_test(name, flags, query, k=3):
    """Apply strategy flags to the shared config, build a retriever, and run `query`.

    Flags missing from `flags` are reset to their defaults on every call,
    so settings from a previous experiment do not leak into the next one.
    Note that this mutates the global `config.rag`.
    """
    # Default value for each strategy flag
    defaults = {
        "use_parent_document_retrieval": True,
        "use_hybrid_search": False,
        "use_multi_query": False,
        "use_contextual_compression": False,
        "use_self_query": False,
    }
    for flag, default in defaults.items():
        setattr(config.rag, flag, flags.get(flag, default))

    config.rag.top_k = k

    print(f"Configuring: {flags}")

    # Factory creation: the factory reads the flags from `config`
    retriever = RetrieverFactory.create_retriever(
        config=config,
        vectorstore=vectorstore,
        llm=llm,
        parent_retriever=parent_retriever_indexer
    )
    test_retriever(retriever, query, name)

In [None]:
TEST_QUERY = "candidates with python and azure experience"

# 1. Standard Vector Search (No Parent Doc, No Extras)
create_and_test(
    "Vector Only", 
    {"use_parent_document_retrieval": False}, 
    TEST_QUERY
)

# 2. Parent Document (The Default)
create_and_test(
    "Parent Document", 
    {"use_parent_document_retrieval": True}, 
    TEST_QUERY
)

# 3. Multi-Query (Wraps Parent)
create_and_test(
    "Multi-Query", 
    {"use_parent_document_retrieval": True, "use_multi_query": True}, 
    TEST_QUERY
)

# 4. Compression (Wraps Parent)
create_and_test(
    "Compression", 
    {"use_parent_document_retrieval": True, "use_contextual_compression": True}, 
    TEST_QUERY
)

## 4. Chaining: The Full Power
To chain strategies, set more than one flag to `True`.
The factory applies the wrappers in a fixed order: Base -> Hybrid -> MultiQuery -> Compression.
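
The wrapping order can be illustrated with plain functions. This is a toy sketch, not the actual `RetrieverFactory` implementation, which is not shown in this notebook. All names here (`Flags`, `base_retriever`, `with_hybrid`, `with_multi_query`, `with_compression`, `build_retriever`) are hypothetical, and each "retriever" is simply a function from a query string to a list of strings.

```python
from dataclasses import dataclass
from typing import Callable, List

# In this toy model a retriever maps a query string to a list of text snippets.
Retriever = Callable[[str], List[str]]

@dataclass
class Flags:
    use_hybrid_search: bool = False
    use_multi_query: bool = False
    use_contextual_compression: bool = False

def base_retriever(query: str) -> List[str]:
    # Stand-in for a vector or parent-document lookup.
    return [f"doc for '{query}'"]

def with_hybrid(inner: Retriever) -> Retriever:
    # Adds a fake keyword-search hit to the inner retriever's results.
    def retrieve(query: str) -> List[str]:
        return inner(query) + [f"keyword hit for '{query}'"]
    return retrieve

def with_multi_query(inner: Retriever) -> Retriever:
    # Runs the inner retriever on several query variants and de-duplicates,
    # preserving first-seen order. A real version would ask an LLM for variants.
    def retrieve(query: str) -> List[str]:
        variants = [query, f"{query} (rephrased)"]
        seen, results = set(), []
        for variant in variants:
            for doc in inner(variant):
                if doc not in seen:
                    seen.add(doc)
                    results.append(doc)
        return results
    return retrieve

def with_compression(inner: Retriever, max_chars: int = 20) -> Retriever:
    # Truncation stands in for LLM-based contextual compression.
    def retrieve(query: str) -> List[str]:
        return [doc[:max_chars] for doc in inner(query)]
    return retrieve

def build_retriever(flags: Flags) -> Retriever:
    # Each enabled flag wraps the retriever built so far:
    # Base -> Hybrid -> MultiQuery -> Compression.
    retriever: Retriever = base_retriever
    if flags.use_hybrid_search:
        retriever = with_hybrid(retriever)
    if flags.use_multi_query:
        retriever = with_multi_query(retriever)
    if flags.use_contextual_compression:
        retriever = with_compression(retriever)
    return retriever

chained = build_retriever(Flags(use_multi_query=True, use_contextual_compression=True))
for snippet in chained("python and azure"):
    print(snippet)
```

Because compression is the outermost wrapper, it sees the merged, de-duplicated output of every query variant. Reversing the order would compress each variant's results before they are merged.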

In [None]:
# 5. CHAINED: MultiQuery + Compression
create_and_test(
    "CHAINED: MultiQuery + Compression",
    {
        "use_parent_document_retrieval": True,
        "use_multi_query": True,
        "use_contextual_compression": True
    },
    TEST_QUERY
)