<a href="https://colab.research.google.com/github/Levara/carnet--workshop-notebooks/blob/master/04_rag_experimentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Session 2.4: RAG Experimentation Framework

## Overview

This notebook is designed for **rapid experimentation** with RAG systems. Unlike notebook 03, which focuses on learning, this notebook lets you:

- ✅ Quickly test different chunk sizes
- ✅ Compare different embedding/LLM models
- ✅ Experiment with your own documents
- ✅ Run batch comparisons
- ✅ Configure everything from the bottom cells (no scrolling!)

**Quick Start:**
1. Run setup cells (1-3)
2. Upload your documents (see instructions below)
3. Jump to any experiment section at the bottom
4. All configuration is done via function parameters

## 1. Setup and Installation

In [None]:
# Install required packages
!pip install chromadb openai pypdf2 requests tabulate -q

print("✓ Packages installed")

In [None]:
# Imports
import chromadb
from chromadb.config import Settings
import json
from typing import List, Dict, Optional, Tuple, Any
from openai import OpenAI
import os
import PyPDF2
import re
from pathlib import Path
from tabulate import tabulate
from datetime import datetime

print("✓ Imports loaded")

In [None]:
# Configure OpenRouter API
from google.colab import userdata
os.environ['OPENROUTER_API_KEY'] = userdata.get('OPENROUTER_API_KEY')

OPENROUTER_API_KEY = os.getenv('OPENROUTER_API_KEY', None)

if not OPENROUTER_API_KEY:
    raise ValueError("Please set your OpenRouter API key in Colab secrets.")

print("✓ OpenRouter API configured")

## 2. Document Upload Instructions

### How to Upload Your Documents to Google Colab

You have **3 documents** that make up your knowledge base. Follow these steps:

#### **Method 1: Manual Upload (Recommended for small files)**

1. **Click the folder icon** 📁 in the left sidebar of Colab
2. **Click the upload button** ⬆️ (looks like a file with an arrow)
3. **Select your 3 documents** (PDF, TXT, or other text files)
4. **Wait for upload to complete** - you'll see the files appear in the file browser
5. **Note the file paths** - they'll be in `/content/filename.pdf` format

#### **Method 2: Mount Google Drive (Recommended for larger files or permanent storage)**

1. **Run the cell below** to mount your Google Drive
2. **Upload your documents** to a folder in Google Drive (e.g., `My Drive/RAG_Workshop/`)
3. **Access them** using paths like `/content/drive/MyDrive/RAG_Workshop/document.pdf`

```python
# Uncomment and run this to mount Google Drive
# from google.colab import drive
# drive.mount('/content/drive')
```

#### **Method 3: Direct Upload in Code**

Run the cell below to get an interactive file upload button:

```python
# Uncomment and run this to upload files via code
# from google.colab import files
# uploaded = files.upload()
# print(f"Uploaded files: {list(uploaded.keys())}")
```

---

### After Uploading

Once your files are uploaded, you'll specify their paths when running experiments:

```python
document_paths = [
    "/content/document1.pdf",
    "/content/document2.pdf",
    "/content/document3.txt"
]
```

**⚠️ Note**: Files uploaded directly to `/content/` are **temporary** and will be deleted when the runtime restarts. Use Google Drive for permanent storage.
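
Because uploaded files disappear after a runtime restart, it can help to confirm that every path exists before indexing. The following is a minimal sketch; the helper name `find_missing_paths` and the example paths are illustrative assumptions, not part of the notebook's core code.

```python
from pathlib import Path
from typing import List


def find_missing_paths(paths: List[str]) -> List[str]:
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Hypothetical paths; replace them with your own uploaded files.
document_paths = [
    "/content/document1.pdf",
    "/content/document2.pdf",
    "/content/document3.txt",
]

missing = find_missing_paths(document_paths)
if missing:
    print("Missing files (upload again or fix the path):")
    for p in missing:
        print(f"  - {p}")
else:
    print("All documents found.")
```

Running this check right after a restart shows which files need to be uploaded again.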

## 3. Core RAG Components

These are reusable classes and functions. **You don't need to modify these** - just run them once.

In [None]:
class OpenRouterEmbeddingFunction:
    """
    Custom embedding function for ChromaDB using OpenRouter.
    """
    def __init__(self, api_key: str, model: str):
        self.client = OpenAI(
            base_url="https://openrouter.ai/api/v1",
            api_key=api_key
        )
        self.model = model

    def __call__(self, input: List[str]) -> List[List[float]]:
        """Generate embeddings for documents."""
        response = self.client.embeddings.create(
            input=input,
            model=self.model
        )
        return [item.embedding for item in response.data]

    def embed_query(self, input: str) -> List[List[float]]:
        """Generate embedding for a single query."""
        response = self.client.embeddings.create(
            input=input,
            model=self.model
        )
        return [response.data[0].embedding]


def load_pdf(file_path: str) -> Dict[str, Any]:
    """Load PDF and extract text."""
    with open(file_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        full_text = ""
        for page in pdf_reader.pages:
            # extract_text() can yield nothing for image-only pages; fall back to ""
            full_text += (page.extract_text() or "") + "\n"

        return {
            "source": file_path,
            "full_text": full_text,
            "num_pages": len(pdf_reader.pages)
        }


def load_text_file(file_path: str) -> Dict[str, Any]:
    """Load plain text file."""
    with open(file_path, 'r', encoding='utf-8') as file:
        return {
            "source": file_path,
            "full_text": file.read(),
            "num_pages": 1
        }


def load_document(file_path: str) -> Dict[str, Any]:
    """Load document (auto-detect PDF or text)."""
    if file_path.lower().endswith('.pdf'):
        return load_pdf(file_path)
    else:
        return load_text_file(file_path)


def clean_text(text: str) -> str:
    """Clean extracted text."""
    text = re.sub(r'\n\s*\n', '\n\n', text)
    text = re.sub(r' +', ' ', text)
    text = re.sub(r'\n\d+\n', '\n', text)
    return text.strip()


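def chunk_text_fixed(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Chunk text using fixed word count with overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("Require chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    chunks = []
    step = chunk_size - overlap  # Stride between consecutive chunk starts

    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):
            # This chunk already reaches the end of the text; stop here to
            # avoid a trailing chunk made only of already-covered words
            break

    return chunks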


print("✓ Core functions defined")

In [None]:
class RAGExperiment:
    """
    Complete RAG system for experimentation.
    All configuration is done via constructor parameters.
    """

    def __init__(
        self,
        collection_name: str = "rag_experiment",
        embedding_model: str = "openai/text-embedding-3-small",
        llm_model: str = "openai/gpt-4o-mini",
        chunk_size: int = 500,
        overlap: int = 50,
        top_k: int = 5,
        temperature: float = 0.3,
        api_key: str = None
    ):
        """
        Initialize RAG experiment.

        Args:
            collection_name: ChromaDB collection name
            embedding_model: Model for embeddings (e.g., 'openai/text-embedding-3-small')
            llm_model: Model for generation (e.g., 'openai/gpt-4o-mini')
            chunk_size: Number of words per chunk
            overlap: Number of overlapping words
            top_k: Number of chunks to retrieve
            temperature: LLM temperature
            api_key: OpenRouter API key (uses env var if None)
        """
        self.collection_name = collection_name
        self.embedding_model = embedding_model
        self.llm_model = llm_model
        self.chunk_size = chunk_size
        self.overlap = overlap
        self.top_k = top_k
        self.temperature = temperature

        self.api_key = api_key or os.getenv('OPENROUTER_API_KEY')
        if not self.api_key:
            raise ValueError("API key required")

        # Initialize clients
        self.llm_client = OpenAI(
            base_url="https://openrouter.ai/api/v1",
            api_key=self.api_key
        )

        self.embedding_function = OpenRouterEmbeddingFunction(
            api_key=self.api_key,
            model=self.embedding_model
        )

        # ChromaDB client and collection (will be set during indexing)
        self.chroma_client = None
        self.collection = None

    def index_documents(
        self,
        document_paths: List[str],
        recreate: bool = True,
        verbose: bool = True
    ) -> Dict[str, Any]:
        """
        Load and index documents.

        Args:
            document_paths: List of file paths to documents
            recreate: If True, delete existing collection and recreate
            verbose: Print progress

        Returns:
            Dictionary with indexing statistics
        """
        if verbose:
            print(f"\n{'='*80}")
            print("INDEXING DOCUMENTS")
            print(f"{'='*80}")
            print(f"Collection: {self.collection_name}")
            print(f"Embedding model: {self.embedding_model}")
            print(f"Chunk size: {self.chunk_size} words")
            print(f"Overlap: {self.overlap} words")
            print(f"{'='*80}\n")

        # Initialize ChromaDB
        self.chroma_client = chromadb.Client()

        # Delete existing collection if requested
        if recreate:
            try:
                self.chroma_client.delete_collection(name=self.collection_name)
                if verbose:
                    print(f"✓ Deleted existing collection '{self.collection_name}'")
            except Exception:
                # Most likely the collection did not exist yet; nothing to delete
                pass

        # Create collection with cosine similarity
        self.collection = self.chroma_client.create_collection(
            name=self.collection_name,
            embedding_function=self.embedding_function,
            metadata={"hnsw:space": "cosine"}
        )

        # Process documents
        all_chunks = []
        all_ids = []
        all_metadata = []

        for doc_idx, doc_path in enumerate(document_paths):
            if verbose:
                print(f"Processing: {Path(doc_path).name}")

            # Load document
            doc = load_document(doc_path)

            # Clean and chunk
            cleaned_text = clean_text(doc['full_text'])
            chunks = chunk_text_fixed(cleaned_text, self.chunk_size, self.overlap)

            if verbose:
                print(f"  Created {len(chunks)} chunks")

            # Add to lists
            for chunk_idx, chunk in enumerate(chunks):
                all_chunks.append(chunk)
                all_ids.append(f"doc{doc_idx}_chunk{chunk_idx}")
                all_metadata.append({
                    "source": Path(doc_path).name,
                    "full_path": doc_path,
                    "chunk_index": chunk_idx,
                    "total_chunks": len(chunks)
                })

        # Add to ChromaDB
        if verbose:
            print(f"\nAdding {len(all_chunks)} chunks to ChromaDB...")

        self.collection.add(
            documents=all_chunks,
            ids=all_ids,
            metadatas=all_metadata
        )

        stats = {
            "num_documents": len(document_paths),
            "num_chunks": len(all_chunks),
            "chunk_size": self.chunk_size,
            "overlap": self.overlap
        }

        if verbose:
            print("\n✓ Indexing complete!")
            print(f"  Documents indexed: {stats['num_documents']}")
            print(f"  Total chunks: {stats['num_chunks']}")
            print(f"{'='*80}\n")

        return stats

    def query(
        self,
        question: str,
        top_k: int = None,
        temperature: float = None,
        show_sources: bool = False,
        language: str = "hr"
    ) -> Dict[str, Any]:
        """
        Query the RAG system.

        Args:
            question: User question
            top_k: Number of chunks to retrieve (uses default if None)
            temperature: LLM temperature (uses default if None)
            show_sources: Print retrieved sources
            language: Response language ('hr' or 'en')

        Returns:
            Dictionary with answer and metadata
        """
        if self.collection is None:
            raise ValueError("No documents indexed. Call index_documents() first.")

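        # Explicit None checks so that a valid temperature=0.0 is not replaced by the default
        top_k = self.top_k if top_k is None else top_k
        temperature = self.temperature if temperature is None else temperature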

        # Retrieve chunks
        results = self.collection.query(
            query_texts=[question],
            n_results=top_k
        )

        retrieved_chunks = results['documents'][0]
        metadatas = results['metadatas'][0]
        distances = results['distances'][0]
        similarities = [1 - d for d in distances]

        if show_sources:
            print(f"\n{'='*80}")
            print("RETRIEVED SOURCES")
            print(f"{'='*80}")
            for i, (chunk, meta, sim) in enumerate(zip(retrieved_chunks, metadatas, similarities), 1):
                print(f"\n[{i}] Similarity: {sim:.3f} | Source: {meta['source']}")
                print(f"    {chunk[:200]}...")
            print(f"{'='*80}\n")

        # Construct prompt
        context = "\n\n".join([
            f"[Dokument {i+1}]: {chunk}"
            for i, chunk in enumerate(retrieved_chunks)
        ])

        if language == "hr":
            prompt = f"""Ti si pametan asistent koji odgovara na pitanja na temelju dostavljenog konteksta.

Kontekst:
{context}

Pitanje: {question}

Upute:
- Koristi SAMO informacije iz dostavljenog konteksta
- Ako odgovor nije u kontekstu, reci "Nemam tu informaciju u dostavljenim dokumentima"
- Budi koncizan ali potpun u odgovoru
- Odgovaraj na hrvatskom jeziku

Odgovor:"""
        else:
            prompt = f"""You are a helpful assistant that answers questions based on the provided context.

Context:
{context}

Question: {question}

Instructions:
- Use ONLY information from the provided context
- If the answer is not in the context, say "I don't have this information in the provided documents"
- Be concise but complete in your answer

Answer:"""

        # Generate answer
        response = self.llm_client.chat.completions.create(
            model=self.llm_model,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
            max_tokens=1000
        )

        answer = response.choices[0].message.content

        return {
            "answer": answer,
            "sources": metadatas,
            "retrieved_chunks": retrieved_chunks,
            "similarities": similarities,
            "top_k": top_k,
            "temperature": temperature
        }

    def get_config(self) -> Dict[str, Any]:
        """Get current configuration."""
        return {
            "collection_name": self.collection_name,
            "embedding_model": self.embedding_model,
            "llm_model": self.llm_model,
            "chunk_size": self.chunk_size,
            "overlap": self.overlap,
            "top_k": self.top_k,
            "temperature": self.temperature
        }


print("✓ RAGExperiment class defined")

## 4. Comparison and Evaluation Utilities

In [None]:
def compare_configurations(
    document_paths: List[str],
    test_questions: List[str],
    configs: List[Dict[str, Any]],
    language: str = "hr"
) -> List[Dict[str, Any]]:
    """
    Compare multiple RAG configurations on the same questions.

    Args:
        document_paths: List of document paths to index
        test_questions: List of questions to test
        configs: List of configuration dicts (each with chunk_size, overlap, top_k, etc.)
        language: Response language

    Returns:
        List of result dictionaries
    """
    results = []

    for i, config in enumerate(configs, 1):
        print(f"\n{'='*80}")
        print(f"CONFIGURATION {i}/{len(configs)}")
        print(f"{'='*80}")
        print(f"Config: {config}")
        print(f"{'='*80}\n")

        # Create experiment with this config
        exp = RAGExperiment(**config)

        # Index documents
        exp.index_documents(document_paths, verbose=True)

        # Test all questions
        config_results = {
            "config": config,
            "questions": []
        }

        for q_idx, question in enumerate(test_questions, 1):
            print(f"\nQuestion {q_idx}/{len(test_questions)}: {question}")
            result = exp.query(question, show_sources=False, language=language)

            config_results["questions"].append({
                "question": question,
                "answer": result["answer"],
                "avg_similarity": sum(result["similarities"]) / len(result["similarities"]),
                "max_similarity": max(result["similarities"]),
                "num_sources": len(result["sources"])
            })

            print(f"  Answer: {result['answer'][:150]}...")
            print(f"  Avg similarity: {config_results['questions'][-1]['avg_similarity']:.3f}")

        results.append(config_results)

    return results


def print_comparison_table(comparison_results: List[Dict[str, Any]]):
    """
    Print a comparison table of results.

    Args:
        comparison_results: Results from compare_configurations()
    """
    print(f"\n{'='*80}")
    print("COMPARISON SUMMARY")
    print(f"{'='*80}\n")

    # Create table data
    headers = ["Config", "Chunk Size", "Overlap", "Top-K", "Avg Similarity", "Answer Length"]
    rows = []

    for i, result in enumerate(comparison_results, 1):
        config = result["config"]
        avg_sim = sum(q["avg_similarity"] for q in result["questions"]) / len(result["questions"])
        avg_length = sum(len(q["answer"]) for q in result["questions"]) / len(result["questions"])

        rows.append([
            f"Config {i}",
            config.get("chunk_size", "N/A"),
            config.get("overlap", "N/A"),
            config.get("top_k", "N/A"),
            f"{avg_sim:.3f}",
            f"{avg_length:.0f} chars"
        ])

    print(tabulate(rows, headers=headers, tablefmt="grid"))
    print()


def export_results(results: List[Dict[str, Any]], filename: str = "rag_results.json"):
    """
    Export results to JSON file.

    Args:
        results: Results from compare_configurations()
        filename: Output filename
    """
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(results, f, ensure_ascii=False, indent=2)

    print(f"✓ Results exported to {filename}")


print("✓ Comparison utilities defined")

---

# EXPERIMENTS START HERE

## Everything below can be run independently with full configuration!

---

## Experiment 1: Quick Single Query Test

**Test a single question with custom configuration.**

All configuration is done here - no need to scroll up!

In [None]:
# ============================================================================
# CONFIGURE EVERYTHING HERE
# ============================================================================

# Your document paths (update these after uploading)
DOCUMENT_PATHS = [
    "/content/loomen_faq.md",
    "/content/moodle.md",
    "/content/tecaj.md"
]

# RAG configuration
EXPERIMENT_CONFIG = {
    "collection_name": "quick_test",
    "embedding_model": "openai/text-embedding-3-small",
    "llm_model": "openai/gpt-4o-mini",
    "chunk_size": 500,
    "overlap": 50,
    "top_k": 5,
    "temperature": 0.3
}

# Your question
QUESTION = "Što je RAG sustav?"  # Change this to your question

# Language for response ("hr" or "en")
LANGUAGE = "hr"

# ============================================================================
# RUN EXPERIMENT
# ============================================================================

# Create experiment
exp = RAGExperiment(**EXPERIMENT_CONFIG)

# Index documents
exp.index_documents(DOCUMENT_PATHS, verbose=True)

# Query
result = exp.query(QUESTION, show_sources=True, language=LANGUAGE)

# Display answer
def display_answer(result):
    print(f"\n{'='*80}")
    print("ANSWER")
    print(f"{'='*80}\n")
    print(result["answer"])
    print(f"\n{'='*80}")
    print(f"Avg similarity: {sum(result['similarities']) / len(result['similarities']):.3f}")
    print(f"Max similarity: {max(result['similarities']):.3f}")
    print(f"{'='*80}\n")

display_answer(result)

### Try out with your own questions

Change the `question` variable and re-run the cell.

In [None]:
# Test your own query
question = "U kojim slučajevima je potrebno direktno kontaktirati administratore Loomen-a?"
top_k = 2


result = exp.query(question, show_sources=True, language="hr", top_k=top_k)
display_answer(result)

## Experiment 2: Compare Different Chunk Sizes

**Test how chunk size affects retrieval quality.**
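
Before running the comparison, it can be useful to see roughly how many chunks each setting produces, since smaller chunks mean more embeddings to compute and store. The sketch below counts the start positions of a word-based sliding window with stride `chunk_size - overlap`, the same stride used by `chunk_text_fixed`. The helper name `estimate_num_chunks` and the corpus size `NUM_WORDS` are assumptions for illustration.

```python
def estimate_num_chunks(num_words: int, chunk_size: int, overlap: int) -> int:
    """Count the chunk start positions for a word-based sliding window."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("Require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # stride between consecutive chunk starts
    return len(range(0, num_words, step))


# Hypothetical corpus size; replace with the word count of your own documents.
NUM_WORDS = 10_000

for size in [256, 512, 1024]:
    n = estimate_num_chunks(NUM_WORDS, size, overlap=50)
    print(f"chunk_size={size:>4}: about {n} chunks")
```

The count is an approximation: a chunker that stops as soon as a chunk reaches the end of the text may produce slightly fewer chunks.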

In [None]:
# ============================================================================
# CONFIGURE EVERYTHING HERE
# ============================================================================
# Hard questions json
HARD_QUESTIONS_FILENAME = "/content/loomen_faq.hard_questions.json"
# Load json from content folder
with open(HARD_QUESTIONS_FILENAME, 'r', encoding='utf-8') as f:
    HARD_QUESTIONS_DATA = json.load(f)

HARD_QUESTIONS = [ x["question"] for x in HARD_QUESTIONS_DATA ]


# Your document paths
DOCUMENT_PATHS = [
    "/content/loomen_faq.md",
    "/content/moodle.md",
    "/content/tecaj.md"
]

# Test questions
TEST_QUESTIONS = [
    "Što je RAG?",
    "Kako funkcionira pretraživanje?",
    "Koje su prednosti ovog sustava?",
    # Extend with your questions
]
TEST_QUESTIONS += HARD_QUESTIONS

# Chunk sizes to compare
CHUNK_SIZES = [256, 512, 1024]

# Fixed parameters
BASE_CONFIG = {
    "embedding_model": "openai/text-embedding-3-small",
    "llm_model": "openai/gpt-4o-mini",
    "overlap": 50,
    "top_k": 5,
    "temperature": 0.3
}

LANGUAGE = "hr"

# ============================================================================
# RUN EXPERIMENT
# ============================================================================

# Create configs for each chunk size
configs = [
    {**BASE_CONFIG, "collection_name": f"chunk_{size}", "chunk_size": size}
    for size in CHUNK_SIZES
]

# Run comparison
results = compare_configurations(
    document_paths=DOCUMENT_PATHS,
    test_questions=TEST_QUESTIONS,
    configs=configs,
    language=LANGUAGE
)

# Print summary table
print_comparison_table(results)

# Export results
export_results(results, "chunk_size_comparison.json")

## Experiment 3: Compare Different Models

**Test different embedding or LLM models.**

In [None]:
# ============================================================================
# CONFIGURE EVERYTHING HERE
# ============================================================================
# Hard questions json
HARD_QUESTIONS_FILENAME = "/content/loomen_faq.hard_questions.json"
# Load json from content folder
with open(HARD_QUESTIONS_FILENAME, 'r', encoding='utf-8') as f:
    HARD_QUESTIONS_DATA = json.load(f)

HARD_QUESTIONS = [ x["question"] for x in HARD_QUESTIONS_DATA ]


# Your document paths
DOCUMENT_PATHS = [
    "/content/loomen_faq.md",
    "/content/moodle.md",
    "/content/tecaj.md"
]

# Test questions
TEST_QUESTIONS = [
    "Što je RAG?",
    "Kako funkcionira pretraživanje?",
    "Koje su prednosti ovog sustava?",
    # Extend with your questions
]
TEST_QUESTIONS += HARD_QUESTIONS


# Models to compare
EMBEDDING_MODELS = [
    #"openai/text-embedding-3-small",
    "qwen/qwen3-embedding-8b"

]

# Or compare LLM models:
# LLM_MODELS = [
#     "openai/gpt-4o-mini",
#     "openai/gpt-4o",
#     "anthropic/claude-3.5-sonnet"
# ]

# Fixed parameters
BASE_CONFIG = {
    "llm_model": "openai/gpt-4o-mini",
    "chunk_size": 250,
    "overlap": 50,
    "top_k": 5,
    "temperature": 0.3
}

LANGUAGE = "hr"

# ============================================================================
# RUN EXPERIMENT
# ============================================================================

# Create configs for each embedding model
configs = [
    {**BASE_CONFIG, "collection_name": f"embed_{i}", "embedding_model": model}
    for i, model in enumerate(EMBEDDING_MODELS)
]

# Or for LLM models:
# configs = [
#     {**BASE_CONFIG, "collection_name": f"llm_{i}", "llm_model": model}
#     for i, model in enumerate(LLM_MODELS)
# ]

# Run comparison
results = compare_configurations(
    document_paths=DOCUMENT_PATHS,
    test_questions=TEST_QUESTIONS,
    configs=configs,
    language=LANGUAGE
)

# Print summary
print_comparison_table(results)
export_results(results, "model_comparison.json")

## Experiment 4: Top-K Comparison

**Test different numbers of retrieved chunks.**
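
A larger `top_k` places more text into the prompt. As a rough guide, the retrieved context is bounded by `top_k * chunk_size` words, because each chunk holds at most `chunk_size` words. The helper `max_context_words` below is an illustrative assumption; it counts words, not tokens, and token counts depend on the model's tokenizer.

```python
def max_context_words(top_k: int, chunk_size: int) -> int:
    """Upper bound on the number of retrieved words placed into the prompt."""
    return top_k * chunk_size


# Same top-k values and chunk size as in the experiment cell.
for k in [3, 5, 10]:
    print(f"top_k={k:>2}: up to {max_context_words(k, 500)} context words")
```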

In [None]:
# ============================================================================
# CONFIGURE EVERYTHING HERE
# ============================================================================

# Your document paths
DOCUMENT_PATHS = [
    "/content/loomen_faq.md",
    "/content/moodle.md",
    "/content/tecaj.md"
]

TEST_QUESTIONS = [
    "Koja je glavna tema dokumenta?",
    "Objasni ključne koncepte."
]

# Top-K values to compare
TOP_K_VALUES = [3, 5, 10]

BASE_CONFIG = {
    "embedding_model": "openai/text-embedding-3-small",
    "llm_model": "openai/gpt-4o-mini",
    "chunk_size": 500,
    "overlap": 50,
    "temperature": 0.3
}

LANGUAGE = "hr"

# ============================================================================
# RUN EXPERIMENT
# ============================================================================

configs = [
    {**BASE_CONFIG, "collection_name": f"topk_{k}", "top_k": k}
    for k in TOP_K_VALUES
]

results = compare_configurations(
    document_paths=DOCUMENT_PATHS,
    test_questions=TEST_QUESTIONS,
    configs=configs,
    language=LANGUAGE
)

print_comparison_table(results)
export_results(results, "topk_comparison.json")

## Experiment 5: Custom Batch Testing

**Create your own custom experiment with any parameters.**
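
Instead of writing every configuration by hand, the list can also be generated as a grid. The sketch below uses `itertools.product` to combine chunk sizes and top-k values. The helper name `build_config_grid` and the `grid_c..._k...` collection naming are assumptions for illustration; the dictionary keys match the parameters of `RAGExperiment`.

```python
from itertools import product
from typing import Any, Dict, List


def build_config_grid(
    base: Dict[str, Any],
    chunk_sizes: List[int],
    top_k_values: List[int],
) -> List[Dict[str, Any]]:
    """Create one config per (chunk_size, top_k) pair, each with a unique collection name."""
    configs = []
    for size, k in product(chunk_sizes, top_k_values):
        configs.append({
            **base,
            "collection_name": f"grid_c{size}_k{k}",
            "chunk_size": size,
            "top_k": k,
        })
    return configs


base_config = {
    "embedding_model": "openai/text-embedding-3-small",
    "llm_model": "openai/gpt-4o-mini",
    "overlap": 30,
    "temperature": 0.3,
}

grid = build_config_grid(base_config, chunk_sizes=[256, 512], top_k_values=[3, 5])
for cfg in grid:
    print(cfg["collection_name"], cfg["chunk_size"], cfg["top_k"])
```

The resulting list can be passed as `configs` to `compare_configurations`. Keep the grid small, since every configuration re-indexes the documents and queries the LLM once per question.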

In [None]:
# ============================================================================
# FULLY CUSTOMIZABLE EXPERIMENT
# ============================================================================

DOCUMENT_PATHS = [
    "/content/document1.pdf",
    "/content/document2.pdf",
    "/content/document3.txt"
]

TEST_QUESTIONS = [
    "Your question 1?",
    "Your question 2?",
    "Your question 3?"
]

# Define your own configurations to compare
CUSTOM_CONFIGS = [
    {
        "collection_name": "config_a",
        "embedding_model": "openai/text-embedding-3-small",
        "llm_model": "openai/gpt-4o-mini",
        "chunk_size": 256,
        "overlap": 30,
        "top_k": 3,
        "temperature": 0.2
    },
    {
        "collection_name": "config_b",
        "embedding_model": "openai/text-embedding-3-small",
        "llm_model": "openai/gpt-4o-mini",
        "chunk_size": 512,
        "overlap": 50,
        "top_k": 5,
        "temperature": 0.3
    },
    # Add more configurations as needed
]

LANGUAGE = "hr"

# ============================================================================
# RUN EXPERIMENT
# ============================================================================

results = compare_configurations(
    document_paths=DOCUMENT_PATHS,
    test_questions=TEST_QUESTIONS,
    configs=CUSTOM_CONFIGS,
    language=LANGUAGE
)

print_comparison_table(results)
export_results(results, "custom_experiment.json")

# Print detailed results for each question
print(f"\n{'='*80}")
print("DETAILED RESULTS")
print(f"{'='*80}\n")

for i, result in enumerate(results, 1):
    print(f"\nConfiguration {i}:")
    print(f"  {result['config']}\n")

    for q_result in result["questions"]:
        print(f"  Q: {q_result['question']}")
        print(f"  A: {q_result['answer'][:200]}...")
        print(f"  Similarity: {q_result['avg_similarity']:.3f}\n")

## Interactive Playground

**Quick testing area - modify and run repeatedly.**

In [None]:
# ============================================================================
# QUICK PLAYGROUND - Run this cell multiple times with different questions
# ============================================================================

# First time: Set up the experiment (comment out after first run)
playground_exp = RAGExperiment(
    collection_name="playground",
    embedding_model="openai/text-embedding-3-small",
    llm_model="openai/gpt-4o-mini",
    chunk_size=500,
    overlap=50,
    top_k=5
)

playground_exp.index_documents([
    "/content/document1.pdf",
    "/content/document2.pdf",
    "/content/document3.txt"
], verbose=False)

print("✓ Playground ready!\n")

# ============================================================================
# Ask questions here - change and run repeatedly
# ============================================================================

question = "Your question here?"  # <-- CHANGE THIS

result = playground_exp.query(question, show_sources=True, language="hr")

print(f"\n{'='*80}")
print("ANSWER")
print(f"{'='*80}\n")
print(result["answer"])
print(f"\n{'='*80}\n")

---

## Summary

This notebook provides a framework for rapid RAG experimentation:

✅ **No scrolling needed** - All config in each experiment cell  
✅ **Quick iterations** - Change parameters and re-run  
✅ **Easy comparisons** - Side-by-side evaluation  
✅ **Flexible** - Use your own documents  
✅ **Export results** - Save comparisons as JSON  
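
Exported JSON files can be reloaded later to compare runs. The sketch below assumes the structure written by `export_results()`: a list of entries, each with a `config` dict and a `questions` list containing `avg_similarity`. The helper names `load_results` and `rank_configs` are illustrative, and the example values are dummy numbers, not real measurements.

```python
import json
from typing import Any, Dict, List, Tuple


def load_results(filename: str) -> List[Dict[str, Any]]:
    """Load a results file previously written as JSON."""
    with open(filename, "r", encoding="utf-8") as f:
        return json.load(f)


def rank_configs(results: List[Dict[str, Any]]) -> List[Tuple[str, float]]:
    """Rank configurations by mean average similarity, highest first."""
    ranking = []
    for entry in results:
        questions = entry["questions"]
        if not questions:
            continue  # skip configurations without any answered questions
        mean_sim = sum(q["avg_similarity"] for q in questions) / len(questions)
        name = entry["config"].get("collection_name", "unnamed")
        ranking.append((name, mean_sim))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Dummy values for illustration only, not real measurements.
example_results = [
    {"config": {"collection_name": "config_a"},
     "questions": [{"avg_similarity": 0.40}, {"avg_similarity": 0.50}]},
    {"config": {"collection_name": "config_b"},
     "questions": [{"avg_similarity": 0.60}]},
]

for name, score in rank_configs(example_results):
    print(f"{name}: {score:.3f}")
```

With a real export, `rank_configs(load_results("chunk_size_comparison.json"))` would give the same kind of ranking. Similarity alone does not measure answer quality, so treat the ranking as a first filter rather than a verdict.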

### Tips for Experimentation

1. **Start simple**: Test one variable at a time (chunk size, then top-k, then model)
2. **Use consistent questions**: Same questions across experiments for fair comparison
3. **Check similarities**: Low similarity scores usually indicate that the retrieved chunks are not relevant to the question
4. **Export results**: Keep track of what works best
5. **Iterate quickly**: This notebook is designed for rapid testing
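
The similarity check from the tips above can be sketched as a small helper that flags weak retrievals. The function name `flag_weak_retrievals` and the default threshold of `0.3` are assumptions for illustration; a useful cutoff depends on the embedding model and on your documents.

```python
from typing import List


def flag_weak_retrievals(similarities: List[float], threshold: float = 0.3) -> List[int]:
    """Return the positions of retrieved chunks whose similarity is below the threshold."""
    return [i for i, sim in enumerate(similarities) if sim < threshold]


# Dummy similarity scores for illustration only.
example_similarities = [0.62, 0.41, 0.18]

weak = flag_weak_retrievals(example_similarities)
print(f"Chunks below threshold: {weak}")
```

In the notebook, the same helper could be applied to `result["similarities"]` returned by `query()`.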

### Next Steps

- Test with your own documents
- Try different embedding models (text-embedding-3-large, multilingual models)
- Experiment with different LLMs (Claude, GPT-4, etc.)
- Test edge cases (questions not in documents)
- Optimize for your specific use case
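
For the edge-case tests, one simple heuristic is to check whether an answer contains the fallback sentence that the prompts in `query()` ask the model to use. The sketch below does a case-insensitive substring match; the names `REFUSAL_PHRASES` and `is_refusal` are assumptions, and a model may paraphrase the sentence, so misses are possible.

```python
from typing import List

# Fallback sentences requested by the Croatian and English prompts.
REFUSAL_PHRASES: List[str] = [
    "Nemam tu informaciju u dostavljenim dokumentima",
    "I don't have this information in the provided documents",
]


def is_refusal(answer: str) -> bool:
    """Return True if the answer contains one of the expected fallback sentences."""
    lowered = answer.lower()
    return any(phrase.lower() in lowered for phrase in REFUSAL_PHRASES)


print(is_refusal("Nemam tu informaciju u dostavljenim dokumentima."))
print(is_refusal("Loomen je sustav za e-učenje."))
```

Counting refusals on questions that are deliberately outside the documents gives a quick signal of how well a configuration avoids unsupported answers.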