# 🚀 RAGScore Demo - Quick RAG Evaluation

This notebook demonstrates how to use RAGScore to evaluate your RAG system.

**Features:**
- ✅ Works in Google Colab and Jupyter
- ✅ Supports local LLMs (Ollama) and cloud APIs
- ✅ One-liner RAG testing with `quick_test()`
- ✅ Returns pandas DataFrames for easy analysis

## 1. Setup

### Option A: Use with Cloud LLM (OpenAI, Anthropic, etc.)

In [None]:
# Install RAGScore
!pip install -q "ragscore[notebook,openai]"

# Set your API key
import os
os.environ["OPENAI_API_KEY"] = "sk-..."  # Replace with your key

### Option B: Use with Local LLM (Ollama) - FREE & Private!

Run this cell to set up Ollama in Colab (takes ~2 minutes):

In [None]:
# 🪄 Magic Cell: Install and start Ollama in Colab
# This runs Ollama as a background process

!curl -fsSL https://ollama.com/install.sh | sh

import subprocess
import time

# Start the Ollama server in the background
process = subprocess.Popen(
    ["ollama", "serve"],  # Argument list, so no shell is needed
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
time.sleep(5)  # Give the server a few seconds to start

# Pull a model (llama3 is recommended)
!ollama pull llama3

print("✅ Ollama is ready! RAGScore will auto-detect it.")
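
The fixed `time.sleep(5)` in the cell above can be too short on a slow runtime. A minimal sketch of a readiness check, assuming Ollama is listening on its default address `http://localhost:11434`; `wait_for_server` is a hypothetical helper written for this notebook, not part of RAGScore or Ollama:

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:11434", timeout: float = 60.0,
                    interval: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server not accepting connections yet; retry after a pause
        time.sleep(interval)
    return False
```

Calling `wait_for_server()` in place of the fixed sleep, and pulling the model only when it returns `True`, should make the setup cell less timing-dependent.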

## 2. Quick Test Your RAG

The `quick_test()` function generates QA pairs from your documents and evaluates your RAG in one call.

In [None]:
from ragscore import quick_test

# Test with an HTTP endpoint
result = quick_test(
    endpoint="http://localhost:8000/query",  # Your RAG API
    docs="docs/",                            # Path to your documents
    n=10,                                    # Number of test questions
    threshold=0.7,                           # Pass if >= 70% correct
)

print(f"Accuracy: {result.accuracy:.0%}")
print(f"Passed: {result.passed}")
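
How `threshold`, `accuracy`, and `passed` relate can be sketched in plain Python. This assumes the pass rule is exactly what the comment above states (accuracy at or above the threshold); `accuracy_and_pass` is a hypothetical helper for illustration, not RAGScore's actual scoring code:

```python
def accuracy_and_pass(outcomes: list[bool], threshold: float = 0.7) -> tuple[float, bool]:
    """Return (fraction of correct answers, whether that fraction meets the threshold)."""
    if not outcomes:
        raise ValueError("need at least one graded answer")
    accuracy = sum(outcomes) / len(outcomes)
    return accuracy, accuracy >= threshold


# 7 correct answers out of 10 meets a 0.7 threshold; 6 out of 10 does not.
print(accuracy_and_pass([True] * 7 + [False] * 3))  # (0.7, True)
print(accuracy_and_pass([True] * 6 + [False] * 4))  # (0.6, False)
```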

### Test with a Function (No Server Needed!)

You can pass a Python function directly - perfect for testing in notebooks:

In [None]:
# Example: Test a simple RAG function
def my_rag(question: str) -> str:
    """Your RAG implementation here."""
    # Replace with your actual RAG logic
    # e.g., vectorstore.similarity_search(question)
    return "This is a placeholder answer."

# Test it!
result = quick_test(
    endpoint=my_rag,  # Pass function directly
    docs="docs/",
    n=5,
)
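
For a stand-in whose answer actually depends on the question, a toy retriever is enough to exercise the function interface. A minimal sketch, assuming only the `(question: str) -> str` signature shown above; the `PASSAGES` list and the word-overlap scoring are illustrative placeholders, not a real RAG and not something RAGScore provides:

```python
import re

# Hypothetical in-memory "knowledge base"; a real system would use a vector store.
PASSAGES = [
    "Ollama runs large language models locally.",
    "Pandas DataFrames make tabular results easy to filter.",
    "Pytest discovers functions whose names start with test_.",
]


def _tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def toy_rag(question: str) -> str:
    """Return the passage sharing the most words with the question."""
    question_words = _tokens(question)
    return max(PASSAGES, key=lambda passage: len(question_words & _tokens(passage)))
```

It can be passed the same way as `my_rag`, that is, `endpoint=toy_rag`.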

## 3. Analyze Results with Pandas

In [None]:
# Get results as DataFrame
df = quick_test(
    endpoint="http://localhost:8000/query",
    docs="docs/",
    n=10,
    return_df=True,  # Return DataFrame instead of QuickTestResult
)

# View all results
df

In [None]:
# Filter to see only failures
failures = df[~df["is_correct"].astype(bool)]
print(f"Found {len(failures)} failures:")
failures[["question", "score", "reason"]]

## 4. Export Corrections for RAG Improvement

RAGScore identifies incorrect answers and provides corrections that can be injected back into your RAG system.

In [None]:
from ragscore.quick_test import export_corrections

# Run test and get corrections
result = quick_test(
    endpoint="http://localhost:8000/query",
    docs="docs/",
    n=20,
)

# View corrections
print(f"Found {len(result.corrections)} corrections:")
for c in result.corrections[:3]:
    print(f"\nQ: {c['question'][:60]}...")
    print(f"Wrong: {c['incorrect_answer'][:60]}...")
    print(f"Correct: {c['correct_answer'][:60]}...")

In [None]:
# Export corrections to file
export_corrections(result, "corrections.jsonl")
print("✅ Corrections saved to corrections.jsonl")
print("Inject these into your RAG to improve accuracy!")
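
One simple way to inject corrections is an exact-match override placed in front of the RAG function. A minimal sketch, assuming each line of `corrections.jsonl` is a JSON object with the `question` and `correct_answer` keys seen in `result.corrections` above (the on-disk schema is an assumption); `load_corrections` and `with_corrections` are hypothetical helpers, not part of RAGScore:

```python
import json
from typing import Callable


def load_corrections(path: str) -> dict[str, str]:
    """Map each normalized question to its corrected answer."""
    corrections = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # Skip blank lines
                record = json.loads(line)
                key = record["question"].strip().lower()
                corrections[key] = record["correct_answer"]
    return corrections


def with_corrections(rag: Callable[[str], str],
                     corrections: dict[str, str]) -> Callable[[str], str]:
    """Wrap `rag` so questions with a known correction bypass it."""
    def patched(question: str) -> str:
        key = question.strip().lower()
        return corrections[key] if key in corrections else rag(question)
    return patched
```

Exact matching only helps when a question is repeated almost verbatim, so this is best treated as a stopgap; adding the corrected pairs to the indexed documents is likely the more general route.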

## 5. Use in pytest (CI/CD)

```python
# test_rag.py
from ragscore import quick_test

def test_rag_accuracy():
    result = quick_test(
        endpoint="http://localhost:8000/query",
        docs="docs/",
        n=20,
        threshold=0.8,
        silent=True,
    )
    assert result.passed, f"RAG accuracy too low: {result.accuracy:.0%}"
```

## 6. Full Pipeline: Generate + Evaluate

For comprehensive evaluation, use the two-step pipeline:

In [None]:
from ragscore import run_pipeline, run_evaluation

# Step 1: Generate QA pairs from documents
run_pipeline(paths=["docs/"], concurrency=5)
# Output: output/generated_qas.jsonl

In [None]:
# Step 2: Evaluate RAG against generated QAs
summary = run_evaluation(
    golden_path="output/generated_qas.jsonl",
    endpoint="http://localhost:8000/query",
    output_path="results.json",
)

print(f"Accuracy: {summary.accuracy:.1%}")
print(f"Average Score: {summary.avg_score:.1f}/5.0")

---

## 📚 Resources

- **GitHub**: https://github.com/HZYAI/RagScore
- **PyPI**: https://pypi.org/project/ragscore/
- **Issues**: https://github.com/HZYAI/RagScore/issues

⭐ Star us on GitHub if you find this useful!