# 🚀 RAGScore Complete Demo: Build & Test a RAG in 5 Minutes

This notebook shows the **complete workflow**:
1. 📄 Load a PDF (from URL or local file)
2. 🔧 Build a minimal RAG with SentenceTransformers
3. ✅ Test it with RAGScore

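**No external RAG server needed**: the RAG itself runs entirely in this notebook. Only the RAGScore evaluation in step 6 needs an LLM (an OpenAI API key or a local Ollama install).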

## 1. Install Dependencies

In [None]:
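# Quote the extras so the shell does not glob the brackets; nest_asyncio is imported below
!pip install -q "ragscore[notebook]" sentence-transformers pypdf2 nest_asyncio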

# Safety net for Colab's event loop
import nest_asyncio

nest_asyncio.apply()
print("✅ Dependencies installed")

## 2. Load Your PDF

Choose **Option A** (download from URL) or **Option B** (use local file).

In [None]:
# === OPTION A: Download a sample PDF from the web ===
PDF_URL = "https://arxiv.org/pdf/2005.11401.pdf"  # RAG paper by Facebook

import os
import urllib.request

os.makedirs("docs", exist_ok=True)
pdf_path = "docs/rag_paper.pdf"

if not os.path.exists(pdf_path):
    print(f"📥 Downloading PDF from {PDF_URL}...")
    urllib.request.urlretrieve(PDF_URL, pdf_path)
    print(f"✅ Saved to {pdf_path}")
else:
    print(f"✅ Using existing {pdf_path}")

In [None]:
# === OPTION B: Use a local PDF (uncomment and modify) ===
# pdf_path = "/content/your_document.pdf"  # Change this path
# print(f"Using local file: {pdf_path}")

## 3. Extract Text from PDF

In [None]:
import PyPDF2

from ragscore.data_processing import chunk_text


def extract_text_from_pdf(pdf_path: str) -> str:
    """Extract all text from a PDF file."""
    text = ""
    with open(pdf_path, "rb") as f:
        reader = PyPDF2.PdfReader(f)
        for page in reader.pages:
            # Keep a newline between pages so words at page boundaries do not fuse
            text += (page.extract_text() or "") + "\n"
    return text

# Extract and chunk
raw_text = extract_text_from_pdf(pdf_path)
chunks = chunk_text(raw_text)

print(f"📄 Extracted {len(raw_text):,} characters")
print(f"📦 Created {len(chunks)} chunks")
print(f"\n📝 Sample chunk:\n{chunks[0][:300]}...")

## 4. Build a Minimal RAG

This is a simple but functional RAG using:
- **SentenceTransformers** for embeddings
- **Cosine similarity** for retrieval
- **Top-k chunks** as context
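
To make the retrieval step concrete, here is a dependency-free sketch of cosine-similarity top-k ranking on made-up 2-D vectors. The helper names `cosine_similarity` and `top_k` and the toy vectors are illustrative assumptions, not part of RAGScore or SentenceTransformers; the `SimpleRAG.retrieve` method in this notebook performs the same kind of computation with NumPy on real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: list[list[float]], k: int = 2) -> list[int]:
    """Indices of the k vectors most similar to the query, best first."""
    scores = [cosine_similarity(query, v) for v in vectors]
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy 2-D "embeddings" (made up for illustration; real ones have hundreds of dimensions)
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [2.0, 0.1]

best = top_k(toy_query, toy_vectors, k=2)
print(best)
```

Because the toy query points mostly along the first axis, vector 0 ranks first and the diagonal vector 2 ranks second. Vector length does not matter, only direction.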

In [None]:
import numpy as np
from sentence_transformers import SentenceTransformer


class SimpleRAG:
    """
    A minimal RAG implementation for demonstration.

    In production, you'd use:
    - A vector database (Pinecone, Weaviate, Chroma)
    - An LLM for answer generation (GPT-4, Claude, Llama)
    """

    def __init__(self, chunks: list[str], model_name: str = "all-MiniLM-L6-v2"):
        print(f"🔧 Loading embedding model: {model_name}")
        self.model = SentenceTransformer(model_name)
        self.chunks = chunks

        print(f"📊 Encoding {len(chunks)} chunks...")
        self.embeddings = self.model.encode(chunks, show_progress_bar=True)
        print("✅ RAG ready!")

    def retrieve(self, question: str, top_k: int = 3) -> list[str]:
        """Retrieve the most relevant chunks for a question."""
        q_embedding = self.model.encode([question])[0]

        # Cosine similarity
        scores = np.dot(self.embeddings, q_embedding) / (
            np.linalg.norm(self.embeddings, axis=1) * np.linalg.norm(q_embedding)
        )

        # Get top-k indices
        top_indices = np.argsort(scores)[-top_k:][::-1]
        return [self.chunks[i] for i in top_indices]

    def query(self, question: str) -> str:
        """
        Answer a question using retrieved context.

        Note: This simple version just returns the best chunk.
        A real RAG would pass this to an LLM for synthesis.
        """
        relevant_chunks = self.retrieve(question, top_k=1)

        # Simple answer: return the most relevant chunk
        # In production: send to LLM with prompt like:
        # f"Context: {relevant_chunks}\n\nQuestion: {question}\n\nAnswer:"
        return relevant_chunks[0] if relevant_chunks else "No relevant information found."

# Build the RAG
rag = SimpleRAG(chunks)

## 5. Test the RAG Manually

In [None]:
# Try a few questions
test_questions = [
    "What is RAG?",
    "How does retrieval work?",
    "What are the benefits of RAG?",
]

for q in test_questions:
    print(f"\n❓ {q}")
    answer = rag.query(q)
    print(f"💬 {answer[:200]}...")

---

## 6. 🎯 Test with RAGScore!

Now let's use RAGScore to systematically evaluate this RAG.

### ⚠️ Expected: Low Scores (~1-2/5)

**This is intentional!** Our `SimpleRAG` only retrieves text chunks - it doesn't use an LLM to synthesize answers. This is called **"retrieval-only"** and is just half of a real RAG system.

| Component | SimpleRAG | Production RAG |
|-----------|-----------|----------------|
| Retrieval | ✅ Yes | ✅ Yes |
| LLM Synthesis | ❌ No | ✅ Yes (GPT-4, Claude, Llama) |
| Expected Score | 1-2/5 | 4-5/5 |

**The low scores demonstrate that RAGScore correctly identifies a weak RAG.** In production, you'd add LLM synthesis to generate coherent answers from the retrieved chunks, and scores would improve dramatically.
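
As a rough sketch of what adding LLM synthesis could look like, the snippet below wraps a retriever and a text-generation function into one question-to-answer callable. Everything here is an assumption for illustration: `build_prompt`, `make_synthesizing_endpoint`, and the two `fake_*` stand-ins are hypothetical names, and the prompt wording is only one plausible choice. No particular LLM client is used; you would supply your own function that takes a prompt string and returns text.

```python
from typing import Callable


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Join retrieved chunks into a grounded prompt for an LLM."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(context_chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\nAnswer:"
    )


def make_synthesizing_endpoint(
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> Callable[[str], str]:
    """Wrap a retriever and an LLM call into a single question -> answer function."""

    def answer(question: str) -> str:
        chunks = retrieve(question)
        if not chunks:
            return "No relevant information found."
        return generate(build_prompt(question, chunks))

    return answer


# Stand-ins so the sketch runs on its own; swap in a real retriever and a real LLM call.
def fake_retrieve(question: str) -> list[str]:
    return ["RAG combines a retriever with a generator."]


def fake_generate(prompt: str) -> str:
    return f"(LLM output for a {len(prompt)}-character prompt)"


endpoint = make_synthesizing_endpoint(fake_retrieve, fake_generate)
demo_answer = endpoint("What is RAG?")
print(demo_answer)
```

In this notebook, passing `rag.retrieve` as the retriever and your own LLM function as the generator should yield a callable with the same question-in, answer-out shape as `rag.query`, so it could be handed to `quick_test` in its place.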

In [None]:
import os

# Set your LLM API key for RAGScore's QA generation and judging
# Option 1: OpenAI
os.environ["OPENAI_API_KEY"] = "sk-..."  # Replace with your key

# Option 2: Use local Ollama (uncomment below)
# !curl -fsSL https://ollama.com/install.sh | sh
# import subprocess, time
# subprocess.Popen("ollama serve", shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
# time.sleep(5)
# !ollama pull llama3

In [None]:
from ragscore import quick_test

# Run RAGScore evaluation!
result = quick_test(
    endpoint=rag.query,  # Pass the RAG function directly
    docs=pdf_path,       # Use the same PDF
    n=10,                # Generate 10 test questions
    threshold=0.7,       # Pass if 70%+ correct
)

# Show the visualization
result.plot()

In [None]:
# Detailed metrics
print("📊 Results Summary")
print("="*40)
print(f"Accuracy: {result.accuracy:.1%}")
print(f"Average Score: {result.avg_score:.1f}/5.0")
print(f"Passed: {'✅ Yes' if result.passed else '❌ No'}")
print(f"Corrections needed: {len(result.corrections)}")

In [None]:
# View all results as DataFrame
result.df

In [None]:
# Inspect failures
bad_rows = result.df[result.df['score'] < 4]
if len(bad_rows) > 0:
    print(f"❌ {len(bad_rows)} questions scored below 4:\n")
    for _, row in bad_rows.iterrows():
        print(f"Q: {row['question'][:80]}...")
        print(f"Score: {row['score']}/5 - {row['reason']}")
        print()

## 7. Export Corrections

Save the corrections to improve your RAG system.

In [None]:
from ragscore.quick_test import export_corrections

if result.corrections:
    export_corrections(result, "corrections.jsonl")
    print("✅ Corrections saved to corrections.jsonl")
    print("\n📝 Sample correction:")
    c = result.corrections[0]
    print(f"Q: {c['question'][:60]}...")
    print(f"Wrong: {c['incorrect_answer'][:60]}...")
    print(f"Correct: {c['correct_answer'][:60]}...")
else:
    print("🎉 No corrections needed - your RAG is perfect!")
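
To reuse the exported corrections later, the file can presumably be read back one JSON object per line, as the `.jsonl` extension suggests. The sketch below uses only the standard library. The helper name `load_jsonl` is hypothetical, the record is made up, and the field names are borrowed from the sample printout above; the exact layout written by `export_corrections` is an assumption, so inspect your file before relying on it.

```python
import json
import os
import tempfile


def load_jsonl(path: str) -> list[dict]:
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Self-contained demo with a made-up record; in the notebook, point this at "corrections.jsonl".
sample = {"question": "What is RAG?", "incorrect_answer": "...", "correct_answer": "..."}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write(json.dumps(sample) + "\n")
    loaded = load_jsonl(demo_path)

print(len(loaded), "record(s) loaded")
```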

---

## 🎓 What's Next?

This demo used a **minimal RAG** (just retrieval, no LLM synthesis). To improve it:

1. **Add an LLM** - Use GPT-4, Claude, or Llama to synthesize answers from retrieved chunks
2. **Better chunking** - Use semantic chunking or sentence-level splitting
3. **Vector database** - Use Chroma, Pinecone, or Weaviate for production
4. **Reranking** - Add a cross-encoder reranker for better retrieval
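
As one illustration of the chunking suggestion, here is a minimal sketch of sentence-level splitting that packs whole sentences into size-limited chunks. The function name `sentence_chunks`, the `max_chars` parameter, and the regex-based sentence splitter are all assumptions made for this example. It is not how `ragscore.data_processing.chunk_text` works, and the naive splitter will stumble on abbreviations such as "e.g." or "Fig. 2".

```python
import re


def sentence_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk first.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


demo = sentence_chunks(
    "RAG retrieves documents. It then generates an answer. Simple!", max_chars=40
)
print(demo)
```

Keeping sentences intact means a retrieved chunk never starts or ends mid-thought, which tends to matter more when the chunk is returned verbatim, as `SimpleRAG.query` does.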

### Resources
- **RAGScore GitHub**: https://github.com/HZYAI/RagScore
- **RAGScore PyPI**: https://pypi.org/project/ragscore/

⭐ Star us on GitHub if you found this useful!