# RAG Retrieval & Generation Evaluation

This notebook is a lightweight harness for evaluating retrieval quality and end-to-end RAG responses.
Run the setup cell, adjust `DATASET_PATH` if your queries live elsewhere, then run the evaluation helpers to inspect retrieval metrics (precision and mean reciprocal rank, MRR) and the generated answers.

In [None]:
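import json
import sys
from pathlib import Path

# Add the repository root to the import path when running from the notebook directory.
# The membership check avoids duplicate entries when this cell is re-run.
REPO_ROOT = Path.cwd().resolve().parent
if (REPO_ROOT / 'scripts').exists() and str(REPO_ROOT) not in sys.path:
    sys.path.append(str(REPO_ROOT))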

DATASET_PATH = REPO_ROOT / 'data' / 'samples' / 'queries.jsonl'
print(f'Dataset path: {DATASET_PATH}')
print('Ensure OpenSearch and Ollama are running before proceeding.')
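
Before running the evaluation, it can help to confirm that the dataset file parses as JSON Lines. The sketch below is not part of the repository: `preview_jsonl` is a hypothetical helper, and it makes no assumption about which fields each record contains, since the schema expected by `load_queries` is not shown here.

```python
import json
from pathlib import Path
from typing import Any, List


def preview_jsonl(path: Path, limit: int = 3) -> List[Any]:
    """Return up to `limit` parsed records from a JSON Lines file.

    Hypothetical helper for a quick sanity check; it does not validate
    the schema that the evaluation scripts expect.
    """
    records: List[Any] = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if len(records) >= limit:
                break
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            # json.loads raises json.JSONDecodeError on a malformed line
            records.append(json.loads(line))
    return records
```

Calling `preview_jsonl(DATASET_PATH)` in a fresh cell should show the first few records, or raise `FileNotFoundError` or `json.JSONDecodeError` if the path or the file contents need attention.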

In [None]:
from scripts.eval_retrieval import load_queries, ensure_retriever, evaluate_queries

retriever = ensure_retriever(index_name='quest-research')
queries = load_queries(DATASET_PATH)
metrics = evaluate_queries(retriever, queries, top_k=5)
metrics
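
For reference when reading the numbers above, here is a minimal sketch of how precision@k and MRR are commonly defined. It is an illustration under assumptions, not the code in `scripts.eval_retrieval`: the function names and the document IDs are hypothetical, and the repository's implementation may use a different convention (for example, dividing by the number of results actually returned rather than by `k`).

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k slots filled by relevant documents."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ret, rel) for ret, rel in runs) / len(runs)


# Hypothetical data: ranked document IDs per query and the set judged relevant.
example_runs = [
    (["d3", "d1", "d7", "d2", "d9"], {"d1", "d2"}),  # first hit at rank 2
    (["d4", "d5", "d6", "d8", "d0"], {"d4"}),        # first hit at rank 1
]

for retrieved, relevant in example_runs:
    print(precision_at_k(retrieved, relevant, k=5))  # 0.4, then 0.2
print(mean_reciprocal_rank(example_runs))            # 0.75
```

Precision@k rewards filling the top results with relevant documents, while MRR only looks at how early the first relevant document appears, so the two can move independently when `top_k` changes.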

## Inspect Generated Answers

Use the smoke-test helper to inspect LLM outputs for a specific question.
Adjust `question` and `model_override` to target different queries or models.

In [None]:
from scripts.smoke_test import smoke_test

smoke_test(
    pdf_path=None,
    question='Summarize the key findings about attention mechanisms.',
    index_name='quest-research',
    model_override='mistral',
    timeout_override=120.0,
    fallback_override='gemma3:1b',
)