Lighthouse for RAG pipelines. Get PASS/WARN/FAIL audit scores in your terminal before you ship.
PASS 84/100 my-rag-app "what is RAG?"
✓ precision:90 ✓ efficiency:80 ✓ redundancy:85 ✓ coverage:100
WARN 61/100 my-rag-app "what is dense passage retrieval?"
✗ precision:30 ✗ efficiency:45 ~ redundancy:70 ✓ coverage:100
→ Reduce TOP_K 10→5 · 2 near-duplicate chunks detected
─────────────────────────────────────────────────────────────
Session 2 queries · avg 72/100 ↑ improving
You build a RAG pipeline. It looks fine in demos. You ship it. Users complain the answers are wrong or vague — but nothing in your logs tells you why.
The real issue is usually invisible: too many chunks retrieved, half of them never reaching the LLM, near-duplicate content eating your context window, no similarity scores to optimize against. RAGScope makes all of this visible — scored, labelled, and actionable — in your terminal, before you ship.
# 1. Start RAGScope (no install needed)
npx ragscope start
# 2. Point your pipeline's OTel exporter at it
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4321
# 3. Run your test queries (scores appear instantly)

That's it. No config files, no accounts, no data leaving your machine.
Every query gets four sub-scores, combined by weight into a single 0–100 score:
| Sub-score | Weight | What it measures |
|---|---|---|
| Retrieval Precision | 40% | Fraction of retrieved chunks that actually reached the LLM |
| Context Efficiency | 30% | Token waste on chunks the LLM never saw |
| Redundancy | 20% | Near-duplicate chunks eating your context window |
| Score Coverage | 10% | Whether chunks carry similarity scores for optimization |
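
As a rough illustration of how the weights in the table above could combine, here is a minimal sketch that assumes a plain weighted mean of the four sub-scores. The names `SubScores`, `WEIGHTS`, and `combineScores` are hypothetical and not part of RAGScope; the tool's real formula may apply rounding or adjustments not described here, so its totals can differ.

```typescript
// Hypothetical sketch: a plain weighted mean using the weights from the table.
// RAGScope's actual scoring may differ.
interface SubScores {
  precision: number;  // 0-100
  efficiency: number; // 0-100
  redundancy: number; // 0-100
  coverage: number;   // 0-100
}

// Weights in percent, taken from the sub-score table (they sum to 100).
const WEIGHTS: Record<keyof SubScores, number> = {
  precision: 40,
  efficiency: 30,
  redundancy: 20,
  coverage: 10,
};

function combineScores(scores: SubScores): number {
  const keys = Object.keys(WEIGHTS) as (keyof SubScores)[];
  const weightedSum = keys.reduce((sum, key) => sum + WEIGHTS[key] * scores[key], 0);
  // Divide by the total weight (100) and round to a whole number.
  return Math.round(weightedSum / 100);
}

console.log(combineScores({ precision: 50, efficiency: 60, redundancy: 80, coverage: 100 })); // 64
```

Because precision carries 40% of the weight, a low precision sub-score pulls the total down faster than any other single sub-score.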
| Label | Score | Meaning |
|---|---|---|
| PASS | ≥ 75 | Retrieval pipeline is healthy |
| WARN | 50–74 | Issues detected — check recommendations |
| FAIL | < 50 | Significant retrieval problems; fix before shipping |
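
The thresholds in the label table can be expressed as a small lookup. This is a sketch only: `Label` and `labelFor` are hypothetical names, and the cut-offs are copied from the table rather than from RAGScope's source.

```typescript
// Hypothetical helper mirroring the label table: PASS >= 75, WARN 50-74, FAIL < 50.
type Label = 'PASS' | 'WARN' | 'FAIL';

function labelFor(score: number): Label {
  if (score >= 75) return 'PASS';
  if (score >= 50) return 'WARN';
  return 'FAIL';
}

console.log(labelFor(84)); // PASS
console.log(labelFor(61)); // WARN
console.log(labelFor(49)); // FAIL
```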
Add --verbose for a full per-query breakdown with specific recommendations.
RAGScope is source-agnostic. Traces arrive via two paths: OTLP ingestion and Langfuse polling.
For OTLP, it is a one-line change: set the OTLP exporter URL to http://localhost:4321/v1/traces.
TraceAI / Traceloop (auto-instruments LangChain, LlamaIndex, OpenAI, Pinecone, Qdrant, Cohere…)
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { instrument } from '@traceloop/node-server-sdk';
const sdk = new NodeSDK({
traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4321/v1/traces' }),
});
sdk.start();
instrument();

Vercel AI SDK
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
const sdk = new NodeSDK({
traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4321/v1/traces' }),
});
sdk.start();

Phoenix (Arize) / OpenLLMetry: set PHOENIX_COLLECTOR_ENDPOINT=http://localhost:4321 or TRACELOOP_BASE_URL=http://localhost:4321.
Manual OpenTelemetry
import { trace } from '@opentelemetry/api';
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
const sdk = new NodeSDK({
traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4321/v1/traces' }),
});
sdk.start();
const tracer = trace.getTracer('my-rag-app');
// Wrap each retrieval call in a span; `docs` is the array of chunks your vector store returned.
const span = tracer.startSpan('qdrant.query');
span.setAttribute('gen_ai.operation.name', 'retrieve');
span.setAttribute('gen_ai.retrieval.documents', JSON.stringify(docs));
span.end();

For Langfuse, set two env vars. RAGScope polls every 30 seconds, with zero code changes:
LANGFUSE_PUBLIC_KEY=pk-lf-... \
LANGFUSE_SECRET_KEY=sk-lf-... \
npx ragscope start

Coming soon: LangSmith · Helicone adapters. Open an issue to vote or contribute.
npx ragscope start [options]
--port <n> Port to listen on (default: 4321)
--db <path> Path to SQLite database file (default: in-memory)
--verbose Show full sub-score breakdown and recommendations per query
| Category | Tools |
|---|---|
| Vector stores | Qdrant · Chroma · Pinecone · Weaviate · pgvector |
| LLM frameworks | LangChain · LlamaIndex · Vercel AI SDK · custom |
| Models | OpenAI · Anthropic · Cohere · Mistral · any OTel-instrumented provider |
| Rerankers | Cohere Rerank · any span with gen_ai.operation.name = rerank |
| Ingestion | Any OTel exporter · Langfuse · (LangSmith, Helicone coming soon) |
Platforms such as Langfuse, LangSmith, and Helicone are excellent production monitoring tools: they record what happened after you ship.
RAGScope is a pre-ship quality gate — like ESLint or Lighthouse, you run it during development to catch retrieval problems before they reach users. Different job, smaller footprint, zero cloud dependency.
- OTLP ingestion: works with any OTel-compatible source
- Langfuse polling adapter
- Four sub-scores: precision, efficiency, redundancy, coverage
- PASS / WARN / FAIL per query with rolling session average
- --verbose flag for full breakdown + recommendations
- LangSmith adapter: poll runs via LangSmith API, zero code changes
- Helicone adapter: fetch requests via Helicone API
- Langfuse webhooks: real-time instead of 30s polling
- Audit report export: npx ragscope report writes a Markdown/JSON summary you can commit or share
- Compare mode: npx ragscope compare v1 v2 diffs two pipeline versions side-by-side
- Python support: native Python instrumentation helpers
- Threshold config: .ragscope.json to set custom PASS/WARN/FAIL thresholds per project
- Span-level drill-down: --trace <id> to inspect a single trace in detail
Vote on features or propose new ones by opening an issue.
Contributions are welcome. See CONTRIBUTING.md for setup and guidelines.
Good first issues: LangSmith adapter, Helicone adapter, audit report export, improving scoring heuristics.
All data stays on your machine. No telemetry, no cloud, no accounts required.
Apache 2.0 — © 2026 Siddharth Pandey