A production-grade implementation of the CRAG paper — Corrective Retrieval-Augmented Generation (Yan et al., 2024)
Live demo: correctiverag.vercel.app
Standard RAG systems retrieve document chunks and feed them to an LLM. But when the retrieved documents are irrelevant or the knowledge base lacks an answer, these systems either hallucinate or produce misleading responses. This is a critical failure mode for any QA system.
The paper "Corrective Retrieval-Augmented Generation" by Yan et al. (2024) proposed an elegant fix: add a corrective evaluator between retrieval and generation that:
- Grades each retrieved passage for relevance
- Decides a path:
  - Correct (relevant) → Generate from local documents
  - Incorrect (irrelevant) → Rewrite query → Web search → Generate from web results
- Synthesizes the final answer with source citations
This transforms RAG from a passive pipeline into an agentic system with fallback reasoning.
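
The evaluate-then-decide flow described above can be sketched as a single function. This is a minimal illustration, not the project's actual code: every callable (`retrieve`, `grade`, `rewrite`, `web_search`, `generate`) is a hypothetical placeholder, and two choices are assumptions: the "Correct" path is taken when at least one passage is graded relevant, and generation always uses the original query.

```python
from typing import Callable, List, Tuple


def corrective_rag(
    query: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, str], bool],
    rewrite: Callable[[str], str],
    web_search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> Tuple[str, str]:
    """Return (answer, path), where path is 'local' or 'web'."""
    chunks = retrieve(query)
    # Grade each retrieved passage and keep only the relevant ones.
    relevant = [chunk for chunk in chunks if grade(query, chunk)]
    if relevant:
        # Assumption: one relevant passage is enough for the "Correct" path.
        return generate(query, relevant), "local"
    # "Incorrect" path: rewrite the query, search the web, generate from that.
    web_chunks = web_search(rewrite(query))
    return generate(query, web_chunks), "web"
```

Passing the steps in as callables keeps the routing logic testable with simple stubs and independent of any particular retriever, search backend, or LLM provider.
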
| Paper's Approach | My Implementation | Rationale |
|---|---|---|
| Trained evaluator (T5-based) | Cosine similarity threshold (0.18) | No training data needed; works zero-shot |
| Google Search API (paid) | DuckDuckGo HTML scraper | Free, no API key required |
| Single LLM (ChatGPT) | Groq (llama-3.3-70b) + heuristic fallback | Free tier, low latency, reliable |
| Command-line tool | Full-stack web app (React + FastAPI) | Real-world usable product |
| Batch processing | Real-time logging of each pipeline step | Transparency & debugging |
The core CRAG idea — evaluate-then-decide with web fallback — is preserved, but adapted for a production web environment.
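
To make the zero-shot evaluator in the table concrete, here is a dependency-free sketch of cosine-similarity grading against the 0.18 threshold. The function names are hypothetical, and the vectors stand in for embeddings that the real system would obtain from all-MiniLM-L6-v2; how the project aggregates per-chunk scores into a pass/fail decision is not stated, so this only grades each chunk individually.

```python
import math
from typing import List, Sequence

RELEVANCE_THRESHOLD = 0.18  # threshold quoted in the comparison table


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def grade_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    threshold: float = RELEVANCE_THRESHOLD,
) -> List[bool]:
    """Mark each chunk relevant when its similarity meets the threshold."""
    return [cosine_similarity(query_vec, vec) >= threshold for vec in chunk_vecs]
```

A fixed threshold needs no training data, but the right value depends on the embedding model and corpus, so 0.18 should be treated as a tunable setting rather than a universal constant.
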
      User Query
           │
           ▼
┌─────────────────────────────┐
│ 1. RETRIEVAL                │
│ Vector DB search (FAISS)    │
│ Filtered by user's docs     │
└──────────┬──────────────────┘
           │ 5 candidate chunks
           ▼
┌─────────────────────────────┐
│ 2. EVALUATION               │
│ Cosine similarity scoring   │
│ Threshold: 0.18             │
└──────────┬──────────────────┘
           │
           ▼
    ┌──────┴──────┐
    │             │
   PASS          FAIL
 (≥0.18)       (<0.18)
    │             │
    │             ▼
    │    ┌───────────────────┐
    │    │ 3. QUERY REWRITE  │
    │    │ Strip filler words│
    │    └────────┬──────────┘
    │             │
    │             ▼
    │    ┌───────────────────┐
    │    │ 4. WEB SEARCH     │
    │    │ DuckDuckGo (4     │
    │    │ results)          │
    │    └────────┬──────────┘
    │             │
    └──────┬──────┘
           │ context chunks
           ▼
┌─────────────────────────────┐
│ 5. GENERATION               │
│ Groq → Gemini → OpenAI      │
│ → Heuristic fallback        │
│ + Source citations          │
└──────────┬──────────────────┘
           │
           ▼
 Final Answer + Pipeline Logs
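
Steps 3 and 5 of the diagram lend themselves to a short sketch. The code below is illustrative only: the filler-word list is invented for the example (the project's actual list is not shown), and the provider callables are hypothetical stand-ins for the Groq, Gemini, and OpenAI clients, tried in order before a heuristic fallback.

```python
import re
from typing import Callable, Sequence

# Hypothetical filler words; the real list used by the project is not shown.
FILLER_WORDS = {
    "please", "can", "could", "you", "tell", "me", "what", "is", "are",
    "the", "a", "an", "of", "about",
}


def rewrite_query(query: str) -> str:
    """Step 3: drop filler words so the web search sees only the key terms."""
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9'-]*", query.lower())
    kept = [tok for tok in tokens if tok not in FILLER_WORDS]
    # If everything was filler, fall back to the original query.
    return " ".join(kept) if kept else query.strip()


def generate_with_fallback(
    prompt: str,
    providers: Sequence[Callable[[str], str]],
    heuristic: Callable[[str], str],
) -> str:
    """Step 5: try each provider in order, then the heuristic fallback."""
    for call in providers:
        try:
            answer = call(prompt)
        except Exception:
            continue  # provider failed (quota, network, ...); try the next one
        if answer:
            return answer
    return heuristic(prompt)
```

Ordering the providers as a plain sequence means the chain can be reconfigured without touching the fallback logic, and the heuristic guarantees some answer even when every API call fails.
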
- Multi-format upload: PDF, TXT, Markdown → chunked (500 chars, 100 overlap) → embedded (all-MiniLM-L6-v2)
- Multi-tenant isolation: Users only see their own documents, enforced at both the database and vector-search levels
- Real-time pipeline logs: Toggle-able drawer showing each CRAG step with live messages
- Groq LLM: llama-3.3-70b-versatile via Groq API (with heuristic fallback)
- Conversation memory: Multi-turn chat with previous context preserved
- JWT auth: Register/login with bcrypt-hashed passwords and 24-hour tokens
- Dark glassmorphism UI: Premium obsidian design, mobile-responsive
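
The chunking parameters in the first bullet (500 characters with a 100-character overlap) correspond to a sliding window with a stride of 400. The sketch below assumes plain character-based windows; the project may instead split on sentence or paragraph boundaries, and `chunk_text` is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows that share `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # 400 with the defaults
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of embedding some text twice.
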
| Layer | Technology |
|---|---|
| Frontend | React 19 + Vite 8 + react-markdown |
| Backend | FastAPI + Uvicorn |
| Vector Store | Custom FAISS (numpy-based, persisted via pickle) |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) |
| Database | SQLite |
| Auth | JWT + bcrypt |
| Container | Docker (multi-stage) + Docker Compose |
| Deployment | Vercel (frontend) + Railway (backend) |
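
As an illustration of the vector-store row, the sketch below shows a brute-force store with per-user filtering and pickle persistence. It is a simplified stand-in, not the project's code: it uses plain Python lists instead of numpy arrays to stay dependency-free, and the class and method names are hypothetical. The default `k=5` mirrors the five candidate chunks shown in the pipeline diagram.

```python
import math
import pickle
from typing import List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Brute-force similarity search over (user_id, text, vector) rows."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, str, List[float]]] = []

    def add(self, user_id: str, text: str, vector: Sequence[float]) -> None:
        self.rows.append((user_id, text, list(vector)))

    def search(
        self, user_id: str, query_vec: Sequence[float], k: int = 5
    ) -> List[Tuple[float, str]]:
        # Tenant isolation: only score rows owned by the requesting user.
        scored = [
            (_cosine(query_vec, vec), text)
            for owner, text, vec in self.rows
            if owner == user_id
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

    def save(self, path: str) -> None:
        with open(path, "wb") as fh:
            pickle.dump(self.rows, fh)

    @classmethod
    def load(cls, path: str) -> "TinyVectorStore":
        store = cls()
        with open(path, "rb") as fh:
            store.rows = pickle.load(fh)  # only load files you wrote yourself
        return store
```

Pickle is convenient for a single-process deployment, but unpickling untrusted data can execute arbitrary code, so the persisted file should never come from user uploads.
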
# Backend
cd backend
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload
# Frontend (another terminal)
cd frontend
npm install
npm run dev
Set your API keys in backend/.env:
JWT_SECRET=your-secret-key
GROQ_API_KEY=gsk_...
- Frontend: correctiverag.vercel.app
- API Docs: celebrated-adaptation-production-95fd.up.railway.app/docs