Chat with any PDF or YouTube video — 100% locally, with a self-correcting agentic loop, dual-graded answer verification, conversational memory, and zero API costs.
Most RAG demos stop at "retrieve chunks → generate answer." This project goes further — it uses a LangGraph state-machine agent that grades its own answers across two independent dimensions, detects failures, and automatically rewrites queries with targeted awareness of what went wrong before re-retrieving.
The agent catches two distinct failure modes that most RAG systems miss:
- Hallucination — the answer contains claims not supported by the retrieved chunks
- Retrieval mismatch — the chunks were retrieved but didn't match the intent of the query (e.g. asked for examples, got limitations)
On top of that, it has conversational memory — follow-up questions like "give examples for it" are automatically rephrased into standalone queries before retrieval, so context is never lost between turns.
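The memory window can be illustrated with a minimal sketch that builds the "rephrase into a standalone query" prompt from only the last N turns. It assumes history is stored as question/answer pairs. `Turn` and `build_contextualize_prompt` are hypothetical names, and the prompt wording is illustrative rather than LoreLoop's actual prompt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One completed exchange: the user's question and the assistant's answer (hypothetical)."""
    question: str
    answer: str


def build_contextualize_prompt(history: List[Turn], question: str, memory_window: int) -> str:
    """Build a prompt asking the LLM to turn a follow-up into a standalone query.

    Only the last `memory_window` turns are included; older turns are dropped.
    """
    # Guard: history[-0:] would return the whole list, not an empty one.
    recent = history[-memory_window:] if memory_window > 0 else []
    transcript = "\n".join(
        f"User: {t.question}\nAssistant: {t.answer}" for t in recent
    )
    return (
        "Rewrite the follow-up question as a standalone question, replacing "
        "pronouns such as 'it' with what they refer to in the chat history. "
        "If the question is already standalone, return it unchanged.\n\n"
        f"Chat history:\n{transcript}\n\n"
        f"Follow-up question: {question}\n"
        "Standalone question:"
    )
```

In this sketch, the returned string would be sent to the local Ollama model, and the model's reply would become the retrieval query. Because the window bounds the prompt size, long sessions do not slow down the contextualization step.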
Everything runs locally via Ollama and FAISS. No API keys. No data leaving your machine.
| Feature | Description |
|---|---|
| 🔁 Self-Correcting Agent | LangGraph loop retries with a rewritten query up to N iterations before giving up gracefully |
| 🎯 Dual-Graded Verification | Every answer is graded on both groundedness (hallucination check) AND relevance (did it actually answer the question?) — both must pass |
| 🔍 Context-Aware Query Rewriter | On failure, the rewriter receives the specific failure reason (hallucinated / off-topic / both) and rewrites accordingly |
| 💬 Conversational Memory | Resolves pronoun references and follow-up questions using the last N chat turns before retrieval |
| 🕵️ Agentic Telemetry | Expandable per-response panel showing every iteration's query, generated answer, retrieved chunks, and dual verdict |
| ⚙️ Live Parameter Control | Sidebar sliders for chunk size, overlap, top-K retrieval, max correction iterations, and memory window |
| 🔄 Smart Re-indexer | Detects when sidebar parameters differ from the active vector index and prompts a targeted rebuild |
| 📄 PDF Support | Drag-and-drop local PDF ingestion via PyPDF |
| 🎥 YouTube Support | Paste any YouTube URL to auto-fetch and query its transcript |
| 🔒 Fully Local | Ollama + FAISS CPU — zero API costs, zero tracking, total privacy |
| 🖥️ CLI Mode | Full terminal interface via main.py with diagnostic iteration logs |
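One way to implement the re-indexer's "parameters differ" check is to record the settings the active index was built with and diff them against the sidebar. The following is a minimal sketch that assumes only chunk size and overlap shape the stored vectors. Top-K, max iterations and the memory window act at query time, so this sketch deliberately leaves them out. `IndexParams` and `changed_params` are hypothetical names.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IndexParams:
    """Hypothetical record of the settings a FAISS index was built with."""
    chunk_size: int
    chunk_overlap: int


def changed_params(active: Optional[IndexParams], requested: IndexParams) -> List[str]:
    """Return the names of index-shaping parameters that differ.

    An empty list means the active index is still valid. If no index exists
    yet, every parameter is reported so the caller builds one from scratch.
    """
    wanted = asdict(requested)
    if active is None:
        return list(wanted)
    current = asdict(active)
    return [name for name, value in wanted.items() if current[name] != value]


active = IndexParams(chunk_size=1000, chunk_overlap=200)
print(changed_params(active, IndexParams(chunk_size=800, chunk_overlap=200)))  # ['chunk_size']
```

A UI could show the returned names in its "rebuild index?" prompt, so the user sees exactly which slider triggered the rebuild.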
```text
User Question
      │
      ▼
[contextualize_query] → Rephrase follow-ups using chat history (e.g. "it" → "Expert Systems")
      │
      ▼
[retrieve] → Pull top-K chunks from FAISS vector index
      │
      ▼
[generate] → LLM answers using ONLY the retrieved context
      │
      ▼
[check_hallucination] → Dual grader: GROUNDED (yes/no) + RELEVANT (yes/no)
      │
      ├── GROUNDED + RELEVANT ────────────────▶ Return answer ✅
      │
      ├── either fails + iterations < N ──────▶ [rewrite_query] → back to [retrieve] 🔄
      │                                               ↑
      │                                     rewriter receives specific
      │                                     failure reason (hallucinated /
      │                                     off-topic / both)
      │
      └── either fails + iterations == N ────▶ "Information not found in document" ❌
```
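The control flow in the diagram can be traced without LangGraph, Ollama or FAISS. The sketch below injects the retrieve, generate, grade and rewrite steps as plain callables. It is a framework-free sketch, not LoreLoop's `agent.py`. It makes three assumptions:

- N counts total retrieve-generate-grade passes.
- Generation and grading use the user's contextualized question, while only retrieval uses the rewritten query.
- The rewriter receives the two grades and turns them into a failure reason itself.

`run_correction_loop` and `Iteration` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

NOT_FOUND = "Information not found in document"


@dataclass
class Iteration:
    """One telemetry row: what was tried on a single pass and how it was graded."""
    query: str
    chunks: List[str]
    answer: str
    grounded: bool
    relevant: bool


def run_correction_loop(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    grade: Callable[[str, List[str], str], Tuple[bool, bool]],
    rewrite: Callable[[str, bool, bool], str],
    max_iterations: int = 3,
) -> Tuple[str, List[Iteration]]:
    """Retrieve, generate and grade until both checks pass or N passes are used up."""
    query = question  # `question` is assumed to be already contextualized
    trace: List[Iteration] = []
    for i in range(max_iterations):
        chunks = retrieve(query)
        # Generate and grade against the user's question; only retrieval uses the rewrite.
        answer = generate(question, chunks)
        grounded, relevant = grade(question, chunks, answer)
        trace.append(Iteration(query, chunks, answer, grounded, relevant))
        if grounded and relevant:
            return answer, trace  # GROUNDED + RELEVANT: return the answer
        if i < max_iterations - 1:
            # Either check failed and passes remain: rewrite, then loop back to retrieve.
            query = rewrite(query, grounded, relevant)
    return NOT_FOUND, trace  # iteration budget exhausted
```

Returning the `trace` alongside the answer is one way to feed a per-response telemetry panel. Each row carries the query, chunks, answer and dual verdict for one pass. In LangGraph, the `if` branches would correspond to a conditional edge out of the grading node.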
Two distinct query rewriting steps exist for two different reasons:
- `contextualize_query`: runs once at the start of each turn to resolve ambiguous references using chat history. It never re-runs during the correction loop.
- `rewrite_query`: runs inside the correction loop when dual grading fails. It receives the specific failure reason, so the rewrite is targeted rather than generic.
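The failure-aware half of `rewrite_query` can be sketched in three steps:

1. Parse the grader's two verdicts.
2. Map them to a failure reason (hallucinated / off-topic / both).
3. Put that reason into the rewrite prompt.

The sketch assumes the grader replies with lines like `GROUNDED: yes` and `RELEVANT: no`. The helper names, the regex, and the guidance wording are all hypothetical and are not taken from LoreLoop's prompts.

```python
import re
from typing import Optional, Tuple


def parse_verdict(grader_output: str) -> Tuple[bool, bool]:
    """Extract (grounded, relevant) from text like 'GROUNDED: yes\\nRELEVANT: no'.

    A missing or unparseable flag counts as a failure, so the loop errs toward retrying.
    """
    def flag(name: str) -> bool:
        m = re.search(rf"\b{name}\b\s*[:=]?\s*(yes|no)\b", grader_output, re.IGNORECASE)
        return m is not None and m.group(1).lower() == "yes"

    return flag("GROUNDED"), flag("RELEVANT")


def failure_reason(grounded: bool, relevant: bool) -> Optional[str]:
    """Map the two grades to the reason passed to the rewriter (None means both passed)."""
    if grounded and relevant:
        return None
    if not grounded and not relevant:
        return "both"
    return "hallucinated" if not grounded else "off-topic"


# Illustrative guidance per failure mode (hypothetical wording).
GUIDANCE = {
    "hallucinated": "The previous answer made claims the retrieved chunks did not support. "
                    "Rewrite the query to target passages that contain the needed facts.",
    "off-topic": "The retrieved chunks did not match the intent of the question. "
                 "Rewrite the query to ask for exactly what the user wants "
                 "(e.g. examples rather than limitations).",
    "both": "The previous answer was unsupported and off-topic. Rewrite the query to be "
            "more specific about both the topic and the kind of information wanted.",
}


def build_rewrite_prompt(query: str, grounded: bool, relevant: bool) -> str:
    """Build a rewrite prompt that tells the LLM what went wrong on the last pass."""
    reason = failure_reason(grounded, relevant)
    if reason is None:
        raise ValueError("rewrite_query should only run after a failed grade")
    return (
        f"Failure reason: {reason}\n{GUIDANCE[reason]}\n\n"
        f"Original query: {query}\nRewritten query:"
    )


print(failure_reason(*parse_verdict("GROUNDED: yes\nRELEVANT: no")))  # off-topic
```

Treating an unparseable verdict as a failure is a deliberate choice here. Small local judges sometimes ignore the requested format, and retrying is safer than returning an unverified answer.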
| Component | Tool |
|---|---|
| 🧠 Agent Framework | LangGraph (state-machine with conditional edges and retry loop) |
| 🤖 LLM | llama3 / mistral / gemma / phi3 via Ollama |
| 🔢 Embeddings | all-MiniLM-L6-v2 — HuggingFace Sentence Transformers |
| 🗄️ Vector Store | FAISS (CPU-based, no server required) |
| 📑 PDF Parsing | LangChain + PyPDFLoader |
| 🎥 YouTube Transcripts | youtube-transcript-api |
| 🖥️ Web UI | Streamlit with custom glassmorphic CSS |
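Before embedding with all-MiniLM-L6-v2 and indexing in FAISS, each document is split using the chunk size and overlap from the sidebar. The plain fixed-window character splitter below only illustrates what those two sliders mean. LangChain's splitters (for example `RecursiveCharacterTextSplitter`) also try to break on paragraph and sentence boundaries, and this sketch does not claim which splitter LoreLoop uses. `chunk_text` is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # each window starts this many characters after the last
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap repeats the tail of each chunk at the head of the next. A sentence cut at a boundary therefore still appears whole in at least one chunk, provided it is shorter than the overlap.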
- Python 3.11+
- Ollama installed and running locally
- ~5 GB of free disk space for the model
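To confirm that Ollama is running and a model has been pulled, you can query the local Ollama server from Python. Running `ollama list` in a shell gives the same information. The sketch assumes Ollama's default address `http://localhost:11434` and its `GET /api/tags` endpoint. `list_local_models` and `has_model` are hypothetical helpers, not part of LoreLoop.

```python
import json
import urllib.request
from typing import List


def list_local_models(base_url: str = "http://localhost:11434") -> List[str]:
    """Ask a running Ollama server which models are pulled (GET /api/tags)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]


def has_model(names: List[str], model: str) -> bool:
    """True if `model` is present, with or without an explicit tag (e.g. 'llama3:latest')."""
    return any(n == model or n.startswith(model + ":") for n in names)


if __name__ == "__main__":
    try:
        models = list_local_models()
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print(f"Ollama does not appear to be running: {exc}")
    else:
        print("llama3 available:", has_model(models, "llama3"))
```

`has_model` matches on the `name:` prefix, so `llama3.1:8b` does not count as `llama3`. Use the exact name you passed to `ollama pull`.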
1. Clone the repository

```bash
git clone https://github.com/Varn1t/LoreLoop.git
cd LoreLoop
```

2. Install dependencies

```bash
pip install -r requirements.txt
```

3. Pull a local model

```bash
ollama pull llama3
```

4a. Launch the web app

```bash
streamlit run app.py
```

4b. Or run the CLI

```bash
python main.py
```

```text
LoreLoop/
├── agent.py          # LangGraph agent: state, nodes, edges, dual grader, pipeline builder
├── app.py            # Streamlit web UI with telemetry and parameter controls
├── main.py           # CLI interface with iteration diagnostics
├── requirements.txt  # Python dependencies
├── .gitignore
└── README.md
```
```text
langchain
langchain-community
langchain-huggingface
langchain-ollama
langgraph
faiss-cpu
streamlit
youtube-transcript-api
pypdf
sentence-transformers
```
- LLM-as-judge reliability: the dual grader relies on a local LLM as its judge, and local 8B models have a limited reliability ceiling in that role. A production deployment would require a stronger judge model or a dedicated NLI (natural language inference) scorer.
- Single document per session — the current implementation indexes one source at a time. Multi-document support is a planned upgrade.
- Local model quality: grounding and relevance accuracy depend directly on the capability of the Ollama model you pull; `llama3` is recommended as the minimum.
Built by Varnit · LangGraph · FAISS · Ollama · Streamlit