A resilient, production-ready Retrieval-Augmented Generation (RAG) system built in Python. It extracts text from local HDFC documents page by page, builds a hybrid local search index, and streams answers to user queries through the OpenRouter API, with conversational memory and strict domain-adherence parameters.
- Hybrid Retrieval (Dense + Sparse): Combines semantic vector lookup via FAISS and lexical keyword matching via BM25 using LangChain's EnsembleRetriever (weighted 50/50). This improves retrieval accuracy across structured financial tables, annual reports, and narrative policies.
- Local Embedding Computation: Runs BAAI/bge-small-en-v1.5 locally using Hugging Face embeddings. It automatically uses a CUDA GPU when one is available and otherwise falls back to the CPU, consuming zero cloud API tokens for document ingestion.
- Smart Hash-Based Caching: Aggregates PDF file attributes (name, size, and last-modified date) into a SHA-256 signature stored in faiss_index/cache_metadata.json.
  - Sub-Second Boot: Subsequent runs load the FAISS and BM25 indexes from disk, skipping heavy text extraction and embedding computation.
  - Auto-Detect Changes: If a PDF is added, modified, or deleted, the system detects the signature mismatch and re-indexes the project.
- PyMuPDF Ingestion Engine: Parses pages quickly, handling complex multi-column reports and large annual publications. Exceptions are isolated at the page level: corrupt pages are skipped without halting the indexing process.
- Resilient OpenRouter Client: Direct HTTP stream parsing with built-in exponential backoff retries (up to 5 attempts) to absorb transient connection drops, rate limits (429), or model timeouts.
- Windows UTF-8 Stream Safety: Automatically reconfigures the standard stdout/stderr streams to UTF-8 with character replacement, preventing the common Windows CP1252 charmap encoding crashes when rendering rich Markdown borders, tables, or Unicode punctuation.
- Conversational Memory Buffer: Implements a sliding conversation window that retains the last 5 exchanges. Conversational references (e.g., pronouns) are resolved by a fast, deterministic query reformulation step before the search runs.
- Supporting Excerpts & Confidence Scores: Each query outputs distinct sources with page numbers, exact textual supporting excerpts, and a computed confidence score mapped from FAISS cosine similarity values.
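The hash-based caching described in the feature list can be sketched roughly as follows. This is a minimal illustration, not the project's actual vector_store.py: the function names, the `signature` JSON key, and the `name|size|mtime` record layout are assumptions made for the example. Only the use of SHA-256 over name, size, and last-modified date comes from the description above.

```python
import hashlib
import json
from pathlib import Path


def compute_signature(pdf_dir):
    """Return a SHA-256 hex digest over the name, size, and mtime of each PDF."""
    digest = hashlib.sha256()
    # Sort so the signature does not depend on directory listing order.
    for path in sorted(Path(pdf_dir).glob("*.pdf")):
        info = path.stat()
        record = f"{path.name}|{info.st_size}|{info.st_mtime_ns}\n"
        digest.update(record.encode("utf-8"))
    return digest.hexdigest()


def needs_rebuild(pdf_dir, metadata_path):
    """True when no usable signature is stored or it differs from the current one."""
    try:
        stored = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return True
    if not isinstance(stored, dict):
        return True
    # "signature" is a hypothetical key name chosen for this sketch.
    return stored.get("signature") != compute_signature(pdf_dir)


def save_signature(pdf_dir, metadata_path):
    """Persist the current signature after a successful index build."""
    target = Path(metadata_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    payload = {"signature": compute_signature(pdf_dir)}
    target.write_text(json.dumps(payload), encoding="utf-8")
```

Because the signature covers file metadata rather than file contents, checking it stays cheap even for large annual reports. The trade-off is that an edit preserving name, size, and modification time would go unnoticed.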
C:\Users\admin\Desktop\Projects\RAG_ORBK\
├── .env # Active API key and LLM Model configuration
├── .env.example # Reference template for configuration setup
├── requirements.txt # Pinned dependencies for reproducible builds
├── app.py # Main CLI REPL shell & stream parser
├── config.py # Environment config parsing and validation
├── README.md # System documentation
└── src/
    ├── __init__.py
    ├── document_processor.py # Fast PyMuPDF (fitz) page extraction
    ├── chunker.py # Page splitter with ID metadata preservation
    ├── embeddings.py # BGE-small-en-v1.5 local embedding provider
    ├── vector_store.py # Local indices manager with change-detection
    ├── openrouter_client.py # HTTP client with retries and SSE stream parser
    ├── rag_engine.py # RAG Coordinator, Hybrid Search, Memory, and scoring
    └── utils.py # ASCII Art logs and console theme formatting
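To illustrate the retry behavior attributed to openrouter_client.py, here is a minimal sketch of exponential backoff capped at 5 attempts. The `TransientError` class, the `with_retries` helper, and the 1-second base delay are hypothetical names and values chosen for the example. The real client may classify errors and space its retries differently.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (dropped connection, 429, timeout)."""


def with_retries(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on TransientError with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Delay doubles each time: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the schedule can be checked without waiting. A production client would typically also add random jitter to the delay, which is omitted here for clarity.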
Ensure you have Python 3.11+ installed on your system.
Verify that the .env file in the project root contains your OpenRouter API key and model name. A template is available in .env.example.
OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_MODEL=google/gemini-2.5-flash:free

Open a terminal in the project directory C:\Users\admin\Desktop\Projects\RAG_ORBK and run:
# Create a virtual environment
python -m venv venv
# Activate the virtual environment
.\venv\Scripts\activate
# Install all pinned dependencies
pip install -r requirements.txt

Start the interactive terminal shell using the active virtual environment:
python app.py

- Force Rebuild: Re-extracts text and compiles a fresh index, even if no files have changed:
python app.py --rebuild
- Retrieval Debug Mode: Enables detailed logging. Before answering, the console prints the standalone reformulated query, similarity distances, and matching excerpts:
python app.py --debug
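For readers curious how the two flags above might be wired up, the following is a minimal sketch using the standard library's argparse. The `parse_args` function name and the help strings are assumptions for the example. The actual parsing code in app.py is not shown in this document.

```python
import argparse


def parse_args(argv=None):
    """Parse the two documented flags; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        prog="app.py",
        description="Interactive RAG shell (illustrative parser only).",
    )
    parser.add_argument(
        "--rebuild",
        action="store_true",
        help="re-extract text and build a fresh index even if nothing changed",
    )
    parser.add_argument(
        "--debug",
        action="store_true",
        help="print reformulated queries, distances, and excerpts before answering",
    )
    return parser.parse_args(argv)
```

Both flags are plain booleans that default to False, so `python app.py` with no arguments takes the cached, non-debug path.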
Once inside the interactive chat loop, the following system commands are available:
- /clear - Wipes the conversational memory buffer (starts a fresh dialogue thread).
- /debug - Toggles Retrieval Debug Mode on or off in real time.
- /help - Prints a reference sheet of all CLI options.
- exit or quit - Safely exits the application.
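The commands above, together with the 5-exchange sliding memory window, could be handled along these lines. This is a sketch under stated assumptions: the `ChatSession` class, its method names, and the returned status strings are hypothetical, and only the command names and the window size of 5 come from this document.

```python
from collections import deque

MAX_EXCHANGES = 5  # window size stated in the feature list
HELP_TEXT = "Commands: /clear, /debug, /help, exit, quit"


class ChatSession:
    """Holds the sliding memory window and the debug toggle for one REPL run."""

    def __init__(self):
        # deque(maxlen=N) silently drops the oldest exchange once N is exceeded.
        self.memory = deque(maxlen=MAX_EXCHANGES)
        self.debug = False

    def record(self, question, answer):
        """Store one completed question/answer exchange."""
        self.memory.append((question, answer))

    def handle_command(self, line):
        """Return a status string for a system command, or None for a normal query."""
        command = line.strip().lower()
        if command in ("exit", "quit"):
            return "exit"
        if command == "/clear":
            self.memory.clear()
            return "cleared"
        if command == "/debug":
            self.debug = not self.debug
            return "debug on" if self.debug else "debug off"
        if command == "/help":
            return HELP_TEXT
        return None
```

A REPL loop would call `handle_command` on each input line first, stop when it returns "exit", and pass the line to the retrieval pipeline only when it returns None.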