An intelligent, high-performance RAG (Retrieval-Augmented Generation) system for PDF documents. Built with a modern React frontend and a robust FastAPI backend, featuring open-source document extraction, hybrid search, and cross-encoder reranking.
- Modern React UI: A responsive, premium dashboard for document management and intelligent chat.
- Open-Source Extraction: Leverages Docling for high-fidelity, structure-aware PDF parsing.
- Hybrid Search Engine: Combines FAISS (Vector Search) and BM25 (Lexical Search) with Reciprocal Rank Fusion (RRF) for superior retrieval precision.
- Advanced Reasoning: Generates answers with Sarvam-105B (via the Sarvam API or Modal vLLM) for strong reasoning and document understanding.
- Semantic Routing: Automatic query routing to specific document sections based on content type.
- Detailed Analytics: Real-time stats on processing time, chunk counts, and retrieval confidence.
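As an illustration of the semantic-routing idea, here is a toy keyword-based router. It is purely illustrative: the project's actual `query_router.py` presumably routes on semantic similarity rather than keyword overlap, and the section names below are made up.

```python
import re

# Hypothetical section vocabulary; the real router learns or embeds these.
SECTION_KEYWORDS = {
    "financials": {"revenue", "profit", "ebitda", "quarter"},
    "legal": {"clause", "liability", "indemnity", "termination"},
    "overview": {"summary", "introduction", "scope"},
}

def route_query(query: str) -> str:
    """Route a query to the section whose keywords overlap it most."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    best = max(SECTION_KEYWORDS, key=lambda s: len(words & SECTION_KEYWORDS[s]))
    # Fall back to searching every section when no keyword matches
    return best if words & SECTION_KEYWORDS[best] else "all"
```

A query like "What was revenue this quarter?" would route to `financials`, while an unmatched query falls back to searching all sections.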
```mermaid
graph TD
    User[👤 User] -->|Interacts| React["⚛️ React Frontend<br>(frontend/)"]

    subgraph API_Layer [Backend API]
        React -->|HTTP / JSON| FastAPI["⚡ FastAPI Backend<br>(backend/main.py)"]
        FastAPI -->|Query/Upload| Store["📦 Document Store<br>(EnhancedDocumentStoreHybrid)"]
    end

    subgraph Processing_Layer [Ingestion & Processing]
        Store -->|Extract| Docling["📄 Docling<br>(Open-Source PDF Extraction)"]
        Store -->|Chunk| Chunker["✂️ Chunker<br>(Logical Boundaries)"]
    end

    subgraph Retrieval_Layer [Hybrid Search & RAG]
        Store -->|Retrieve| Hybrid["🔍 Hybrid Retriever<br>(FAISS + BM25)"]
        Hybrid -->|Fusion| RRF["⚖️ RRF Scoring"]
        RRF -->|Rank| Reranker["⭐ Cross-Encoder Rerank"]
        Reranker -->|Context| LLM["🤖 Sarvam LLM<br>(Answer Generation)"]
    end

    subgraph Data_Storage [Local Storage]
        Hybrid -->|FAISS Index| VectorDB[(Vector Store)]
        Hybrid -->|BM25 Index| DocDB[(Lexical Store)]
    end
```
- Node.js: For the React frontend.
- Python 3.10+: For the FastAPI backend.
- Sarvam API Key: For answer generation (Sarvam-105B).
- Navigate to the backend directory:
  ```bash
  cd backend
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
The system is optimized for cloud scale using Modal for heavy processing and Sarvam AI for high-performance reasoning.
- Sign up at sarvam.ai.
- Generate an API key and add it to your `.env`:
  ```
  SARVAM_API_KEY=your_sarvam_api_key
  ```
- Initialize Modal:
  ```bash
  pip install modal && modal setup
  ```
- Create Secrets: In the Modal dashboard, create a secret named `huggingface-secret` containing your `HF_TOKEN`.
- Deploy the Stack:
  ```bash
  # 1. LLM Server (Gemma-2 9B)
  modal run modal/modal_llm_server.py::download_model
  modal deploy modal/modal_llm_server.py

  # 2. Docling Worker (PDF Extraction)
  modal deploy modal/modal_docling_worker.py

  # 3. Reranker Server (MiniLM-L-6)
  modal run modal/modal_reranker_server.py::download_model
  modal deploy modal/modal_reranker_server.py
  ```
- Finalize `.env`: Copy the deployment URLs into your backend `.env`:
  ```
  LLM_URL=https://your-llm-server.modal.run
  DOCLING_URL=https://your-docling-worker.modal.run
  RERANKER_URL=https://your-reranker-server.modal.run
  ```
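At startup the backend can fail fast if any of these URLs is missing. A minimal sketch; the `require_env` helper is hypothetical and assumes the `.env` file has already been loaded into the environment (e.g. via python-dotenv):

```python
import os

def require_env(name: str) -> str:
    """Return an environment variable, or fail fast with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to backend/.env")
    return value

# Example usage once the .env has been loaded:
# LLM_URL = require_env("LLM_URL")
```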
> **Important: Deployment Workflow**
> - First Time: Run `download_model`, then `deploy`. This ensures the Volume is populated before the server starts.
> - Subsequent Changes: Only run `modal deploy`. You do NOT need to re-download unless you change the `MODEL_NAME` in the script.
> - Why Deploy?: `modal run` gives a temporary development URL; `modal deploy` creates the permanent production URL required for your `.env`.
- Navigate to the frontend directory:
  ```bash
  cd frontend
  ```
- Install dependencies:
  ```bash
  npm install
  ```
- Run the dev server:
  ```bash
  npm run dev
  ```
- Open `http://localhost:5173` in your browser.
The system's performance is validated using the Ragas evaluation framework, focusing on faithfulness, relevancy, and retrieval quality.
- Faithfulness: 0.84 (high adherence to the source document)
- Answer Relevancy: 0.86 (how pertinent the answer is to the query)
- Context Precision: 0.88 (quality of the retrieved chunks)
- Context Recall: 0.976 (ability to retrieve all relevant information)
> **Note**
> Evaluation was performed on a diverse set of complex financial and legal documents to ensure robustness across different domains.
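For intuition, faithfulness is roughly the fraction of answer statements that the retrieved context supports. A toy version using substring matching; the real Ragas metric instead extracts claims with an LLM and judges entailment:

```python
def toy_faithfulness(answer_statements, context):
    """Share of answer statements that appear verbatim in the context.
    Purely illustrative: Ragas uses LLM-judged claim entailment instead."""
    if not answer_statements:
        return 0.0
    supported = sum(1 for s in answer_statements if s.lower() in context.lower())
    return supported / len(answer_statements)
```

For example, an answer making two claims of which only one appears in the retrieved context would score 0.5 under this toy definition.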
```
document-retrieval-system/
├── backend/                          # FastAPI Backend & RAG Logic
│   ├── core/                         # Core Processing Engine
│   │   ├── document_store.py         # Hybrid storage & management
│   │   ├── retriever.py              # FAISS + BM25 + RRF logic
│   │   ├── pdf_processor.py          # Docling integration
│   │   ├── chunker.py                # Advanced text chunking
│   │   └── query_router.py           # Semantic query routing
│   ├── llm/                          # LLM & Embedding Configuration
│   │   ├── llm_router.py             # Modal/Sarvam smart routing
│   │   └── gemini_setup.py           # Legacy/Fallback config
│   ├── modal/                        # Cloud Deployment Scripts
│   │   ├── modal_llm_server.py       # vLLM hosting (Gemma-2)
│   │   ├── modal_docling_worker.py   # Serverless PDF extraction
│   │   └── modal_reranker_server.py  # Cross-Encoder hosting
│   ├── main.py                       # API Entry Point
│   ├── requirements.txt              # Python dependencies
│   └── .env                          # API Keys & Worker URLs
├── frontend/                         # Vite + React Frontend
│   ├── src/                          # Application Source
│   │   ├── App.jsx                   # Main Chat Interface & Logic
│   │   ├── App.css                   # Premium Glassmorphism styling
│   │   └── main.jsx                  # React entry point
│   ├── public/                       # Static assets
│   ├── package.json                  # Frontend dependencies
│   └── vite.config.js                # Vite proxy & build config
├── notebooks/                        # R&D and Evaluation
│   └── evaluation.ipynb              # Ragas benchmarking pipeline
├── results/                          # Metrics & Analysis outputs
├── .gitignore                        # Build & Secret exclusions
└── README.md                         # Project documentation
```