An intelligent chatbot that lets you hold natural conversations with your PDF documents, built on Retrieval-Augmented Generation (RAG) and Small Language Model (SLM) reasoning.
- 📄 PDF Document Upload - Support for multiple PDF files with intelligent extraction
- 💬 Natural Language Chat - ChatGPT-style conversational interface
- 🧠 AI Reasoning Display - Transparent thought process visualization
- 🎯 Context-Aware Responses - Answers based on document content
- 🌐 Bilingual Support - Vietnamese and English responses
- 📊 Document Management - Easy upload, view, and delete operations
- 🔍 Adaptive RAG - Hybrid BM25 + Dense retrieval with dynamic re-ranking
- 💡 Chain-of-Thought - Step-by-step reasoning before final answers
- 🎨 Modern UI - Dark theme with green accents and smooth animations
- ⚡ Real-time Streaming - Token-by-token response generation
- 📱 Responsive Design - Works on desktop, tablet, and mobile
```mermaid
graph TB
    subgraph "Frontend (React + Vite)"
        UI[User Interface]
        Sidebar[Document Sidebar]
        Chat[Chat Interface]
        PDF[PDF Viewer]
    end

    subgraph "Backend (FastAPI)"
        API[REST API]
        Upload[Upload Service]
        ChatSvc[Chat Service]
        DocSvc[Document Service]
    end

    subgraph "AI Pipeline"
        Extract[PDF Extraction]
        RAG[Adaptive RAG]
        SLM[SLM Engine]
        subgraph "RAG Components"
            BM25[BM25 Retrieval]
            Dense[Dense Retrieval]
            Rerank[Re-ranking]
        end
    end

    subgraph "Storage"
        DB[(SQLite Database)]
        Files[File Storage]
        Index[Vector Index]
    end

    UI --> API
    Sidebar --> API
    Chat --> API
    PDF --> Files
    API --> Upload
    API --> ChatSvc
    API --> DocSvc
    Upload --> Extract
    Extract --> RAG
    ChatSvc --> RAG
    ChatSvc --> SLM
    RAG --> BM25
    RAG --> Dense
    BM25 --> Rerank
    Dense --> Rerank
    DocSvc --> DB
    Upload --> DB
    Upload --> Files
    RAG --> Index
```
```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant RAG
    participant SLM
    participant DB

    User->>Frontend: Upload PDF
    Frontend->>API: POST /api/documents/upload
    API->>DB: Save metadata
    API->>RAG: Extract & Index
    RAG->>DB: Store chunks
    API-->>Frontend: Upload complete

    User->>Frontend: Ask question
    Frontend->>API: POST /api/chat (streaming)
    API->>DB: Get conversation history
    API->>RAG: Retrieve relevant chunks
    RAG-->>API: Return top-k chunks
    API->>SLM: Generate with context
    loop Streaming
        SLM-->>API: Token
        API-->>Frontend: SSE event
        Frontend-->>User: Display token
    end
    API->>DB: Save message
    API-->>Frontend: Stream complete
```
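The streaming leg of the sequence above can be sketched as a generator that wraps each token in a Server-Sent Events frame. This is a minimal sketch; the event payload shape (`token` / `done` fields) is an assumption, not taken from the repository:

```python
import json


def sse_events(token_iter):
    """Format each generated token as an SSE frame, then emit a final
    'done' frame. Field names are illustrative assumptions."""
    for tok in token_iter:
        # Each SSE frame is a "data: <payload>" line followed by a blank line.
        yield f"data: {json.dumps({'token': tok})}\n\n"
    yield f"data: {json.dumps({'done': True})}\n\n"
```

In FastAPI, a generator like this would typically be returned via `StreamingResponse(sse_events(stream), media_type="text/event-stream")`, which the frontend consumes as an event stream and renders token by token.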
```
PDF Document
    ↓
[PyPDF2 Extraction]
    ↓
Raw Text
    ↓
[Smart Chunking]
  - Chunk size: 1200 chars
  - Stride: 200 chars
    ↓
Text Chunks
```
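A minimal sliding-window chunker matching the sizes above might look like the sketch below. Interpreting the 200-char stride as the overlap between consecutive chunks is an assumption; the actual chunker in the repository may define it differently:

```python
def chunk_text(text: str, chunk_chars: int = 1200, stride: int = 200) -> list[str]:
    """Split text into overlapping windows of `chunk_chars` characters,
    where each chunk shares `stride` characters with the next (assumed)."""
    if chunk_chars <= stride:
        raise ValueError("chunk_chars must exceed stride")
    step = chunk_chars - stride  # advance by window size minus overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_chars]
        if chunk:
            chunks.append(chunk)
        if start + chunk_chars >= len(text):
            break  # last window already covers the tail
    return chunks
```

Overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.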
```mermaid
graph LR
    Query[User Query] --> Expand[Query Expansion]
    Expand --> BM25[BM25 Retrieval<br/>Keyword-based]
    Expand --> Dense[Dense Retrieval<br/>E5-multilingual]
    BM25 --> Combine[Score Normalization<br/>& Combination]
    Dense --> Combine
    Combine --> Filter{Document<br/>Filter?}
    Filter -->|Yes| DocFilter[Filter by doc_id]
    Filter -->|No| Rerank[Feedback Re-ranking]
    DocFilter --> Rerank
    Rerank --> TopK[Top-K Selection]
    TopK --> Context[Context for SLM]
```
Key Components:
- BM25: Sparse retrieval using keyword matching
- Dense Retrieval: Semantic search using multilingual-e5-small embeddings
- FAISS Index: Fast similarity search for dense vectors
- Feedback Learning: User feedback improves future retrievals
- Dynamic Top-K: Adaptive number of chunks based on query
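The score normalization and combination step can be sketched as min-max normalization of each retriever's raw scores followed by a weighted sum. This is an assumed fusion scheme for illustration; the actual logic in `adaptive_rag.py` may differ:

```python
def minmax(scores):
    """Rescale raw scores to [0, 1] so BM25 and dense scores are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)  # all scores equal: treat uniformly
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_combine(bm25, dense, alpha=0.5, keep_topk=6):
    """bm25/dense map chunk_id -> raw score. Returns the top-k chunk ids
    ranked by an alpha-weighted sum of normalized scores (scheme assumed)."""
    ids = sorted(set(bm25) | set(dense))
    b = dict(zip(ids, minmax([bm25.get(i, 0.0) for i in ids])))
    d = dict(zip(ids, minmax([dense.get(i, 0.0) for i in ids])))
    fused = {i: alpha * b[i] + (1 - alpha) * d[i] for i in ids}
    return sorted(fused, key=fused.get, reverse=True)[:keep_topk]
```

With `alpha=0.5` both retrievers contribute equally; a feedback re-ranking stage could then adjust the fused scores before the final top-k selection.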
```
Context Chunks
    ↓
[Prompt Construction]
  - System: Instructions in Vietnamese
  - Context: Retrieved chunks
  - History: Last 3 messages
  - Query: User question
    ↓
[Qwen2.5-1.5B-Instruct]
  - Chain-of-Thought prompting
  - Structured output (reasoning + answer)
    ↓
[Streaming Parser]
  - Extract <reasoning>...</reasoning>
  - Extract <answer>...</answer>
    ↓
Real-time Display
```
SLM Features:
- Model: Qwen2.5-1.5B-Instruct (1.5 billion parameters)
- Precision: FP16 on GPU / FP32 on CPU
- Context: Up to 4096 tokens
- Output: Max 256 tokens with streaming
- Temperature: 0.7 for balanced creativity/accuracy
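The streaming parser that pulls the `<reasoning>` and `<answer>` spans out of a partial generation can be as simple as a tolerant regex, re-run on the accumulated text after each token. The real parser in `chat_service.py` may work differently; this is a minimal sketch:

```python
import re


def parse_structured(text):
    """Extract reasoning/answer spans from a (possibly partial) generation.
    A missing closing tag is tolerated so partial spans can be displayed
    while the model is still streaming."""
    def grab(tag):
        m = re.search(rf"<{tag}>(.*?)(?:</{tag}>|$)", text, re.DOTALL)
        return m.group(1).strip() if m else ""
    return {"reasoning": grab("reasoning"), "answer": grab("answer")}
```

Calling this on the growing buffer after every token lets the UI show the chain-of-thought live, before the closing `</reasoning>` tag has even arrived.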
- Framework: React 18 with Vite
- Styling: Vanilla CSS with custom design system
- PDF Rendering: react-pdf
- Markdown: react-markdown + remark-gfm
- Icons: lucide-react
- Code Highlighting: react-syntax-highlighter
- Framework: FastAPI (Python 3.10)
- Database: SQLAlchemy with SQLite
- LLM Engine: Hugging Face Transformers
- Vector Store: FAISS
- Text Retrieval: rank-bm25
- Embeddings: sentence-transformers
- PDF Processing: PyPDF2 (fallback: mineru)
- SLM: Qwen2.5-1.5B-Instruct
- Embeddings: intfloat/multilingual-e5-small
- Retrieval: Hybrid BM25 + Dense
- Framework: PyTorch + Transformers
- Python 3.10+
- Node.js 20.x+
- CUDA-capable GPU (optional, recommended)
- 8GB+ RAM
- Clone the repository

```bash
git clone https://github.com/yourusername/DocBot.git
cd DocBot
```

- Set up the Python environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

- Install Node dependencies

```bash
cd frontend
npm install
cd ..
```

- Run the application

```bash
bash start.sh
```

The application will be available at:
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Click "Upload PDF" button in sidebar
- Select PDF file(s) to upload
- Wait for processing and indexing
Document-Specific Chat:
- Click on a document in the sidebar
- Ask questions about that specific document
- Bot retrieves context only from selected document
Global Chat:
- Click on "DocBot" logo
- Ask questions across all uploaded documents
- Bot searches in entire knowledge base
- Click the lightbulb icon on bot messages
- Expand to see step-by-step reasoning
- Understand how the bot arrived at the answer
- View: Click on document to open PDF viewer
- Delete: Click trash icon to remove document
- Refresh: Documents auto-update on changes
```
DocBot/
├── backend/                    # FastAPI backend
│   ├── app.py                  # Main application entry
│   ├── config.py               # Backend configuration
│   ├── models/                 # Database models & schemas
│   │   ├── database.py         # SQLAlchemy models
│   │   ├── schemas.py          # Pydantic schemas
│   │   └── db_manager.py       # Database utilities
│   ├── routers/                # API endpoints
│   │   ├── upload.py           # Document upload
│   │   ├── chat.py             # Chat endpoints
│   │   ├── qa.py               # Q&A endpoints
│   │   └── admin.py            # Admin utilities
│   ├── services/               # Business logic
│   │   ├── chat_service.py     # LLM generation
│   │   ├── adaptive_rag.py     # RAG system
│   │   └── document_service.py
│   └── process/                # Document processing
│       ├── config.py           # Processing config
│       └── extract/            # PDF extraction
│           └── mineru.py       # Advanced extraction
├── frontend/                   # React frontend
│   ├── src/
│   │   ├── components/         # React components
│   │   │   ├── Sidebar.jsx     # Document sidebar
│   │   │   ├── ChatInterface.jsx
│   │   │   ├── Message.jsx     # Message display
│   │   │   ├── DocumentViewer.jsx
│   │   │   └── MessageInput.jsx
│   │   ├── services/           # API clients
│   │   │   └── api.js          # API service
│   │   ├── App.jsx             # Main app component
│   │   └── index.css           # Global styles
│   └── package.json
├── data/                       # Data storage
│   ├── uploads/                # Uploaded PDFs
│   ├── vector_store/           # FAISS indices
│   └── docbot.db               # SQLite database
├── start.sh                    # Startup script
├── requirements.txt            # Python dependencies
└── README.md                   # This file
```
```python
# LLM Settings
LLM = "Qwen/Qwen2.5-1.5B-Instruct"
MAX_CONTEXT_CHARS = 2500

# RAG Settings
CHUNK_CHARS = 1200
CHUNK_STRIDE = 200
TOPK_BM25 = 8
TOPK_EMB = 8
KEEP_TOPK = 6

# Embedding Model
EMB_MODEL = "intfloat/multilingual-e5-small"
```

Create a `.env` file:
```bash
# Optional settings
CUDA_VISIBLE_DEVICES=0                             # GPU selection
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```

Completed:

- PDF document upload and processing
- Adaptive RAG with hybrid retrieval
- Chain-of-thought reasoning
- Streaming responses
- Document-specific and global chat
- Vietnamese language support
Planned:

- Multi-document comparison
- Export chat history
- Advanced analytics dashboard
- Multi-user support
- Cloud deployment
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Qwen Team for the excellent LLM models
- Hugging Face for the Transformers library
- FastAPI and React communities
- All open-source contributors