Doc2Agent: Intelligent PDF Assistant with Multi-Agent Architecture
Doc2Agent is a document Q&A system that combines local LLM inference with a multi-agent architecture. Upload PDFs, ask questions, and get answers powered by local models running on your machine.
Doc2Agent transforms static PDF documents into interactive knowledge bases. Using a multi-agent system with specialized roles, it provides accurate, context-aware answers to your questions while maintaining complete privacy through local inference.
Key Capabilities:
- Intelligent PDF parsing and semantic enrichment
- Multi-agent collaboration for quality assurance
- Local-first architecture with Ollama integration
- Full-text search with SQLite FTS5
- Query caching for improved performance
- Personal information validation
- Configurable logging (terminal + optional log file)
- File-based configuration (agents/models/backends and prompts)
- Python 3.10 or higher
- Ollama installed and running
- uv package manager (or pip)
- Install Ollama models:

  ```shell
  ollama pull ministral-3:3b
  ollama pull deepseek-r1:8b
  ```

- Install dependencies:

  ```shell
  uv sync
  ```

- Configure environment:

  ```shell
  cp env.example .env
  ```

- Start the application:

  ```shell
  uv run chainlit run app/chainlit_app.py
  ```

- Access the web UI:
  - Open your browser to the URL shown in the terminal (typically http://localhost:8000)
  - Upload a PDF document
  - Start asking questions
- PyMuPDF-based parsing: Extracts text, tables, and images with high fidelity
- LLM enrichment: Automatically identifies names, dates, headings, and keywords
- Smart caching: Documents are cached based on modification time to avoid reprocessing
- Flexible storage: Large documents stored in SQLite; optional JSON export for smaller files
- Main Agent: Orchestrates queries and coordinates with specialized agents
- Reviewer Agent: Reviews draft answers for quality and accuracy
- Validator Agent: Validates claims against user-provided personal information
- Ingestion Agent: Enriches document pages with semantic metadata
- SQLite with FTS5: Full-text search across all ingested documents
- Query caching: Intelligent caching system reduces redundant LLM calls
- Document management: Load, select, and manage multiple documents from the UI
- Cache management: Built-in cache flushing and automatic cleanup
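The FTS5-backed search can be sketched with Python's built-in `sqlite3` module. The table layout (`pages` with `doc_id`/`page_no`/`text` columns) is an illustrative assumption, not necessarily the project's actual schema.

```python
import sqlite3

# In-memory database with an FTS5 virtual table (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE pages USING fts5(doc_id, page_no, text)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [("doc1", "1", "invoice total due March 2024"),
     ("doc1", "2", "payment terms net thirty days")],
)

def search_fts(query: str) -> list[tuple]:
    # bm25() ranks matches; lower scores rank first in SQLite's FTS5.
    return conn.execute(
        "SELECT doc_id, page_no FROM pages WHERE pages MATCH ? "
        "ORDER BY bm25(pages)",
        (query,),
    ).fetchall()
```

A query like `search_fts("invoice")` returns the matching `(doc_id, page_no)` pairs, which the main agent can then expand into full page text for the prompt.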
- Chainlit web UI: Modern, interactive chat interface
- Document upload: Drag-and-drop PDF upload with progress tracking
- Document selection: Switch between multiple cached documents
- Query history: View and manage cached queries per document
- Logs to the terminal by default; optional file logging.
- Configurable via `LOG_LEVEL`, `LOG_TO_FILE`, and `LOG_FILE`.
- Environment-driven: copy `env.example` → `.env`.
- File-driven (editable JSON): `src/agents_config/agents.json`, `src/agents_config/prompts.json`.
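A minimal `.env` might look like the fragment below. The values are examples only; see the configuration tables later in this document for the full variable list and defaults.

```env
OLLAMA_BASE_URL=http://localhost:11434/v1
LOG_LEVEL=INFO
LOG_TO_FILE=false
SHOW_REASONING=true
QUERY_CACHE_ENABLED=true
```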
```mermaid
flowchart TB
    subgraph UI[User Interface]
        CL[Chainlit Web UI]
    end
    subgraph Ingestion[PDF Ingestion Pipeline]
        PDF[PDF File] --> Parser[PDFParser<br/>PyMuPDF]
        Parser --> RawPages[Raw Pages<br/>text/tables/images]
        RawPages --> IngAgent[Ingestion Agent<br/>LLM Enrichment]
        IngAgent --> EnrichedPages[Enriched Pages<br/>names/dates/headings]
    end
    subgraph Storage[Persistence Layer]
        EnrichedPages --> SQLite[(SQLite DB<br/>FTS5 Search)]
        EnrichedPages -.->|Optional| JSON[(JSON File)]
    end
    subgraph Agents[Multi-Agent System]
        Main[Main Agent]
        Reviewer[Reviewer Agent]
        Validator[Validator Agent]
        Main -->|review_draft| Reviewer
        Main -->|validate_against_personal_info| Validator
    end
    subgraph LLM[LLM Backend]
        Ollama[Ollama<br/>Local Inference]
    end
    CL -->|upload PDF| Ingestion
    CL -->|chat message| Main
    Main -->|query_pages / search_fts| Storage
    Main -->|generate| Ollama
    Reviewer -->|generate| Ollama
    Validator -->|generate| Ollama
    IngAgent -->|generate| Ollama
    Main -->|response| CL
```
Data Flow:
- User uploads PDF → Ingestion pipeline parses and enriches pages
- Enriched pages stored in SQLite with full-text search capabilities
- User asks question → Main agent queries storage and generates response
- Reviewer agent validates answer quality
- Validator agent checks against personal info (if configured)
- Final answer returned to user interface
For detailed architecture documentation, see docs/architecture.md.
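The data flow above can be sketched as a plain-Python pipeline. This is a schematic only: the function names and the plain-callable "agents" are illustrative stand-ins for the real pydantic-ai agents backed by Ollama.

```python
from typing import Callable

def answer(question: str,
           search: Callable[[str], str],
           draft: Callable[[str, str], str],
           review: Callable[[str], str],
           validate: Callable[[str], str]) -> str:
    """Schematic of the Q&A pipeline; each stage is one agent role."""
    context = search(question)            # query_pages / search_fts over storage
    candidate = draft(question, context)  # main agent drafts an answer
    candidate = review(candidate)         # reviewer agent checks quality
    return validate(candidate)            # validator checks personal-info claims
```

Each stage is a pure function of the previous stage's output, which is what lets the reviewer and validator roles be swapped or disabled independently.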
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://localhost:11434/v1` | Ollama API endpoint |
| `OPENROUTER_API_KEY` | - | OpenRouter API key for cloud inference (set and switch backend in `agents.json`) |
| Variable | Description |
|---|---|
| `PERSONAL_INFO_JSON` | JSON object with user's personal data: `{"name":"...","email":"..."}` |
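Reading this variable defensively might look like the sketch below. The function name and the fall-back-to-empty behavior are illustrative assumptions; the application's actual error handling may differ.

```python
import json
import os

def load_personal_info() -> dict:
    """Parse PERSONAL_INFO_JSON from the environment.
    Returns {} when the variable is unset, malformed, or not a JSON object."""
    raw = os.environ.get("PERSONAL_INFO_JSON", "")
    if not raw:
        return {}
    try:
        info = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return info if isinstance(info, dict) else {}
```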
| Variable | Default | Description |
|---|---|---|
| `PDF_JSON_MAX_BYTES` | `2000000` | Max file size (bytes) for JSON storage |
| `PDF_SQLITE_DIR` | `data` | Directory for SQLite database files |
| `PDF_STORAGE_DIR` | `data/pdfs` | Directory for permanent PDF file storage |
| Variable | Default | Description |
|---|---|---|
| `INLINE_DOC_MAX_CHARS` | `20000` | Max characters to inline in prompt |
| `SHOW_REASONING` | `true` | Display `<think>` tags in UI |
| Variable | Default | Description |
|---|---|---|
| `QUERY_CACHE_ENABLED` | `true` | Enable/disable query caching |
| `QUERY_CACHE_MAX_PER_FILE` | `10` | Max cached queries per document |
Note: Cached queries are automatically deleted when documents are removed. Use the "Flush Cache" button in the UI to manually clear cached queries.
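A per-document cache with a fixed cap and document-scoped deletion can be sketched as follows. The class and method names are hypothetical; the project's actual eviction policy may differ.

```python
from collections import OrderedDict

class QueryCache:
    """Per-document query cache capped at max_per_file entries
    (mirrors QUERY_CACHE_MAX_PER_FILE; illustrative sketch)."""

    def __init__(self, max_per_file: int = 10):
        self.max_per_file = max_per_file
        self._store: dict[str, OrderedDict] = {}

    def put(self, doc_id: str, query: str, answer: str) -> None:
        per_doc = self._store.setdefault(doc_id, OrderedDict())
        per_doc[query] = answer
        per_doc.move_to_end(query)
        while len(per_doc) > self.max_per_file:
            per_doc.popitem(last=False)  # evict the oldest entry

    def get(self, doc_id: str, query: str):
        return self._store.get(doc_id, {}).get(query)

    def drop_document(self, doc_id: str) -> None:
        # Cached queries are deleted along with their document.
        self._store.pop(doc_id, None)
```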
| Variable | Description |
|---|---|
| `LOG_LEVEL` | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `LOG_FILE` | Path to log file |
| `LOG_TO_FILE` | Enable file logging (true/false) |
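Wiring these three variables into Python's standard `logging` module might look like the sketch below; the project's `src/logging.py` may differ in detail.

```python
import logging
import os

def setup_logging() -> logging.Logger:
    """Configure logging from LOG_LEVEL / LOG_TO_FILE / LOG_FILE
    (illustrative sketch of the described behavior)."""
    level = getattr(logging,
                    os.environ.get("LOG_LEVEL", "INFO").upper(),
                    logging.INFO)
    handlers: list = [logging.StreamHandler()]  # terminal output by default
    if os.environ.get("LOG_TO_FILE", "false").lower() == "true":
        handlers.append(logging.FileHandler(os.environ.get("LOG_FILE", "app.log")))
    logging.basicConfig(level=level, handlers=handlers, force=True)
    return logging.getLogger("doc2agent")
```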
- Agent/model/backend config: `src/agents_config/agents.json` (and `src/agents_config/agents.openrouter.json`)
- Prompts: `src/agents_config/prompts.json` (system prompts are fully editable here)
- UI Framework: Chainlit - Modern chat interface
- Agent Framework: pydantic-ai - Type-safe AI agents
- LLM Backend: Ollama - Local LLM inference
- PDF Processing: PyMuPDF (fitz) - High-performance PDF parsing
- Storage: SQLite with FTS5 - Full-text search and document persistence
- Configuration: python-dotenv - Environment variable management
- Package Management: uv - Fast Python package manager
```
Doc2Agent/
├── app/
│   ├── chainlit_app.py        # Chainlit UI entry point
│   ├── config.toml            # Chainlit configuration
│   └── utils.py               # UI utilities
├── src/
│   ├── agents/
│   │   ├── base.py            # Agent creation and execution
│   │   ├── main.py            # Main agent implementation
│   │   ├── reviewer.py        # Reviewer agent
│   │   ├── ingestion.py       # Ingestion agent
│   │   └── tooling.py         # Tool registration
│   ├── agents_config/
│   │   ├── agents.json        # Agent configurations
│   │   ├── prompts.json       # System prompts
│   │   └── schemas.py         # Config schemas
│   ├── chat/
│   │   └── assistant.py       # Chat orchestration
│   ├── schemas/
│   │   └── document.py        # Document and page schemas
│   ├── storage/
│   │   └── sqlite_store.py    # SQLite persistence layer
│   ├── tools/
│   │   ├── pdf_parser.py      # PDFParser (PyMuPDF)
│   │   ├── pdf.py             # Legacy PDF tools
│   │   └── retrieval.py       # Search utilities
│   ├── bootstrap.py           # Application initialization
│   └── logging.py             # Logging setup
├── tests/                     # Unit tests
├── docs/                      # Documentation
├── data/                      # Data storage (SQLite, PDFs, JSON)
└── uploads/                   # Temporary upload directory
```
```shell
make lint        # Format code with black and isort
make lint-check  # Check formatting without modifying files
make test        # Run test suite (uv run pytest)
```

The project uses uv for dependency management. Key commands:

- `uv sync` - Install dependencies
- `uv run <command>` - Run commands in the project environment
- `uv add <package>` - Add a new dependency
- Start the application:

  ```shell
  uv run chainlit run app/chainlit_app.py
  ```

- Upload a document:
  - Click the upload button in the UI
  - Select a PDF file
  - Wait for ingestion to complete

- Ask questions:
  - Type your question in the chat interface
  - The system will search the document and generate an answer
  - Use `/reset` to clear the chat history

- Manage documents:
  - Use `/docs` to view cached documents
  - Select a document to switch context
  - Delete documents or flush cache as needed
- Multiple documents: Upload and switch between multiple PDFs
- Query caching: Frequently asked questions are cached for faster responses
- Personal info validation: Configure personal information to validate document claims
- Reasoning traces: Enable `SHOW_REASONING=true` to see agent reasoning steps
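When `SHOW_REASONING` is disabled, the UI presumably needs to strip `<think>…</think>` blocks from model output before display. A minimal sketch of that filtering (the function name and regex are illustrative, not the app's actual code):

```python
import re

# DOTALL lets the pattern span multi-line reasoning traces.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> reasoning traces from model output."""
    return THINK_RE.sub("", text).strip()
```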
MIT License - see LICENSE file for details
