A sophisticated Retrieval-Augmented Generation (RAG) system that allows users to upload PDF documents and chat with them using AI. Built with a microservices architecture using Node.js, Python, and modern web technologies.
This project implements a distributed RAG system with three main components:
```
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│     Next.js      │     │     Node.js      │     │      Python      │
│      Client      │────►│    API Server    │────►│    Processing    │
│   (Port 3500)    │     │   (Port 3000)    │     │     Service      │
└──────────────────┘     └──────────────────┘     │   (Port 8000)    │
                                  │               └──────────────────┘
                                  │                        │
                                  ▼                        ▼
                         ┌──────────────────┐     ┌──────────────────┐
                         │     RabbitMQ     │────►│    PostgreSQL    │
                         │     (Message     │     │   + pgvector     │
                         │      Queue)      │     └──────────────────┘
                         └──────────────────┘
```
**Client (Frontend)**

- Port: 3500
- Technology: Next.js 15, React 19, TypeScript, Tailwind CSS
- Features: PDF upload interface, real-time processing status, chat interface
- Components: PDF dropzone, processing status tracker, chat UI
**API Server**

- Port: 3000
- Technology: Express.js, TypeScript, WebSocket
- Responsibilities:
- File upload handling
- Chat orchestration
- WebSocket communication
- LLM integration (Anthropic/OpenAI)
- Queue management
**Processing Service**

- Port: 8000
- Technology: FastAPI, Docling, OpenAI Embeddings
- Responsibilities:
- PDF text extraction using Docling
- Text chunking and preprocessing
- Vector embedding generation
- Vector search operations
- Database management
**Infrastructure**

- PostgreSQL with pgvector: Vector database for embeddings
- RabbitMQ: Message queue for asynchronous processing
- Docker: Containerized deployment
- PgAdmin: Database administration interface
**Prerequisites**

- Docker and Docker Compose
- Node.js 18+ (for local development)
- Python 3.9+ (for local development)
- Virtual environment support (venv or conda)
**Quick Start**

- **Clone the repository**

```bash
git clone <repository-url>
cd pdf-RAG
```

- **Set up environment variables**

Create a `.env` file in the root directory:

```env
# OpenAI API Key (required for embeddings)
OPENAI_API_KEY=your_openai_api_key_here

# Anthropic API Key (for chat responses)
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Database Configuration
DATABASE_URL=postgres://postgres:yourpassword@postgres:5432/ragdb

# RabbitMQ Configuration
RABBITMQ_URL=amqp://rabbitmq:5672

# Processing Service Configuration
PROCESSING_SERVICE_URL=http://localhost:8000
UPLOADS_DIR=/app/uploads
```

- **Start the system**

```bash
# Development mode with hot reload
npm run dev

# Or production mode
npm start
```

- **Access the application**
- Frontend: http://localhost:3500
- API Server: http://localhost:3000
- Processing Service: http://localhost:8000
- PgAdmin: http://localhost:5050 (admin@example.com / adminpassword)
- RabbitMQ Management: http://localhost:15672 (guest / guest)
When working with the Python processing service locally:
```bash
# Always activate the virtual environment before working
cd processing-service
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install new dependencies
pip install package_name
pip freeze > requirements/base.txt  # Update requirements file

# Deactivate when done
deactivate
```

Important Notes:

- The virtual environment is excluded from version control (see `.gitignore`)
- Always use the virtual environment for local development
- Update the `requirements/*.txt` files when adding new dependencies
- Docker containers use their own isolated environments
**Local Development**

- **Start infrastructure services**

```bash
docker-compose -f docker-compose.dev.yml up postgres rabbitmq pgadmin
```

- **Install and run the API server**

```bash
cd server
npm install
npm run dev
```

- **Install and run the processing service**

```bash
cd processing-service

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements/base.txt
pip install -r requirements/heavy.txt

# Run the service
python src/main.py
```

- **Install and run the client**

```bash
cd client
npm install
npm run dev
```
**Docker Deployment**

- **Build and start all services**

```bash
docker-compose up --build
```

- **Monitor logs**

```bash
docker-compose logs -f
```
**Document Processing Flow**

- Upload: User uploads a PDF through the web interface
- Queue: File is queued for processing via RabbitMQ
- Extraction: Docling extracts text and structure from PDF
- Chunking: Text is split into optimal chunks for embedding
- Embedding: OpenAI generates vector embeddings for each chunk
- Storage: Chunks and embeddings are stored in PostgreSQL with pgvector
- Notification: WebSocket notifies frontend of completion
**Chat Flow**

- Query: User sends a message through the chat interface
- Vector Search: Query is embedded and searched against stored chunks
- Context Building: Relevant chunks are retrieved and combined with conversation history
- LLM Generation: Anthropic/OpenAI generates response using the context
- Response: Answer is streamed back to the user
**API Endpoints**

- `POST /api/document/upload` - Upload a PDF file
- `GET /api/document/status/:fileId` - Get processing status
- `POST /api/chat/chat` - Send a chat message
- `GET /api/chat/history/:conversationId` - Get conversation history
- `POST /api/search` - Vector search
- `GET /api/health` - Health check
**Database Schema**

The database schema is defined in `init.sql` and includes:
- `documents`: Stores document metadata
- `chunks`: Stores text chunks with vector embeddings (1536 dimensions)
- `vector` extension: PostgreSQL pgvector for similarity search

See `init.sql` for the complete schema definition.
**Docker Services**

- `postgres`: PostgreSQL with pgvector extension
- `rabbitmq`: Message queue with management interface
- `pgadmin`: Database administration
- `server`: Node.js API server
- `processing-service`: Python processing service
The processing service is configured with:
- Memory limit: 7GB
- Memory reservation: 2GB
- Restart policy: on-failure
**Troubleshooting**

Common Issues:
- Processing stuck: Check memory usage with `docker stats`; restart with `docker-compose restart processing-service`
- Large documents fail: Increase memory limits in `docker-compose.yml`
- Database issues: Verify PostgreSQL is running with `docker-compose ps`
- OCR models: The first run downloads ~6.5GB of models; ensure sufficient disk space
Health Checks:
- API Server: `GET http://localhost:3000/health`
- Processing Service: `GET http://localhost:8000/api/health`
Logs:
```bash
docker-compose logs -f                       # All services
docker-compose logs -f processing-service    # Specific service
```

**Security & Configuration**

- API keys are stored in environment variables
- File uploads are validated and size-limited
- Database connections use connection pooling
- CORS is configured for development
For a detailed analysis of the current architecture, identified issues, and improvement recommendations, see `recommendations.md`. This document includes:
- Current architecture strengths and weaknesses
- Immediate fixes for production readiness
- Medium-term improvements for scalability
- Long-term vision for advanced AI capabilities
- Performance optimization strategies
- Future implementation ideas for agentic patterns