A fully containerized Retrieval-Augmented Generation (RAG) pipeline that lets users upload large PDF documents and query them in natural language. Built with Spring Boot, React, and PostgreSQL (pgvector), the system uses an LLM-based query router and asynchronous processing to handle heavy concurrent workloads.
- Advanced RAG Pipeline: Grounds LLM responses in the source material by matching each user query against document chunks by vector similarity in a pgvector database.
- Intelligent Query Expansion (AI Router): Uses a "Middleman AI" to translate abstract or indirect user queries into search keywords better suited to vector search, with the aim of improving retrieval accuracy.
- Asynchronous Ingestion Engine: Implements non-blocking polling and `@EnableScheduling` watchdogs to process heavy PDF uploads (300+ pages) without starving database connections or freezing the UI.
- Token-Optimized LLM Integration: Communicates with the Google Gemini API, featuring robust error handling, token limit management (up to 2048 output tokens), and dynamic prompt engineering.
- Fully Containerized: One-click local deployment using Docker Compose, orchestrating the React frontend, Spring Boot backend, and pgvector database across an isolated network.
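
The asynchronous ingestion idea in the list above can be illustrated with a minimal plain-JDK sketch. This is not the project's code: the backend is described as using Spring's `@EnableScheduling`, whereas this stand-in swaps in `CompletableFuture` and a status map so it runs without Spring. The names `IngestionSketch`, `submit`, and `status` are hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the ingestion engine: an upload returns a job id
// immediately, the work runs on a small bounded pool, and clients poll for status.
class IngestionSketch {
    enum Status { PROCESSING, DONE, FAILED }

    private final Map<String, Status> jobs = new ConcurrentHashMap<>();
    // A bounded pool caps how many documents are processed at once, which is
    // one way to keep ingestion from exhausting database connections.
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    /** Registers the job and returns at once; the caller never blocks on parsing. */
    String submit(Runnable work) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, Status.PROCESSING);
        CompletableFuture.runAsync(work, pool)
            .whenComplete((ok, err) -> jobs.put(id, err == null ? Status.DONE : Status.FAILED));
        return id;
    }

    /** Polled by the UI; returns null for unknown job ids. */
    Status status(String id) {
        return jobs.get(id);
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

In this sketch the `Runnable` would wrap text extraction, chunking, and embedding; the UI would call `status(id)` on an interval until it sees `DONE` or `FAILED`.
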
- Frontend (React/Vite & Nginx): Provides a responsive UI for file uploads and a conversational chat interface.
- Backend (Spring Boot 3): Exposes REST APIs, extracts text via Apache PDFBox, manages asynchronous chunking/embedding, and orchestrates the LLM dual-routing process.
- Database (PostgreSQL + pgvector): Stores document chunks and their high-dimensional vector embeddings for low-latency cosine similarity search.
- LLM Provider (Google Gemini): Powers both the Query Expansion module and the Final Answer Generation module.
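
To make the cosine similarity search mentioned above concrete, here is a minimal in-memory sketch of the ranking involved. In the real system this work happens inside PostgreSQL (pgvector's `<=>` operator returns cosine distance, which is 1 minus the similarity computed here); the `CosineSearchSketch` class, the `Chunk` record, and the tiny two-dimensional vectors are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// In-memory illustration of cosine-similarity ranking over document chunks.
class CosineSearchSketch {
    record Chunk(String text, double[] embedding) {}

    /** cos(a, b) = (a . b) / (|a| * |b|); returns 0 when either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns up to k chunks most similar to the query, best match first. */
    static List<Chunk> topK(double[] query, List<Chunk> chunks, int k) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator
            .comparingDouble((Chunk c) -> cosine(query, c.embedding()))
            .reversed());
        return sorted.subList(0, Math.min(k, sorted.size()));
    }
}
```

A linear scan like this is only practical for small collections; delegating the ranking to the database is what lets the search stay fast as the number of chunks grows.
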
- Backend: Java 17, Spring Boot, Spring Data JPA, Apache PDFBox, Maven
- Frontend: React 18, Vite, Node.js, HTML/CSS
- Database: PostgreSQL 16, `pgvector` extension
- AI/LLM: Google Gemini API (gemini-1.5-flash)
- DevOps: Docker, Docker Compose
- Docker Desktop installed and running.
- A free Google Gemini API Key.
- Clone the repository:

      git clone https://github.com/[Your-Username]/[Your-Repo-Name].git
      cd [Your-Repo-Name]

- Build the Java backend:

      cd backend
      mvn clean package -DskipTests
      cd ..

- Launch the Docker fleet. Replace YOUR_API_KEY with your actual Gemini API key.

  Windows (PowerShell):

      $env:GEMINI_API_KEY="YOUR_API_KEY"; docker-compose up --build -d

  Mac/Linux:

      GEMINI_API_KEY="YOUR_API_KEY" docker-compose up --build -d

- Access the application: open your browser and navigate to http://localhost:3000.

🧠 How the Query Expansion Works

To reduce "hallucinations" and handle vague questions (e.g., "Is this about a person or a company?"), the system does not search the database immediately. Instead, each query goes through four steps:

1. The user's query is sent to a Query Expander LLM.
2. The Query Expander generates specific search keywords based on the query's intent.
3. These keywords are vectorized and sent to pgvector for similarity matching.
4. The retrieved document chunks and the original prompt are sent to the Generator LLM to construct the final response.
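
The flow described above can be sketched as a small Java class. This is an illustrative outline under stated assumptions, not the project's implementation: `QueryExpansionSketch` and its three function-typed dependencies are hypothetical stand-ins for the Gemini calls and the pgvector lookup, and the prompt layout is invented for the example.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical wiring of the expand -> retrieve -> generate flow. Each dependency
// is a plain function so the flow can be exercised without a real LLM or database.
class QueryExpansionSketch {
    private final Function<String, String> expander;        // user query -> search keywords
    private final Function<String, List<String>> retriever; // keywords -> matching chunks
    private final Function<String, String> generator;       // full prompt -> final answer

    QueryExpansionSketch(Function<String, String> expander,
                         Function<String, List<String>> retriever,
                         Function<String, String> generator) {
        this.expander = expander;
        this.retriever = retriever;
        this.generator = generator;
    }

    String answer(String userQuery) {
        // First LLM call: turn a vague question into concrete keywords.
        String keywords = expander.apply(userQuery);
        // Similarity search runs on the keywords, not on the raw question.
        List<String> chunks = retriever.apply(keywords);
        // Second LLM call: the generator sees the retrieved chunks plus the
        // ORIGINAL question, so the expanded keywords never replace user intent.
        String prompt = "Context:\n" + String.join("\n", chunks)
            + "\n\nQuestion: " + userQuery;
        return generator.apply(prompt);
    }
}
```

Keeping the three stages behind simple function types makes it easy to substitute stubs in tests, for example an expander that always returns fixed keywords and a generator that echoes its prompt.
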