A FastAPI-powered RAG service that answers natural-language questions from your PDF collection — with source citations, guardrails, and a multi-agent pipeline built on Pydantic-AI.
This system is designed as a document-based GPT implementation that provides intelligent answers from a collection of internal documents. The architecture prioritizes precision, transparency, and verifiability by ensuring users receive short, well-sourced answers with direct links to the relevant document sections.
- **Source Attribution**: Every answer includes a direct link to the relevant document and specific paragraph, enabling users to verify information at its source. This design choice prioritizes transparency and trust.
- **Relevance Filtering**: The system is built to recognize when questions fall outside the scope of available documents, explicitly informing users when queries are not applicable to the document collection. This prevents hallucination and maintains accuracy.
- **Dynamic Content Management**: The architecture supports seamless addition of new documents, ensuring the knowledge base can grow and stay current without system redesign.
- **Built-in Guardrails**: Multiple layers of validation prevent inappropriate or irrelevant responses, maintaining system reliability and user trust.
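The attribution and relevance guarantees above imply a particular response shape. A minimal sketch of that contract as plain dataclasses — the field and class names here are illustrative, not the service's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    document_id: str   # which document the passage came from
    paragraph: int     # paragraph index, so the claim can be checked at its origin
    link: str          # direct link to the relevant document section

@dataclass
class Answer:
    text: str                                  # short, well-sourced answer
    is_relevant: bool                          # False when the question is out of scope
    confidence: float                          # confidence in the synthesized answer
    sources: list[Source] = field(default_factory=list)

# Out-of-scope queries get an explicit refusal instead of a hallucinated answer:
refusal = Answer(
    text="This question is not covered by the document collection.",
    is_relevant=False,
    confidence=0.0,
)
```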
- **Install dependencies:**

  ```shell
  uv sync
  ```

- **Create environment file:**

  ```shell
  cp .env.example .env
  ```

  Then edit `.env` to add your OpenAI API key and other required variables.

- **Start the service:**

  ```shell
  uv run uvicorn drm_document_service.app:app --host 127.0.0.1 --port 8000 --reload
  ```
The service can be run using Docker Compose, which includes all required dependencies:
- **Prerequisites**: Ensure you have Docker and Docker Compose installed on your system.

- **Environment Setup**: Create a `.env` file in the project root with your OpenAI API key:

  ```shell
  echo "OPEN_AI_KEY=your-openai-api-key-here" > .env
  ```

- **Build and Run**: Start all services using Docker Compose:

  ```shell
  docker compose -f doc-service.compose.yaml up --build
  ```
This will start:
- `doc-service-backend` (port 8000): The main FastAPI application
- `qdrant` (ports 6333, 6334): Vector database for document embeddings
- `minio` (port 9000): Object storage for document files
- `minio` console (port 9001): MinIO web console for storage management
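Once the stack is up, a quick way to confirm each service is listening is a TCP probe against the ports listed above. This is a standalone sketch using only the standard library; the port map mirrors the service list, nothing here is specific to the project's code:

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ports the compose stack is expected to expose (per the service list above).
EXPECTED = {
    "doc-service-backend": 8000,
    "qdrant": 6333,
    "minio": 9000,
    "minio console": 9001,
}

if __name__ == "__main__":
    for name, port in EXPECTED.items():
        status = "up" if port_open("localhost", port) else "down"
        print(f"{name:20s} :{port}  {status}")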
- **Access the Service:**
  - API: `http://localhost:8000`
  - MinIO Console: `http://localhost:9001` (admin interface for file storage)
- **Stop the Services:**

  ```shell
  docker compose -f doc-service.compose.yaml down
  ```

- **Clean Up** (removes volumes and data):

  ```shell
  docker compose -f doc-service.compose.yaml down -v
  ```
The Docker Compose setup creates:

- **Network**: `doc-service-network` (bridge driver) for inter-service communication
- **Volumes**:
  - `minio_data`: Persistent storage for uploaded documents
  - `qdrant_data`: Persistent storage for vector embeddings
- **Services**: All services run on the same network and can communicate using service names
The service can be configured using the following environment variables:

- `OPEN_AI_KEY`: Your OpenAI API key for LLM and embedding services
- `MINIO_ACCESS_KEY`: MinIO access key for object storage
- `MINIO_SECRET_KEY`: MinIO secret key for object storage
- `LOG_LEVEL`: Logging level (default: `DEBUG`)
- `SERVICE_HOST`: Service host address (default: `127.0.0.1`)
- `SERVICE_PORT`: Service port (default: `8000`)
- `MINIO_HOST`: MinIO server host (default: `minio`)
- `MINIO_PORT`: MinIO server port (default: `9000`)
- `QDRANT_HOST`: Qdrant vector database host (default: `qdrant`)
- `QDRANT_PORT`: Qdrant vector database port (default: `6333`; Docker Compose uses `6334`)
- `MAX_RETRIEVAL_RESULTS`: Maximum number of document chunks to retrieve (default: `5`)
- `MAX_DOCUMENT_TEXT_LENGTH`: Maximum length of document text for processing (default: `3000`)
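The "variable with default" pattern above is typically implemented as a thin wrapper over the process environment. A minimal sketch with the documented defaults (the helper name `env` and the `settings` dict are illustrative, not the service's actual config module):

```python
import os

def env(name: str, default: str) -> str:
    """Read a setting from the environment, falling back to the documented default."""
    return os.environ.get(name, default)

# Defaults as documented above; numeric settings are coerced to int.
settings = {
    "LOG_LEVEL": env("LOG_LEVEL", "DEBUG"),
    "SERVICE_HOST": env("SERVICE_HOST", "127.0.0.1"),
    "SERVICE_PORT": int(env("SERVICE_PORT", "8000")),
    "QDRANT_HOST": env("QDRANT_HOST", "qdrant"),
    "QDRANT_PORT": int(env("QDRANT_PORT", "6333")),
    "MAX_RETRIEVAL_RESULTS": int(env("MAX_RETRIEVAL_RESULTS", "5")),
}
```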
When running with Docker Compose, most variables are automatically configured:

```shell
MINIO_HOST=minio
MINIO_PORT=9000
MINIO_ACCESS_KEY=drm-document-service
MINIO_SECRET_KEY=drm-document-service-secret-key
QDRANT_HOST=qdrant
QDRANT_PORT=6334
SERVICE_PORT=8000
SERVICE_HOST=0.0.0.0
SERVICE_DEBUG=true
LOG_LEVEL=DEBUG
```

Note: The Qdrant service exposes both ports 6333 and 6334, but the application is configured to use port 6334.
MinIO Credentials: The MinIO service uses `MINIO_ROOT_USER=drm-document-service` and `MINIO_ROOT_PASSWORD=drm-document-service-secret-key` internally, which map to the application's `MINIO_ACCESS_KEY` and `MINIO_SECRET_KEY` environment variables.
You only need to provide your `OPEN_AI_KEY` in the `.env` file.
```mermaid
flowchart LR
    Client([Client / CLI])
    API[FastAPI<br/>/query · /upload · /documents]
    Pipeline[DocumentPipeline]
    subgraph Agents [Pydantic-AI Agents]
        Orch[Orchestrator]
        Guard[Guardrail]
        Retr[Retrieval]
    end
    Embed[EmbeddingsService]
    Parser[PdfParserService]
    Qdrant[(Qdrant<br/>vector DB)]
    Minio[(MinIO<br/>object storage)]
    OpenAI[(OpenAI<br/>LLM + embeddings)]
    Client --> API
    API --> Pipeline
    API --> Parser
    API --> Minio
    Parser --> Embed
    Pipeline --> Orch
    Orch --> Guard
    Orch --> Retr
    Retr --> Embed
    Retr --> Qdrant
    Embed --> OpenAI
    Orch --> OpenAI
    Guard --> OpenAI
```
- Query Reception: FastAPI endpoint receives the user's question via `/query`
- Pipeline Orchestration: DocumentPipeline initializes and coordinates the multi-agent workflow
- Safety Validation: GuardrailAgent validates the query for appropriateness and safety
- Document Retrieval: RetrievalAgent performs semantic search to find relevant document parts
- Embedding Generation: EmbeddingsService converts the query into a vector using OpenAI's embedding model
- Similarity Search: EmbeddingsRepository searches Qdrant vector database for semantically similar content
- Answer Synthesis: OrchestratorAgent combines retrieved context with LLM reasoning to generate the final answer
- Response Delivery: Structured response includes the answer, sources, confidence score, and relevance flags
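The flow above can be sketched end-to-end as plain functions. Everything here is stubbed — there is no real guardrail model, retriever, or LLM call — so the function names and return shape are illustrative only:

```python
def guardrail(question: str) -> bool:
    """Stub safety check: in the real pipeline the GuardrailAgent calls the LLM."""
    return bool(question.strip())

def retrieve(question: str, top_k: int = 5) -> list[str]:
    """Stub retrieval: a real implementation embeds the query and searches Qdrant."""
    return [f"chunk relevant to: {question}"][:top_k]

def synthesize(question: str, context: list[str]) -> dict:
    """Stub synthesis: a real implementation sends context + question to the LLM."""
    return {
        "answer": f"Based on {len(context)} retrieved chunk(s): ...",
        "sources": context,
        "confidence": 0.9,
        "is_relevant": bool(context),
    }

def run_pipeline(question: str) -> dict:
    """Guardrail -> retrieval -> synthesis, mirroring the steps listed above."""
    if not guardrail(question):
        return {"answer": "Query blocked by guardrail.", "sources": [],
                "confidence": 0.0, "is_relevant": False}
    context = retrieve(question)
    return synthesize(question, context)
```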
```mermaid
sequenceDiagram
    participant U as Client
    participant API as FastAPI /query
    participant O as Orchestrator
    participant G as Guardrail
    participant R as Retrieval
    participant E as Embeddings
    participant Q as Qdrant
    participant L as OpenAI
    U->>API: POST /query {question}
    API->>O: run(question)
    O->>G: validate(question)
    G-->>O: safe / blocked
    O->>R: retrieve(question)
    R->>E: embed(question)
    E->>L: embeddings API
    L-->>E: vector
    E-->>R: vector
    R->>Q: similarity search
    Q-->>R: top-k chunks
    R-->>O: context
    O->>L: synthesize answer
    L-->>O: answer + confidence
    O-->>API: answer, sources, is_relevant
    API-->>U: QueryResponseSchema
```
- 🎭 Agent Layer: Pydantic-AI agents that handle orchestration, safety, and retrieval
- 🧠 Logic Layer: Business logic services for embeddings and document processing
- 💾 Storage Layer: Repositories managing Qdrant vector database and MinIO object storage
- 🏗️ Infrastructure: External services (OpenAI, Qdrant, MinIO) supporting the RAG pipeline
The service includes a CLI tool, built with Rich formatting, for working with all endpoints. It provides comprehensive document management and querying capabilities.
The CLI dependencies are already included in the project. After running `uv sync`, you can use the CLI immediately.
```shell
# Use default server (http://localhost:8000)
uv run python cli.py --help

# Specify custom server URL
uv run python cli.py --server https://your-server.com --help
```

```shell
# Check service health
uv run python cli.py health
```

```shell
# Upload a single PDF document
uv run python cli.py upload --file /path/to/document.pdf
uv run python cli.py upload -f /path/to/document.pdf

# Upload all PDFs from a folder
uv run python cli.py upload --folder /path/to/folder
uv run python cli.py upload -d /path/to/folder

# Upload PDFs recursively from folder and subfolders
uv run python cli.py upload --folder /path/to/folder --recursive
uv run python cli.py upload -d /path/to/folder -r
```

```shell
# List all documents
uv run python cli.py list-docs

# Get detailed document information
uv run python cli.py info <document-id>

# Download a document
uv run python cli.py download <document-id> --output downloaded.pdf
uv run python cli.py download <document-id> -o downloaded.pdf

# Delete a document (with confirmation prompt)
uv run python cli.py delete <document-id>

# Delete a document (skip confirmation)
uv run python cli.py delete <document-id> --yes
uv run python cli.py delete <document-id> -y
```

```shell
# Ask a question about your documents
uv run python cli.py query --question "What is the main topic of the documents?"
uv run python cli.py query -q "What is the main topic of the documents?"

# Interactive query (prompts for question)
uv run python cli.py query
```

```shell
# Complete workflow example
uv run python cli.py upload -f ~/documents/manual.pdf
uv run python cli.py list-docs
uv run python cli.py query -q "How do I configure the system?"

# Batch upload example
uv run python cli.py upload -d ~/documents -r
uv run python cli.py query -q "What are the main features?"
```
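A command-line interface like the one above can be assembled with the standard library's `argparse`. This is a minimal sketch of just the global `--server` option and the `query` subcommand — the actual `cli.py` may be structured differently:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented CLI surface (query subcommand only)."""
    parser = argparse.ArgumentParser(prog="cli.py")
    parser.add_argument("--server", default="http://localhost:8000",
                        help="base URL of the document service")
    sub = parser.add_subparsers(dest="command", required=True)
    query = sub.add_parser("query", help="ask a question about your documents")
    query.add_argument("-q", "--question",
                       help="question text (interactive prompt if omitted)")
    return parser

args = build_parser().parse_args(["query", "-q", "What are the main features?"])
```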