A simple, demo-ready RAG (Retrieval-Augmented Generation) system that answers patient-specific questions using patient data stored in Postgres + pgvector together with LLM capabilities.
The project is built with FastAPI microservices and designed as a lightweight healthcare AI architecture for demos, experimentation, and future production expansion.
The Data Service is responsible for:
- Loading patient/FHIR JSON data
- Creating text chunks from notes, medications, and lab results
- Generating embeddings
- Storing chunks and embeddings in Postgres + pgvector
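The chunking step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual loader: the flat input shape (`patient_id`, `notes`, `medications`, `lab_results`), the `Chunk` class, and the `max_chars` limit are all assumptions made for the example, and real FHIR JSON would need to be flattened into this shape first.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    patient_id: str
    source: str  # "note", "medication", or "lab_result"
    text: str


def split_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text on whitespace into pieces of roughly max_chars characters.

    A single word longer than max_chars is kept whole rather than cut.
    """
    pieces, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            pieces.append(current)
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def build_chunks(record: dict, max_chars: int = 500) -> list[Chunk]:
    """Turn one (hypothetical, already flattened) patient record into chunks."""
    patient_id = record["patient_id"]
    chunks = []
    # Long free-text notes are split; short structured items stay whole.
    for note in record.get("notes", []):
        for piece in split_text(note, max_chars):
            chunks.append(Chunk(patient_id, "note", piece))
    for med in record.get("medications", []):
        chunks.append(Chunk(patient_id, "medication", f"Medication: {med}"))
    for lab in record.get("lab_results", []):
        chunks.append(Chunk(patient_id, "lab_result", f"Lab result: {lab}"))
    return chunks
```

Each resulting `Chunk.text` would then be embedded and stored alongside its `patient_id`, so that later searches can be restricted to a single patient.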
The Vector Service is responsible for:
- Performing vector similarity search
- Retrieving the most relevant patient chunks
- Returning top-k matching records
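To make the top-k retrieval concrete, here is a small in-memory sketch of what the search computes. In the running system this ranking is done by pgvector inside Postgres (for example by ordering on a distance operator such as `<=>`, which is cosine distance); which metric the project actually uses is not stated here, so cosine similarity is an assumption, as are the function names and the `(patient_id, text, embedding)` row shape.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_embedding, rows, patient_id, top_k=3):
    """Return the top_k (score, text) pairs for one patient, best match first.

    rows is an iterable of (patient_id, text, embedding) tuples.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), text)
        for pid, text, embedding in rows
        if pid == patient_id  # never mix chunks from different patients
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```

Filtering by `patient_id` before ranking mirrors the `patient_id` field that the search request carries.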
The AI Service is responsible for:
- Embedding user questions
- Retrieving relevant context from `vector_service`
- Building prompts
- Generating grounded responses using OpenAI models
The AI service is stateless and can be scaled horizontally.
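The prompt-building step might look something like the sketch below. The wording of the instructions and the `build_prompt` name are illustrative assumptions, not the project's actual prompt; the point is only that retrieved chunks are placed in the prompt so the model answers from them.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved patient chunks and the question into one grounded prompt."""
    if chunks:
        # Number the chunks so an answer can refer back to its sources.
        context = "\n".join(
            f"[{i}] {text}" for i, text in enumerate(chunks, start=1)
        )
    else:
        context = "(no matching records found)"
    return (
        "Answer the question using only the patient context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Patient context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Because the function depends only on its arguments, it fits the stateless design: any replica of the service produces the same prompt for the same question and chunks.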
The API Gateway is responsible for:
- Acting as the entry point for frontend or staff applications
- Forwarding requests to the AI service
- Providing a simplified external API layer
                    +-------------------+
                    |    API Gateway    |
                    |       :8003       |
                    +---------+---------+
                              |
                              v
                    +-------------------+
                    |    AI Service     |
                    |       :8002       |
                    +---------+---------+
                              |
              +---------------+----------------+
              |                                |
              v                                v
    +-------------------+            +-------------------+
    |    OpenAI API     |            |  Vector Service   |
    | Embeddings + LLM  |            |       :8001       |
    +-------------------+            +---------+---------+
                                               |
                                               v
                                     +-------------------+
                                     |    Postgres +     |
                                     |     pgvector      |
                                     +---------+---------+
                                               ^
                                               |
                                     +---------+---------+
                                     |   Data Service    |
                                     |       :8000       |
                                     +-------------------+
flowchart TB
Client[User / Frontend]
Gateway[API Gateway\nPOST /query]
AI[AI Service\nPOST /ask]
Vector[Vector Service\nPOST /search]
Data[Data Service\nPOST /load_patient_data]
DB[Postgres + pgvector]
OpenAI[OpenAI API]
Client -->|POST /query| Gateway
Gateway -->|patient_id + question| AI
AI -->|question embedding| Vector
Vector -->|retrieve chunks| DB
AI -->|send prompt + receive answer| OpenAI
Data -->|ingest + store embeddings| DB
Prerequisites:
- Docker
- Docker Compose
- OpenAI API key
Create a .env file and configure:
- `OPENAI_API_KEY`
- `VECTOR_DIM`
- `EMBEDDING_MODEL`
- `LLM_MODEL`
docker compose build --no-cache
docker compose up -d
docker compose ps

Configuration values are loaded in the following order:
- Runtime environment variables
- Values from `.env`
- Default values defined in the services
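The lookup order above can be sketched as a small resolver. This assumes the list is ordered from highest to lowest priority (runtime environment first, then `.env`, then service defaults); the function names, the simple `KEY=VALUE` parsing, and the example defaults are hypothetical and only illustrate the idea.

```python
import os


def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_setting(name, dotenv_values, defaults, environ=None):
    """Return the first value found: environment, then .env, then defaults."""
    environ = os.environ if environ is None else environ
    for source in (environ, dotenv_values, defaults):
        if name in source:
            return source[name]
    raise KeyError(f"missing configuration value: {name}")
```

For example, with `VECTOR_DIM=1536` in `.env` and `VECTOR_DIM=3072` exported in the shell, this resolver would return `"3072"`, which is worth keeping in mind when a container appears to ignore the `.env` file.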
The project stores patient chunks and embeddings in Postgres using pgvector.
Make sure the vector dimension matches the embedding model being used.
Examples:
| Embedding Model | Vector Dimension |
|---|---|
| `text-embedding-3-small` | 1536 |
| `text-embedding-3-large` | 3072 |
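A startup check along the following lines could catch a mismatched setting before any data is written. It is only a sketch: `check_vector_dim` and `KNOWN_DIMENSIONS` are hypothetical names, and the mapping contains just the two models from the table above.

```python
# Dimensions for the two embedding models listed in the table above.
KNOWN_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_vector_dim(embedding_model: str, vector_dim: int) -> None:
    """Raise ValueError if VECTOR_DIM disagrees with a known embedding model."""
    expected = KNOWN_DIMENSIONS.get(embedding_model)
    if expected is None:
        return  # unknown model: nothing to compare against
    if expected != vector_dim:
        raise ValueError(
            f"VECTOR_DIM={vector_dim} does not match {embedding_model} "
            f"(expected {expected})"
        )
```

Calling `check_vector_dim("text-embedding-3-large", 1536)` raises immediately, which is easier to diagnose than a failed insert later on.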
curl -X POST "http://localhost:8000/load_patient_data" \
-H "Content-Type: application/json" \
-d @sample_patient.json

Expected response:
- `status: ok`
- `chunks_loaded > 0`
curl -X POST "http://localhost:8003/query" \
-H "Content-Type: application/json" \
-d '{"patient_id":"patient-001","question":"Does the patient have any chronic conditions?"}'

Expected response:
- Generated answer
- Retrieved context chunks
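The same gateway query can be issued from Python using only the standard library. The sketch below mirrors the request body of the `curl` call; the function names are illustrative, and since the exact field names of the JSON response are not documented here, the parsed response is returned as-is rather than unpacked.

```python
import json
import urllib.request


def build_query_request(patient_id, question, base_url="http://localhost:8003"):
    """Build (but do not send) the POST /query request for the API Gateway."""
    body = json.dumps({"patient_id": patient_id, "question": question})
    return urllib.request.Request(
        f"{base_url}/query",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_gateway(patient_id, question, timeout=30.0):
    """Send the query and return the decoded JSON response."""
    request = build_query_request(patient_id, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the stack running, `ask_gateway("patient-001", "Does the patient have any chronic conditions?")` should return the generated answer together with the retrieved context chunks.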
curl -X POST "http://localhost:8001/search" \
-H "Content-Type: application/json" \
-d '{"patient_id":"patient-001","query_embedding":[0.0,0.0,0.0],"top_k":3}'

curl -X POST "http://localhost:8002/ask" \
-H "Content-Type: application/json" \
-d '{"patient_id":"patient-001","question":"What medications is the patient taking?"}'

Inspect the schema and vector dimensions:
docker compose exec postgres \
psql -U postgres -d healthcare \
-c "\d+ patient_chunks"

If you see errors related to vector dimensions:
- Ensure `VECTOR_DIM` matches the embedding model
- Ensure the Postgres vector column dimension matches the configured model
Example mismatch:
- Database uses `vector(1536)`
- Application generates `3072`-dimensional embeddings
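When debugging this kind of mismatch, it can help to compare the column type reported by `\d+ patient_chunks` with the length of an embedding the application actually produced. The helper below is a hypothetical sketch for that comparison, not part of the services.

```python
import re
from typing import Optional, Sequence


def column_dimension(column_type: str) -> int:
    """Extract N from a pgvector column type written as 'vector(N)'."""
    match = re.fullmatch(r"vector\((\d+)\)", column_type.strip())
    if match is None:
        raise ValueError(f"not a sized vector type: {column_type!r}")
    return int(match.group(1))


def describe_mismatch(column_type: str, embedding: Sequence[float]) -> Optional[str]:
    """Return a readable message if the embedding does not fit the column."""
    expected = column_dimension(column_type)
    actual = len(embedding)
    if actual == expected:
        return None
    return f"column is vector({expected}) but the embedding has {actual} values"
```

If a message comes back, either change `VECTOR_DIM` and the embedding model to agree with the column, or recreate the column with the dimension the model produces.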
If services fail because of missing packages:
docker compose up --build --force-recreate -d

Possible future improvements:
- Authentication and authorization
- HIPAA/security controls
- Audit logging
- Redis caching
- Better chunking strategies
- Metadata filtering
- Multi-patient isolation
- Streaming responses
- Kubernetes deployment
- Direct FHIR API integration
This project demonstrates a lightweight healthcare AI RAG architecture using:
- FastAPI microservices
- Postgres + pgvector
- OpenAI embeddings and LLMs
- Retrieval-augmented generation
- Docker-based deployment
It provides a clean foundation for building scalable healthcare AI systems that can answer patient-specific questions using grounded medical context.