Internal document search and Q&A platform for Mitos Therapeutics. Upload PDF, DOCX, PPTX, HWP, and HWPX research documents and chat with them using retrieval-augmented generation (RAG).
```text
┌────────────────────────────────────────────────────────┐
│ Browser / Client                                       │
│ React + Vite + TypeScript (port 5173)                  │
└────────────────────────┬───────────────────────────────┘
                         │ HTTP / REST
┌────────────────────────▼───────────────────────────────┐
│ FastAPI Backend                                        │
│ Python 3.12 (port 8000)                                │
│                                                        │
│ ┌─────────────┐  ┌──────────────┐  ┌─────────────────┐ │
│ │ Auth / JWT  │  │ Doc Service  │  │ RAG Service     │ │
│ │ (passlib)   │  │ (extract,    │  │ (embedding +    │ │
│ │             │  │  chunk,      │  │  vector search  │ │
│ │             │  │  embed)      │  │  + LLM)         │ │
│ └─────────────┘  └──────────────┘  └─────────────────┘ │
└────────────────────────┬───────────────────────────────┘
                         │ asyncpg / SQLAlchemy 2
┌────────────────────────▼───────────────────────────────┐
│ PostgreSQL 16 + pgvector                               │
│ tables: users, categories, documents,                  │
│         document_chunks (768-dim vectors)              │
└────────────────────────────────────────────────────────┘
```

Uploaded files are stored on the local filesystem at `/app/uploads` (a named Docker volume).
| Extension | Format |
|---|---|
| `.pdf` | PDF (any version) |
| `.docx` | Microsoft Word |
| `.pptx` | Microsoft PowerPoint |
| `.hwp` | Hangul Word Processor |
| `.hwpx` | Hangul Word Processor (XML-based HWPX) |
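A script or client that wants to reject unsupported files before uploading them could check the extension as sketched below. This assumes support is decided by file extension alone (case-insensitive); `is_supported_upload` is a hypothetical helper, not part of the backend, which may validate uploads differently.

```python
from pathlib import Path

# Extensions from the supported-formats table; the backend's own validation may differ.
SUPPORTED_EXTENSIONS = frozenset({".pdf", ".docx", ".pptx", ".hwp", ".hwpx"})


def is_supported_upload(filename: str) -> bool:
    """Return True if the file's extension, lower-cased, is in the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["report.PDF", "slides.pptx", "notes.hwpx", "data.xlsx", "README"]:
    print(name, is_supported_upload(name))
```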
Categories are seeded automatically on first deployment:
- PHF20 – PHF20 research documents
- MTM6 – MTM6 documents
- LPMYO – LPMYO documents
- Sarcopenia – Sarcopenia documents
- Metabolic Syndrome – Metabolic Syndrome documents
- KDDF – KDDF documents
- Investor Relations – Investor Relations materials
- Patents – Patent documents
| Tool | Version | Notes |
|---|---|---|
| Docker | 24+ | Required for containerised deployment |
| Docker Compose | 2.20+ | Bundled with Docker Desktop |
| Git | any | Used to clone the repository |
For local development without Docker:
| Tool | Version |
|---|---|
| Python | 3.12+ |
| Node | 22+ |
| PostgreSQL + pgvector | 16+ |
Quick start with Docker:

```bash
# 1. Clone the repository
git clone <repo-url>
cd mitos-rag

# 2. Configure environment
cp .env.example .env
# Edit .env and set at minimum:
#   SECRET_KEY      (strong random string)
#   OPENAI_API_KEY  (required if LLM_PROVIDER=openai)

# 3. Build and start all services
docker compose up --build

# 4. Access the app
#   Frontend: http://localhost:5173
#   API docs: http://localhost:8000/docs
```

On first start, Docker will:
- Download the `pgvector/pgvector:pg16` image and start PostgreSQL
- Build the Python backend image (downloads ~2 GB, including `sentence-transformers`)
- Build the Node frontend image
- Run Alembic migrations and seed the 8 default categories
- Start the FastAPI server and Vite dev server
Note: The first build takes 5-15 minutes, depending on your internet connection, because `sentence-transformers` and its PyTorch dependency are large. Subsequent builds are cached.
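One way to produce a strong `SECRET_KEY` for step 2 is Python's standard-library `secrets` module, which is designed for cryptographic tokens. The snippet prints a line you can paste into `.env`; any other cryptographically secure generator works just as well.

```python
import secrets

# token_urlsafe(32) draws 32 random bytes and base64url-encodes them without
# padding, giving a 43-character string (above the 32-character minimum).
secret_key = secrets.token_urlsafe(32)
print(f"SECRET_KEY={secret_key}")
```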
- Open http://localhost:5173
- Click Register and create your account
- Upload a document from the Documents page (PDF, DOCX, PPTX, HWP, or HWPX)
- Once processing is complete (status turns Ready), open the Chat page
- Ask questions in natural language — relevant source citations appear alongside answers
Copy `.env.example` to `.env` and customise:

Database:

| Variable | Default | Description |
|---|---|---|
| `POSTGRES_DB` | `mitos_rag` | Database name |
| `POSTGRES_USER` | `mitos` | Database user |
| `POSTGRES_PASSWORD` | `mitos` | Database password |
Backend:

| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | `postgresql+asyncpg://...` | Full async DSN; must point at the `db` service |
| `SECRET_KEY` | (must change) | JWT signing secret; use a random string of at least 32 characters |
| `ALGORITHM` | `HS256` | JWT algorithm |
| `ACCESS_TOKEN_EXPIRE_MINUTES` | `60` | Token lifetime in minutes |
| `UPLOAD_ROOT` | `/app/uploads` | Container path for uploaded files |
| `CORS_ORIGINS` | `http://localhost:5173` | Comma-separated allowed origins |
| `EMBEDDING_PROVIDER` | `sentence-transformers` | `sentence-transformers` or `openai` |
| `EMBEDDING_MODEL_NAME` | `BAAI/bge-base-en-v1.5` | Hugging Face model name (768-dim, matching the `document_chunks` vectors) |
| `CHUNK_SIZE` | `800` | Maximum tokens per chunk |
| `CHUNK_OVERLAP` | `120` | Tokens shared between consecutive chunks |
| `LLM_PROVIDER` | `disabled` | Set to `openai` to enable answer generation |
| `LLM_MODEL` | `gpt-4.1-mini` | OpenAI model name |
| `OPENAI_API_KEY` | (empty) | Required when `LLM_PROVIDER=openai` |
| `LOG_LEVEL` | `INFO` | `DEBUG`, `INFO`, `WARNING`, or `ERROR` |
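To see how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, here is a sliding-window chunking sketch. It treats a pre-split list of strings as tokens; the Doc Service's actual tokenizer and splitting rules are not described here, so `chunk_tokens` is a hypothetical illustration rather than the project's implementation. Each window advances by `CHUNK_SIZE - CHUNK_OVERLAP` tokens, so with the defaults every chunk repeats the last 120 tokens of the previous one.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 800, overlap: int = 120) -> list[list[str]]:
    """Split tokens into windows of at most chunk_size tokens, each sharing
    `overlap` tokens with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


words = [f"w{i}" for i in range(2000)]  # stand-in for a 2,000-token document
print([len(c) for c in chunk_tokens(words)])  # [800, 800, 640]
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing and embedding some tokens twice.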
Frontend:

| Variable | Default | Description |
|---|---|---|
| `VITE_API_BASE_URL` | `http://localhost:8000/api/v1` | Backend base URL seen by the browser |
To run the backend locally without Docker:

```bash
cd backend
python -m venv .venv
# Windows:
.\.venv\Scripts\activate
# Linux/macOS:
source .venv/bin/activate
pip install -r requirements.txt -r requirements-test.txt
# Create a .env file with DATABASE_URL pointing to your local Postgres
cp ../.env.example .env
# Edit DATABASE_URL to use localhost instead of "db"
# Run migrations
alembic upgrade head
# Start the development server
uvicorn app.main:app --reload --port 8000
```

To run the frontend locally, open a second terminal in the repository root:

```bash
cd frontend
npm install
npm run dev
```

To run the backend tests:

```bash
cd backend
# Start a test PostgreSQL database (or use your existing dev DB)
export TEST_DATABASE_URL=postgresql+asyncpg://mitos:mitos@localhost:5432/mitos_rag_test
# Run all tests
pytest
# Run only chunking unit tests (no database required)
pytest tests/test_chunking.py
# Run with verbose output
pytest -v
```

Interactive Swagger UI is available at http://localhost:8000/docs when the backend is running.
Main endpoint groups:

| Endpoint | Description |
|---|---|
| `GET /api/v1/health` | Service health check |
| `POST /api/v1/auth/...` | Register, login, current user |
| `GET/POST/DELETE /api/v1/documents/...` | Upload, list, detail, delete |
| `GET/POST /api/v1/categories/...` | List and manage categories |
| `POST /api/v1/chat/query` | RAG question answering |
| `POST /api/v1/chat/retrieve` | Vector similarity search only |
| `GET /api/v1/dashboard` | Aggregate statistics |
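For scripting against the API, the sketch below builds (without sending) a `POST /api/v1/chat/query` request using only the standard library. The JSON field name `question` and the `Authorization: Bearer` header are assumptions made for illustration; check the real request schema and auth scheme in the Swagger UI at `/docs` before relying on them. `build_chat_query` is a hypothetical helper.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000/api/v1"  # default VITE_API_BASE_URL


def build_chat_query(question: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a chat query request.

    ASSUMPTIONS: the "question" field and Bearer-token auth are illustrative;
    confirm both against the OpenAPI schema at /docs.
    """
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/chat/query",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


req = build_chat_query("Which documents discuss PHF20?", token="<jwt-from-login>")
print(req.get_method(), req.full_url)  # POST http://localhost:8000/api/v1/chat/query

# To send it against a running backend:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```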
Backend container does not start:

- Check the logs: `docker compose logs backend`
- Ensure the `db` service is healthy before the backend starts (the health check handles this automatically)
- Verify that `SECRET_KEY` is set in `.env`
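When scripting against the stack (for example in CI), you can wait for the backend to finish starting by polling `GET /api/v1/health`. This standard-library sketch assumes the endpoint answers HTTP 200 when healthy; `http_ok` and `wait_for_backend` are hypothetical helpers, and the `check` parameter exists so the polling loop can be tested without a live server.

```python
import time
import urllib.request
from collections.abc import Callable

HEALTH_URL = "http://localhost:8000/api/v1/health"


def http_ok(url: str) -> bool:
    """Return True if a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False


def wait_for_backend(check: Callable[[str], bool] = http_ok, url: str = HEALTH_URL,
                     attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll the health endpoint until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if check(url):
            return True
        time.sleep(delay)
    return False


# Usage against a running stack:
# print("backend up" if wait_for_backend() else "backend did not become healthy")
```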
Embedding errors, or documents never reach Ready:

- Set `EMBEDDING_PROVIDER=sentence-transformers` in `.env` and restart
- On first run, the embedding model download may still be in progress
- Check `docker compose logs backend` for download progress
Port already in use:

- Change the `ports` mapping in `docker-compose.yml` (e.g. `"8001:8000"`) and update `VITE_API_BASE_URL` to match
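`vector` extension errors:

- The `pgvector/pgvector:pg16` image includes the extension, and the Alembic migrations enable it with `CREATE EXTENSION IF NOT EXISTS vector`
- For local development without Docker, pgvector must be installed on your own PostgreSQL server before running `alembic upgrade head`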
CORS errors in the browser:

- Add your frontend origin to `CORS_ORIGINS` in `.env`, e.g. `CORS_ORIGINS=http://localhost:5173,http://myserver:5173`
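How the backend parses `CORS_ORIGINS` is not shown in this README. A typical approach, sketched here under that assumption, is to split on commas and normalise each entry. Browsers send the `Origin` header as scheme, host and port with no trailing slash, so stray whitespace or a trailing `/` in `.env` is a common reason an origin fails to match. `parse_cors_origins` is a hypothetical helper.

```python
def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated CORS_ORIGINS value into a clean, de-duplicated list.

    Surrounding whitespace, empty entries and trailing slashes are dropped,
    since the browser's Origin header never ends with a slash.
    """
    origins: list[str] = []
    for part in raw.split(","):
        origin = part.strip().rstrip("/")
        if origin and origin not in origins:
            origins.append(origin)
    return origins


print(parse_cors_origins("http://localhost:5173, http://myserver:5173/,,"))
# ['http://localhost:5173', 'http://myserver:5173']
```

Note that the port is part of the origin, so `http://myserver:5173` and `http://myserver:8080` must be listed separately.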