Mitos RAG

Internal document search and Q&A platform for Mitos Therapeutics. Upload PDF, DOCX, PPTX, and HWP research documents, and chat with them using retrieval-augmented generation (RAG).

Architecture

┌──────────────────────────────────────────────────────────┐
│                     Browser / Client                     │
│          React + Vite + TypeScript (port 5173)           │
└────────────────────────┬─────────────────────────────────┘
                         │ HTTP / REST
┌────────────────────────▼─────────────────────────────────┐
│                     FastAPI Backend                      │
│                 Python 3.12 (port 8000)                  │
│                                                          │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────┐  │
│  │  Auth/JWT   │  │  Doc Service │  │   RAG Service   │  │
│  │  (passlib)  │  │  (extract,   │  │  (embedding +   │  │
│  │             │  │   chunk,     │  │   vector search │  │
│  │             │  │   embed)     │  │   + LLM)        │  │
│  └─────────────┘  └──────────────┘  └─────────────────┘  │
└────────────────────────┬─────────────────────────────────┘
                         │ asyncpg / SQLAlchemy 2
┌────────────────────────▼─────────────────────────────────┐
│                 PostgreSQL 16 + pgvector                 │
│         tables: users, categories, documents,            │
│                 document_chunks (768-dim vectors)        │
└──────────────────────────────────────────────────────────┘
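
Uploaded files themselves are written by the backend to the local filesystem under /app/uploads (a named Docker volume, configured by UPLOAD_ROOT). PostgreSQL stores users, categories, document metadata, and document chunks with their 768-dim embeddings.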
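At query time the RAG Service embeds the question, ranks stored chunks by similarity, and hands the best matches to the LLM. The sketch below mimics that ranking step in memory, assuming cosine similarity (pgvector also supports L2 and inner-product distances). The `Chunk` type, `top_k`, `build_prompt`, and the toy 3-dimensional vectors are hypothetical stand-ins: in the real service the vectors are 768-dimensional and the ranking runs inside PostgreSQL, not in Python.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical stand-in for a row of the document_chunks table.
    document_id: int
    text: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of vector lengths (0.0 for zero vectors).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    # Rank chunks by similarity to the query embedding, most similar first.
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query, c.embedding), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[Chunk]) -> str:
    # Hypothetical prompt layout: numbered sources (usable as citations), then the question.
    sources = "\n".join(f"[{i}] {c.text}" for i, c in enumerate(context, start=1))
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {question}"
```

Numbering the sources in the prompt is one common way to let the model cite passages back to specific documents.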

Supported File Types

Extension	Format
`.pdf`	PDF (any version)
`.docx`	Microsoft Word
`.pptx`	Microsoft PowerPoint
`.hwp`	Hangul Word Processor
`.hwpx`	Hangul Word Processor X
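Uploads outside this list are typically rejected before text extraction starts. A minimal sketch of such a check, assuming validation by file suffix only; the `SUPPORTED_EXTENSIONS` set and `is_supported` helper are illustrative, and the real backend may also inspect file contents.

```python
from pathlib import Path

# Extensions from the Supported File Types table (illustrative constant).
SUPPORTED_EXTENSIONS = frozenset({".pdf", ".docx", ".pptx", ".hwp", ".hwpx"})


def is_supported(filename: str) -> bool:
    # Compare only the final suffix, case-insensitively, so "Report.PDF" is accepted.
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```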

Document Categories

Categories are seeded automatically on first deployment:

PHF20 – PHF20 research documents
MTM6 – MTM6 documents
LPMYO – LPMYO documents
Sarcopenia – Sarcopenia documents
Metabolic Syndrome – Metabolic Syndrome documents
KDDF – KDDF documents
Investor Relations – Investor Relations materials
Patents – Patent documents
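Seeding is usually made idempotent so that re-running migrations never duplicates categories. The sketch below shows that pattern with the standard-library sqlite3 module and a hypothetical `categories` schema with a unique `name` column; PostgreSQL accepts the same `INSERT ... ON CONFLICT DO NOTHING` form, but the project's actual Alembic seed step may be written differently.

```python
import sqlite3

# The eight default categories listed above, as (name, description) pairs.
DEFAULT_CATEGORIES = [
    ("PHF20", "PHF20 research documents"),
    ("MTM6", "MTM6 documents"),
    ("LPMYO", "LPMYO documents"),
    ("Sarcopenia", "Sarcopenia documents"),
    ("Metabolic Syndrome", "Metabolic Syndrome documents"),
    ("KDDF", "KDDF documents"),
    ("Investor Relations", "Investor Relations materials"),
    ("Patents", "Patent documents"),
]


def seed_categories(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the UNIQUE name column lets repeated runs skip existing rows.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS categories ("
        "id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO categories (name, description) VALUES (?, ?) "
        "ON CONFLICT (name) DO NOTHING",
        DEFAULT_CATEGORIES,
    )
    conn.commit()
```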

Prerequisites

Tool	Version	Notes
Docker	24+	Required for containerised deployment
Docker Compose	2.20+	Bundled with Docker Desktop
Git	any

For local development without Docker:

Tool	Version
Python	3.12+
Node	22+
PostgreSQL + pgvector	16+

Quick Start (Docker)

# 1. Clone the repository
git clone <repo-url>
cd mitos-rag

# 2. Configure environment
cp .env.example .env
# Edit .env and set at minimum:
#   SECRET_KEY          (strong random string)
# To enable LLM answer generation, also set:
#   LLM_PROVIDER=openai
#   OPENAI_API_KEY      (required if LLM_PROVIDER=openai)

# 3. Build and start all services
docker compose up --build

# 4. Access the app
#   Frontend:  http://localhost:5173
#   API docs:  http://localhost:8000/docs
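One way to produce a suitable SECRET_KEY (the environment reference asks for a random string of 32+ characters) is Python's standard `secrets` module. This is a suggestion, not a script shipped with the repository; the `generate_secret_key` name is illustrative.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    # token_urlsafe encodes num_bytes random bytes as unpadded URL-safe base64,
    # so 32 bytes yield a 43-character string.
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste the printed value into .env as SECRET_KEY=<value>
    print(generate_secret_key())
```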

On first start, Docker will:

Download the pgvector/pgvector:pg16 image and start PostgreSQL
Build the Python backend image (downloads ~2 GB including sentence-transformers)
Build the Node frontend image
Run Alembic migrations and seed the 8 default categories
Start the FastAPI server and Vite dev server

Note: The first build takes 5-15 minutes depending on your internet connection because sentence-transformers and its PyTorch dependency are large. Subsequent builds are cached.

First-time Usage

Open http://localhost:5173
Click Register and create your account
Upload a document from the Documents page (PDF, DOCX, PPTX, HWP, or HWPX)
Once processing is complete (status turns Ready), open the Chat page
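Ask questions in natural language; citations of the relevant sources appear alongside answers. Generated answers require LLM_PROVIDER=openai, since the default (disabled) turns answer generation off.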

Environment Variables Reference

Copy .env.example to .env and customise:

PostgreSQL

Variable	Default	Description
`POSTGRES_DB`	`mitos_rag`	Database name
`POSTGRES_USER`	`mitos`	Database user
`POSTGRES_PASSWORD`	`mitos`	Database password

Backend

Variable	Default	Description
`DATABASE_URL`	`postgresql+asyncpg://...`	Full async DSN; must point at the `db` service
`SECRET_KEY`	(must change)	JWT signing secret; use a random 32+ char string
`ALGORITHM`	`HS256`	JWT algorithm
`ACCESS_TOKEN_EXPIRE_MINUTES`	`60`	Token lifetime in minutes
`UPLOAD_ROOT`	`/app/uploads`	Container path for uploaded files
`CORS_ORIGINS`	`http://localhost:5173`	Comma-separated allowed origins
`EMBEDDING_PROVIDER`	`sentence-transformers`	`sentence-transformers` or `openai`
`EMBEDDING_MODEL_NAME`	`BAAI/bge-base-en-v1.5`	HuggingFace model name (768-dim)
`CHUNK_SIZE`	`800`	Max tokens per chunk
`CHUNK_OVERLAP`	`120`	Overlap tokens between chunks
`LLM_PROVIDER`	`disabled`	`openai` to enable answer generation
`LLM_MODEL`	`gpt-4.1-mini`	OpenAI model name
`OPENAI_API_KEY`	(empty)	Required when `LLM_PROVIDER=openai`
`LOG_LEVEL`	`INFO`	`DEBUG`, `INFO`, `WARNING`, `ERROR`
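CHUNK_SIZE and CHUNK_OVERLAP describe a sliding window: each chunk holds up to CHUNK_SIZE tokens and consecutive chunks share CHUNK_OVERLAP tokens, so with the defaults the window advances 800 - 120 = 680 tokens at a time. The sketch below shows only that arithmetic, assuming whitespace-separated words as a stand-in for tokens; the backend's real tokenizer and split rules (for example, respecting sentence boundaries) may differ, and both function names are hypothetical.

```python
def chunk_tokens(tokens: list[str], size: int = 800, overlap: int = 120) -> list[list[str]]:
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # 680 tokens with the defaults
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


def chunk_text(text: str, size: int = 800, overlap: int = 120) -> list[str]:
    # Whitespace splitting is an assumption standing in for a real tokenizer.
    return [" ".join(c) for c in chunk_tokens(text.split(), size, overlap)]
```

For a 2,000-token document this yields three chunks starting at tokens 0, 680, and 1,360.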

Frontend

Variable	Default	Description
`VITE_API_BASE_URL`	`http://localhost:8000/api/v1`	Backend base URL as seen by the browser

Development Setup (without Docker)

Backend

cd backend
python -m venv .venv
# Windows:
.\.venv\Scripts\activate
# Linux/macOS:
source .venv/bin/activate

pip install -r requirements.txt -r requirements-test.txt

# Create a .env file with DATABASE_URL pointing to your local Postgres
cp ../.env.example .env
# Edit DATABASE_URL to use localhost instead of "db"

# Run migrations
alembic upgrade head

# Start the development server
uvicorn app.main:app --reload --port 8000
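The only difference in DATABASE_URL between the Docker and local setups is the host: inside Docker Compose the backend reaches PostgreSQL through the `db` service name, while a local run uses `localhost`. A small helper that assembles the async DSN from the POSTGRES_* values, assuming PostgreSQL's default port 5432; the function name is illustrative. It percent-encodes the user and password so characters such as `@`, `:` or `/` cannot break the URL.

```python
from urllib.parse import quote


def build_database_url(
    user: str,
    password: str,
    database: str,
    host: str = "db",  # Compose service name; use "localhost" outside Docker
    port: int = 5432,
) -> str:
    # safe="" makes quote() escape "/" as well, so credentials cannot alter the URL path.
    return (
        f"postgresql+asyncpg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )
```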

Frontend

cd frontend
npm install
npm run dev

Running Tests

cd backend

# Start a test PostgreSQL database (or use your existing dev DB)
export TEST_DATABASE_URL=postgresql+asyncpg://mitos:mitos@localhost:5432/mitos_rag_test

# Run all tests
pytest

# Run only chunking unit tests (no database required)
pytest tests/test_chunking.py

# Run with verbose output
pytest -v

API Documentation

Interactive Swagger UI is available at http://localhost:8000/docs when the backend is running.

Main endpoint groups:

Endpoint	Description
`GET /api/v1/health`	Service health check
`POST /api/v1/auth/...`	Register, login, current user
`GET/POST/DELETE /api/v1/documents/...`	Upload, list, detail, delete
`GET/POST /api/v1/categories/...`	List and manage categories
`POST /api/v1/chat/query`	RAG question answering
`POST /api/v1/chat/retrieve`	Vector similarity search only
`GET /api/v1/dashboard`	Aggregate statistics
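A minimal client sketch for the RAG query endpoint, using only the standard library. The JSON body field (`question`), the bearer-token header, and the helper names are assumptions for illustration; check the Swagger UI at /docs for the actual request schema and for how the login endpoint returns a token.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000/api/v1"  # same as the VITE_API_BASE_URL default


def build_query_request(question: str, token: str, base: str = API_BASE) -> urllib.request.Request:
    # Assumed payload shape {"question": ...}; the real schema may differ.
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/chat/query",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumes JWT bearer authentication
        },
    )


def send(request: urllib.request.Request) -> dict:
    # Performs the HTTP call; requires the backend to be running.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Building the request separately from sending it keeps the request shape easy to inspect or test without a running server.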

Troubleshooting

Backend container exits immediately

Check logs: docker compose logs backend
Ensure the db service is healthy before the backend starts; the Compose health check normally enforces this ordering (docker compose ps shows each service's health status)
Verify SECRET_KEY is set in .env

"Embedding provider is disabled" error

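Set EMBEDDING_PROVIDER=sentence-transformers in .env, then recreate the backend container with docker compose up -d backend (docker compose restart does not apply .env changes)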

Documents stay in "processing" status

Embedding model download may still be in progress on first run
Check docker compose logs backend for download progress
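When a document seems stuck, polling its status with a timeout is easier to reason about than refreshing by hand. A generic sketch, assuming a hypothetical `get_status` callable that returns status strings such as "processing", "ready", or "failed"; the exact values and the endpoint that exposes them are not specified here.

```python
import time
from typing import Callable


def wait_for_ready(
    get_status: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    # Poll until the status leaves "processing" or the timeout elapses.
    deadline = time.monotonic() + timeout
    while True:
        status = get_status().lower()
        if status != "processing":
            return status  # e.g. "ready" or "failed"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still processing after {timeout} seconds")
        sleep(interval)
```

The injectable `sleep` parameter exists so the loop can be exercised without real waiting.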

Port already in use

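Change the host-side port in docker-compose.yml (e.g. "8001:8000" to serve the backend on host port 8001) and update VITE_API_BASE_URL to match. If you move the frontend to another port, also add the new origin to CORS_ORIGINS.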
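To find out which of the default ports is already taken before editing docker-compose.yml, a quick check is to try connecting to each one. A sketch using the standard socket module; it only detects services listening on IPv4 localhost, and `port_in_use` is an illustrative name.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    # connect_ex returns 0 when something accepts connections on host:port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (5173, 8000):  # frontend and backend defaults
        print(port, "in use" if port_in_use(port) else "free")
```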

pgvector extension missing

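The pgvector/pgvector:pg16 image ships the extension, and the Alembic migrations enable it with CREATE EXTENSION IF NOT EXISTS vector. Outside Docker, install pgvector into your local PostgreSQL first; the migration can only enable an extension that is already installed.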

CORS errors in browser

Add your frontend origin to CORS_ORIGINS in .env, e.g. CORS_ORIGINS=http://localhost:5173,http://myserver:5173
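CORS_ORIGINS is a single comma-separated string, while FastAPI's CORSMiddleware takes a list through its allow_origins parameter. A sketch of one robust way to convert it, assuming the backend splits on commas (how it actually parses the variable is an assumption). Whitespace and trailing slashes are stripped because browsers send the Origin header without a trailing slash, so "http://localhost:5173/" would otherwise never match.

```python
def parse_cors_origins(raw: str) -> list[str]:
    # Split on commas, trim whitespace and trailing slashes, drop empty entries,
    # and de-duplicate while keeping the original order.
    origins: list[str] = []
    for item in raw.split(","):
        origin = item.strip().rstrip("/")
        if origin and origin not in origins:
            origins.append(origin)
    return origins
```

The result would then be passed as allow_origins=parse_cors_origins(os.environ["CORS_ORIGINS"]) when adding CORSMiddleware.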
