Production-oriented RAG stack: FastAPI, FAISS + SQLite, OpenAI or local embeddings (sentence-transformers), OpenAI or Ollama for the LLM, and a minimal Streamlit UI.
- Python 3.13+ (recommended baseline)
Create and activate a virtual environment first so dependencies stay isolated from the system Python.
Windows (PowerShell)
cd c:\Repos\rag
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .
# Optional: pip install -e ".[local]" # sentence-transformers for EMBEDDING_PROVIDER=local
# Optional: pip install -e ".[ui]" # Streamlit UI
# Contributors: pip install -e ".[dev]"

Runtime dependencies are in pyproject.toml. Extras: local (local embeddings), ui (Streamlit). dev is for tests and tooling (ruff, mypy, pytest, pytest-cov).
If the PowerShell execution policy blocks the activation script, run this once for the current user:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

Linux / macOS
cd /path/to/rag
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
# Optional: pip install -e ".[local,ui]" # or a single extra, as needed
# Contributors: pip install -e ".[dev]"

Your prompt should show (.venv).
copy .env.example .env # Windows
# cp .env.example .env # Unix

Edit .env: OpenAI key (if using EMBEDDING_PROVIDER=openai and/or LLM_PROVIDER=openai), models, relevance threshold, and DATA_DIR / STORAGE_DIR.
With .venv activated:
uvicorn app.main:app --reload --host 127.0.0.1 --port 8000

Docs: http://127.0.0.1:8000/docs
Install the UI extra (pip install -e ".[ui]" or pip install -e ".[dev,ui]" for contributors). In a second terminal, activate .venv, then:
streamlit run ui/streamlit_app.py

Set RAG_API_URL (default http://127.0.0.1:8000) to match the API.
If the API enforces RAG_API_KEY, set the same key in the Streamlit process environment.
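The two settings above can be resolved by any client in a few lines. The following is a minimal sketch, not the code in ui/streamlit_app.py: the function name `client_settings` is hypothetical, and it assumes the key is sent in the `X-API-Key` header.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

# Default taken from this README; everything else here is illustrative.
DEFAULT_API_URL = "http://127.0.0.1:8000"


def client_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, Dict[str, str]]:
    """Return (base_url, headers) for calling the RAG API.

    Hypothetical helper: reads RAG_API_URL and RAG_API_KEY from the
    given mapping, or from the process environment when none is passed.
    """
    env = os.environ if env is None else env
    # Strip a trailing slash so paths such as "/v1/query" can be appended.
    base_url = env.get("RAG_API_URL", DEFAULT_API_URL).rstrip("/")
    headers: Dict[str, str] = {}
    api_key = env.get("RAG_API_KEY")
    if api_key:
        # Only send the header when a key is configured.
        headers["X-API-Key"] = api_key
    return base_url, headers
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to test without touching the real environment.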
Install dev tools: pip install -e ".[dev]".
pytest
ruff format --check app tests ui
ruff check app tests ui
mypy app

Local pytest does not enable coverage by default. CI runs pytest with --cov=app --cov-report=term-missing --cov-fail-under=65 (see .github/workflows/ci.yml). To match CI locally:
pytest --cov=app --cov-report=term-missing --cov-fail-under=65

- `pyproject.toml`: dependencies, extras (`local`, `ui`, `dev`), ruff/mypy/pytest config
- `app/`: FastAPI app, ingestion / retrieval / generation services, config
- `data/`: uploaded files
- `storage/`: FAISS index and SQLite chunk metadata
- `embeddings/`: optional model cache (`HF_HOME=embeddings` in `.env`)
- `ui/streamlit_app.py`: chat, upload, and document removal

| Method | Path | Description |
|---|---|---|
| POST | /ingest | Upload .txt / .pdf (same filename replaces existing chunks) |
| POST | /query | Question → answer with sources and metrics |
| POST | /query/stream | SSE stream with sources, token events, and terminal done/error events |
| GET | /documents | List indexed documents |
| DELETE | /documents/{filename} | Remove document from index and data/ (safe basename only) |
| GET | /health | Service health, vector count, and optional LLM probe |

The same methods are available under the /v1 prefix (for example POST /v1/query). Legacy paths without /v1 remain available and include deprecation headers.
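A client of `/query/stream` has to split the response into events and stop at the terminal one. The sketch below parses generic Server-Sent Events from already-decoded text lines. It assumes the event names (`sources`, `token`, `done`, `error`) arrive in the SSE `event:` field and that each `token` event carries plain text in `data:`; the actual payload format is not specified here, so treat both as assumptions. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from decoded SSE lines.

    Lenient on purpose: an event with an `event:` field but no `data:`
    line is still yielded (with empty data).
    """
    event, data, seen = "message", [], False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if seen:
                yield event, "\n".join(data)
            event, data, seen = "message", [], False
            continue
        if line.startswith(":"):
            continue  # SSE comment, often used as keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event, seen = value, True
        elif field == "data":
            data.append(value)
            seen = True


def collect_answer(lines: Iterable[str]) -> str:
    """Concatenate token events until a terminal done/error event."""
    tokens: List[str] = []
    for event, data in iter_sse_events(lines):
        if event == "token":
            tokens.append(data)
        elif event == "error":
            raise RuntimeError(data)
        elif event == "done":
            break
    return "".join(tokens)
```

In practice the lines would come from a streaming HTTP response; the parser only needs an iterable of strings, so it can be exercised with a plain list.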
- Logs: JSON via `structlog`; each request logs `method`, `path`, `status_code`, `duration_ms`; response includes `X-Request-ID`. Clients (including Streamlit) may send `X-Request-ID` to correlate with logs.
- Optional auth: set `RAG_API_KEY` to require `X-API-Key` or `Authorization: Bearer` on API routes (`/health` stays open). Control docs/openapi exemption with `API_KEY_EXEMPT_DOCS`.
- Ingest size cap: `MAX_INGEST_BYTES` (default 20MB); oversize uploads return HTTP 413.
- Health LLM probe: set `HEALTH_CHECK_LLM=true` to include `llm_ok` / `llm_error` in `GET /health` (short outbound check).
- CORS: `CORS_ALLOW_ORIGINS` (`*` or comma-separated origins) and `CORS_ALLOW_CREDENTIALS` (must be `false` when origins contain `*`).
- LLM / OpenAI HTTP timeout: `OPENAI_TIMEOUT_SECONDS` (OpenAI embeddings and chat, and Ollama chat requests).
- Rate limits: default `60/minute` global; endpoint-specific limits include ingest `10/minute` and query/query_stream `30/minute`.
- Legacy route policy: non-versioned routes remain available for compatibility and include `Deprecation` + `Link` headers pointing to `/v1`.
- Streamlit exposure: do not expose Streamlit directly on the public internet; place it behind a reverse proxy + SSO/VPN/IP allowlist.
- Index files: written with `faiss.serialize_index` for Unicode paths on Windows; older `write_index` files are loaded via a temporary ASCII path.

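The CORS constraint in the list above (credentials must be off when origins include `*`) is easy to get wrong in a deployment. As an illustration only, and not the validation code used in app/, a startup check could look like the following; the function name `parse_cors_settings` is hypothetical.

```python
from typing import List, Tuple


def parse_cors_settings(allow_origins: str, allow_credentials: bool) -> Tuple[List[str], bool]:
    """Split a CORS_ALLOW_ORIGINS-style string and enforce the wildcard rule.

    Hypothetical helper: accepts "*" or a comma-separated list of origins.
    """
    # Drop surrounding whitespace and empty entries such as a trailing comma.
    origins = [o.strip() for o in allow_origins.split(",") if o.strip()]
    if allow_credentials and "*" in origins:
        # A wildcard origin combined with credentials is rejected up front.
        raise ValueError(
            "CORS_ALLOW_CREDENTIALS must be false when CORS_ALLOW_ORIGINS contains '*'"
        )
    return origins, allow_credentials
```

Failing at startup rather than at request time surfaces a misconfigured environment before any browser client hits it.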
See .env.example for all environment variables.
- `api` image installs the base package only (OpenAI embeddings by default). To run `EMBEDDING_PROVIDER=local` in containers, rebuild with build arg `INSTALL_EXTRAS=local` (or `local,ui` if needed).
- `ui` image is built with `INSTALL_EXTRAS=ui` so Streamlit is included.

See AGENTS.md for architecture notes and editing rules for this repository.