Config-driven Retrieval-Augmented Generation service: FastAPI + Supabase (PostgREST + pgvector) + pluggable chat providers (Gemini by default; optional OpenAI/DeepSeek fallbacks). Embeddings are implemented via Gemini in this repo; see rag/embeddings.py / rag/gemini_client.py.
The worker stays domain-agnostic: callers supply project_id, task, text, optional metadata, docs_filters, and instructions. Nothing in the code assumes pharmacies, tickets, or a specific CRM.
- Resolve tenant config from env vars: `{PREFIX}_VECTOR_RPC`, optional persistence tables, defaults.
- Embed the query text (Gemini by default); optional `EMBEDDING_DIMENSION` check/truncation for pgvector compatibility.
- Call your Postgres RPC (e.g. `match_documents`) with the embedding + filters — this is how pgvector search is exposed through Supabase.
- Build a prompt from retrieved snippets + instructions + metadata.
- Complete with `LLM_PROVIDER` (Gemini), with optional fallback when other provider keys are configured.
- Optionally persist success rows / failure rows to tenant tables.
```mermaid
flowchart LR
  Client -->|POST /api/rag/interpret| API[FastAPI]
  API --> Emb[Embeddings]
  Emb --> RPC[Supabase RPC + pgvector]
  RPC --> LLM[Chat LLM]
  LLM --> API
  API -->|optional| DB[(interpretations / failures)]
```
| Path | Role |
|---|---|
| `main.py` | FastAPI app, auth, `/api/rag/interpret` |
| `rag/config.py` | Global + per-project env loading; `project_id` → uppercase prefix |
| `rag/embeddings.py` | `generate_embedding` (Gemini/OpenAI providers) |
| `rag/llm.py` | Gemini + optional DeepSeek/OpenAI chat fallback |
| `rag/prompts.py` | Prompt assembly |
| `rag/retrieve.py` | `supabase.rpc(...)` wrapper + filter forwarding |
| `rag/service.py` | Orchestration + persistence hooks |
| `models/request.py` | Pydantic request/response models |
| `sql/example_match_documents.sql` | Example RPC you install in Supabase |
| `tests/test_rag.py` | Mocked unit tests (pytest) |
| Variable | Required | Description |
|---|---|---|
| `SUPABASE_URL` | yes | Supabase project URL |
| `SUPABASE_SERVICE_KEY` | yes | Service role key (server only — never in browsers) |
| `WORKER_API_KEY` | no | If set, all requests must send `Authorization: Bearer <token>` |
| `EMBEDDING_PROVIDER` | no | `gemini` (default). Supported: `gemini`, `openai` |
| `EMBEDDING_MODEL` | no | Default `gemini-embedding-001` |
| `GEMINI_API_KEY` | yes* | *Required when using Gemini embeddings |
| `OPENAI_API_KEY` | yes* | *Required when using OpenAI embeddings |
| `EMBEDDING_DIMENSION` | no | If set, embedding length must match (catches model/table mismatch early) |
| `LLM_PROVIDER` | no | `gemini` (default); optional: `openai`, `deepseek` |
| `LLM_MODEL` | no | Primary chat model env (Gemini uses `GEMINI_LLM_MODEL`) |
| `LLM_TEMPERATURE` | no | Default `0.2` |
| `GEMINI_LLM_MODEL` | no | Default `gemini-3.1-flash-lite-preview` |
| `GEMINI_LLM_TEMPERATURE` | no | Default `0.2` |
| `DEEPSEEK_API_KEY` | no | DeepSeek (OpenAI-compatible) API key (optional fallback) |
| `DEEPSEEK_BASE_URL` | no | Default `https://api.deepseek.com` |
| `OPENAI_LLM_MODEL` | no | OpenAI fallback model when applicable (default `gpt-4o-mini`) |
| `DEEPSEEK_LLM_MODEL` | no | DeepSeek fallback model when applicable (default `deepseek-chat`) |
| `LOG_PII` | no | `true` to log prompt bodies (dev only) |
| `LOG_LEVEL` | no | e.g. `INFO`, `DEBUG` |
Example: a JSON `"project_id": "greatrx"` reads the `GREATRX_*` variables.
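The prefix rule can be sketched as a small resolver. This is a hypothetical helper; the real loading lives in `rag/config.py` and may differ in detail:

```python
import os

def tenant_config(project_id: str) -> dict:
    # project_id is upper-cased to form the env prefix, e.g. "greatrx" -> "GREATRX".
    prefix = project_id.upper()
    return {
        # Required: the PostgREST RPC that performs similarity search.
        "vector_rpc": os.environ[f"{prefix}_VECTOR_RPC"],
        # Optional persistence tables and defaults.
        "interpretations_table": os.environ.get(f"{prefix}_INTERPRETATIONS_TABLE"),
        "failed_runs_table": os.environ.get(f"{prefix}_FAILED_RUNS_TABLE"),
        "default_instructions": os.environ.get(f"{prefix}_DEFAULT_INSTRUCTIONS"),
    }

# With GREATRX_VECTOR_RPC=match_documents set, tenant "greatrx" resolves it.
os.environ.setdefault("GREATRX_VECTOR_RPC", "match_documents")
```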
| Variable | Required | Description |
|---|---|---|
| `{PREFIX}_VECTOR_RPC` | yes | PostgREST RPC name performing similarity search |
| `{PREFIX}_INTERPRETATIONS_TABLE` | no | If set, successful runs are inserted here |
| `{PREFIX}_FAILED_RUNS_TABLE` | no | If set, failures are recorded for retry/debug |
| `{PREFIX}_DEFAULT_INSTRUCTIONS` | no | Used when the request omits instructions |
| `{PREFIX}_DOCS_*` | no | Documentation / operator hints only (not used for SQL table name at runtime) |
| `{PREFIX}_WORKER_API_KEY` | no | If global `WORKER_API_KEY` is unset, this tenant can require its own bearer token |
Auth rule: If `WORKER_API_KEY` is set, it is required. Otherwise, if `{PREFIX}_WORKER_API_KEY` is set for that tenant, that bearer token is required. If neither is set, the endpoint is open (fine for local demos only).
- Apply `sql/example_match_documents.sql` in the SQL editor (adjust table name, columns, and `vector(N)` dimension).
- Ensure the function parameter names match what the worker sends: at minimum `p_query_embedding`, `p_match_count`, `p_category`, `p_tags` (see `rag/retrieve.py`). Add more parameters in SQL and pass them via `docs_filters` using `p_*` keys or plain names (auto-prefixed with `p_`).
- Set `{PREFIX}_VECTOR_RPC` to the function name (e.g. `match_documents`).
Suggested persistence schemas (align with `rag/service.py` inserts):

```sql
create table if not exists interpretations (
  id uuid primary key default gen_random_uuid(),
  task text,
  external_ref jsonb not null,
  interpretation text not null,
  metadata jsonb,
  docs_used jsonb,
  confidence double precision,
  model text,
  processing_time_ms int,
  created_at timestamptz not null default now()
);

create table if not exists interpretation_failures (
  id uuid primary key default gen_random_uuid(),
  task text,
  request_payload jsonb,
  error text,
  stage text,
  created_at timestamptz not null default now()
);
```

Request:

```json
{
  "project_id": "greatrx",
  "task": "interpret_ticket_group",
  "data": {
    "text": "Tickets about Lipitor shortage at Hollywood CVS...",
    "metadata": { "pharmacy_ids": [216, 220], "medication_id": 5 },
    "ticket_group_id": 123
  },
  "instructions": "Optional override for default instructions.",
  "docs_filters": {
    "category": "pharmacy_playbook",
    "tags": ["shortage", "lipitor"]
  },
  "match_count": 5
}
```

Response:

```json
{
  "interpretation": "...",
  "confidence": 0.87,
  "docs_used": [42],
  "processing_time_ms": 1200,
  "external_ref": {
    "metadata": { "pharmacy_ids": [216, 220], "medication_id": 5 },
    "ticket_group_id": 123
  },
  "model": "deepseek-chat"
}
```

`confidence` is derived from the maximum similarity value returned by your RPC when present; otherwise `null`.
Loads all rows for an `issue_group_id` from a Supabase table (default `issue_group_comments` with column `body`; alternatively, point it at `tickets` and list the text columns), concatenates them into the query text, then runs the same RAG + LLM path as `/api/rag/interpret` (vector RPC + retrieval).
```json
{
  "project_id": "greatrx",
  "issue_group_id": 101,
  "task": "interpret_issue_group",
  "instructions": null,
  "docs_filters": { "category": "playbook", "tags": ["shortage"] },
  "match_count": 5
}
```

Database setup (one script): `sql/issue_groups_and_interpretations.sql`.
Optional env (defaults shown):
| Variable | Default | Notes |
|---|---|---|
| `{PREFIX}_ISSUE_GROUP_SOURCE_TABLE` | (unset) | If set, used as the table name (e.g. `tickets`). Otherwise `{PREFIX}_ISSUE_GROUP_COMMENTS_TABLE` or `issue_group_comments`. |
| `{PREFIX}_ISSUE_GROUP_COMMENTS_TABLE` | `issue_group_comments` | Legacy override when `SOURCE_TABLE` is unset. |
| `{PREFIX}_ISSUE_GROUP_ID_COLUMN` | `issue_group_id` | FK column on that table. |
| `{PREFIX}_ISSUE_GROUP_ORDER_COLUMN` | `id` | `.order(..., desc=False)` when loading rows. |
| `{PREFIX}_ISSUE_GROUP_TEXT_COLUMNS` | (unset) | Comma-separated columns merged per row (e.g. `title,description`). If unset, a single `{PREFIX}_COMMENT_BODY_COLUMN` (default `body`) is used. |
| `{PREFIX}_COMMENT_BODY_COLUMN` | `body` | Single-column mode only. |
Set `{PREFIX}_INTERPRETATIONS_TABLE` to a dedicated table (e.g. `interpretations`), not `knowledge_base`.
To call several groups in sequence (e.g. 101–105) with a delay between requests (for API rate/cost pacing), use `scripts/run_interpret_issue_groups.sh` and set `WORKER_BASE_URL`, optional `WORKER_API_KEY`, and optional `DELAY_SECONDS` (default `120`).
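The pacing logic of that script is just a sleep between calls; a hypothetical Python equivalent (with the HTTP call injected so it stays generic):

```python
import time

def run_with_pacing(group_ids, call, delay_seconds=120, sleep=time.sleep):
    # Invoke `call` (e.g. an HTTP POST to the worker) once per group,
    # sleeping between calls but not after the last one.
    results = []
    for i, gid in enumerate(group_ids):
        results.append(call(gid))
        if i < len(group_ids) - 1:
            sleep(delay_seconds)
    return results
```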
Returns `{ "status": "ok" }`.
```shell
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# edit .env — include GEMINI_API_KEY (or other provider keys), Supabase keys, and GREATRX_VECTOR_RPC, etc.
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

Your `match_documents` RPC queries `public.knowledge_base.embedding`. If that column is NULL, retrieval will return no matches.
You can backfill embeddings in two ways:
- A standalone script (one-time / offline)
- An admin HTTP endpoint (incremental; call it whenever you add new rows)
To backfill embeddings for rows where embedding IS NULL:
```shell
. .venv/bin/activate
export SUPABASE_URL=...
export SUPABASE_SERVICE_KEY=...
export EMBEDDING_PROVIDER=gemini
export EMBEDDING_MODEL=gemini-embedding-001
export EMBEDDING_DIMENSION=1536
python scripts/backfill_embeddings.py
```

Optional env vars:

- `EMBEDDINGS_TABLE` (default `knowledge_base`)
- `EMBEDDINGS_CONTENT_COLUMN` (default `content`)
- `EMBEDDINGS_EMBEDDING_COLUMN` (default `embedding`)
- `EMBEDDINGS_BACKFILL_BATCH_SIZE` (default `50`)
- `EMBEDDINGS_SLEEP_SECONDS` (default `0.0`)
`POST /api/ops/backfill-embeddings`

This endpoint computes embeddings for rows where the configured `embedding_column` IS NULL and writes them back. It is incremental: once a row has an embedding, it is skipped.
Request body:
```json
{
  "project_id": "greatrx",
  "limit": 100,
  "batch_size": 50,
  "sleep_seconds": 0.0,
  "dry_run": false,
  "table": "knowledge_base",
  "content_column": "content",
  "embedding_column": "embedding"
}
```

Notes:

- `table`/`content_column`/`embedding_column` are optional; if omitted, the worker uses tenant env config (`{PREFIX}_DOCS_TABLE`, `{PREFIX}_DOCS_CONTENT_COLUMN`, `{PREFIX}_DOCS_EMBEDDING_COLUMN`) and otherwise defaults to `knowledge_base`/`content`/`embedding`.
- The endpoint uses the same embedding provider/model as `/api/rag/interpret` (your Gemini settings, in Gemini-only mode).
With Railway CLI linked to a project:

```shell
railway run uvicorn main:app --host 0.0.0.0 --port 8000
```

Tests:

```shell
pytest -q
```

- Create a new Railway service from this repo (GitHub or `railway up`).
- Set Variables to match `.env.example` (Supabase URL/key, model keys, `{PREFIX}_VECTOR_RPC`, optional tables).
- Railway sets `PORT`; `Procfile` uses uvicorn on `0.0.0.0`. `runtime.txt` requests Python 3.11.x (Nixpacks/Heroku-style builders honor it; if your builder ignores it, set Python 3.11 in the platform UI).
Make sure your Railway account is not in a legacy region; if the region won't change, log out and log back in.
- Another LLM: add a branch in `rag/llm.py` (or small provider modules) using the same OpenAI-compatible message shape.
- Another embedding model: implement a new provider in `rag/embeddings.py` and set `EMBEDDING_PROVIDER` + dimension checks.
- Stricter security: keep the service behind your backend only; rotate `SUPABASE_SERVICE_KEY`; use `WORKER_API_KEY` or per-tenant keys in production.
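A new provider branch with fallback could follow this shape (illustrative only; the callables stand in for real Gemini/OpenAI/DeepSeek clients, and `rag/llm.py` may structure this differently):

```python
def chat_complete(messages: list[dict], providers: dict):
    # Try each configured provider in order and return the first success.
    # `providers` maps a name to a callable taking OpenAI-style messages.
    last_error = None
    for name, call in providers.items():
        try:
            return name, call(messages)
        except Exception as exc:  # a failed provider falls through to the next
            last_error = exc
    raise RuntimeError("all chat providers failed") from last_error
```

Keeping every provider behind the same OpenAI-compatible message shape is what makes this dispatch a one-dict change per new backend.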
MIT (add a LICENSE file if you publish as OSS).