A FastAPI service that takes HTML in and returns the same HTML with Hebrew word-level corrections wrapped in <mark class="error" data-corrections="...">…</mark>. Inference is remote via LiteLLM; no model weights are bundled, so the image is small and CPU-only.
This is the first stage of the broader editors-agent project. Future stages will reuse the same /correct HTTP contract with different correction backends.
cp .env.example .env # fill in at least one provider key
docker compose up --build

Service listens on http://localhost:8000.
Interactive API docs: http://localhost:8000/docs.
Liveness and readiness probe.
curl -s http://localhost:8000/health

{
  "status": "ok",
  "default_model": "gpt-5.5",
  "has_anthropic_key": true,
  "has_openai_key": true,
  "has_google_key": false
}

Raw HTML in, raw HTML out.
| Field | Where | Required | Default | Notes |
|---|---|---|---|---|
| body | request body | yes | — | HTML document or fragment, UTF-8 |
| `X-Model` | request header | no | `DEFAULT_MODEL` | LiteLLM model id override |
| `X-Service-Tier` | request header | no | `DEFAULT_SERVICE_TIER` | `priority` / `flex` / `default` (OpenAI only) |
Response:
- Status `200`, `Content-Type: text/html; charset=utf-8`, body = corrected HTML
- Headers: `X-Model`, `X-Chunks`, `X-Total-Ms`, `X-Cost-Usd`, optional `X-Warning`
Error envelope (status 400 / 413 / 500): {"detail": "..."}.
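
For callers that prefer Python over curl, the contract above might be exercised roughly as follows. This is a minimal sketch using only the standard library. The function names `build_correct_request` and `correct` are illustrative and not part of the service, and the base URL assumes the local docker compose setup.

```python
import urllib.request

# Response headers listed in the contract above.
RESPONSE_HEADERS = ("X-Model", "X-Chunks", "X-Total-Ms", "X-Cost-Usd", "X-Warning")


def build_correct_request(html, base_url="http://localhost:8000",
                          model=None, service_tier=None):
    """Build a POST /correct request; optional headers are sent only when given."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if model:
        headers["X-Model"] = model
    if service_tier:
        headers["X-Service-Tier"] = service_tier
    return urllib.request.Request(
        f"{base_url}/correct",
        data=html.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def correct(html, **kwargs):
    """Send the request and return (corrected_html, metadata_headers)."""
    req = build_correct_request(html, **kwargs)
    with urllib.request.urlopen(req) as resp:
        body = resp.read().decode("utf-8")
        meta = {name: resp.headers.get(name) for name in RESPONSE_HEADERS}
    return body, meta
```

Non-2xx responses raise `urllib.error.HTTPError`; the JSON error envelope can then be read from the exception body.
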
curl -X POST http://localhost:8000/correct \
-H "Content-Type: text/html; charset=utf-8" \
--data-binary @article.html

Prometheus exposition (FastAPI default instrumentation). Excluded from OpenAPI.
All settings come from environment variables. For local dev they can be loaded from a .env file in the working directory.
| Name | Default | Required | Notes |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | (none) | for `claude-*` models | secret |
| `OPENAI_API_KEY` | (none) | for `gpt-*` models | secret |
| `GOOGLE_API_KEY` | (none) | for `gemini/*` models | secret |
| `DEFAULT_MODEL` | `gpt-5.5` | no | LiteLLM model id used when `X-Model` is absent |
| `DEFAULT_SERVICE_TIER` | `priority` | no | Default OpenAI service tier: `priority` / `flex` / `default` / `off`. Used when `X-Service-Tier` is absent |
| `VERIFIER_MODEL` | `claude-haiku-4-5` | no | Validator model. Set to `off` to skip verification |
| `MAX_HTML_BYTES` | `5242880` (5 MiB) | no | Request body cap |
| `LOG_LEVEL` | `INFO` | no | Standard Python logging level |
| `LOG_FORMAT` | `json` | no | `json` for prod, `text` for local dev |
| `PORT` | `8000` | no | Host port mapping (docker compose only) |
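
The table implies that which provider key is required depends on the configured models. A preflight check along those lines could look like the sketch below. It uses a plain dataclass rather than the pydantic-settings class the service itself uses, and `Settings`, `load_settings`, `required_key_for`, and `missing_keys` are hypothetical names chosen for illustration. The prefix-to-key mapping is taken from the table; treating an unknown prefix as "no key required" is an assumption.

```python
import os
from dataclasses import dataclass

# Model-id prefix -> provider key, as listed in the table above.
_KEY_BY_PREFIX = (
    ("claude-", "ANTHROPIC_API_KEY"),
    ("gpt-", "OPENAI_API_KEY"),
    ("gemini/", "GOOGLE_API_KEY"),
)


@dataclass(frozen=True)
class Settings:
    default_model: str
    default_service_tier: str
    verifier_model: str
    max_html_bytes: int
    log_level: str
    log_format: str


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        default_model=env.get("DEFAULT_MODEL", "gpt-5.5"),
        default_service_tier=env.get("DEFAULT_SERVICE_TIER", "priority"),
        verifier_model=env.get("VERIFIER_MODEL", "claude-haiku-4-5"),
        max_html_bytes=int(env.get("MAX_HTML_BYTES", "5242880")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        log_format=env.get("LOG_FORMAT", "json"),
    )


def required_key_for(model):
    """Return the provider key name for a model id, or None if the prefix is unknown."""
    for prefix, key in _KEY_BY_PREFIX:
        if model.startswith(prefix):
            return key
    return None


def missing_keys(settings, env=None):
    """List provider keys needed by the default and verifier models but not set."""
    env = os.environ if env is None else env
    models = [settings.default_model]
    if settings.verifier_model != "off":  # "off" disables verification
        models.append(settings.verifier_model)
    needed = {required_key_for(m) for m in models} - {None}
    return sorted(k for k in needed if not env.get(k))
```

With the defaults shown in the table, both an OpenAI key (for the default model) and an Anthropic key (for the verifier) would be needed unless `VERIFIER_MODEL` is set to `off`.
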
docker build -t <registry>/error-corrector:<tag> .
docker push <registry>/error-corrector:<tag>

The image:
- runs `uvicorn app.main:app` on port `8000`
- runs as non-root `appuser` (uid 1001)
- includes an in-image `HEALTHCHECK` hitting `/health`
- has no persistence; the service is fully stateless
apiVersion: v1
kind: Secret
metadata:
  name: error-corrector-keys
type: Opaque
stringData:
  ANTHROPIC_API_KEY: <redacted>
  # OPENAI_API_KEY: <redacted>
  # GOOGLE_API_KEY: <redacted>

apiVersion: apps/v1
kind: Deployment
metadata:
  name: error-corrector
spec:
  replicas: 2
  selector:
    matchLabels:
      app: error-corrector
  template:
    metadata:
      labels:
        app: error-corrector
    spec:
      containers:
        - name: app
          image: <registry>/error-corrector:<tag>
          ports:
            - containerPort: 8000
              name: http
          envFrom:
            - secretRef:
                name: error-corrector-keys
          env:
            - name: DEFAULT_MODEL
              value: gpt-5.5
            - name: DEFAULT_SERVICE_TIER
              value: priority
            - name: VERIFIER_MODEL
              value: claude-haiku-4-5
            - name: LOG_FORMAT
              value: json
          readinessProbe:
            httpGet:
              path: /health
              port: http
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: http
            periodSeconds: 30
            failureThreshold: 3
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "1000m"
              memory: "768Mi"

Resource numbers are a starting point. The service is CPU-light (HTTP glue plus a Node subprocess for parse5), but each POST /correct triggers N synchronous LLM calls: one per Hebrew text node, plus batched verifier calls. Latency scales with article length, so scale horizontally via HPA, not vertically.
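
To get a feel for the fan-out described above, the number of detector calls for a given article can be estimated offline. The sketch below is an approximation under stated assumptions: it treats a text node as Hebrew when it contains any character in the Unicode Hebrew block (U+0590 to U+05FF), and it uses the standard library `HTMLParser` as a stand-in for the service's parse5-based extractor. The service's actual relevance test is not documented here, and `count_hebrew_text_nodes` is a hypothetical helper.

```python
import re
from html.parser import HTMLParser

# Assumption: "Hebrew" means at least one code point in the Hebrew block.
_HEBREW = re.compile(r"[\u0590-\u05FF]")


class _TextNodes(HTMLParser):
    """Collect non-empty text nodes, ignoring script and style content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.nodes = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.nodes.append(data)


def count_hebrew_text_nodes(html_text):
    """Estimate how many detector calls a document would trigger."""
    parser = _TextNodes()
    parser.feed(html_text)
    parser.close()
    return sum(1 for node in parser.nodes if _HEBREW.search(node))
```

Verifier calls are batched, so the total call count would be this number plus a smaller number of verifier requests.
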
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: error-corrector
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: error-corrector
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Standard ClusterIP Service on port 80 → containerPort 8000, exposed through the org's Ingress / Gateway. Nothing service-specific is required.
Stateless and immutable — a standard Application manifest pointing at the deployment-manifest path is enough. This repo ships the container image only.
- Logs: structured JSON to stdout (`LOG_FORMAT=json`). Notable events:
  - `extract_nodes`: per request, with `n_nodes` and `n_relevant`
  - `pipeline_complete`: per request, with `model`, `n_chunks`, `total_ms`, `cost_usd`, `invariant_ok`, `warnings_count`
  - `pipeline_failed`: uncaught pipeline exception with traceback
  - `detector_parse_failed`: model returned invalid JSON for a text node
- Metrics: Prometheus exposition at `GET /metrics`. Includes the default FastAPI instrumentation (`http_requests_total`, `http_request_duration_seconds`, etc.). Scrape with the standard `ServiceMonitor` / `PodMonitor`.
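
Because the logs are line-delimited JSON, the `pipeline_complete` events can be aggregated with a few lines of Python. The sketch below assumes the event name is stored under an `event` key (the structlog default); that key, and the helper name `summarize_pipeline_logs`, are assumptions rather than something this README specifies. The field names come from the event list above.

```python
import json


def summarize_pipeline_logs(lines):
    """Aggregate cost, latency, and invariant failures across pipeline_complete events."""
    summary = {"requests": 0, "cost_usd": 0.0, "total_ms": 0, "invariant_failures": 0}
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (e.g. LOG_FORMAT=text output)
        if not isinstance(record, dict) or record.get("event") != "pipeline_complete":
            continue
        summary["requests"] += 1
        summary["cost_usd"] += float(record.get("cost_usd", 0.0))
        summary["total_ms"] += int(record.get("total_ms", 0))
        if record.get("invariant_ok") is False:
            summary["invariant_failures"] += 1
    return summary
```
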
- Stateless: every request is independent. No DB, no cache, no on-disk state. Pods are interchangeable.
- Latency scales with the number of Hebrew text nodes in the input. A multi-paragraph article on Sonnet typically completes in a few seconds.
- Cost is reported per request via the `X-Cost-Usd` response header. Token accounting uses LiteLLM's pricing tables with a heuristic fallback.
- Manual reconstruction: the model only returns JSON edit candidates. The service inserts `<mark>` tags into the original HTML deterministically in code. If stripping inserted marks would not exactly recover the input HTML, the service returns the original HTML and surfaces a warning via `X-Warning`.
- Provider compatibility: requests with optional parameters that a provider rejects (e.g. `service_tier`, `response_format`) are retried once without the unsupported parameter, and the omission is surfaced via `X-Warning`.
- No model weights: all inference is remote via LiteLLM.
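
The reconstruction safeguard in the notes above can be illustrated with a small sketch. It assumes edits arrive as character offsets into the original HTML string, which is a simplification: the service's real edit format and its tokenization are not documented here, and the names `wrap`, `strip_marks`, and `apply_edits` are hypothetical.

```python
import html
import json
import re

# Matches only the marks produced by wrap() below.
_MARK_RE = re.compile(
    r'<mark class="error" data-corrections="[^"]*">(.*?)</mark>', re.DOTALL
)


def wrap(word, corrections):
    """Wrap a word in a <mark>, storing corrections as attribute-escaped JSON."""
    payload = html.escape(json.dumps(corrections, ensure_ascii=False), quote=True)
    return f'<mark class="error" data-corrections="{payload}">{word}</mark>'


def strip_marks(marked):
    """Remove inserted marks, keeping their inner text."""
    return _MARK_RE.sub(r"\1", marked)


def apply_edits(original, edits):
    """Insert marks at (start, end, corrections) offsets.

    Returns (html, warning). If stripping the marks does not give back the
    original byte-for-byte, the original is returned with a warning instead.
    """
    parts = []
    pos = 0
    for start, end, corrections in sorted(edits, key=lambda e: e[0]):
        parts.append(original[pos:start])
        parts.append(wrap(original[start:end], corrections))
        pos = end
    parts.append(original[pos:])
    marked = "".join(parts)
    if strip_marks(marked) != original:
        return original, "reconstruction invariant failed; returning original HTML"
    return marked, None
```

In this sketch, overlapping edits duplicate text in the marked output, so the round-trip check fails and the caller gets the untouched input plus a warning string that could be surfaced via `X-Warning`.
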
error-corrector/
├── pyproject.toml # Python deps (PEP 621)
├── package.json # parse5 (Node)
├── Dockerfile # Python 3.12 + Node 20, non-root, in-image HEALTHCHECK
├── docker-compose.yml # local dev only
├── .env.example
├── README.md
└── app/
    ├── main.py # FastAPI app, logging + metrics setup
    ├── routes.py # GET /health, POST /correct
    ├── config.py # pydantic-settings
    ├── logging_setup.py # structlog JSON config
    ├── pipeline.py # extract → detect → verify → reconstruct
    ├── llm/
    │   ├── client.py # litellm wrapper with retries
    │   ├── prompts.py # detector + verifier prompts and schemas
    │   └── costs.py # token / cost accounting
    ├── html/
    │   ├── nodes.py # parse5 subprocess driver
    │   └── markup.py # tokenization, mark insertion, mark stripping
    └── tools/
        └── html-text-nodes.mjs # parse5-based extractor (Node)
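
The provider-compatibility behaviour described in the notes (retry once without a rejected optional parameter) is the kind of logic an LLM client wrapper would hold. A minimal sketch of that pattern follows. It deliberately does not call LiteLLM: `UnsupportedParamError` is a hypothetical exception standing in for whatever error a provider returns, and `call_with_param_fallback` is an illustrative name, not the service's API.

```python
class UnsupportedParamError(Exception):
    """Hypothetical error: the provider rejected a named request parameter."""

    def __init__(self, param):
        super().__init__(f"unsupported parameter: {param}")
        self.param = param


# Optional parameters that may be dropped, per the notes above.
OPTIONAL_PARAMS = ("service_tier", "response_format")


def call_with_param_fallback(call, **params):
    """Call the provider; on a rejected optional parameter, retry once without it.

    Returns (result, warning). The warning is None when no parameter was dropped.
    """
    try:
        return call(**params), None
    except UnsupportedParamError as exc:
        if exc.param not in OPTIONAL_PARAMS or exc.param not in params:
            raise  # required or unknown parameter: nothing safe to drop
        reduced = {k: v for k, v in params.items() if k != exc.param}
        # Single retry only; a second failure propagates to the caller.
        return call(**reduced), f"dropped unsupported parameter: {exc.param}"
```

The returned warning string is what a caller could place in the `X-Warning` response header.
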