error-corrector

A FastAPI service that takes HTML in and returns the same HTML with Hebrew word-level corrections wrapped in <mark class="error" data-corrections="...">…</mark>. Inference is remote via LiteLLM; no model weights are bundled, so the image is small and CPU-only.
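
As an illustration of how a consumer might read the annotated output, the sketch below collects every marked span together with its data-corrections value. It uses only Python's standard-library HTML parser. The README does not specify the format of data-corrections, so the value is treated as an opaque string; the names MarkCollector and extract_marks are hypothetical and not part of the service.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class MarkCollector(HTMLParser):
    """Collect (marked_text, raw data-corrections value) pairs."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.marks: List[Tuple[str, str]] = []
        self._current: Optional[str] = None  # attribute value of the open mark
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "mark" and "error" in classes:
            # Assumption: the attribute format is unspecified, so keep it raw.
            self._current = attr_map.get("data-corrections") or ""
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "mark" and self._current is not None:
            self.marks.append(("".join(self._buffer), self._current))
            self._current = None


def extract_marks(html: str) -> List[Tuple[str, str]]:
    """Return all <mark class="error"> spans found in the corrected HTML."""
    collector = MarkCollector()
    collector.feed(html)
    collector.close()
    return collector.marks
```

Calling extract_marks on a /correct response body yields one tuple per flagged word, which is enough to drive a review UI or a quick count of flagged spans.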

This is the first stage of the broader editors-agent project. Future stages will reuse the same /correct HTTP contract with different correction backends.


Quick start (local)

cp .env.example .env       # fill in at least one provider key
docker compose up --build

Service listens on http://localhost:8000.

Interactive API docs: http://localhost:8000/docs.


API

GET /health

Liveness and readiness probe. The response reports the default model and which provider API keys are configured.

curl -s http://localhost:8000/health
{
  "status": "ok",
  "default_model": "gpt-5.5",
  "has_anthropic_key": true,
  "has_openai_key": true,
  "has_google_key": false
}
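
A deploy script could use this payload to confirm that the default model has a matching provider key before sending traffic. The sketch below is one possible check, not part of the service. The prefix-to-provider mapping follows the Configuration table (claude-*, gpt-*, gemini/*); the function names are hypothetical.

```python
from typing import Optional


def provider_flag_for(model: str) -> Optional[str]:
    """Map a LiteLLM model id to the matching /health key flag."""
    if model.startswith("claude-"):
        return "has_anthropic_key"
    if model.startswith("gpt-"):
        return "has_openai_key"
    if model.startswith("gemini/"):
        return "has_google_key"
    return None  # unknown provider prefix


def default_model_is_usable(health: dict) -> bool:
    """True when status is ok and the default model's provider key is set."""
    if health.get("status") != "ok":
        return False
    flag = provider_flag_for(health.get("default_model", ""))
    return flag is not None and bool(health.get(flag))
```

With the sample response above, default_model_is_usable returns True because gpt-5.5 maps to has_openai_key, which is set.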

POST /correct

Raw HTML in, raw HTML out.

| Field | Where | Required | Default | Notes |
|---|---|---|---|---|
| body | request body | yes | | HTML document or fragment, UTF-8 |
| X-Model | request header | no | DEFAULT_MODEL | LiteLLM model id override |
| X-Service-Tier | request header | no | DEFAULT_SERVICE_TIER | priority / flex / default (OpenAI only) |

Response:

  • Status 200, Content-Type: text/html; charset=utf-8, body = corrected HTML
  • Headers: X-Model, X-Chunks, X-Total-Ms, X-Cost-Usd, optional X-Warning

Error envelope (status 400 / 413 / 500): {"detail": "..."}.

curl -X POST http://localhost:8000/correct \
  -H "Content-Type: text/html; charset=utf-8" \
  --data-binary @article.html
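
The same call can be made from Python. The following is a minimal client sketch using only the standard library. It assumes that X-Chunks, X-Total-Ms, and X-Cost-Usd carry plain numeric strings, which the README does not state; values that fail to parse come back as None. All function names are hypothetical.

```python
import urllib.request
from typing import Mapping, Optional, Tuple


def build_correct_request(base_url: str, html: str,
                          model: Optional[str] = None,
                          service_tier: Optional[str] = None) -> urllib.request.Request:
    """Build the POST /correct request with optional override headers."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if model:
        headers["X-Model"] = model
    if service_tier:
        headers["X-Service-Tier"] = service_tier
    return urllib.request.Request(
        base_url.rstrip("/") + "/correct",
        data=html.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def _number(value, cast):
    """Parse a header value, returning None when it is absent or malformed."""
    try:
        return cast(value)
    except (TypeError, ValueError):
        return None


def read_metadata(headers: Mapping[str, str]) -> dict:
    """Pull the documented response headers into a dict."""
    return {
        "model": headers.get("X-Model"),
        "chunks": _number(headers.get("X-Chunks"), int),      # assumed integer
        "total_ms": _number(headers.get("X-Total-Ms"), float),  # assumed numeric
        "cost_usd": _number(headers.get("X-Cost-Usd"), float),  # assumed numeric
        "warning": headers.get("X-Warning"),
    }


def correct_html(base_url: str, html: str, **options) -> Tuple[str, dict]:
    """Send HTML to the service; returns (corrected_html, metadata)."""
    request = build_correct_request(base_url, html, **options)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8"), read_metadata(response.headers)
```

Note that urlopen raises urllib.error.HTTPError for the 400 / 413 / 500 responses, so a caller that wants the {"detail": "..."} envelope needs to catch that exception and read its body.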

GET /metrics

Prometheus exposition (FastAPI default instrumentation). Excluded from OpenAPI.


Configuration

All settings come from environment variables. For local dev they can be loaded from a .env file in the working directory.

| Name | Default | Required | Notes |
|---|---|---|---|
| ANTHROPIC_API_KEY | | for claude-* models | secret |
| OPENAI_API_KEY | | for gpt-* models | secret |
| GOOGLE_API_KEY | | for gemini/* models | secret |
| DEFAULT_MODEL | gpt-5.5 | no | LiteLLM model id used when X-Model is absent |
| DEFAULT_SERVICE_TIER | priority | no | Default OpenAI service tier: priority / flex / default / off. Used when X-Service-Tier is absent |
| VERIFIER_MODEL | claude-haiku-4-5 | no | Validator model. Set to off to skip verification |
| MAX_HTML_BYTES | 5242880 (5 MiB) | no | Request body cap |
| LOG_LEVEL | INFO | no | Standard Python logging level |
| LOG_FORMAT | json | no | json for prod, text for local dev |
| PORT | 8000 | no | Host port mapping (docker compose only) |
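
To make the defaults concrete, here is a standard-library sketch of how these variables could be read. The repo layout lists config.py as using pydantic-settings, so this is a stand-in for illustration rather than the service's actual loader; the Settings class and load_settings function are hypothetical, and PORT is omitted because it only affects docker compose.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    default_model: str
    default_service_tier: str
    verifier_model: str
    max_html_bytes: int
    log_level: str
    log_format: str
    anthropic_api_key: Optional[str]
    openai_api_key: Optional[str]
    google_api_key: Optional[str]

    @property
    def verifier_enabled(self) -> bool:
        # VERIFIER_MODEL=off skips verification, per the table above.
        return self.verifier_model.lower() != "off"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        default_model=env.get("DEFAULT_MODEL", "gpt-5.5"),
        default_service_tier=env.get("DEFAULT_SERVICE_TIER", "priority"),
        verifier_model=env.get("VERIFIER_MODEL", "claude-haiku-4-5"),
        max_html_bytes=int(env.get("MAX_HTML_BYTES", "5242880")),  # 5 MiB
        log_level=env.get("LOG_LEVEL", "INFO"),
        log_format=env.get("LOG_FORMAT", "json"),
        anthropic_api_key=env.get("ANTHROPIC_API_KEY") or None,
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        google_api_key=env.get("GOOGLE_API_KEY") or None,
    )
```

Passing an explicit mapping instead of os.environ keeps the loader easy to exercise in tests.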

Deployment (Kubernetes / ArgoCD)

Image build & push

docker build -t <registry>/error-corrector:<tag> .
docker push <registry>/error-corrector:<tag>

The image:

  • runs uvicorn app.main:app on port 8000
  • runs as non-root appuser (uid 1001)
  • includes an in-image HEALTHCHECK hitting /health
  • has no persistence — the service is fully stateless

Secret for provider API keys

apiVersion: v1
kind: Secret
metadata:
  name: error-corrector-keys
type: Opaque
stringData:
  ANTHROPIC_API_KEY: <redacted>
  # OPENAI_API_KEY: <redacted>
  # GOOGLE_API_KEY: <redacted>

Suggested Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: error-corrector
spec:
  replicas: 2
  selector:
    matchLabels:
      app: error-corrector
  template:
    metadata:
      labels:
        app: error-corrector
    spec:
      containers:
        - name: app
          image: <registry>/error-corrector:<tag>
          ports:
            - containerPort: 8000
              name: http
          envFrom:
            - secretRef:
                name: error-corrector-keys
          env:
            - name: DEFAULT_MODEL
              value: gpt-5.5
            - name: DEFAULT_SERVICE_TIER
              value: priority
            - name: VERIFIER_MODEL
              value: claude-haiku-4-5
            - name: LOG_FORMAT
              value: json
          readinessProbe:
            httpGet:
              path: /health
              port: http
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: http
            periodSeconds: 30
            failureThreshold: 3
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "1000m"
              memory: "768Mi"

Resource numbers are a starting point. The service is CPU-light (HTTP glue plus a Node subprocess for parse5) but each POST /correct triggers N synchronous LLM calls — one per Hebrew text node, plus batched verifier calls. Latency scales with article length, so scale horizontally via HPA, not vertically.
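
For capacity planning it can help to estimate how many detector calls an article will trigger. The sketch below counts text nodes that contain at least one Hebrew letter. It is a rough approximation under stated assumptions: the service extracts nodes with parse5, not Python's parser; the Hebrew test used here (any letter in U+05D0 to U+05EA) and the skipping of script and style content are guesses at what counts as a relevant node; and all names are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Tuple

# Assumption: a node is "Hebrew" if it has any letter from alef to tav.
HEBREW_LETTER = re.compile(r"[\u05D0-\u05EA]")


class TextNodeCounter(HTMLParser):
    """Count non-empty text nodes and those containing Hebrew letters."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.total = 0
        self.hebrew = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth or not data.strip():
            return
        self.total += 1
        if HEBREW_LETTER.search(data):
            self.hebrew += 1


def count_text_nodes(html: str) -> Tuple[int, int]:
    """Return (all text nodes, Hebrew text nodes) for an HTML fragment."""
    counter = TextNodeCounter()
    counter.feed(html)
    counter.close()
    return counter.total, counter.hebrew
```

The second number approximates the detector calls per request; verifier calls come on top of that and depend on batching details not described here.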

HorizontalPodAutoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: error-corrector
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: error-corrector
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Service / Ingress

Standard ClusterIP Service on port 80 → containerPort 8000, exposed through the org's Ingress / Gateway. Nothing service-specific is required.

ArgoCD

The service is stateless and the image is immutable, so a standard Application manifest pointing at the deployment-manifest path is enough. This repo ships only the container image; the Kubernetes manifests are maintained elsewhere.


Observability

  • Logs: structured JSON to stdout (LOG_FORMAT=json). Notable events:
    • extract_nodes — per request, with n_nodes and n_relevant
    • pipeline_complete — per request, with model, n_chunks, total_ms, cost_usd, invariant_ok, warnings_count
    • pipeline_failed — uncaught pipeline exception with traceback
    • detector_parse_failed — model returned invalid JSON for a text node
  • Metrics: Prometheus exposition at GET /metrics. Includes the default FastAPI instrumentation (http_requests_total, http_request_duration_seconds, etc.). Scrape with the standard ServiceMonitor / PodMonitor.
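
As a sketch of how the structured logs could be consumed, the function below totals cost and latency across pipeline_complete events and counts pipeline_failed events. It assumes the event name is stored under an "event" key, which is structlog's default but is not confirmed by this README; the field names cost_usd and total_ms are taken from the list above. The function name is hypothetical.

```python
import json
from typing import Iterable


def summarize_pipeline_logs(lines: Iterable[str]) -> dict:
    """Aggregate request count, failures, cost, and latency from JSON log lines."""
    totals = {"requests": 0, "failed": 0, "cost_usd": 0.0, "total_ms": 0.0}
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (e.g. LOG_FORMAT=text output)
        if not isinstance(record, dict):
            continue
        event = record.get("event")  # assumption: structlog's default key
        if event == "pipeline_complete":
            totals["requests"] += 1
            totals["cost_usd"] += float(record.get("cost_usd") or 0)
            totals["total_ms"] += float(record.get("total_ms") or 0)
        elif event == "pipeline_failed":
            totals["failed"] += 1
    return totals
```

Feeding it the lines from kubectl logs gives a quick per-pod view of spend without waiting for a metrics pipeline.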

Operational notes

  • Stateless: every request is independent. No DB, no cache, no on-disk state. Pods are interchangeable.
  • Latency scales with the number of Hebrew text nodes in the input. A multi-paragraph article on a Claude Sonnet model typically completes in a few seconds.
  • Cost is reported per request via the X-Cost-Usd response header. Token accounting uses LiteLLM's pricing tables with a heuristic fallback.
  • Manual reconstruction: the model only returns JSON edit candidates. The service inserts <mark> tags into the original HTML deterministically in code. If stripping inserted marks would not exactly recover the input HTML, the service returns the original HTML and surfaces a warning via X-Warning.
  • Provider compatibility: requests with optional parameters that a provider rejects (e.g. service_tier, response_format) are retried once without the unsupported parameter, and the omission is surfaced via X-Warning.
  • No model weights: all inference is remote via LiteLLM.
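
The reconstruction invariant described above can be sketched as a round-trip check: strip the inserted marks and compare the result with the input byte for byte. This is an illustration only, not the code in markup.py. It assumes the inserted tag has exactly the attribute order shown at the top of this README and that data-corrections values never contain an unescaped double quote; the function names and the warning text are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: inserted marks always look exactly like this opening tag.
INSERTED_MARK = re.compile(
    r'<mark class="error" data-corrections="[^"]*">(.*?)</mark>',
    re.DOTALL,
)


def strip_marks(html: str) -> str:
    """Remove inserted error marks, keeping the text they wrap."""
    return INSERTED_MARK.sub(lambda match: match.group(1), html)


def safe_output(original: str, marked: str) -> Tuple[str, Optional[str]]:
    """Return (html, warning); fall back to the original if the invariant fails."""
    if strip_marks(marked) == original:
        return marked, None
    # Hypothetical warning text; the service reports its own via X-Warning.
    return original, "reconstruction invariant failed; returning original HTML"
```

Because the check compares against the untouched input, any bug in mark insertion degrades to returning the article unchanged rather than corrupting it.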

Repo layout

error-corrector/
├── pyproject.toml         # Python deps (PEP 621)
├── package.json           # parse5 (Node)
├── Dockerfile             # Python 3.12 + Node 20, non-root, in-image HEALTHCHECK
├── docker-compose.yml     # local dev only
├── .env.example
├── README.md
└── app/
    ├── main.py            # FastAPI app, logging + metrics setup
    ├── routes.py          # GET /health, POST /correct
    ├── config.py          # pydantic-settings
    ├── logging_setup.py   # structlog JSON config
    ├── pipeline.py        # extract → detect → verify → reconstruct
    ├── llm/
    │   ├── client.py      # litellm wrapper with retries
    │   ├── prompts.py     # detector + verifier prompts and schemas
    │   └── costs.py       # token / cost accounting
    ├── html/
    │   ├── nodes.py       # parse5 subprocess driver
    │   └── markup.py      # tokenization, mark insertion, mark stripping
    └── tools/
        └── html-text-nodes.mjs   # parse5-based extractor (Node)
