Otoroshi LLM Extension v0.0.78
What's New
OCR Models (new entity type) — #176
OCR (Optical Character Recognition) is now a first-class entity type — the OCR Model — alongside Audio, Image, Embedding and Moderation
Models. It enables text extraction from images and PDF documents through a unified, Mistral-inspired API.
- New
OcrModelentity: dedicated datastore, in-memory state, and a new OCR models admin UI page (Monaco-based config editor) - Supported providers:
- Mistral 🇫🇷 🇪🇺 —
mistral-ocr-latest,mistral-ocr-2505 - AlphaEdge 🇫🇷 🇪🇺 —
alpha-digit-max,alpha-digit-medium
- Mistral 🇫🇷 🇪🇺 —
- Three ways to call OCR:
- Dedicated plugin —
Cloud APIM - OCR backendexposesPOST /ocr - Unified API — the
OpenAI Compatible APIplugin now exposesPOST /ocr(via the newocr_model_refs), alongside chat, audio, image,
embedding and moderation - Workflow function — the new
ocr_callfunction for agentic pipelines
- Dedicated plugin —
- Flexible input handling: remote URL, base64 data-uri, inline base64, raw byte array, or multipart file upload — over two transports (JSON
body Mistral-style, ormultipart/form-data) - OCR through text models: OCR can also flow through a regular LLM provider — call
/chat/completionswith an image/PDF content part and get
the extracted text back as a standard chat completion (reuses existing OpenAI clients, model constraints, caching, budgets and observability) - Vault integration for API tokens, model constraints (allow/block lists), and
max_size_uploadnow also applies to OCR uploads
AlphaEdge provider (new) — #175
New French/EU 🇫🇷 🇪🇺 provider specialized in speech transcription and OCR. Authentication uses the X-API-Key header.
- Speech-to-Text (STT) — model
alpha-audio-v1, withenable_diarization(speaker diarization) andenable_postcorrect(linguistic
post-correction: punctuation, capitalization, spelling, stuttering removal); both can be overridden per request - OCR —
alpha-digit-max/alpha-digit-medium, usable either as a dedicated OCR Model or as a standard text/LLM provider - Optional
pdf_passwordfor protected PDFs, comma-separated token rotation, and vault references
Documentation
- New OCR documentation section: introduction, providers, plugins, OCR-through-text-models, and the
ocr_callworkflow function - Updated Audio STT and OpenAI-Compatible API docs for the new endpoints and AlphaEdge