Otoroshi LLM Extension v0.0.78

What's New

OCR Models (new entity type) — #176

OCR (Optical Character Recognition) is now a first-class entity type — the OCR Model — alongside Audio, Image, Embedding and Moderation
Models. It enables text extraction from images and PDF documents through a unified, Mistral-inspired API.

New OcrModel entity: dedicated datastore, in-memory state, and a new OCR models admin UI page (Monaco-based config editor)
Supported providers:
- Mistral 🇫🇷 🇪🇺 — mistral-ocr-latest, mistral-ocr-2505
- AlphaEdge 🇫🇷 🇪🇺 — alpha-digit-max, alpha-digit-medium
Three ways to call OCR:
- Dedicated plugin — Cloud APIM - OCR backend exposes POST /ocr
- Unified API — the OpenAI Compatible API plugin now exposes POST /ocr (via the new ocr_model_refs), alongside chat, audio, image,
  embedding and moderation
- Workflow function — the new ocr_call function for agentic pipelines
Flexible input handling: remote URL, base64 data-uri, inline base64, raw byte array, or multipart file upload — over two transports (JSON
body Mistral-style, or multipart/form-data)
OCR through text models: OCR can also flow through a regular LLM provider — call /chat/completions with an image/PDF content part and get
the extracted text back as a standard chat completion (reuses existing OpenAI clients, model constraints, caching, budgets and observability)
Vault integration for API tokens, model constraints (allow/block lists), and max_size_upload now also applies to OCR uploads

AlphaEdge provider (new) — #175

New French/EU 🇫🇷 🇪🇺 provider specialized in speech transcription and OCR. Authentication uses the X-API-Key header.

Speech-to-Text (STT) — model alpha-audio-v1, with enable_diarization (speaker diarization) and enable_postcorrect (linguistic
post-correction: punctuation, capitalization, spelling, stuttering removal); both can be overridden per request
OCR — alpha-digit-max / alpha-digit-medium, usable either as a dedicated OCR Model or as a standard text/LLM provider
Optional pdf_password for protected PDFs, comma-separated token rotation, and vault references

Documentation

New OCR documentation section: introduction, providers, plugins, OCR-through-text-models, and the ocr_call workflow function
Updated Audio STT and OpenAI-Compatible API docs for the new endpoints and AlphaEdge

Release Infos

the documentation is available here
release is available here

Contributors

@mathieuancelin

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

0.0.78

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Otoroshi LLM Extension v0.0.78

What's New

OCR Models (new entity type) — #176

AlphaEdge provider (new) — #175

Documentation

Release Infos

Contributors

Contributors

Uh oh!