Skip to content

0.0.78

Latest

Choose a tag to compare

@github-actions github-actions released this 02 Jun 14:00
· 41 commits to main since this release

Otoroshi LLM Extension v0.0.78

What's New

OCR Models (new entity type) — #176

OCR (Optical Character Recognition) is now a first-class entity type — the OCR Model — alongside Audio, Image, Embedding and Moderation
Models. It enables text extraction from images and PDF documents through a unified, Mistral-inspired API.

  • New OcrModel entity: dedicated datastore, in-memory state, and a new OCR models admin UI page (Monaco-based config editor)
  • Supported providers:
    • Mistral 🇫🇷 🇪🇺 — mistral-ocr-latest, mistral-ocr-2505
    • AlphaEdge 🇫🇷 🇪🇺 — alpha-digit-max, alpha-digit-medium
  • Three ways to call OCR:
    • Dedicated pluginCloud APIM - OCR backend exposes POST /ocr
    • Unified API — the OpenAI Compatible API plugin now exposes POST /ocr (via the new ocr_model_refs), alongside chat, audio, image,
      embedding and moderation
    • Workflow function — the new ocr_call function for agentic pipelines
  • Flexible input handling: remote URL, base64 data-uri, inline base64, raw byte array, or multipart file upload — over two transports (JSON
    body Mistral-style, or multipart/form-data)
  • OCR through text models: OCR can also flow through a regular LLM provider — call /chat/completions with an image/PDF content part and get
    the extracted text back as a standard chat completion (reuses existing OpenAI clients, model constraints, caching, budgets and observability)
  • Vault integration for API tokens, model constraints (allow/block lists), and max_size_upload now also applies to OCR uploads

AlphaEdge provider (new) — #175

New French/EU 🇫🇷 🇪🇺 provider specialized in speech transcription and OCR. Authentication uses the X-API-Key header.

  • Speech-to-Text (STT) — model alpha-audio-v1, with enable_diarization (speaker diarization) and enable_postcorrect (linguistic
    post-correction: punctuation, capitalization, spelling, stuttering removal); both can be overridden per request
  • OCRalpha-digit-max / alpha-digit-medium, usable either as a dedicated OCR Model or as a standard text/LLM provider
  • Optional pdf_password for protected PDFs, comma-separated token rotation, and vault references

Documentation

  • New OCR documentation section: introduction, providers, plugins, OCR-through-text-models, and the ocr_call workflow function
  • Updated Audio STT and OpenAI-Compatible API docs for the new endpoints and AlphaEdge

Release Infos

  • the documentation is available here
  • release is available here

Contributors