Otoroshi LLM Extension v0.0.79

This release introduces web search as a first-class capability — a new Search Engine entity usable as an LLM tool, an HTTP API and a
workflow function — together with a new AI Router, OpenRouter multi-modal support, per-provider circuit breaking, LLM
response headers, and mock responses.

What's New

Search Engines (new entity type) — [#177]

Web search is now a first-class entity type — the Search Engine — alongside Audio, Image, OCR, Embedding and Moderation Models.
Each Search Engine wraps a provider behind a single search(query) operation and returns a normalized result ({ provider, query, answer?, results: [{ title, url, snippet, score?, published_date? }] }), so the same prompt/code works whatever the underlying engine.
Every provider is implemented in pure HTTP (no new dependency).
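
To make the normalized shape concrete, here is a minimal Python sketch of a client-side model for it. The field names come from the result shape above; the class names, the `parse_search_result` helper and the Python types (for example `published_date` as a string) are assumptions made for illustration, not part of the extension.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SearchHit:
    title: str
    url: str
    snippet: str
    score: Optional[float] = None          # optional in the normalized shape
    published_date: Optional[str] = None   # optional; string type is an assumption


@dataclass
class SearchResult:
    provider: str
    query: str
    results: List[SearchHit] = field(default_factory=list)
    answer: Optional[str] = None           # optional in the normalized shape


def parse_search_result(payload: dict) -> SearchResult:
    """Build a SearchResult from a decoded JSON payload (hypothetical helper)."""
    hits = [
        SearchHit(
            title=item["title"],
            url=item["url"],
            snippet=item["snippet"],
            score=item.get("score"),
            published_date=item.get("published_date"),
        )
        for item in payload.get("results", [])
    ]
    return SearchResult(
        provider=payload["provider"],
        query=payload["query"],
        results=hits,
        answer=payload.get("answer"),
    )
```

Because the optional fields default to `None`, the same parsing code can handle engines that return an answer or scores and engines that do not.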

7 providers: Staan.ai (Qwant) 🇫🇷 🇪🇺 — sovereign European search, Tavily, Brave Search, SearXNG (self-hosted), Google Custom Search, SearchApi, and DuckDuckGo.
Use it as an LLM tool — reference Search Engines on an LLM provider or an AI Agent (search_engines) and the model gets a
search tool it can call autonomously to fetch fresh, sourced web results. Works on every tool-capable provider (OpenAI, Azure OpenAI,
Anthropic, Mistral, Groq, Cohere, Ollama, xAI), through the same tool loop as WASM and MCP tools.
Use it as an HTTP API — the Cloud APIM - Search engine backend plugin exposes a POST search route.
Use it in workflows — the new search_engine_search workflow function.
Supports max_results, market/locale, and domain include/exclude filters; tokens support vault references and comma-separated rotation.
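
As an illustration of comma-separated token rotation, the sketch below splits a token string and cycles through the entries. The round-robin order and the helper names (`parse_tokens`, `token_rotation`) are assumptions; the extension's actual rotation strategy may differ, and vault reference resolution is not shown.

```python
from itertools import cycle
from typing import Iterator, List


def parse_tokens(raw: str) -> List[str]:
    """Split a comma-separated token string, trimming whitespace and dropping blanks."""
    return [t.strip() for t in raw.split(",") if t.strip()]


def token_rotation(raw: str) -> Iterator[str]:
    """Return an endless iterator over the tokens in round-robin order.

    Round-robin is an assumed strategy, chosen here only for illustration.
    """
    tokens = parse_tokens(raw)
    if not tokens:
        raise ValueError("no token configured")
    return cycle(tokens)
```

Each outgoing search request would call `next()` on the iterator to pick the token to use.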

AI Router (auto / code / fusion routing)

The otoroshi provider can now act as a router: like the load balancer it references other providers, but instead of round-robin it routes
each request to the best candidate, cascading to the next-best on failure. It exposes three models:

code-router — picks the cheapest candidate above a quality floor (min_coding_score), using a curated Artificial Analysis coding
index for quality and the litellm price catalog for cost. Candidates: code_router_refs.
auto-router — prompt-aware per-request routing: a judge LLM reads the prompt and the candidate list (quality + cost) and picks the
best-suited model, honoring a cost_quality_tradeoff (0–10). Candidates: auto_router_refs, judge: auto_router_classifier_ref.
fusion-router — queries several candidates and synthesizes their answers into a single consolidated response.
allowed_models wildcard filtering (e.g. anthropic/*, openai/gpt-5*), settable on the provider or per request.
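
The code-router rule and the allowed_models filter can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the extension's implementation: the `Candidate` type and both function names are hypothetical, the quality floor is treated as inclusive, ties on cost are broken by higher score, and wildcard matching uses Python's `fnmatch` semantics.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Sequence


@dataclass(frozen=True)
class Candidate:
    model: str           # "vendor/model" identifier
    coding_score: float  # quality index, higher is better
    cost: float          # price, lower is better


def filter_allowed(candidates: Sequence[Candidate],
                   allowed_models: Sequence[str]) -> List[Candidate]:
    """Keep candidates matching at least one wildcard pattern (all if none given)."""
    if not allowed_models:
        return list(candidates)
    return [
        c for c in candidates
        if any(fnmatchcase(c.model, pattern) for pattern in allowed_models)
    ]


def code_router_order(candidates: Sequence[Candidate],
                      min_coding_score: float,
                      allowed_models: Sequence[str] = ()) -> List[Candidate]:
    """Return eligible candidates in try-order: cheapest first.

    The first element is the routing choice; the rest form the cascade
    used when the previous candidate fails.
    """
    eligible = [
        c for c in filter_allowed(candidates, allowed_models)
        if c.coding_score >= min_coding_score  # inclusive floor (assumption)
    ]
    return sorted(eligible, key=lambda c: (c.cost, -c.coding_score))
```

Returning the whole ordered list rather than a single winner keeps the "cascade to the next-best on failure" behavior a simple loop over the result.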

OpenRouter as a multi-modal provider

OpenRouter can now be used beyond chat — as an audio (TTS & STT), image (generation & editing), and video provider. Image
generation/editing go through /chat/completions with modalities, returning base64 data-URL images.

Per-provider circuit breaker (cooldown)

Opt-in circuit breaking for provider fallback and load balancing. Configure it on a provider with circuit_breaker: { enabled, consecutive_failures, cooldown }: after a run of consecutive failures the circuit opens for the cooldown window (the provider is skipped and requests route elsewhere), then half-opens to probe recovery. Disabled by default, so existing providers are unaffected. Breaker state is kept per node, in memory.
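
The closed / open / half-open cycle described above can be modeled with a small state machine. The sketch below is a generic circuit breaker written for illustration, not the extension's code: the class and method names are hypothetical, the cooldown is assumed to be in seconds, and any number of probes is allowed while half-open.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(self, consecutive_failures: int, cooldown: float,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = consecutive_failures
        self.cooldown = cooldown            # seconds (assumed unit)
        self._clock = clock                 # injectable for testing
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means closed

    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.cooldown:
            return "half-open"
        return "open"

    def allow_request(self) -> bool:
        """Requests are skipped only while the circuit is open."""
        return self.state() != "open"

    def record_success(self) -> None:
        """A success closes the circuit and resets the failure run."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        state = self.state()
        if state == "open":
            return                          # no calls expected while open
        if state == "half-open":
            self._opened_at = self._clock() # failed probe: start a new cooldown
            return
        self._failures += 1
        if self._failures >= self.threshold:
            self._opened_at = self._clock() # trip the breaker
```

A fallback or load-balancing loop would check `allow_request()` before calling a provider and report the outcome with `record_success()` or `record_failure()`.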

Load balancing best_response_time now uses a decaying p95 window for more stable routing.

LLM response headers

New Cloud APIM - LLM response headers plugin exposes the LLM call metadata (model, provider, token usage, latency and cost) as
x-otoroshi-llm-* response headers on non-streaming responses, so clients, dashboards and logs can read them without parsing the
response body. Inspired by litellm's x-litellm-*. Additional cost headers are also surfaced through the unified OpenAI-compatible plugin.
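
On the client side, the metadata can be collected by filtering on the header prefix. The sketch below assumes only the x-otoroshi-llm- prefix named above; the individual header names are not listed in these notes, so the helper (a hypothetical `llm_metadata` function) keeps whatever suffixes it finds rather than expecting specific keys.

```python
from typing import Dict, Mapping

PREFIX = "x-otoroshi-llm-"


def llm_metadata(headers: Mapping[str, str]) -> Dict[str, str]:
    """Collect x-otoroshi-llm-* headers, keyed by the part after the prefix."""
    meta: Dict[str, str] = {}
    for name, value in headers.items():
        lowered = name.lower()  # HTTP header names are case-insensitive
        if lowered.startswith(PREFIX):
            meta[lowered[len(PREFIX):]] = value
    return meta
```

The function accepts any mapping of header names to values, so it works with the header objects returned by most HTTP client libraries.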

Mock responses

New mock-response decorator: send a mock_response field in the request body to short-circuit the call and return a canned answer while
still exercising the whole pipeline (guardrails, cache, budgets, observability). Prefix the value with Exception: to simulate an error and test
provider fallbacks. Inspired by litellm's mock_response.
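
As a rough illustration, the helpers below add a mock_response field to a request body and classify a value by its Exception: prefix. Only the field name and the prefix come from these notes; the function names, the OpenAI-style body used in the usage note, and the trimming of the text after the prefix are assumptions.

```python
from typing import Tuple

ERROR_PREFIX = "Exception:"


def with_mock_response(body: dict, mock: str) -> dict:
    """Return a copy of the request body with a mock_response field added."""
    return {**body, "mock_response": mock}


def parse_mock(value: str) -> Tuple[str, str]:
    """Classify a mock_response value as ("error", message) or ("answer", text)."""
    if value.startswith(ERROR_PREFIX):
        # Trimming the message is an illustrative choice, not documented behavior.
        return "error", value[len(ERROR_PREFIX):].strip()
    return "answer", value
```

For example, `with_mock_response({"model": "m", "messages": [...]}, "Exception: rate limited")` would produce a body intended to trigger a simulated failure, which is useful when testing a fallback chain.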

Budgets

Budgets can now be reset directly from the dashboard, using new reset buttons.

Improvements & fixes

Workflow functions accept more input formats.
Clearer error messages across providers.
Drop unsupported sampling params for Opus 4.7.
Fix image-generation response content type.
Fix embedding editor size and minor dashboard UI tweaks.

Release Infos

the documentation is available here
release is available here

Contributors

@mathieuancelin
