Otoroshi LLM Extension v0.0.79
This release introduces web search as a first-class capability — a new Search Engine entity usable as an LLM tool, an HTTP API and a
workflow function — together with a new AI Router, OpenRouter multi-modal support, per-provider circuit breaking, LLM
response headers, and mock responses.
What's New
Search Engines (new entity type) — [#177]
Web search is now a first-class entity type — the Search Engine — alongside Audio, Image, OCR, Embedding and Moderation Models.
Each Search Engine wraps a provider behind a single search(query) operation and returns a normalized result ({ provider, query, answer?, results: [{ title, url, snippet, score?, published_date? }] }), so the same prompt/code works whatever the underlying engine.
Every provider is implemented in pure HTTP (no new dependency).
-
7 providers: Staan.ai (Qwant) 🇫🇷 🇪🇺 — sovereign European search,
Tavily,Brave Search,SearXNG(self-hosted),Google Custom Search,SearchApi, andDuckDuckGo. -
Use it as an LLM tool — reference Search Engines on an LLM provider or an AI Agent (
search_engines) and the model gets a
searchtool it can call autonomously to fetch fresh, sourced web results. Works on every tool-capable provider (OpenAI, Azure OpenAI,
Anthropic, Mistral, Groq, Cohere, Ollama, xAI), through the same tool loop as WASM and MCP tools. -
Use it as an HTTP API — the
Cloud APIM - Search engine backendplugin exposes aPOSTsearch route. -
Use it in workflows — the new
search_engine_searchworkflow function. -
Supports
max_results,market/locale, and domaininclude/excludefilters; tokens support vault references and comma-separated
rotation. -
Supports
max_results,market/locale, and domaininclude/excludefilters; tokens support vault references and comma-separated rotation.
AI Router (auto / code / fusion routing)
The otoroshi provider can now act as a router: like the load balancer it references other providers, but instead of round-robin it routes
each request to the best candidate, cascading to the next-best on failure. It exposes three models:
code-router— picks the cheapest candidate above a quality floor (min_coding_score), using a curated Artificial Analysis coding
index for quality and the litellm price catalog for cost. Candidates:code_router_refs.auto-router— prompt-aware per-request routing: a judge LLM reads the prompt and the candidate list (quality + cost) and picks the
best-suited model, honoring acost_quality_tradeoff(0–10). Candidates:auto_router_refs, judge:auto_router_classifier_ref.fusion-router— queries several candidates and synthesizes their answers into a single consolidated response.allowed_modelswildcard filtering (e.g.anthropic/*,openai/gpt-5*), settable on the provider or per request.
OpenRouter as a multi-modal provider
OpenRouter can now be used beyond chat — as an audio (TTS & STT), image (generation & editing), and video provider. Image
generation/editing go through /chat/completions with modalities, returning base64 data-URL images.
Per-provider circuit breaker (cooldown)
Opt-in circuit breaking for provider fallback and load balancing. Configure it on a provider with circuit_breaker: { enabled, consecutive_failures, cooldown }: after a run of consecutive failures the circuit opens for the cooldown window (the provider is skipped /
requests route elsewhere), then half-opens to probe recovery. Disabled by default, so existing providers are unaffected (per-node,
in-memory).
- Load balancing
best_response_timenow uses a decaying p95 window for more stable routing.
LLM response headers
New Cloud APIM - LLM response headers plugin exposes the LLM call metadata — model, provider, token usage, latency and cost — as
x-otoroshi-llm-* response headers, so clients, dashboards and logs can read them without parsing the response body (non-streaming
responses). Inspired by litellm's x-litellm-*. Additional cost headers are also surfaced through the unified OpenAI-compatible plugin.
Mock responses
New mock-response decorator: send a mock_response field in the request body to short-circuit the call and return a canned answer while
still exercising the whole pipeline (guardrails, cache, budgets, observability). Prefix the value with Exception: to simulate an error and test
provider fallbacks. Inspired by litellm's mock_response.
Budgets
- Added reset buttons for budgets directly from the dashboard.
Improvements & fixes
- Workflow functions accept more input formats.
- Clearer error messages across providers.
- Drop unsupported sampling params for Opus 4.7.
- Fix image-generation response content type.
- Fix embedding editor size and minor dashboard UI tweaks.