0.0.79

@github-actions github-actions released this 16 Jun 10:55
· 6 commits to main since this release

Otoroshi LLM Extension v0.0.79

This release introduces web search as a first-class capability — a new Search Engine entity usable as an LLM tool, an HTTP API and a
workflow function — together with a new AI Router, OpenRouter multi-modal support, per-provider circuit breaking, LLM response headers,
and mock responses.

What's New

Search Engines (new entity type) — [#177]

Web search is now a first-class entity type — the Search Engine — alongside Audio, Image, OCR, Embedding and Moderation Models.
Each Search Engine wraps a provider behind a single search(query) operation and returns a normalized result ({ provider, query, answer?, results: [{ title, url, snippet, score?, published_date? }] }), so the same prompt/code works whatever the underlying engine.
Every provider is implemented in pure HTTP (no new dependency).
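
As a rough illustration of the normalized result described above, the shape can be modelled with two small dataclasses. This is only a sketch: the class names, the normalize helper and the raw provider keys ("items", "link", "description", "date") are invented for the example and are not part of the extension.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchHit:
    title: str
    url: str
    snippet: str
    score: Optional[float] = None          # optional in the normalized shape
    published_date: Optional[str] = None   # optional in the normalized shape


@dataclass
class SearchResult:
    provider: str
    query: str
    results: list = field(default_factory=list)
    answer: Optional[str] = None           # optional in the normalized shape


def normalize(provider: str, query: str, raw: dict) -> SearchResult:
    """Map a hypothetical provider payload onto the normalized shape.

    The raw keys read here are made up for illustration; every real
    provider has its own response format.
    """
    hits = [
        SearchHit(
            title=item.get("title", ""),
            url=item.get("link", ""),
            snippet=item.get("description", ""),
            score=item.get("score"),
            published_date=item.get("date"),
        )
        for item in raw.get("items", [])
    ]
    return SearchResult(provider=provider, query=query, results=hits,
                        answer=raw.get("answer"))
```

Code that consumes a SearchResult only reads the normalized fields, which is why it would not need to change when the underlying engine does.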

  • 7 providers: Staan.ai (Qwant) 🇫🇷 🇪🇺 — sovereign European search, Tavily, Brave Search, SearXNG (self-hosted), Google Custom Search, SearchApi, and DuckDuckGo.

  • Use it as an LLM tool — reference Search Engines on an LLM provider or an AI Agent (search_engines) and the model gets a
    search tool it can call autonomously to fetch fresh, sourced web results. Works on every tool-capable provider (OpenAI, Azure OpenAI,
    Anthropic, Mistral, Groq, Cohere, Ollama, xAI), through the same tool loop as WASM and MCP tools.

  • Use it as an HTTP API — the Cloud APIM - Search engine backend plugin exposes a POST search route.

  • Use it in workflows — the new search_engine_search workflow function.

  • Supports max_results, market/locale, and domain include/exclude filters; tokens support vault references and comma-separated rotation.
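
The release does not say how tokens are rotated or how domain filters are matched, so the following is only a sketch under two assumptions: rotation is plain round-robin over the comma-separated list, and a domain filter matches the domain itself and its subdomains. The function names are hypothetical, and vault references are not shown.

```python
from itertools import cycle
from urllib.parse import urlparse


def token_rotation(raw_tokens: str):
    """Return an endless round-robin iterator over comma-separated tokens."""
    tokens = [t.strip() for t in raw_tokens.split(",") if t.strip()]
    if not tokens:
        raise ValueError("no token configured")
    return cycle(tokens)


def host_matches(url: str, domains) -> bool:
    """True if the URL host equals one of the domains or is a subdomain of it."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in domains)


def apply_filters(urls, max_results, include=(), exclude=()):
    """Keep included hosts, drop excluded ones, then cap the result count."""
    kept = [
        u for u in urls
        if (not include or host_matches(u, include))
        and not host_matches(u, exclude)
    ]
    return kept[:max_results]
```

Each outgoing search call would take next(rotation) as its token, so successive calls spread across the configured keys.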

AI Router (auto / code / fusion routing)

The otoroshi provider can now act as a router: like the load balancer it references other providers, but instead of round-robin it routes
each request to the best candidate, cascading to the next-best on failure. It exposes three models:

  • code-router — picks the cheapest candidate above a quality floor (min_coding_score), using a curated Artificial Analysis coding
    index for quality and the litellm price catalog for cost. Candidates: code_router_refs.
  • auto-router — prompt-aware per-request routing: a judge LLM reads the prompt and the candidate list (quality + cost) and picks the
    best-suited model, honoring a cost_quality_tradeoff (0–10). Candidates: auto_router_refs, judge: auto_router_classifier_ref.
  • fusion-router — queries several candidates and synthesizes their answers into a single consolidated response.
  • Also new: allowed_models wildcard filtering (e.g. anthropic/*, openai/gpt-5*), settable on the provider or per request.
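
The code-router rule (cheapest candidate above a quality floor, with a cascade on failure) and the allowed_models wildcard filter can be sketched as follows. This is not the extension's implementation: the Candidate class, the model ids, the scores and prices are made up, wildcard matching is approximated with Python's fnmatch, and treating "next-best" as "next cheapest above the floor" is an assumption.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Candidate:
    model: str           # hypothetical id such as "anthropic/demo"
    coding_score: float  # quality index, higher is better
    cost: float          # price, lower is better


def code_router_order(candidates, min_coding_score, allowed_models=None):
    """Return eligible candidates, cheapest first (ties: higher score first)."""
    pool = [
        c for c in candidates
        if c.coding_score >= min_coding_score
        and (allowed_models is None
             or any(fnmatchcase(c.model, p) for p in allowed_models))
    ]
    return sorted(pool, key=lambda c: (c.cost, -c.coding_score))


def route(candidates, call, **kwargs):
    """Try each eligible candidate in order, cascading to the next on failure."""
    last_error = None
    for candidate in code_router_order(candidates, **kwargs):
        try:
            return call(candidate)
        except Exception as err:  # any provider error triggers the cascade
            last_error = err
    raise RuntimeError("no candidate succeeded") from last_error
```

Here call stands for whatever function actually sends the request to the chosen provider. Raising min_coding_score shrinks the eligible pool, which trades cost for quality.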

OpenRouter as a multi-modal provider

OpenRouter can now be used beyond chat — as an audio (TTS & STT), image (generation & editing), and video provider. Image
generation/editing go through /chat/completions with modalities, returning base64 data-URL images.
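
Since generated images come back as base64 data URLs, a client has to decode them before saving or displaying them. A minimal sketch, assuming the data-URL string has already been pulled out of the response (where exactly it sits in the response body is not described here); decode_data_url is a hypothetical helper name:

```python
import base64


def decode_data_url(data_url: str):
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    if not data_url.startswith("data:"):
        raise ValueError("not a data URL")
    # Everything before the first comma is the header, the rest is the payload.
    header, _, payload = data_url[len("data:"):].partition(",")
    if not header.endswith(";base64"):
        raise ValueError("only base64 data URLs are handled here")
    mime_type = header[: -len(";base64")]
    return mime_type, base64.b64decode(payload)
```

The returned bytes can be written straight to a file whose extension follows the returned MIME type.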

Per-provider circuit breaker (cooldown)

Opt-in circuit breaking for provider fallback and load balancing. Configure it on a provider with
circuit_breaker: { enabled, consecutive_failures, cooldown }. After a run of consecutive failures the circuit opens for the cooldown
window (the provider is skipped and requests route elsewhere), then half-opens to probe recovery. Disabled by default, so existing
providers are unaffected. Circuit state is per-node and in-memory.

  • Load balancing best_response_time now uses a decaying p95 window for more stable routing.
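
The closed / open / half-open cycle of the circuit breaker described above can be sketched as a small in-memory state machine. This is an illustration only, not the extension's code: the class, its method names and the default values are invented, the cooldown is assumed to be in seconds, and the clock is injectable so the behaviour can be tested without waiting.

```python
import time


class CircuitBreaker:
    def __init__(self, consecutive_failures=3, cooldown=30.0, clock=time.monotonic):
        self.threshold = consecutive_failures  # failures in a row before opening
        self.cooldown = cooldown               # assumed to be seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None                  # None means the circuit is closed

    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown:
            return "half-open"                 # cooldown elapsed: allow a probe
        return "open"

    def allow(self) -> bool:
        """Whether a request may be sent to this provider right now."""
        return self.state() != "open"

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None                  # a success closes the circuit

    def record_failure(self) -> None:
        if self.state() == "half-open":
            self.opened_at = self.clock()      # failed probe: restart the cooldown
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()
```

A load balancer would call allow() before picking a provider and skip the ones whose circuit is open.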

LLM response headers

New Cloud APIM - LLM response headers plugin exposes the LLM call metadata (model, provider, token usage, latency and cost) as
x-otoroshi-llm-* response headers, so clients, dashboards and logs can read them without parsing the response body (non-streaming
responses only). Inspired by litellm's x-litellm-* headers. Additional cost headers are also surfaced through the unified
OpenAI-compatible plugin.
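
On the client side, the metadata can be collected by filtering on the header prefix. A minimal sketch: only the x-otoroshi-llm- prefix comes from this release, the exact header names are not listed here, so the helper below (a hypothetical llm_metadata function) stays generic.

```python
PREFIX = "x-otoroshi-llm-"


def llm_metadata(headers: dict) -> dict:
    """Collect x-otoroshi-llm-* headers, keyed by the part after the prefix."""
    out = {}
    for name, value in headers.items():
        lowered = name.lower()  # HTTP header names are case-insensitive
        if lowered.startswith(PREFIX):
            out[lowered[len(PREFIX):]] = value
    return out
```

The function accepts any mapping of header names to values, for example the headers of a response object from an HTTP client.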

Mock responses

New mock-response decorator: send a mock_response field in the request body to short-circuit the call and return a canned answer while
still exercising the whole pipeline (guardrails, cache, budgets, observability). Prefix the value with Exception: to simulate an error and test
provider fallbacks. Inspired by litellm's mock_response.
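
The semantics of the mock_response field can be illustrated with a few lines of Python. This is a sketch of the behaviour described above, not the decorator's actual code: resolve_mock and MockedProviderError are invented names, and mock_response is assumed to be a string.

```python
EXCEPTION_PREFIX = "Exception:"


class MockedProviderError(Exception):
    """Raised when mock_response asks for a simulated provider failure."""


def resolve_mock(body: dict):
    """Return the canned answer, raise a simulated error, or None if unset."""
    mock = body.get("mock_response")
    if mock is None:
        return None  # no mock requested: the real provider would be called
    if mock.startswith(EXCEPTION_PREFIX):
        # The text after the prefix becomes the simulated error message.
        raise MockedProviderError(mock[len(EXCEPTION_PREFIX):].strip())
    return mock
```

A request body such as {"messages": [...], "mock_response": "Exception: rate limited"} would then raise the simulated error, which is the case that lets a fallback chain be tested without a real outage.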

Budgets

  • Added buttons to reset budgets directly from the dashboard.

Improvements & fixes

  • Workflow functions accept more input formats.
  • Clearer error messages across providers.
  • Drop unsupported sampling params for Opus 4.7.
  • Fix image-generation response content type.
  • Fix embedding editor size and minor dashboard UI tweaks.

Release Infos

  • the documentation is available here
  • release is available here
