Summary
The platform defines both primary_model_id and fallback_model_id, but failover behavior is inconsistent across chat/channel/background paths. Some paths only do config-level fallback, some attempt runtime fallback, and some ignore fallback entirely.
This issue proposes one shared failover policy and one shared executor to make behavior consistent and observable.
Current Behavior
- Model schema supports primary + fallback
    primary_model_id: uuid.UUID | None = None
    fallback_model_id: uuid.UUID | None = None
- Web chat does primary-first + config-level fallback
    if agent.primary_model_id:
        ...  # load primary
    if agent.fallback_model_id:
        ...  # load fallback
    if not llm_model and fallback_llm_model:
        llm_model = fallback_llm_model
- Runtime fallback exists, but lower layer often returns error text instead of raising
    # call_llm catches and returns string
    except LLMError as e:
        return f"[LLM Error] {e}"
    except Exception as e:
        return f"[LLM call error] ..."
  This weakens outer except-driven fallback logic: a returned error string raises nothing, so an outer except block has nothing to catch.
- Slack/Teams/Discord/WeCom/DingTalk reuse Feishu's _call_agent_llm
  These paths share the same failover characteristics as Feishu.
- Background services mostly do one-shot selection (primary or fallback)
    model_id = agent.primary_model_id or agent.fallback_model_id
  There is no runtime failover retry after the selected model fails.
- Some paths are primary-only
  Files: backend/app/services/trigger_daemon.py, backend/app/api/gateway.py, backend/app/services/agent_manager.py
  Primary is used directly; fallback is not part of the path.
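To illustrate why returning error text undermines except-driven fallback, here is a minimal, self-contained sketch. The LLMError class, both call_llm_* functions, and with_fallback are hypothetical stand-ins written for this example; they are not the project's actual code.

```python
import asyncio


class LLMError(Exception):
    """Stand-in for the project's LLMError; the real class may differ."""


async def call_llm_returning(model: str) -> str:
    # Mirrors the pattern described above: the error is caught and
    # returned as text, so the caller sees a normal string result.
    try:
        raise LLMError(f"{model} timed out")
    except LLMError as e:
        return f"[LLM Error] {e}"


async def call_llm_raising(model: str) -> str:
    # Alternative: let the error propagate so the caller can react to it.
    if model == "primary":
        raise LLMError(f"{model} timed out")
    return f"reply from {model}"


async def with_fallback(call, primary: str, fallback: str) -> str:
    # Outer except-driven fallback: only triggers if `call` raises.
    try:
        return await call(primary)
    except LLMError:
        return await call(fallback)


# The string-returning variant never reaches the fallback branch.
returned = asyncio.run(with_fallback(call_llm_returning, "primary", "fallback"))
# The raising variant switches to the fallback model.
raised = asyncio.run(with_fallback(call_llm_raising, "primary", "fallback"))
```

In the first case the error text is handed back as if it were a model reply; in the second the fallback model actually answers.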
Proposed Solution
A) Add one shared failover executor
Create llm_failover and route all LLM entrypoints through it.
e.g.
    async def invoke_with_failover(primary_model, fallback_model, invoke_once, context):
        ...
B) Unified switching rules
- Try primary if available.
- If primary is missing/unavailable, use fallback directly.
- If primary fails with a retryable error, retry once on fallback.
- If the error is non-retryable (auth/validation/schema), do not switch.
- Max attempts per request: 2 (primary + fallback).
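The switching rules above could be sketched roughly as follows. This is one possible shape, not a finished design: the two exception classes and the attempts list stored in context are assumptions made for the example, and "missing" is modelled simply as None (an availability check is left out).

```python
import asyncio
from typing import Any, Awaitable, Callable


class RetryableLLMError(Exception):
    """Hypothetical: transient failure such as a timeout, 429, or 5xx."""


class NonRetryableLLMError(Exception):
    """Hypothetical: auth/validation/schema failure."""


InvokeOnce = Callable[[Any, dict], Awaitable[str]]


async def invoke_with_failover(primary_model: Any, fallback_model: Any,
                               invoke_once: InvokeOnce, context: dict) -> str:
    # Record which models were tried, so callers can log or inspect it.
    attempts = context.setdefault("attempts", [])
    if primary_model is None and fallback_model is None:
        raise ValueError("no model configured")
    if primary_model is None:
        # Primary missing: use fallback directly (single attempt).
        attempts.append(fallback_model)
        return await invoke_once(fallback_model, context)
    try:
        attempts.append(primary_model)
        return await invoke_once(primary_model, context)
    except RetryableLLMError:
        if fallback_model is None:
            raise
        # Retryable failure: one retry on fallback, two attempts at most.
        attempts.append(fallback_model)
        return await invoke_once(fallback_model, context)
    # NonRetryableLLMError (and anything else) propagates without switching.


async def _demo_retryable():
    async def flaky(model, context):
        if model == "primary":
            raise RetryableLLMError("429")
        return f"ok from {model}"
    ctx: dict = {}
    result = await invoke_with_failover("primary", "fallback", flaky, ctx)
    return result, ctx["attempts"]


async def _demo_non_retryable():
    async def bad_auth(model, context):
        raise NonRetryableLLMError("401")
    ctx: dict = {}
    try:
        await invoke_with_failover("primary", "fallback", bad_auth, ctx)
    except NonRetryableLLMError:
        pass
    return ctx["attempts"]


demo_result, demo_attempts = asyncio.run(_demo_retryable())
auth_attempts = asyncio.run(_demo_non_retryable())
```

Keeping invoke_once as a callback lets each entrypoint (chat, channel, background) supply its own call logic while sharing the same switching policy.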
C) Retryable error scope (To be discussed)
- Network timeout / connection errors
- Provider 429
- Provider 5xx
- Explicit transient provider errors
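As a starting point for that discussion, the scope above might be expressed as a single classifier. ProviderHTTPError and TransientProviderError are hypothetical types invented for this sketch; the real provider clients may surface status codes and transient errors differently.

```python
import asyncio


class ProviderHTTPError(Exception):
    """Hypothetical wrapper carrying the provider's HTTP status code."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


class TransientProviderError(Exception):
    """Hypothetical marker for explicitly transient provider errors."""


def is_retryable(exc: BaseException) -> bool:
    # Network timeouts and connection errors.
    if isinstance(exc, (asyncio.TimeoutError, TimeoutError, ConnectionError)):
        return True
    # Errors the provider client has explicitly flagged as transient.
    if isinstance(exc, TransientProviderError):
        return True
    # Provider 429 and 5xx responses.
    if isinstance(exc, ProviderHTTPError):
        return exc.status == 429 or 500 <= exc.status <= 599
    # Everything else (auth, validation, schema, ...) is non-retryable.
    return False
```

For example, is_retryable(ProviderHTTPError(503)) is true while is_retryable(ProviderHTTPError(401)) is false, so an auth failure would not trigger a switch to the fallback model.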