What problem does this solve?
Follow-up to #572: token-budget fallback is currently being addressed in the WebSocket main chat path and the Feishu path, but Clawith does not have one shared model invocation entry point for every channel.
Current code paths show several independent ingress lanes:
- WebSocket main chat: backend/app/api/websocket.py builds conversation context and calls call_llm_with_failover(...).
- Feishu: backend/app/api/feishu.py loads channel history and routes through _call_agent_llm(...), including primary/fallback model handling.
- WeCom / WeChat / Discord and similar channels load history in their own service files before calling _call_agent_llm(...) or another LLM helper.
- Background service paths such as heartbeat, task execution, reporting, supervision, and agent-to-agent/service flows may call call_llm(...) or related helpers directly.
Because these paths assemble history and invoke models independently, a fix in one path can leave other channels exposed to the same oversized-history/token-budget failure. The behavior may diverge by channel even when the agent, model, and conversation history are otherwise the same.
Concrete examples from the current codebase:
- backend/app/api/websocket.py truncates conversation context before call_llm_with_failover(...).
- backend/app/api/feishu.py normalizes and truncates history inside _call_agent_llm(...).
- backend/app/services/wecom_stream.py, backend/app/services/wechat_channel.py, and backend/app/services/discord_gateway.py each load recent ChatMessage history before invoking the agent LLM path.
- Other service-level callers under backend/app/services/ use call_llm(...)/agent LLM helpers for non-channel work.
Proposed solution
Discuss and design a shared abstraction for model-call context preparation, rather than patching each channel independently.
The shared layer should probably own:
- history normalization before replaying persisted chat messages into the LLM;
- token-budget-aware context trimming using the target model's context window;
- consistent primary/fallback model behavior for retryable failures;
- consistent handling of tool-call/tool-result message pairs, so trimming does not break provider protocol requirements;
- channel metadata only as input parameters, not duplicated call logic;
- a way for background/task/agent-to-agent calls to opt into the same budget governance without pretending they are user chat channels.
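To make the trimming and tool-pair points concrete, here is a minimal sketch of budget-aware trimming. It is not Clawith's existing code: the names estimate_tokens, group_units, and trim_to_budget are hypothetical, the message shape (role, content, tool_calls, tool_call_id) assumes an OpenAI-style chat format, and the 4-characters-per-token estimate is a stand-in for a real tokenizer.

```python
from typing import Any

Message = dict[str, Any]


def estimate_tokens(message: Message) -> int:
    """Crude stand-in for a tokenizer (assumption): ~4 chars per token plus overhead."""
    return len(str(message.get("content") or "")) // 4 + 4


def group_units(messages: list[Message]) -> list[list[Message]]:
    """Group each assistant tool-call message with the tool results that follow it."""
    units: list[list[Message]] = []
    for msg in messages:
        follows_tool_call = bool(units) and bool(units[-1][0].get("tool_calls"))
        if msg.get("role") == "tool" and follows_tool_call:
            units[-1].append(msg)  # keep the result attached to its call
        else:
            units.append([msg])
    return units


def trim_to_budget(messages: list[Message], max_tokens: int) -> list[Message]:
    """Keep system messages, then the newest units that still fit the budget."""
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    budget = max_tokens - sum(estimate_tokens(m) for m in system)

    kept: list[list[Message]] = []
    for unit in reversed(group_units(rest)):  # walk from newest to oldest
        cost = sum(estimate_tokens(m) for m in unit)
        if cost > budget:
            break  # stop at the first unit that does not fit
        kept.append(unit)
        budget -= cost

    # Restore chronological order after the newest-first walk.
    return system + [m for unit in reversed(kept) for m in unit]
```

Because a tool-call message and its results are dropped or kept as one unit, the trimmed history never contains a tool result without its originating call. A real implementation would likely also need per-provider token counting and a policy for a single unit that exceeds the whole budget.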
This does not necessarily mean every channel must use a single monolithic function. The discussion should decide whether the right shape is:
- a reusable prepare_llm_messages(...) helper used by all entry points;
- a shared channel invocation service that wraps history loading + model resolution + fallback;
- or a smaller token-budget guard inserted immediately before any call_llm(...) / _call_agent_llm(...) / call_llm_with_failover(...) call.
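As one illustration of the third option, a guard could wrap any existing call site without changing its signature. The sketch below is hypothetical: budget_guard and approx_tokens are invented names, the wrapped callable is assumed to be an async function taking a message list, and the real signatures of call_llm(...) and its siblings may differ.

```python
import asyncio
from typing import Any, Awaitable, Callable

Message = dict[str, Any]
LLMCall = Callable[[list[Message]], Awaitable[str]]  # assumed call shape


def approx_tokens(messages: list[Message]) -> int:
    """Rough estimate (assumption): ~4 chars per token plus per-message overhead."""
    return sum(len(str(m.get("content") or "")) // 4 + 4 for m in messages)


def budget_guard(call: LLMCall, context_window: int, reserve_for_reply: int = 1024) -> LLMCall:
    """Wrap an LLM call so oversized histories are trimmed before it runs."""
    limit = context_window - reserve_for_reply

    async def guarded(messages: list[Message]) -> str:
        trimmed = list(messages)  # never mutate the caller's history
        while approx_tokens(trimmed) > limit:
            droppable = [i for i, m in enumerate(trimmed) if m.get("role") != "system"]
            if len(droppable) <= 1:
                break  # always keep the newest turn; let the provider decide
            del trimmed[droppable[0]]  # drop the oldest non-system message
        return await call(trimmed)

    return guarded
```

This variant is deliberately naive: it drops one message at a time and ignores tool-call/tool-result pairing, so it would need the pairing rule from the shared layer before being used on histories that contain tool calls. Its appeal is that each channel keeps its own history loading and only gains a wrapper.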
Acceptance criteria for the discussion
- Inventory all current LLM invocation entry points, including WebSocket, Feishu, WeCom, WeChat, Teams/Slack/Discord-style channels, heartbeat/task/reporting, and agent-to-agent flows.
- Decide which paths must share token-budget fallback semantics and which paths need special behavior.
- Define the shared API boundary and ownership so future channel additions do not need to rediscover the same trimming/fallback rules.
- Add regression coverage around at least one non-WebSocket channel with oversized history once the abstraction is implemented.
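For the inventory step, a small script can produce a first-pass list of call sites. This is a sketch: find_llm_call_sites is a hypothetical helper, it only matches the three helper names mentioned in this issue, and a text search will miss aliased imports or other LLM helpers, so the result still needs manual review.

```python
import re
from collections import defaultdict
from pathlib import Path

# Helper names taken from this issue; longer names come first in the alternation.
CALL_PATTERN = re.compile(r"\b(call_llm_with_failover|_call_agent_llm|call_llm)\s*\(")


def find_llm_call_sites(root: Path) -> dict[str, list[tuple[str, int]]]:
    """Map each helper name to (relative path, line number) pairs where it is called."""
    sites: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            stripped = line.lstrip()
            if stripped.startswith(("def ", "async def ")):
                continue  # skip the helper definitions themselves
            for match in CALL_PATTERN.finditer(line):
                rel = path.relative_to(root).as_posix()
                sites[match.group(1)].append((rel, lineno))
    return dict(sites)


if __name__ == "__main__":
    # Assumed layout: run from the repository root.
    for name, hits in find_llm_call_sites(Path("backend/app")).items():
        print(f"{name}: {len(hits)} call site(s)")
        for rel, lineno in hits:
            print(f"  {rel}:{lineno}")
```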
Willing to contribute?