[Feature] Unify token-budget fallback across channel LLM entry points

## What problem does this solve?

Follow-up to #572: token-budget fallback is currently being addressed in the WebSocket main chat path and the Feishu path, but Clawith does not have one shared model invocation entry point for every channel.

Current code paths show several independent ingress lanes:

- WebSocket main chat: `backend/app/api/websocket.py` builds conversation context and calls `call_llm_with_failover(...)`.
- Feishu: `backend/app/api/feishu.py` loads channel history and routes through `_call_agent_llm(...)`, including primary/fallback model handling.
- WeCom / WeChat / Discord and similar channels load history in their own service files before calling `_call_agent_llm(...)` or another LLM helper.
- Background service paths such as heartbeat, task execution, reporting, supervision, and agent-to-agent/service flows may call `call_llm(...)` or related helpers directly.

Because these paths assemble history and invoke models independently, a fix in one path can leave other channels exposed to the same oversized-history/token-budget failure. The behavior may diverge by channel even when the agent, model, and conversation history are otherwise the same.

Concrete examples from the current codebase:

- `backend/app/api/websocket.py` truncates conversation context before `call_llm_with_failover(...)`.
- `backend/app/api/feishu.py` normalizes and truncates history inside `_call_agent_llm(...)`.
- `backend/app/services/wecom_stream.py`, `backend/app/services/wechat_channel.py`, and `backend/app/services/discord_gateway.py` each load recent `ChatMessage` history before invoking the agent LLM path.
- Other service-level callers under `backend/app/services/` use `call_llm(...)`/agent LLM helpers for non-channel work.

## Proposed solution

Discuss and design a shared abstraction for model-call context preparation, rather than patching each channel independently.

The shared layer should probably own:

- history normalization before replaying persisted chat messages into the LLM;
- token-budget-aware context trimming using the target model's context window;
- consistent primary/fallback model behavior for retryable failures;
- consistent handling of tool-call/tool-result message pairs, so trimming does not break provider protocol requirements;
- channel metadata only as input parameters, not duplicated call logic;
- a way for background/task/agent-to-agent calls to opt into the same budget governance without pretending they are user chat channels.

This does not necessarily mean every channel must use a single monolithic function. The discussion should decide whether the right shape is:

- a reusable `prepare_llm_messages(...)` helper used by all entry points;
- a shared channel invocation service that wraps history loading + model resolution + fallback;
- or a smaller token-budget guard inserted immediately before any `call_llm(...)` / `_call_agent_llm(...)` / `call_llm_with_failover(...)` call.

## Acceptance criteria for the discussion

- Inventory all current LLM invocation entry points, including WebSocket, Feishu, WeCom, WeChat, Teams/Slack/Discord-style channels, heartbeat/task/reporting, and agent-to-agent flows.
- Decide which paths must share token-budget fallback semantics and which paths need special behavior.
- Define the shared API boundary and ownership so future channel additions do not need to rediscover the same trimming/fallback rules.
- Add regression coverage around at least one non-WebSocket channel with oversized history once the abstraction is implemented.

## Willing to contribute?

- [ ] I'd be interested in working on this.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Unify token-budget fallback across channel LLM entry points #576

What problem does this solve?

Proposed solution

Acceptance criteria for the discussion

Willing to contribute?

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Feature] Unify token-budget fallback across channel LLM entry points #576

Description

What problem does this solve?

Proposed solution

Acceptance criteria for the discussion

Willing to contribute?

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions