feat(chat): make per-stage token limits and temperature configurable via agents.yaml#348

Merged
pancacake merged 2 commits into HKUDS:dev from DarkGenius:feat/configurable-chat-token-limits on Apr 20, 2026

Conversation

DarkGenius (Contributor) commented Apr 20, 2026

Description

The agentic chat pipeline currently has six hardcoded max_tokens values
(thinking=1200, observing=1200, responding=1800, answer_now=1800,
acting=1500, react_fallback=800) and a hardcoded temperature=0.2 in
AgenticChatPipeline._completion_kwargs. These limits cannot be tuned without
editing source code, and the 1800-token cap on responding truncates long
final responses mid-sentence on real-world prompts (reproduced on a Russian
LaTeX-heavy "VC-dimension proof plan" question — output cut around the 1800-token
boundary, ~4500–4800 chars).

This PR lifts these knobs into data/user/settings/agents.yaml under a new
capabilities.chat block, mirroring the existing per-capability config pattern
(capabilities.solve, capabilities.research, etc.), but with per-stage
sub-sections to give granular control:

capabilities:
  chat:
    temperature: 0.2
    responding:     { max_tokens: 8000 }
    answer_now:     { max_tokens: 8000 }
    thinking:       { max_tokens: 2000 }
    observing:      { max_tokens: 2000 }
    acting:         { max_tokens: 2000 }
    react_fallback: { max_tokens: 1500 }

Defaults are bumped (responding/answer_now: 1800 → 8000) to fix the truncation
bug out of the box. _ChatLimits.from_config provides safe fallbacks if the
YAML block is missing entirely (legacy installs), coerces numeric strings,
and falls back to defaults for malformed values (garbage strings, non-dict
stage entries) instead of crashing.
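The fallback and coercion behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual deeptutor code: `DEFAULTS`, `_coerce_int`, and `chat_limits_from_config` are stand-ins for `DEFAULT_CHAT_PARAMS` and `_ChatLimits.from_config`.

```python
# Hypothetical sketch of the defaults + coercion logic; real names differ.
DEFAULTS = {
    "temperature": 0.2,
    "responding": {"max_tokens": 8000},
    "answer_now": {"max_tokens": 8000},
    "thinking": {"max_tokens": 2000},
    "observing": {"max_tokens": 2000},
    "acting": {"max_tokens": 2000},
    "react_fallback": {"max_tokens": 1500},
}

def _coerce_int(value, default):
    """Accept ints and numeric strings; fall back to the default otherwise."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default

def chat_limits_from_config(cfg):
    """Merge a possibly missing or malformed capabilities.chat block over defaults."""
    cfg = cfg if isinstance(cfg, dict) else {}
    limits = {}
    for stage, default in DEFAULTS.items():
        if stage == "temperature":
            continue
        entry = cfg.get(stage)
        entry = entry if isinstance(entry, dict) else {}  # non-dict stage entry -> defaults
        limits[stage] = _coerce_int(entry.get("max_tokens"), default["max_tokens"])
    return limits
```

The key property is that no user input can crash the loader: anything unrecognisable degrades to the shipped default for that stage.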

Implementation notes

  • New get_chat_params() and DEFAULT_CHAT_PARAMS in
    deeptutor/services/config/loader.py. Kept separate from get_agent_params
    because the chat capability has a nested per-stage shape, while
    get_agent_params returns flat {temperature, max_tokens}.
  • DEFAULT_AGENTS_SETTINGS in deeptutor/services/setup/init.py extended so
    fresh installs auto-generate the chat block in their agents.yaml.
  • AgenticChatPipeline.__init__ loads the config once into self._chat_limits
    / self._chat_temperature; the six callsites and _completion_kwargs read
    from these instance attrs.
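Putting the notes above together, the pipeline-side wiring might look roughly like this. It is a hedged sketch only: the real class carries far more state, and the constructor signature and stage-keyed lookup are assumptions based on the PR description.

```python
# Hypothetical sketch of how the pipeline consumes the loaded config;
# attribute names mirror the PR description, the rest is illustrative.
class AgenticChatPipeline:
    def __init__(self, chat_limits, chat_temperature=0.2):
        # Loaded once (via something like get_chat_params) at construction time.
        self._chat_limits = chat_limits
        self._chat_temperature = chat_temperature

    def _completion_kwargs(self, stage):
        # Callsites pass their stage name instead of a hardcoded literal,
        # so every budget is tunable from agents.yaml.
        return {
            "temperature": self._chat_temperature,
            "max_tokens": self._chat_limits[stage],
        }
```

Reading the config once in `__init__` keeps the six callsites free of I/O and makes the limits trivially overridable in tests.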

Related Issues

  • N/A (internal bug report — chat answer truncation on long responses)

Module(s) Affected

  • agents (chat pipeline)
  • config (loader + setup defaults)
  • tests

Checklist

  • I have read and followed the contribution guidelines.
  • My code follows the project's coding standards.
  • I have run pre-commit run --all-files and fixed any issues.
  • I have added relevant tests for my changes.
  • I have updated the documentation (if necessary).
  • My changes do not introduce any new security vulnerabilities.

Test plan

  • New tests/services/config/test_chat_params_config.py (10 tests):
    • get_chat_params() returns defaults when agents.yaml is missing or the
      capabilities.chat block is absent.
    • Partial overrides (single stage / temperature only) merge correctly with
      defaults.
    • Full round-trip with all stages overridden.
    • _ChatLimits.from_config handles empty dict, DEFAULT_CHAT_PARAMS,
      string-numbers, garbage strings, and non-dict stage values.
  • Existing related tests still pass: tests/agents/chat/,
    tests/capabilities/test_answer_now.py, tests/services/config/
    (70 passed locally; nothing depended on the old hardcoded literals).
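For illustration, the defaults/merge tests above might take roughly this shape. This is a self-contained sketch: `get_chat_params_with_fallback` is a stand-in for `get_chat_params()`, and the test names are assumptions rather than the actual contents of test_chat_params_config.py.

```python
# Hypothetical stand-in for get_chat_params(): deep-merge user config over defaults.
def get_chat_params_with_fallback(raw):
    defaults = {"temperature": 0.2, "responding": {"max_tokens": 8000}}
    if not isinstance(raw, dict):
        return defaults
    merged = {**defaults}
    for key, val in raw.items():
        if isinstance(defaults.get(key), dict) and isinstance(val, dict):
            merged[key] = {**defaults[key], **val}  # per-stage deep merge
        else:
            merged[key] = val
    return merged

def test_missing_block_returns_defaults():
    # agents.yaml absent or capabilities.chat missing -> shipped defaults
    assert get_chat_params_with_fallback(None)["responding"]["max_tokens"] == 8000

def test_partial_override_merges():
    # temperature-only override keeps the per-stage defaults intact
    out = get_chat_params_with_fallback({"temperature": 0.7})
    assert out["temperature"] == 0.7
    assert out["responding"]["max_tokens"] == 8000
```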

Additional Notes

  • Backwards compatible: existing user agents.yaml files that don't have
    the chat block keep working through DEFAULT_CHAT_PARAMS deep-merge in
    get_chat_params(). Behaviour changes only because the defaults are
    larger, which is the intended fix.
  • temperature is now a single value shared by all chat stages
    (matches the existing single-knob pattern of other capabilities).

pancacake (Collaborator)

Thanks for your contribution!

…via agents.yaml

The agentic chat pipeline had six hardcoded `max_tokens` values
(thinking/observing/responding/answer_now/acting/react_fallback) and a
hardcoded `temperature=0.2`, which caused the final response to be
truncated mid-sentence on long answers (e.g. ~1800-token cap on
`responding`). Lift these to `capabilities.chat.<stage>.max_tokens` /
`capabilities.chat.temperature` in `agents.yaml` so users can tune
budgets without code changes. New defaults bump responding/answer_now
to 8000 tokens to fix truncation out of the box, with safe fallbacks
in `_ChatLimits.from_config` if the yaml block is missing.
…space)

Output of `pre-commit run --from-ref upstream/dev --to-ref HEAD`:

- ruff (legacy alias): removed unused imports `complete as llm_complete`
  in agentic_pipeline.py (left over after my refactor — no callsites);
  re-sorted imports in services/config/__init__.py.
- ruff-format: re-wrapped a few long lines that crossed the 100-col
  budget after my changes (stream-message calls, tool-binding tuples,
  test-fixture dict literals); collapsed a single-element multi-line
  list back to one line.
- trailing-whitespace: stripped pre-existing trailing spaces in two
  docstrings of services/setup/init.py that happened to be in the same
  file as my edits.

No behavioural changes; tests still pass (70 passed in the same
suite — chat/, capabilities/test_answer_now.py, services/config/).

Note: the bandit hook fails with "Unknown test found in profile: B104".
This is an upstream config/version mismatch in pyproject.toml
(`tool.bandit.skips` lists B104, but bandit 1.8.0 no longer recognises
that ID). It is independent of this PR.
pancacake force-pushed the feat/configurable-chat-token-limits branch from 90a511f to 7a2e4d1 on April 20, 2026 16:10
pancacake merged commit 444b12a into HKUDS:dev on Apr 20, 2026
4 checks passed
pancacake pushed a commit that referenced this pull request Apr 20, 2026
Rebased onto dev (post-#348) and resolved two UI conflicts (`web/app/(workspace)/chat/[[...sessionId]]/page.tsx` drops the now-dead `replaySnapshot` helper; `web/components/chat/home/ChatMessages.tsx` switches the last-assistant action button from the old `RotateCcw`/Retry to `RefreshCcw`/Regenerate). Smoke tests green (199 passed) and all required checks pass.

Merging: adds a `regenerate` capability wired through CLI (`/regenerate` · `/retry`), WebSocket (`type: "regenerate"`), and Web UI (per-message Regenerate button). Backend rolls back the trailing assistant via `SQLiteSessionStore.delete_message`, reuses `start_turn` through `_persist_user_message=False` / `_regenerate=True`, skips `memory_service.refresh_from_turn` on regeneration, and surfaces non-fatal `regenerate_busy` / `nothing_to_regenerate` errors. Also includes empty-response diagnostics in `_stage_responding`. 18 new tests.
pancacake added a commit that referenced this pull request Apr 20, 2026
- New `assets/releases/ver1-2-1.md` covering #348 (per-stage chat token
  limits), #349 (Regenerate across CLI/WS/Web UI), the regenerate UI
  harmony polish, and bug fixes #347 / #345 / #352.
- README release-notes block updated to surface v1.2.1 above v1.2.0.

Made-with: Cursor