feat(chat): make per-stage token limits and temperature configurable via agents.yaml#348

Merged
pancacake merged 2 commits into HKUDS:dev from DarkGenius:feat/configurable-chat-token-limits on Apr 20, 2026

Conversation

DarkGenius (Contributor) commented Apr 20, 2026

Description

The agentic chat pipeline currently has six hardcoded max_tokens values
(thinking=1200, observing=1200, responding=1800, answer_now=1800,
acting=1500, react_fallback=800) and a hardcoded temperature=0.2 in
AgenticChatPipeline._completion_kwargs. These limits cannot be tuned without
editing source code, and the 1800-token cap on responding truncates long
final responses mid-sentence on real-world prompts (reproduced on a Russian
LaTeX-heavy "VC-dimension proof plan" question — output cut around the 1800-token
boundary, ~4500–4800 chars).

This PR lifts these knobs into data/user/settings/agents.yaml under a new
capabilities.chat block, mirroring the existing per-capability config pattern
(capabilities.solve, capabilities.research, etc.), but with per-stage
sub-sections to give granular control:

capabilities:
  chat:
    temperature: 0.2
    responding:     { max_tokens: 8000 }
    answer_now:     { max_tokens: 8000 }
    thinking:       { max_tokens: 2000 }
    observing:      { max_tokens: 2000 }
    acting:         { max_tokens: 2000 }
    react_fallback: { max_tokens: 1500 }

Defaults are bumped (responding/answer_now: 1800 → 8000) to fix the truncation
bug out of the box. _ChatLimits.from_config provides safe fallbacks if the
YAML block is missing entirely (legacy installs), coerces numeric strings,
and falls back to defaults for malformed values (garbage strings, non-dict
stage entries) instead of crashing.
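The fallback and coercion behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual deeptutor code: `DEFAULTS`, `_coerce_int`, and `chat_limits_from_config` are stand-ins for `DEFAULT_CHAT_PARAMS` and `_ChatLimits.from_config`.

```python
# Hypothetical sketch of the defaults + coercion logic; real names differ.
DEFAULTS = {
    "temperature": 0.2,
    "responding": {"max_tokens": 8000},
    "answer_now": {"max_tokens": 8000},
    "thinking": {"max_tokens": 2000},
    "observing": {"max_tokens": 2000},
    "acting": {"max_tokens": 2000},
    "react_fallback": {"max_tokens": 1500},
}

def _coerce_int(value, default):
    """Accept ints and numeric strings; fall back to the default otherwise."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default

def chat_limits_from_config(cfg):
    """Merge a possibly missing or malformed capabilities.chat block over defaults."""
    cfg = cfg if isinstance(cfg, dict) else {}
    limits = {}
    for stage, default in DEFAULTS.items():
        if stage == "temperature":
            continue
        entry = cfg.get(stage)
        entry = entry if isinstance(entry, dict) else {}  # non-dict stage entry -> defaults
        limits[stage] = _coerce_int(entry.get("max_tokens"), default["max_tokens"])
    return limits
```

The key property is that no user input can crash the loader: anything unrecognisable degrades to the shipped default for that stage.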

Implementation notes

  • New get_chat_params() and DEFAULT_CHAT_PARAMS in
    deeptutor/services/config/loader.py. Kept separate from get_agent_params
    because the chat capability has a nested per-stage shape, while
    get_agent_params returns flat {temperature, max_tokens}.
  • DEFAULT_AGENTS_SETTINGS in deeptutor/services/setup/init.py extended so
    fresh installs auto-generate the chat block in their agents.yaml.
  • AgenticChatPipeline.__init__ loads the config once into self._chat_limits
    / self._chat_temperature; the six callsites and _completion_kwargs read
    from these instance attrs.
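Putting the notes above together, the pipeline-side wiring might look roughly like this. It is a hedged sketch only: the real class carries far more state, and the constructor signature and stage-keyed lookup are assumptions based on the PR description.

```python
# Hypothetical sketch of how the pipeline consumes the loaded config;
# attribute names mirror the PR description, the rest is illustrative.
class AgenticChatPipeline:
    def __init__(self, chat_limits, chat_temperature=0.2):
        # Loaded once (via something like get_chat_params) at construction time.
        self._chat_limits = chat_limits
        self._chat_temperature = chat_temperature

    def _completion_kwargs(self, stage):
        # Callsites pass their stage name instead of a hardcoded literal,
        # so every budget is tunable from agents.yaml.
        return {
            "temperature": self._chat_temperature,
            "max_tokens": self._chat_limits[stage],
        }
```

Reading the config once in `__init__` keeps the six callsites free of I/O and makes the limits trivially overridable in tests.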

Related Issues

  • N/A (internal bug report — chat answer truncation on long responses)

Module(s) Affected

  • agents (chat pipeline)
  • config (loader + setup defaults)
  • tests

Checklist

  • I have read and followed the contribution guidelines.
  • My code follows the project's coding standards.
  • I have run pre-commit run --all-files and fixed any issues.
  • I have added relevant tests for my changes.
  • I have updated the documentation (if necessary).
  • My changes do not introduce any new security vulnerabilities.

Test plan

  • New tests/services/config/test_chat_params_config.py (10 tests):
    • get_chat_params() returns defaults when agents.yaml is missing or the
      capabilities.chat block is absent.
    • Partial overrides (single stage / temperature only) merge correctly with
      defaults.
    • Full round-trip with all stages overridden.
    • _ChatLimits.from_config handles empty dict, DEFAULT_CHAT_PARAMS,
      string-numbers, garbage strings, and non-dict stage values.
  • Existing related tests still pass: tests/agents/chat/,
    tests/capabilities/test_answer_now.py, tests/services/config/
    (70 passed locally; nothing depended on the old hardcoded literals).
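For illustration, the defaults/merge tests above might take roughly this shape. This is a self-contained sketch: `get_chat_params_with_fallback` is a stand-in for `get_chat_params()`, and the test names are assumptions rather than the actual contents of test_chat_params_config.py.

```python
# Hypothetical stand-in for get_chat_params(): deep-merge user config over defaults.
def get_chat_params_with_fallback(raw):
    defaults = {"temperature": 0.2, "responding": {"max_tokens": 8000}}
    if not isinstance(raw, dict):
        return defaults
    merged = {**defaults}
    for key, val in raw.items():
        if isinstance(defaults.get(key), dict) and isinstance(val, dict):
            merged[key] = {**defaults[key], **val}  # per-stage deep merge
        else:
            merged[key] = val
    return merged

def test_missing_block_returns_defaults():
    # agents.yaml absent or capabilities.chat missing -> shipped defaults
    assert get_chat_params_with_fallback(None)["responding"]["max_tokens"] == 8000

def test_partial_override_merges():
    # temperature-only override keeps the per-stage defaults intact
    out = get_chat_params_with_fallback({"temperature": 0.7})
    assert out["temperature"] == 0.7
    assert out["responding"]["max_tokens"] == 8000
```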

Additional Notes

  • Backwards compatible: existing user agents.yaml files that don't have
    the chat block keep working through DEFAULT_CHAT_PARAMS deep-merge in
    get_chat_params(). Behaviour changes only because the defaults are
    larger, which is the intended fix.
  • temperature is now a single value shared by all chat stages
    (matches the existing single-knob pattern of other capabilities).

pancacake (Collaborator)

Thanks for your contribution!

…via agents.yaml

The agentic chat pipeline had six hardcoded `max_tokens` values
(thinking/observing/responding/answer_now/acting/react_fallback) and a
hardcoded `temperature=0.2`, which caused the final response to be
truncated mid-sentence on long answers (e.g. ~1800-token cap on
`responding`). Lift these to `capabilities.chat.<stage>.max_tokens` /
`capabilities.chat.temperature` in `agents.yaml` so users can tune
budgets without code changes. New defaults bump responding/answer_now
to 8000 tokens to fix truncation out of the box, with safe fallbacks
in `_ChatLimits.from_config` if the yaml block is missing.
…space)

Output of `pre-commit run --from-ref upstream/dev --to-ref HEAD`:

- ruff (legacy alias): removed unused imports `complete as llm_complete`
  in agentic_pipeline.py (left over after my refactor — no callsites);
  re-sorted imports in services/config/__init__.py.
- ruff-format: re-wrapped a few long lines that crossed the 100-col
  budget after my changes (stream-message calls, tool-binding tuples,
  test-fixture dict literals); collapsed a single-element multi-line
  list back to one line.
- trailing-whitespace: stripped pre-existing trailing spaces in two
  docstrings of services/setup/init.py that happened to be in the same
  file as my edits.

No behavioural changes; tests still pass (70 passed in the same
suite — chat/, capabilities/test_answer_now.py, services/config/).

Note: the bandit hook fails with "Unknown test found in profile: B104".
This is an upstream config/version mismatch in pyproject.toml
(`tool.bandit.skips` lists B104, but bandit 1.8.0 no longer recognises
that ID). It is independent of this PR.
pancacake force-pushed the feat/configurable-chat-token-limits branch from 90a511f to 7a2e4d1 on April 20, 2026 16:10
pancacake merged commit 444b12a into HKUDS:dev on Apr 20, 2026
4 checks passed
pancacake pushed a commit that referenced this pull request Apr 20, 2026
Rebased onto dev (post-#348) and resolved two UI conflicts (`web/app/(workspace)/chat/[[...sessionId]]/page.tsx` drops the now-dead `replaySnapshot` helper; `web/components/chat/home/ChatMessages.tsx` switches the last-assistant action button from the old `RotateCcw`/Retry to `RefreshCcw`/Regenerate). Smoke tests green (199 passed) and all required checks pass.

Merging: adds a `regenerate` capability wired through CLI (`/regenerate` · `/retry`), WebSocket (`type: "regenerate"`), and Web UI (per-message Regenerate button). Backend rolls back the trailing assistant via `SQLiteSessionStore.delete_message`, reuses `start_turn` through `_persist_user_message=False` / `_regenerate=True`, skips `memory_service.refresh_from_turn` on regeneration, and surfaces non-fatal `regenerate_busy` / `nothing_to_regenerate` errors. Also includes empty-response diagnostics in `_stage_responding`. 18 new tests.
pancacake added a commit that referenced this pull request Apr 20, 2026
- New `assets/releases/ver1-2-1.md` covering #348 (per-stage chat token
  limits), #349 (Regenerate across CLI/WS/Web UI), the regenerate UI
  harmony polish, and bug fixes #347 / #345 / #352.
- README release-notes block updated to surface v1.2.1 above v1.2.0.

Made-with: Cursor