fix: widen Kimi completion budget #17
7Sageer
left a comment
LGTM — the refactor is clean (splitting desired into hardCap/fallback) and unit coverage looks good.
One non-blocking item to confirm before merge: the default path now sends max_completion_tokens ≈ remaining context (~255k on 256k-context models, vs. the old 32k). This is safe only if the Kimi backend has no separate per-request output cap below the context window and does not pessimistically reserve scheduling budget by max_completion_tokens — worth a quick check with the backend owner.
Minor follow-up: the same helper is also used by compaction (compaction/full.ts:453-461), which this PR doesn't touch; its inline comment now describes the old "clamp to reserved size" behavior and should be realigned.
Background
Kimi reasoning models use max_completion_tokens for both reasoning_content and the final content. The previous default path resolved a single desired budget and then applied min(desired, remaining). Because the default desired value was 32000, ordinary turns still sent max_completion_tokens: 32000 even when the model had much more output room left in the context window.
That can trigger a real failure mode: thinking may consume the entire 32k budget, and the backend can return HTTP 200 with thinking content but no final summary/content. The empty-summary compaction guard is handled in a separate PR; this PR fixes the upstream default budget that makes that failure more likely.
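As a rough illustration of the old default path described above, the clamp can be sketched as follows. The helper name oldCompletionCap and the 200000-token figure are illustrative assumptions, not taken from the codebase; only min(desired, remaining) and the 32000 default come from this description.

```typescript
// Hypothetical reconstruction of the previous default path: min(desired, remaining).
// The function name and the example numbers are assumptions for illustration.
function oldCompletionCap(desired: number, remaining: number): number {
  return Math.min(desired, remaining);
}

// With the old default desired = 32000 and (say) 200000 tokens still free in
// the context window, the request was still capped at 32000, and that single
// budget had to cover both reasoning_content and the final content.
const oldCap = oldCompletionCap(32000, 200000);
console.log(oldCap); // 32000
```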
Changes
- CompletionBudgetConfig.hardCap is an explicit user-configured maximum.
- CompletionBudgetConfig.fallback is used only when the model context window is unknown.
- When max_context_tokens is known and no hard cap is configured, use the safe remaining window: max_context_tokens - estimated_input - safety_margin.
- When a hard cap is configured, use min(hardCap, remaining).
- When the context window is unknown, fall back to loop_control.reserved_context_size, then 32000.
- KIMI_MODEL_MAX_COMPLETION_TOKENS takes priority over legacy KIMI_MODEL_MAX_TOKENS.
- 0 or negative values disable client-side clamping entirely.
- The configuration object is named completionBudgetConfig so that it is not confused with the final cap sent to the backend.
Behavior Impact
By default, Kimi ordinary turns are no longer capped at 32k when the model context window is known. Instead, they use the safe remaining context window. This reduces the chance that a reasoning model spends the entire output budget on thinking and returns no final content.
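The resolution rules listed under Changes can be sketched roughly as below. This is a minimal sketch, not the PR's actual code: the hardCap and fallback field names and the two environment variable names come from this description, while the function names, the BudgetInputs shape, the env parsing, the use of undefined to mean "send no client-side cap", and the choice to honor hardCap when the window is unknown are all assumptions.

```typescript
// Sketch only. hardCap/fallback are named in the PR; everything else is assumed.
interface CompletionBudgetConfig {
  hardCap?: number; // explicit user-configured maximum
  fallback: number; // used only when the context window is unknown
}

// Hypothetical input shape for this sketch.
interface BudgetInputs {
  maxContextTokens?: number; // undefined when the model's window is unknown
  estimatedInput: number;
  safetyMargin: number;
}

// Assumed env parsing: the new variable wins over the legacy one.
function hardCapFromEnv(env: Record<string, string | undefined>): number | undefined {
  const raw = env["KIMI_MODEL_MAX_COMPLETION_TOKENS"] ?? env["KIMI_MODEL_MAX_TOKENS"];
  if (raw === undefined || raw.trim() === "") return undefined;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : undefined;
}

// Returns the value to send as max_completion_tokens, or undefined when
// client-side clamping is disabled (an assumption about how "disabled" is expressed).
function resolveCompletionCap(
  cfg: CompletionBudgetConfig,
  inputs: BudgetInputs,
): number | undefined {
  // 0 or negative hard caps disable client-side clamping entirely.
  if (cfg.hardCap !== undefined && cfg.hardCap <= 0) return undefined;

  if (inputs.maxContextTokens === undefined) {
    // Unknown window: use the fallback. Preferring hardCap here is an assumption;
    // the PR text does not say how the two interact in this case.
    return cfg.hardCap ?? cfg.fallback;
  }

  // Safe remaining window: max_context_tokens - estimated_input - safety_margin.
  const remaining = inputs.maxContextTokens - inputs.estimatedInput - inputs.safetyMargin;
  return cfg.hardCap === undefined ? remaining : Math.min(cfg.hardCap, remaining);
}
```

With illustrative numbers (a 262144-token window, 1000 estimated input tokens, and a 1024-token margin), the default path in this sketch resolves to 260120, while a hardCap of 32000 resolves to 32000 as before.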
Callers that want the old 32k behavior can set:
Callers that want to leave completion-token handling entirely to the backend can set:
Verification
- pnpm vitest run packages/agent-core/test/utils/completion-budget.test.ts packages/agent-core/test/agent/kosong-llm.test.ts
- pnpm run typecheck
- pnpm --dir docs run build
- git diff --check