
fix: widen Kimi completion budget #17

Merged
wbxl2000 merged 1 commit into main from fix-completion-budget-remaining on May 25, 2026

wbxl2000 commented May 25, 2026

Background

Kimi reasoning models count both reasoning_content and the final content against a single max_completion_tokens budget. The previous default path resolved one desired budget and then applied min(desired, remaining). Because desired defaulted to 32000, ordinary turns still sent max_completion_tokens: 32000 even when the model had far more output room left in its context window.
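To make the old behavior concrete, here is a minimal sketch of the previous default path as described above. The oldCompletionCap helper and the example numbers are hypothetical illustrations, not the actual agent-core code.

```typescript
// Hypothetical reconstruction of the previous default path:
// one "desired" budget (default 32000), clamped by the remaining window.
const OLD_DEFAULT_DESIRED = 32_000;

function oldCompletionCap(remaining: number, desired = OLD_DEFAULT_DESIRED): number {
  // min(desired, remaining): the remaining window only lowers the cap
  // once it drops below 32k, so large-context models stay pinned at 32k.
  return Math.min(desired, remaining);
}

// Roughly 255k tokens of room left, yet the request still asks for 32k.
console.log(oldCompletionCap(255_000)); // 32000
// Only a nearly full window pulls the cap below 32k.
console.log(oldCompletionCap(20_000)); // 20000
```

With a reasoning model, that fixed 32k is shared between thinking and the final answer, which is why long thinking can exhaust it.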

That can trigger a real failure mode: thinking may consume the entire 32k budget, and the backend can return HTTP 200 with thinking content but no final summary/content. The empty-summary compaction guard is handled in a separate PR; this PR fixes the upstream default budget that makes that failure more likely.

Changes

  • Split completion budget semantics into configuration-level and request-level values:
    • CompletionBudgetConfig.hardCap is an explicit user-configured maximum.
    • CompletionBudgetConfig.fallback is used only when the model context window is unknown.
  • Change default cap calculation:
    • When max_context_tokens is known and no hard cap is configured, use the safe remaining window: max_context_tokens - estimated_input - safety_margin.
    • When a hard cap is configured, use min(hardCap, remaining).
    • When the context window is unknown, fall back to loop_control.reserved_context_size if it is set, otherwise 32000.
  • Preserve environment variable behavior:
    • KIMI_MODEL_MAX_COMPLETION_TOKENS takes priority over legacy KIMI_MODEL_MAX_TOKENS.
    • Positive integers are explicit hard caps.
    • 0 or negative values disable client-side clamping entirely.
  • Rename ordinary-turn plumbing to completionBudgetConfig so the configuration object is not confused with the final cap sent to the backend.
  • Update English and Chinese environment variable docs and add a changeset.
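The cap resolution listed above can be sketched as follows. The hardCap and fallback field names come from this PR; resolveCompletionCap, ContextEstimate, defaultFallback, and the example token counts are hypothetical. Two details are assumptions: an explicit hard cap is still honored when the window is unknown, and the remaining window is floored at 1 so the sketch never returns a non-positive cap. The disable case (0 or a negative value) is not modeled here; it presumably skips client-side cap computation altogether.

```typescript
// Sketch only: field names follow the PR; everything else is hypothetical.
interface CompletionBudgetConfig {
  hardCap?: number; // explicit user-configured maximum, if any
  fallback: number; // used only when the model context window is unknown
}

interface ContextEstimate {
  maxContextTokens?: number; // model context window; undefined when unknown
  estimatedInput: number; // estimated prompt tokens for this turn
  safetyMargin: number; // headroom kept free in the window
}

function resolveCompletionCap(config: CompletionBudgetConfig, ctx: ContextEstimate): number {
  if (ctx.maxContextTokens === undefined) {
    // Unknown window: nothing to clamp against.
    // Assumption: an explicit hard cap still wins over the fallback.
    return config.hardCap ?? config.fallback;
  }
  // Safe remaining window; floored at 1 in this sketch to stay positive.
  const remaining = Math.max(1, ctx.maxContextTokens - ctx.estimatedInput - ctx.safetyMargin);
  // Hard cap configured: min(hardCap, remaining). Otherwise: the whole remaining window.
  return config.hardCap !== undefined ? Math.min(config.hardCap, remaining) : remaining;
}

// Fallback order from the PR: loop_control.reserved_context_size, then 32000.
function defaultFallback(reservedContextSize?: number): number {
  return reservedContextSize ?? 32_000;
}

const known = { maxContextTokens: 262_144, estimatedInput: 4_000, safetyMargin: 2_000 };
console.log(resolveCompletionCap({ fallback: defaultFallback() }, known)); // 256144
console.log(resolveCompletionCap({ hardCap: 32_000, fallback: defaultFallback() }, known)); // 32000
const unknown = { estimatedInput: 4_000, safetyMargin: 2_000 };
console.log(resolveCompletionCap({ fallback: defaultFallback(8_192) }, unknown)); // 8192
```

Keeping hardCap and fallback as separate fields means "the user asked for at most N" and "we have no window information" no longer collapse into one desired value, which is what pinned the old default at 32k.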

Behavior Impact

By default, Kimi ordinary turns are no longer capped at 32k when the model context window is known. Instead, they use the safe remaining context window. This reduces the chance that a reasoning model spends the entire output budget on thinking and returns no final content.

Callers that want the old 32k behavior can set:

KIMI_MODEL_MAX_COMPLETION_TOKENS=32000

Callers that want to leave completion-token handling entirely to the backend can set:

KIMI_MODEL_MAX_COMPLETION_TOKENS=0
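The environment-variable rules from the Changes list can be sketched as a small parser. readEnvBudget and the EnvBudget shape are hypothetical; the precedence of KIMI_MODEL_MAX_COMPLETION_TOKENS over KIMI_MODEL_MAX_TOKENS and the positive versus zero-or-negative split follow the PR, while treating empty or non-integer values as unset is an assumption.

```typescript
// Hypothetical sketch of the env-var rules described in this PR.
type EnvBudget =
  | { kind: "hardCap"; value: number } // positive integer: explicit hard cap
  | { kind: "disabled" } // 0 or negative: no client-side clamping
  | { kind: "unset" }; // nothing usable set: default path

// Assumption: empty or whitespace-only values count as not set.
function pick(v: string | undefined): string | undefined {
  return v !== undefined && v.trim() !== "" ? v : undefined;
}

function readEnvBudget(env: Record<string, string | undefined>): EnvBudget {
  // The new variable takes priority over the legacy one.
  const raw = pick(env["KIMI_MODEL_MAX_COMPLETION_TOKENS"]) ?? pick(env["KIMI_MODEL_MAX_TOKENS"]);
  if (raw === undefined) return { kind: "unset" };
  const n = Number(raw);
  // Assumption: non-integer values are ignored rather than rejected.
  if (!Number.isInteger(n)) return { kind: "unset" };
  return n > 0 ? { kind: "hardCap", value: n } : { kind: "disabled" };
}

console.log(JSON.stringify(readEnvBudget({ KIMI_MODEL_MAX_COMPLETION_TOKENS: "32000" })));
// {"kind":"hardCap","value":32000}
console.log(JSON.stringify(readEnvBudget({ KIMI_MODEL_MAX_COMPLETION_TOKENS: "0", KIMI_MODEL_MAX_TOKENS: "16000" })));
// {"kind":"disabled"}
console.log(JSON.stringify(readEnvBudget({ KIMI_MODEL_MAX_TOKENS: "16000" })));
// {"kind":"hardCap","value":16000}
```

In Node this would typically be called as readEnvBudget(process.env); a hardCap result would then feed CompletionBudgetConfig.hardCap.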

Verification

  • pnpm vitest run packages/agent-core/test/utils/completion-budget.test.ts packages/agent-core/test/agent/kosong-llm.test.ts
  • pnpm run typecheck
  • pnpm --dir docs run build
  • git diff --check

7Sageer left a comment

LGTM — the refactor is clean (splitting desired into hardCap/fallback) and unit coverage looks good.

One non-blocking item to confirm before merge: the default path now sends max_completion_tokens ≈ remaining context (~255k on 256k-context models, vs. the old 32k). This is safe only if the Kimi backend has no separate per-request output cap below the context window and does not pessimistically reserve scheduling budget by max_completion_tokens — worth a quick check with the backend owner.

Minor follow-up: the same helper is also used by compaction (compaction/full.ts:453-461), which this PR doesn't touch; its inline comment now describes the old "clamp to reserved size" behavior and should be realigned.

wbxl2000 merged commit bfbd522 into main on May 25, 2026 (6 checks passed).
wbxl2000 deleted the fix-completion-budget-remaining branch on May 25, 2026 at 10:42.