fix: use max output size for completion budget #318
🦋 Changeset detected. Latest commit: 7a746a8. The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
@kermanx, could you take a look at this PR when you have time? This fixes the … The PR is focused on using …
The problem has been solved in v0.10.1, so I will close this PR.
@qkunio, the bug is not fully resolved: it still occurs during compaction. Please see my comment in the other issue.
Related Issue
Resolve #306
Problem
According to the OpenAI Chat Completions API reference, max_tokens is “the maximum number of tokens that can be generated in the chat completion” (source). Kimi Code currently derives this provider request field from max_context_size instead of the configured max_output_size, which can cause OpenAI-compatible providers to reject requests with 400 errors when the context window exceeds the provider's output token limit.
What changed
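Relative to the problem described above, the fix amounts to changing which configured value feeds the request's max_tokens field. The sketch below is hypothetical: ModelAlias, legacyCompletionBudget, completionBudget and buildRequestBody are illustrative names, not Kimi Code's actual code; the model name and token counts are made up; and falling back to max_context_size when max_output_size is unset is an assumption of this sketch.

```typescript
// Hypothetical shape of a model alias entry. The field names mirror the
// max_context_size / max_output_size settings named in this PR, but the
// real Kimi Code types may differ.
interface ModelAlias {
  model: string;
  max_context_size: number; // size of the context window, in tokens
  max_output_size?: number; // configured output limit, in tokens
}

// Old behaviour described under "Problem": the context window size is
// reused as the completion budget.
function legacyCompletionBudget(alias: ModelAlias): number {
  return alias.max_context_size;
}

// Behaviour described in this PR: prefer the configured output limit.
// Falling back to max_context_size when no output limit is configured is
// an assumption made for this sketch.
function completionBudget(alias: ModelAlias): number {
  return alias.max_output_size ?? alias.max_context_size;
}

// Builds an OpenAI-style Chat Completions request body, where max_tokens
// caps the number of generated tokens.
function buildRequestBody(alias: ModelAlias, prompt: string) {
  return {
    model: alias.model,
    messages: [{ role: "user", content: prompt }],
    max_tokens: completionBudget(alias),
  };
}

// Usage with made-up numbers.
const alias: ModelAlias = {
  model: "example-model", // hypothetical model name
  max_context_size: 262144,
  max_output_size: 32768,
};
console.log(legacyCompletionBudget(alias)); // 262144
console.log(buildRequestBody(alias, "hi").max_tokens); // 32768
```

If a provider caps output at, say, 32768 tokens, the legacy derivation would request 262144 and could be rejected with a 400, while the new derivation stays within the configured output limit.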
This change preserves each model alias's configured output limit and uses it as the completion token cap for runtime requests. The model context size still describes the context window, while max_output_size now controls the output budget that is ultimately sent through the provider-specific max_tokens handling. This prevents OpenAI-compatible providers from using max_context_size as the request output limit when max_output_size is available.
Checklist
gen-changesets skill, or this PR needs no changeset.
gen-docs skill, or this PR needs no doc update.