feat(zai): expose configurable max output tokens for GLM models (#161)#274
feat(zai): expose configurable max output tokens for GLM models (#161)#274proyectoauraorg wants to merge 3 commits into
Conversation
|
Warning Review limit reached
More reviews will be available in 14 minutes and 53 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (7)
📝 WalkthroughWalkthroughAdds an optional ChangesMax Output Tokens Capability for Z.A.I GLM
Sequence DiagramsequenceDiagram
participant ThinkingBudget as ThinkingBudget (Webview UI)
participant SettingsStore as SettingsStore (setApiConfigurationField)
participant ProviderSettings as ProviderSettingsManager
participant ZAiHandler as ZAiHandler (Server)
participant ZAiAPI as Z.ai API (create)
ThinkingBudget->>SettingsStore: user changes modelMaxTokens (persist)
SettingsStore->>ProviderSettings: saved apiConfiguration/profile
ProviderSettings->>ZAiHandler: runtime loads apiConfiguration
ZAiHandler->>ZAiAPI: create(request with max_tokens = modelMaxTokens)
ZAiAPI-->>ZAiHandler: response
ZAiHandler-->>ThinkingBudget: result/state
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@webview-ui/src/components/settings/__tests__/ThinkingBudget.spec.tsx`:
- Around line 320-326: The test fixture glmApiConfiguration is widening
apiProvider to string causing a TS2322 error; fix by typing the fixture with the
ProviderSettings literal type (e.g., change the declaration of
glmApiConfiguration to use "satisfies ProviderSettings" — optionally combined
with "as const" to preserve literal types) so the apiProvider remains the
literal "zai" when passed into ThinkingBudget with defaultProps.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 995ad486-eba2-459e-9c28-6802c53ac7db
📒 Files selected for processing (5)
packages/types/src/model.tspackages/types/src/providers/zai.tssrc/api/providers/__tests__/zai.spec.tswebview-ui/src/components/settings/ThinkingBudget.tsxwebview-ui/src/components/settings/__tests__/ThinkingBudget.spec.tsx
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
Profile export strips The export cleanup at line 572 deletes Suggested fix: const keepMaxTokens = supportsReasoningBudget || modelInfo.supportsMaxTokens
if (!supportsReasoningBudget) {
delete configs[name].modelMaxThinkingTokens
}
if (!keepMaxTokens) {
delete configs[name].modelMaxTokens
}(This file isn't in the PR diff so can't be posted as an inline comment.) |
| ) : null | ||
|
|
||
| // Models with supportsReasoningBinary (binary reasoning) show a simple on/off toggle | ||
| if (isReasoningSupported) { |
There was a problem hiding this comment.
If a model ever has both supportsReasoningBinary: true and supportsMaxTokens: true, the standalone slider won't render — this block returns before maxOutputTokensControl is reached. Worth including {maxOutputTokensControl} here alongside the checkbox, even if no current model hits this?
There was a problem hiding this comment.
Great catch — you're right that supportsReasoningBinary models (like deepseek-v3.2) would have their modelMaxThinkingTokens incorrectly stripped under the current logic.\n\nI've updated the guard to include supportsReasoningBinary in the reasoning-capable check:\n\nts\nconst supportsReasoning =\n modelInfo.supportsReasoningBudget ||\n modelInfo.requiredReasoningBudget ||\n modelInfo.supportsReasoningBinary\n\nif (supportsReasoning) {\n // keep modelMaxThinkingTokens — budget, required, or binary\n} else {\n delete configs[name].modelMaxThinkingTokens\n}\n\n\nThis ensures binary-reasoning models retain the field while non-reasoning models still get it cleaned up. Thanks for flagging this!
|
While reviewing the round-trip behavior for this feature, I found that |
…Code-Org#161) Z.ai GLM models (glm-5.1, glm-5-turbo) default to a 20% output clamp and the runtime already honors an explicit modelMaxTokens, but the settings UI never surfaced a control for it — those models only exposed reasoning effort. Adds an optional supportsMaxTokens capability flag (set on the GLM models), and reuses the ThinkingBudget settings component to render a max-output-tokens slider gated on supportsMaxTokens && !supportsReasoningBudget. The slider defaults to the existing output clamp when unset, so behavior is unchanged until the user edits it; an explicit value persists as modelMaxTokens. Closes Zoo-Code-Org#161
a99ea3c to
6cdbb6b
Compare
|
@edelauna Thanks for the detailed code review identifying the I've created a fix PR that addresses the scoping issue: PR #332 The fix extracts Could you please review PR #332 when you get a chance? 🙏 |
Summary
Z.ai GLM models (
glm-5.1,glm-5-turbo) default to a 20% output clamp. The runtime already honors an explicitmodelMaxTokens(createStreamWithThinking:max_tokens = options.modelMaxTokens || getModelMaxOutputTokens(...)), but the settings UI never surfaced a control — those models only exposed reasoning effort. This closes that gap.Change
supportsMaxTokensonmodelInfoSchema, set on the configurable GLM models in bothinternationalZAiModelsandmainlandZAiModels.ThinkingBudgetsettings component to render a max-output-tokens slider, gated onsupportsMaxTokens && !supportsReasoningBudget. It defaults to the runtime's existing clamp when unset (behavior unchanged until the user edits), and persists an explicit value asmodelMaxTokens.Tests
zai.spec.ts): the flag is present on the configurable GLM models (absent on glm-4.7), and an explicitmodelMaxTokensoverride is sent asmax_tokensinstead of the clamp.ThinkingBudget.spec.tsx): slider renders alongside reasoning effort, defaults to the clamp when unset, reflects an override, does NOT persist on initial render (init vs user-edit), persists on user change, and does not render when the flag is absent.Closes #161
Summary by CodeRabbit
New Features
Chores
Tests