Govern completion length by token budget only #249
Closed
FuJacob wants to merge 1 commit into
Remove the explicit word-range cue from both the local-model and Apple Intelligence prompts so completion length is governed solely by the shared token budget (request.maxPredictionTokens). Bump suggestedPredictionTokenBudget 50% (11/18/30 -> 17/27/45) so the cap has room to land on a natural stopping point instead of hard-truncating mid-thought. The completionLengthInstruction parameter stays wired for a one-line revert.
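As a rough illustration of the shape described above, the sketch below shows a renderer that leaves the word-range cue out of the prompt while keeping completionLengthInstruction wired. SketchSuggestionRequest and SketchPromptRenderer are hypothetical stand-ins, not the project's actual SuggestionRequest or LlamaPromptRenderer, and the prompt wording is invented for the example.

```swift
import Foundation

// Hypothetical, simplified stand-in for the project's SuggestionRequest.
struct SketchSuggestionRequest {
    let precedingText: String
    // Shared cap that, per the PR, both engines read.
    let maxPredictionTokens: Int
    // Still carried on the request so the in-prompt cue can be restored.
    let completionLengthInstruction: String
}

// Hypothetical renderer; the real renderers build richer prompts.
struct SketchPromptRenderer {
    func render(_ request: SketchSuggestionRequest) -> String {
        // The instruction is read and deliberately discarded. Re-enabling the
        // cue would mean interpolating it into the prompt instead.
        _ = request.completionLengthInstruction
        return "Continue the text naturally.\n\n\(request.precedingText)"
    }
}

let request = SketchSuggestionRequest(
    precedingText: "Thanks for the update, I will",
    maxPredictionTokens: 27,
    completionLengthInstruction: "Return only the next 7 to 12 words."
)
let prompt = SketchPromptRenderer().render(request)
print(prompt)
```

With this arrangement the only length control left is maxPredictionTokens, which the engine (not the prompt) would enforce during generation.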
Superseded: these changes already shipped to main. This branch was the base of
Summary
Completion length was enforced two ways: an in-prompt word-range cue ("Return only the next 7 to 12 words.") and a token cap. This makes the token cap the single source of truth: the word-range cue is removed from both the local-model (LlamaPromptRenderer) and Apple Intelligence (FoundationModelPromptRenderer) prompts, and suggestedPredictionTokenBudget is bumped ~50% (11/18/30 → 17/27/45) so the cap has room to stop on a natural boundary instead of hard-truncating mid-thought. Both engines already read the same request.maxPredictionTokens, so the cap stays in sync across them.
Validation
Note: local xcodebuild test could not run the app-hosted bundle due to a Team ID code-signing mismatch on this machine (documented limitation); CI runs with a valid signing identity. Test logic was updated to match the new behavior:
LlamaPromptRendererTests / PromptPolicyTests / CustomRulesTests now assert the word-range cue is absent from both prompts.
ModelAndPresentationValueTests updated to the new token budgets (17/27/45).
Linked issues
Risk / rollout notes
The preset word ranges (3-7, 7-12, 12-20) become ceilings rather than targets. The token budgets allow roughly 12 / 20 / 33 words at the top end (~0.75 words/token), so shorter presets can now overshoot their label. The model still tends to stop at sentence boundaries on its own.
completionLengthInstruction stays wired through SuggestionRequest and both renderers (the Llama renderer discards it via _ =), so re-enabling the in-prompt cue is a one-line revert in each renderer.
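The budget and word-ceiling figures quoted in these notes can be reproduced with a short calculation. This is a minimal sketch under assumptions: the preset names (short, medium, long) are hypothetical, and the rounding rules (round to nearest for the 50% bump, round down for the word estimate) are guesses that happen to match the numbers in this PR, not confirmed project logic.

```swift
import Foundation

// Hypothetical preset names; the PR only lists three budgets in order.
enum SketchLengthPreset: CaseIterable {
    case short, medium, long

    // Token budgets before this change (11/18/30, from the PR).
    var previousTokenBudget: Int {
        switch self {
        case .short: return 11
        case .medium: return 18
        case .long: return 30
        }
    }

    // Budgets after a 50% bump, rounded to the nearest integer (assumed rule).
    var suggestedPredictionTokenBudget: Int {
        Int((Double(previousTokenBudget) * 1.5).rounded())
    }

    // Approximate word ceiling at ~0.75 words per token, rounded down.
    var approximateWordCeiling: Int {
        Int(Double(suggestedPredictionTokenBudget) * 0.75)
    }
}

for preset in SketchLengthPreset.allCases {
    print(preset, preset.suggestedPredictionTokenBudget, preset.approximateWordCeiling)
}
```

Comparing approximateWordCeiling against each preset's labelled range makes the overshoot visible: the shortest label tops out at 7 words, while its budget leaves room for about 12.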