Govern completion length by token budget only #249

Closed
FuJacob wants to merge 1 commit into main from experiment/token-cap-only-completion-length

@FuJacob (Owner) commented May 25, 2026

Summary

Completion length was enforced two ways: an in-prompt word-range cue ("Return only the next 7 to 12 words.") and a token cap. This PR makes the token cap the single source of truth. The word-range cue is removed from both the local-model (LlamaPromptRenderer) and Apple Intelligence (FoundationModelPromptRenderer) prompts, and suggestedPredictionTokenBudget is raised by roughly 50% (11/18/30 → 17/27/45) so the model has room to stop on a natural boundary instead of being hard-truncated mid-thought. Both engines already read the same request.maxPredictionTokens, so the cap stays in sync across them.
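For illustration, the budget change can be sketched as a 1.5x scale rounded up. This is a minimal sketch, not the project's code: the `CompletionLengthPreset` enum, its case names, and `previousTokenBudget` are hypothetical, and the rounding rule and the pairing of budgets to presets are assumptions. Only the name suggestedPredictionTokenBudget and the numbers 11/18/30 and 17/27/45 come from this PR.

```swift
// Hypothetical preset type, used only to illustrate the budget bump.
enum CompletionLengthPreset: CaseIterable {
    case short   // assumed to map to the "3-7" words preset
    case medium  // assumed to map to the "7-12" words preset
    case long    // assumed to map to the "12-20" words preset

    /// Token budgets before this change (11/18/30 in the PR text).
    var previousTokenBudget: Int {
        switch self {
        case .short: return 11
        case .medium: return 18
        case .long: return 30
        }
    }

    /// Token budgets after this change. Assumption: 1.5x rounded up,
    /// computed in integer arithmetic as ceil(3n / 2).
    var suggestedPredictionTokenBudget: Int {
        (previousTokenBudget * 3 + 1) / 2
    }
}

// Print the before/after budget for each preset.
for preset in CompletionLengthPreset.allCases {
    print(preset, preset.previousTokenBudget, "->", preset.suggestedPredictionTokenBudget)
}
```

Under these assumptions the computed budgets match the 17/27/45 values listed above. The smallest preset grows slightly more than 50% because 16.5 rounds up to 17.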

Validation

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build-for-testing
# ** TEST BUILD SUCCEEDED **

swiftlint lint --quiet
# clean for changed files

Note: local xcodebuild test could not run the app-hosted bundle due to a Team ID code-signing mismatch on this machine (documented limitation); CI runs with a valid signing identity. Test logic was updated to match the new behavior:

  • LlamaPromptRendererTests / PromptPolicyTests / CustomRulesTests now assert the word-range cue is absent from both prompts.
  • ModelAndPresentationValueTests updated to the new token budgets (17/27/45).

Linked issues

Risk / rollout notes

  • Behavior change to an existing user flow. With no in-prompt target, the word-count presets (3-7, 7-12, 12-20) no longer set a target; each one only sets a ceiling through its token budget. At about 0.75 words per token, the budgets allow roughly 12 / 20 / 33 words at the top end, so the shorter presets can now overshoot their label. The model still tends to stop at sentence boundaries on its own.
  • completionLengthInstruction stays wired through SuggestionRequest and both renderers (the Llama renderer discards it via _ =), so re-enabling the in-prompt cue is a one-line revert in each renderer.
  • No schema, settings, or pbxproj migrations.
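The word estimates in the first risk note follow from simple arithmetic, sketched below. The helper `estimatedWordCeiling` and the `presets` table are hypothetical names for illustration. The 0.75 words-per-token ratio is the rough figure quoted above, not a measured value, and the pairing of each budget with a preset label is assumed from the order in which they are listed.

```swift
/// Rough upper bound on words for a token budget.
/// Assumption: about 0.75 words per token, truncated toward zero.
func estimatedWordCeiling(tokenBudget: Int, wordsPerToken: Double = 0.75) -> Int {
    Int(Double(tokenBudget) * wordsPerToken)
}

// Preset label, the label's upper bound in words, and the new token budget.
let presets: [(label: String, maxWords: Int, tokenBudget: Int)] = [
    ("3-7", 7, 17),
    ("7-12", 12, 27),
    ("12-20", 20, 45),
]

// Show how far each budget can exceed the preset's labeled maximum.
for preset in presets {
    let ceiling = estimatedWordCeiling(tokenBudget: preset.tokenBudget)
    let overshoot = ceiling - preset.maxWords
    print("\(preset.label): up to ~\(ceiling) words, \(overshoot) over the label")
}
```

With these inputs every preset can exceed its labeled maximum, and the smallest preset overshoots by the largest proportion, which is why the labels are better read as loose guides than as hard limits.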

Remove the explicit word-range cue from both the local-model and Apple
Intelligence prompts so completion length is governed solely by the shared
token budget (request.maxPredictionTokens). Bump suggestedPredictionTokenBudget
50% (11/18/30 -> 17/27/45) so the cap has room to land on a natural stopping
point instead of hard-truncating mid-thought. The completionLengthInstruction
parameter stays wired for a one-line revert.

@FuJacob (Owner, Author) commented May 25, 2026

Superseded — these changes already shipped to main. This branch was the base of fix/ghost-text-size-stabilization, which merged as #251. Because #251 was a squash merge, GitHub can't see this PR's original commits in main's history, so it stayed 'open' even though the token-budget completion-length change is live (the prompt renderers on main are byte-identical to this branch). Closing to clean up; nothing to merge.

@FuJacob FuJacob closed this May 25, 2026