Govern completion length by token budget only by FuJacob · Pull Request #249 · FuJacob/cotabby

FuJacob · 2026-05-25T13:33:38Z

Summary

Completion length was enforced two ways: an in-prompt word-range cue ("Return only the next 7 to 12 words.") and a token cap. This PR makes the token cap the single source of truth. The word-range cue is removed from both the local-model (LlamaPromptRenderer) and Apple Intelligence (FoundationModelPromptRenderer) prompts, and suggestedPredictionTokenBudget is bumped ~50% (11/18/30 → 17/27/45) so a completion has room to stop on a natural boundary before it reaches the cap instead of being hard-truncated mid-thought. Both engines already read the same request.maxPredictionTokens, so the cap stays in sync across them.
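As a rough illustration of the budget bump, the sketch below checks that each new budget is the old one scaled by 1.5 and rounded to the nearest integer. The LengthPreset enum and its case names are hypothetical; only the property name suggestedPredictionTokenBudget and the numbers 11/18/30 and 17/27/45 come from this PR, and the real type that owns the property may look different.

```swift
import Foundation

// Hypothetical preset type for illustration; not the app's actual model.
enum LengthPreset: CaseIterable {
    case short, medium, long

    // Budgets before this change (11/18/30), per the PR description.
    var previousTokenBudget: Int {
        switch self {
        case .short: return 11
        case .medium: return 18
        case .long: return 30
        }
    }

    // Budgets after this change (17/27/45), per the PR description.
    var suggestedPredictionTokenBudget: Int {
        switch self {
        case .short: return 17
        case .medium: return 27
        case .long: return 45
        }
    }
}

// Scale an old budget by 1.5 and round to the nearest whole token.
func bumpedBudget(from previous: Int) -> Int {
    return Int((Double(previous) * 1.5).rounded())
}

for preset in LengthPreset.allCases {
    print(preset, preset.previousTokenBudget, "->", bumpedBudget(from: preset.previousTokenBudget))
}
```

Because 11 × 1.5 is 16.5, the smallest budget rounds up to 17, which is why the increase is slightly more than 50% for that preset.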

Validation

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build-for-testing
# ** TEST BUILD SUCCEEDED **

swiftlint lint --quiet
# clean for changed files

Note: local xcodebuild test could not run the app-hosted bundle due to a Team ID code-signing mismatch on this machine (documented limitation); CI runs with a valid signing identity. Test logic was updated to match the new behavior:

LlamaPromptRendererTests / PromptPolicyTests / CustomRulesTests now assert the word-range cue is absent from both prompts.
ModelAndPresentationValueTests updated to the new token budgets (17/27/45).
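The test updates above amount to checking that the word-range cue no longer appears in the rendered prompt. A minimal sketch of that kind of check follows. The renderPrompt function, its prompt wording, and its parameters are hypothetical stand-ins; the actual LlamaPromptRenderer and FoundationModelPromptRenderer APIs are not shown in this PR.

```swift
import Foundation

// Hypothetical stand-in for a prompt renderer; not the project's real API.
func renderPrompt(context: String, completionLengthInstruction: String) -> String {
    // The instruction stays in the signature but is deliberately discarded,
    // so restoring the in-prompt cue would be a one-line change here.
    _ = completionLengthInstruction
    return "Continue the user's text.\n\n\(context)"
}

let prompt = renderPrompt(
    context: "Thanks for the update, I will",
    completionLengthInstruction: "Return only the next 7 to 12 words."
)

// The kind of assertion the updated tests make: the cue must be absent.
assert(!prompt.contains("Return only the next"))
```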

Linked issues

Risk / rollout notes

Behavior change to an existing user flow. With no in-prompt target, the word-count presets (3-7, 7-12, 12-20) map to token ceilings rather than in-prompt targets. At ~0.75 words/token, the token budgets allow roughly 12 / 20 / 33 words at the top end, so a preset, especially a shorter one, can now overshoot its label. The model still tends to stop at sentence boundaries on its own.
completionLengthInstruction stays wired through SuggestionRequest and both renderers (the Llama renderer discards it via _ =), so re-enabling the in-prompt cue is a one-line revert in each renderer.
No schema, settings, or pbxproj migrations.
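The word ceilings quoted in the risk notes can be reproduced with a short calculation. This is a sketch assuming the ~0.75 words/token ratio stated above; the function name and the tuple layout are illustrative and not taken from the codebase.

```swift
// Approximate the maximum word count a token budget allows,
// truncating to a whole word. The 0.75 default is the PR's rough estimate.
func approximateMaxWords(tokenBudget: Int, wordsPerToken: Double = 0.75) -> Int {
    return Int(Double(tokenBudget) * wordsPerToken)
}

// Preset label, upper bound of the label in words, and new token budget.
let presets: [(label: String, maxLabelWords: Int, tokenBudget: Int)] = [
    ("3-7", 7, 17),
    ("7-12", 12, 27),
    ("12-20", 20, 45),
]

for preset in presets {
    let ceiling = approximateMaxWords(tokenBudget: preset.tokenBudget)
    let overshoot = ceiling - preset.maxLabelWords
    print("\(preset.label): up to ~\(ceiling) words, \(overshoot) over the label")
}
```

Under this estimate every preset can exceed the upper end of its label (by about 5, 8, and 13 words respectively) whenever the model does not stop on its own before the cap.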

Remove the explicit word-range cue from both the local-model and Apple Intelligence prompts so completion length is governed solely by the shared token budget (request.maxPredictionTokens). Bump suggestedPredictionTokenBudget by about 50% (11/18/30 -> 17/27/45) so a completion has room to land on a natural stopping point instead of being hard-truncated mid-thought. The completionLengthInstruction parameter stays wired for a one-line revert.

FuJacob · 2026-05-25T14:20:17Z

Superseded — these changes already shipped to main. This branch was the base of fix/ghost-text-size-stabilization, which merged as #251. Because #251 was a squash merge, GitHub can't see this PR's original commits in main's history, so it stayed 'open' even though the token-budget completion-length change is live (the prompt renderers on main are byte-identical to this branch). Closing to clean up; nothing to merge.

FuJacob closed this May 25, 2026

FuJacob wants to merge 1 commit into main from experiment/token-cap-only-completion-length
