Add token-aware prompt budgeting as an opt-in path #531
Merged
The base-model prompt is budgeted in characters as a deliberate ~4-chars-per-token approximation. That ratio is far off for code and non-Latin text, where it can under- or over-fill the real context window. This adds a token-aware path that swaps in an estimated token count, exactly as PromptSectionBudget's own comment anticipated, without paying for the runtime tokenizer on the main-actor prompt path.

- TokenCountEstimator is a pure, cheap, word-aware heuristic (roughly four characters per token within a word, every word at least one token): closer to real subword tokenization than a single global ratio, and deterministic for tests.
- PromptSectionBudget gains an additive allocate(_:totalTokens:estimate:) that fills by priority against an estimated-token budget, converting each section's token cap to a character cap via that content's own density so the existing character-based truncate is reused unchanged. The character allocate is untouched.
- BaseCompletionPromptRenderer takes an optional tokenBudget; nil keeps the character path, so shipped behavior is unchanged.

The estimator, the token allocator, and the renderer's token path are all unit-tested (the caret prefix stays un-starved under a tight token budget).

Wiring a caller to pass a real token budget is the follow-up: the right budget value and the quality delta need on-device validation, so it stays opt-in until then.
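The density conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `SketchSection`, its field names, and `allocateSketch` are hypothetical, and a plain prefix cut stands in for the existing character-based truncate.

```swift
/// Hypothetical section model; the field names are assumptions of this sketch.
struct SketchSection {
    let name: String
    let content: String
    let priority: Int
    let minChars: Int
    let maxChars: Int
}

/// Fills sections by priority against an estimated-token budget and
/// returns the survivors in their original order.
func allocateSketch(
    _ sections: [SketchSection],
    totalTokens: Int,
    estimate: (String) -> Int
) -> [(name: String, text: String)] {
    var remainingTokens = max(0, totalTokens)
    var kept: [Int: String] = [:]

    // Highest priority first; ties keep input order.
    let order = sections.indices.sorted { a, b in
        sections[a].priority == sections[b].priority
            ? a < b
            : sections[a].priority > sections[b].priority
    }

    for index in order {
        let section = sections[index]
        let tokens = estimate(section.content)
        guard tokens > 0, remainingTokens > 0 else { continue }

        // Convert the token budget to characters using this content's own density.
        let charsPerToken = Double(section.content.count) / Double(tokens)
        let remainingChars = Int(Double(remainingTokens) * charsPerToken)
        let cap = min(section.maxChars, section.content.count, remainingChars)
        guard cap >= section.minChars else { continue }

        // Stand-in for the character-based truncate: keep the leading `cap` characters.
        let slice = String(section.content.prefix(cap))
        guard !slice.isEmpty else { continue }

        kept[index] = slice
        // Clamp at zero so a slice that is token-denser than the section
        // average cannot push the remaining budget negative.
        remainingTokens = max(0, remainingTokens - estimate(slice))
    }

    var result: [(name: String, text: String)] = []
    for (index, section) in sections.enumerated() {
        if let text = kept[index] {
            result.append((name: section.name, text: text))
        }
    }
    return result
}
```

Because the conversion happens per section, a dense section (few characters per token) receives a smaller character cap from the same token budget than a sparse one, which is the behavior a single global ratio cannot express.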
cc1990f to f865fdc
- PromptSectionBudget: clamp remainingTokens at zero. A truncated slice can be
token-denser than the section average, so deducting its estimate could drive the
remaining budget negative and wrongly drop the next section even when it fits.
- TokenCountEstimator: split on punctuation as well as whitespace, so contractions
("can't") and punctuation-joined identifiers ("foo.bar") aren't undercounted as a
single word.
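A word-aware estimate with the punctuation split described above might look like the sketch below. `SketchTokenEstimator` is a hypothetical stand-in, not the PR's TokenCountEstimator, and it assumes that separators (whitespace and punctuation) contribute no tokens of their own; the real estimator may count them differently.

```swift
/// Hypothetical stand-in for a word-aware token estimator.
enum SketchTokenEstimator {
    /// Assumed in-word ratio, from the "roughly four characters per token" description.
    static let charsPerToken = 4

    static func estimate(_ text: String) -> Int {
        // A word is a run of letters or digits. Whitespace and punctuation
        // both act as separators, so "can't" and "foo.bar" each yield two words.
        let words = text.split(whereSeparator: { !($0.isLetter || $0.isNumber) })
        return words.reduce(0) { total, word in
            // Ceiling division: a non-empty word always costs at least one token.
            total + (word.count + charsPerToken - 1) / charsPerToken
        }
    }
}
```

Under these assumptions a whitespace-only split would score "foo.bar" as one seven-character word, while the punctuation split scores it as two separate words, each of which is charged at least one token.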
Summary
The base-model prompt is budgeted in characters as a deliberate ~4-chars-per-token approximation (see PromptSectionBudget's own comment). That ratio is far off for code and non-Latin text. This adds a token-aware budgeting path, the swap the comment anticipated, without paying for the runtime tokenizer on the main-actor prompt path.

- TokenCountEstimator (new, pure, tested): a cheap word-aware heuristic (roughly four characters per token within a word, every word at least one token), closer to subword tokenization than a single global ratio and deterministic for tests.
- PromptSectionBudget.allocate(_:totalTokens:estimate:) (new, additive): fills by priority against an estimated-token budget, converting each section's token cap to a character cap via that content's own density so the existing character-based truncate is reused unchanged. The character allocate is untouched.
- BaseCompletionPromptRenderer takes an optional tokenBudget; nil keeps the character path.

Validation
Linked issues
None. Prompting parity: token-aware (vs flat character) section budgeting.
Risk / rollout notes
- tokenBudget defaults to nil, so the character path is taken and shipped behavior is byte-for-byte unchanged; the existing budget and renderer tests pass untouched.
- project.pbxproj regenerated by XcodeGen for the two new files.

Greptile Summary
This PR introduces an opt-in token-aware prompt budgeting path as a drop-in complement to the existing character-based allocator. The tokenBudget parameter defaults to nil so no shipped behaviour changes until a caller is wired in a follow-up.

- TokenCountEstimator (new): a pure word-aware heuristic that splits on both whitespace and punctuation, giving closer approximations for code and punctuation-heavy prose without a real tokenizer on the main-actor path.
- PromptSectionBudget.allocate(_:totalTokens:estimate:) (new): fills sections by priority against an estimated-token budget, converting each section's token cap to chars via that section's own density so the existing truncate helper is reused unchanged; a max(0, …) clamp prevents density-inverted truncated slices from blocking subsequent sections.
- BaseCompletionPromptRenderer: routes to the new token allocator only when tokenBudget is non-nil, leaving the character path byte-for-byte identical.

Confidence Score: 5/5
Safe to merge: the new token path is fully opt-in (nil default), all existing tests pass unchanged, and the two issues raised in the prior review round have been addressed.
The change is additive and isolated behind a nil-default parameter, so no existing behaviour can regress. The new allocator correctly handles priority ordering, the density-inversion clamp, and empty/whitespace content. The estimator now splits on punctuation as well as whitespace, matching what real subword tokenizers do. No caller yet passes a non-nil tokenBudget, so the new path has zero production exposure until explicitly wired.
No files require special attention. The one observation is a test-quality note in PromptSectionBudgetTests.swift around an assertion that holds only for uniform-density data.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[BaseCompletionPromptRenderer.prompt] --> B{tokenBudget != nil?}
    B -- Yes --> C[PromptSectionBudget.allocate\ntotalTokens: tokenBudget\nestimate: TokenCountEstimator.estimate]
    B -- No --> D[PromptSectionBudget.allocate\ntotalChars: contextBudget]
    C --> E[Sort sections by priority descending]
    D --> F[Sort sections by priority descending]
    E --> G[For each section\ncompute charsPerToken density\nconvert remainingTokens to remainingChars\ncap = min maxChars, content.count, remainingChars]
    F --> H[For each section\ncap = min maxChars, content.count, remaining]
    G --> I{cap >= minChars?}
    H --> I
    I -- No --> J[Drop section, continue]
    I -- Yes --> K[truncate + trim]
    K --> L{truncated empty?}
    L -- Yes --> J
    L -- No --> M[Keep section\ndeduct from budget\nclamp to 0]
    M --> N[Return sections in original order]

Reviews (2): Last reviewed commit: "Address review feedback on token budgeti..."