Add token-aware prompt budgeting as an opt-in path #531

Merged: FuJacob merged 2 commits into main from feat/token-budgeting on Jun 2, 2026
Conversation

@FuJacob (Owner) commented Jun 2, 2026

Summary

The base-model prompt is budgeted in characters as a deliberate ~4-chars-per-token approximation (see PromptSectionBudget's own comment). That ratio is far off for code and non-Latin text, where it can under- or over-fill the real context window. This adds a token-aware budgeting path, the swap the comment anticipated, without paying for the runtime tokenizer on the main-actor prompt path.

  • TokenCountEstimator (new, pure, tested): a cheap word-aware heuristic (roughly four characters per token within a word, every word at least one token), closer to subword tokenization than a single global ratio and deterministic for tests.
  • PromptSectionBudget.allocate(_:totalTokens:estimate:) (new, additive): fills by priority against an estimated-token budget, converting each section's token cap to a character cap via that content's own density so the existing character-based truncate is reused unchanged. The character allocate is untouched.
  • BaseCompletionPromptRenderer takes an optional tokenBudget; nil keeps the character path.
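
To make the heuristic in the first bullet concrete, here is a minimal sketch of a word-aware estimate. It is not the PR's code: the type name `SketchTokenEstimator`, the exact splitting rule (whitespace plus punctuation, as the review follow-up in this PR describes), and the choice not to count separators as tokens are all assumptions made for illustration.

```swift
import Foundation

/// Hypothetical sketch of a word-aware token estimate; not the PR's actual implementation.
enum SketchTokenEstimator {
    /// Assumed density inside a single word: about four characters per token.
    static let charsPerToken = 4

    static func estimate(_ text: String) -> Int {
        // Split into words on whitespace and punctuation.
        // Assumption: separators themselves are not counted as tokens.
        let separators = CharacterSet.whitespacesAndNewlines.union(.punctuationCharacters)
        let words = text.components(separatedBy: separators).filter { !$0.isEmpty }

        // Each word costs ceil(length / 4) tokens and never less than one.
        return words.reduce(0) { total, word in
            total + max(1, (word.count + charsPerToken - 1) / charsPerToken)
        }
    }
}
```

Under these assumptions, empty or whitespace-only input estimates to zero, and a longer word never estimates lower than a shorter one, which is the kind of relational property the PR's estimator tests are described as checking.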

Validation

xcodebuild ... test ... CODE_SIGNING_ALLOWED=NO CODE_SIGNING_REQUIRED=NO \
  -only-testing:CotabbyTests/TokenCountEstimatorTests \
  -only-testing:CotabbyTests/PromptSectionBudgetTests \
  -only-testing:CotabbyTests/BaseCompletionPromptRendererTests
# ** TEST SUCCEEDED **
#   estimator: empty=0, every word >=1 token, longer text estimates more, scales with word count
#   token allocate: priority fill, drops low priority when tight, respects the token budget
#   renderer: the caret prefix stays un-starved under a tight token budget; char-path tests unchanged

swiftlint --strict   # exit 0 (CI-equivalent)
xcodegen generate    # registered the new source + test file

Linked issues

None. Prompting parity: token-aware (vs flat character) section budgeting.

Risk / rollout notes

  • Opt-in, no behavior change. tokenBudget defaults to nil, so the character path is taken and shipped behavior is byte-for-byte unchanged; the existing budget and renderer tests pass untouched.
  • This lands the pure, tested estimator and token allocator. Wiring a caller to pass a real token budget is the follow-up: the right budget value and the quality delta over the character approximation need on-device validation, so it stays opt-in until then. The estimator is intentionally approximate and used only for relative budgeting, never a hard token limit.
  • project.pbxproj regenerated by XcodeGen for the two new files.
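
The opt-in default described in these notes could look roughly like the sketch below. The names `SketchRenderer`, `SketchBudgetPath`, and `contextBudget` as a stored property are hypothetical; only the idea that `tokenBudget` is optional and `nil` selects the character path comes from this PR.

```swift
/// Hypothetical stand-in for the two budgeting paths.
enum SketchBudgetPath: Equatable {
    case characters(total: Int)
    case tokens(total: Int)
}

/// Hypothetical renderer configuration; not the real BaseCompletionPromptRenderer.
struct SketchRenderer {
    let contextBudget: Int
    /// Defaults to nil, so existing call sites keep the character path.
    var tokenBudget: Int? = nil

    var budgetPath: SketchBudgetPath {
        // Only a caller that explicitly passes a token budget takes the new path.
        if let tokens = tokenBudget {
            return .tokens(total: tokens)
        }
        return .characters(total: contextBudget)
    }
}
```

A call site that never mentions `tokenBudget`, such as `SketchRenderer(contextBudget: 2000)`, resolves to the character path, which is why an additive parameter like this leaves shipped behavior alone.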

Greptile Summary

This PR introduces an opt-in token-aware prompt budgeting path as a drop-in complement to the existing character-based allocator. The tokenBudget parameter defaults to nil so no shipped behaviour changes until a caller is wired in a follow-up.

  • TokenCountEstimator (new): a pure word-aware heuristic that splits on both whitespace and punctuation, giving closer approximations for code and punctuation-heavy prose without a real tokenizer on the main-actor path.
  • PromptSectionBudget.allocate(_:totalTokens:estimate:) (new): fills sections by priority against an estimated-token budget, converting each section's token cap to chars via that section's own density so the existing truncate helper is reused unchanged; a max(0,…) clamp prevents density-inverted truncated slices from blocking subsequent sections.
  • BaseCompletionPromptRenderer: routes to the new token allocator only when tokenBudget is non-nil, leaving the character path byte-for-byte identical.
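
The allocator behavior summarized above (priority fill, per-section density conversion, clamp at zero) can be sketched as follows. Everything here is illustrative: `SketchSection`, `SketchAllocation`, and `sketchAllocate` are hypothetical names, and the prefix-based truncation is a stand-in for the project's existing character-based truncate helper, whose real behavior is not shown in this PR page.

```swift
import Foundation

/// Hypothetical section description; field names are assumptions.
struct SketchSection {
    let name: String
    let content: String
    let priority: Int
    let minChars: Int
    let maxChars: Int
}

struct SketchAllocation: Equatable {
    let name: String
    let content: String
}

func sketchAllocate(
    _ sections: [SketchSection],
    totalTokens: Int,
    estimate: (String) -> Int
) -> [SketchAllocation] {
    var remainingTokens = max(0, totalTokens)
    var kept: [Int: String] = [:]  // original index -> truncated content

    // Visit sections from highest to lowest priority; ties keep original order.
    let order = sections.indices.sorted { a, b in
        sections[a].priority != sections[b].priority
            ? sections[a].priority > sections[b].priority
            : a < b
    }

    for index in order {
        let section = sections[index]
        let tokens = estimate(section.content)
        guard tokens > 0 else { continue }  // empty or whitespace-only content

        // Convert the remaining token budget to characters via this section's own density.
        let charsPerToken = Double(section.content.count) / Double(tokens)
        let remainingChars = Int(Double(remainingTokens) * charsPerToken)
        let cap = min(section.maxChars, section.content.count, remainingChars)
        guard cap > 0, cap >= section.minChars else { continue }

        // Stand-in for the existing character-based truncate: keep the first `cap` characters.
        let slice = String(section.content.prefix(cap))
            .trimmingCharacters(in: .whitespacesAndNewlines)
        guard !slice.isEmpty else { continue }

        kept[index] = slice
        // A slice can be token-denser than the section average, so clamp at zero.
        remainingTokens = max(0, remainingTokens - estimate(slice))
    }

    // Return kept sections in their original order.
    return sections.indices.compactMap { i in
        kept[i].map { SketchAllocation(name: sections[i].name, content: $0) }
    }
}
```

Because the cap is expressed in characters, a truncation routine that only understands character counts can be reused as-is; the token estimate is only consulted before and after truncation.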

Confidence Score: 5/5

Safe to merge: the new token path is fully opt-in (nil default), all existing tests pass unchanged, and the two issues raised in the prior review round have been addressed.

The change is additive and isolated behind a nil-default parameter, so no existing behaviour can regress. The new allocator correctly handles priority ordering, the density-inversion clamp, and empty/whitespace content. The estimator now splits on punctuation as well as whitespace, matching what real subword tokenizers do. No caller yet passes a non-nil tokenBudget, so the new path has zero production exposure until explicitly wired.

No files require special attention. The one observation is a test-quality note in PromptSectionBudgetTests.swift around an assertion that holds only for uniform-density data.

Important Files Changed

| Filename | Overview |
| --- | --- |
| Cotabby/Support/TokenCountEstimator.swift | New pure estimator; splits on both whitespace and punctuation (addressing the previous-thread finding), correctly floors at 1 token per word, returns 0 for empty/whitespace-only input. |
| Cotabby/Support/PromptSectionBudget.swift | Adds token-aware allocate overload; correctly converts remaining tokens to chars via per-section density, clamps to 0 on over-deduction (addressing prior thread). The max(0,…) clamp means a density-inverted truncated slice can push total token usage over totalTokens, but the relaxation is design-intentional. |
| Cotabby/Support/BaseCompletionPromptRenderer.swift | Clean opt-in: tokenBudget defaults to nil, preserving existing char-path behavior byte-for-byte; the if-let branch routes to the new token allocate only when a budget is supplied. |
| CotabbyTests/PromptSectionBudgetTests.swift | Three new token-allocate tests; priority fill and budget-drop are well-covered. The test_tokenAllocate_respectsTokenBudget assertion holds only because the test data is uniform density; non-uniform data can produce used > totalTokens by design. |
| CotabbyTests/TokenCountEstimatorTests.swift | Good relational test coverage (empty=0, minimum 1 token/word, monotone growth, punctuation boundary splitting); locks behaviour without over-specifying exact counts. |
| CotabbyTests/BaseCompletionPromptRendererTests.swift | New test verifies the highest-priority caret prefix survives a tight token budget (8 tokens); existing char-path tests left unchanged as claimed. |
| Cotabby.xcodeproj/project.pbxproj | XcodeGen-regenerated; correctly registers TokenCountEstimator.swift in the main target and TokenCountEstimatorTests.swift in the test target. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[BaseCompletionPromptRenderer.prompt] --> B{tokenBudget != nil?}
    B -- Yes --> C[PromptSectionBudget.allocate\ntotalTokens: tokenBudget\nestimate: TokenCountEstimator.estimate]
    B -- No --> D[PromptSectionBudget.allocate\ntotalChars: contextBudget]
    C --> E[Sort sections by priority descending]
    D --> F[Sort sections by priority descending]
    E --> G[For each section\ncompute charsPerToken density\nconvert remainingTokens to remainingChars\ncap = min maxChars, content.count, remainingChars]
    F --> H[For each section\ncap = min maxChars, content.count, remaining]
    G --> I{cap >= minChars?}
    H --> I
    I -- No --> J[Drop section, continue]
    I -- Yes --> K[truncate + trim]
    K --> L{truncated empty?}
    L -- Yes --> J
    L -- No --> M[Keep section\ndeduct from budget\nclamp to 0]
    M --> N[Return sections in original order]

Reviews (2): Last reviewed commit: "Address review feedback on token budgeti..."

The base-model prompt is budgeted in characters as a deliberate ~4-chars-per-token
approximation. That ratio is far off for code and non-Latin text, where it can
under- or over-fill the real context window. This adds a token-aware path that
swaps in an estimated token count, exactly as PromptSectionBudget's own comment
anticipated, without paying for the runtime tokenizer on the main-actor prompt
path.

- TokenCountEstimator is a pure, cheap, word-aware heuristic (roughly four
  characters per token within a word, every word at least one token) — closer to
  real subword tokenization than a single global ratio, deterministic for tests.
- PromptSectionBudget gains an additive allocate(_:totalTokens:estimate:) that
  fills by priority against an estimated-token budget, converting each section's
  token cap to a character cap via that content's own density so the existing
  character-based truncate is reused unchanged. The character allocate is untouched.
- BaseCompletionPromptRenderer takes an optional tokenBudget; nil keeps the
  character path, so shipped behavior is unchanged.

The estimator, the token allocator, and the renderer's token path are all
unit-tested (the caret prefix stays un-starved under a tight token budget). Wiring
a caller to pass a real token budget is the follow-up: the right budget value and
the quality delta need on-device validation, so it stays opt-in until then.

@FuJacob force-pushed the feat/token-budgeting branch from cc1990f to f865fdc on June 2, 2026 03:52

Comment thread: Cotabby/Support/PromptSectionBudget.swift
Comment thread: Cotabby/Support/TokenCountEstimator.swift

- PromptSectionBudget: clamp remainingTokens at zero. A truncated slice can be
  token-denser than the section average, so deducting its estimate could drive the
  remaining budget negative and wrongly drop the next section even when it fits.
- TokenCountEstimator: split on punctuation as well as whitespace, so contractions
  ("can't") and punctuation-joined identifiers ("foo.bar") aren't undercounted as a
  single word.
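
The first fix is easier to see with numbers. The sketch below uses a simplified stand-in estimator (whitespace-separated words, four characters per token, at least one token per word) and a made-up section whose opening is much denser in tokens than its tail; none of the names or values come from the PR itself.

```swift
import Foundation

// Simplified stand-in estimator (an assumption, not the PR's code).
func sketchEstimate(_ text: String) -> Int {
    text.split(whereSeparator: { $0.isWhitespace })
        .reduce(0) { $0 + max(1, ($1.count + 3) / 4) }
}

// Eight one-letter words followed by one 32-character word: 48 characters in total.
let content = "a b c d e f g h " + String(repeating: "x", count: 32)

let sectionTokens = sketchEstimate(content)                        // 8 + 8 = 16
let charsPerToken = Double(content.count) / Double(sectionTokens)  // 48 / 16 = 3.0

// Suppose 4 tokens remain; at the section's average density that is 12 characters.
let remainingTokens = 4
let capChars = Int(Double(remainingTokens) * charsPerToken)

// The first 12 characters are all short words, so the slice is denser than average.
let slice = String(content.prefix(capChars)).trimmingCharacters(in: .whitespaces)
let sliceTokens = sketchEstimate(slice)                            // "a b c d e f" -> 6

let unclamped = remainingTokens - sliceTokens                      // -2
let clamped = max(0, unclamped)                                    // 0
```

Here the slice fits the character cap but estimates to 6 tokens against a remaining budget of 4. Clamping keeps the remaining budget at zero rather than negative; it also means total estimated usage can exceed the nominal token budget, which matches the review note that the budget assertion only holds for uniform-density data.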

@FuJacob merged commit 1913ad0 into main on Jun 2, 2026 (4 checks passed)
@FuJacob deleted the feat/token-budgeting branch on June 2, 2026 04:44
@FuJacob FuJacob mentioned this pull request Jun 2, 2026
Merged