
fix(compaction): use tiktoken for exact context tracking and enforce …#1376

Closed
krisclarkdev wants to merge 9 commits into Hmbown:main from krisclarkdev:bugfix-context-autocompact

krisclarkdev commented May 10, 2026

…payload limits

This commit fixes an issue where the locally computed token count was lower than the count reported by the upstream API (LiteLLM/llama.cpp).

Changes include:

  • Integrated tiktoken-rs for precise token counting instead of character-based heuristics.

  • Updated the UI context bar to reflect real-time estimates instead of delayed API counts.

  • Added a pre-flight payload check in the compaction execution to drop the oldest unpinned messages if the payload exceeds API limits.
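A rough sketch of the pre-flight check described above, for illustration only. The `Message` type, the `enforce_payload_limit` name, and the injected `count_tokens` closure are hypothetical and not taken from the PR. The PR backs its counter with tiktoken-rs (for example, a `CoreBPE` from `tiktoken_rs::cl100k_base()` and `encode_with_special_tokens(text).len()`; which encoding the PR actually uses is not stated here); a closure keeps the sketch free of external crates. Each message is tokenized once, and dropped messages are subtracted from a running total, so the whole check stays O(N).

```rust
/// One entry in the outgoing API payload (hypothetical shape for this sketch).
#[derive(Debug, Clone)]
pub struct Message {
    pub content: String,
    pub pinned: bool,
}

/// Drops the oldest unpinned messages until the estimated payload fits in
/// `limit` tokens and returns the estimated token count of what remains.
/// Pinned messages are never dropped, so the result can still exceed `limit`.
pub fn enforce_payload_limit<F>(messages: &mut Vec<Message>, limit: usize, count_tokens: F) -> usize
where
    F: Fn(&str) -> usize,
{
    // Tokenize every message exactly once.
    let counts: Vec<usize> = messages
        .iter()
        .map(|m| count_tokens(m.content.as_str()))
        .collect();
    let mut total = counts.iter().fold(0usize, |acc, &c| acc.saturating_add(c));

    // Walk oldest-first, marking unpinned messages for removal and subtracting
    // their cached counts instead of re-tokenizing the remaining history.
    let mut keep = vec![true; messages.len()];
    for (i, msg) in messages.iter().enumerate() {
        if total <= limit {
            break;
        }
        if !msg.pinned {
            keep[i] = false;
            total = total.saturating_sub(counts[i]);
        }
    }

    // Remove the marked messages in one pass; `retain` visits elements in order.
    let mut idx = 0;
    messages.retain(|_| {
        let kept = keep[idx];
        idx += 1;
        kept
    });
    total
}
```

Because pinned messages survive unconditionally, a caller would still need to handle a returned total above `limit` (for example, by falling back to summarization or reporting an error).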

Summary

Testing

  • cargo test --all-features
  • cargo fmt --all -- --check
  • cargo clippy --all-targets --all-features

Checklist

  • Updated docs or comments as needed
  • Added or updated tests where relevant
  • Verified TUI behavior manually if UI changes

Resolves #1440

gemini-code-assist (bot, contributor) left a review


Code Review

This pull request replaces the previous character-based token estimation with accurate tokenization using the tiktoken-rs library and introduces a pre-flight check to truncate message history when it exceeds the model's context window. The review feedback highlights several performance concerns, specifically the overhead of re-tokenizing large conversation histories within the TUI render loop and the summary creation process. It is recommended to cache these estimates and optimize the message-dropping logic to avoid O(N^2) complexity. Additionally, a suggestion was made to use safer arithmetic to prevent potential overflows during context window comparisons.
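To illustrate the overflow concern raised in the review, a context-window threshold check can use saturating arithmetic so that pathological token counts clamp at `usize::MAX` instead of panicking (debug builds) or wrapping (release builds). The `reaches_threshold` name and the percentage-threshold form are hypothetical; the PR's actual comparison is not reproduced here.

```rust
/// Returns true when the projected prompt size (`used` plus `reserved_output`)
/// reaches `threshold_pct` percent of `context_window`. Hypothetical helper.
pub fn reaches_threshold(
    used: usize,
    reserved_output: usize,
    context_window: usize,
    threshold_pct: usize,
) -> bool {
    // Saturate instead of overflowing on absurd inputs.
    let projected = used.saturating_add(reserved_output);
    // Cross-multiply rather than divide so integer division cannot round the
    // threshold down; both products saturate at usize::MAX.
    projected.saturating_mul(100) >= context_window.saturating_mul(threshold_pct)
}
```

Cross-multiplying keeps the comparison exact for ordinary inputs, and saturation only changes the result when a product would otherwise overflow.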

Comment threads (outdated): crates/tui/src/tui/sidebar.rs (2 threads), crates/tui/src/compaction.rs (2 threads)
krisclarkdev force-pushed the bugfix-context-autocompact branch 2 times, most recently from a1d9a52 to 26409b8, May 11, 2026 16:59
krisclarkdev (author) commented:

Hey! Just a quick heads-up on the latest pushes:

  1. I addressed the performance and O(N^2) feedback from gemini-code-assist. The token estimation is now fully cached in SessionState so it doesn't block the render loop, and the message dropping logic was optimized to O(N).
  2. I linked this PR to #1440 ("Token consumption is still much higher than Claude's for the same change request; hoping this can be improved"), since fixing this token inflation bug directly prevents the engine from prematurely (and expensively) triggering the auto-compactor.
  3. (Apologies for the brief moment when a massive 240k-line diff appeared! A stray testproject/ directory was accidentally staged during a cargo fmt. I did a clean reset and force-pushed, so the commit history is clean again.)

Everything is compiling locally with all tests passing. Let me know if you need anything else!

krisclarkdev force-pushed the bugfix-context-autocompact branch from 26409b8 to 975b91f, May 12, 2026 14:28
krisclarkdev and others added 6 commits May 12, 2026 17:12
…payload limits

This commit addresses the issue where the token calculation under-counted compared to the upstream API (LiteLLM/llama.cpp).

Changes include:

- Integrated tiktoken-rs for precise token counting instead of character-based heuristics.

- Updated the UI context bar to reflect real-time estimates instead of delayed API counts.

- Added a pre-flight payload check in the compaction execution to drop the oldest unpinned messages if the payload exceeds API limits.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- Optimized the create_summary logic in compaction.rs to compute token estimates in O(N) instead of O(N^2) by subtracting each dropped message's token count from a running total instead of recalculating the entire array.

- Replaced standard arithmetic with saturating_add and saturating_mul in the context-limit logic so that large token counts cannot overflow.

- Cached estimated_context_tokens in App::SessionState instead of running tiktoken synchronously on every frame of the render loop. New accessor methods (push_api_message, set_api_messages) keep the cache valid whenever the message list changes.
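A minimal sketch of the caching pattern in the last bullet. Only the names `estimated_context_tokens`, `push_api_message` and `set_api_messages` come from the commit message; the struct layout, the `String` message type and the injected counter (standing in for the tiktoken-based one) are assumptions for illustration. Keeping the fields private forces every mutation through the accessors, which is what keeps the cached value valid, and the render loop then reads a stored number instead of tokenizing the history each frame.

```rust
/// Hypothetical stand-in for the session state described in the commit.
pub struct SessionState<F> {
    api_messages: Vec<String>,
    estimated_context_tokens: usize,
    // Stands in for the tiktoken-based counter used in the PR.
    count_tokens: F,
}

impl<F: Fn(&str) -> usize> SessionState<F> {
    pub fn new(count_tokens: F) -> Self {
        Self { api_messages: Vec::new(), estimated_context_tokens: 0, count_tokens }
    }

    /// Appends one message and updates the cache with a single tokenizer call.
    pub fn push_api_message(&mut self, message: String) {
        let tokens = (self.count_tokens)(message.as_str());
        self.estimated_context_tokens = self.estimated_context_tokens.saturating_add(tokens);
        self.api_messages.push(message);
    }

    /// Replaces the whole history (e.g. after compaction) and recounts it once.
    pub fn set_api_messages(&mut self, messages: Vec<String>) {
        let total = messages
            .iter()
            .map(|m| (self.count_tokens)(m.as_str()))
            .fold(0usize, usize::saturating_add);
        self.estimated_context_tokens = total;
        self.api_messages = messages;
    }

    /// O(1) read for the render loop: no tokenization per frame.
    pub fn estimated_context_tokens(&self) -> usize {
        self.estimated_context_tokens
    }

    pub fn api_messages(&self) -> &[String] {
        &self.api_messages
    }
}
```

Incremental updates make appends cheap; only a wholesale replacement, which is rare compared with frame renders, pays for a full recount.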
krisclarkdev force-pushed the bugfix-context-autocompact branch from 975b91f to 3c8ab04, May 13, 2026 00:58
…compact

# Conflicts:
#	crates/tui/src/tui/sidebar.rs
krisclarkdev (author) commented:

@Hmbown I've updated the branch to resolve the merge conflicts and sync with the latest main. It's ready for a look if you get a chance.

Hmbown (owner) commented May 23, 2026

This PR was opened before the v0.8.41 rebrand and is now stale. Feel free to rebase onto current main and reopen. The whale brothers are waiting for you 🐋

Hmbown closed this May 23, 2026


Development

Successfully merging this pull request may close these issues.

Token consumption is still much higher than Claude's for the same change request; hoping this can be improved (#1440)

2 participants