fix(compaction): use tiktoken for exact context tracking and enforce …#1376
fix(compaction): use tiktoken for exact context tracking and enforce …#1376krisclarkdev wants to merge 9 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request replaces the previous character-based token estimation with accurate tokenization using the tiktoken-rs library and introduces a pre-flight check to truncate message history when it exceeds the model's context window. The review feedback highlights several performance concerns, specifically the overhead of re-tokenizing large conversation histories within the TUI render loop and the summary creation process. It is recommended to cache these estimates and optimize the message-dropping logic to avoid O(N^2) complexity. Additionally, a suggestion was made to use safer arithmetic to prevent potential overflows during context window comparisons.
a1d9a52 to
26409b8
Compare
|
Hey! Just a quick heads-up on the latest pushes:
Everything is compiling locally with all tests passing. Let me know if you need anything else! |
26409b8 to
975b91f
Compare
…payload limits This commit addresses the issue where the token calculation under-counted compared to the upstream API (LiteLLM/llama.cpp). Changes include: - Integrated iktoken-rs for precise token counting instead of character-based heuristics. - Updated the UI context bar to reflect real-time estimates instead of delayed API counts. - Added a pre-flight payload check in the compaction execution to drop the oldest unpinned messages if the payload exceeds API limits.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- Optimized create_summary logic in compaction.rs to calculate token estimates in O(N) instead of O(N^2) by applying subtractive calculations across dropped payload messages instead of fully recalculating the array. - Replaced standard arithmetic with saturating_add and saturating_mul in context limit logic for absolute safety. - Cached estimated_context_tokens in App::SessionState instead of running tiktoken synchronously per-frame in the render loop. Safe accessor methods (push_api_message, set_api_messages) have been implemented to keep the cache continuously valid.
975b91f to
3c8ab04
Compare
…compact # Conflicts: # crates/tui/src/tui/sidebar.rs
|
@Hmbown I've updated the branch to resolve the merge conflicts and sync with the latest main. It's ready for a look if you get a chance |
|
This PR was opened before the v0.8.41 rebrand and is now stale. Feel free to rebase onto current |
…payload limits
This commit addresses the issue where the token calculation under-counted compared to the upstream API (LiteLLM/llama.cpp).
Changes include:
Integrated iktoken-rs for precise token counting instead of character-based heuristics.
Updated the UI context bar to reflect real-time estimates instead of delayed API counts.
Added a pre-flight payload check in the compaction execution to drop the oldest unpinned messages if the payload exceeds API limits.
Summary
Testing
cargo test --all-featurescargo fmt --all -- --checkcargo clippy --all-targets --all-featuresChecklist
Resolves #1440