
fix+style: resolve CodeQL false-positive and ruff format violations for PR #6660 (#6738)

Open
rin259 wants to merge 2 commits into AstrBotDevs:master from rin259:fix/codeql-and-format

Conversation


@rin259 rin259 commented Mar 21, 2026

Summary

Combined fix for PR #6660 (perf/context-compression-v2):

  1. CodeQL fix: Rename prev_tokens -> token_count_before in _run_compression() — clears the false-positive py/clear-text-logging-sensitive-data alert. The value is a plain integer token count, not an auth token.

  2. format-check fix: Ruff format corrections for 3 files:

    • astrbot/core/agent/context/compressor.py
    • astrbot/core/agent/context/manager.py
    • astrbot/core/agent/context/token_counter.py

    Fixes trailing whitespace and line-length violations.


Merge after #6660, or merge both together.

Summary by Sourcery

Optimize context compression pipeline with improved token counting, caching, and statistics, and add comprehensive tests for the new behavior.

New Features:

  • Introduce caching and statistics for token counting in EstimateTokenCounter, including per-message overhead and cache management APIs.
  • Add fingerprinting, cached token usage, and statistics/reset APIs to ContextManager to reduce redundant token computations and track compression behavior.
  • Add summary cache and cache management to LLMSummaryCompressor to avoid redundant LLM summarization calls.

Enhancements:

  • Refine token estimation logic to better handle mixed Chinese/English text, digits, and special characters.
  • Improve compression flow in ContextManager and compressors to minimize repeated token counts and clarify logging and docstrings.
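The exact estimation formula is not shown in this thread, but a character-class estimator of the kind described in the first enhancement might look like this (the per-class weights below are illustrative assumptions, not the PR's actual values):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate that weights character classes differently.

    CJK characters tend to map to roughly one token each, while ASCII
    words average several characters per token. Weights are illustrative.
    """
    cjk = ascii_alpha = digits = other = 0
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            cjk += 1
        elif ch.isascii() and ch.isalpha():
            ascii_alpha += 1
        elif ch.isdigit():
            digits += 1
        elif not ch.isspace():
            other += 1
    # ~1 token per CJK char, ~4 ASCII letters per token,
    # ~2 digits per token, ~1 token per special character.
    return cjk + ascii_alpha // 4 + digits // 2 + other


print(estimate_tokens("hello world"))  # 10 letters -> 2
```

Splitting by character class avoids the classic failure mode of a single chars-per-token ratio, which overestimates Chinese text and underestimates English.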

Tests:

  • Add tests covering token estimation accuracy, cache key behavior, cache clearing, truncation compressor thresholds, history splitting, summary cache behavior, and ContextManager stats and compression behavior.
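Tests along those lines could be written with pytest; this sketch uses a toy counter standing in for the PR's real classes (the actual test file is tests/test_context_compression.py, and all names below are hypothetical):

```python
import hashlib


class CachingCounter:
    """Toy token counter with a content-hash cache, for illustration only."""

    def __init__(self) -> None:
        self._cache: dict[str, int] = {}
        self.misses = 0

    def _key(self, text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def count(self, text: str) -> int:
        key = self._key(text)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = len(text) // 4 + 1
        return self._cache[key]


def test_cache_hit_avoids_recount():
    counter = CachingCounter()
    first = counter.count("same message")
    second = counter.count("same message")
    assert first == second
    assert counter.misses == 1  # second call served from cache


def test_different_content_gets_distinct_keys():
    counter = CachingCounter()
    counter.count("prefix A")
    counter.count("prefix B")
    assert counter.misses == 2  # no key collision between different texts
```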

rin and others added 2 commits March 20, 2026 11:13
- Improved token estimation algorithm (Chinese, English, digit, and special characters counted separately)
- Added token-count cache and summary cache
- Added a fingerprint mechanism to ContextManager to reduce redundant computation
- Fixed a cache-key collision bug and a double-counted overhead bug
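The fingerprint mechanism mentioned in the commit messages is not shown in this thread; one common shape for it, sketched with assumed names, hashes the full conversation so an unchanged history skips recounting:

```python
import hashlib


def history_fingerprint(messages: list[dict]) -> str:
    """Stable fingerprint of a message list (role + full content)."""
    h = hashlib.sha256()
    for msg in messages:
        h.update(str(msg["role"]).encode("utf-8"))
        h.update(b"\x00")  # separator prevents role/content concatenation collisions
        h.update(str(msg["content"]).encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


class FingerprintedManager:
    """Recounts tokens only when the history fingerprint changes."""

    def __init__(self) -> None:
        self._last_fp: str | None = None
        self._last_count = 0
        self.recounts = 0

    def token_count(self, messages: list[dict]) -> int:
        fp = history_fingerprint(messages)
        if fp != self._last_fp:
            self.recounts += 1
            self._last_fp = fp
            # Crude stand-in for the real estimator.
            self._last_count = sum(len(str(m["content"])) // 4 for m in messages)
        return self._last_count
```

The separator bytes in the hash input are one way to avoid the kind of key-collision bug the last commit line refers to, though the PR's actual fix may differ.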
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Mar 21, 2026
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on enhancing the robustness and efficiency of the context compression system. It resolves a CodeQL false-positive by clarifying variable naming and ensures code style consistency by fixing Ruff formatting issues. Crucially, it introduces significant performance improvements through intelligent caching for token counting and LLM summaries, alongside a more precise token estimation algorithm, all backed by new, comprehensive unit tests.

Highlights

  • CodeQL False-Positive Resolution: Renamed prev_tokens to token_count_before to clarify its non-sensitive nature, resolving a false-positive py/clear-text-logging-sensitive-data alert.
  • Ruff Formatting Compliance: Corrected Ruff format violations, including trailing whitespace and line-length issues, in three core context management files (compressor.py, manager.py, token_counter.py).
  • Enhanced Context Compression Caching: Implemented caching mechanisms for both LLM-generated summaries and token counts within the LLMSummaryCompressor and ContextManager to prevent redundant computations and improve performance.
  • Improved Token Estimation Algorithm: Updated the EstimateTokenCounter with a more accurate estimation algorithm for mixed-language (Chinese/English), digit, and special character content, and added per-message overhead.
  • Comprehensive Test Coverage: Introduced a new dedicated test file (tests/test_context_compression.py) to validate the functionality and optimizations of the context compression module, including token estimation and caching.
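The "per-message overhead" called out in the token-estimation highlight accounts for the fixed tokens each chat message costs beyond its text (role markers, separators). A sketch with an assumed overhead constant, since the PR's real value is not quoted here:

```python
def estimate_chars_as_tokens(text: str) -> int:
    """Crude stand-in for the real estimator: ~4 chars per token."""
    return len(text) // 4


def count_history_tokens(messages: list[dict], per_message_overhead: int = 4) -> int:
    """Sum per-message content tokens plus a fixed structural overhead each.

    The overhead constant is an illustrative assumption; real values
    depend on the chat template of the target model.
    """
    total = 0
    for msg in messages:
        total += per_message_overhead
        total += estimate_chars_as_tokens(str(msg.get("content", "")))
    return total


msgs = [{"content": "12345678"}, {"content": "abcd"}]
print(count_history_tokens(msgs))  # (4 + 2) + (4 + 1) = 11
```

Counting overhead per message rather than per history matters because compression changes the message count, not just the character count.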

@dosubot dosubot bot added the area:core The bug / feature is about astrbot's core, backend label Mar 21, 2026

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • ContextManager currently reaches into EstimateTokenCounter via the private _get_cache_key method to build its fingerprint; consider promoting this to a public API or extracting a shared utility so callers don’t depend on internal implementation details.
  • The summary cache key in _generate_summary_cache_key only uses the first 50 characters of each message’s content and ignores fields like tool_calls, which increases the risk of collisions and stale summaries for different histories; consider incorporating more of the message structure (or a stable hash of the full content) into the key.

## Individual Comments

### Comment 1
<location path="astrbot/core/agent/context/compressor.py" line_range="149-158" />
<code_context>
     return system_messages, messages_to_summarize, recent_messages


+def _generate_summary_cache_key(messages: list[Message]) -> str:
+    """Generate a cache key for summary based on full history.
+
+    Uses role and content from all messages to create a collision-resistant key.
+    """
+    if not messages:
+        return ""
+
+    key_parts = []
+    for msg in messages:
+        content = msg.content if isinstance(msg.content, str) else str(msg.content)
+        key_parts.append(f"{msg.role}:{content[:50]}")
+
+    return "|".join(key_parts)
+
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Summary cache key truncates content to 50 characters, which can cause semantically different histories to share the same summary.

Because the key uses `role:content[:50]` per message, different conversations that only diverge after 50 characters can collide and share an incorrect cached summary. To reduce this risk, you could either increase the limit or derive the key from a hash of the full (or longer) serialized messages. If you keep the 50‑char limit, consider adding other distinguishing data (e.g., message count/timestamps) or disabling caching for very long messages to bound incorrect reuse.
</issue_to_address>




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several optimizations to the context compression logic, including caching for token counting and summarization, along with improved token estimation. It also adds a comprehensive test suite for these new features.

My review focuses on the new caching implementations. I've identified a potential correctness issue with the summary cache key generation that could lead to collisions. I've also suggested improvements to the cache eviction strategies and pointed out some code duplication. Overall, the changes are a great step towards improving performance, and the addition of tests is highly valuable.

Comment on lines +149 to +162
def _generate_summary_cache_key(messages: list[Message]) -> str:
    """Generate a cache key for summary based on full history.

    Uses role and content from all messages to create a collision-resistant key.
    """
    if not messages:
        return ""

    key_parts = []
    for msg in messages:
        content = msg.content if isinstance(msg.content, str) else str(msg.content)
        key_parts.append(f"{msg.role}:{content[:50]}")

    return "|".join(key_parts)
The reason will be displayed to describe this comment to others. Learn more.

Severity: high

The current implementation of _generate_summary_cache_key truncates message content to the first 50 characters (content[:50]) to create the cache key. This can lead to cache key collisions if different messages share the same prefix but have different content afterwards. This could result in serving an incorrect cached summary, which is a correctness issue. To prevent this, you should use a collision-resistant hash of the full message content, similar to the approach in EstimateTokenCounter._get_cache_key.

Suggested change

Before:

def _generate_summary_cache_key(messages: list[Message]) -> str:
    """Generate a cache key for summary based on full history.

    Uses role and content from all messages to create a collision-resistant key.
    """
    if not messages:
        return ""

    key_parts = []
    for msg in messages:
        content = msg.content if isinstance(msg.content, str) else str(msg.content)
        key_parts.append(f"{msg.role}:{content[:50]}")

    return "|".join(key_parts)

After:

def _generate_summary_cache_key(messages: list[Message]) -> str:
    """Generate a cache key for summary based on full history.

    Uses a hash of role and content from all messages to create a collision-resistant key.
    """
    if not messages:
        return ""

    h = 0
    for msg in messages:
        content = msg.content if isinstance(msg.content, str) else str(msg.content)
        h = hash((h, msg.role, content))
    return str(h)

Comment on lines +267 to +272
        if len(self._summary_cache) < self._max_cache_size:
            self._summary_cache[cache_key] = summary_content
        else:
            # Simple cache eviction
            self._summary_cache.pop(next(iter(self._summary_cache)))
            self._summary_cache[cache_key] = summary_content

Severity: medium

The logic for adding an item to the cache is duplicated in the if and else blocks. This can be refactored to reduce code duplication and improve readability.

Additionally, the current cache eviction strategy is FIFO (pop(next(iter(self._summary_cache)))). A Least Recently Used (LRU) policy would likely be more performant by keeping more relevant items in the cache. You could implement this using collections.OrderedDict.

Suggested change

Before:

        if len(self._summary_cache) < self._max_cache_size:
            self._summary_cache[cache_key] = summary_content
        else:
            # Simple cache eviction
            self._summary_cache.pop(next(iter(self._summary_cache)))
            self._summary_cache[cache_key] = summary_content

After:

        if self._max_cache_size > 0:
            if len(self._summary_cache) >= self._max_cache_size:
                # Simple cache eviction
                self._summary_cache.pop(next(iter(self._summary_cache)))
            self._summary_cache[cache_key] = summary_content

Comment on lines +113 to +117
            # Simple cache eviction: drop half of the entries
            keys_to_remove = list(self._cache.keys())[: self._cache_size // 2]
            for key in keys_to_remove:
                del self._cache[key]
        self._cache[cache_key] = total
The reason will be displayed to describe this comment to others. Learn more.

Severity: medium

The current cache eviction strategy is to remove half of the cache items when the cache is full. This can be inefficient as it involves converting dictionary keys to a list (list(self._cache.keys())) and may discard many recently or frequently used entries at once. A more standard and performant approach would be to use a Least Recently Used (LRU) eviction policy. This would evict only the single least recently used item when space is needed and generally provides better hit rates. You could implement this using collections.OrderedDict.


Labels

area:core — The bug / feature is about astrbot's core, backend
size:XL — This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant