feat: add deduplication for WeChat kefu text messages within 15 seconds #7788

Merged
Soulter merged 1 commit into master from fix/wecom-kf on Apr 25, 2026

Conversation

@Soulter Soulter commented Apr 25, 2026

Modifications

  • This is NOT a breaking change.

Screenshots or Test Results


Checklist

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

New Features:

  • Ignore duplicate WeChat kefu text messages received within 15 seconds for the same session to prevent repeated handling.

@auto-assign auto-assign Bot requested review from Fridemn and advent259141 April 25, 2026 08:25
@dosubot dosubot Bot added the size:M (This PR changes 30-99 lines, ignoring generated files.) and area:platform (The bug / feature is about IM platform adapter, such as QQ, Lark, Telegram, WebChat and so on.) labels Apr 25, 2026
@Soulter Soulter merged commit 5d79c99 into master Apr 25, 2026
20 checks passed
@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • The _is_duplicate_wechat_kf_text_message method does a full scan of _wechat_kf_seen_text_messages on every text message to evict expired keys; if this dict can grow large in production, consider a lighter-weight eviction strategy (e.g., periodic cleanup, capped size, or using an ordered structure).
  • Right now deduplication is based on session_id and text.strip() only; if messages differing only in trivial ways (e.g., whitespace changes or common punctuation variants) should also be treated as duplicates, consider centralizing a stronger normalization function for the text content before computing the dedup key.
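
To make the second point concrete, here is a minimal sketch of what a centralized normalization step could look like. The helper name `normalize_for_dedup` is hypothetical and not part of the PR; the choice of NFKC folding plus whitespace collapsing is an assumption about which "trivial" differences should count as the same message.

```python
import re
import unicodedata

_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_for_dedup(text: str) -> str:
    """Hypothetical helper: canonicalize text before building a dedup key."""
    # NFKC folds compatibility variants, e.g. full-width "！" becomes "!"
    # and the ideographic space U+3000 becomes an ordinary space.
    folded = unicodedata.normalize("NFKC", text)
    # Collapse any run of whitespace to a single space and trim the ends.
    return _WHITESPACE_RUN.sub(" ", folded).strip()
```

Stronger normalization widens what counts as a duplicate, so it also raises the chance of dropping two messages the user meant to be distinct; whether that trade-off is acceptable depends on the adapter's traffic.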
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="astrbot/core/platform/sources/wecom/wecom_adapter.py" line_range="420-421" />
<code_context>
         if msgtype == "text":
             text = msg.get("text", {}).get("content", "").strip()
+            if self._is_duplicate_wechat_kf_text_message(abm.session_id, text):
+                logger.debug(
+                    "忽略 15 秒内重复微信客服文本消息 session_id=%s text=%s",
+                    abm.session_id,
+                    text,
</code_context>
<issue_to_address>
**suggestion:** Avoid hard-coding the TTL value in the log message.

The TTL is defined as `WECHAT_KF_TEXT_CONTENT_DEDUP_TTL_SECONDS = 15`, but the log message hard-codes `15 秒` ("15 seconds"). If the TTL changes, the log will be inaccurate.

Please format the message using the constant (e.g., `%d` or an f-string) so the log always reflects the configured TTL.

Suggested implementation:

```python
            if self._is_duplicate_wechat_kf_text_message(abm.session_id, text):
                logger.debug(
                    "忽略 %d 秒内重复微信客服文本消息 session_id=%s text=%s",
                    WECHAT_KF_TEXT_CONTENT_DEDUP_TTL_SECONDS,
                    abm.session_id,
                    text,
                )

```

If `WECHAT_KF_TEXT_CONTENT_DEDUP_TTL_SECONDS` is not defined in this module, make sure it is either:
1. Imported at the top of `wecom_adapter.py` from the module where it is defined, or
2. Defined in this module if that is where it logically belongs.
</issue_to_address>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements a deduplication mechanism for WeChat KF text messages using a 15-second TTL to prevent redundant processing. Feedback includes suggestions to use a tuple for the deduplication key to avoid collisions, updating the type hint accordingly, and addressing potential performance issues in the cleanup logic which currently iterates over the entire cache for every message. Additionally, the reviewer requested unit tests to verify the new functionality.


self.server = WecomServer(self._event_queue, self.config)
self.agent_id: str | None = None
self._wechat_kf_seen_text_messages: dict[str, float] = {}

medium

Update the type hint to reflect the use of a tuple key for deduplication, as suggested in the logic below.

Suggested change
- self._wechat_kf_seen_text_messages: dict[str, float] = {}
+ self._wechat_kf_seen_text_messages: dict[tuple[str, str], float] = {}


self.server.callback = callback

def _is_duplicate_wechat_kf_text_message(self, session_id: str, text: str) -> bool:

medium

New functionality should be accompanied by corresponding unit tests. Please add tests to verify the deduplication logic, including TTL expiration and session-based isolation.

References
  1. New functionality should be accompanied by corresponding unit tests.
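
A rough sketch of the three cases the reviewer names (repeat within the TTL, repeat after expiry, and session isolation) is shown below. It does not exercise the real adapter: `DedupStandIn` and `FakeClock` are hypothetical stand-ins, the injected clock is an assumption about how time could be controlled in tests, and the stand-in assumes a duplicate does not refresh the TTL. Only the 15-second constant and the `expires_at <= now` eviction rule come from the code quoted in this review.

```python
WECHAT_KF_TEXT_CONTENT_DEDUP_TTL_SECONDS = 15  # value quoted in the review


class DedupStandIn:
    """Hypothetical stand-in for the adapter; only the dedup state is modelled."""

    def __init__(self, clock):
        self._clock = clock  # injected so tests can control time (an assumption)
        self._seen: dict[tuple[str, str], float] = {}

    def is_duplicate(self, session_id: str, text: str) -> bool:
        now = self._clock()
        # Drop entries whose TTL has elapsed, mirroring the quoted cleanup loop.
        for key in [k for k, exp in self._seen.items() if exp <= now]:
            self._seen.pop(key, None)
        key = (session_id, text.strip())
        if key in self._seen:
            return True
        self._seen[key] = now + WECHAT_KF_TEXT_CONTENT_DEDUP_TTL_SECONDS
        return False


class FakeClock:
    """Callable clock whose current time is set by the test."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def test_repeat_within_ttl_is_duplicate() -> None:
    clock = FakeClock()
    dedup = DedupStandIn(clock)
    assert dedup.is_duplicate("user-a", "hello") is False
    clock.now = 14.0
    assert dedup.is_duplicate("user-a", "hello") is True


def test_repeat_after_ttl_is_accepted() -> None:
    clock = FakeClock()
    dedup = DedupStandIn(clock)
    assert dedup.is_duplicate("user-a", "hello") is False
    clock.now = 15.0  # expires_at <= now, so the entry is evicted first
    assert dedup.is_duplicate("user-a", "hello") is False


def test_sessions_are_isolated() -> None:
    clock = FakeClock()
    dedup = DedupStandIn(clock)
    assert dedup.is_duplicate("user-a", "hello") is False
    assert dedup.is_duplicate("user-b", "hello") is False
```

The functions follow the pytest naming convention but use plain `assert`, so they can also be called directly.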

Comment on lines +223 to +229
expired_keys = [
    key
    for key, expires_at in self._wechat_kf_seen_text_messages.items()
    if expires_at <= now
]
for key in expired_keys:
    self._wechat_kf_seen_text_messages.pop(key, None)

medium

The current cleanup logic iterates through the entire _wechat_kf_seen_text_messages dictionary on every incoming text message. This is an O(N) operation that runs on the main event loop. While the 15-second TTL likely keeps the dictionary small, a high volume of unique messages could lead to performance degradation and block the event loop. Consider cleaning up only periodically or limiting the maximum size of the deduplication cache.
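
One possible shape for the capped, cheaper-to-clean cache suggested here is sketched below. `TtlDedupCache`, its parameters, and the default cap of 1024 are hypothetical and not taken from the PR. The sketch relies on two assumptions: every entry gets the same TTL, and a duplicate hit does not refresh it, so insertion order equals expiry order and only the oldest end ever needs inspecting.

```python
import time
from collections import OrderedDict


class TtlDedupCache:
    """Hypothetical cache: fixed TTL, bounded size, eviction from the oldest end."""

    def __init__(self, ttl_seconds: float = 15.0, max_entries: int = 1024,
                 clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max_entries = max_entries
        self._clock = clock
        # Insertion order equals expiry order because every entry gets the same TTL.
        self._expires_at: OrderedDict[tuple[str, str], float] = OrderedDict()

    def seen_recently(self, session_id: str, text: str) -> bool:
        now = self._clock()
        # Pop expired entries from the front; stop at the first live one.
        while self._expires_at:
            oldest_key = next(iter(self._expires_at))
            if self._expires_at[oldest_key] > now:
                break
            self._expires_at.popitem(last=False)
        key = (session_id, text)
        if key in self._expires_at:
            return True
        self._expires_at[key] = now + self._ttl
        # Enforce the cap by discarding the oldest entries.
        while len(self._expires_at) > self._max_entries:
            self._expires_at.popitem(last=False)
        return False
```

Each call removes only entries that have actually expired, so cleanup cost is amortized over inserts instead of scanning the whole dict per message. The cap trades correctness for bounded memory: under a burst larger than `max_entries`, an early message can be forgotten before its TTL ends and a repeat of it would be processed again.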

for key in expired_keys:
    self._wechat_kf_seen_text_messages.pop(key, None)

dedup_key = f"{session_id}:{normalized_text}"

medium

Using string concatenation with a colon as a key can lead to collisions if the session_id (which is the external_userid) contains a colon. Using a tuple (session_id, normalized_text) is a safer and more idiomatic way to create a composite key in Python.

Suggested change
- dedup_key = f"{session_id}:{normalized_text}"
+ dedup_key = (session_id, normalized_text)
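
The collision can be reproduced in a few lines. The session IDs and texts below are made up for illustration; whether real `external_userid` values can contain a colon is not established in this thread.

```python
def string_key(session_id: str, text: str) -> str:
    # Same shape as the f-string key quoted in the review.
    return f"{session_id}:{text}"


def tuple_key(session_id: str, text: str) -> tuple[str, str]:
    # Composite key as proposed in the suggested change.
    return (session_id, text)


# Two different (session, text) pairs; values are illustrative only.
first = ("user:1", "hi")
second = ("user", "1:hi")

assert string_key(*first) == string_key(*second)  # both are "user:1:hi"
assert tuple_key(*first) != tuple_key(*second)
```

The string form loses the boundary between the two fields, while the tuple keeps them as separate components for hashing and comparison.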
