fix: keep DingTalk stream reconnecting#8610
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces an exponential backoff reconnection strategy for the DingTalk adapter, replacing the previous fixed retry limit, and adds corresponding unit tests. The review feedback highlights a potential issue where using time.sleep(delay) can block the graceful shutdown of the application for up to 5 minutes, and re-creating self._shutdown_event in each loop iteration may cause race conditions. A solution using a thread-safe threading.Event to wait during the delay and monkey-patching self.terminate is suggested to allow immediate exit upon shutdown.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| def handle_retry(error_msg: str, run_seconds: float) -> None: | ||
| nonlocal retry_count | ||
| logger.error(error_msg) | ||
| if run_seconds >= DINGTALK_RECONNECT_STABLE_SECONDS: | ||
| retry_count = 0 | ||
| retry_count += 1 | ||
| if retry_count < MAX_RETRIES: | ||
| logger.info(f"钉钉适配器尝试重连 ({retry_count}/{MAX_RETRIES})...") | ||
| time.sleep(RETRY_INTERVAL) | ||
| return True | ||
| logger.error("钉钉适配器重连失败,已达最大重试次数") | ||
| return False | ||
|
|
||
| while retry_count < MAX_RETRIES: | ||
| delay = _dingtalk_reconnect_delay(retry_count) | ||
| logger.info( | ||
| f"钉钉适配器将在 {delay} 秒后重连 (第 {retry_count} 次)...", | ||
| ) | ||
| time.sleep(delay) | ||
|
|
||
| while True: | ||
| task = None |
There was a problem hiding this comment.
Using time.sleep(delay) inside the background thread can block the graceful shutdown of the application for up to 300 seconds (5 minutes) if a reconnect delay is active when terminate() is called. Additionally, re-creating self._shutdown_event = threading.Event() in every iteration of the while True loop can lead to race conditions where terminate() sets an old event, causing the new event to wait indefinitely.
We can resolve both issues elegantly by monkey-patching self.terminate to set a thread-safe terminated event, and using terminated.wait(delay) instead of time.sleep(delay). This allows the background thread to wake up and exit immediately upon shutdown.
def handle_retry(error_msg: str, run_seconds: float) -> None:
nonlocal retry_count
logger.error(error_msg)
if run_seconds >= DINGTALK_RECONNECT_STABLE_SECONDS:
retry_count = 0
retry_count += 1
delay = _dingtalk_reconnect_delay(retry_count)
logger.info(
f"钉钉适配器将在 {delay} 秒后重连 (第 {retry_count} 次)...",
)
terminated.wait(delay)
terminated = threading.Event()
original_terminate = self.terminate
async def new_terminate():
terminated.set()
await original_terminate()
self.terminate = new_terminate
while not terminated.is_set():
task = NoneThere was a problem hiding this comment.
Hey - I've left some high level feedback:
- The
handle_retrydocstring still describes a boolean return contract, but the function now returnsNone; updating or removing the docstring will avoid confusion for future readers. - Because the backoff can now reach 300 seconds, consider replacing
time.sleep(delay)with a wait on the shutdown event (e.g.,self._shutdown_event.wait(delay)) so that reconnect sleeps can be interrupted promptly during graceful shutdown.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `handle_retry` docstring still describes a boolean return contract, but the function now returns `None`; updating or removing the docstring will avoid confusion for future readers.
- Because the backoff can now reach 300 seconds, consider replacing `time.sleep(delay)` with a wait on the shutdown event (e.g., `self._shutdown_event.wait(delay)`) so that reconnect sleeps can be interrupted promptly during graceful shutdown.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Summary
Closes #8504
Tests
Summary by Sourcery
Keep the DingTalk stream client reconnecting reliably with exponential backoff and stable-run reset logic after unexpected SDK task exits.
Bug Fixes:
Enhancements:
Tests: