Fix Telegram polling recovery after repeated network errors #7465
Shujakuinkuraudo wants to merge 1 commit into AstrBotDevs:master from
Conversation
Hey - I've found 2 issues, and left some high level feedback:
- The polling recovery behavior uses hardcoded `_polling_recovery_threshold` and `_polling_failure_window` values; consider wiring these to the existing config so they can be tuned without code changes.
- `_recreate_application` does not check `_terminating` before rebuilding the application, so a terminate request that sets `_polling_recovery_requested` could race with the polling loop and result in a fresh client being created during shutdown; consider short-circuiting the rebuild when `_terminating` is true.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The polling recovery behavior uses hardcoded `_polling_recovery_threshold` and `_polling_failure_window` values; consider wiring these to the existing config so they can be tuned without code changes.
- `_recreate_application` does not check `_terminating` before rebuilding the application, so a terminate request that sets `_polling_recovery_requested` could race with the polling loop and result in a fresh client being created during shutdown; consider short-circuiting the rebuild when `_terminating` is true.
## Individual Comments
### Comment 1
<location path="astrbot/core/platform/sources/telegram/tg_adapter.py" line_range="84-90" />
<code_context>
-
self.scheduler = AsyncIOScheduler()
self._terminating = False
+ self._loop: asyncio.AbstractEventLoop | None = None
+ self._polling_recovery_requested = asyncio.Event()
+ self._consecutive_polling_failures = 0
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Guard against rebuilding the application while termination is in progress
`_recreate_application` doesn’t check `_terminating`. If `_polling_recovery_requested` is set just before or during `terminate()`, the main loop can still call `_recreate_application()`, recreating the app and restarting polling while shutdown is in progress. An early `if self._terminating: return` in `_recreate_application` would avoid this race and unnecessary work.
Suggested implementation:
```python
def _recreate_application(self) -> None:
# Avoid recreating the application while termination is in progress.
# This guards against a race where `_polling_recovery_requested` is set
# just before or during `terminate()`, which could otherwise restart
# polling while shutdown is underway.
if self._terminating:
return
self._build_application()
```
If the actual signature or body of `_recreate_application` differs (for example, it is `async def _recreate_application(...)` or has additional logic before `self._build_application()`), adjust the `SEARCH` section to match the existing function header and first body line, and insert the `if self._terminating: return` guard at the very beginning of the function body.
</issue_to_address>
### Comment 2
<location path="tests/test_telegram_adapter.py" line_range="229-236" />
<code_context>
+
+
+@pytest.mark.asyncio
+async def test_telegram_polling_error_requests_rebuild_after_threshold():
+ TelegramPlatformAdapter = _load_telegram_adapter()
+ adapter = TelegramPlatformAdapter(
+ make_platform_config("telegram"),
+ {},
+ asyncio.Queue(),
+ )
+ adapter._loop = asyncio.get_running_loop()
+
+ assert not adapter._polling_recovery_requested.is_set()
+
+ for _ in range(adapter._polling_recovery_threshold):
+ adapter._on_polling_error(Exception("proxy disconnected"))
+
+ await asyncio.sleep(0)
</code_context>
<issue_to_address>
**suggestion (testing):** Use the mocked `NetworkError` type instead of bare `Exception` to keep the test aligned with the implementation.
The recovery logic only handles `telegram.error.NetworkError`, but this test currently passes a plain `Exception` and relies on mocks redefining `NetworkError = Exception`. To avoid this brittle coupling, please construct the error using the mocked `NetworkError` type (e.g. via `module_globals["NetworkError"]` or `create_mock_telegram_modules()["telegram"].error.NetworkError`) so the test clearly depends on `NetworkError` specifically.
```suggestion
adapter._loop = asyncio.get_running_loop()
module_globals = TelegramPlatformAdapter.__init__.__globals__
NetworkError = module_globals["NetworkError"]
assert not adapter._polling_recovery_requested.is_set()
for _ in range(adapter._polling_recovery_threshold):
adapter._on_polling_error(NetworkError("proxy disconnected"))
await asyncio.sleep(0)
```
</issue_to_address>
Code Review
This pull request implements a polling recovery mechanism for the Telegram adapter to handle repeated network errors by rebuilding the application instance. It introduces logic to track consecutive failures within a time window and refactors the application lifecycle management into dedicated methods. Feedback suggests making the recovery threshold and failure window configurable to allow for environment-specific tuning. Additionally, the shutdown sequence should be adjusted to ensure that final API calls, such as deleting commands, are performed before the application's HTTP session is terminated.
```python
self._polling_recovery_threshold = 3
self._polling_failure_window = 60.0
```
Consider making the polling recovery threshold and failure window configurable via the platform configuration, similar to `telegram_polling_restart_delay`. This allows users to tune the recovery behavior based on their specific network environment and stability requirements.
Suggested change:
```python
self._polling_recovery_threshold = self.config.get("telegram_polling_recovery_threshold", 3)
self._polling_failure_window = float(self.config.get("telegram_polling_failure_window", 60.0))
```
```python
async def _shutdown_application(
    self,
    *,
    delete_commands: bool,
) -> None:
    updater = self.application.updater
    if updater is not None:
        with suppress(Exception):
            await updater.stop()

    with suppress(Exception):
        await self.application.stop()

    if delete_commands and self.enable_command_register:
        with suppress(Exception):
            await self.client.delete_my_commands()

    shutdown = getattr(self.application, "shutdown", None)
    if shutdown is not None:
        with suppress(Exception):
            await shutdown()
```
The current shutdown sequence calls `self.application.stop()` before `self.client.delete_my_commands()`. In python-telegram-bot v20+, `Application.stop()` shuts down the bot's internal HTTP session. This will cause subsequent API calls like `delete_my_commands()` to fail because the underlying connection is closed. It is recommended to perform any final API calls after stopping the updater but before stopping the application entirely.
```python
async def _shutdown_application(
    self,
    *,
    delete_commands: bool,
) -> None:
    updater = self.application.updater
    if updater is not None:
        with suppress(Exception):
            await updater.stop()

    if delete_commands and self.enable_command_register:
        with suppress(Exception):
            await self.client.delete_my_commands()

    with suppress(Exception):
        await self.application.stop()

    shutdown = getattr(self.application, "shutdown", None)
    if shutdown is not None:
        with suppress(Exception):
            await shutdown()
```
Addressed the bot review feedback in
Re-ran:
Updated the PR with the additional recovery fix validated in deployment. This addresses a follow-up failure mode where rebuilding the Telegram application could itself time out during
Re-ran:
b21f377 to 583ce5c
Closing this PR and reopening it as a fresh PR to keep the history clean after squashing the work into a single commit.
Summary

NetworkErrors

Testing

`.venv/bin/python -m ruff check astrbot/core/platform/sources/telegram/tg_adapter.py tests/test_telegram_adapter.py tests/fixtures/mocks/telegram.py`
`.venv/bin/python -m pytest -q tests/test_telegram_adapter.py`

Summary by Sourcery
Improve Telegram adapter resilience by rebuilding the application and HTTP client after repeated polling network errors while preserving command scheduling.
Bug Fixes:
Enhancements:
Tests:
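The recovery flow discussed in this review, where the polling loop waits for a rebuild request but skips it once termination has begun, can be sketched as a minimal standalone model. This is an illustrative sketch, not the adapter's real code: `MiniAdapter`, `run_once`, `rebuild_count`, and `demo` are hypothetical names, and the rebuild of a real python-telegram-bot `Application` is stubbed out as a counter.

```python
import asyncio


class MiniAdapter:
    """Models the polling loop: wait for a recovery request, then rebuild,
    unless termination has already begun."""

    def __init__(self) -> None:
        self._terminating = False
        self._polling_recovery_requested = asyncio.Event()
        self.rebuild_count = 0

    def _recreate_application(self) -> None:
        # Guard suggested in the review: never rebuild mid-shutdown.
        if self._terminating:
            return
        self.rebuild_count += 1  # stands in for building a fresh Application

    async def run_once(self) -> None:
        await self._polling_recovery_requested.wait()
        self._polling_recovery_requested.clear()
        self._recreate_application()


async def demo() -> tuple[int, int]:
    adapter = MiniAdapter()
    # A recovery request while running triggers a rebuild.
    adapter._polling_recovery_requested.set()
    await adapter.run_once()
    rebuilt = adapter.rebuild_count
    # The same request during termination is ignored.
    adapter._terminating = True
    adapter._polling_recovery_requested.set()
    await adapter.run_once()
    return rebuilt, adapter.rebuild_count
```

Running `asyncio.run(demo())` shows the rebuild happens exactly once: the second request is short-circuited by the `_terminating` guard.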