Stop SHUTDOWN_MODE middleware from killing the polling loop #23

Merged
AlekShu merged 1 commit into master from goose-cf08
May 26, 2026

@AlekShu AlekShu commented May 26, 2026

Problem

Production was silent: /help never got the farewell reply even with SHUTDOWN_MODE=true and the worker HEALTHY at 3% CPU.

Root cause

node_modules/telegraf/telegraf.js fetchUpdates():

.then((updates) => ... ? this.handleUpdates(updates) ... : [])
.catch((err) => {
  console.error('Failed to process updates.', err)
  this.polling.started = false   // <- permanent kill
  this.polling.offset = 0
  this.polling.stopCallback && ...
})

Any unhandled throw from a middleware flips polling.started = false and the next fetchUpdates() returns on the first line. The bot never calls getUpdates again for the rest of the process lifetime.
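
The latch described above can be illustrated with a small model. This is a simplified, synchronous sketch and not Telegraf's actual code: `PollingModel`, `poll`, and `fetchCount` are hypothetical names, and the real loop is promise-based. It only shows why one escaping throw is enough to stop every later poll.

```typescript
// Hypothetical, simplified model of a long-polling loop (not Telegraf's code).
type Update = { update_id: number; chat?: { id: number } };
type Handler = (update: Update) => void;

class PollingModel {
  started = true;   // stands in for polling.started
  fetchCount = 0;   // how many poll cycles actually ran
  handler: Handler;

  constructor(handler: Handler) {
    this.handler = handler;
  }

  // One poll cycle. Returns immediately once the loop has been stopped.
  poll(batch: Update[]): void {
    if (!this.started) return;
    this.fetchCount += 1;
    try {
      batch.forEach((update) => this.handler(update));
    } catch {
      // Mirrors the behaviour described above: the flag is cleared
      // and nothing ever sets it back to true.
      this.started = false;
    }
  }
}

// A handler that throws on chatless updates stops the loop for good.
const unguarded = new PollingModel((update) => {
  if (!update.chat) throw new Error("reply is not available without a chat");
});
unguarded.poll([{ update_id: 1 }]);                   // chatless update: throws
unguarded.poll([{ update_id: 2, chat: { id: 42 } }]); // never processed

// A handler that skips chatless updates keeps the loop alive.
const guarded = new PollingModel((update) => {
  if (!update.chat) return;
});
guarded.poll([{ update_id: 1 }]);
guarded.poll([{ update_id: 2, chat: { id: 42 } }]);
```

In this model `unguarded.fetchCount` stays at 1 while `guarded.fetchCount` reaches 2, which matches the production symptom: a healthy process that simply stops polling.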

shutdownMode was throwing because ctx.reply() asserts ctx.chat and throws on updates that don't have one (my_chat_member, chat_join_request, …). The deploy log shows the kill happening in the very first poll batch after boot:

Bot GooseInvestAlertBot is up and running
SHUTDOWN_MODE is on; skipping cron jobs and monitoring loops.
Error: Telegraf: "reply" isn't available for "undefined::"
    at shutdownMode (/app/dist/middlewares/shutdownMode.js:26:15)
Failed to process updates.

A single chatless update in the startup batch took every bot in the process off the air. After the farewell went out, blocked-by-user updates kept arriving precisely because users were blocking the bot.

Fix

Two narrow guards in src/middlewares/shutdownMode.ts:

  • if (!ctx.chat) return — chatless updates are no-ops, not throws
  • try/catch around ctx.reply — per-send failures (blocked, rate-limit) are logged and swallowed

The polling loop now survives both.
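
A minimal sketch of what the two guards might look like, assuming a cut-down context type. `MinimalCtx`, `FAREWELL`, `shutdownModeSketch`, and the injectable `log` parameter are illustrative names and are not taken from the actual src/middlewares/shutdownMode.ts; whether the real middleware calls `next()` is also not shown here.

```typescript
// Minimal stand-in for the parts of the bot context used here (assumption).
interface MinimalCtx {
  chat?: { id: number };
  reply(text: string): Promise<unknown>;
}

// Hypothetical farewell text; the real message is not shown in this PR.
const FAREWELL = "This bot has been shut down.";

async function shutdownModeSketch(
  ctx: MinimalCtx,
  log: (message: string) => void = console.error,
): Promise<void> {
  // Guard 1: updates without a chat are no-ops, so reply is never called.
  if (!ctx.chat) return;

  // Guard 2: a failed send (blocked by user, rate limit) is logged and
  // swallowed so the rejection never reaches the polling loop.
  try {
    await ctx.reply(FAREWELL);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    log(`shutdownMode: farewell reply failed: ${reason}`);
  }
}
```

Both paths resolve normally, so nothing propagates up to the `.catch` branch that disables polling.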

Test plan

  • npx jest — 71 tests passing, two new cases pinning the failure modes:
    • skips chatless updates instead of throwing on ctx.reply
    • swallows reply failures (e.g. user blocked the bot)
  • npm run lint
  • npm run build
  • After deploy: production /help replies with the farewell within seconds

In production we observed that /help (and every other message) silently
stopped getting any reply once SHUTDOWN_MODE was on. Root cause is in
node_modules/telegraf/telegraf.js fetchUpdates: when handleUpdates
rejects, the .catch branch flips polling.started = false and never
recovers. Any throw escaping a middleware kills long-polling for the
rest of the process lifetime.

shutdownMode was the culprit. ctx.reply asserts ctx.chat and throws on
updates that have no chat (my_chat_member, chat_join_request, …) —
exactly the kind of updates that come in en masse after the farewell
goes out and users start blocking the bot. One such update in the
startup batch was enough to take all four bots off the air.

Two narrow guards:
  * skip the update entirely when ctx.chat is missing
  * try/catch around ctx.reply so per-send failures (blocked-by-user,
    rate-limit, etc.) do not escape either
Both leave the polling loop intact so /help keeps reaching users.
@AlekShu AlekShu merged commit 59b0b6d into master May 26, 2026
3 checks passed