fix(openclaw): disable channels at scaffold — v2026.4.25 sidecar hang #413
Merged
OpenClaw v2026.4.25's startGatewaySidecars() hangs forever when channel plugins (telegram/discord/slack) are loaded with enabled=true but no configured accounts. We were shipping all three enabled at scaffold for a hot-reload path that no longer applies post-flat-fee.

Symptom: container logs `[gateway] starting channels and sidecars...` but never completes. The HTTP request handler awaits getReadiness(), which gates on onSidecarsReady(). That callback never fires, so every WS upgrade and HTTP request hangs until the client times out.

Each channel is re-enabled per-provider via channel_link_service when the user actually pairs one. Hot-reload works fine when starting from disabled. The original "ship them all enabled to avoid full-gateway-restart cost" reasoning was a tier-upgrade optimization that's dead post-flat-fee anyway.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
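The readiness gating described in the commit message above can be illustrated with a small asyncio model. This is only a sketch of the failure mode, not OpenClaw's code: the `Gateway` class, `boot()`, `handle_request()`, and the two sidecar stand-ins are hypothetical names invented for illustration. Only the shape (a ready flag set after sidecar startup returns, with every request waiting on it) comes from the description.

```python
import asyncio


class Gateway:
    """Toy model of the readiness gate (hypothetical, not OpenClaw's code)."""

    def __init__(self, start_sidecars):
        # start_sidecars is a stand-in for startGatewaySidecars().
        self._start_sidecars = start_sidecars
        self._sidecars_ready = asyncio.Event()

    async def boot(self):
        await self._start_sidecars()
        # Stand-in for the onSidecarsReady() callback: runs only if startup returns.
        self._sidecars_ready.set()

    async def handle_request(self, client_timeout):
        # Every request waits on readiness; the client gives up after client_timeout.
        try:
            await asyncio.wait_for(self._sidecars_ready.wait(), timeout=client_timeout)
        except asyncio.TimeoutError:
            return "timed out"
        return "ok"


async def healthy_sidecars():
    await asyncio.sleep(0)  # startup completes promptly


async def hung_sidecars():
    await asyncio.Event().wait()  # this event is never set, so startup never returns


async def demo(start_sidecars):
    gateway = Gateway(start_sidecars)
    boot_task = asyncio.create_task(gateway.boot())
    result = await gateway.handle_request(client_timeout=0.05)
    boot_task.cancel()
    await asyncio.gather(boot_task, return_exceptions=True)  # swallow the cancellation
    return result


if __name__ == "__main__":
    print(asyncio.run(demo(healthy_sidecars)))
    print(asyncio.run(demo(hung_sidecars)))
```

In this model nothing is wrong with the request handler itself. The only way to unblock it is for sidecar startup to return, which is why the fix targets what gets loaded at startup rather than the handler.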
prez2307 added a commit that referenced this pull request Apr 29, 2026
v2026.4.25-slim wedges every container start: gateway main thread enters
uninterruptible NFS RPC wait (rpc_wait_bit_killable) on
~/.openclaw/tasks/runs.sqlite via OpenClaw's loopback-NFS layer
(127.0.0.1:21005). Matches upstream issue #73517 ("Gateway task registry
maintenance can hot-loop on stale runs.sqlite"), reproduced against
2026.4.25 (aa36ee6). v2026.4.26 partially fixes the WAL growth side
(#72774) but introduces an unfixed acpx EPERM regression on remote FS
(#73333), so we can't move forward — only back.
2026.4.22 fat predates #73517, has CODEX_HOME (added 4.7) so ChatGPT
OAuth still works, and bundles all plugin runtime deps in-image so
first boot doesn't pay the 90s slim install penalty. We previously
ran on 4.22 fat in PR #406 without this hang.
Schema-compliance changes (zod-schema.agent-defaults.ts at v2026.4.22
requires these three fields, no .optional()):
- agents.defaults.embeddedHarness: {} (line 42)
- agents.defaults.contextLimits: {} (line 115)
- agents.defaults.heartbeat: {} (line 251)
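
A minimal sketch of how a scaffold step might backfill the three required fields listed above. The helper name `with_required_agent_defaults` and the idea of patching a plain dict are assumptions made for illustration. Only the three key paths and the empty-object values come from the commit message.

```python
from copy import deepcopy

# Keys under agents.defaults that the v2026.4.22 schema requires, per the list above.
REQUIRED_AGENT_DEFAULTS = ("embeddedHarness", "contextLimits", "heartbeat")


def with_required_agent_defaults(config: dict) -> dict:
    """Return a copy of config with each required key present (hypothetical helper)."""
    patched = deepcopy(config)
    defaults = patched.setdefault("agents", {}).setdefault("defaults", {})
    for key in REQUIRED_AGENT_DEFAULTS:
        # Keep any existing value; fill a missing key with an empty object.
        defaults.setdefault(key, {})
    return patched


if __name__ == "__main__":
    print(with_required_agent_defaults({"agents": {"defaults": {"heartbeat": {"every": "5m"}}}}))
```

Using `setdefault` keeps any value already present, so the helper is safe to run on a config that was customized earlier. The `every` key in the usage line is a made-up placeholder, not a documented heartbeat option.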
Also reverts the channel-disable defensive patch from #413: the no-account
enabled:true channel-plugin behavior was a v4.25 sidecar bug, not a 4.22
issue. Channels are back to enabled:true so first-pair stays a fast
hot-reload instead of a 6-min full gateway restart on Fargate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
prez2307 added a commit that referenced this pull request Apr 29, 2026
…ang (#415)
Summary
OpenClaw v2026.4.25's startGatewaySidecars() hangs forever when channel plugins (telegram/discord/slack) are loaded with enabled: true but no configured accounts.

Symptom
Container logs `[gateway] starting channels and sidecars...` but never completes (no "sidecars ready" line for the entire 30+ minute container lifetime). The HTTP request handler awaits getReadiness(), which gates on onSidecarsReady(); that callback fires only after startGatewaySidecars() resolves. So every WebSocket upgrade and HTTP request hangs.

User-visible: backend logs `RPC health failed: timed out during opening handshake`, frontend shows "Starting your container" forever.

Root cause walk
getReadiness() returns {ready: false} until startupSidecarsReady = true. That flag is set in the onSidecarsReady callback, which fires only after startGatewaySidecars() returns. With no-account channels enabled in 4.25, that promise never resolves.

We were shipping channels enabled: true for a hot-reload path tied to the old tier-upgrade flow. Post-flat-fee that doesn't apply: the user pairs a channel via channel_link_service, which patches the config to flip the specific provider on. Starting from disabled and toggling per-provider is the correct flow.

Test plan
- pytest tests/unit/containers/test_config_provider_routing.py: 8 pass
- … running in DDB

Note
If after this the sidecar still hangs, the next suspects are phone-control or talk-voice (new auto-loaded plugins in 4.25). Both can be disabled via plugins.deny in a follow-up. Did not include those here so we can isolate which plugin was actually responsible.

🤖 Generated with Claude Code
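
The per-provider flow from the root cause walk (scaffold with every channel disabled, then flip one provider on when the user pairs it) might look roughly like the sketch below. The config shape (`channels.<provider>.enabled` and `accounts`) and the function names `scaffold_channels` and `link_channel` are assumptions made for illustration. They are not taken from OpenClaw's schema or from channel_link_service.

```python
from copy import deepcopy

CHANNEL_PROVIDERS = ("telegram", "discord", "slack")


def scaffold_channels() -> dict:
    """Initial config: every channel plugin present but disabled (assumed shape)."""
    return {"channels": {name: {"enabled": False} for name in CHANNEL_PROVIDERS}}


def link_channel(config: dict, provider: str, account_id: str, account: dict) -> dict:
    """Return a patched copy with one provider enabled and its account recorded."""
    if provider not in CHANNEL_PROVIDERS:
        raise ValueError(f"unknown channel provider: {provider}")
    patched = deepcopy(config)
    entry = patched["channels"][provider]
    entry.setdefault("accounts", {})[account_id] = account
    # Enable only after an account exists, so no channel is enabled with zero accounts.
    entry["enabled"] = True
    return patched


if __name__ == "__main__":
    config = link_channel(scaffold_channels(), "telegram", "default", {"token": "<redacted>"})
    print(config["channels"])
```

The property this sketch aims for is that a provider is never enabled without at least one account, which is the combination the summary identifies as triggering the hang on 4.25.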