Summary
After a Hub restart (SIGTERM, crash, or manual restart), stale lark-cli event +subscribe processes survive the Hub shutdown and continue holding the feishu event WebSocket. When the new Hub instance starts, it detects the stale subscriber and silently skips loading the feishu channel. The user is left with no feishu connectivity until they manually kill the stale process and restart Hub again.
This has happened 5+ times across multiple days (May 6–10).
Reproduction
- Hub is running with feishu channel loaded and lark-cli event +subscribe active
- Hub receives SIGTERM (or is killed)
- Hub's feishu.stop() is called, but the lark-cli event child process survives (not properly killed)
- New Hub instance starts
- Startup check finds the stale lark-cli event process still running → feishu channel skipped
Expected behavior
- Hub shutdown should clean up all child lark-cli processes (especially the event subscriber)
- Channel watchdog should detect and recover from this scenario (e.g., wait for stale process to exit and retry loading)
- OR: Hub should notify the user via an active channel (e.g., wechat) that feishu was skipped
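One way the shutdown cleanup described above could work is sketched below. This is not forge-hub's actual code: `spawnSubscriber` and `stopSubscriber` are hypothetical names, and the sketch assumes the subscriber is started through `node:child_process` on a POSIX system (as on the macOS setup reported here). The idea is to start the subscriber in its own process group, then signal the whole group on stop, so that any process the CLI launcher itself spawned is terminated too.

```typescript
import { spawn, type ChildProcess } from "node:child_process";

// Hypothetical helper: start the subscriber as the leader of a new process
// group (detached: true), so its PID doubles as the group ID.
function spawnSubscriber(cmd: string, args: string[]): ChildProcess {
  return spawn(cmd, args, { detached: true, stdio: "ignore" });
}

// Hypothetical helper: send SIGTERM to the whole group, escalate to SIGKILL
// after a grace period, and resolve once the direct child has exited.
async function stopSubscriber(child: ChildProcess, graceMs = 3000): Promise<void> {
  const pid = child.pid;
  if (pid === undefined || child.exitCode !== null || child.signalCode !== null) {
    return; // never started, or already gone
  }
  const exited = new Promise<void>((resolve) => child.once("exit", () => resolve()));
  const signalGroup = (signal: NodeJS.Signals): void => {
    try {
      process.kill(-pid, signal); // negative PID targets the process group
    } catch {
      // The group no longer exists; nothing left to signal.
    }
  };
  signalGroup("SIGTERM");
  const timer = setTimeout(() => signalGroup("SIGKILL"), graceMs);
  await exited;
  clearTimeout(timer);
}
```

Under these assumptions, feishu.stop() would await `stopSubscriber(child)` before Hub exits, for example with a child created by `spawnSubscriber("lark-cli", ["event", "+subscribe"])`. This does not cover a hard crash of Hub, where no stop handler runs; that case still needs the startup-side handling.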
Actual behavior
Feishu channel silently skipped. User only discovers it when they notice no feishu messages are arriving. The log message is buried in stderr with no user-facing notification.
Relevant logs
[2026-05-07 18:54:05] ERROR: 已有 lark-cli event +subscribe 进程在跑 (pid: 3106, 3107)。本次飞书通道启动跳过。
[2026-05-08 11:46:32] ERROR: 已有 lark-cli event +subscribe 进程在跑 (pid: 89457)。本次飞书通道启动跳过。
[2026-05-09 08:59:36] ERROR: 已有 lark-cli event +subscribe 进程在跑 (pid: 4269)。本次飞书通道启动跳过。
[2026-05-10 11:11:05] ERROR: 已有 lark-cli event +subscribe 进程在跑 (pid: 8293)。本次飞书通道启动跳过。
(Translation of the message: "A lark-cli event +subscribe process is already running (pid: ...). Skipping Feishu channel startup this time.")
Each time, the fix is: pkill -f 'lark-cli event' + restart Hub.
Environment
- macOS 14.6.1
- Bun 1.3.13
- forge-hub v0.2.0
- lark-cli: installed via npm, auth valid
forge-hub doctor
✓ Bun installed
✓ Hub server runtime
✓ Hub client runtime
✓ LaunchAgent plist
✓ MCP registered
✓ ffmpeg available
✓ lark-cli available
✓ approval_channels configured: [feishu, wechat]
✓ Hub server running (v0.2.0)
Suggested fix
- In feishu.ts stop(): ensure lark-cli event child process is killed (not just the parent)
- On startup: if stale subscriber detected, add a retry with backoff (wait for it to exit) instead of skipping permanently
- Channel watchdog: when channel is skipped (not degraded), notify user via an active channel
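The startup retry suggested above could look roughly like the following. It is a minimal sketch, not forge-hub's implementation: `waitForStaleSubscriberExit`, `StaleCheck`, and `RetryOptions` are hypothetical names, and the stale-process check is passed in as a function because the issue does not show how Hub currently performs it.

```typescript
// Hypothetical: returns the PIDs of any lark-cli event subscribers still running.
type StaleCheck = () => Promise<number[]>;

interface RetryOptions {
  maxAttempts: number;    // total number of checks before giving up
  initialDelayMs: number; // wait after the first failed check
  maxDelayMs: number;     // upper bound for the doubled delay
}

// Polls until no stale subscriber remains, doubling the wait between checks.
// Resolves true when the channel can be loaded, false when attempts run out.
async function waitForStaleSubscriberExit(
  findStalePids: StaleCheck,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
  let delay = opts.initialDelayMs;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    const pids = await findStalePids();
    if (pids.length === 0) return true; // nothing stale: safe to load the channel
    if (attempt === opts.maxAttempts) break; // no point sleeping after the last check
    await sleep(delay);
    delay = Math.min(delay * 2, opts.maxDelayMs);
  }
  return false; // still stale: the caller should notify the user, not skip silently
}
```

A `false` result is the point where the third suggestion applies: instead of only logging to stderr, Hub could send a message through a channel that did load (e.g., wechat) saying that feishu was skipped and which PIDs are blocking it.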