feishu channel blocked by stale lark-cli event processes on Hub restart #26

@AmberCXX

Summary

After a Hub restart (SIGTERM, crash, or manual restart), stale lark-cli event +subscribe processes survive the shutdown and keep holding the feishu event WebSocket. When the new Hub instance starts, it detects the stale subscriber and skips loading the feishu channel without notifying the user, leaving them with no feishu connectivity until they manually kill the stale process and restart Hub again.

This has happened 5+ times across multiple days (May 6–10).

Reproduction

  1. Hub is running with feishu channel loaded and lark-cli event +subscribe active
  2. Hub receives SIGTERM (or is killed)
  3. Hub's feishu.stop() is called, but the lark-cli event child process survives (not properly killed)
  4. New Hub instance starts
  5. Startup check finds the stale lark-cli event process still running → feishu channel skipped

Expected behavior

  • Hub shutdown should clean up all child lark-cli processes (especially the event subscriber)
  • Channel watchdog should detect and recover from this scenario (e.g., wait for stale process to exit and retry loading)
  • OR: Hub should notify the user via an active channel (e.g., wechat) that feishu was skipped
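
One way the shutdown cleanup in the first bullet could work is to start the subscriber in its own process group and signal the whole group on stop. The following is a minimal sketch, not forge-hub's actual code: spawnInOwnGroup, stopGroup, and groupAlive are hypothetical names, and it assumes a POSIX system and a runtime whose node:child_process honors the detached option.

```typescript
import { spawn, type ChildProcess } from "node:child_process";

type StopSignal = "SIGTERM" | "SIGKILL";

// Start the command as the leader of a new process group (detached: true),
// so that anything it forks shares a group id equal to child.pid.
export function spawnInOwnGroup(command: string, args: string[]): ChildProcess {
  return spawn(command, args, { detached: true, stdio: "ignore" });
}

// Signal every process in the group; a negative pid targets the group on POSIX.
function signalGroup(pgid: number, signal: StopSignal): void {
  try {
    process.kill(-pgid, signal);
  } catch (err) {
    // ESRCH: the group is already gone, which is the desired end state.
    // EPERM is tolerated too, on the assumption (worth verifying) that some
    // platforms report it when only exited-but-unreaped processes remain.
    const code = (err as { code?: string }).code;
    if (code !== "ESRCH" && code !== "EPERM") throw err;
  }
}

// Signal 0 delivers nothing; it only checks whether the group still has members.
export function groupAlive(pgid: number): boolean {
  try {
    process.kill(-pgid, 0);
    return true;
  } catch {
    return false;
  }
}

// Ask the group to exit, poll for up to graceMs, then force-kill what is left.
export async function stopGroup(child: ChildProcess, graceMs = 3000, pollMs = 100): Promise<void> {
  const pgid = child.pid;
  if (pgid === undefined) return; // spawn failed, nothing to clean up
  signalGroup(pgid, "SIGTERM");
  const deadline = Date.now() + graceMs;
  while (groupAlive(pgid) && Date.now() < deadline) {
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
  if (groupAlive(pgid)) signalGroup(pgid, "SIGKILL");
}
```

For this to help, the Hub's SIGTERM handler would need to await stopGroup before exiting. It does not cover a Hub that is killed with SIGKILL or crashes, so a startup-side recovery path is still needed.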

Actual behavior

The feishu channel is skipped without any user-facing notification. The user only discovers it when they notice that no feishu messages are arriving; the only trace is an ERROR line buried in stderr.

Relevant logs

[2026-05-07 18:54:05] ERROR: 已有 lark-cli event +subscribe 进程在跑 (pid: 3106, 3107)。本次飞书通道启动跳过。
[2026-05-08 11:46:32] ERROR: 已有 lark-cli event +subscribe 进程在跑 (pid: 89457)。本次飞书通道启动跳过。
[2026-05-09 08:59:36] ERROR: 已有 lark-cli event +subscribe 进程在跑 (pid: 4269)。本次飞书通道启动跳过。
[2026-05-10 11:11:05] ERROR: 已有 lark-cli event +subscribe 进程在跑 (pid: 8293)。本次飞书通道启动跳过。

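Translation of the log message: "A lark-cli event +subscribe process is already running (pid: ...). Skipping feishu channel startup for this run."

Each time, the workaround is to run pkill -f 'lark-cli event' and then restart Hub.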
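
The manual check behind this workaround could also be scripted. This is a rough sketch, assuming pgrep is available on the host and that matching on the command line string 'lark-cli event' is specific enough. parsePidList and findSubscriberPids are hypothetical names, not part of forge-hub.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: turn pgrep's one-pid-per-line output into numbers,
// ignoring blank or malformed lines.
export function parsePidList(stdout: string): number[] {
  return stdout
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /^\d+$/.test(line))
    .map(Number);
}

// Hypothetical helper: list processes whose full command line matches the pattern.
// pgrep exits with status 1 when nothing matches, which is not an error here.
export function findSubscriberPids(pattern = "lark-cli event"): number[] {
  const result = spawnSync("pgrep", ["-f", pattern], { encoding: "utf8" });
  if (result.error) throw result.error;
  if (result.status === 1) return [];
  if (result.status !== 0) {
    throw new Error(`pgrep failed with status ${result.status}`);
  }
  // Exclude the current process in case its own command line matches.
  return parsePidList(result.stdout).filter((pid) => pid !== process.pid);
}
```

A non-empty result from findSubscriberPids() corresponds to the condition reported in the log lines above.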

Environment

  • macOS 14.6.1
  • Bun 1.3.13
  • forge-hub v0.2.0
  • lark-cli: installed via npm, auth valid

forge-hub doctor

✓ Bun installed
✓ Hub server runtime
✓ Hub client runtime
✓ LaunchAgent plist
✓ MCP registered
✓ ffmpeg available
✓ lark-cli available
✓ approval_channels configured: [feishu, wechat]
✓ Hub server running (v0.2.0)

Suggested fix

  1. In feishu.ts stop(): ensure lark-cli event child process is killed (not just the parent)
  2. On startup: if stale subscriber detected, add a retry with backoff (wait for it to exit) instead of skipping permanently
  3. Channel watchdog: when channel is skipped (not degraded), notify user via an active channel
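
Items 2 and 3 could be combined along these lines. This is only a sketch under stated assumptions: startFeishuWithRetry, backoffDelays, and the StartDeps callbacks are hypothetical names, and the real feishu.ts and watchdog interfaces are not shown in this issue.

```typescript
// Exponential backoff schedule: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
export function backoffDelays(attempts: number, baseMs: number, maxMs: number): number[] {
  return Array.from({ length: attempts }, (_, i) => Math.min(baseMs * 2 ** i, maxMs));
}

// Hypothetical hooks the Hub would supply; none of these exist in the issue.
export interface StartDeps {
  isStaleRunning: () => Promise<boolean>; // e.g. a check for a leftover subscriber
  startChannel: () => Promise<void>; // loads the feishu channel
  notify: (message: string) => Promise<void>; // sends via another active channel
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

// Returns true if the channel was started, false if it was skipped after all retries.
export async function startFeishuWithRetry(
  deps: StartDeps,
  delays: number[] = backoffDelays(5, 1000, 15000),
): Promise<boolean> {
  const sleep =
    deps.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (const delay of delays) {
    if (!(await deps.isStaleRunning())) {
      await deps.startChannel();
      return true;
    }
    // A stale subscriber is still present: wait, then check again.
    await sleep(delay);
  }

  // One last check after the final wait.
  if (!(await deps.isStaleRunning())) {
    await deps.startChannel();
    return true;
  }

  // Give up, but tell the user instead of only logging to stderr.
  await deps.notify(
    "feishu channel skipped: a stale lark-cli event +subscribe process is still running",
  );
  return false;
}
```

With the default schedule the Hub would wait up to 30 seconds in total (1 + 2 + 4 + 8 + 15) before notifying; those values are illustrative and would need tuning.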
