fix(session): close zombie active session when tmux backing session i…#98
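@Lotu527 Hi 👋 PR #98 has passed dual review (reviewer + Codex merge-gate ✅). During review the gate found and fixed 2 blockers; the fixes are stacked as 2 commits on top of your
① A false negative in the restore probe could close a live session by mistake → upgraded to tri-state (core)
② The tmux probe used a shell
Tests: added
If you would like a line-by-line diff, I can push the review branch so you can compare. Please take a look; if you have no objections, we will merge with this set of changes. 🙏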
…s missing on restore

When restoreActiveSessions() iterates persistent-backend sessions and finds the backing tmux/zellij/herdr session is gone, it previously just `continue`d, leaving the DaemonSession registered as active forever. Any incoming message for that chat would match the orphaned entry and be silently dropped (no worker to handle it, no error surfaced to the user).

Fix: call closeSession(sessionId) before continuing so the session is removed from both the runtime activeSessions Map and the sessionStore (status → closed), and a warn log is emitted to make the cleanup visible.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
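PR deepcoldy#98 changes the restore-phase handling of "backing session does not exist" from a non-destructive continue to a destructive closeSession, but hasSession() in all three backends collapses "probe command failed or timed out" and "really does not exist" into the same false. On daemon restart, a slow-starting herdr server or a transient zellij/tmux probe failure can permanently close a session that is still alive (active index entry deleted + store marked closed): the live pane leaks, the next message goes through auto-create and loses its context, and because the store is already closed the session is never lazily restored. The restore loop iterates over every active session, so a single transient failure can wrongly close sessions in bulk.

Changes:
- Add SessionProbe = 'exists' | 'missing' | 'unknown' (backend/types.ts).
- Add probeSession() to all three backends: only "command succeeded and confirmed absent" = missing; failure / timeout / parse error = unknown. hasSession() becomes a thin wrapper for probeSession()==='exists', so behavior is unchanged for all existing boolean callers (unknown/missing → false, exists → true).
- restoreActiveSessions: missing → closeSession (a true zombie); unknown → warn and keep the active record for lazy restore (equivalent to the old continue; the recovery window is no longer closed off early); exists → auto-fork to reattach.
- ensureTerminalWorkerPort, a non-destructive read path, keeps its original semantics: it wakes the worker only on exists.

Tests:
- herdr-backend: the four probeSession cases, exists / missing / present-but-not-running → missing / list failure or timeout → unknown, plus a check that hasSession is still false on unknown.
- restore-zombie-close (new): missing → closeSession + removed from Map + store closed + no fork; unknown → no close + kept in Map + no fork; exists → fork + no close.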
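A minimal sketch of the tri-state flow described in this commit, for readers who want to see the mapping in one place. Only `SessionProbe`, `probeSession`, `hasSession` and `closeSession` are names taken from the commit message; the `Backend` interface shape, `RestoreAction`, `decideRestore`, `restoreAll` and the `reattach` callback are hypothetical and do not reflect the repository's actual signatures.

```typescript
// Tri-state probe result, as named in the commit message.
type SessionProbe = 'exists' | 'missing' | 'unknown';

// Hypothetical backend shape: the real interface is assumed to be richer.
interface Backend {
  probeSession(name: string): SessionProbe;
}

// Thin boolean wrapper: only a confirmed 'exists' is true,
// so 'missing' and 'unknown' both stay false for existing callers.
function hasSession(backend: Backend, name: string): boolean {
  return backend.probeSession(name) === 'exists';
}

// Hypothetical names for the three restore outcomes.
type RestoreAction = 'reattach' | 'close' | 'keep';

function decideRestore(probe: SessionProbe): RestoreAction {
  switch (probe) {
    case 'exists':
      return 'reattach'; // backing session is alive: reconnect to it
    case 'missing':
      return 'close'; // confirmed gone: a true zombie, safe to close
    case 'unknown':
      return 'keep'; // probe failed: never destroy on uncertainty
  }
}

// Hypothetical restore loop over sessionId → backing session name.
function restoreAll(
  active: Map<string, string>,
  backend: Backend,
  closeSession: (sessionId: string) => void,
  reattach: (sessionId: string) => void,
): void {
  active.forEach((backingName, sessionId) => {
    const action = decideRestore(backend.probeSession(backingName));
    if (action === 'close') {
      closeSession(sessionId);
    } else if (action === 'reattach') {
      reattach(sessionId);
    } else {
      // 'keep': leave the record active so it can be restored lazily later.
      console.warn(`probe for ${backingName} was inconclusive; keeping ${sessionId}`);
    }
  });
}
```

The point of routing everything through `decideRestore` is that the destructive branch is reachable only from a confirmed `missing`, while `hasSession` keeps its old boolean contract.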
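The Codex re-gate caught this: the original tmux probeSession ran execSync with a
shell string. When tmux is not on PATH or is not executable, the shell returns with
a clean exit code of 127 (command not found) / 126 (not executable), which
`typeof e.status==='number' && !e.signal` misclassifies as missing → restore takes
the destructive closeSession path, exactly the "probe failure drives a permanent
close" that tri-state is meant to prevent. It triggers on daemon restart when
backend=tmux, the historical sessions are still in the tmux server, but the new
daemon's runtime environment temporarily cannot find tmux.
Switch to execFileSync('tmux', ['has-session','-t',name], …), which executes the
binary directly: binary missing = ENOENT, not executable = EACCES (neither carries
a numeric status) → unknown; session really absent = clean exit 1 → missing;
timeout = signal → unknown; present = exit 0 → exists. The discriminating logic is
unchanged (a Node repro confirmed that execFileSync does not mix in the shell's
126/127).
Add test/tmux-probe.test.ts, which directly tests the command-failure
classification of the real TmuxBackend.probeSession: absent → missing;
command-not-found (ENOENT) / not-executable (EACCES) → unknown; timeout (signal) →
unknown; exit 0 → exists; it also verifies that hasSession is still false on
unknown. (This fills the real tmux classification gap that restore-zombie-close
did not cover because it mocks probeSession.)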
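The classification above can be sketched roughly as follows. This is an illustration under assumptions, not the code in TmuxBackend: `classifyProbeError`, `probeTmuxSession`, the `stdio` setting and the 3000 ms timeout are hypothetical; only the `execFileSync('tmux', ['has-session','-t',name], …)` call and the `typeof e.status==='number' && !e.signal` check come from the commit message.

```typescript
import { execFileSync } from 'node:child_process';

type SessionProbe = 'exists' | 'missing' | 'unknown';

// Hypothetical helper: classify whatever execFileSync threw.
function classifyProbeError(err: unknown): SessionProbe {
  const e = err as { status?: unknown; signal?: unknown } | null;
  // A numeric exit status with no signal means the tmux binary itself ran
  // and reported that the session is absent.
  if (e && typeof e.status === 'number' && !e.signal) return 'missing';
  // ENOENT / EACCES (no numeric status) and timeouts (killed by a signal)
  // say nothing about the session, so they must not drive a close.
  return 'unknown';
}

// Hypothetical wrapper; the timeout value is an assumption.
function probeTmuxSession(name: string, timeoutMs = 3000): SessionProbe {
  try {
    // No shell is involved, so a missing binary surfaces as a spawn error
    // rather than as a shell exit code of 127 or 126.
    execFileSync('tmux', ['has-session', '-t', name], {
      stdio: 'ignore',
      timeout: timeoutMs,
    });
    return 'exists';
  } catch (err) {
    return classifyProbeError(err);
  }
}
```

Keeping the classifier separate from the process call makes each failure mode testable with synthetic error objects, without needing tmux installed.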
f73d8f0 to
d7a88fc