fix(bridge): prevent process leak via PID file singleton guard by syzsunshine219 · Pull Request #1765 · MemTensor/MemOS

syzsunshine219 · 2026-05-19T13:06:20Z

Summary

- Bridge processes accumulate indefinitely: each hermes chat session spawns a new bridge.cts, and the previous one stays alive as a "daemon" but is never cleaned up. On long-running servers this results in 10+ zombie bridge pairs.
- Introduces a PID file (daemon/bridge.pid) as a lightweight singleton lock. On startup, the new bridge reads the PID file and, if the recorded process is still alive, kills it (SIGTERM, then SIGKILL after 5s). It then writes its own PID and removes the file on all exit paths.
- --no-viewer (headless) bridges skip the kill; they coexist with the daemon that owns the viewer port.
- The existing install.sh kill logic is preserved as a complementary deployment-time fallback.
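The PID file handling described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: all function names here (`isProcessAlive`, `readPidFile`, `findLiveOwner`, `writePidFile`, `removePidFile`, `installCleanup`) are hypothetical, and the choice to delete the file only when it still names the current process is an assumption added for safety.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

/** True if a process with this PID exists. Signal 0 only probes, it sends nothing. */
export function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

/** Reads the recorded PID, or null if the file is missing or malformed. */
export function readPidFile(pidFile: string): number | null {
  try {
    const text = fs.readFileSync(pidFile, "utf8").trim();
    const pid = /^\d+$/.test(text) ? Number(text) : 0;
    return pid > 0 ? pid : null;
  } catch {
    return null;
  }
}

/** PID of a live previous owner, or null. Our own PID never counts. */
export function findLiveOwner(pidFile: string): number | null {
  const pid = readPidFile(pidFile);
  return pid !== null && pid !== process.pid && isProcessAlive(pid) ? pid : null;
}

/** Records this process as the owner of the singleton slot. */
export function writePidFile(pidFile: string): void {
  fs.mkdirSync(path.dirname(pidFile), { recursive: true });
  fs.writeFileSync(pidFile, `${process.pid}\n`);
}

/** Removes the PID file, but only if it still names this process (assumption). */
export function removePidFile(pidFile: string): void {
  if (readPidFile(pidFile) === process.pid) {
    fs.rmSync(pidFile, { force: true });
  }
}

/** Routes SIGTERM/SIGINT through process.exit so the "exit" hook cleans up. */
export function installCleanup(pidFile: string): void {
  process.on("exit", () => removePidFile(pidFile));
  for (const sig of ["SIGTERM", "SIGINT"] as const) {
    process.on(sig, () => process.exit(0));
  }
}
```

A non-headless bridge would call `findLiveOwner`, terminate the returned PID if there is one, then call `writePidFile` and `installCleanup`. A headless (--no-viewer) bridge would skip the `findLiveOwner` step. Note that a PID read from a stale file may have been reused by an unrelated process, so a liveness check alone does not prove the owner is a bridge.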

Test plan

- Start bridge daemon → verify daemon/bridge.pid is created with correct PID
- Start a second bridge → verify the first is killed and PID file updated
- SIGTERM the running bridge → verify PID file is removed
- Run hermes chat twice in sequence → verify only 1 bridge pair remains (no accumulation)
- Run --no-viewer bridge alongside daemon → verify both coexist (no kill)

Each `hermes chat` session spawns a new bridge.cts process. When the chat session ends (stdin closes), the bridge transitions to a "staying alive as daemon" state to keep the Memory Viewer accessible. However, the next chat session spawns yet another bridge without killing the previous one, causing unbounded process accumulation (observed: 10+ zombie bridge pairs on long-running servers).

Root cause: bridge.cts has no mechanism to detect or clean up a previously running instance before starting.

Fix: introduce a PID file (`~/<agent>/memos-plugin/daemon/bridge.pid`) as a lightweight singleton lock:

- On startup (unless --no-viewer), read the PID file; if the recorded process is still alive, send SIGTERM and wait up to 5s before SIGKILL.
- Write own PID to the file after acquiring the slot.
- Remove the PID file on all exit paths (SIGTERM/SIGINT handler, daemon shutdown, headless exit, keepalive viewer-closed check).
- --no-viewer (headless) bridges skip the kill: they don't need the port and coexist with the daemon that owns it.

The existing install.sh kill logic is preserved as a deployment-time fallback; the two mechanisms are complementary.

Co-authored-by: Cursor <cursoragent@cursor.com>

chiefmojo · 2026-05-21T20:08:33Z

Large-DB crash loop: PID singleton kills old bridge before DB lock releases

We reverted a 655MB MemOS DB from v2.0.0-beta.1 to v2.0.5 and hit an infinite bridge restart cycle. The PID singleton guard kills the old bridge, but the new bridge then opens the same SQLite DB and hangs in WAL recovery — long enough to exceed the gateway's 30s timeout. The gateway spawns yet another bridge, which kills the current one, and the cycle repeats indefinitely.

On a smaller 43MB DB the same code works fine — WAL recovery completes before the next bridge spawns. The threshold where it breaks seems to be somewhere between those two, likely dependent on DB size and I/O speed.

Root cause: killExistingBridge() sends SIGTERM but doesn't wait for the old process to fully exit and release file locks. A waitpid() or polling fuser on the DB file before proceeding would prevent the race.
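One way to wait for the old process to exit before touching the DB is sketched below. It is not the PR's `killExistingBridge()`: the name `terminateAndWait`, its options, and the 2s post-SIGKILL wait are all hypothetical. Since `waitpid()` only works on child processes and the previous bridge is normally unrelated to the new one, the sketch polls with signal 0 instead. The `kill` option exists only so the logic can be exercised without a real process.

```typescript
/** Signal sender; defaults to process.kill and is injectable for tests. */
type KillFn = (pid: number, signal: NodeJS.Signals | 0) => void;

interface TerminateOptions {
  graceMs?: number;    // time allowed after SIGTERM (the PR description uses 5s)
  killWaitMs?: number; // hypothetical: time allowed after SIGKILL
  pollMs?: number;     // polling interval
  kill?: KillFn;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/** True once the OS reports that no such process exists (ESRCH). */
function isGone(pid: number, kill: KillFn): boolean {
  try {
    kill(pid, 0);
    return false;
  } catch (err) {
    return (err as NodeJS.ErrnoException).code === "ESRCH";
  }
}

/** Polls until the process is gone or the timeout elapses. */
async function waitUntilGone(
  pid: number,
  timeoutMs: number,
  pollMs: number,
  kill: KillFn,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (!isGone(pid, kill)) {
    if (Date.now() >= deadline) return false;
    await sleep(pollMs);
  }
  return true;
}

/**
 * Sends SIGTERM, escalates to SIGKILL after the grace period, and resolves
 * true only once the process has actually disappeared.
 */
export async function terminateAndWait(
  pid: number,
  opts: TerminateOptions = {},
): Promise<boolean> {
  const { graceMs = 5000, killWaitMs = 2000, pollMs = 100 } = opts;
  const kill: KillFn = opts.kill ?? ((p, s) => { process.kill(p, s); });
  const send = (signal: NodeJS.Signals): void => {
    try {
      kill(pid, signal);
    } catch {
      // Already gone or not permitted; the polling below decides the outcome.
    }
  };

  send("SIGTERM");
  if (await waitUntilGone(pid, graceMs, pollMs, kill)) return true;
  send("SIGKILL");
  return waitUntilGone(pid, killWaitMs, pollMs, kill);
}
```

Under this sketch, the new bridge would open the SQLite DB only after `terminateAndWait` resolves to `true`. Whether that alone is enough to avoid the long WAL recovery reported above is not established in this thread.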

Cross-ref: #1452 describes a related class of startup-hang → restart-loop issues.

syzsunshine219 merged commit 2908e87 into MemTensor:mem-agent-0514 on May 19, 2026

chiefmojo mentioned this pull request on May 21, 2026: auto-recall can block gateway startup / first-turn path long enough to fail health checks #1452 (Open)

syzsunshine219 merged 1 commit into MemTensor:mem-agent-0514 from syzsunshine219:fix/bridge-process-leak-pid-singleton
