Session Safety Net: systemd reaper for orphaned Claude Code sessions #652
Hey @ganexxa-quantum, thanks for raising this, and sorry it sat for a while. We're changing how LifeOS ships. Instead of cloning a full That's aimed right at what you hit here. The old "one directory, one layout, hope it matches your setup" approach is exactly what broke for so many people, and the new model should handle it far better because your AI does the integration per machine instead of us guessing. So we're closing this in prep for that release. If it still bites you once the skill-based version is out, reopen or file a fresh one and we'll jump on it. Appreciate you taking the time.
The Problem
PAI Cloud VMs running multiple PAI instances suffer from chronic memory exhaustion caused by Claude Code sessions accumulating without cleanup. On our 4GB RAM + 2GB swap VM with 3 PAI users, we hit 99.96% swap usage from 7+ orphaned sessions consuming 2.2GB.
How sessions become orphaned
`claude`
`/exit` or closing the terminal tab (X)
`ptyHost` keeps the terminal alive (`enablePersistentSessions: true`)
Why it's not the user's fault
Neither Claude Code nor PAI provides any built-in protection:
`claude ps`)
Claude Code also has confirmed memory leaks in idle sessions; a single idle session can grow to multiple GB over hours:
PAI upstream has related issues too:
Our Solution: PAI Session Safety Net
A pure-bash systemd timer that runs every 30 minutes. No bun/node dependency — works even under severe memory pressure.
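A minimal sketch of what a 30-minute timer and its oneshot service could look like, written as a bash helper that stages both unit files into a directory of your choice. The unit file names and the script path are the ones listed under Files; the function name `write_reaper_units`, the `Description=` text, and the `OnBootSec=5min` delay are assumptions for illustration, not taken from the actual deployment.

```shell
#!/usr/bin/env bash
# Sketch only: stages a oneshot service plus a 30-minute timer into directory $1.
# write_reaper_units is a hypothetical helper name; OnBootSec=5min is an assumed value.
write_reaper_units() {
  local dir=$1
  mkdir -p "$dir"

  # The service runs the reaper script once per activation.
  cat > "$dir/pai-session-reaper.service" <<'EOF'
[Unit]
Description=Reap orphaned Claude Code sessions

[Service]
Type=oneshot
ExecStart=/usr/local/bin/pai-session-reaper.sh
EOF

  # The timer re-activates the service 30 minutes after each run.
  cat > "$dir/pai-session-reaper.timer" <<'EOF'
[Unit]
Description=Run pai-session-reaper every 30 minutes

[Timer]
OnBootSec=5min
OnUnitActiveSec=30min

[Install]
WantedBy=timers.target
EOF
}
```

Staging into a scratch directory first makes it easy to review the files before installing them. As root, `write_reaper_units /etc/systemd/system` followed by `systemctl daemon-reload` and `systemctl enable --now pai-session-reaper.timer` would activate the timer.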
Architecture
Files
/etc/pai-session-reaper.conf
/usr/local/bin/pai-session-reaper.sh
/etc/systemd/system/pai-session-reaper.service
/etc/systemd/system/pai-session-reaper.timer
Key Design Decisions
Process identification: `ps -eo user,pid,etimes,rss,comm --no-headers | awk '$5 == "claude"'` matches the executable name exactly, so there are no false positives from bun/node/hooks.
Self-protection: When run manually from inside a Claude session, the script walks ancestor PIDs from `$$` up to PID 1 and skips any Claude PID in its ancestry. When run from the systemd timer (PPID=1), no Claude PID matches, so all eligible sessions are reaped.
Graceful kill: SIGTERM first (30s grace period so PAI hooks can fire for session capture), then SIGKILL if still alive.
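The three decisions above (identify, self-protect, kill gracefully) can be sketched together as follows. This is an illustration under stated assumptions, not the actual script: the function names, the minimum-age argument, and the `/proc/<pid>/status` parent lookup are all hypothetical choices. Only the `ps | awk` pipeline, the 30s grace period, and `DRY_RUN` come from the post.

```shell
#!/usr/bin/env bash
# Sketch only. Function names, the min-age filter, and the /proc-based
# parent lookup are assumptions for illustration.

# PIDs whose executable name is exactly "claude" and that have been
# running for at least $1 seconds (etimes is column 3, comm is column 5).
claude_pids_older_than() {
  ps -eo user,pid,etimes,rss,comm --no-headers |
    awk -v min="$1" '$5 == "claude" && $3 >= min { print $2 }'
}

# Print $1 and each of its ancestors, stopping before PID 1.
ancestor_pids() {
  local pid=$1 ppid key value _rest
  while [ -n "$pid" ] && [ "$pid" -gt 1 ]; do
    echo "$pid"
    ppid=""
    if [ -r "/proc/$pid/status" ]; then
      while read -r key value _rest; do
        if [ "$key" = "PPid:" ]; then ppid=$value; break; fi
      done < "/proc/$pid/status"
    fi
    pid=$ppid
  done
}

# SIGTERM, wait up to $2 seconds (default 30), then SIGKILL if still alive.
graceful_kill() {
  local pid=$1 grace=${2:-30} waited=0
  kill -TERM "$pid" 2>/dev/null || return 0   # process already gone
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$grace" ]; then
      kill -KILL "$pid" 2>/dev/null || true
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Reap every eligible session that is not an ancestor of this shell.
# Defaults to dry-run so that a stray call only prints what it would do.
reap_sessions() {
  local min_age=$1 protected pid
  protected=$(ancestor_pids "$$")
  for pid in $(claude_pids_older_than "$min_age"); do
    if grep -qx "$pid" <<<"$protected"; then continue; fi
    if [ "${DRY_RUN:-true}" = "true" ]; then
      echo "DRY_RUN: would reap $pid"
      continue
    fi
    graceful_kill "$pid" 30
  done
}
```

With `DRY_RUN=true`, a call such as `reap_sessions 86400` (the 24h threshold is a placeholder) would only print the candidate PIDs. Note the `if`/`then` blocks and `waited=$((waited + 1))`, which keep the sketch safe to paste into a script that uses `set -e`.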
Memory watchdog: reads `/proc/meminfo` directly, so no forks are needed under memory pressure.
Stale bun reporter: Finds bun processes older than 48h that aren't systemd services. Log-only, never auto-kills (too risky: these could be Slack bots or schedulers).
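For the memory watchdog, a fork-free read of `/proc/meminfo` might look like the sketch below. It uses only bash builtins (`read`, `case`, arithmetic expansion) and stores results in global variables, because capturing output with `$(...)` would itself fork. The function name, the two variable names, and the optional path argument (there so the function can be exercised against a fixture file) are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: read_mem_pressure, MEM_AVAILABLE_KB and SWAP_USED_PCT are
# hypothetical names. Results go into globals to avoid a subshell fork.
read_mem_pressure() {
  local file=${1:-/proc/meminfo} key value _unit
  local swap_total=0 swap_free=0
  MEM_AVAILABLE_KB=0
  SWAP_USED_PCT=0
  # Each line looks like "SwapTotal:  2097148 kB".
  while read -r key value _unit; do
    case $key in
      MemAvailable:) MEM_AVAILABLE_KB=$value ;;
      SwapTotal:)    swap_total=$value ;;
      SwapFree:)     swap_free=$value ;;
    esac
  done < "$file"
  # Guard against division by zero on hosts without swap.
  if [ "$swap_total" -gt 0 ]; then
    SWAP_USED_PCT=$(( (swap_total - swap_free) * 100 / swap_total ))
  fi
}

if [ -r /proc/meminfo ]; then
  read_mem_pressure /proc/meminfo
  echo "MemAvailable=${MEM_AVAILABLE_KB}kB swap_used=${SWAP_USED_PCT}%"
fi
```

The percentage is truncated integer math, which is enough for threshold comparisons in a watchdog.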
Dry-run mode: `DRY_RUN=true` in config logs everything without killing. Great for testing.
Configuration
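A hypothetical sketch of what `/etc/pai-session-reaper.conf` could contain, written as plain shell assignments that the script would `source`. Only `DRY_RUN` is named in the post. `TERM_GRACE_SECONDS` and `STALE_BUN_HOURS` are assumed names, with values taken from the 30s grace period and 48h bun threshold described above.

```shell
# /etc/pai-session-reaper.conf (hypothetical contents, sourced by the reaper script)

# From the post: log everything, kill nothing.
DRY_RUN=true

# Assumed name. Seconds between SIGTERM and SIGKILL.
TERM_GRACE_SECONDS=30

# Assumed name. Age at which a non-systemd bun process is reported (never killed).
STALE_BUN_HOURS=48
```

Keeping the file as bare `KEY=value` lines means the script can load it with a single `.` (source) call and no parser.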
Results
First run freed ~470MB and dropped swap from 68% to 57%. The timer has been running reliably since deployment.
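Gotcha: bash arithmetic under `set -e`
`((var++))` returns exit code 1 when `var` is 0 before the increment, because the expression evaluates to 0 and bash arithmetic treats 0 as false. Under `set -e`, this kills the script. Use `var=$((var + 1))` instead. A related trap is `[[ condition ]] && action`: the list returns exit code 1 when the condition is false. `set -e` ignores that failure inside the `&&` list itself, but when the line is the last command in a function, the function returns 1 and the caller aborts. Use `if`/`then` blocks instead.
What's Not Built Yet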
`claude` sessions when a limit is reached (would need a shell wrapper or hook)
For Other PAI Users
If you're running PAI on a small VM with multiple users, you're probably hitting this too. The community workaround is a manual `ps | grep claude | kill`-style cleanup, but a systemd timer makes it automatic and adds the memory watchdog layer.
Happy to share the full script if anyone wants to adapt it.
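For anyone adapting the approach, the `set -e` gotcha described earlier is easy to reproduce in isolation. The sketch below runs each snippet in a child bash with `set -e` enabled and reports whether it reached the end. The helper name `survives_errexit` is an assumption made for this demo.

```shell
#!/usr/bin/env bash
# Run snippet $1 in a child bash under `set -e` and report whether the
# child reached its final `exit 0`. survives_errexit is a hypothetical helper.
survives_errexit() {
  if bash -c "set -e; $1; exit 0" >/dev/null 2>&1; then
    echo "survived"
  else
    echo "aborted"
  fi
}

# Post-increment from 0 evaluates to 0, so (( )) returns status 1.
survives_errexit 'n=0; ((n++))'                                      # aborted
# Plain arithmetic expansion in an assignment always returns status 0.
survives_errexit 'n=0; n=$((n + 1))'                                 # survived
# A false test inside an && list is exempt from set -e at top level...
survives_errexit '[[ 1 -eq 2 ]] && echo hit'                         # survived
# ...but as the last command of a function it becomes the return status.
survives_errexit 'f() { [[ 1 -eq 2 ]] && echo hit; }; f'             # aborted
# if/then returns 0 when the condition is false and there is no else.
survives_errexit 'f() { if [[ 1 -eq 2 ]]; then echo hit; fi; }; f'   # survived
```

Running each case in a child process keeps the demo itself from being killed by the behaviour it demonstrates.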