Bug
isPidAlive() in src/common-utils/daemon-state.ts only checks process.kill(pid, 0). A PID being alive does not prove the process is the bitsocial daemon that wrote the state file (classic stale-pidfile / PID-reuse hazard).
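For illustration, a signal-0 liveness check usually looks something like the sketch below. This is not the actual body of isPidAlive(). The name isPidAliveNaive and the EPERM handling are assumptions made for the example.

```typescript
// Hypothetical stand-in for a signal-0 liveness check; not the real isPidAlive().
function isPidAliveNaive(pid: number): boolean {
    try {
        // Signal 0 delivers nothing; it only checks that the PID exists and can be signalled.
        process.kill(pid, 0);
        return true;
    } catch (err) {
        // Assumption: EPERM counts as alive, because the process exists but belongs to another user.
        return (err as { code?: string }).code === "EPERM";
    }
}
```

A check of this shape returns true for whatever process currently holds the PID, so it cannot tell the daemon apart from an unrelated process that reused the number.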
Observed failure
A daemon previously ran inside a Docker container that mounted the host's bitsocial data dir. Inside the container's PID namespace it was PID 8, so it wrote .daemon_states/8-daemon.state into the shared dir. The container went away without graceful shutdown, leaving the file behind. On the host, PID 8 is a kernel thread — so the stale state passed the liveness check forever.
bitsocial update install then:
- Sent SIGINT to PID 8 (an unrelated process — could be any innocent process on a different day)
- Restarted 2 daemons with identical args on the same port; the second died with EADDRINUSE
- Falsely reported "Daemon started (port 9138)" for the second spawn because waitUntilUsed saw the first daemon's port
Fix plan
- Record the OS-reported process start time (procStartTime) in the daemon state file at write time
- When checking aliveness, compare the current start time of that PID against the recorded one — mismatch means the PID was reused and the state is stale
- For legacy state files without the field, fall back to a command-line check (/proc/<pid>/cmdline or ps -o args= must reference bitsocial); kernel threads have an empty cmdline so they are correctly treated as stale
- Regression tests reproducing the PID-reuse scenario
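The plan above could be sketched roughly as follows. This is a minimal, Linux-only illustration, not the actual patch. It assumes the start time is read from field 22 of /proc/<pid>/stat. The DaemonState shape and the helper names parseProcStartTime, readProcStartTime, cmdlineMentionsDaemon and isStateStale are hypothetical. Only procStartTime comes from the plan itself.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical shape of the state file; only procStartTime is named in the plan.
interface DaemonState {
    pid: number;
    procStartTime?: string; // absent in legacy state files
}

// Extracts field 22 (starttime, in clock ticks since boot) from /proc/<pid>/stat contents.
function parseProcStartTime(stat: string): string | undefined {
    // comm (field 2) is parenthesised and may contain spaces or ')',
    // so split after the LAST ')' instead of tokenising from the start.
    const end = stat.lastIndexOf(")");
    if (end === -1) return undefined;
    const fields = stat.slice(end + 1).trim().split(/\s+/);
    // fields[0] is field 3 (state), so field 22 is fields[19].
    return fields[19];
}

// Assumption: Linux /proc is available. Returns undefined if the PID is gone or /proc is missing.
function readProcStartTime(pid: number): string | undefined {
    try {
        return parseProcStartTime(readFileSync(`/proc/${pid}/stat`, "utf8"));
    } catch {
        return undefined;
    }
}

// Legacy fallback: /proc/<pid>/cmdline is NUL-separated and empty for kernel threads.
function cmdlineMentionsDaemon(cmdline: string): boolean {
    return cmdline.split("\0").join(" ").includes("bitsocial");
}

// Pure decision: stale when the start time differs, or (legacy file) when the
// command line does not reference bitsocial.
function isStateStale(
    state: DaemonState,
    currentStartTime: string | undefined,
    currentCmdline: string
): boolean {
    if (state.procStartTime !== undefined) {
        return currentStartTime !== state.procStartTime;
    }
    return !cmdlineMentionsDaemon(currentCmdline);
}
```

Keeping isStateStale free of I/O makes the PID-reuse scenario easy to reproduce in a regression test: pass a recorded start time and a different current one, or a legacy state with an empty cmdline, and expect true. On platforms without /proc, readProcStartTime in this sketch returns undefined, so another source for the start time would be needed there.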