Stale daemon state file with reused PID causes update install to signal wrong process and double-restart daemon #66

@Rinse12

Bug

isPidAlive() in src/common-utils/daemon-state.ts only checks process.kill(pid, 0), which tests whether some process with that PID exists. A live PID does not prove that the process is the bitsocial daemon that wrote the state file (the classic stale-pidfile / PID-reuse hazard).
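For illustration, a liveness check of this kind typically looks like the sketch below. This is a hypothetical reconstruction, not the actual code from daemon-state.ts; the name isPidAliveNaive and the EPERM handling are assumptions.

```typescript
// Hypothetical reconstruction of a signal-0 liveness check.
// The real isPidAlive() in daemon-state.ts may differ.
function isPidAliveNaive(pid: number): boolean {
  try {
    // Signal 0 delivers nothing; it only checks that the PID exists
    // and that we are allowed to signal it.
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    // ESRCH (or anything else): treat as not alive.
    return (err as { code?: string }).code === "EPERM";
  }
}
```

Nothing in this check ties the PID to the process that wrote the state file, so any process that later occupies the same PID number makes the stale file look live.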

Observed failure

A daemon previously ran inside a Docker container that mounted the host's bitsocial data dir. Inside the container's PID namespace it was PID 8, so it wrote .daemon_states/8-daemon.state into the shared dir. The container went away without a graceful shutdown, leaving the file behind. On the host, PID 8 is a kernel thread that lives as long as the system is up, so the stale state file passed the liveness check indefinitely.

bitsocial update install then:

  1. Sent SIGINT to PID 8 (an unrelated process; on another day it could have been any innocent process)
  2. Restarted two daemons with identical args on the same port; the second died with EADDRINUSE
  3. Falsely reported "Daemon started (port 9138)" for the second spawn, because waitUntilUsed saw the port already held by the first daemon

Fix plan

  • Record the OS-reported process start time (procStartTime) in the daemon state file at write time
  • When checking aliveness, compare the current start time of that PID against the recorded one; a mismatch means the PID was reused and the state is stale
  • For legacy state files without the field, fall back to a command-line check: /proc/<pid>/cmdline or ps -o args= must reference bitsocial. Kernel threads have an empty cmdline, so they are correctly treated as stale
  • Add regression tests that reproduce the PID-reuse scenario
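The plan above can be sketched roughly as follows. This is a minimal, Linux-only sketch and not the project's implementation: the DaemonState shape, the helper names, and the choice of /proc/<pid>/stat field 22 (starttime, in clock ticks since boot) as the start-time source are all assumptions. Only procStartTime and the /proc/<pid>/cmdline check come from the issue.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical state shape; only procStartTime is named in the issue.
interface DaemonState {
  pid: number;
  procStartTime?: string; // absent in legacy state files
}

// Extract starttime (field 22) from the contents of /proc/<pid>/stat.
// The comm field (field 2) may contain spaces or parentheses, so parse
// from the last ')' onward; what follows starts at field 3.
function parseStartTimeFromStat(stat: string): string | undefined {
  const close = stat.lastIndexOf(")");
  if (close === -1) return undefined;
  const fields = stat.slice(close + 1).trim().split(/\s+/);
  return fields[19]; // field 22 overall = index 19 after the comm field
}

function readProcStartTime(pid: number): string | undefined {
  try {
    return parseStartTimeFromStat(readFileSync(`/proc/${pid}/stat`, "utf8"));
  } catch {
    return undefined; // no such process, or /proc is unavailable
  }
}

function readCmdline(pid: number): string | undefined {
  try {
    // Arguments are NUL-separated; kernel threads yield an empty string.
    return readFileSync(`/proc/${pid}/cmdline`, "utf8").replace(/\0/g, " ").trim();
  } catch {
    return undefined;
  }
}

// Pure decision function, kept free of I/O so the PID-reuse scenario
// can be regression-tested without spawning processes.
function isDaemonStateLive(
  state: DaemonState,
  probe: { startTime?: string; cmdline?: string },
): boolean {
  if (state.procStartTime !== undefined) {
    // Same PID but a different start time means the PID was reused.
    return probe.startTime !== undefined && probe.startTime === state.procStartTime;
  }
  // Legacy file: fall back to requiring "bitsocial" in the command line.
  return probe.cmdline !== undefined && probe.cmdline.includes("bitsocial");
}
```

A caller would combine the pieces as isDaemonStateLive(state, { startTime: readProcStartTime(state.pid), cmdline: readCmdline(state.pid) }). Keeping the comparison separate from the /proc reads lets a regression test feed in a stale state together with a kernel thread's probe values (different start time, empty cmdline) and assert that the state is treated as stale.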
