munged is unable to restart if previous daemon did not shutdown gracefully #13

GoogleCodeExporter · 2015-05-15T22:16:37Z

What steps will reproduce the problem?

killall -9 munged; munged

What is the expected output? What do you see instead?

After the old munged process has been killed, a new munged process should be able to start. Instead, the new munged process exits with the error: Found existing socket "/var/run/munge/munge.socket.2".

What version of the software are you using? On what operating system?

munge-0.5.10

Please provide any additional information below.

This error occurs because the old process did not shut down gracefully and therefore did not unlink its Unix domain socket. The new process finds the existing socket and exits, since that socket could be in use by another munged process that is still running.

There are several ways in which munged can be prevented from shutting down gracefully: a node could kernel panic, or be power-cycled via powerman, the BMC, or other means. If munged is invoked with the --force command-line option (or if this option is specified in the DAEMON_ARGS in /etc/{default,sysconfig}/munge), the old socket will be unlinked and a new socket will be created. But a new munged process should be capable of recovering from this situation without being "forced", while maintaining the ability to detect whether an existing socket is currently in use.
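One generic way to tell a stale socket from a live one is to try connecting to it. The sketch below illustrates that idea only; it is not taken from munged, and the helper name `socket_in_use` is hypothetical. It assumes a stream socket and treats a refused connection as "no process is listening".

```python
import errno
import os
import socket
import stat


def socket_in_use(path):
    """Return True if something appears to be listening on the socket at path."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return False  # no file at all, so nothing can be using it
    if not stat.S_ISSOCK(mode):
        raise ValueError(f"{path!r} exists but is not a socket")

    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except OSError as exc:
        if exc.errno == errno.ECONNREFUSED:
            return False  # stale: the file is left over from a dead process
        raise  # any other error is unexpected, so let the caller decide
    else:
        return True
    finally:
        probe.close()
```

A daemon using a check like this could unlink the socket only when the function returns False. A probe of this kind is racy if two daemons start at the same moment, which is one reason a lock-based approach may be preferable.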

Reported by Don Albert at Bull.com on 2012-03-16.

Original issue reported on code.google.com by chris.m.dunlap on 18 Mar 2012 at 4:48

GoogleCodeExporter · 2015-05-15T22:16:37Z

This issue was closed by 6988416.

Original comment by chris.m.dunlap on 5 Apr 2012 at 1:19

Changed state: Fixed

GoogleCodeExporter · 2015-05-15T22:16:37Z

This fix breaks Debian GNU/kFreeBSD kfreebsd 7.3-1-amd64:

munged: Error: Failed to lock "/var/run/munge/munge.socket.2.lock": Operation not supported

Original comment by chris.m.dunlap on 5 Jul 2012 at 8:07

Changed state: Started

GoogleCodeExporter · 2015-05-15T22:16:37Z

Upon further analysis, Debian GNU/kFreeBSD kfreebsd 7.3-1-amd64 does not appear to be broken after all. The error above only occurs when the lockfile resides on an NFS mount.

Original comment by chris.m.dunlap on 5 Jul 2012 at 10:34

Changed state: Fixed

GoogleCodeExporter · 2015-05-15T22:16:37Z

This issue was updated by 7da7a1b.

Allow the --force command-line option to override the error generated when failing to obtain the advisory lock for the domain socket. On some systems, these locks are not supported on NFS mounts.
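The behavior described here can be sketched roughly as follows. This is an illustration, not munged's code: the function name `acquire_socket_lock` and its `force` parameter are hypothetical, and `flock`-style locking is used for brevity since the thread does not say which locking call munged uses.

```python
import fcntl
import os
import sys


def acquire_socket_lock(lock_path, force=False):
    """Try to take an exclusive advisory lock on lock_path without blocking.

    Returns (fd, locked). The caller keeps fd open for as long as the
    lock should be held. With force=True, a locking failure is reported
    as a warning and (fd, False) is returned instead of raising.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        # Covers both "lock held elsewhere" and "locking not supported",
        # the latter being what some systems report on NFS mounts.
        if not force:
            os.close(fd)
            raise
        print(f'Warning: failed to lock "{lock_path}": {exc.strerror}',
              file=sys.stderr)
        return fd, False
    return fd, True
```

Because the kernel drops the lock when the holding process dies, a lock of this kind is released after `killall -9` or a crash, unlike the socket file itself, which stays on disk.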

Original comment by chris.m.dunlap on 9 May 2013 at 7:23

GoogleCodeExporter added Priority-Medium labels May 15, 2015

GoogleCodeExporter closed this as completed May 15, 2015

dun added this to the 0.5.11 milestone Jun 4, 2015

dun added bug and removed auto-migrated labels Jun 5, 2015
