
munged is unable to restart if previous daemon did not shutdown gracefully #13

Closed
GoogleCodeExporter opened this issue May 15, 2015 · 4 comments

@GoogleCodeExporter

What steps will reproduce the problem?

killall -9 munged; munged

What is the expected output? What do you see instead?

After the old munged process has been killed, a new munged process should be able to run. Instead, the new munged process exits with the error Found existing socket "/var/run/munge/munge.socket.2".

What version of the software are you using? On what operating system?

munge-0.5.10

Please provide any additional information below.

This error occurs because the old process did not shut down gracefully and therefore did not unlink its Unix domain socket. The new process finds the existing socket and exits, since that socket could be in use by another munged process that is still running.

There are several ways in which munged can be prevented from shutting down gracefully: a node could kernel panic, or be power-cycled via powerman, the BMC, or other means. If munged is invoked with the --force command-line option (or if this option is specified in the DAEMON_ARGS in /etc/{default,sysconfig}/munge), the old socket will be unlinked and a new socket will be created. But a new munged process should be capable of recovering from this situation without being "forced", while maintaining the ability to detect whether an existing socket is currently in use.
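
For illustration, one generic way to tell a stale socket from a live one is to probe it with connect(): a socket file whose owner has died refuses the connection, while a live listener accepts it. The following is only a minimal Python sketch of that idea, not what munged does; the name `socket_is_live` and the demo path are made up for the example.

```python
import os
import socket
import tempfile


def socket_is_live(path):
    """Return True if a process is accepting connections on the Unix socket at path.

    Hypothetical helper for illustration only. A missing file or a refused
    connection is treated as "not live"; other errors (for example a
    permission problem) propagate to the caller.
    """
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
        return True
    except (ConnectionRefusedError, FileNotFoundError):
        return False
    finally:
        probe.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.socket")

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    print("while listening:", socket_is_live(path))

    # Closing without unlinking leaves the socket file behind,
    # much like a daemon that was killed with SIGKILL.
    server.close()
    print("after close, file still present:", os.path.exists(path))
    print("after close, live:", socket_is_live(path))
```

A probe like this is racy on its own: two daemons starting at the same moment could both see a dead socket and both try to replace it, so it would normally be combined with some form of mutual exclusion.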

Reported by Don Albert at Bull.com on 2012-03-16.

Original issue reported on code.google.com by chris.m.dunlap on 18 Mar 2012 at 4:48

@GoogleCodeExporter

This issue was closed by 6988416.

Original comment by chris.m.dunlap on 5 Apr 2012 at 1:19

  • Changed state: Fixed

@GoogleCodeExporter

This fix breaks Debian GNU/kFreeBSD kfreebsd 7.3-1-amd64:

munged: Error: Failed to lock "/var/run/munge/munge.socket.2.lock": Operation not supported

Original comment by chris.m.dunlap on 5 Jul 2012 at 8:07

  • Changed state: Started

@GoogleCodeExporter

Upon further analysis, Debian GNU/kFreeBSD kfreebsd 7.3-1-amd64 does not appear to be broken after all. The error above only occurs when the lockfile resides on an NFS mount.

Original comment by chris.m.dunlap on 5 Jul 2012 at 10:34

  • Changed state: Fixed

@GoogleCodeExporter

This issue was updated by 7da7a1b.

Allow the --force command-line option to override the error generated when failing to obtain the advisory lock for the domain socket. On some systems, these locks are not supported on NFS mounts.
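
As a rough illustration of the behavior described above, here is a Python sketch of taking an advisory lock on a lockfile next to the socket, with a force flag that tolerates a locking failure. It is not munged's code. The function name, the `(fd, locked)` return value, and the `lock_fn` hook are assumptions made for the example, and so is the choice that force only overrides locking errors and never a lock that another process actually holds; the thread does not say where munged draws that line.

```python
import errno
import fcntl
import os
import tempfile


def acquire_socket_lock(lock_path, force=False, lock_fn=fcntl.lockf):
    """Take an exclusive, non-blocking advisory lock on lock_path.

    Returns (fd, locked). The caller keeps fd open for as long as it owns
    the socket; the kernel releases the lock when the process exits, even
    after SIGKILL, so no stale lock survives a crash.
    lock_fn is a hypothetical hook so the failure path can be exercised
    without an NFS mount.
    """
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        lock_fn(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd, True
    except OSError as exc:
        held_elsewhere = exc.errno in (errno.EACCES, errno.EAGAIN)
        if held_elsewhere or not force:
            os.close(fd)
            raise
        # Locking itself failed (for example, unsupported on this
        # filesystem). With force set, continue without the lock.
        return fd, False


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "demo.socket.lock")

    def unsupported(fd, flags):
        # Stand-in for a filesystem that rejects advisory locks.
        raise OSError(errno.EOPNOTSUPP, "Operation not supported")

    fd, locked = acquire_socket_lock(lock_path)
    print("normal filesystem, locked:", locked)
    os.close(fd)

    fd, locked = acquire_socket_lock(lock_path, force=True, lock_fn=unsupported)
    print("locking unsupported, force set, locked:", locked)
    os.close(fd)
```

Running without the lock gives up the protection against two daemons sharing one socket path, which is presumably why it sits behind an explicit option rather than being the default.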

Original comment by chris.m.dunlap on 9 May 2013 at 7:23
