Fix: lrmd: cancel currently pending STONITH op if stonithd connection is lost #730
This avoids problems such as nodes getting killed because of hanging stop operations after a start operation has caused stonithd to crash.
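The idea in the title can be sketched in C. The struct names, `on_stonithd_disconnect()`, the status enum and the choice of `-ENOTCONN` as the failure code are all hypothetical and do not come from lrmd's sources. The sketch shows only the control flow. When the stonithd connection drops, the pending operation is completed with an error immediately. Otherwise it would wait for a reply that stonithd can no longer send.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical operation states; not Pacemaker's real enum. */
enum op_status { OP_PENDING, OP_DONE, OP_ERROR };

/* Hypothetical in-flight STONITH resource operation. */
struct stonith_op {
    char action[16];        /* e.g. "start" or "stop" */
    enum op_status status;
    int rc;
};

/* Hypothetical executor state: at most one op pending against stonithd. */
struct executor {
    int connected;
    struct stonith_op *pending;
};

/* Record the final result of an operation and report it. */
static void finalize_op(struct stonith_op *op, enum op_status status, int rc)
{
    assert(op != NULL);
    op->status = status;
    op->rc = rc;
    printf("%s finished: status=%d rc=%d\n", op->action, (int) status, rc);
}

/* Connection-loss handler: instead of leaving the pending op to hang until
 * its timeout expires, fail it right away so the caller learns of the
 * failure immediately. */
static void on_stonithd_disconnect(struct executor *ex)
{
    ex->connected = 0;
    if (ex->pending != NULL) {
        finalize_op(ex->pending, OP_ERROR, -ENOTCONN);
        ex->pending = NULL;
    }
}
```

For example, take an executor with a pending `"stop"` op. Calling `on_stonithd_disconnect()` on it leaves the op in `OP_ERROR` with `rc == -ENOTCONN` and clears the pending slot.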
I saw and debugged this issue on pacemaker 1.1.10+git20130802-1ubuntu2.3 (Ubuntu Trusty). Although the issue manifests itself there due to another bug that is already fixed, it's a bug in its own right, hence this PR. I don't know of any way to trigger this behavior on the current version of pacemaker short of manually killing stonithd in the middle of an op, but it could theoretically happen if stonithd dies for whatever reason.
This is the sequence of events that triggers the bug, conditional on the (long fixed) bug mentioned in PR #334:
(even though a stonith resource stop is basically a no-op!)