Bug fixes #1409
Conversation
Compare: 5b7ff74 to 590a754
0b68905 aborted the transition on quorum loss, but quorum can also be acquired without triggering a new transition, if corosync gives quorum without a node joining (e.g. forced via corosync-cmapctl, or perhaps via heuristics). This aborts the transition when quorum is gained, but only after a 5-second delay, if the transition has not been aborted in that time. This avoids an unnecessary abort in the vast majority of cases where an abort is already done, and it allows some time for all nodes to connect when quorum is gained, rather than immediately fencing remaining unseen nodes.
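The delayed-abort idea described in the commit message can be sketched as follows. This is an illustrative sketch, not Pacemaker's actual implementation: `struct transition` and `delayed_abort()` are hypothetical names standing in for the controller's real transition state and timer callback.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the controller's transition state. */
struct transition {
    bool aborted;
};

/* Timer callback fired 5s after quorum gain: abort the transition only
 * if nothing else has aborted it in the meantime. Returns true if this
 * call performed the abort. */
static bool delayed_abort(struct transition *t)
{
    if (t->aborted) {
        /* Common case: another event already aborted the transition,
         * so no extra abort is needed. */
        return false;
    }
    /* Rare case: quorum was granted without any membership event,
     * so force a new transition now. */
    t->aborted = true;
    return true;
}
```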
…terval

It already did when a resource was not specified. Also update the help text to clarify cleanup vs. refresh.
```c
 * nodes are joining around the same time, so the one that brings us
 * to quorum doesn't cause all the remaining ones to be fenced.
 */
abort_after_delay(INFINITY, tg_restart, "Quorum gained", 5000);
```
If a joining peer makes the cluster acquire quorum at the corosync level while its sbd has `SBD_DELAY_START` enabled, which is usually longer than 5s and postpones the start of pacemaker there, unnecessary startup fencing targeting the peer is always triggered: 5s after quorum has been acquired, before the node gets a chance to join at the pacemaker level.
I'm trying to think of some potential solutions, such as:

1. Let pacemaker-controld recognize the `SBD_DELAY_START` setting and use a value greater than that as the delay parameter of `abort_after_delay()` here, so that the abort of the transition waits for the peer to join CPG. But that's not necessarily straightforward, since `SBD_DELAY_START` can also be set simply to `yes`, which makes it adopt the value of `msgwait` in disk mode, or otherwise `2 * watchdog_timeout`...

2. Postpone the start of corosync as well by adding the following dependency in sbd.service:

   ```
   Before=corosync.service
   ```

   But that wouldn't work for the case of diskless SBD, since sbd apparently requires the connection to corosync to report a successful start.

3. Ask users to set this for corosync.service:

   ```
   ExecStartPre=/bin/sleep <time>
   ```

   Here `<time>` corresponds to the `SBD_DELAY_START` configured in `/etc/sysconfig/sbd`. This would be a burden for users, since they would have to pay attention to it and keep the relevant settings synchronized.

4. We could perhaps make acquiring of "quorum" require the peer to show up at the CPG level as well, besides at corosync's quorum level. Or we could specifically address only the case where `wait_for_all` is enabled in corosync, and make acquiring of "quorum" require all the peers to show up at the CPG level.
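For the first idea above, the mapping from `SBD_DELAY_START` to a delay value might look roughly like this. A hedged sketch only: `sbd_delay_start_ms()` is a hypothetical helper, and `default_ms` stands in for `msgwait` (disk mode) or `2 * watchdog_timeout` (diskless), which the controller would have to obtain separately.

```c
#include <stdlib.h>
#include <strings.h>

/* Hypothetical helper: translate an SBD_DELAY_START value into a delay
 * in milliseconds. "no"/unset disables the delay; "yes" adopts
 * default_ms (msgwait in disk mode, otherwise 2 * watchdog_timeout);
 * a numeric value is taken as seconds. */
static long sbd_delay_start_ms(const char *value, long default_ms)
{
    if ((value == NULL) || (strcasecmp(value, "no") == 0)) {
        return 0;
    }
    if (strcasecmp(value, "yes") == 0) {
        return default_ms;
    }
    return strtol(value, NULL, 10) * 1000;
}
```

The controller could then pass something like the maximum of 5000 and this value as the delay argument of `abort_after_delay()`.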
So far I cannot think of any better ideas. What do you think, @kgaillot and @wenningerk?
Keep in mind that even if we don't abort the transition here, some unrelated event could abort the transition, and we'd still fence the node.
Why are we fencing the node to begin with, if it's in the cluster membership? I wouldn't expect CPG membership to be required.
> Keep in mind that even if we don't abort the transition here, some unrelated event could abort the transition, and we'd still fence the node.

Indeed. It's just that the situation easily occurs with this predictable transition abort.

> Why are we fencing the node to begin with, if it's in the cluster membership? I wouldn't expect CPG membership to be required.

Well, as far as I can see, it's related to whether the uname of the pending node is known yet when a node_state entry is created for it, and how the scheduler considers the status of the node in that situation...
Please take a look if this makes sense when you get a chance: #3031. Thanks.
That makes much more sense, thanks
To make things even more complicated to consider: when we're running corosync in two-node mode (probably the same as `wait_for_all` for most cases), we're already ignoring quorum and would rather go by availability of the peer via CPG. At least I guess this is also relevant for startup... I have to check how exactly I did that...
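For reference, two-node mode is enabled in corosync.conf like this; per votequorum(5), `two_node: 1` implicitly enables `wait_for_all` unless it is explicitly overridden:

```
quorum {
    provider: corosync_votequorum
    two_node: 1
    # wait_for_all: 1 is implied by two_node unless set explicitly
}
```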