Fix: sbd-inquisitor: Implement default delay start for diskless sbd by epenchev · Pull Request #133 · ClusterLabs/sbd

epenchev · 2021-07-14T13:48:50Z

SBD sysconfig option SBD_DELAY_START is for addressing the case where cluster nodes reboot so fast even before fencing targeting it returns.
This could be common for virtual machines.
So far technically, a setting SBD_DELAY_START=yes takes effect only for sbd with shared disk, since the delay value is basically msgwait retrieved from sbd metadata.
For diskless sbd, for now we'd have to specify SBD_DELAY_START with a explicit delay value.
We need to implement to make SBD_DELAY_START=yes take effect for diskless sbd as well. In this case, the delay value should be SBD_WATCHDOG_TIMEOUT * 2

knet-ci-bot · 2021-07-14T13:48:52Z

Can one of the admins verify this patch?

wenningerk · 2021-07-14T13:51:06Z

test this please

wenningerk · 2021-07-16T06:58:16Z

            exit_status = ping_via_slots(argv[optind + 1], servants_leader);

-        } else if (strcmp(argv[optind], "watch") == 0) {
+        } else if (strcmp(argv[optind], "query-watchdog") == 0) {


guess we need this to be outside the disk-only-section

Yep those two should be separated as before.

Guess we can have one long if-then-else-if-then ... clause with the first part disabled when built without disk-support.

wenningerk · 2021-07-16T07:08:07Z

+                    timeout_watchdog_warn = calculate_timeout_watchdog_warn(timeout_watchdog);
+                }
+
+                cl_log(LOG_DEBUG, "Delay start: %s%s%s",


why don't we postpone that to after we've actually calculated the values that are being executed so that we can display them.
to simplify the logic and still not loose the information where the data came from we can set a pointer to a string constant when updating the delay from e.g. 'SBD_WATCHDOG_TIMEOUT * 2', 'msgwait', 'watchdog-timeout from cmdline * 2' (possible other source of watchdog would be the disk-header but this isn't relevant here as we would the go for msgwait taken from there) and add that as a hint to the log.

wenningerk · 2021-07-16T09:41:31Z

-            timeout_watchdog_warn = calculate_timeout_watchdog_warn(timeout_watchdog);
-        }
+                } else {
+                     cl_log(LOG_DEBUG, "Delay start (no)");


wenningerk · 2021-07-16T09:45:28Z

        } else if (strcmp(argv[optind], "ping") == 0) {
            exit_status = ping_via_slots(argv[optind + 1], servants_leader);

+        } else {


thought we'd move that outside the with-disk-section right at the end.

} else if (strcmp(argv[optind], "ping") == 0) { exit_status = ping_via_slots(argv[optind + 1], servants_leader); } else #endif if (strcmp(argv[optind], "query-watchdog") == 0) { ... } else { exit_status = -2; }

Ah ok, and split the watch option from it right ?

Ok I've pushed the commit with the white space fix to soon.

wenningerk · 2021-07-16T11:24:20Z


+                if (delay_start && delay <= 0) {
+                    delay = get_first_msgwait(servants_leader);
+                    delay_source = "msgwait";


probably just 'if (delay > 0)' as otherwise 2 * watchdog-timeout is taken (disk is probably not the source as otherwise reading msgwait would have worked).
even better would probably have been to bail out if msgwait can't be read but let's leave that to a maybe subsequent PR as it would mean a change in existing behavior and for 3 disks it shouldn't fail if the first disk fails.

wenningerk · 2021-07-16T11:25:42Z

-            exit_status = watchdog_test();
-        } else if (strcmp(argv[optind], "watch") == 0) {
-            /* sleep $(sbd $SBD_DEVICE_ARGS dump | grep -m 1 msgwait | awk '{print $4}') 2>/dev/null */
+                sleep((unsigned long) delay);


sleep(0) is more than a NOP (something like sched_yield()) so we should probably keep it in the delay_start block.

Will fix that, got mix up will the code move and all.

wenningerk · 2021-07-16T11:28:47Z

+            exit_status = watchdog_info();
+        } else if (strcmp(argv[optind], "test-watchdog") == 0) {
+            exit_status = watchdog_test();
+        } else {


why not even further to the end after the "watch"-section?

Yep that make sense.

wenningerk · 2021-07-16T11:35:11Z

+        if (strcmp(argv[optind], "watch") == 0) {
+            /* sleep $(sbd $SBD_DEVICE_ARGS dump | grep -m 1 msgwait | awk '{print $4}') 2>/dev/null */
+
+                const char *delay_source = "SBD_DELAY_START";


not necessarily source of delay at that point still can be SBD_DELAY_START, SBD_WATCHDOG_TIMEOUT * 2, default watchdog-timeout * 2, watchdog-timeout from cmdline

I'm a bit confused about that.
I see delay set only here after it has been set initially to 0.

value = get_env_option("SBD_DELAY_START"); if(value) { delay_start = crm_is_true(value); if (!delay_start) { delay = crm_get_msec(value) / 1000; if (delay > 0) { delay_start = true; } } }

After that point it has been set in the watch section.

True. But if it is not set there the source is 'timeout_watchdog * 2' and that again can have different sources.
Tracking where certain settings are coming from - and logging e.g. in debug or even at higher level together with the value - is one of my projects.
Here debug had that implemented already half-way so I thought we should keep it.
So as to keep that here low effort we can go with 'watchdog-timeout * 2' for now in cases startup-delay is derived from watchdog timeout. We still can add tracking of where watchdog-timeout is coming from in a separate PR including the decision if with delay we still want to show with 'watchdog-timeout' or if we replace the string with the details.

Thanks for the clarifications and for your patience to review my changes.
I will commit the relevant changes.

wenningerk · 2021-07-16T15:37:22Z

-                    if (delay <= 0) {
-                        delay = get_first_msgwait(servants_leader);
-                    }
+                if (delay_start && delay <= 0) {


Guess this should be in the block with (disc_count > 0) as get_first_msgwait is doomed to fail otherwise.

Yep make sense.

wenningerk · 2021-07-16T15:38:55Z

+                if (delay_start) {
+                    if (disk_count == 0 && delay <= 0) {
+                        delay = 2 * timeout_watchdog;
+                        delay_source = "SBD_WATCHDOG_TIMEOUT * 2";


we don't know where watchdog-timeout is coming from - yet.
so we probably should stay with what delay_source already has.

wenningerk · 2021-07-16T20:25:58Z

test this please

wenningerk · 2021-07-18T19:09:46Z

format-string issues detected by ubuntu builders ...

epenchev · 2021-07-19T08:47:23Z

format-string issues detected by ubuntu builders ...

Thanks for the hint, fixed.

wenningerk · 2021-07-19T08:52:39Z

test this please

wenningerk · 2021-07-19T10:54:09Z

Went through the final result again and I'm afraid the logic atm still has 2 holes.
1st guess we should initialize delay_source = delay?"SBD_DELAY_START":"" and just set it "watchdog-timeout * 2" below when we are actually doing that.
2nd there is still a possible way to fall through in disk-mode with a delay == 0.
Mentioned it above already iirc. One way would be bailing out and fail sbd-start but before doing that I guess we should try getting the timeout from one of the other disks if available (maybe in a later PR). And it would definitely be a significant change in behavior to what we have so far.
We should in any case issue a warning. Instead of doing a 'sleep(0)' it would probably be better to fallback to 'watchdog-timeout * 2'. If we decide for the latter that should be probably reflected in the name of the commit or split out in an 2nd one.
@gao-yan: What do you think about defaulting to 2 * watchdog-timeout in case we fail reading the disk here?

wenningerk · 2021-07-19T11:03:16Z

Maybe something like:
"Overhaul startup-delay
restructure code, default to 2 * watchdog-delay without disk or when waitmsg-timeout can't be read from 1st disk"

gao-yan · 2021-07-19T11:25:46Z

2nd there is still a possible way to fall through in disk-mode with a delay == 0.
Mentioned it above already iirc. One way would be bailing out and fail sbd-start but before doing that I guess we should try getting the timeout from one of the other disks if available (maybe in a later PR). And it would definitely be a significant change in behavior to what we have so far.

I think it already does so in get_first_msgwait(). The function actually means to "get_msgwait_from_the_first_available_disk" :-)

gao-yan · 2021-07-19T11:26:22Z

Maybe something like:
"Overhaul startup-delay
restructure code, default to 2 * watchdog-delay without disk or when waitmsg-timeout can't be read from 1st disk"

Sounds sensible and clear to me.

epenchev · 2021-07-19T12:02:17Z

Maybe something like:
"Overhaul startup-delay
restructure code, default to 2 * watchdog-delay without disk or when waitmsg-timeout can't be read from 1st disk"

Yep I guess the check just before the sleep() if (disk_count == 0 && delay <= 0) can be changed to if (delay <= 0) and this way we can default to 2 * watchdog-delay in case of any disk errors.

gao-yan · 2021-07-19T12:12:07Z

Maybe something like:
"Overhaul startup-delay
restructure code, default to 2 * watchdog-delay without disk or when waitmsg-timeout can't be read from 1st disk"

Yep I guess the check just before the sleep() if (disk_count == 0 && delay <= 0) can be changed to if (delay <= 0) and this way we can default to 2 * watchdog-delay in case of any disk errors.

The same impression here.

wenningerk · 2021-07-19T12:26:44Z

2nd there is still a possible way to fall through in disk-mode with a delay == 0.
Mentioned it above already iirc. One way would be bailing out and fail sbd-start but before doing that I guess we should try getting the timeout from one of the other disks if available (maybe in a later PR). And it would definitely be a significant change in behavior to what we have so far.

I think it already does so in get_first_msgwait(). The function actually means to "get_msgwait_from_the_first_available_disk" :-)

Shame on me - the loop is quite obvious ;-)
So we could consider bailing out. But defaulting to 2 * watchdog-delay is still not a bad choice I guess.

wenningerk · 2021-07-19T12:41:52Z

Maybe something like:
"Overhaul startup-delay
restructure code, default to 2 * watchdog-delay without disk or when waitmsg-timeout can't be read from 1st disk"

Yep I guess the check just before the sleep() if (disk_count == 0 && delay <= 0) can be changed to if (delay <= 0) and this way we can default to 2 * watchdog-delay in case of any disk errors.

The same impression here.

Yep that would create the correct logic with minimum code. But we should spare a comment to state that we catch disk-read-issues here as well to remember later on. When we detect the correct reading of msgwait above (delay > 0) we can still state in a warning otherwise that we default to 2 * watchdog-timeout as the final log is just a debug.

wenningerk · 2021-07-19T14:17:18Z

+
+                        if (delay > 0) {
+                            delay_source = "msgwait";
+                            cl_log(LOG_WARNING, "Using 'msgwait' value for 'delay' starting instead of default '2 * watchdog-timeout'");


Sorry if my unclear phrasing led to that ;-)
I guess we are in the branch here where everything runs as configured so no need for a warning.
I meant an else here that would warn about '2 * watchdog-timeout' being taken instead of unreadable 'msgwait'.

No worries, actually this was my initial thought to. But I've understand the message a bit different.
Will fix that quick.

SBD sysconfig option `SBD_DELAY_START` is for addressing the case where cluster nodes reboot so fast even before fencing targeting it returns. This could be common for virtual machines. So far technically, a setting `SBD_DELAY_START=yes` takes effect only for sbd with shared disk, since the delay value is basically `msgwait` retrieved from sbd metadata. For diskless sbd, for now we'd have to specify `SBD_DELAY_START` with a explicit delay value. We need to implement to make `SBD_DELAY_START=yes` take effect for diskless sbd as well. In this case, the delay value should be `SBD_WATCHDOG_TIMEOUT * 2`

wenningerk · 2021-07-19T18:29:08Z

test this please

wenningerk · 2021-07-20T06:58:38Z

Thank you for the contribution!

wenningerk reviewed Jul 14, 2021

View reviewed changes

Comment thread src/sbd-inquisitor.c

wenningerk reviewed Jul 15, 2021

View reviewed changes

Comment thread src/sbd-inquisitor.c

wenningerk reviewed Jul 16, 2021

View reviewed changes

Comment thread src/sbd-inquisitor.c

wenningerk reviewed Jul 16, 2021

View reviewed changes

Comment thread src/sbd-inquisitor.c

wenningerk reviewed Jul 16, 2021

View reviewed changes

wenningerk reviewed Jul 19, 2021

View reviewed changes

wenningerk merged commit f4ca41f into ClusterLabs:master Jul 20, 2021

gao-yan mentioned this pull request Jul 21, 2021

Change corosync and sbd parameters based on the profile environment detected ClusterLabs/crmsh#842

Merged

Conversation

epenchev commented Jul 14, 2021

Uh oh!

knet-ci-bot commented Jul 14, 2021

Uh oh!

wenningerk commented Jul 14, 2021

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

epenchev Jul 16, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wenningerk commented Jul 16, 2021

Uh oh!

wenningerk commented Jul 18, 2021

Uh oh!

epenchev commented Jul 19, 2021

Uh oh!

wenningerk commented Jul 19, 2021

Uh oh!

wenningerk commented Jul 19, 2021

Uh oh!

wenningerk commented Jul 19, 2021

Uh oh!

gao-yan commented Jul 19, 2021

Uh oh!

gao-yan commented Jul 19, 2021

Uh oh!

epenchev commented Jul 19, 2021

Uh oh!

gao-yan commented Jul 19, 2021

Uh oh!

wenningerk commented Jul 19, 2021

Uh oh!

wenningerk commented Jul 19, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

epenchev Jul 16, 2021 •

edited

Loading