Skip to content

Fix: sbd-inquisitor: Implement default delay start for diskless sbd#133

Merged
wenningerk merged 1 commit into
ClusterLabs:masterfrom
epenchev:delay_start_diskless
Jul 20, 2021
Merged

Fix: sbd-inquisitor: Implement default delay start for diskless sbd#133
wenningerk merged 1 commit into
ClusterLabs:masterfrom
epenchev:delay_start_diskless

Conversation

@epenchev
Copy link
Copy Markdown
Member

SBD sysconfig option SBD_DELAY_START is for addressing the case where cluster nodes reboot so fast even before fencing targeting it returns.
This could be common for virtual machines.
So far technically, a setting SBD_DELAY_START=yes takes effect only for sbd with shared disk, since the delay value is basically msgwait retrieved from sbd metadata.
For diskless sbd, for now we'd have to specify SBD_DELAY_START with a explicit delay value.
We need to implement to make SBD_DELAY_START=yes take effect for diskless sbd as well. In this case, the delay value should be SBD_WATCHDOG_TIMEOUT * 2

@knet-ci-bot
Copy link
Copy Markdown

Can one of the admins verify this patch?

@wenningerk
Copy link
Copy Markdown

test this please

Comment thread src/sbd-inquisitor.c
Comment thread src/sbd-inquisitor.c
Comment thread src/sbd-inquisitor.c Outdated
exit_status = ping_via_slots(argv[optind + 1], servants_leader);

} else if (strcmp(argv[optind], "watch") == 0) {
} else if (strcmp(argv[optind], "query-watchdog") == 0) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess we need this to be outside the disk-only-section

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep those two should be separated as before.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess we can have one long if-then-else-if-then ... clause with the first part disabled when built without disk-support.

Comment thread src/sbd-inquisitor.c
Comment thread src/sbd-inquisitor.c Outdated
timeout_watchdog_warn = calculate_timeout_watchdog_warn(timeout_watchdog);
}

cl_log(LOG_DEBUG, "Delay start: %s%s%s",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we postpone that to after we've actually calculated the values that are being executed so that we can display them.
to simplify the logic and still not loose the information where the data came from we can set a pointer to a string constant when updating the delay from e.g. 'SBD_WATCHDOG_TIMEOUT * 2', 'msgwait', 'watchdog-timeout from cmdline * 2' (possible other source of watchdog would be the disk-header but this isn't relevant here as we would the go for msgwait taken from there) and add that as a hint to the log.

Comment thread src/sbd-inquisitor.c
Comment thread src/sbd-inquisitor.c Outdated
timeout_watchdog_warn = calculate_timeout_watchdog_warn(timeout_watchdog);
}
} else {
cl_log(LOG_DEBUG, "Delay start (no)");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

Comment thread src/sbd-inquisitor.c Outdated
} else if (strcmp(argv[optind], "ping") == 0) {
exit_status = ping_via_slots(argv[optind + 1], servants_leader);

} else {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought we'd move that outside the with-disk-section right at the end.

} else if (strcmp(argv[optind], "ping") == 0) {
            exit_status = ping_via_slots(argv[optind + 1], servants_leader);
} else
#endif

if (strcmp(argv[optind], "query-watchdog") == 0) {
...
} else {
    exit_status = -2;
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, and split the watch option from it right ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I've pushed the commit with the white space fix to soon.

Comment thread src/sbd-inquisitor.c Outdated

if (delay_start && delay <= 0) {
delay = get_first_msgwait(servants_leader);
delay_source = "msgwait";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably just 'if (delay > 0)' as otherwise 2 * watchdog-timeout is taken (disk is probably not the source as otherwise reading msgwait would have worked).
even better would probably have been to bail out if msgwait can't be read but let's leave that to a maybe subsequent PR as it would mean a change in existing behavior and for 3 disks it shouldn't fail if the first disk fails.

Comment thread src/sbd-inquisitor.c Outdated
exit_status = watchdog_test();
} else if (strcmp(argv[optind], "watch") == 0) {
/* sleep $(sbd $SBD_DEVICE_ARGS dump | grep -m 1 msgwait | awk '{print $4}') 2>/dev/null */
sleep((unsigned long) delay);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sleep(0) is more than a NOP (something like sched_yield()) so we should probably keep it in the delay_start block.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix that, got mix up will the code move and all.

Comment thread src/sbd-inquisitor.c Outdated
exit_status = watchdog_info();
} else if (strcmp(argv[optind], "test-watchdog") == 0) {
exit_status = watchdog_test();
} else {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not even further to the end after the "watch"-section?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that make sense.

Comment thread src/sbd-inquisitor.c Outdated
if (strcmp(argv[optind], "watch") == 0) {
/* sleep $(sbd $SBD_DEVICE_ARGS dump | grep -m 1 msgwait | awk '{print $4}') 2>/dev/null */

const char *delay_source = "SBD_DELAY_START";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessarily source of delay at that point still can be SBD_DELAY_START, SBD_WATCHDOG_TIMEOUT * 2, default watchdog-timeout * 2, watchdog-timeout from cmdline

Copy link
Copy Markdown
Member Author

@epenchev epenchev Jul 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about that.
I see delay set only here after it has been set initially to 0.

value = get_env_option("SBD_DELAY_START");
        if(value) {
            delay_start = crm_is_true(value);

            if (!delay_start) {
                delay = crm_get_msec(value) / 1000;
                if (delay > 0) {
                    delay_start = true;
                }
            }
        }

After that point it has been set in the watch section.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. But if it is not set there the source is 'timeout_watchdog * 2' and that again can have different sources.
Tracking where certain settings are coming from - and logging e.g. in debug or even at higher level together with the value - is one of my projects.
Here debug had that implemented already half-way so I thought we should keep it.
So as to keep that here low effort we can go with 'watchdog-timeout * 2' for now in cases startup-delay is derived from watchdog timeout. We still can add tracking of where watchdog-timeout is coming from in a separate PR including the decision if with delay we still want to show with 'watchdog-timeout' or if we replace the string with the details.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarifications and for your patience to review my changes.
I will commit the relevant changes.

Comment thread src/sbd-inquisitor.c Outdated
if (delay <= 0) {
delay = get_first_msgwait(servants_leader);
}
if (delay_start && delay <= 0) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess this should be in the block with (disc_count > 0) as get_first_msgwait is doomed to fail otherwise.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep make sense.

Comment thread src/sbd-inquisitor.c Outdated
if (delay_start) {
if (disk_count == 0 && delay <= 0) {
delay = 2 * timeout_watchdog;
delay_source = "SBD_WATCHDOG_TIMEOUT * 2";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't know where watchdog-timeout is coming from - yet.
so we probably should stay with what delay_source already has.

@wenningerk
Copy link
Copy Markdown

test this please

@wenningerk
Copy link
Copy Markdown

format-string issues detected by ubuntu builders ...

@epenchev
Copy link
Copy Markdown
Member Author

format-string issues detected by ubuntu builders ...

Thanks for the hint, fixed.

@wenningerk
Copy link
Copy Markdown

test this please

@wenningerk
Copy link
Copy Markdown

Went through the final result again and I'm afraid the logic atm still has 2 holes.
1st guess we should initialize delay_source = delay?"SBD_DELAY_START":"" and just set it "watchdog-timeout * 2" below when we are actually doing that.
2nd there is still a possible way to fall through in disk-mode with a delay == 0.
Mentioned it above already iirc. One way would be bailing out and fail sbd-start but before doing that I guess we should try getting the timeout from one of the other disks if available (maybe in a later PR). And it would definitely be a significant change in behavior to what we have so far.
We should in any case issue a warning. Instead of doing a 'sleep(0)' it would probably be better to fallback to 'watchdog-timeout * 2'. If we decide for the latter that should be probably reflected in the name of the commit or split out in an 2nd one.
@gao-yan: What do you think about defaulting to 2 * watchdog-timeout in case we fail reading the disk here?

@wenningerk
Copy link
Copy Markdown

Maybe something like:
"Overhaul startup-delay
restructure code, default to 2 * watchdog-delay without disk or when waitmsg-timeout can't be read from 1st disk"

@gao-yan
Copy link
Copy Markdown
Member

gao-yan commented Jul 19, 2021

2nd there is still a possible way to fall through in disk-mode with a delay == 0.
Mentioned it above already iirc. One way would be bailing out and fail sbd-start but before doing that I guess we should try getting the timeout from one of the other disks if available (maybe in a later PR). And it would definitely be a significant change in behavior to what we have so far.

I think it already does so in get_first_msgwait(). The function actually means to "get_msgwait_from_the_first_available_disk" :-)

@gao-yan
Copy link
Copy Markdown
Member

gao-yan commented Jul 19, 2021

Maybe something like:
"Overhaul startup-delay
restructure code, default to 2 * watchdog-delay without disk or when waitmsg-timeout can't be read from 1st disk"

Sounds sensible and clear to me.

@epenchev
Copy link
Copy Markdown
Member Author

Maybe something like:
"Overhaul startup-delay
restructure code, default to 2 * watchdog-delay without disk or when waitmsg-timeout can't be read from 1st disk"

Yep I guess the check just before the sleep() if (disk_count == 0 && delay <= 0) can be changed to if (delay <= 0) and this way we can default to 2 * watchdog-delay in case of any disk errors.

@gao-yan
Copy link
Copy Markdown
Member

gao-yan commented Jul 19, 2021

Maybe something like:
"Overhaul startup-delay
restructure code, default to 2 * watchdog-delay without disk or when waitmsg-timeout can't be read from 1st disk"

Yep I guess the check just before the sleep() if (disk_count == 0 && delay <= 0) can be changed to if (delay <= 0) and this way we can default to 2 * watchdog-delay in case of any disk errors.

The same impression here.

@wenningerk
Copy link
Copy Markdown

2nd there is still a possible way to fall through in disk-mode with a delay == 0.
Mentioned it above already iirc. One way would be bailing out and fail sbd-start but before doing that I guess we should try getting the timeout from one of the other disks if available (maybe in a later PR). And it would definitely be a significant change in behavior to what we have so far.

I think it already does so in get_first_msgwait(). The function actually means to "get_msgwait_from_the_first_available_disk" :-)

Shame on me - the loop is quite obvious ;-)
So we could consider bailing out. But defaulting to 2 * watchdog-delay is still not a bad choice I guess.

@wenningerk
Copy link
Copy Markdown

Maybe something like:
"Overhaul startup-delay
restructure code, default to 2 * watchdog-delay without disk or when waitmsg-timeout can't be read from 1st disk"

Yep I guess the check just before the sleep() if (disk_count == 0 && delay <= 0) can be changed to if (delay <= 0) and this way we can default to 2 * watchdog-delay in case of any disk errors.

The same impression here.

Yep that would create the correct logic with minimum code. But we should spare a comment to state that we catch disk-read-issues here as well to remember later on. When we detect the correct reading of msgwait above (delay > 0) we can still state in a warning otherwise that we default to 2 * watchdog-timeout as the final log is just a debug.

Comment thread src/sbd-inquisitor.c Outdated

if (delay > 0) {
delay_source = "msgwait";
cl_log(LOG_WARNING, "Using 'msgwait' value for 'delay' starting instead of default '2 * watchdog-timeout'");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if my unclear phrasing led to that ;-)
I guess we are in the branch here where everything runs as configured so no need for a warning.
I meant an else here that would warn about '2 * watchdog-timeout' being taken instead of unreadable 'msgwait'.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, actually this was my initial thought to. But I've understand the message a bit different.
Will fix that quick.

SBD sysconfig option `SBD_DELAY_START` is for addressing the case where cluster nodes reboot so fast even before fencing targeting it returns.
This could be common for virtual machines.
So far technically, a setting `SBD_DELAY_START=yes` takes effect only for sbd with shared disk, since the delay value is basically `msgwait` retrieved from sbd metadata.
For diskless sbd, for now we'd have to specify `SBD_DELAY_START` with a explicit delay value.
We need to implement to make `SBD_DELAY_START=yes` take effect for diskless sbd as well. In this case, the delay value should be `SBD_WATCHDOG_TIMEOUT * 2`
@wenningerk
Copy link
Copy Markdown

test this please

@wenningerk wenningerk merged commit f4ca41f into ClusterLabs:master Jul 20, 2021
@wenningerk
Copy link
Copy Markdown

Thank you for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants