Collaborator

@liangxin1300 liangxin1300 commented Mar 27, 2025

This PR supports configuring the crashdump watchdog timeout
by using crm sbd configure crashdump-watchdog-timeout=<timeout>,
for both disk-based and diskless SBD.

  • disk-based SBD case
# crm sbd configure crashdump-watchdog-timeout=60
WARNING: Kdump service is not active on alp-1
WARNING: Kdump service is not active on alp-2
INFO: Set crashdump option for fence_sbd resource
INFO: Set msgwait-timeout to 2*watchdog-timeout + crashdump-watchdog-timeout: 90
INFO: Configuring disk-based SBD
INFO: Initializing SBD device /dev/sda8
INFO: Update SBD_TIMEOUT_ACTION in /etc/sysconfig/sbd: flush,crashdump
INFO: Update SBD_OPTS in /etc/sysconfig/sbd: -C 60
INFO: Already synced /etc/sysconfig/sbd to all nodes
WARNING: Resource is running, need to restart cluster service manually on each node
INFO: Update SBD_DELAY_START in /etc/sysconfig/sbd: 131
INFO: Already synced /etc/sysconfig/sbd to all nodes
WARNING: "stonith-timeout" in crm_config is set to 155, it was 83
  • diskless SBD case
# crm sbd configure crashdump-watchdog-timeout=60
WARNING: Kdump service is not active on alp-1
WARNING: Kdump service is not active on alp-2
INFO: Set stonith-watchdog-timeout to SBD_WATCHDOG_TIMEOUT + crashdump-watchdog-timeout: 75
INFO: Configuring diskless SBD
WARNING: Diskless SBD requires cluster with three or more nodes. If you want to use diskless SBD for 2-node cluster, should be combined with QDevice.
INFO: Update SBD_TIMEOUT_ACTION in /etc/sysconfig/sbd: flush,crashdump
INFO: Update SBD_OPTS in /etc/sysconfig/sbd: -C 60 -Z
INFO: Already synced /etc/sysconfig/sbd to all nodes
INFO: Restarting cluster service
INFO: BEGIN Waiting for cluster
...........
INFO: END Waiting for cluster
WARNING: "stonith-watchdog-timeout" in crm_config is set to 75, it was -1
WARNING: "stonith-timeout" in crm_config is set to 101, it was 71
  • To clean up crashdump-related options and configuration
# crm sbd purge crashdump 
INFO: Delete crashdump option for fence_sbd resource
INFO: Delete SBD_TIMEOUT_ACTION: flush,crashdump and restore original value
INFO: Update SBD_OPTS in /etc/sysconfig/sbd: 
INFO: Already synced /etc/sysconfig/sbd to all nodes
WARNING: Resource is running, need to restart cluster service manually on each node
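
The two timeout formulas printed in the logs above can be checked with a small sketch. The function names are hypothetical and are not taken from crmsh; the watchdog timeout of 15 seconds is only inferred from the logged results (90 and 75 with a crashdump watchdog timeout of 60). The derived SBD_DELAY_START and stonith-timeout values are left out because their formulas are not shown here.

```python
def msgwait_timeout(watchdog_timeout: int, crashdump_watchdog_timeout: int) -> int:
    """Disk-based SBD: 2*watchdog-timeout + crashdump-watchdog-timeout (as logged)."""
    return 2 * watchdog_timeout + crashdump_watchdog_timeout


def stonith_watchdog_timeout(sbd_watchdog_timeout: int, crashdump_watchdog_timeout: int) -> int:
    """Diskless SBD: SBD_WATCHDOG_TIMEOUT + crashdump-watchdog-timeout (as logged)."""
    return sbd_watchdog_timeout + crashdump_watchdog_timeout


# Assumption: the watchdog timeout is 15 in both runs, inferred from the log output.
WATCHDOG_TIMEOUT = 15
CRASHDUMP_WATCHDOG_TIMEOUT = 60

disk_based = msgwait_timeout(WATCHDOG_TIMEOUT, CRASHDUMP_WATCHDOG_TIMEOUT)
diskless = stonith_watchdog_timeout(WATCHDOG_TIMEOUT, CRASHDUMP_WATCHDOG_TIMEOUT)
print(disk_based, diskless)  # 90 75
```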

@liangxin1300 liangxin1300 force-pushed the 20250328_crashdump_option branch from d1be2d5 to d0edb67 Compare March 31, 2025 13:52
@liangxin1300 liangxin1300 force-pushed the 20250328_crashdump_option branch 10 times, most recently from f377478 to e3f1f86 Compare April 11, 2025 02:21
@liangxin1300 liangxin1300 marked this pull request as ready for review April 11, 2025 02:22
@liangxin1300 liangxin1300 force-pushed the 20250328_crashdump_option branch from e3f1f86 to e218c75 Compare April 11, 2025 03:23
))


def has_primitive(cib: lxml.etree.Element, ra: ResourceAgent) -> list[str]:
Collaborator

The name has_primitive suggests that the function returns a boolean.

Suggested change
- def has_primitive(cib: lxml.etree.Element, ra: ResourceAgent) -> list[str]:
+ def get_primitives_with_ra(cib: lxml.etree.Element, ra: ResourceAgent) -> list[str]:

Comment on lines 43 to 45
return [e.get('id') for e in cib.xpath(
f'/cib/configuration/resources//primitive[@class="{ra.m_class}"{provider_condition} and @type="{ra.m_type}"]'
)]
Collaborator

Suggested change
- return [e.get('id') for e in cib.xpath(
- f'/cib/configuration/resources//primitive[@class="{ra.m_class}"{provider_condition} and @type="{ra.m_type}"]'
- )]
+ return cib.xpath(
+ f'/cib/configuration/resources//primitive[@class="{ra.m_class}"{provider_condition} and @type="{ra.m_type}"]/@id',
+ )

Comment on lines 50 to 53
e.get('value') for e in cib.xpath(
f'/cib/configuration/resources//primitive[@id="{res_id}"]'
f'/instance_attributes/nvpair[@name="{param_name}"]'
)
Collaborator

You can select the attribute directly by ending the XPath expression with .../@value.
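
A minimal sketch of the attribute-selection idea from the two review comments above, using lxml. The CIB fragment, the resource id `stonith-sbd` and the `crashdump` value of `1` are made up for illustration and do not come from this PR.

```python
from lxml import etree

# Hypothetical CIB fragment, for illustration only.
CIB_XML = b"""<cib>
  <configuration>
    <resources>
      <primitive id="stonith-sbd" class="stonith" type="fence_sbd">
        <instance_attributes id="stonith-sbd-ia">
          <nvpair id="stonith-sbd-ia-crashdump" name="crashdump" value="1"/>
        </instance_attributes>
      </primitive>
      <primitive id="vip" class="ocf" provider="heartbeat" type="IPaddr2"/>
    </resources>
  </configuration>
</cib>"""

cib = etree.fromstring(CIB_XML)
base = '/cib/configuration/resources//primitive[@class="stonith" and @type="fence_sbd"]'

# Selecting elements and reading the attribute in Python.
ids_via_get = [e.get('id') for e in cib.xpath(base)]

# Selecting the attribute in XPath; lxml returns a list of str-like values.
ids_via_attr = cib.xpath(base + '/@id')

# The same idea applied to an nvpair value.
values = cib.xpath(
    '/cib/configuration/resources//primitive[@id="stonith-sbd"]'
    '/instance_attributes/nvpair[@name="crashdump"]/@value'
)
```

Both forms yield the same ids here; the XPath form simply avoids the list comprehension.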

crmsh/ui_sbd.py Outdated
PARSE_RE = re.compile(
# Match keys with non-empty values, capturing possible suffix
- r'(\w+)(?:-(\w+))?=("[^"]+"|[\w/\d;]+)'
+ r'([\w-]+)-([\w-]+)=([\w/\d]+)'
Collaborator

The comment and the code disagree here.

Collaborator Author

Updated to:

    PARSE_RE = re.compile(
        # To extract key, suffix and value from these possible arguments:
        # watchdog-timeout=30
        # crashdump-watchdog-timeout=120
        # watchdog-device=/dev/watchdog
        r'([\w-]+)-([\w]+)=([\w/]+)'
    )

I also added a unit test for it.
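
For reference, a small sketch of what the updated pattern captures for the three arguments listed in its comment. The helper `parse_arg` and the use of `fullmatch` are assumptions made for this example, not necessarily how crmsh applies the pattern.

```python
import re

# Pattern copied from the updated code above.
PARSE_RE = re.compile(r'([\w-]+)-([\w]+)=([\w/]+)')


def parse_arg(arg: str):
    """Hypothetical helper: return (key, suffix, value), or None if arg does not match."""
    m = PARSE_RE.fullmatch(arg)
    return m.groups() if m else None


print(parse_arg("watchdog-timeout=30"))             # ('watchdog', 'timeout', '30')
print(parse_arg("crashdump-watchdog-timeout=120"))  # ('crashdump-watchdog', 'timeout', '120')
print(parse_arg("watchdog-device=/dev/watchdog"))   # ('watchdog', 'device', '/dev/watchdog')
```

Because the first group is greedy, the split happens at the last hyphen before the `=`, so `crashdump-watchdog` stays together as the key. An argument without a hyphen in its name does not match at all.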

@nicholasyang2022 nicholasyang2022 self-requested a review April 11, 2025 06:40
@nicholasyang2022 nicholasyang2022 dismissed their stale review April 11, 2025 06:45

Approval is unintentional.

@liangxin1300 liangxin1300 force-pushed the 20250328_crashdump_option branch 8 times, most recently from be8789d to 187d48c Compare April 15, 2025 02:23
This commit supports configuring the crashdump watchdog timeout
by using `crm sbd configure crashdump-watchdog-timeout=<timeout>`,
for both disk-based and diskless SBD.
@liangxin1300 liangxin1300 force-pushed the 20250328_crashdump_option branch from 187d48c to 48e37ad Compare April 15, 2025 02:56
Skip configuration if the value is unchanged.
Also load all attributes of the SBD UI class before running any sbd
subcommand, to ensure they are all up to date.
When no crashdump watchdog timeout is configured, return False.
When a crashdump watchdog timeout is specified and differs from the
previous value, return True.
Likewise, when a watchdog timeout is specified and differs from the
previous value, return True.
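
One possible reading of the rules in this commit message, as a sketch. The function name and parameters are hypothetical, and "no crashdump watchdog timeout configured" is interpreted here as "neither requested on the command line nor present in the existing configuration", which is an assumption.

```python
from typing import Optional


def should_configure_crashdump(
    requested_crashdump: Optional[int],
    configured_crashdump: Optional[int],
    requested_watchdog: Optional[int],
    configured_watchdog: Optional[int],
) -> bool:
    """Hypothetical sketch of the decision described in the commit message."""
    # Assumption: crashdump is not in play at all, so there is nothing to configure.
    if requested_crashdump is None and configured_crashdump is None:
        return False
    # A new crashdump watchdog timeout that differs from the previous value.
    crashdump_changed = (
        requested_crashdump is not None and requested_crashdump != configured_crashdump
    )
    # A new watchdog timeout that differs from the previous value.
    watchdog_changed = (
        requested_watchdog is not None and requested_watchdog != configured_watchdog
    )
    return crashdump_changed or watchdog_changed
```

With this reading, repeating the same `crashdump-watchdog-timeout` value returns False, which matches the "skip if the value is the same" behaviour described earlier.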
@liangxin1300 liangxin1300 force-pushed the 20250328_crashdump_option branch 11 times, most recently from 40f10f4 to afb7d99 Compare April 17, 2025 06:57
@liangxin1300 liangxin1300 force-pushed the 20250328_crashdump_option branch from afb7d99 to 964c262 Compare April 17, 2025 07:20
Contributor

@zzhou1 zzhou1 left a comment

Great work!

@liangxin1300 liangxin1300 merged commit 367bbfa into ClusterLabs:master Apr 17, 2025
32 checks passed

3 participants