Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev: corosync: add subcommands 'crm corosync link ...' for managing multi-links in knet (jsc#PED-8083) #1471

Open
wants to merge 27 commits into
base: master
Choose a base branch
from

Conversation

nicholasyang2022
Copy link
Collaborator

@nicholasyang2022 nicholasyang2022 commented Jun 24, 2024

Knet transport in corosync provides flexible configurations of multi-link connectivity. This pull request adds some subcommands to manage these configurations, including:

  1. crm corosync link show to list configured links
  2. crm corosync link update <linknumber> [<nodename>=<new_address> ... ] [options <option>=[value] ...] to modify an existing link
  3. crm corosync link add <nodename>=<new_address> ... [options <option>=[value] ...] to add a new link.
  4. crm corosync link remove <linknumber> to remove an existing link.

These subcommands operate on local configuration file corosync.conf. To make the changes to take effects, the configuration file needs to be synchronized across the cluster with crm corosync push and corosync service needs to be restarted.

@nicholasyang2022 nicholasyang2022 changed the title Dev: corosync: add subcommands 'crm corosync link ...' for managing multi-links in knet Dev: corosync: add subcommands 'crm corosync link ...' for managing multi-links in knet (jsc#PED-8083) Jun 24, 2024
Copy link

codecov bot commented Jun 24, 2024

Codecov Report

Attention: Patch coverage is 93.42723% with 28 lines in your changes missing coverage. Please review.

Project coverage is 69.94%. Comparing base (2abacf2) to head (2c07507).

Files Patch % Lines
crmsh/ui_corosync.py 88.70% 20 Missing ⚠️
crmsh/corosync.py 96.80% 7 Missing ⚠️
crmsh/bootstrap.py 66.66% 1 Missing ⚠️
Additional details and impacted files
Flag Coverage Δ
integration 54.80% <88.02%> (+0.54%) ⬆️
unit 52.71% <68.30%> (+0.24%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
crmsh/corosync_config_format.py 100.00% <100.00%> (ø)
crmsh/iproute2.py 100.00% <100.00%> (ø)
crmsh/bootstrap.py 88.65% <66.66%> (-0.01%) ⬇️
crmsh/corosync.py 87.64% <96.80%> (+7.03%) ⬆️
crmsh/ui_corosync.py 74.18% <88.70%> (+24.18%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nicholasyang2022 nicholasyang2022 marked this pull request as ready for review June 27, 2024 06:26
@liangxin1300
Copy link
Collaborator

crm(live/alp-1)corosync link# show
Link 0:
  Options:

  Node addresses:
    Node 1: alp-1	192.168.122.189
    Node 2: alp-2	192.168.122.145

Link 1:
  Options:

  Node addresses:
    Node 1: alp-1	10.10.10.61
    Node 2: alp-2	10.10.10.62

crm(live/alp-1)corosync link# add alp-1=20.20.20.61 alp-2=10.10.10.62

10.10.10.62 is already used, it should raise an error here

@liangxin1300
Copy link
Collaborator

crm(live/alp-1)corosync link# add alp-1=10.10.10.61 alp-2=10.10.10.61
INFO: Use "crm corosync diff" to show the difference
INFO: Use "crm corosync push" to sync

Should complaint when assigning the same IP

@liangxin1300
Copy link
Collaborator

liangxin1300 commented Jun 28, 2024

  • It should raise an error when assigning not existing IPaddress when adding a link
  • No complaint for the invalid IP when adding a link
crm(live/alp-1)corosync link# add alp-1=20.20.20 alp-2=30.30.30
INFO: Use "crm corosync diff" to show the difference
INFO: Use "crm corosync push" to sync

@liangxin1300
Copy link
Collaborator

crm(live/alp-1)corosync link# show
Link 0:
  Options:

  Node addresses:
    Node 1: alp-1       192.168.122.189
    Node 2: alp-2       192.168.122.145

Link 1:
  Options:
    knet_transport:     sctp

  Node addresses:
    Node 1: alp-1       10.10.10.61
    Node 2: alp-2       10.10.10.62

I think the showing order should be Node addresses first, then Options, since:

  • all of your usages are adding addr first, then options
  • people might care addresses most, not the options, which mostly use the default ones

@liangxin1300
Copy link
Collaborator

After adding link twice:

nodelist {
        node {
                ring0_addr: 192.168.122.189
                name: alp-1
                nodeid: 1
                ring1_addr: 10.10.10.61
                ring2_addr: 20.20.20.61
        }

        node {
                ring0_addr: 192.168.122.145
                name: alp-2
                nodeid: 2
                ring1_addr: 10.10.10.62
                ring2_addr: 20.20.20.62
        }

}

Please put the ring 0-2 together

@liangxin1300
Copy link
Collaborator

crm(live/alp-1)corosync link# update 2 
INFO: Use "crm corosync diff" to show the difference
INFO: Use "crm corosync push" to sync

This case should raise an error and show the usage?

@nicholasyang2022
Copy link
Collaborator Author

crm(live/alp-1)corosync link# update 2 
INFO: Use "crm corosync diff" to show the difference
INFO: Use "crm corosync push" to sync

This case should raise an error and show the usage?

This is syntactically correct according to the usage. IMO it is better to exit successfully with a warning message.

@liangxin1300
Copy link
Collaborator

liangxin1300 commented Jul 8, 2024

crm(live/alp-1)corosync link# show

  Node addresses:
    Node 1: alp-1       192.168.122.68
    Node 2: alp-2       192.168.122.238
Link 0:
  Options:


  Node addresses:
    Node 1: alp-1       10.10.10.61
    Node 2: alp-2       10.10.10.62
Link 1:
  Options:

crm(live/alp-1)corosync link# 

Link <index> should be the first line

@liangxin1300
Copy link
Collaborator

crm(live/alp-1)corosync link# show

  Node addresses:
    Node 1: alp-1       192.168.122.68
    Node 2: alp-2       192.168.122.238
Link 0:
  Options:


  Node addresses:
    Node 1: alp-1       10.10.10.61
    Node 2: alp-2       10.10.10.62
Link 1:
  Options:

crm(live/alp-1)corosync link# remove 0
ERROR: corosync.link.remove: Cannot remove the last link.

When there are two or more links, remove the first link(index is 0) should be allowed
I've tested that corosync works well when there is no ring0_addr

nicholasyang2022 and others added 26 commits August 2, 2024 11:24
Not fully work. As the help system does not support subcommands at
3rd-level.

Apply suggestions from code review for docs

Co-authored-by: Roger Zhou <zzhou@suse.com>
…D-8083)

Both updating link options and adding/removing links are hot reloadable.
Updating the address of existing nodes is not hot reloadable.

Update crmsh/ui_corosync.py

Co-authored-by: Roger Zhou <zzhou@suse.com>

do not reload or restart if corosync.service is not running

Co-authored-by: Roger Zhou <zzhou@suse.com>
@nicholasyang2022
Copy link
Collaborator Author

We have some problem in the functional tests with corosync-cmapctl -m stats.

     Given Run "crm corosync link remove 1" OK on "hanode1"                                                                 # opt/crmsh/test/features/steps/step_implementation.py:101
ERROR:Step:Run "crm corosync link remove 1" OK on "hanode1":
ERROR: corosync.link.remove: Cannot remove link 1. blablabla

Here the whole stats.knet.node2.link0 section is missing from the stats:

hanode2:/ # corosync-cmapctl -m stats | grep connected
stats.knet.node1.link0.connected (u8) = 1
stats.knet.node1.link1.connected (u8) = 1
stats.knet.node2.link0.connected (u8) = 1

While corosync.log shows that link1 is working:

hanode2:/ # tail /var/log/cluster/corosync.log 
Aug 12 06:12:39 [1819] hanode2 corosync info    [TOTEM ] Configured link number 1: local addr: 172.18.0.3, port=5406
Aug 12 06:12:39 [1819] hanode2 corosync info    [KNET  ] host: host: 1 (passive) best link: 0 (pri: 10)
Aug 12 06:12:39 [1819] hanode2 corosync info    [KNET  ] host: host: 1 (passive) best link: 0 (pri: 10)
Aug 12 06:12:39 [1819] hanode2 corosync info    [KNET  ] host: host: 1 (passive) best link: 0 (pri: 10)
Aug 12 06:12:39 [1819] hanode2 corosync info    [KNET  ] pmtud: MTU manually set to: 0
Aug 12 06:12:42 [1819] hanode2 corosync info    [KNET  ] rx: host: 1 link: 1 is up
Aug 12 06:12:42 [1819] hanode2 corosync info    [KNET  ] link: Resetting MTU for link 1 because host 1 joined
Aug 12 06:12:42 [1819] hanode2 corosync info    [KNET  ] host: host: 1 (passive) best link: 1 (pri: 11)
Aug 12 06:12:42 [1819] hanode2 corosync info    [KNET  ] pmtud: PMTUD link change for host: 1 link: 1 from 485 to 1397
Aug 12 06:12:42 [1819] hanode2 corosync info    [KNET  ] pmtud: Global data MTU changed to: 1397

@zzhou1
Copy link
Contributor

zzhou1 commented Aug 14, 2024

We have some problem in the functional tests with corosync-cmapctl -m stats.

A guess work, maybe corosync link state is still transitioning, maybe need wait a little time for the stable state?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants