fix ifconfig down in udpu #262

Closed: liu4480 wants to merge 1 commit into base: needle

@liu4480
Contributor

liu4480 commented Oct 30, 2017

I have a two-node cluster: node1 and node2.

On node1, I only run corosync-quorumtool to watch the membership status while I run ifconfig eth0 down/up on node2. Everything I see on node1 is as expected.

On node2, corosync works as expected while the interface is up. But when I execute ifconfig eth0 down, something interesting happens:

Quorum information
------------------
Date:             Thu Oct 19 11:30:24 2017
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          172204675
Ring ID:          32
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate WaitForAll
Unable to get node address for nodeid 172204674: CS_ERR_NOT_EXIST

Membership information
----------------------
    Nodeid      Votes Name
    172204674          1 (local)
    172204675          1 bliu-sle12sp3-node1

This patch fixes the membership on node2 after ifconfig ethX down/up for udpu.
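One visible inconsistency in the output above is that the "Node ID" field (172204675) does not match the nodeid on the row marked "(local)" (172204674). The following is a minimal Python sketch for spotting that mismatch automatically. The function names `parse_quorumtool` and `local_id_mismatch` are hypothetical helpers, not part of corosync, and the parsing assumes the text layout shown in this report; output from other corosync-quorumtool versions may differ.

```python
import re

# Sample text copied from the report above (node2 after "ifconfig eth0 down").
SAMPLE = """\
Quorum information
------------------
Date:             Thu Oct 19 11:30:24 2017
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          172204675
Ring ID:          32
Quorate:          Yes

Membership information
----------------------
    Nodeid      Votes Name
    172204674          1 (local)
    172204675          1 bliu-sle12sp3-node1
"""


def parse_quorumtool(text):
    """Return (node_id, members) where members is a list of (nodeid, votes, name)."""
    node_id = None
    members = []
    in_membership = False
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Node ID:\s+(\d+)", line)
        if m:
            node_id = int(m.group(1))
            continue
        if line.startswith("Membership information"):
            in_membership = True
            continue
        if in_membership:
            # Rows look like: "<nodeid> <votes> <name>"; header/dash lines do not match.
            m = re.match(r"(\d+)\s+(\d+)\s+(.+)", line)
            if m:
                members.append((int(m.group(1)), int(m.group(2)), m.group(3).strip()))
    return node_id, members


def local_id_mismatch(text):
    """True if the row marked (local) carries a nodeid different from "Node ID"."""
    node_id, members = parse_quorumtool(text)
    local = [nid for nid, _votes, name in members if "(local)" in name]
    return bool(local) and local[0] != node_id


if __name__ == "__main__":
    print(local_id_mismatch(SAMPLE))
```

In practice the text would come from running corosync-quorumtool on the node under test; the embedded sample only keeps the sketch self-contained.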

Bin Liu
fix ifconfig down in udpu
I have a two-node cluster with udpu: node1 and node2.

On node1, I only run corosync-quorumtool to watch the membership status
while I run ifconfig eth0 down/up on node2. Everything I see on node1
is as expected.

On node2, corosync works as expected while the interface is up.
But when I execute ifconfig eth0 down, something interesting happens:
Quorum information
------------------
Date:             Thu Oct 19 11:30:24 2017
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          172204675
Ring ID:          32
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate WaitForAll
Unable to get node address for nodeid 172204674: CS_ERR_NOT_EXIST

Membership information
----------------------
    Nodeid      Votes Name
    172204674          1 (local)
    172204675          1 bliu-sle12sp3-node1

This patch will fix the membership on node2 upon ifconfig ethX down/up
with udpu.
@jfriesse

Member

jfriesse commented Oct 30, 2017

@liu4480 Just to be sure: when the patch is applied, does the message

 Unable to get node address for nodeid 172204674: CS_ERR_NOT_EXIST

still appear? If not, what exactly was causing it (or which part of your patch fixes it)?

@jfriesse

Member

jfriesse commented Jul 10, 2018

@liu4480 I totally forgot to merge this patch. Do you still have it somewhere, by any chance?

@knet-ci-bot


knet-ci-bot commented Jul 10, 2018

Can one of the admins verify this patch?

@edwintorok

Contributor

edwintorok commented Jul 10, 2018

The patch is still visible if you click on the commit, and you can get a raw patch from it: https://github.com/corosync/corosync/commit/d87901d7615769e00fe086c5005892e0977c7aef.patch

@jfriesse

Member

jfriesse commented Jul 12, 2018

@edwintorok Thank you for the link! I've retested the patch and it looks good. It never sends localhost (127.0.0.1), and it allows IPC clients to work after ifdown.

So I decided to rephrase the commit message and merge/forward-port it as 96b4bd1 / 96354fb.

Just for the record.

  • During my testing I found a small issue: runtime.totem.pg.mrp.srp.members.X.status does not reflect the other nodes' status after ifdown. This problem does not happen on master, so we probably fixed it as a side effect of fixing something unrelated. CPG membership is updated correctly on both master and needle, so I don't consider it a blocker.
  • RRP works, but after both links go down (ifdown) and then at least one comes back up, membership is not created and corosync keeps looping in the gather state.
  • For anyone tempted to say "hooray, we will test failures using ifdown": don't do it. ifdown is, and will remain, one of the prohibited things. It doesn't test a real failure.

@liu4480 Sorry for the delay, and thank you for this nice patch.

@liu4480

Contributor

liu4480 commented Jul 14, 2018

@jfriesse Sorry for the late reply, and thanks for merging :)
