fence_scsi use clusterwide nodeID instead of local nodelist ID of node #289
Conversation
Hi OndrejHome, there is one point to worry about. In older versions such as corosync 2.4.5, the nodeid in corosync.conf is not mandatory and will be created automatically if not specified.
In this case, this change makes fence_scsi unavailable. If you make this change, you should encourage users to set the nodeid in corosync.conf. (In the case of corosync 3.x, the nodeid in the nodelist of corosync.conf is mandatory, so I think there is no problem there.) Best Regards,
Hi @HideoYamauchi, thank you for the feedback. I can see now that when I omit the 'nodeid' on a CentOS 7.6 cluster (corosync 2.4.x), the nodeid is auto-generated.
In that case this change will really fail. So a good question is what to do in such a case; some ideas that come to my mind:
I will try to have a look at this during next week and update here with my findings. Ondrej (온드레이)
Wouldn't sorting the output from the nodelist help?
Hi @oalbrigt,
If I forget about backward compatibility for a moment, then most probably hashing the 'node name', in a similar way to how the 'cluster name' is already hashed, to make the first part of the SCSI key would make sense to me. It would work in both corosync 2.x and 3.x, as the 'node name' is something we ask the user to provide, so it matches. Eventually it wouldn't even need a lookup in the corosync-cmapctl nodelist output at all.
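The node-name hashing idea above could look roughly like this. This is a sketch, not fence_scsi's actual code; the 8-hex-character width and the helper name are assumptions for illustration:

```python
import hashlib

def node_key_part(node_name):
    # Hypothetical sketch: derive a stable per-node portion of the
    # SCSI key from an md5 of the node name, mirroring how the
    # cluster name is already hashed for the other half of the key.
    # The 8-hex-digit width here is an assumption.
    return hashlib.md5(node_name.encode("utf-8")).hexdigest()[:8]

# The value depends only on the name, so it survives nodelist
# reordering and node removal, unlike a position-based ID.
key_a = node_key_part("fastvm-fedora30-93")
key_b = node_key_part("fastvm-fedora30-94")
```

Since the hash is a pure function of the node name, every node computes the same key part without consulting the (possibly differently ordered) nodelist.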
If you simply sort the entire output (e.g. sorted() around fence-agents/agents/scsi/fence_scsi.py, line 195 in dd6382f)?
Hmm, I might be missing something, but how does 'sorted()' solve the issue?
Ah. This is kind of tricky. I guess the best way to do this might be to only return the ring0_addr values to an array (and sort it), and use the node's position in the array as the node ID.
I'll see if I can come up with a good way to do that.
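The sorted-array idea above could be sketched as follows (hypothetical helper name, and it assumes the ring0_addr values are unique within the cluster):

```python
def node_id_from_sorted_addrs(ring0_addrs, my_addr):
    # Sketch of the idea above: sort the ring0_addr values so every
    # node derives the same ordering regardless of how corosync.conf
    # happens to list them, then use the position in that sorted
    # list as the node ID.
    return sorted(ring0_addrs).index(my_addr)

# Two nodes whose corosync.conf files list the nodes in different
# orders still agree on the ID for "node-b":
id_on_node1 = node_id_from_sorted_addrs(["node-b", "node-a", "node-c"], "node-b")
id_on_node2 = node_id_from_sorted_addrs(["node-a", "node-c", "node-b"], "node-b")
```

Note that, as discussed below, positions still shift when a node is added or removed, which is why this variant would force all nodes to redo unfencing.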
I realized that doing it with arrays, all the nodes will have to redo unfencing for it to work correctly. Ondrej: is this an issue you've seen between nodes running the same version of a distro, and does it ever happen without manually editing corosync.conf?
@oalbrigt, I ran into this while preparing a training in which I wanted to explain how the SCSI key made by fence_scsi is generated. So no real impact from that. Editing corosync.conf was a simple PoC showing that using the IDs might be flawed. For a more "real-life" scenario check below: 3 node cluster

Remove any node except the last (I have removed the second node).

One of the issues that I overlook here is that the SCSI key from node2 that was removed is not removed from the device. But this is not important for this PR. Fence the last node

Expected outcome: Node 3 key

This is a little bit different example from the one that I opened this PR for, but the root cause is the same: using list numbering that can change over time instead of stable node IDs or something else that is stable. In the example here the fencing succeeds without providing the desired effect of preventing write access to the shared drive. The example here doesn't manually change anything and uses only … Please let me know what you think about this. I will be a bit busy this week because of upcoming holidays here, so I may respond in the second half of the week. Ondrej
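The root cause in the scenario above can be reduced to a few lines: a positional ID shifts when a node is removed from the list without restarting the whole cluster (node names here are illustrative):

```python
# Before removal: three nodes; node3 sits at list position 2.
nodes_before = ["node1", "node2", "node3"]

# After removing node2 (cluster not restarted), node3 shifts to
# position 1, so any key derived from the list position changes,
# and node3's previously registered key no longer matches.
nodes_after = ["node1", "node3"]

pos_before = nodes_before.index("node3")  # 2
pos_after = nodes_after.index("node3")    # 1
```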
We'll have to find a way that works for all scenarios, as there's no reason to "solve" it by just introducing a new edge case. It seems like we can't use hostnames, as the reservation key is limited to 8 bytes (from the sg_persist man page):
Also, half of that key is already used by a small chunk of the md5-ed cluster name (fence-agents/agents/scsi/fence_scsi.py, lines 206 to 207 in dd6382f).

So we have only 4 bytes left if we keep the format used so far. I wonder why this is used, and the only thing that comes to my mind is when storage is used by multiple clusters and somehow we can see multiple keys where I would not expect them to be seen. Otherwise it is rather a nice-to-have for telling which cluster the keys came from if they were generated by fence_scsi. A few rather theoretical questions come to my mind:

If node IDs cannot fit within 8 bytes, I can imagine having a table stored on all nodes that would map nodeid to SCSI key in a consistent way. But as this is rather stateful information that can potentially get out of date as the cluster lives, I would like to avoid it if possible.
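For concreteness, a sketch of the key layout being discussed: half identifies the cluster, half the node. The widths and helper name are assumptions for illustration, not fence_scsi's exact code:

```python
import hashlib

def make_reservation_key(cluster_name, node_part_hex):
    # Hypothetical sketch of the layout discussed above: the first
    # half of the key comes from an md5 of the cluster name, the
    # second half is the per-node portion (today a zero-padded node
    # ID; under the proposal, a chunk of a node-name hash).
    cluster_part = hashlib.md5(cluster_name.encode("utf-8")).hexdigest()[:4]
    return cluster_part + node_part_hex.zfill(4)

key = make_reservation_key("mycluster", "1")
```

Keeping the cluster-name prefix preserves the ability to tell which cluster a key came from when multiple clusters share the same storage.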
I guess hashing the nodename, combined with only using 4 bytes of the hash, might be the option least likely to create duplicates.
We'll have to implement this as a strongly recommended setting to avoid breaking rolling upgrades. So e.g.
Thank you, Oyvind, for the reply and ideas. I will have a look at this and update you during this week.
I'm sorry for the delay; I've been busier than expected. I will try to get to this soon.
Sounds good.
9799ad5 to da8e0a8 (compare)
This method generates the second part of the SCSI key based on a hash of the cluster node name instead of the currently used ID-based approach, which can break if nodes get removed from the cluster while the whole cluster is not restarted, because the IDs change. With the hash approach the hashes stay the same. Note that there is a theoretical risk that hashes could collide.
Hi Oyvind, so finally a new proposal implementing a new option.
Hi Oyvind, I have added the changes to address your review from the previous weeks. Once you have time, please let me know if they are OK or if you see any other changes that need to be done to move this forward. Thank you!
I added a comment to your last commit. Maybe you didn't see it.
Hmm, by 'last commit' do you mean 5810571? I don't see any comment on it. I have tried looking directly into the commit and checking the mail notifications, but nothing is showing up. Can you post the link to that comment here, please?
Oh. It seems like I had a lack of coffee that day, so it was stuck in "review" mode. I just added it as a comment as I intended to initially. |
Thanks for double-checking. I have added the change and it looks like it also passed the tests. In case there is anything more in need of changing, please let me know.
LGTM. Thanks.
Short story: 'corosync-cmapctl nodelist' output and local list IDs can be different on different nodes of the same cluster, while the "real" nodeID is always the same across the cluster. Therefore use the real "nodeID" for consistent behaviour.
Long story:
I have noticed that sometimes my test systems end up using the same key for fence_scsi without any particular reason. I wondered what the cause could be, so I tried re-ordering nodes in /etc/corosync/corosync.conf while preserving all the other values. For example, below are the nodelists from both nodes of a Fedora 30 cluster (the same happens on CentOS 7.6, but for brevity I will omit that config).

fastvm-fedora30-93 /etc/corosync/corosync.conf

fastvm-fedora30-94 /etc/corosync/corosync.conf
The above configs are "logically" the same, but just changing the ordering causes corosync-cmapctl nodelist to give different IDs in the list of nodes on different nodes, as seen below.

CentOS 7.6 - corosync-2.4.3-4.el7.x86_64
Fedora 30 - corosync-3.0.2-1.fc30.x86_64
In the end I think that when using the get_node_id function, the fence_scsi agent should really use the nodeID and not just an arbitrary ID that might be different depending on the node. I have no evidence at this time of the nodelist going bad during normal operation like adding/removing nodes, or of what else can affect the nodelist output. However, the above proof of concept should be sufficient to point out that a more stable nodeID should be used here.

The proposed code change here takes the ID and tries to match it to the 'nodeid' from corosync-cmapctl nodelist output for the given node, which fixes the problem when the above situation is introduced. For backward compatibility the detected nodeID is decreased by 1 to match the old behaviour, so upgrades should be seamless here (cluster node IDs start from 1, while the corosync-cmapctl nodelist listing starts from 0).

Tested on CentOS 7.6 and Fedora 30 two-node clusters.
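The lookup described above could be sketched like this. The output shape is assumed from typical `corosync-cmapctl` nodelist lines such as `nodelist.node.0.nodeid (u32) = 1`; this is an illustration of the approach, not the PR's exact code:

```python
import re

def clusterwide_node_id(node_name, cmap_output):
    # Map each nodelist index to its ring0_addr and configured nodeid,
    # then return nodeid - 1 for the matching node name, keeping the
    # old 0-based key format for backward compatibility.
    addrs, ids = {}, {}
    for line in cmap_output.splitlines():
        m = re.match(r"nodelist\.node\.(\d+)\.ring0_addr\b.* = (\S+)", line)
        if m:
            addrs[m.group(1)] = m.group(2)
        m = re.match(r"nodelist\.node\.(\d+)\.nodeid\b.* = (\d+)", line)
        if m:
            ids[m.group(1)] = int(m.group(2))
    for idx, addr in addrs.items():
        if addr == node_name and idx in ids:
            return ids[idx] - 1
    return None

# Assumed sample output; the nodeid stays attached to its node even
# if another node's corosync.conf lists the entries in another order.
sample = """nodelist.node.0.nodeid (u32) = 1
nodelist.node.0.ring0_addr (str) = fastvm-fedora30-93
nodelist.node.1.nodeid (u32) = 2
nodelist.node.1.ring0_addr (str) = fastvm-fedora30-94"""
```

Because the nodeid is read from the configuration rather than derived from the list position, reordering the nodelist entries no longer changes the result.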