node left cluster, caused full cluster fence, nodes could not rejoin after reboot #705

Open
mvernimmen opened this issue Sep 16, 2022 · 2 comments

Comments

@mvernimmen

Hi,

We've recently been having issues with one of our Proxmox clusters. After investigating, the problem we are experiencing seems to be related to corosync behavior. This specific cluster has 28 nodes and can run stably for months. Yesterday we powered off one of the servers in the cluster. Immediately afterwards, all nodes got fenced and rebooted.
After the reboot, the nodes either did not rejoin the cluster or did not rejoin properly.
Restarting corosync on several machines did not change the situation, not even when all but a few machines were powered down. The only way out of this situation was to power down all machines and power them up again one by one.

I'm not a corosync expert by any stretch of the imagination. I have uploaded some logs (https://forum.proxmox.com/attachments/archive-zip.41145/) that I hope will enable someone to identify why all nodes got fenced, and hopefully also why the nodes would not rejoin.
This has happened several times recently and I expect it to happen a few more times. If there are any commands I can run to gain additional insight, I'd be happy to run them during such an event.

I know this is not a very clear bug report and I'm very sorry for that. I'm hoping for some input to further this investigation and drill down to a clear root cause.

best regards,

Max

@jfriesse
Member

Hi,
I'm wondering if this could be similar to #701. In any case, @Fabian-Gruenbichler (and the other Proxmox guys) will probably be able to identify the problem much better than I can, so I will let them share their view.

