It's not possible to rebuild a cluster after node failure #350

Open
john-terrell opened this issue May 22, 2024 · 0 comments

Issue report

I am testing MicroCeph on a three-node cluster. After removing one node (to simulate a failure) and rebuilding it, the rebuilt node cannot rejoin the cluster. There is no way to remove the failed node's OSDs, because microceph disk remove attempts to contact the failed node. Without removing those OSDs, the failed node itself cannot be removed from the cluster (using microceph cluster remove).

What version of MicroCeph are you using?

18.2.0+snap71f71782c5

What are the steps to reproduce this issue?

  1. Install MicroCeph on three nodes.
  2. Remove one of the nodes to simulate a node failure.
  3. Try to remove the failed node from MicroCeph. This is not possible, because removing its OSDs tries to contact the failed node.

What happens (observed behaviour)?

The rebuilt node cannot rejoin the cluster, because MicroCeph thinks the node already exists.

What were you expecting to happen?

Relevant logs, error output, etc.

If it’s considerably long, please paste to https://gist.github.com/ and insert the link here.

Additional comments.
