Adding rollback mechanism to handle bootstrap failures #51718
jenkins test make check
I'm still a little wary of having this auto-cleanup be opt-out rather than opt-in by default. Take this as my last official request to make this opt-in, at least initially, for at least one version. I will not harp on this subject, and this is the last I will pester you about it.
We haven't made a final decision yet. As you know, this is still in a proposal phase and all the options are still open, so the code can be adjusted easily if we need to. Let's wait for some more feedback from the community and, based on that, discuss it further at the upcoming weekly to decide whether we go with the opt-out or the opt-in alternative.
LGTM, we'll just have to be careful when backporting to swap the default value for the --cleanup-on-failure flag, as discussed in the weekly.
Fixes: https://tracker.ceph.com/issues/57016 Signed-off-by: Redouane Kachach <rkachach@redhat.com>
last force push is to squash all the commits
reruns of failed/dead jobs: https://pulpito.ceph.com/adking-2023-07-11_19:17:14-orch:cephadm-wip-adk-testing-2023-07-11-0946-distro-default-smithi/
After reruns, 1 failed, 1 dead job.
Overall, a pretty clean run. Nothing to block merging.
This PR introduces a simple mechanism to roll back cluster files and processes when a bootstrap fails. Two new mutually exclusive options are provided:
- --cleanup-on-failure: enables the rollback, so broken cluster files are deleted automatically
- --no-cleanup-on-failure: disables the rollback, keeping the broken cluster files

These options are provided to ease the backport of this new feature. In older releases the rollback mechanism will be disabled by default to keep the legacy behaviour (nothing is deleted); users who want the rollback there will need to pass the --cleanup-on-failure flag. In main the rollback feature will be enabled by default.

Fixes: https://tracker.ceph.com/issues/57016
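The flag pair and the per-branch default described above can be illustrated with a minimal sketch. This is not the code from this PR: `build_parser`, `run_bootstrap`, `cleanup_default` and `cluster_dir` are hypothetical names chosen for illustration, and the rollback is reduced to deleting one directory.

```python
import argparse
import shutil
from pathlib import Path
from typing import Callable


def build_parser(cleanup_default: bool = True) -> argparse.ArgumentParser:
    """Build a parser with two mutually exclusive cleanup flags.

    cleanup_default is a hypothetical knob: True would model main,
    False would model a backport that keeps the legacy behaviour.
    """
    parser = argparse.ArgumentParser(prog="bootstrap-sketch")
    group = parser.add_mutually_exclusive_group()
    # Both flags write to the same destination, so the last word wins
    # and argparse rejects passing the two together.
    group.add_argument("--cleanup-on-failure", dest="cleanup_on_failure",
                       action="store_true", default=cleanup_default,
                       help="delete cluster files if bootstrap fails")
    group.add_argument("--no-cleanup-on-failure", dest="cleanup_on_failure",
                       action="store_false", default=cleanup_default,
                       help="keep cluster files if bootstrap fails")
    return parser


def run_bootstrap(args: argparse.Namespace, cluster_dir: Path,
                  bootstrap: Callable[[Path], None]) -> None:
    """Run bootstrap and roll back cluster_dir on any failure.

    BaseException is caught so that Ctrl+C (KeyboardInterrupt) also
    triggers the rollback; the original error is always re-raised.
    """
    try:
        bootstrap(cluster_dir)
    except BaseException:
        if args.cleanup_on_failure:
            shutil.rmtree(cluster_dir, ignore_errors=True)
        raise
```

With this shape, swapping the default for a backport would only mean calling `build_parser(cleanup_default=False)`; the flags themselves stay the same on every branch.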
Known issue:
In case of a keyboard interruption (ctrl+c), the following message sometimes appears. It seems to be related to some asyncio object that is not being closed properly: