New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cephadm: allow redeploy of daemons in error state if container running #39385
Conversation
| (_, state, _) = check_unit(ctx, unit_name) | ||
| if state == 'running': | ||
| if state == 'running' or is_container_running(ctx, container_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_unit returns (enabled, state, installed). What about simplifying this to
(_, _, installed) = check_unit(ctx, unit_name)
if installed:My thinking is, independent of the state, we're redeploying the daemon, if it is already installed. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds right. I'm going to test this and if it works I'll swap to this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sebastian-philipp This seems like it should work in most situations. However, it is possible to be in a state where this check passes and it considers this a redeploy but the port needed is actually taken. For example, stop the node-exporter service on a host, use the nc tool to listen to node-exporter's port nc -l localhost 9100 and then go and run a ceph orch redeploy node-exporter command. The node-exporter service will fail due to the port being taken but the user will never get the error message specifying this is the issue because the port wasn't checked. By checking for the container being up I think we can be sure node-exporter is holding the port it needs and redeploy can be done. Now, all of this seems like a pretty unlikely scenario so I'm fine still going with this change if you think this is acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you think this is acceptable.
by now, you're the expert in this part of the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer checking the container. I think it's a more accurate view of whether we're actually redeploying or not.
|
Does it make sense to add |
Signed-off-by: Adam King <adking@redhat.com>
eac9979
to
71613e6
Compare
I want an additional change that makes sure all daemons were deployed by a mgr running the code that we're upgrading to at the end of an upgrade first. This PR helps deal with that issue or similar issues if they come up but I don't consider this a fix for the upgrade problem. |
Signed-off-by: Adam King adking@redhat.com
Right now any daemon that is in an error state, has its container running and has ports to check cannot be redeployed as the binary will still consider it an initial deployment and check the ports which will be in use by the container. In situations like
those described here https://tracker.ceph.com/issues/49013 where the error state is caused by a non-updated unit.run file a redeploy could fix the daemon. However, the redeploy would be blocked due to the ports being in use and the user would be required to manually go stop the services or fix the unit.run files. With this change, a situation like this could be fixed by simply redeploying the daemons using orchestrator commands instead.
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume tox