cephadm: allow ports to be opened in firewall during adoption, reconfig, redeploy #51070
5 failures:
Overall, failures should only block merging for the 2 PRs related to the staggered upgrade and cephadm version changes. Everything else should be fine.
jenkins test api
jenkins test dashboard cephadm
src/cephadm/cephadm.py
raise Error("TCP Port(s) '{}' required for {} already in use".format(','.join(map(str, ports)), daemon_type))
# only check port in use if not reconfig or redeploy since service
# we are redeploying/reconfiguring will already be using the port
if not reconfig and not redeploy:
I have a slight pet peeve about booleans used this way. Long story short: what does it mean if the caller passes both reconfig=True and redeploy=True? Right now they basically make this function act the same way, but what about the future? Since these are, IIRC, mutually exclusive states, the code should try to reflect that (as far as reasonably possible). Why not add an enum? Enums are already used in cephadm. Something like:
    class DeploymentKind(Enum):
        DEFAULT = 0
        RECONFIG = 1
        REDEPLOY = 2
    def deploy_daemon(..., kind: DeploymentKind = DeploymentKind.DEFAULT, ...)
This is food for thought, maybe for next time; not a hard requirement, BTW.
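To make the concern concrete, here is a minimal sketch of how the two styles compare. The `DeploymentKind` enum is the one suggested in the comment; the two helper functions are hypothetical and exist only for illustration, they are not cephadm's actual code.

```python
from enum import Enum


class DeploymentKind(Enum):
    DEFAULT = 0
    RECONFIG = 1
    REDEPLOY = 2


def check_ports_with_bools(reconfig: bool = False, redeploy: bool = False) -> bool:
    # Hypothetical helper mirroring the two-boolean style: nothing stops a
    # caller from passing reconfig=True and redeploy=True together, and the
    # function has no way to say what that combination should mean.
    return not reconfig and not redeploy


def check_ports_with_enum(kind: DeploymentKind = DeploymentKind.DEFAULT) -> bool:
    # Hypothetical helper in the enum style: exactly one kind is selected,
    # so the ambiguous "both at once" state cannot be expressed.
    return kind is DeploymentKind.DEFAULT
```

With the enum, a later change that needs reconfig and redeploy to behave differently only has to branch on `kind`; with two booleans it would first have to decide what the combined case means.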
+1 for this suggestion as these options are in fact mutually exclusive
added another commit that converts these bools to an enum class as recommended here
Failed/dead job reruns: https://pulpito.ceph.com/adking-2023-05-21_15:51:34-orch:cephadm-wip-adk-testing-2023-05-20-1214-distro-default-smithi/ Failures after reruns:
Overall, nothing to block merging outside of the ones affecting jaeger-tracing, staggered upgrade test, and cephadm version command
reruns of failed/dead jobs: https://pulpito.ceph.com/adking-2023-05-30_19:08:27-orch:cephadm-wip-adk-testing-2023-05-27-2231-distro-default-smithi/ After reruns, 1 failure:
jenkins test api
Prior to this patch we were discarding the provided ports on reconfig and redeploy in order to not fail thinking there was a port conflict with the instance of the daemon we were about to reconfig/redeploy. However, it's still desirable for us to make sure the firewall ports are open when we do a reconfig/redeploy, so this refactors the port handling approach to have it do that but still avoid checking for port conflicts. It also includes an update of the type signature of deploy_daemon to the py3 style. That wasn't needed for the change, but since I was adding an argument there I thought we might as well do it now.
Signed-off-by: Adam King <adking@redhat.com>
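The behavior this commit message describes can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: `prepare_ports`, `port_in_use`, and `open_ports` are hypothetical names, and `Error` is a local stand-in for the exception type seen in the reviewed snippet.

```python
from typing import Callable, List


class Error(Exception):
    """Local stand-in for the Error type raised in the reviewed snippet."""


def prepare_ports(
    ports: List[int],
    daemon_type: str,
    port_in_use: Callable[[int], bool],       # hypothetical port probe
    open_ports: Callable[[List[int]], None],  # hypothetical firewall hook
    reconfig: bool = False,
    redeploy: bool = False,
) -> None:
    if not ports:
        return
    # Only a fresh deployment checks for conflicts: on reconfig/redeploy the
    # daemon being replaced is expected to be holding these ports already.
    if not reconfig and not redeploy:
        busy = [p for p in ports if port_in_use(p)]
        if busy:
            raise Error("TCP Port(s) '{}' required for {} already in use".format(
                ','.join(map(str, busy)), daemon_type))
    # The ports are kept rather than discarded, so the firewall is opened
    # in every case, including reconfig and redeploy.
    open_ports(ports)


# Example: a redeploy with the port already bound still opens the firewall.
opened: List[int] = []
prepare_ports([9095], "prometheus", lambda p: True, opened.extend, redeploy=True)
```

Passing the probe and the firewall hook as callables is only a convenience for the sketch; it keeps the example runnable without touching a real firewall.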
Otherwise we risk the prometheus/alertmanager/grafana not functioning properly after adoption due to the necessary port in the firewall not being open.
Fixes: https://tracker.ceph.com/issues/59443
Signed-off-by: Adam King <adking@redhat.com>
Since the options are mutually exclusive, using an enum is preferable to having multiple bools to track each of them.
Signed-off-by: Adam King <adking@redhat.com>
LGTM
Reruns of failed/dead jobs (some also marked still "running" in the original run at the time of writing this, due to the original ansible to set up the node not completing): https://pulpito.ceph.com/adking-2023-06-14_12:47:06-orch:cephadm-wip-adk-testing-2023-06-13-1036-distro-default-smithi/ After reruns, 2 failures:
Nothing there to block merging
Fixes: https://tracker.ceph.com/issues/59443
Signed-off-by: Adam King <adking@redhat.com>
The commit messages go into a bit more detail on each of the two parts of this.