Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cephadm: allow ports to be opened in firewall during adoption, reconfig, redeploy #51070

Merged
merged 3 commits into from Jun 14, 2023

Conversation

adk3798
Copy link
Contributor

@adk3798 adk3798 commented Apr 13, 2023

Fixes: https://tracker.ceph.com/issues/59443

Signed-off-by: Adam King adking@redhat.com

The commit messages go a bit more into each of the two parts of this

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@adk3798 adk3798 requested a review from a team as a code owner April 13, 2023 18:35
src/cephadm/cephadm.py Outdated Show resolved Hide resolved
@adk3798
Copy link
Contributor Author

adk3798 commented May 3, 2023

https://pulpito.ceph.com/adking-2023-05-03_12:22:42-orch:cephadm-wip-adk-testing-2023-05-02-1906-distro-default-smithi/

5 failures:

  • 2 failures on staggered upgrade test. Combination of https://tracker.ceph.com/issues/58535 plus the attempted fix not quite working yet
  • 1 failure in the test_cephadm task. Related to a PR in the run changing the cephadm version command
  • 1 failure deploying jaegar-tracing. Still need to investigate this, but it's not new to this run, so shouldn't block merging.
  • 1 failure making an OSD in the mds upgrade sequence ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-5/: (5) Input/output error. These pop up occasionally in hte lab, most likely as a result of us using split nvme devices for OSDs. Nothing that would block merging.

Overall, failures should only block merging for the 2 PRs related to the staggered upgrade and cephadm version changes. Everything else should be fine.

@adk3798
Copy link
Contributor Author

adk3798 commented May 3, 2023

jenkins test api

@adk3798
Copy link
Contributor Author

adk3798 commented May 3, 2023

jenkins test dashboard cephadm

raise Error("TCP Port(s) '{}' required for {} already in use".format(','.join(map(str, ports)), daemon_type))
# only check port in use if not reconfig or redeploy since service
# we are redeploying/reconfiguring will already be using the port
if not reconfig and not redeploy:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight pet peeve about booleans used this way. Long story short: what does it mean if the caller passes both reconfig=True and redeploy=True ? Right now they basically have this function act in the same way but what about the future. Since these are IIRC mutually exclusive states, the code should try to reflect that (as reasonably) possible. Why not add a enum? Enums are already used in cephadm. Something like:

class DeploymentKind(Enum):
    DEFAULT = 0
    RECONFIG = 1
    REDEPLOY = 2

def deploy_daemon(..., kind: DeploymentKind = DeploymentKind.DEFAULT, ...)

This is food-for-thought, maybe next time not a hard requirement BTW.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this suggestion as these options are in fact mutually exclusive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added another commit that converts these bools to an enum class as recommended here

@adk3798
Copy link
Contributor Author

adk3798 commented May 21, 2023

https://pulpito.ceph.com/adking-2023-05-21_05:49:02-orch:cephadm-wip-adk-testing-2023-05-20-1214-distro-default-smithi/

Failed/dead job reruns: https://pulpito.ceph.com/adking-2023-05-21_15:51:34-orch:cephadm-wip-adk-testing-2023-05-20-1214-distro-default-smithi/

Failures after reruns:

Overall, nothing to block merging outside of the ones affecting jaeger-tracing, staggered upgrade test, and cephadm version command

@adk3798
Copy link
Contributor Author

adk3798 commented May 31, 2023

jenkins test api

src/cephadm/cephadm.py Outdated Show resolved Hide resolved
src/cephadm/cephadm.py Outdated Show resolved Hide resolved
src/cephadm/cephadm.py Show resolved Hide resolved
src/cephadm/cephadm.py Outdated Show resolved Hide resolved
src/cephadm/cephadm.py Outdated Show resolved Hide resolved
Prior to this patch we were discarding the provided
ports on reconfig and redeploy in order to not fail
thinking there was a port conflict with the instance
of the daemon we were about to reconfig/redeploy. However,
it's still desirable for us to make sure the firewall ports
are open when we do a reconfig/redpeloy, so this refactors
the port handling approach to have it do that but
still avoid checking for port conflicts. It also include
an update of the type signature of deploy_daemon
to the py3 style. That wasn't needed for the change
but since I was added an arugment there I thought we might
as well do it now.

Signed-off-by: Adam King <adking@redhat.com>
Otherwise we risk the prometheus/alertmanager/grafana
not functioning properly after adoption due to the necessary
port in the firewall not being open.

Fixes: https://tracker.ceph.com/issues/59443

Signed-off-by: Adam King <adking@redhat.com>
Since the options are mutually exclusive, using
an enum is preferable to having multiple bools
to track each of them

Signed-off-by: Adam King <adking@redhat.com>
Copy link
Contributor

@rkachach rkachach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@adk3798
Copy link
Contributor Author

adk3798 commented Jun 14, 2023

https://pulpito.ceph.com/adking-2023-06-13_16:39:13-orch:cephadm-wip-adk-testing-2023-06-13-1036-distro-default-smithi/

Reruns of failed/dead jobs (some also marked still "running" in the original run at the time of writing this, due to the original ansible to set up the node not completing): https://pulpito.ceph.com/adking-2023-06-14_12:47:06-orch:cephadm-wip-adk-testing-2023-06-13-1036-distro-default-smithi/

After reruns, 2 failures:

Nothing there to block merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants