Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr/cephadm: make SMB and NVMEoF upgrade last in staggered upgrade #57292

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

adk3798
Copy link
Contributor

@adk3798 adk3798 commented May 6, 2024

This needs to happen as some work on the NVMEoF side (still unmerged as of writing this) will make the NVMEoF daemon dependent on the mon. Prior to this patch, in a staggered upgrade, all daemons not using the ceph image were upgraded after the mgr since we typically only care about the default image changing or potential changes to how we handle our systemd units which only needs the mgr to be upgraded to be applied. This NVMEoF dependency on the mon changes this and we can no longer upgrade it directly after the mgr. This patch changes it so the NVMEoF daemon is instead upgraded after all ceph image daemons have been upgraded in a staggered upgrade scenario. Non-staggered upgrades are unaffected as the NVMEoF daemon was already upgraded near the end in that scenario. The SMB dameon has no reason it needs to be upgraded later, but it's in the (small) pool of daemons that don't use the ceph image and aren't for monitoring, so it's been affected by this as well.

NOTE: This is a bit of an ugly patch imo and shows that a refactoring of the upgrade code is likely required. Hopefully this patch is more of a stopgap until that larger effort can be made

Fixes: https://tracker.ceph.com/issues/65809

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

Copy link
Contributor

@phlogistonjohn phlogistonjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me.

# no ceph daemons need upgrade
dds = [d for d in self.mgr.cache.get_daemons_by_type(
daemon_type) if d.name() not in need_upgrade_names]
_, __, n2, ___ = self._detect_need_upgrade(dds, target_digests, target_image)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have gone for n2 = self._detect_need_upgrade(...)[2] but this is fine for now.

@adk3798
Copy link
Contributor Author

adk3798 commented May 10, 2024

jenkins test make check

@adk3798
Copy link
Contributor Author

adk3798 commented May 10, 2024

https://pulpito.ceph.com/adking-2024-05-09_03:09:40-orch:cephadm-wip-adk-testing-2024-05-08-1927-distro-default-smithi/

failures:

  • 21 in cluster log type failures that are still being cleaned up, known issue
  • 3 mds_upgrade_sequence failures, known issue
  • 1 test failed installing ceph-fuse Status code: 503 for https://mirrors.centos.org/metalink?repo=centos-baseos-9-stream&arch=x86_64&protocol=https,http, happens on occasion, nothing to block merging over

Overall, nothing unexpected in the run

This needs to happen as some work on the NVMEoF side (still unmerged
as of writing this) will make the NVMEoF daemon dependent on the mon.
Prior to this patch, in a staggered upgrade, all daemons not using the
ceph image were upgraded after the mgr since we typically only care
about the default image changing or potential changes to how we handle
our systemd units which only needs the mgr to be upgraded to be applied.
This NVMEoF dependency on the mon changes this and we can no longer
upgrade it directly after the mgr. This patch changes it so the NVMEoF
daemon is instead upgraded after all ceph image daemons have been
upgraded in a staggered upgrade scenario. Non-staggered upgrades
are unaffected as the NVMEoF daemon was already upgraded near the
end in that scenario. The SMB dameon has no reason it needs to be
upgraded later, but it's in the (small) pool of daemons that don't
use the ceph image and aren't for monitoring, so it's been affected
by this as well.

NOTE: This is a bit of an ugly patch imo and shows that a refactoring
of the upgrade code is likely required. Hopefully this patch is more
of a stopgap until that larger effort can be made

Fixes: https://tracker.ceph.com/issues/65809

Signed-off-by: Adam King <adking@redhat.com>
@adk3798 adk3798 force-pushed the staggered-upgrade-non-ceph-daemons branch from 45d9054 to 5e7a3c2 Compare May 10, 2024 16:20
@adk3798
Copy link
Contributor Author

adk3798 commented May 10, 2024

jenkins test api

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants