
mgr/cephadm: redeploy monitoring stack daemons if their image changes #40507

Closed
wants to merge 1 commit into from

Conversation

adk3798
Contributor

@adk3798 adk3798 commented Mar 30, 2021

Fixes: https://tracker.ceph.com/issues/50061

Signed-off-by: Adam King adking@redhat.com

The idea is to automatically redeploy the relevant monitoring stack daemon if the user changes the image we're using for it via 'ceph config set . . .'
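A minimal sketch of the idea, with illustrative names (the real cephadm code differs): each pass over the running daemons compares a monitoring daemon's running image against the one currently configured, and flags it for redeploy on mismatch.

```python
# Hypothetical sketch; MONITORING_STACK_TYPES mirrors the constant used
# in cephadm, the rest of the names are illustrative only.
MONITORING_STACK_TYPES = ['prometheus', 'grafana', 'alertmanager', 'node-exporter']

def find_stale_daemons(daemons, configured_image):
    """daemons: iterable of (name, daemon_type, running_image) tuples.
    configured_image: callable mapping daemon_type -> configured image.
    Returns names of monitoring daemons whose running image no longer
    matches the configured one (candidates for redeploy)."""
    stale = []
    for name, daemon_type, running_image in daemons:
        if daemon_type not in MONITORING_STACK_TYPES:
            continue  # only monitoring stack daemons are affected
        if running_image != configured_image(daemon_type):
            stale.append(name)
    return stale
```

For example, if prometheus is running `v2.18.0` but the config was just set to `v2.18.1`, that daemon would be returned and subsequently redeployed.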

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


…ner image config setting

Fixes: https://tracker.ceph.com/issues/50061

Signed-off-by: Adam King <adking@redhat.com>
@adk3798 adk3798 requested a review from a team as a code owner March 30, 2021 19:22
@liewegas
Member

I like the simplicity. IIUC we would update the default config value and as soon as the mgr is upgraded the monitoring daemons would redeploy.

The upgrade sequence would end up being:

  • ceph orch upgrade start ...
  • N-1 mgrs upgraded
  • mgr failover
  • monitoring daemons updated
  • N-1 mgrs redeployed
  • mgr failover again
  • last mgr redeployed
  • monitors upgraded
  • ...

It's an odd spot to put the monitoring upgrades in the sequence, but harmless.

One thing that might be a bit confusing: changing the container_image ceph option doesn't have any immediate impact unless you explicitly redeploy; you have to run ceph orch upgrade start to trigger a ceph upgrade. OTOH, just changing the mgr/cephadm/*_image option will immediately upgrade the non-ceph daemon.

@tchaikov
Contributor

jenkins test api

if not action and dd.daemon_type in MONITORING_STACK_TYPES and not self.mgr.spec_store[dd.service_name()].deleted:
    if dd.container_image_name != self.mgr._get_container_image(dd.name()):
        self.log.info('Redeploying %s. Daemon\'s current image does not match mgr/cephadm/container_image_%s config setting.'
                      % (dd.name(), dd.daemon_type))
Member


Maybe 'Redeploying %s (container image changed)...' % dd.name() so this is consistent with the other messages.

Contributor


nit, no need to use % for formatting the logging message, just

self.log.info('Redeploying %s. Daemon\'s current image does not match mgr/cephadm/container_image_%s config setting.',
              dd.name(), dd.daemon_type)
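To illustrate the nit: when the values are passed as arguments rather than pre-formatted with %, the logging module interpolates them only if the record is actually emitted, so the string is never built for filtered-out levels.

```python
import logging

log = logging.getLogger('example')
name, daemon_type = 'prometheus.host1', 'prometheus'

# Eager: the message string is built with % before log.info is even
# called, regardless of whether INFO records are filtered out.
log.info('Redeploying %s (image_%s changed)' % (name, daemon_type))

# Lazy: logging stores msg and args separately and interpolates only
# when the record is emitted; handlers/filters can also see the
# unformatted template.
log.info('Redeploying %s (image_%s changed)', name, daemon_type)
```

Both calls produce the same rendered message; only the point of interpolation differs.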

@sebastian-philipp
Contributor

I'm a bit concerned that we're introducing a special case for the monitoring daemons here.

This already contains the Monitoring daemons:

for daemon_type in CEPH_UPGRADE_ORDER:

On the other hand, the upgrade logic actually is odd, as it does not follow the declarative approach.

Imagine the upgrade worked like this, for each daemon:

  1. set the container_image config
  2. call _check_daemons()

Which would be great and make cephadm even more declarative.
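The two-step flow above can be sketched as follows (hypothetical names throughout; `reconcile` stands in for cephadm's `_check_daemons()`, and the dict-based config and daemon state are illustrative only):

```python
def reconcile(daemons, config):
    """Return (name, desired_image) for every daemon whose running
    image differs from the configured one; a real _check_daemons()
    pass would redeploy these."""
    actions = []
    for d in daemons:
        desired = config.get('container_image_%s' % d['type'])
        if desired and d['image'] != desired:
            actions.append((d['name'], desired))
    return actions

def declarative_upgrade(upgrade_order, config, daemons, new_image):
    for daemon_type in upgrade_order:
        # 1. set the container_image config for this daemon type
        config['container_image_%s' % daemon_type] = new_image
        # 2. reconcile: redeploy any daemon that no longer matches
        for name, image in reconcile(daemons, config):
            d = next(d for d in daemons if d['name'] == name)
            d['image'] = image  # stand-in for an actual redeploy
```

The point of the sketch is that there is no separate upgrade state machine: upgrading is just changing desired state and letting the normal reconcile pass converge on it.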

I see two good options here:

Either we opt for making cephadm more declarative by redeploying any daemon whose container_image does not match (not just monitoring, but all of them).

Or we opt for keeping the current behavior and the existing upgrade logic.

@stale

stale bot commented Jul 21, 2021

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Jul 21, 2021
@adk3798 adk3798 closed this Jul 1, 2022