Merge pull request #45786 from adk3798/staggered-upgrade
mgr/cephadm: staggered upgrade

Reviewed-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
Reviewed-by: Redouane Kachach <rkachach@redhat.com>
adk3798 committed May 20, 2022
2 parents c6e5724 + 6a68def commit 54cdc1d
Showing 12 changed files with 931 additions and 261 deletions.
97 changes: 97 additions & 0 deletions doc/cephadm/upgrade.rst
@@ -188,3 +188,100 @@ you need. For example, the following command upgrades to a development build:
ceph orch upgrade start --image quay.io/ceph-ci/ceph:recent-git-branch-name

For more information about available container images, see :ref:`containers`.

Staggered Upgrade
=================

Some users may prefer to upgrade components in phases rather than all at once.
Starting in 16.2.10 and 17.2.1, the upgrade command accepts parameters that
limit which daemons are upgraded by a single upgrade command. The options
include ``daemon_types``, ``services``, ``hosts`` and ``limit``. ``daemon_types``
takes a comma-separated list of daemon types and upgrades only daemons of those
types. ``services`` is mutually exclusive with ``daemon_types``, takes services
of only one type at a time (e.g. you can't provide an OSD and an RGW service at the same time), and
upgrades only daemons belonging to those services. ``hosts`` can be combined
with ``daemon_types`` or ``services`` or provided on its own. The ``hosts`` parameter
follows the same format as the command line options for :ref:`orchestrator-cli-placement-spec`.
``limit`` takes an integer greater than 0 and places a numerical cap on the number of
daemons cephadm will upgrade. ``limit`` can be combined with any of the other
parameters. For example, if you specify that daemons of type osd on host
Host1 should be upgraded with ``limit`` set to 3, cephadm will upgrade (up to) 3 osd daemons on
Host1.
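
That scenario, expressed as a single command (``Host1`` and ``<image-name>`` are placeholders for your host name and target image):

.. prompt:: bash #

  ceph orch upgrade start --image <image-name> --daemon-types osd --hosts Host1 --limit 3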

Example: specifying daemon types and hosts:

.. prompt:: bash #

ceph orch upgrade start --image <image-name> --daemon-types mgr,mon --hosts host1,host2

Example: specifying services and using limit:

.. prompt:: bash #

ceph orch upgrade start --image <image-name> --services rgw.example1,rgw.example2 --limit 2

.. note::

Cephadm strictly enforces an order for the upgrade of daemons, and this order still
applies in staggered upgrade scenarios. The current upgrade ordering is
``mgr -> mon -> crash -> osd -> mds -> rgw -> rbd-mirror -> cephfs-mirror -> iscsi -> nfs``.
If you specify parameters that would upgrade daemons out of order, the upgrade
command will block and note which daemons will be missed if you proceed.

.. note::

Upgrade commands with limiting parameters will validate the options before beginning the
upgrade, which may require pulling the new container image. Do not be surprised
if the upgrade start command takes a while to return when limiting parameters are provided.

.. note::

In staggered upgrade scenarios (when a limiting parameter is provided), monitoring
stack daemons, including Prometheus and node-exporter, are refreshed after the Manager
daemons have been upgraded. Do not be surprised if the Manager upgrade therefore takes
longer than expected. Note that the versions of monitoring stack daemons may not change
between Ceph releases, in which case they are only redeployed.

Upgrading to a version that supports staggered upgrade from one that doesn't
----------------------------------------------------------------------------

When upgrading from a version that already supports staggered upgrades, the process
simply requires providing the necessary arguments. However, if you wish to upgrade
to a version that supports staggered upgrades from one that does not, there is a
workaround: first manually upgrade the Manager daemons, and then pass the limiting
parameters as usual.

.. warning::
Make sure you have multiple running mgr daemons before attempting this procedure.

To start, determine which Manager daemon is the active one and which are standby. This
can be done in a variety of ways, such as by looking at the ``ceph -s`` output. Then,
manually upgrade each standby mgr daemon with:

.. prompt:: bash #

ceph orch daemon redeploy mgr.example1.abcdef --image <new-image-name>
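
If you prefer to script this step, the standby Manager names can be read from ``ceph mgr dump``. The following one-liner is a sketch of that approach (it assumes ``jq`` is installed and mirrors what the QA test added alongside this feature does):

.. prompt:: bash #

  for m in $(ceph mgr dump -f json | jq -r '.standbys[].name'); do ceph orch daemon redeploy "mgr.$m" --image <new-image-name>; done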

.. note::

If you are on a very early version of cephadm (early Octopus), the ``orch daemon redeploy``
command may not have the ``--image`` flag. In that case, you must manually set the
Manager container image with ``ceph config set mgr container_image <new-image-name>`` and then
redeploy the Manager with ``ceph orch daemon redeploy mgr.example1.abcdef``.
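
As a sketch, that fallback looks like this (``mgr.example1.abcdef`` stands in for one of your standby Manager daemons):

.. prompt:: bash #

  ceph config set mgr container_image <new-image-name>
  ceph orch daemon redeploy mgr.example1.abcdef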

At this point, a Manager failover should result in the active Manager being one that
is running the new version:

.. prompt:: bash #

ceph mgr fail

Verify that the active Manager is now running the new version. To complete the upgrade
of the Manager daemons:

.. prompt:: bash #

ceph orch upgrade start --image <new-image-name> --daemon-types mgr

You should now have all your Manager daemons on the new version and be able to
specify the limiting parameters for the rest of the upgrade.
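
For example, you might continue by upgrading the monitor and crash daemons next, in keeping with the ordering noted above (the image name is a placeholder):

.. prompt:: bash #

  ceph orch upgrade start --image <new-image-name> --daemon-types mon,crash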
1 change: 1 addition & 0 deletions qa/suites/orch/cephadm/upgrade/3-upgrade/.qa
111 changes: 111 additions & 0 deletions qa/suites/orch/cephadm/upgrade/3-upgrade/staggered.yaml
@@ -0,0 +1,111 @@
tasks:
- cephadm.shell:
env: [sha1]
mon.a:
- radosgw-admin realm create --rgw-realm=r --default
- radosgw-admin zonegroup create --rgw-zonegroup=default --master --default
- radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=z --master --default
- radosgw-admin period update --rgw-realm=r --commit
- ceph orch apply rgw r z --placement=2 --port=8000
- sleep 180
- ceph config set mon mon_warn_on_insecure_global_id_reclaim false --force
- ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false --force
- ceph config set global log_to_journald false --force
# get some good info on the state of things pre-upgrade. Useful for debugging
- ceph orch ps
- ceph versions
- ceph -s
- ceph orch ls
# doing a staggered upgrade requires the mgr daemons to be on a version that contains the staggered upgrade code
# until there is a stable version that contains it, we can test by manually upgrading a mgr daemon
- ceph config set mgr container_image quay.ceph.io/ceph-ci/ceph:$sha1
- ceph orch daemon redeploy "mgr.$(ceph mgr dump -f json | jq .standbys | jq .[] | jq -r .name)"
- ceph orch ps --refresh
- sleep 180
# gather more possible debugging info
- ceph orch ps
- ceph versions
- ceph -s
# check that there are two different versions found for mgr daemon (which implies we upgraded one)
- ceph versions | jq -e '.mgr | length == 2'
- ceph mgr fail
- sleep 180
# now try upgrading the other mgr
# we should now have access to --image flag for the daemon redeploy command
- ceph orch daemon redeploy "mgr.$(ceph mgr dump -f json | jq .standbys | jq .[] | jq -r .name)" --image quay.ceph.io/ceph-ci/ceph:$sha1
- ceph orch ps --refresh
- sleep 180
# gather more possible debugging info
- ceph orch ps
- ceph versions
- ceph -s
- ceph mgr fail
- sleep 180
# gather more debugging info
- ceph orch ps
- ceph versions
- ceph -s
# now that both mgrs should have been redeployed with the new version, we should be back on only 1 version for the mgrs
- ceph versions | jq -e '.mgr | length == 1'
- ceph mgr fail
- sleep 180
# debugging info
- ceph orch ps
- ceph versions
# to make sure the mgr daemons' upgrade is fully completed, including being deployed by a mgr on the new version
# also serves as an early failure if manually upgrading the mgrs failed as --daemon-types won't be recognized
- ceph orch upgrade start --image quay.ceph.io/ceph-ci/ceph:$sha1 --daemon-types mgr
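# poll the upgrade status until it either finishes or reports an error, dumping daemon/version/status info every 30 seconds for debugging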
- while ceph orch upgrade status | jq '.in_progress' | grep true && ! ceph orch upgrade status | jq '.message' | grep Error ; do ceph orch ps ; ceph versions ; ceph orch upgrade status ; sleep 30 ; done
# verify only one version found for mgrs and that their version hash matches what we are upgrading to
- ceph versions | jq -e '.mgr | length == 1'
- ceph versions | jq -e '.mgr | keys' | grep $sha1
# verify that overall we still see two versions, basically to make sure --daemon-types wasn't ignored and all daemons upgraded anyway
- ceph versions | jq -e '.overall | length == 2'
# check that exactly two daemons have been upgraded to the new image (our 2 mgr daemons)
- ceph orch upgrade check quay.ceph.io/ceph-ci/ceph:$sha1 | jq -e '.up_to_date | length == 2'
# upgrade only the mons on one of the two hosts
- ceph orch upgrade start --image quay.ceph.io/ceph-ci/ceph:$sha1 --daemon-types mon --hosts $(ceph orch ps | grep mgr.x | awk '{print $2}')
- while ceph orch upgrade status | jq '.in_progress' | grep true && ! ceph orch upgrade status | jq '.message' | grep Error ; do ceph orch ps ; ceph versions ; ceph orch upgrade status ; sleep 30 ; done
- ceph orch ps
# verify two different versions seen for mons
- ceph versions | jq -e '.mon | length == 2'
# upgrade mons on the other hosts
- ceph orch upgrade start --image quay.ceph.io/ceph-ci/ceph:$sha1 --daemon-types mon --hosts $(ceph orch ps | grep mgr.y | awk '{print $2}')
- while ceph orch upgrade status | jq '.in_progress' | grep true && ! ceph orch upgrade status | jq '.message' | grep Error ; do ceph orch ps ; ceph versions ; ceph orch upgrade status ; sleep 30 ; done
- ceph orch ps
# verify all mons now on same version and version hash matches what we are upgrading to
- ceph versions | jq -e '.mon | length == 1'
- ceph versions | jq -e '.mon | keys' | grep $sha1
# verify exactly 5 daemons are now upgraded (2 mgrs, 3 mons)
- ceph orch upgrade check quay.ceph.io/ceph-ci/ceph:$sha1 | jq -e '.up_to_date | length == 5'
# upgrade exactly 2 osd daemons
- ceph orch upgrade start --image quay.ceph.io/ceph-ci/ceph:$sha1 --daemon-types osd --limit 2
- while ceph orch upgrade status | jq '.in_progress' | grep true && ! ceph orch upgrade status | jq '.message' | grep Error ; do ceph orch ps ; ceph versions ; ceph orch upgrade status ; sleep 30 ; done
- ceph orch ps
# verify two different versions now seen for osds
- ceph versions | jq -e '.osd | length == 2'
# verify exactly 7 daemons have been upgraded (2 mgrs, 3 mons, 2 osds)
- ceph orch upgrade check quay.ceph.io/ceph-ci/ceph:$sha1 | jq -e '.up_to_date | length == 7'
# upgrade one more osd
- ceph orch upgrade start --image quay.ceph.io/ceph-ci/ceph:$sha1 --daemon-types crash,osd --limit 1
- while ceph orch upgrade status | jq '.in_progress' | grep true && ! ceph orch upgrade status | jq '.message' | grep Error ; do ceph orch ps ; ceph versions ; ceph orch upgrade status ; sleep 30 ; done
- ceph orch ps
- ceph versions | jq -e '.osd | length == 2'
# verify now 8 daemons have been upgraded
- ceph orch upgrade check quay.ceph.io/ceph-ci/ceph:$sha1 | jq -e '.up_to_date | length == 8'
# upgrade the rest of the osds
- ceph orch upgrade start --image quay.ceph.io/ceph-ci/ceph:$sha1 --daemon-types crash,osd
- while ceph orch upgrade status | jq '.in_progress' | grep true && ! ceph orch upgrade status | jq '.message' | grep Error ; do ceph orch ps ; ceph versions ; ceph orch upgrade status ; sleep 30 ; done
- ceph orch ps
# verify all osds are now on same version and version hash matches what we are upgrading to
- ceph versions | jq -e '.osd | length == 1'
- ceph versions | jq -e '.osd | keys' | grep $sha1
# upgrade the rgw daemons using --services
- ceph orch upgrade start --image quay.ceph.io/ceph-ci/ceph:$sha1 --services rgw.r.z
- while ceph orch upgrade status | jq '.in_progress' | grep true && ! ceph orch upgrade status | jq '.message' | grep Error ; do ceph orch ps ; ceph versions ; ceph orch upgrade status ; sleep 30 ; done
- ceph orch ps
# verify all rgw daemons on same version and version hash matches what we are upgrading to
- ceph versions | jq -e '.rgw | length == 1'
- ceph versions | jq -e '.rgw | keys' | grep $sha1
# run upgrade one more time with no filter parameters to make sure anything left gets upgraded
- ceph orch upgrade start --image quay.ceph.io/ceph-ci/ceph:$sha1
33 changes: 30 additions & 3 deletions src/pybind/mgr/cephadm/module.py
@@ -60,7 +60,7 @@
from .upgrade import CephadmUpgrade
from .template import TemplateMgr
from .utils import CEPH_IMAGE_TYPES, RESCHEDULE_FROM_OFFLINE_HOSTS_TYPES, forall_hosts, \
cephadmNoImage
cephadmNoImage, CEPH_UPGRADE_ORDER
from .configchecks import CephadmConfigChecks
from .offline_watcher import OfflineHostWatcher

@@ -2692,10 +2692,37 @@ def upgrade_ls(self, image: Optional[str], tags: bool, show_all_versions: Option
return self.upgrade.upgrade_ls(image, tags, show_all_versions)

@handle_orch_error
def upgrade_start(self, image: str, version: str) -> str:
def upgrade_start(self, image: str, version: str, daemon_types: Optional[List[str]] = None, host_placement: Optional[str] = None,
services: Optional[List[str]] = None, limit: Optional[int] = None) -> str:
if self.inventory.get_host_with_state("maintenance"):
raise OrchestratorError("upgrade aborted - you have host(s) in maintenance state")
return self.upgrade.upgrade_start(image, version)
if daemon_types is not None and services is not None:
raise OrchestratorError('--daemon-types and --services are mutually exclusive')
if daemon_types is not None:
for dtype in daemon_types:
if dtype not in CEPH_UPGRADE_ORDER:
raise OrchestratorError(f'Upgrade aborted - Got unexpected daemon type "{dtype}".\n'
f'Viable daemon types for this command are: {utils.CEPH_TYPES + utils.GATEWAY_TYPES}')
if services is not None:
for service in services:
if service not in self.spec_store:
raise OrchestratorError(f'Upgrade aborted - Got unknown service name "{service}".\n'
f'Known services are: {self.spec_store.all_specs.keys()}')
hosts: Optional[List[str]] = None
if host_placement is not None:
all_hosts = list(self.inventory.all_specs())
placement = PlacementSpec.from_string(host_placement)
hosts = placement.filter_matching_hostspecs(all_hosts)
if not hosts:
raise OrchestratorError(
f'Upgrade aborted - hosts parameter "{host_placement}" provided did not match any hosts')

if limit is not None:
if limit < 1:
raise OrchestratorError(
f'Upgrade aborted - --limit arg must be a positive integer, not {limit}')

return self.upgrade.upgrade_start(image, version, daemon_types, hosts, services, limit)

@handle_orch_error
def upgrade_pause(self) -> str:
3 changes: 2 additions & 1 deletion src/pybind/mgr/cephadm/services/osd.py
@@ -298,7 +298,8 @@ def generate_previews(self, osdspecs: List[DriveGroupSpec], for_host: str) -> Li

# driveselection for host
cmds: List[str] = self.driveselection_to_ceph_volume(ds,
osd_id_claims.filtered_by_host(host),
osd_id_claims.filtered_by_host(
host),
preview=True)
if not cmds:
logger.debug("No data_devices, skipping DriveGroup: {}".format(
12 changes: 8 additions & 4 deletions src/pybind/mgr/cephadm/tests/test_cephadm.py
@@ -166,9 +166,11 @@ def test_re_add_host_receive_loopback(self, resolve_ip, cephadm_module):
resolve_ip.side_effect = ['192.168.122.1', '127.0.0.1', '127.0.0.1']
assert wait(cephadm_module, cephadm_module.get_hosts()) == []
cephadm_module._add_host(HostSpec('test', '192.168.122.1'))
assert wait(cephadm_module, cephadm_module.get_hosts()) == [HostSpec('test', '192.168.122.1')]
assert wait(cephadm_module, cephadm_module.get_hosts()) == [
HostSpec('test', '192.168.122.1')]
cephadm_module._add_host(HostSpec('test'))
assert wait(cephadm_module, cephadm_module.get_hosts()) == [HostSpec('test', '192.168.122.1')]
assert wait(cephadm_module, cephadm_module.get_hosts()) == [
HostSpec('test', '192.168.122.1')]
with pytest.raises(OrchestratorError):
cephadm_module._add_host(HostSpec('test2'))

@@ -894,7 +896,8 @@ def test_driveselection_to_ceph_volume(self, cephadm_module, devices, preview, e
ds = DriveSelection(dg, Devices([Device(path) for path in devices]))
preview = preview
out = cephadm_module.osd_service.driveselection_to_ceph_volume(ds, [], preview)
assert all(any(cmd in exp_cmd for exp_cmd in exp_commands) for cmd in out), f'Expected cmds from f{out} in {exp_commands}'
assert all(any(cmd in exp_cmd for exp_cmd in exp_commands)
for cmd in out), f'Expected cmds from f{out} in {exp_commands}'

@pytest.mark.parametrize(
"devices, preview, exp_commands",
Expand All @@ -919,7 +922,8 @@ def test_raw_driveselection_to_ceph_volume(self, cephadm_module, devices, previe
ds = DriveSelection(dg, Devices([Device(path) for path in devices]))
preview = preview
out = cephadm_module.osd_service.driveselection_to_ceph_volume(ds, [], preview)
assert all(any(cmd in exp_cmd for exp_cmd in exp_commands) for cmd in out), f'Expected cmds from f{out} in {exp_commands}'
assert all(any(cmd in exp_cmd for exp_cmd in exp_commands)
for cmd in out), f'Expected cmds from f{out} in {exp_commands}'

@mock.patch("cephadm.serve.CephadmServe._run_cephadm", _run_cephadm(
json.dumps([
