mgr/cephadm: add iscsi and nfs to upgrade process #39677

adk3798 · 2021-02-24T22:50:08Z

Fixes: https://tracker.ceph.com/issues/49462

Signed-off-by: Adam King <adking@redhat.com>

The way the keyring is fetched changed because the iscsi caps changed at some point, so upgrading from an old version can mean a keyring already exists with different caps. The auth get-or-create command fails if you pass an existing entity with different caps, in which case the iscsi daemon could not be redeployed (and therefore could not be upgraded either).
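The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeMon` is a hypothetical in-memory stand-in for the monitor's auth database, the cap strings are placeholders, and only the command names (`auth get-or-create`, `auth caps`, `auth get`) correspond to real Ceph commands. The helper name `get_keyring_with_caps` is borrowed from the backport notes in this thread; its signature here is an assumption.

```python
from typing import Dict, List, Optional, Tuple


class FakeMon:
    """Hypothetical in-memory stand-in for the monitor auth database."""

    def __init__(self) -> None:
        self.caps: Dict[str, List[str]] = {}

    def command(self, prefix: str, entity: str,
                caps: Optional[List[str]] = None) -> Tuple[int, str, str]:
        """Return (retcode, stdout, stderr), loosely mimicking a mon command."""
        if prefix == 'auth get-or-create':
            # Fails when the entity already exists with different caps.
            if entity in self.caps and self.caps[entity] != caps:
                return -22, '', f'{entity} exists but caps do not match'
            self.caps[entity] = list(caps or [])
            return 0, self._keyring(entity), ''
        if prefix == 'auth caps':
            if entity not in self.caps:
                return -2, '', f'{entity} does not exist'
            self.caps[entity] = list(caps or [])
            return 0, '', ''
        if prefix == 'auth get':
            if entity not in self.caps:
                return -2, '', f'{entity} does not exist'
            return 0, self._keyring(entity), ''
        return -22, '', f'unknown command {prefix}'

    def _keyring(self, entity: str) -> str:
        return f'[{entity}]\n\tkey = <placeholder>\n'


def get_keyring_with_caps(mon: FakeMon, entity: str, caps: List[str]) -> str:
    """Fetch a keyring, updating the caps first if the entity already exists."""
    ret, keyring, err = mon.command('auth get-or-create', entity, caps)
    if ret == 0:
        return keyring
    # Entity exists with other caps: overwrite them, then read the keyring.
    ret, _, err = mon.command('auth caps', entity, caps)
    if ret != 0:
        raise RuntimeError(f'failed to update caps for {entity}: {err}')
    ret, keyring, err = mon.command('auth get', entity)
    if ret != 0:
        raise RuntimeError(f'failed to get keyring for {entity}: {err}')
    return keyring
```

With this shape, an entity created under old caps no longer blocks a redeploy: the first call fails, the caps are rewritten, and the existing key is returned.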

Reconfigs triggered by monmap changes are now skipped in the middle of an upgrade because they made problems similar to https://tracker.ceph.com/issues/49013 possible: the daemon being reconfigured had an old unit.run file that was incompatible with changes in the updated systemd unit file. In that case, specifically for nfs, the reconfig would fail (reconfig does not redeploy the daemon's unit.run file) and you had to wait until the call timed out. This slowed the upgrade significantly, since it happened every time the serve loop was entered between the mons being upgraded and nfs being upgraded.
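As a rough sketch of that guard, assuming a hypothetical `DaemonState` record and an `upgrade_in_progress` flag (neither name is taken from the cephadm code), the decision reduces to something like:

```python
from dataclasses import dataclass


@dataclass
class DaemonState:
    """Hypothetical summary of what the serve loop knows about a daemon."""
    name: str
    monmap_changed_since_config: bool


def needs_monmap_reconfig(daemon: DaemonState, upgrade_in_progress: bool) -> bool:
    """Decide whether a monmap change should trigger a reconfig."""
    if upgrade_in_progress:
        # The daemon will be redeployed by the upgrade anyway; a reconfig
        # against a stale unit.run would only fail and time out.
        return False
    return daemon.monmap_changed_since_config
```

Outside an upgrade the behaviour is unchanged; during one, the daemon simply waits for its redeploy.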

Checklist

References tracker ticket
Updates documentation if necessary
Includes tests for new functionality or reproducer for bug


src/pybind/mgr/cephadm/serve.py

src/pybind/mgr/cephadm/services/cephadmservice.py

src/pybind/mgr/cephadm/utils.py

sebastian-philipp

lgtm!

sebastian-philipp · 2021-02-26T11:02:09Z

src/pybind/mgr/cephadm/module.py

+            # need iscsi and nfs as well in order to upgrade them
+            if daemon_type not in CEPH_TYPES and daemon_type not in ['nfs', 'iscsi']:


Is this hunk still needed?

What do you mean by this comment? Do you think we can just remove the check and allow custom images for any daemon type?
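For readers following the hunk, the predicate can be isolated as below. The contents of `CEPH_TYPES` here are placeholder values chosen for illustration, and `uses_ceph_image` is a hypothetical name; what the real branch does for other daemon types is not shown in this thread.

```python
# Assumed placeholder values; the real list lives in cephadm's utils module.
CEPH_TYPES = ['mgr', 'mon', 'osd', 'mds', 'rgw']

# Daemon types the hunk treats like Ceph daemons so they can be upgraded.
EXTRA_UPGRADE_TYPES = ['nfs', 'iscsi']


def uses_ceph_image(daemon_type: str) -> bool:
    """Inverse of the check in the hunk: True when the type is covered."""
    return daemon_type in CEPH_TYPES or daemon_type in EXTRA_UPGRADE_TYPES
```

Under these assumptions, `uses_ceph_image('nfs')` is true while a monitoring type such as `'prometheus'` still falls through to the other branch.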

src/pybind/mgr/cephadm/services/cephadmservice.py

src/pybind/mgr/cephadm/module.py

src/pybind/mgr/cephadm/services/cephadmservice.py

sebastian-philipp · 2021-03-02T14:23:26Z

Please verify this in teuthology. Please add either iscsi or nfs (or both) to

https://github.com/ceph/ceph/blob/master/qa/suites/rados/cephadm/upgrade/fixed-2.yaml

See

ceph/qa/suites/rados/cephadm/smoke/fixed-2.yaml

Line 23 in a171b32

- ceph.iscsi.iscsi.a

and

ceph/qa/suites/rados/cephadm/workunits/task/test_orch_cli.yaml

Lines 14 to 17 in a171b32

    
          - ceph orch apply mds a
    - cephfs_test_runner:
        modules:
          - tasks.cephfs.test_nfs

github-actions · 2021-03-04T20:37:56Z

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

adk3798 · 2021-03-04T21:27:16Z

There was an issue here with how the new ok-to-stop functions interact with upgrade. We neither pause the upgrade nor notify the user when an ok-to-stop check fails, so in a case such as having only 1 nfs daemon, the ok-to-stop check would fail perpetually, because it tests a static condition that only changes if the user acts, and the upgrade would go on forever without telling the user anything is wrong. The new ok-to-stop functions make sense for something like putting a host in maintenance mode but simply don't work well with upgrade. To avoid this, I opted to limit the ok-to-stop checks in upgrade to the daemons that had a defined ok-to-stop before host maintenance was added (mon, osd and mds, the ones with an actual ceph ok-to-stop command). If we don't want to do that, the ok-to-stop functions would have to change, for example by adding a stronger force flag or making them aware of when an upgrade is happening. Even without this PR adding iscsi and nfs to upgrade, I think the issue might already exist if there is only a single rgw daemon during an upgrade.
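A minimal sketch of that restriction, with hypothetical names (`UPGRADE_OK_TO_STOP_TYPES`, `can_upgrade_now`) that are not taken from the cephadm code:

```python
from typing import Callable

# Daemon types with an actual ceph ok-to-stop command, per the comment above.
UPGRADE_OK_TO_STOP_TYPES = {'mon', 'osd', 'mds'}


def can_upgrade_now(daemon_type: str, ok_to_stop: Callable[[], bool]) -> bool:
    """Gate an upgrade step on ok-to-stop only for the listed daemon types."""
    if daemon_type not in UPGRADE_OK_TO_STOP_TYPES:
        # Skip the check: a static failure (e.g. a single nfs daemon)
        # would otherwise stall the upgrade forever without any notice.
        return True
    return ok_to_stop()
```

With a single nfs daemon, `can_upgrade_now('nfs', lambda: False)` still returns true, so the upgrade proceeds, while an osd whose check fails is retried on a later pass.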

adk3798 · 2021-03-04T21:28:58Z

@sebastian-philipp I added iscsi directly to the fixed-2.yaml file. How would adding nfs work? I'm assuming I can't just copy the cephfs_test_runner block from the example you linked into the fixed-2.yaml.

github-actions · 2021-03-10T17:45:25Z

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

If the caps change from the old version to the new one, it causes issues in the upgrade. This allows the caps to be updated. Currently only seeing this with iscsi, but changing it for the others as a precaution.

Signed-off-by: Adam King <adking@redhat.com>

Fixes: https://tracker.ceph.com/issues/49462
Signed-off-by: Adam King <adking@redhat.com>

Fixes: https://tracker.ceph.com/issues/54502
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
(cherry picked from commit e233ed0)

Conflicts:
    src/pybind/mgr/cephadm/services/cephadmservice.py
      Fixed conflict because `get_keyring_with_caps` has not been backported to octopus: ceph#39677
    src/pybind/mgr/cephadm/services/monitoring.py
      Fixed a few conflicts because the upstream master contains some improvements around handling URLs, e.g. ceph#43579

Fixes: https://tracker.ceph.com/issues/54502
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
(cherry picked from commit 4f14993)

Conflicts:
    src/pybind/mgr/cephadm/services/cephadmservice.py
      Fixed conflict because `get_keyring_with_caps` has not been backported to octopus: ceph#39677
    src/pybind/mgr/cephadm/services/monitoring.py
      Fixed a few conflicts because the master contains some improvements around handling URLs, e.g. ceph#43579

adk3798 added needs-review cephadm labels Feb 24, 2021

adk3798 requested a review from a team as a code owner February 24, 2021 22:50

github-actions bot added the pybind label Feb 24, 2021

sebastian-philipp suggested changes Feb 25, 2021

src/pybind/mgr/cephadm/serve.py

src/pybind/mgr/cephadm/services/cephadmservice.py

src/pybind/mgr/cephadm/utils.py

sebastian-philipp requested a review from mgfritch February 25, 2021 17:34

adk3798 force-pushed the upgrade-iscsi-nfs branch from 1317837 to 8d227a2 Compare February 25, 2021 18:16

sebastian-philipp suggested changes Feb 26, 2021

mgfritch reviewed Feb 27, 2021

src/pybind/mgr/cephadm/module.py

src/pybind/mgr/cephadm/services/cephadmservice.py

src/pybind/mgr/cephadm/services/cephadmservice.py

sebastian-philipp added wip-swagner-testing My Teuthology tests and removed wip-swagner-testing My Teuthology tests labels Mar 1, 2021

adk3798 force-pushed the upgrade-iscsi-nfs branch from 8d227a2 to a8c1220 Compare March 2, 2021 20:14

adk3798 added the DNM label Mar 2, 2021

adk3798 force-pushed the upgrade-iscsi-nfs branch 3 times, most recently from 7298018 to 9f26ac7 Compare March 4, 2021 20:29

github-actions bot added the core label Mar 4, 2021

adk3798 removed the DNM label Mar 4, 2021

github-actions bot added the needs-rebase label Mar 4, 2021

adk3798 force-pushed the upgrade-iscsi-nfs branch from 9f26ac7 to f519ef5 Compare March 4, 2021 21:14

github-actions bot removed the needs-rebase label Mar 4, 2021

mgfritch approved these changes Mar 5, 2021

jmolmo approved these changes Mar 8, 2021

sebastian-philipp added the needs-qa label Mar 9, 2021

sebastian-philipp approved these changes Mar 9, 2021

liewegas added the wip-sage-testing label Mar 9, 2021

liewegas mentioned this pull request Mar 9, 2021

mgr/cephadm: use existing cephx key it if varies #39956

Closed

sebastian-philipp added the needs-rebase label Mar 10, 2021

adk3798 force-pushed the upgrade-iscsi-nfs branch from f519ef5 to 704cffd Compare March 10, 2021 13:34

github-actions bot removed the needs-rebase label Mar 10, 2021

github-actions bot added the needs-rebase label Mar 10, 2021

adk3798 added 2 commits March 10, 2021 17:08

mgr/cephadm: add iscsi and nfs to upgrade

20e7b4d

Fixes: https://tracker.ceph.com/issues/49462 Signed-off-by: Adam King <adking@redhat.com>

adk3798 force-pushed the upgrade-iscsi-nfs branch from 704cffd to 20e7b4d Compare March 10, 2021 22:10

github-actions bot removed the needs-rebase label Mar 10, 2021

liewegas merged commit 99971f7 into ceph:master Mar 11, 2021

liewegas mentioned this pull request Mar 15, 2021

pacific: cephadm: Batch backport March (2) #40135

Merged
