Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr/cephadm: add iscsi and nfs to upgrade process #39677

Merged
merged 2 commits into from Mar 11, 2021

Conversation

adk3798
Copy link
Contributor

@adk3798 adk3798 commented Feb 24, 2021

Fixes: https://tracker.ceph.com/issues/49462

Signed-off-by: Adam King adking@redhat.com

The reason for the change to the way the keyring is grabbed is because the iscsi caps changed at some point and upgrading from an old version might mean there's an already existent keyring with different caps. The auth get-or-create command fails if you pass an existent entity with different caps and the iscsi daemon could not be redeployed (in which case it cannot be upgraded either).

The reason for avoiding the reconfig due to monmap changes in the middle of the upgrade is it made it possible for similar problems to https://tracker.ceph.com/issues/49013 where the daemon being reconfigured had an old unit.run file that was incompatible with changes in the updated systemd unit file. In that case, specifically for nfs, trying to reconfig the daemon would fail (reconfig does not redeploy the unit.run file for the daemon) and you would have to wait until the call timed out. This made the upgrade significantly slower since it would happen every time the serve loop was entered from when the mons were upgraded until nfs was upgraded.

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

src/pybind/mgr/cephadm/serve.py Outdated Show resolved Hide resolved
src/pybind/mgr/cephadm/services/cephadmservice.py Outdated Show resolved Hide resolved
src/pybind/mgr/cephadm/utils.py Outdated Show resolved Hide resolved
Copy link
Contributor

@sebastian-philipp sebastian-philipp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Comment on lines 1652 to 1622
# need iscsi and nfs as well in order to upgrade them
if daemon_type not in CEPH_TYPES and daemon_type not in ['nfs', 'iscsi']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this hunk is still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by this comment? Do you think we can just remove the check and allow custom images for any daemon type?

src/pybind/mgr/cephadm/services/cephadmservice.py Outdated Show resolved Hide resolved
src/pybind/mgr/cephadm/services/cephadmservice.py Outdated Show resolved Hide resolved
@sebastian-philipp sebastian-philipp added wip-swagner-testing My Teuthology tests and removed wip-swagner-testing My Teuthology tests labels Mar 1, 2021
@sebastian-philipp
Copy link
Contributor

please veirfy this in teutology. Please add either iscsi or nfs (or both) to

https://github.com/ceph/ceph/blob/master/qa/suites/rados/cephadm/upgrade/fixed-2.yaml

See

and

- ceph orch apply mds a
- cephfs_test_runner:
modules:
- tasks.cephfs.test_nfs

@adk3798 adk3798 added the DNM label Mar 2, 2021
@adk3798 adk3798 force-pushed the upgrade-iscsi-nfs branch 3 times, most recently from 7298018 to 9f26ac7 Compare March 4, 2021 20:29
@github-actions github-actions bot added the core label Mar 4, 2021
@adk3798 adk3798 removed the DNM label Mar 4, 2021
@github-actions
Copy link

github-actions bot commented Mar 4, 2021

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@adk3798
Copy link
Contributor Author

adk3798 commented Mar 4, 2021

There was an issue here with the way the new ok-to-stop functions worked with upgrade. We don't pause upgrade if an ok-to-stop check fails or notify the user so in a case where you have something like only have 1 nfs daemon, the ok-to-stop would perpetually fail as the check is on a static condition that requires change from the user, and the upgrade would go on forever without telling the user anything is wrong. The new ok-to-stop functions make sense for something like putting a host in maintenance mode but simply don't work well with the upgrade. I opted for limiting the ok-to-stop checks in upgrade to the daemons who had a defined ok-to-stop before the addition of host maintenance (mon, osd and mds, the ones with an actual ceph ok-to-stop command) to avoid the situation. If we don't want to do that, some change to the ok-to-stop functions would have to happen such as having a stronger force flag or making them aware of when an upgrade is happening. Even without this PR adding isci and nfs to upgrade I think the issue might already exist if there is only a single rgw daemon during an upgrade.

@adk3798
Copy link
Contributor Author

adk3798 commented Mar 4, 2021

@sebastian-philipp I added iscsi directly to the fixed-2.yaml file. How would adding nfs work? I'm assuming I can't just copy the cephfs_test_runner block from the example you linked into the fixed-2.yaml.

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

If the caps change from the old version to the new one it causes
issues in the upgrade. This allows the caps to be updated. Currently
only seeing this with iscsi but changing it for other as a precaution

Signed-off-by: Adam King <adking@redhat.com>
Fixes: https://tracker.ceph.com/issues/49462

Signed-off-by: Adam King <adking@redhat.com>
@liewegas liewegas merged commit 99971f7 into ceph:master Mar 11, 2021
s0nea pushed a commit to s0nea/ceph that referenced this pull request Mar 17, 2022
Fixes: https://tracker.ceph.com/issues/54502
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
(cherry picked from commit e233ed0)

Conflicts:
	src/pybind/mgr/cephadm/services/cephadmservice.py
Fixed conflict because `get_keyring_with_caps` has not been
backported to octopus:
ceph#39677
	src/pybind/mgr/cephadm/services/monitoring.py
Fixed a few conflicts because the upstream master contains some
improvements around handling URLs, e.g.
ceph#43579
s0nea pushed a commit to s0nea/ceph that referenced this pull request Mar 24, 2022
Fixes: https://tracker.ceph.com/issues/54502
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
(cherry picked from commit 4f14993)

Conflicts:
	src/pybind/mgr/cephadm/services/cephadmservice.py
Fixed conflict because `get_keyring_with_caps` has not been
backported to octopus: ceph#39677
	src/pybind/mgr/cephadm/services/monitoring.py
Fixed a few conflicts because the master contains some improvements
around handling URLs, e.g. ceph#43579
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants