New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mgr/cephadm: add iscsi and nfs to upgrade process #39677
Conversation
1317837
to
8d227a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
src/pybind/mgr/cephadm/module.py
Outdated
# need iscsi and nfs as well in order to upgrade them | ||
if daemon_type not in CEPH_TYPES and daemon_type not in ['nfs', 'iscsi']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this hunk is still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by this comment? Do you think we can just remove the check and allow custom images for any daemon type?
please veirfy this in teutology. Please add either iscsi or nfs (or both) to https://github.com/ceph/ceph/blob/master/qa/suites/rados/cephadm/upgrade/fixed-2.yaml See
and ceph/qa/suites/rados/cephadm/workunits/task/test_orch_cli.yaml Lines 14 to 17 in a171b32
|
8d227a2
to
a8c1220
Compare
7298018
to
9f26ac7
Compare
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
9f26ac7
to
f519ef5
Compare
There was an issue here with the way the new ok-to-stop functions worked with upgrade. We don't pause upgrade if an ok-to-stop check fails or notify the user so in a case where you have something like only have 1 nfs daemon, the ok-to-stop would perpetually fail as the check is on a static condition that requires change from the user, and the upgrade would go on forever without telling the user anything is wrong. The new ok-to-stop functions make sense for something like putting a host in maintenance mode but simply don't work well with the upgrade. I opted for limiting the ok-to-stop checks in upgrade to the daemons who had a defined ok-to-stop before the addition of host maintenance (mon, osd and mds, the ones with an actual ceph ok-to-stop command) to avoid the situation. If we don't want to do that, some change to the ok-to-stop functions would have to happen such as having a stronger force flag or making them aware of when an upgrade is happening. Even without this PR adding isci and nfs to upgrade I think the issue might already exist if there is only a single rgw daemon during an upgrade. |
@sebastian-philipp I added iscsi directly to the fixed-2.yaml file. How would adding nfs work? I'm assuming I can't just copy the cephfs_test_runner block from the example you linked into the fixed-2.yaml. |
f519ef5
to
704cffd
Compare
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
If the caps change from the old version to the new one it causes issues in the upgrade. This allows the caps to be updated. Currently only seeing this with iscsi but changing it for other as a precaution Signed-off-by: Adam King <adking@redhat.com>
Fixes: https://tracker.ceph.com/issues/49462 Signed-off-by: Adam King <adking@redhat.com>
704cffd
to
20e7b4d
Compare
Fixes: https://tracker.ceph.com/issues/54502 Signed-off-by: Tatjana Dehler <tdehler@suse.com> (cherry picked from commit e233ed0) Conflicts: src/pybind/mgr/cephadm/services/cephadmservice.py Fixed conflict because `get_keyring_with_caps` has not been backported to octopus: ceph#39677 src/pybind/mgr/cephadm/services/monitoring.py Fixed a few conflicts because the upstream master contains some improvements around handling URLs, e.g. ceph#43579
Fixes: https://tracker.ceph.com/issues/54502 Signed-off-by: Tatjana Dehler <tdehler@suse.com> (cherry picked from commit 4f14993) Conflicts: src/pybind/mgr/cephadm/services/cephadmservice.py Fixed conflict because `get_keyring_with_caps` has not been backported to octopus: ceph#39677 src/pybind/mgr/cephadm/services/monitoring.py Fixed a few conflicts because the master contains some improvements around handling URLs, e.g. ceph#43579
Fixes: https://tracker.ceph.com/issues/49462
Signed-off-by: Adam King adking@redhat.com
The reason for the change to the way the keyring is grabbed is because the iscsi caps changed at some point and upgrading from an old version might mean there's an already existent keyring with different caps. The
auth get-or-create
command fails if you pass an existent entity with different caps and the iscsi daemon could not be redeployed (in which case it cannot be upgraded either).The reason for avoiding the reconfig due to monmap changes in the middle of the upgrade is it made it possible for similar problems to https://tracker.ceph.com/issues/49013 where the daemon being reconfigured had an old unit.run file that was incompatible with changes in the updated systemd unit file. In that case, specifically for nfs, trying to reconfig the daemon would fail (reconfig does not redeploy the unit.run file for the daemon) and you would have to wait until the call timed out. This made the upgrade significantly slower since it would happen every time the serve loop was entered from when the mons were upgraded until nfs was upgraded.Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox