New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osd/scheduler: Reset ephemeral changes to mClock built-in profile #51480
Conversation
jenkins test make check |
ee52e9b
to
1a5ee8d
Compare
@athanatos @neha-ojha I was looking into the logs reported by QE where the QoS related config keys were not removed from the mon store. See BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2124137#c41. It turns out that the failure was due to the OSD having no privilege to run the "config rm" command. I therefore added one more commit that resolves this issue. The default osd profile needs to allow the OSDs to run the "config rm" command. The new commit allows running this command only for the following config keys:
This is similar to the change introduced to allow OSDs to run the "config set" command for a couple of config keys as implemented in PR: #42853. With this change, I was able to verify using vstart cluster that there are no "access denied" errors for the commands issued by the OSD. |
jenkins test api |
jenkins test api |
Rados suite review available here: https://tracker.ceph.com/projects/rados/wiki/MAIN#httpstrellocomc1EFSeXDn1752-wip-yuri10-testing-2023-05-16-1243 |
jenkins test api |
…tion This is a follow-up to PR: ceph#48703. Modify the mon caps to allow OSDs to run the "config rm" command with restriction to remove only the following config keys from the mon store: - osd_max_backfills - osd_recovery_max_active(.*) - osd_recovery_max_active and - osd_recovery_max_active_(hdd|ssd) - osd_mclock_scheduler_(.*) -> all the QoS specific config options. The above is similar to the change in mon caps to run the "config set" command as implemented in PR: ceph#42853. Fixes: https://tracker.ceph.com/issues/61155 Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
This is a follow-up to PR: ceph#48703. This commit also considers changes made ephemerally using either the 'daemon' or the 'tell' interfaces to override the built-in mClock QoS parameters. In such a scenario, the ephemeral changes are removed using the rm_val() method exposed by the config subsytem and logging this information. Other changes: 1. Add a standalone test to exercise the fix. 2. Add documentation note on the outcome of the attempt to modify built-in profile defaults. Fixes: https://tracker.ceph.com/issues/61155 Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
1a5ee8d
to
95fcae1
Compare
95fcae1
to
752c450
Compare
@sseshasa pls merge when ready This is tested ref: https://trello.com/c/1EFSeXDn |
With mClock scheduler enabled, a small subset of config options related to recovery limits are not allowed to be modified unless osd_mclock_override_recovery_settings option is enabled. This override option is disabled by default. The following options cannot be modified without enabling the override option: - osd_max_backfills - osd_recovery_max_active[_(hdd|ssd)] The above options are removed from the mon kv store which effectively restores them to the default values. This was resulting in tests for example, test_cluster_configuration.ClusterConfigurationTest to fail since it modifies the recovery options and expects to verify the modified value. Therefore, for tests, osd_mclock_override_recovery_settings option is enabled in vstart_runner.py so that current and future tests are not affected. Fixes: https://tracker.ceph.com/issues/61155 Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
752c450
to
aed71b5
Compare
jenkins test api |
@rishabh-d-dave Please review aed71b5 as I see that you are the main contributor of this tool. With the mon caps change in this PR, vstart_runner test as mentioned in the above commit's message was failing. I therefore made a small change in vstart_runner.py to allow override of recovery options as mclock scheduler is the default. With this change, the API test passes. |
@rishabh-d-dave Can you please take a quick look at the above commit related to vstart_runner.py? The other commits are already reviewed. I am planning to backport this to reef and quincy and get them into downstream releases soon and therefore the urgency. |
This is a follow-up to PR: #48703. This fix also considers changes made ephemerally using either the 'daemon' or the 'tell' interfaces to override the built-in mClock QoS parameters. In such a scenario, the ephemeral changes are removed using the rm_val() method exposed by the config subsytem and logging this information.
Other changes:
Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee sseshasa@redhat.com
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows