osd/scheduler: Reset ephemeral changes to mClock built-in profile #51480

sseshasa · 2023-05-15T13:32:13Z

This is a follow-up to PR: #48703. This fix also handles changes made ephemerally, using either the 'daemon' or the 'tell' interface, that override the built-in mClock QoS parameters. In such a scenario, the ephemeral changes are removed using the rm_val() method exposed by the config subsystem, and this information is logged.
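The reset described above can be illustrated with a toy Python model. It is a sketch under stated assumptions, not the actual fix, which is C++ code in the OSD scheduler. `ToyConfig`, `reset_ephemeral_qos_changes`, and the default values are hypothetical. Only the idea of calling rm_val() on an ephemerally overridden built-in QoS key and logging the removal comes from the description.

```python
from dataclasses import dataclass, field

# Hypothetical built-in profile QoS defaults. The option names are real
# mClock options, but the values are made up for illustration.
BUILTIN_PROFILE_QOS = {
    "osd_mclock_scheduler_client_res": 0.5,
    "osd_mclock_scheduler_client_wgt": 1,
    "osd_mclock_scheduler_client_lim": 0.0,
}


@dataclass
class ToyConfig:
    """Toy config with a base layer and an ephemeral override layer.

    The override layer stands in for values set via the 'daemon' or
    'tell' interfaces. rm_val() drops an override so that lookups fall
    back to the base layer. This models, but is not, Ceph's config subsystem.
    """
    base: dict
    overrides: dict = field(default_factory=dict)

    def set_val(self, key, value):
        self.overrides[key] = value

    def rm_val(self, key):
        self.overrides.pop(key, None)

    def get_val(self, key):
        return self.overrides.get(key, self.base.get(key))


def reset_ephemeral_qos_changes(cfg, log):
    """Remove overrides of built-in profile QoS keys and log each removal."""
    # sorted() returns a copy of the keys, so rm_val() can safely mutate the dict.
    for key in sorted(cfg.overrides):
        if key in BUILTIN_PROFILE_QOS:
            log.append(f"reset ephemeral change: {key}={cfg.overrides[key]}")
            cfg.rm_val(key)
```

In this model, overrides of options outside the built-in profile (for example `osd_op_num_shards`) are left untouched. Only the QoS keys revert to the profile values.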

Other changes:

- Add a standalone test to exercise the fix.
- Add a documentation note on the outcome of an attempt to modify the built-in profile defaults.
- Modify the mon caps to allow OSDs to run the "config rm" command, with restrictions.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee sseshasa@redhat.com

sseshasa · 2023-05-16T03:25:41Z

jenkins test make check

sseshasa · 2023-05-16T17:49:47Z

@athanatos @neha-ojha I was looking into the logs reported by QE where the QoS related config keys were not removed from the mon store. See BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2124137#c41. It turns out that the failure was due to the OSD having no privilege to run the "config rm" command. I therefore added one more commit that resolves this issue. The default osd profile needs to allow the OSDs to run the "config rm" command. The new commit allows running this command only for the following config keys:

- osd_max_backfills
- osd_recovery_max_active(.*), which covers:
  - osd_recovery_max_active and
  - osd_recovery_max_active_(hdd|ssd)
- osd_mclock_scheduler_(.*) -> all the QoS specific config options.
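The restriction on "config rm" can be illustrated with a small Python check of a config key against the three patterns listed above. This sketch assumes that the whole key must match a pattern, which is modeled here with `re.fullmatch`. `config_rm_allowed` is a hypothetical name. The code does not show how the mon caps are written or evaluated in Ceph.

```python
import re

# Patterns from the list above. Assumption: the entire key must match
# (re.fullmatch), not just a prefix of it.
ALLOWED_CONFIG_RM_PATTERNS = [
    r"osd_max_backfills",
    r"osd_recovery_max_active(.*)",
    r"osd_mclock_scheduler_(.*)",
]


def config_rm_allowed(key: str) -> bool:
    """Return True if an OSD would be permitted to 'config rm' this key."""
    return any(re.fullmatch(p, key) for p in ALLOWED_CONFIG_RM_PATTERNS)
```

`(.*)` matches any suffix, so `osd_recovery_max_active(.*)` is slightly broader than the two option names it is meant to cover.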

This is similar to the change introduced to allow OSDs to run the "config set" command for a couple of config keys as implemented in PR: #42853.

With this change, I verified on a vstart cluster that the commands issued by the OSD no longer produce "access denied" errors.

sseshasa · 2023-05-16T18:11:07Z

jenkins test api

ljflores · 2023-05-17T20:04:17Z

jenkins test api

ljflores · 2023-05-17T20:04:38Z

Rados suite review available here: https://tracker.ceph.com/projects/rados/wiki/MAIN#httpstrellocomc1EFSeXDn1752-wip-yuri10-testing-2023-05-16-1243

yuriw · 2023-05-17T20:39:20Z

jenkins test api

sseshasa · 2023-05-18T05:05:59Z

@yuriw @ljflores I am checking if the API test failure is related to the moncap commit I introduced.

…tion

This is a follow-up to PR: ceph#48703.

Modify the mon caps to allow OSDs to run the "config rm" command, restricted to removing only the following config keys from the mon store:

- osd_max_backfills
- osd_recovery_max_active(.*), which covers:
  - osd_recovery_max_active and
  - osd_recovery_max_active_(hdd|ssd)
- osd_mclock_scheduler_(.*) -> all the QoS specific config options.

This is similar to the mon caps change that allows OSDs to run the "config set" command, implemented in PR: ceph#42853.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>

This is a follow-up to PR: ceph#48703. This commit also handles changes made ephemerally, using either the 'daemon' or the 'tell' interface, that override the built-in mClock QoS parameters. In such a scenario, the ephemeral changes are removed using the rm_val() method exposed by the config subsystem, and this information is logged.

Other changes:
1. Add a standalone test to exercise the fix.
2. Add a documentation note on the outcome of an attempt to modify the built-in profile defaults.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>

yuriw · 2023-05-18T13:50:30Z

@sseshasa pls merge when ready

This is tested ref: https://trello.com/c/1EFSeXDn

When the mClock scheduler is enabled, a small subset of config options related to recovery limits cannot be modified unless the osd_mclock_override_recovery_settings option is enabled. This override option is disabled by default. The following options cannot be modified without enabling the override option:

- osd_max_backfills
- osd_recovery_max_active[_(hdd|ssd)]

Changes to these options are removed from the mon kv store, which effectively restores them to their default values. This caused tests such as test_cluster_configuration.ClusterConfigurationTest to fail, because they modify the recovery options and then expect to verify the modified values. Therefore, for tests, the osd_mclock_override_recovery_settings option is enabled in vstart_runner.py so that current and future tests are not affected.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
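The gating rule in this commit message can be sketched in Python, assuming the mon kv store is modeled as a plain dict. `enforce_recovery_limits` is a hypothetical name. In Ceph the OSD performs the removal by issuing "config rm" to the monitors, which is why the mon caps change above was needed. The expansion of `osd_recovery_max_active[_(hdd|ssd)]` into three option names follows the bracket notation.

```python
# Hypothetical sketch that models the mon kv store as a dict. Not Ceph code.
RECOVERY_LIMIT_OPTIONS = frozenset({
    "osd_max_backfills",
    "osd_recovery_max_active",
    "osd_recovery_max_active_hdd",
    "osd_recovery_max_active_ssd",
})


def enforce_recovery_limits(mon_store, op_queue):
    """Remove recovery-limit keys unless the override option is enabled.

    op_queue mirrors the value of osd_op_queue. Only "mclock_scheduler"
    triggers the reset. Returns the sorted list of removed keys.
    """
    override = mon_store.get("osd_mclock_override_recovery_settings", False)
    if op_queue != "mclock_scheduler" or override:
        return []
    removed = sorted(k for k in mon_store if k in RECOVERY_LIMIT_OPTIONS)
    for key in removed:
        # Removing the key makes the option fall back to its default value.
        del mon_store[key]
    return removed
```

In this model, enabling `osd_mclock_override_recovery_settings` leaves the recovery options untouched. That is the effect the vstart_runner.py change relies on so that tests which set these options can read them back.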

sseshasa · 2023-05-18T18:25:30Z

jenkins test api

sseshasa · 2023-05-19T04:19:27Z

@rishabh-d-dave Please review aed71b5, since you are the main contributor to this tool. With the mon caps change in this PR, the vstart_runner test mentioned in the above commit's message was failing. I therefore made a small change in vstart_runner.py to allow overriding the recovery options, since the mClock scheduler is the default. With this change, the API test passes.

sseshasa · 2023-05-22T05:25:05Z

@rishabh-d-dave Please review aed71b5 as I see that you are the main contributor of this tool.

@rishabh-d-dave Can you please take a quick look at the above commit related to vstart_runner.py? The other commits have already been reviewed. I am planning to backport this to reef and quincy and get it into downstream releases soon, hence the urgency.

sseshasa requested review from a team as code owners May 15, 2023 13:32

github-actions bot added core documentation labels May 15, 2023

athanatos approved these changes May 15, 2023


neha-ojha added the needs-qa label May 16, 2023

sseshasa force-pushed the wip-fix-pr48703-followup branch from ee52e9b to 1a5ee8d May 16, 2023 17:38

github-actions bot added the mon label May 16, 2023

ljflores added the wip-yuri10-testing label May 16, 2023

sseshasa added 2 commits May 18, 2023 14:03

sseshasa force-pushed the wip-fix-pr48703-followup branch from 1a5ee8d to 95fcae1 May 18, 2023 09:20

github-actions bot added the tests label May 18, 2023

sseshasa force-pushed the wip-fix-pr48703-followup branch from 95fcae1 to 752c450 May 18, 2023 12:42

yuriw added TESTED ready-to-merge and removed wip-yuri10-testing labels May 18, 2023

sseshasa force-pushed the wip-fix-pr48703-followup branch from 752c450 to aed71b5 May 18, 2023 16:12

sseshasa merged commit 0f64042 into ceph:main May 22, 2023
11 checks passed

This was referenced May 22, 2023

reef: osd/scheduler: Reset ephemeral changes to mClock built-in profile #51663

Merged

quincy: osd/scheduler: Reset ephemeral changes to mClock built-in profile #51664

Merged
