Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd/scheduler: Reset ephemeral changes to mClock built-in profile #51480

Merged
merged 3 commits into from May 22, 2023

Conversation

sseshasa
Copy link
Contributor

@sseshasa sseshasa commented May 15, 2023

This is a follow-up to PR: #48703. This fix also considers changes made ephemerally using either the 'daemon' or the 'tell' interfaces to override the built-in mClock QoS parameters. In such a scenario, the ephemeral changes are removed using the rm_val() method exposed by the config subsytem and logging this information.

Other changes:

  1. Add a standalone test to exercise the fix.
  2. Add documentation note on the outcome of the attempt to modify built-in profile defaults.
  3. Modify mon caps to allow OSDs to run "config rm" command with restrictions.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee sseshasa@redhat.com

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@sseshasa
Copy link
Contributor Author

jenkins test make check

@sseshasa
Copy link
Contributor Author

@athanatos @neha-ojha I was looking into the logs reported by QE where the QoS related config keys were not removed from the mon store. See BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2124137#c41. It turns out that the failure was due to the OSD having no privilege to run the "config rm" command. I therefore added one more commit that resolves this issue. The default osd profile needs to allow the OSDs to run the "config rm" command. The new commit allows running this command only for the following config keys:

  • osd_max_backfills
  • osd_recovery_max_active(.*)
    • osd_recovery_max_active and
    • osd_recovery_max_active_(hdd|ssd)
  • osd_mclock_scheduler_(.*) -> all the QoS specific config options.

This is similar to the change introduced to allow OSDs to run the "config set" command for a couple of config keys as implemented in PR: #42853.

With this change, I was able to verify using vstart cluster that there are no "access denied" errors for the commands issued by the OSD.

@sseshasa
Copy link
Contributor Author

jenkins test api

@ljflores
Copy link
Contributor

jenkins test api

@ljflores
Copy link
Contributor

@yuriw
Copy link
Contributor

yuriw commented May 17, 2023

jenkins test api

@sseshasa
Copy link
Contributor Author

@yuriw @ljflores I am checking if the API test failure is related to the moncap commit I introduced.

…tion

This is a follow-up to PR: ceph#48703.
Modify the mon caps to allow OSDs to run the "config rm" command with
restriction to remove only the following config keys from the mon store:
- osd_max_backfills
- osd_recovery_max_active(.*)
  - osd_recovery_max_active and
  - osd_recovery_max_active_(hdd|ssd)
- osd_mclock_scheduler_(.*) -> all the QoS specific config options.

The above is similar to the change in mon caps to run the "config set"
command as implemented in PR: ceph#42853.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
This is a follow-up to PR: ceph#48703.
This commit also considers changes made ephemerally using either the
'daemon' or the 'tell' interfaces to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsytem and logging
this information.

Other changes:

1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
   built-in profile defaults.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
@yuriw
Copy link
Contributor

yuriw commented May 18, 2023

@sseshasa pls merge when ready

This is tested ref: https://trello.com/c/1EFSeXDn

With mClock scheduler enabled, a small subset of config options related
to recovery limits are not allowed to be modified unless
osd_mclock_override_recovery_settings option is enabled. This override
option is disabled by default. The following options cannot be modified
without enabling the override option:

 - osd_max_backfills
 - osd_recovery_max_active[_(hdd|ssd)]

The above options are removed from the mon kv store which effectively
restores them to the default values.

This was resulting in tests for example,
test_cluster_configuration.ClusterConfigurationTest to fail since it
modifies the recovery options and expects to verify the modified value.

Therefore, for tests, osd_mclock_override_recovery_settings option is
enabled in vstart_runner.py so that current and future tests
are not affected.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
@sseshasa
Copy link
Contributor Author

jenkins test api

@sseshasa
Copy link
Contributor Author

@rishabh-d-dave Please review aed71b5 as I see that you are the main contributor of this tool. With the mon caps change in this PR, vstart_runner test as mentioned in the above commit's message was failing. I therefore made a small change in vstart_runner.py to allow override of recovery options as mclock scheduler is the default. With this change, the API test passes.

@sseshasa
Copy link
Contributor Author

@rishabh-d-dave Please review aed71b5 as I see that you are the main contributor of this tool.

@rishabh-d-dave Can you please take a quick look at the above commit related to vstart_runner.py? The other commits are already reviewed. I am planning to backport this to reef and quincy and get them into downstream releases soon and therefore the urgency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants