Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: Restore defaults of mClock built-in profiles upon modification #48703

Merged
merged 1 commit into from Feb 9, 2023

Conversation

sseshasa
Copy link
Contributor

@sseshasa sseshasa commented Nov 2, 2022

The QoS parameters (res, wgt, lim) of mClock profiles are not allowed to be modified by users using commands like "config set" or via admin socket. handle_conf_change() does not allow changes to any built-in mClock profile at the mClock scheduler. But the config subsystem showed the change as expected for the built-in mClock profile QoS parameters. This misled the user into thinking that the change was made at the mClock server when it was not the case.

The above issue is the result of the config "levels" used by the config subsystem. The inital built-in QoS params are set at the CONF_DEFAULT level. This allows the user to modify the built-in QoS params using "config set" command which sets values at CONF_MON level which has higher priority than CONF_DEFAULT level. The new value is persisted on the mon store and therefore the config subsystem shows the change when "config show" command is issued.

To prevent the above, this commit adds changes to restore the defaults set for the built-in profiles by removing the new config changes from the MON store. This results in the original defaults to come back into effect and maintain a consistent view of the built-in profile across all levels.

To accomplish this, the mClock scheduler is provided with additional information like the OSD id, shard id and a pointer to the MonClient using which the Mon store command to remove the option is executed.

Fixes: https://tracker.ceph.com/issues/57533
Signed-off-by: Sridhar Seshasayee sseshasa@redhat.com

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@sseshasa sseshasa requested a review from a team as a code owner November 2, 2022 06:37
@github-actions github-actions bot added the core label Nov 2, 2022
@sseshasa sseshasa force-pushed the wip-fix-mclk-builtin-profile-mods branch from 90190c5 to 85c2d19 Compare November 2, 2022 11:31
@github-actions github-actions bot added the tests label Nov 2, 2022
@sseshasa sseshasa requested review from neha-ojha and a team November 2, 2022 13:12
@sseshasa
Copy link
Contributor Author

sseshasa commented Nov 2, 2022

jenkins test make check

@sseshasa sseshasa force-pushed the wip-fix-mclk-builtin-profile-mods branch from 85c2d19 to bfb457c Compare November 4, 2022 06:28
@sseshasa
Copy link
Contributor Author

sseshasa commented Nov 4, 2022

jenkins test make check

@sseshasa
Copy link
Contributor Author

sseshasa commented Nov 4, 2022

jenkins test api

@sseshasa
Copy link
Contributor Author

sseshasa commented Nov 4, 2022

jenkins test make check

@sseshasa
Copy link
Contributor Author

sseshasa commented Nov 4, 2022

jenkins test api

@sseshasa sseshasa force-pushed the wip-fix-mclk-builtin-profile-mods branch from bfb457c to 274a643 Compare November 7, 2022 05:22
@sseshasa
Copy link
Contributor Author

sseshasa commented Dec 6, 2022

jenkins test make check

1 similar comment
@neha-ojha
Copy link
Member

jenkins test make check

@neha-ojha
Copy link
Member

jenkins test api

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@sseshasa sseshasa force-pushed the wip-fix-mclk-builtin-profile-mods branch from 016c7a9 to 77b0c88 Compare December 29, 2022 13:39
@sseshasa
Copy link
Contributor Author

sseshasa commented Jan 3, 2023

jenkins test make check

@sseshasa
Copy link
Contributor Author

sseshasa commented Jan 3, 2023

jenkins test api

@sseshasa sseshasa force-pushed the wip-fix-mclk-builtin-profile-mods branch 3 times, most recently from adc9cbb to 32cf67e Compare January 5, 2023 14:20
@sseshasa
Copy link
Contributor Author

sseshasa commented Jan 5, 2023

jenkins test make check

@sseshasa sseshasa force-pushed the wip-fix-mclk-builtin-profile-mods branch from 32cf67e to 200acac Compare January 5, 2023 15:31
@sseshasa sseshasa force-pushed the wip-fix-mclk-builtin-profile-mods branch from 200acac to 791f03a Compare January 6, 2023 08:25
@sseshasa
Copy link
Contributor Author

sseshasa commented Jan 6, 2023

jenkins test make check

@sseshasa sseshasa force-pushed the wip-fix-mclk-builtin-profile-mods branch from 791f03a to d188776 Compare January 9, 2023 12:46
The QoS parameters (res, wgt, lim) of mClock profiles are not allowed to
be modified by users using commands like "config set" or via admin socket.
handle_conf_change() does not allow changes to any built-in mClock profile
at the mClock scheduler. But the config subsystem showed the change as
expected for the built-in mClock profile QoS parameters. This misled the
user into thinking that the change was made at the mClock server when
it was not the case.

The above issue is the result of the config "levels" used by the config
subsystem. The inital built-in QoS params are set at the CONF_DEFAULT
level. This allows the user to modify the built-in QoS params using
"config set" command which sets values at CONF_MON level which has higher
priority than CONF_DEFAULT level. The new value is persisted on the mon
store and therefore the config subsystem shows the change when "config
show" command is issued.

To prevent the above, this commit adds changes to restore the defaults set
for the built-in profiles by removing the new config changes from the MON
store. This results in the original defaults to come back into effect and
maintain a consistent view of the built-in profile across all levels.

To accomplish this, the mClock scheduler is provided with additional
information like the OSD id, shard id and a pointer to the MonClient
using which the Mon store command to remove the option is executed.

A standalone test is added to verify that built-in params cannot be
modified and the original profile params are retained.

Fixes: https://tracker.ceph.com/issues/57533
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
@sseshasa sseshasa force-pushed the wip-fix-mclk-builtin-profile-mods branch from d188776 to b127806 Compare January 9, 2023 14:25
@ljflores
Copy link
Contributor

ljflores commented Feb 8, 2023

Rados suite review: https://pulpito.ceph.com/?branch=wip-yuri2-testing-2023-01-26-1532

Failures, unrelated:
1. https://tracker.ceph.com/issues/58496 -- fix in progress
2. https://tracker.ceph.com/issues/58585
3. https://tracker.ceph.com/issues/58475
4. https://tracker.ceph.com/issues/57754
5. https://tracker.ceph.com/issues/49287
6. https://tracker.ceph.com/issues/57731
7. https://tracker.ceph.com/issues/54829
8. https://tracker.ceph.com/issues/52221

Details:
1. osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.empty()) - Ceph - RADOS
2. rook: failed to pull kubelet image - Ceph - Orchestrator
3. test_dashboard_e2e.sh: Conflicting peer dependency: postcss@8.4.21 - Ceph - Mgr - Dashboard
4. test_envlibrados_for_rocksdb.sh: update-alternatives: error: alternative path /usr/bin/gcc-11 doesn't exist - Infrastructure
5. podman: setting cgroup config for procHooks process caused: Unit libpod-$hash.scope not found - Ceph - Orchestrator
6. Problem: package container-selinux conflicts with udica < 0.2.6-1 provided by udica-0.2.4-1 - Infrastructure
7. crash: void OSDMap::check_health(ceph::common::CephContext*, health_check_map_t*) const: assert(num_down_in_osds <= num_in_osds) - Ceph - RADOS
8. crash: void OSD::handle_osd_map(MOSDMap*): assert(p != added_maps_bl.end()) - Ceph - RADOS

@yuriw yuriw merged commit 186df9b into ceph:main Feb 9, 2023
11 checks passed
sseshasa added a commit to sseshasa/ceph that referenced this pull request May 15, 2023
This is a follow-up to PR: ceph#48703.
This commit also considers changes made ephemerally using either the
'daemon' or the 'tell' interfaces to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsytem and logging
this information.

Other changes:

1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
   built-in profile defaults.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
sseshasa added a commit to sseshasa/ceph that referenced this pull request May 16, 2023
…tion

This is a follow-up to PR: ceph#48703.
Modify the mon caps to allow OSDs to run the "config rm" command with
restriction to remove only the following config keys from the mon store:
- osd_max_backfills
- osd_recovery_max_active(.*)
  - osd_recovery_max_active and
  - osd_recovery_max_active_(hdd|ssd)
- osd_mclock_scheduler_(.*) -> all the QoS specific config options.

The above is similar to the change in mon caps to run the "config set"
command as implemented in PR: ceph#42853.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
sseshasa added a commit to sseshasa/ceph that referenced this pull request May 16, 2023
This is a follow-up to PR: ceph#48703.
This commit also considers changes made ephemerally using either the
'daemon' or the 'tell' interfaces to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsytem and logging
this information.

Other changes:

1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
   built-in profile defaults.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
sseshasa added a commit to sseshasa/ceph that referenced this pull request May 18, 2023
…tion

This is a follow-up to PR: ceph#48703.
Modify the mon caps to allow OSDs to run the "config rm" command with
restriction to remove only the following config keys from the mon store:
- osd_max_backfills
- osd_recovery_max_active(.*)
  - osd_recovery_max_active and
  - osd_recovery_max_active_(hdd|ssd)
- osd_mclock_scheduler_(.*) -> all the QoS specific config options.

The above is similar to the change in mon caps to run the "config set"
command as implemented in PR: ceph#42853.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
sseshasa added a commit to sseshasa/ceph that referenced this pull request May 18, 2023
This is a follow-up to PR: ceph#48703.
This commit also considers changes made ephemerally using either the
'daemon' or the 'tell' interfaces to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsytem and logging
this information.

Other changes:

1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
   built-in profile defaults.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
sseshasa added a commit to sseshasa/ceph that referenced this pull request May 22, 2023
…tion

This is a follow-up to PR: ceph#48703.
Modify the mon caps to allow OSDs to run the "config rm" command with
restriction to remove only the following config keys from the mon store:
- osd_max_backfills
- osd_recovery_max_active(.*)
  - osd_recovery_max_active and
  - osd_recovery_max_active_(hdd|ssd)
- osd_mclock_scheduler_(.*) -> all the QoS specific config options.

The above is similar to the change in mon caps to run the "config set"
command as implemented in PR: ceph#42853.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 6916008)
sseshasa added a commit to sseshasa/ceph that referenced this pull request May 22, 2023
This is a follow-up to PR: ceph#48703.
This commit also considers changes made ephemerally using either the
'daemon' or the 'tell' interfaces to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsytem and logging
this information.

Other changes:

1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
   built-in profile defaults.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 414ac7d)
sseshasa added a commit to sseshasa/ceph that referenced this pull request May 22, 2023
…tion

This is a follow-up to PR: ceph#48703.
Modify the mon caps to allow OSDs to run the "config rm" command with
restriction to remove only the following config keys from the mon store:
- osd_max_backfills
- osd_recovery_max_active(.*)
  - osd_recovery_max_active and
  - osd_recovery_max_active_(hdd|ssd)
- osd_mclock_scheduler_(.*) -> all the QoS specific config options.

The above is similar to the change in mon caps to run the "config set"
command as implemented in PR: ceph#42853.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 6916008)
sseshasa added a commit to sseshasa/ceph that referenced this pull request May 22, 2023
This is a follow-up to PR: ceph#48703.
This commit also considers changes made ephemerally using either the
'daemon' or the 'tell' interfaces to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsytem and logging
this information.

Other changes:

1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
   built-in profile defaults.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 414ac7d)
esmaeil-mirvakili pushed a commit to esmaeil-mirvakili/ceph that referenced this pull request May 22, 2023
…tion

This is a follow-up to PR: ceph#48703.
Modify the mon caps to allow OSDs to run the "config rm" command with
restriction to remove only the following config keys from the mon store:
- osd_max_backfills
- osd_recovery_max_active(.*)
  - osd_recovery_max_active and
  - osd_recovery_max_active_(hdd|ssd)
- osd_mclock_scheduler_(.*) -> all the QoS specific config options.

The above is similar to the change in mon caps to run the "config set"
command as implemented in PR: ceph#42853.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
esmaeil-mirvakili pushed a commit to esmaeil-mirvakili/ceph that referenced this pull request May 22, 2023
This is a follow-up to PR: ceph#48703.
This commit also considers changes made ephemerally using either the
'daemon' or the 'tell' interfaces to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsytem and logging
this information.

Other changes:

1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
   built-in profile defaults.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
muahdib69 pushed a commit to muahdib69/ceph that referenced this pull request May 26, 2023
…tion

This is a follow-up to PR: ceph#48703.
Modify the mon caps to allow OSDs to run the "config rm" command with
restriction to remove only the following config keys from the mon store:
- osd_max_backfills
- osd_recovery_max_active(.*)
  - osd_recovery_max_active and
  - osd_recovery_max_active_(hdd|ssd)
- osd_mclock_scheduler_(.*) -> all the QoS specific config options.

The above is similar to the change in mon caps to run the "config set"
command as implemented in PR: ceph#42853.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
muahdib69 pushed a commit to muahdib69/ceph that referenced this pull request May 26, 2023
This is a follow-up to PR: ceph#48703.
This commit also considers changes made ephemerally using either the
'daemon' or the 'tell' interfaces to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsytem and logging
this information.

Other changes:

1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
   built-in profile defaults.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
joscollin pushed a commit to joscollin/ceph that referenced this pull request Jul 31, 2023
…tion

This is a follow-up to PR: ceph#48703.
Modify the mon caps to allow OSDs to run the "config rm" command with
restriction to remove only the following config keys from the mon store:
- osd_max_backfills
- osd_recovery_max_active(.*)
  - osd_recovery_max_active and
  - osd_recovery_max_active_(hdd|ssd)
- osd_mclock_scheduler_(.*) -> all the QoS specific config options.

The above is similar to the change in mon caps to run the "config set"
command as implemented in PR: ceph#42853.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
joscollin pushed a commit to joscollin/ceph that referenced this pull request Jul 31, 2023
This is a follow-up to PR: ceph#48703.
This commit also considers changes made ephemerally using either the
'daemon' or the 'tell' interfaces to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsytem and logging
this information.

Other changes:

1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
   built-in profile defaults.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
yuvalif pushed a commit to yuvalif/ceph that referenced this pull request Aug 14, 2023
…tion

This is a follow-up to PR: ceph#48703.
Modify the mon caps to allow OSDs to run the "config rm" command with
restriction to remove only the following config keys from the mon store:
- osd_max_backfills
- osd_recovery_max_active(.*)
  - osd_recovery_max_active and
  - osd_recovery_max_active_(hdd|ssd)
- osd_mclock_scheduler_(.*) -> all the QoS specific config options.

The above is similar to the change in mon caps to run the "config set"
command as implemented in PR: ceph#42853.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 6916008)
(cherry picked from commit 431e3ed)

Resolves: rhbz#2124137
yuvalif pushed a commit to yuvalif/ceph that referenced this pull request Aug 14, 2023
This is a follow-up to PR: ceph#48703.
This commit also considers changes made ephemerally using either the
'daemon' or the 'tell' interfaces to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsytem and logging
this information.

Other changes:

1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
   built-in profile defaults.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 414ac7d)
(cherry picked from commit 41c903a)

Resolves: rhbz#2124137
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants