osd: Set initial mClock QoS params at CONF_DEFAULT level #46700

Merged (1 commit) on Jul 7, 2022

Contributor

@sseshasa sseshasa commented Jun 15, 2022

Create the initial mClock QoS params at the CONF_DEFAULT level using
set_val_default(). This allows switching to the ‘custom’ profile on a
running OSD and then changing the desired QoS params.
Note that after switching to the ‘custom’ profile, QoS params changed
using “config set osd.n …” are stored at a higher level, i.e. at
CONF_MON.

However, when switching back to a built-in profile, the new values won’t
take effect since CONF_DEFAULT < CONF_MON. For the values to take effect,
the config keys created as part of the ‘custom’ profile must be removed
from the ConfigMonitor store after switching back to a built-in profile.
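
The precedence issue described above can be sketched with a small model. This is only an illustration, not Ceph's implementation: the `LayeredConfig` class, its method names, the numeric level values, and the weights 2, 5, and 1 are all assumptions made for the example. Only the ordering CONF_DEFAULT < CONF_MON and the need to remove the higher-level key come from the description.

```python
# Illustrative model only; this is not Ceph's implementation.
# The numeric values are assumptions; only the ordering
# CONF_DEFAULT < CONF_MON is taken from the description above.
CONF_DEFAULT = 0
CONF_MON = 1


class LayeredConfig:
    """Keeps one value per (key, level) and resolves to the highest level."""

    def __init__(self):
        self._values = {}  # key -> {level: value}

    def set_val(self, key, value, level):
        # Store the value at the given level, replacing any value there.
        self._values.setdefault(key, {})[level] = value

    def rm_val(self, key, level):
        # Drop the value stored at one level, if any.
        self._values.get(key, {}).pop(level, None)

    def get(self, key):
        # The value at the highest populated level is the effective one.
        levels = self._values[key]
        return levels[max(levels)]


conf = LayeredConfig()
key = "osd_mclock_scheduler_client_wgt"

conf.set_val(key, 2, CONF_DEFAULT)  # built-in profile sets its default
conf.set_val(key, 5, CONF_MON)      # 'custom' profile: "config set osd.n ..."
conf.set_val(key, 1, CONF_DEFAULT)  # switch back to a built-in profile

stale = conf.get(key)               # 5: the CONF_MON entry still wins
conf.rm_val(key, CONF_MON)          # remove the key left by the custom profile
effective = conf.get(key)           # 1: the built-in default now applies
```

In this model the built-in profile's value becomes effective only after the CONF_MON entry is removed, which mirrors the cleanup step the description calls for.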

  • Added a couple of standalone tests to exercise the scenario.
  • Fixed a couple of typos relating to the best effort weights in the
    mClock configuration document and the mClock internal documentation.
  • Added new sections to the mClock configuration document outlining the
    steps to switch from a built-in profile to the custom profile and
    vice versa.

Fixes: https://tracker.ceph.com/issues/55153
Signed-off-by: Sridhar Seshasayee sseshasa@redhat.com

@sseshasa
Contributor Author

jenkins test make check

@sseshasa
Contributor Author

jenkins test api

@sseshasa sseshasa force-pushed the wip-fix-mclock-config-set branch 3 times, most recently from 12d7877 to 1399cd8 Compare June 17, 2022 06:59
@sseshasa
Contributor Author

jenkins test api

anthonyeleven previously approved these changes Jun 18, 2022
@anthonyeleven anthonyeleven dismissed their stale review June 18, 2022 01:41

not ready for merge

Contributor

@anthonyeleven anthonyeleven left a comment

Docs LGTM, the code is beyond my ken

Member

@neha-ojha neha-ojha left a comment

code changes look fine, left some comments on the documentation

@sseshasa sseshasa force-pushed the wip-fix-mclock-config-set branch 2 times, most recently from 6268d3c to 8dca8b2 Compare June 24, 2022 09:00
@sseshasa sseshasa force-pushed the wip-fix-mclock-config-set branch 2 times, most recently from 13c5a96 to d7e8bfe Compare June 27, 2022 13:45
@sseshasa
Contributor Author

jenkins test api

@sseshasa
Contributor Author

jenkins test make check arm64

@sseshasa
Contributor Author

jenkins test api

@sseshasa
Contributor Author

jenkins test make check

@sseshasa
Contributor Author

sseshasa commented Jul 5, 2022

@anthonyeleven Can you please check if your comments are addressed? Thanks!

@sseshasa
Contributor Author

sseshasa commented Jul 6, 2022

jenkins test make check

@anthonyeleven
Contributor

jenkins test make check

@sseshasa
Contributor Author

sseshasa commented Jul 7, 2022

Teuthology Test Result
https://pulpito.ceph.com/yuriw-2022-06-27_15:15:17-rados-wip-yuri2-testing-2022-06-24-1331-distro-default-smithi
https://pulpito.ceph.com/?branch=wip-yuri2-testing-2022-06-29-0820

Unrelated Failures

  1. Failure Reason: Valgrind failures in mon and osd.
    Existing Trackers: https://tracker.ceph.com/issues/52124 and https://tracker.ceph.com/issues/54603

  2. Failure Reason: 'wait for operator' reached maximum tries (90) after waiting for 900 seconds
    Existing Tracker: https://tracker.ceph.com/issues/52420

  3. Failure Reason: "2022-06-30T02:07:39.827451+0000 mon.a (mon.0) 142 : cluster [WRN] Health check failed: Degraded data redundancy: 2/4 objects degraded (50.000%), 1 pg degraded (PG_DEGRADED)" in cluster log
    Existing Tracker: https://tracker.ceph.com/issues/51282

  4. Failure Reason: Exiting scrub checking -- not all pgs scrubbed
    Existing Tracker: https://tracker.ceph.com/issues/53342

  5. Failure Reason: mgr get_metadata_python Requested missing service mon.a
    Existing Tracker: https://tracker.ceph.com/issues/55322

  6. Failure Reason: Tests cls_rgw.index_list and cls_rgw.index_list_delimited failed in qa/workunits/cls/test_cls_rgw.sh
    Existing Tracker: https://tracker.ceph.com/issues/55853

  7. Failure Reason: "2022-06-30T04:02:32.040193+0000 osd.3 (osd.3) 33 : cluster [ERR] osd.3(0) found snap mapper error on pg 3.bs0> oid 3:d81a0fb3:::smithi10749189-30:137 snaps missing in mapper, should be: 132,137 ...repaired" in cluster log
    Existing Tracker: https://tracker.ceph.com/issues/49525 was fixed, but the failure has reappeared.
    Raised New Tracker: https://tracker.ceph.com/issues/56438

  8. Failure Reason: Command failed on smithi080 with status 123: "find /home/ubuntu/cephtest/archive/syslog -name '*.log' -print0 | sudo xargs -0 --no-run-if-empty -- gzip --"
    Existing Tracker: https://tracker.ceph.com/issues/50868
