osd: Add config option to skip running the osd benchmark during init and update documentation. #42604

Merged: 4 commits on Sep 8, 2021

@sseshasa sseshasa commented Aug 3, 2021

  1. Introduce a new dev config option, "osd_mclock_skip_benchmark", that
    skips the OSD benchmark on start-up when set. The option is disabled by
    default. It is useful in the following scenarios:

    • Dev/CI testing
    • Configurations that don't need QoS
  2. Update the steps in the mclock-config-ref doc for overriding the OSD max IOPS capacity.

Fixes: https://tracker.ceph.com/issues/52025
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

@sseshasa sseshasa marked this pull request as ready for review August 3, 2021 12:06
@sseshasa sseshasa force-pushed the wip-skip-osd-benchmark branch 2 times, most recently from fbedee8 to 11881c6 Compare August 3, 2021 14:37

yuriw commented Aug 26, 2021

Per @sseshasa, this requires some changes.

Introduce a new dev config option "osd_mclock_skip_benchmark" that
when set skips running the OSD benchmark on start-up. By default
this option is disabled. This is useful in the following scenarios:

 - Dev/CI testing,
 - Configurations that don't need QoS.

If the option is enabled, the default OSD iops capacity is read from
osd_mclock_max_capacity_iops_[hdd,ssd].

Fixes: https://tracker.ceph.com/issues/52025
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
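
The selection described in this commit message can be pictured with a minimal Python sketch. It is an illustration only, not the OSD's C++ code: the `OsdConfig` class, the `initial_iops_capacity` function, the `run_benchmark` callback, and the numeric values are all hypothetical. Only the option names come from the commit message.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OsdConfig:
    # Option names are taken from the commit message; values are supplied by the caller.
    osd_mclock_skip_benchmark: bool
    osd_mclock_max_capacity_iops_hdd: float
    osd_mclock_max_capacity_iops_ssd: float


def initial_iops_capacity(cfg: OsdConfig, is_rotational: bool,
                          run_benchmark: Callable[[], float]) -> float:
    """Return the IOPS capacity an OSD would start with (hypothetical model)."""
    # Pick the configured capacity that matches the device type.
    configured = (cfg.osd_mclock_max_capacity_iops_hdd if is_rotational
                  else cfg.osd_mclock_max_capacity_iops_ssd)
    if cfg.osd_mclock_skip_benchmark:
        # Benchmark skipped: fall back to the configured value.
        return configured
    # Benchmark enabled (the default): use the measured value instead.
    return run_benchmark()
```

With the skip option set, the `run_benchmark` callback is never invoked, which is the property that matters for dev/CI runs where start-up time counts more than an accurate capacity estimate.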
…city.

Update the steps in the mclock config reference document to manually
override an OSD's max IOPS capacity. Provide information on the alternative
ways to override the osd_mclock_max_capacity_iops_[hdd,ssd] options for
an OSD.

Fixes: https://tracker.ceph.com/issues/52025
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
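
As a rough illustration of one such override, the sketch below builds the argument list for a `ceph config set` call that targets a single OSD. The helper name `capacity_override_command` and the example values are hypothetical; the sketch only assembles the command and does not run it, and the updated document remains the authority on the supported procedures.

```python
from typing import List, Union


def capacity_override_command(osd_id: int, device_class: str,
                              iops: Union[int, float]) -> List[str]:
    """Build (but do not execute) a per-OSD capacity override command."""
    if device_class not in ("hdd", "ssd"):
        raise ValueError("device_class must be 'hdd' or 'ssd'")
    if iops <= 0:
        raise ValueError("iops must be positive")
    # The option name follows the osd_mclock_max_capacity_iops_[hdd,ssd] pattern.
    option = f"osd_mclock_max_capacity_iops_{device_class}"
    return ["ceph", "config", "set", f"osd.{osd_id}", option, str(iops)]


# Example with made-up values: override the capacity of osd.0 on an SSD.
print(" ".join(capacity_override_command(0, "ssd", 25000)))
```

Returning an argument list rather than a shell string keeps the sketch safe to pass to `subprocess.run` without quoting concerns, should someone choose to execute it against a real cluster.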
…ark option

Add a standalone test - test_activate_osd_skip_benchmark() in ceph-helpers.sh
that exercises the osd-mclock-skip-benchmark option.

Fixes: https://tracker.ceph.com/issues/52025
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
@sseshasa sseshasa force-pushed the wip-skip-osd-benchmark branch 2 times, most recently from 4f6a252 to 266b1ae Compare September 1, 2021 11:57
Force a subset of tests that explicitly employ the filestore backend to
use WPQ scheduler. This is because mclock scheduler will not be
optimized for filestore.

Fixes: https://tracker.ceph.com/issues/52025
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>

sseshasa commented Sep 2, 2021

@neha-ojha I had changed qa/objectstore/filestore-xfs.yaml and qa/objectstore_debug/filestore-xfs.yaml to use 'wpq'. But when teuthology runs the tests, the 'osd op queue' settings in those files are overridden by the settings in rados.yaml, which is applied afterwards, making the changes redundant. For example, see the tests in the run http://pulpito.front.sepia.ceph.com/sseshasa-2021-09-02_07:53:45-rados-wip-sseshasa2-testing-2021-09-01-1728-distro-basic-smithi/. One such test is shown below:

rados/singleton/{all/erasure-code-nonregression mon_election/connectivity msgr-failures/none msgr/async-v2only objectstore/filestore-xfs rados supported-random-distro$/{centos_8}}

The overrides in rados.yaml are applied after the filestore-xfs overrides (where I set 'wpq'), so the latter have no effect. However, the tests do pass with the mclock scheduler, so I am removing that change from this PR.

Unless there's a way to prevent/avoid the override, this PR can be merged since there were no related failures in Yuri's run. I can update this PR with details of that run if this can be merged.
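
The ordering effect described above can be modelled with a small Python sketch. It assumes that fragments are deep-merged in the order they appear in the test description and that a later fragment wins on conflicting keys; it is a simplified model, not teuthology's actual merge code. The dictionary contents, including the value that rados.yaml sets for 'osd op queue', are hypothetical.

```python
import copy
from functools import reduce


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where `override` wins on conflicting leaf keys."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are mappings: merge them recursively.
            result[key] = deep_merge(result[key], value)
        else:
            # Leaf value (or type mismatch): the later fragment replaces it.
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical fragment contents, applied in the order listed in the test description.
filestore_xfs = {"overrides": {"ceph": {"conf": {"osd": {
    "osd objectstore": "filestore",
    "osd op queue": "wpq",
}}}}}
rados = {"overrides": {"ceph": {"conf": {"osd": {
    "osd op queue": "mclock_scheduler",  # assumed value, for illustration only
}}}}}

merged = reduce(deep_merge, [filestore_xfs, rados], {})
print(merged["overrides"]["ceph"]["conf"]["osd"]["osd op queue"])
```

Under this model the 'wpq' setting is lost because rados comes after objectstore/filestore-xfs in the merge order, while keys that only filestore-xfs sets survive.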

@neha-ojha

We'll probably need to restructure the facets so that the filestore-xfs overrides take precedence, but I don't think that is required if the tests are passing. Let's go ahead and update this PR with the rados run details and merge it.


sseshasa commented Sep 8, 2021

Teuthology Testing Result:

Pulpito Run:
http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/

Failures (Unrelated):

  1. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356552

    BlueStore test assertion failure
    Tracked by: https://tracker.ceph.com/issues/52398

  2. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356560

    Failure in qa/workunits/cephadm/test_dashboard_e2e.sh: orchestrator/01-hosts.e2e-spec.ts
    New tracker added: https://tracker.ceph.com/issues/52417

  3. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356582

    src/os/bluestore/BlueStore.cc: 17517: FAILED ceph_assert(lcl_extnt_map[offset] == length)
    Tracked by: https://tracker.ceph.com/issues/52138

  4. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356625

    First error reported: raise RuntimeError("Synthetic exception in serve")
    Test that failed: tasks.mgr.test_module_selftest.TestModuleSelftest.
    Tracked by: https://tracker.ceph.com/issues/38455

  5. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356636

    RGW test failure: cls_rgw.bi_list
    Tracked by: https://tracker.ceph.com/issues/52315

  6. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356646

    Test failure in: src/test/cls_cmpomap/test_cls_cmpomap.cc - [ FAILED ] CmpOmap.cmp_vals_u64_invalid_default
    Fixed by: pacific: cls/cmpomap: empty values are 0 in U64 comparisons #42908

  7. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356682

    api_watch_notify_pp: [ FAILED ] LibRadosWatchNotifyECPP.WatchNotify
    Core dump due to Segmentation fault during the test on smithi067
    2021-08-25T01:40:59.442 INFO:tasks.workunit.client.0.smithi067.stderr:bash: line 1: 44976 Segmentation fault (core dumped) ceph_test_rados_api_watch_notify_pp 2>&
    Tracked by: https://tracker.ceph.com/issues/50042

  8. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356691

    ERROR: test_standby (tasks.mgr.test_prometheus.TestPrometheus)
    urllib3.exceptions.LocationParseError: Failed to parse: http://172.21.15.125:7789metrics
    Fixed by: mgr/{prometheus,restful}: Fix url generation again #42886

  9. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356708

    Same as 5 above.

  10. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356725

    Same as 2 above

  11. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356726

    Rook related: 'check osd count' reached maximum tries
    Tracked by: https://tracker.ceph.com/issues/52321

  12. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356777

    RuntimeError: uid/gid not found:
    Tracked by: https://tracker.ceph.com/issues/50280 or its related trackers.

  13. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356797

    Test: workloads/dedup-io-snaps Assert Failure: src/test/osd/RadosModel.h: 2966: FAILED ceph_assert(!context->check_oldest_snap_flushed(oid, snap))
    New tracker added: https://tracker.ceph.com/issues/52418

  14. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356805

    Rook related. ERROR:tasks.rook:'wait for operator' reached maximum tries (90) after waiting for 900 second
    New tracker added: https://tracker.ceph.com/issues/52420

  15. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356807

    Same as 6 above

  16. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356808

    Assertion failure: ceph_assert(session_map.sessions.empty())
    Tracked by: https://tracker.ceph.com/issues/39150

  17. http://pulpito.front.sepia.ceph.com/yuriw-2021-08-24_19:42:41-rados-wip-yuri8-testing-2021-08-24-0913-distro-basic-smithi/6356845

    Same as 4 above.
