
mon/OSDMonitor, osd: Add warning on filestore deprecation and force use of wpq scheduler for filestore OSDs #39440

Merged
merged 3 commits into from Jan 12, 2022

Conversation

pdvian

@pdvian pdvian commented Feb 12, 2021

This PR makes two changes:

  • Notify the end user of filestore deprecation through a ceph health detail warning.
  • The 'mclock_scheduler' is not supported for filestore. The default scheduler type for filestore OSDs is 'wpq', and the scheduler type is enforced to 'wpq' if the user tries to override it.
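The enforcement described in the second bullet can be illustrated with a small Python sketch. This is purely hypothetical code for illustration (the actual change lives in Ceph's C++ OSD/monitor code; the function name here is made up):

```python
# Hypothetical illustration of the behaviour described above, not
# Ceph's actual implementation: filestore OSDs always end up on 'wpq',
# even if the user configures 'mclock_scheduler'.

def effective_op_queue(objectstore: str, configured: str) -> str:
    """Return the scheduler an OSD would actually run with."""
    if objectstore == "filestore" and configured == "mclock_scheduler":
        # mclock is unsupported on filestore; fall back to the default.
        return "wpq"
    return configured

print(effective_op_queue("filestore", "mclock_scheduler"))  # wpq
print(effective_op_queue("bluestore", "mclock_scheduler"))  # mclock_scheduler
```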

Fixes: https://tracker.ceph.com/issues/49275

Signed-off-by: Prashant D <pdhange@redhat.com>

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@vumrao
Contributor

vumrao commented Jun 2, 2021

ping @pdvian

@neha-ojha
Member

@pdvian I think we could incorporate the FileStore deprecation message for Quincy in this PR, based on https://www.spinics.net/lists/ceph-users/msg66668.html

@pdvian
Author

pdvian commented Nov 12, 2021

Moving discussion about commit message here.

Prashant, Thu 11:52 PM
@Neha About the filestore deprecation message in the PR, does the below commit message look good to you?

"Filestore will be deprecated in the next release (Quincy)
considering bluestore is the default objectstore now and
more widely used since quite some time."

Also, should we include this message in PendingReleaseNotes as well?

I am testing the latest changes according to David's inputs and will push the changes to PR#39440 by tomorrow.

Brad, Yesterday 8:11 AM
@prashant maybe change 'since quite some time' to 'for quite some time'?

Neha, Yesterday 8:12 AM
Hey @prashant @brad Hubbard!

Neha, Yesterday 8:15 AM
Let's move this discussion to the PR, just for everyone's record.

Something like "Filestore will be deprecated in Quincy,
considering BlueStore has been the default objectstore for quite some time" makes sense

and yes to release notes!

@pdvian
Author

pdvian commented Nov 13, 2021

@neha-ojha @badone I did not consider the scenario where filestore OSD(s) are removed from the cluster; those filestore OSDs should no longer appear in the health warning. Re-pushing the changes to handle this scenario.

Contributor

@badone badone left a comment


This looks OK to me mate.

@neha-ojha
Member

@pdvian pdvian changed the title mon/OSDMonitor: Raise health warning for filestore osds mon/OSDMonitor, osd: Add warning on filestore deprecation and force use of wpq scheduler for filestore OSDs Dec 14, 2021
@github-actions github-actions bot added the tests label Dec 14, 2021
@pdvian
Author

pdvian commented Dec 16, 2021

Does this look good for ceph status and health detail?

ceph status :
health: HEALTH_WARN
9 osd(s) are on filestore; Filestore has been deprecated, the 'osd_op_queue' is enforced to 'wpq' for filestore OSDs.

ceph health detail :
[WRN] OSD_FILESTORE: 9 osd(s) are on filestore; Filestore has been deprecated, the 'osd_op_queue' is enforced to 'wpq' for filestore OSDs.
Filestore OSDs [osd.0,osd.1,osd.2,osd.3,osd.4,osd.5,osd.6,osd.7,osd.8]
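As a side note, the summary and detail strings above are easy to mock up. The sketch below is purely illustrative (the function name and structure are hypothetical; the real check is implemented in the monitor's C++ code in src/mon/OSDMonitor.cc):

```python
# Illustrative only: assemble the OSD_FILESTORE summary and detail
# shown above from a list of filestore OSD ids. Not Ceph's actual code.

def filestore_health(filestore_osd_ids):
    ids = list(filestore_osd_ids)
    summary = (f"{len(ids)} osd(s) are on filestore; Filestore has been "
               "deprecated, the 'osd_op_queue' is enforced to 'wpq' for "
               "filestore OSDs.")
    detail = "Filestore OSDs [" + ",".join(f"osd.{i}" for i in ids) + "]"
    return summary, detail

summary, detail = filestore_health(range(9))
print(summary)
print(detail)
```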

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@neha-ojha
Member

Does this look good for ceph status and health detail?

@pdvian @sseshasa How about?

ceph status : health: HEALTH_WARN 9 osd(s) are on filestore; Filestore has been deprecated, the 'osd_op_queue' is enforced to 'wpq' for filestore OSDs.

ceph status : health: HEALTH_WARN 9 osd(s) are running Filestore [Deprecated]

ceph health detail : [WRN] OSD_FILESTORE: 9 osd(s) are on filestore; Filestore has been deprecated, the 'osd_op_queue' is enforced to 'wpq' for filestore OSDs. Filestore OSDs [osd.0,osd.1,osd.2,osd.3,osd.4,osd.5,osd.6,osd.7,osd.8]

ceph health detail : [WRN] OSD_FILESTORE: 9 osd(s) are running Filestore, which has been deprecated and not been optimized for QoS (Filestore OSDs will use osd_op_queue=wpq and cannot be used with mclock_scheduler). List Filestore OSDs using <command>.

I think we should avoid listing all the OSDs in ceph health detail, given that for a large Filestore cluster this list will be huge. Instead we should provide a way for users to list all the Filestore OSDs using existing commands like ceph osd metadata or ceph report. @badone had a good idea of providing a jq command that parses the output of such commands.

@pdvian
Author

pdvian commented Dec 21, 2021

Does this look good for ceph status and health detail?

@pdvian @sseshasa How about?

ceph status : health: HEALTH_WARN 9 osd(s) are on filestore; Filestore has been deprecated, the 'osd_op_queue' is enforced to 'wpq' for filestore OSDs.

ceph status : health: HEALTH_WARN 9 osd(s) are running Filestore [Deprecated]

ceph health detail : [WRN] OSD_FILESTORE: 9 osd(s) are on filestore; Filestore has been deprecated, the 'osd_op_queue' is enforced to 'wpq' for filestore OSDs. Filestore OSDs [osd.0,osd.1,osd.2,osd.3,osd.4,osd.5,osd.6,osd.7,osd.8]

ceph health detail : [WRN] OSD_FILESTORE: 9 osd(s) are running Filestore, which has been deprecated and not been optimized for QoS (Filestore OSDs will use osd_op_queue=wpq and cannot be used with mclock_scheduler). List Filestore OSDs using <command>.

I think we should avoid listing all the OSDs in ceph health detail, given that for a large Filestore cluster this list will be huge. Instead we should provide a way for users to list all the Filestore OSDs using existing commands like ceph osd metadata or ceph report. @badone had a good idea of providing a jq command that parses the output of such commands.

@neha-ojha We can use either of the below commands to list filestore OSDs:

  1. list with id and osd_objectstore
    ceph report | jq -c '."osd_metadata" | .[] | {id, osd_objectstore} | select(.osd_objectstore | contains("filestore"))'

  2. list with id only
    ceph report | jq -c '."osd_metadata" | .[] | {id, osd_objectstore} | select(.osd_objectstore | contains("filestore")) | {id}'

The health detail will look like:

HEALTH_WARN 3 osd(s) are running Filestore [Deprecated]
[WRN] OSD_FILESTORE: 3 osd(s) are running Filestore [Deprecated]
    3 osd(s) are running Filestore, which has been deprecated and not optimized for QoS (Filestore OSDs will use 'osd_op_queue = wpq' and cannot be used with mclock_scheduler). List Filestore OSDs using: ceph report | jq -c '."osd_metadata" | .[] | {id, osd_objectstore} | select(.osd_objectstore | contains("filestore"))'
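For anyone without jq handy, the same selection can be sketched in Python against a toy 'ceph report'-shaped dict. The keys mirror the jq filter; the sample data below is made up for illustration:

```python
# Python equivalent of the jq filter above, run on a minimal, made-up
# subset of the 'ceph report' JSON (the real report is far larger).

report = {
    "osd_metadata": [
        {"id": 0, "osd_objectstore": "filestore"},
        {"id": 1, "osd_objectstore": "bluestore"},
        {"id": 2, "osd_objectstore": "filestore"},
    ]
}

filestore_osds = [
    {"id": m["id"], "osd_objectstore": m["osd_objectstore"]}
    for m in report["osd_metadata"]
    if "filestore" in m["osd_objectstore"]
]
print(filestore_osds)  # entries for osd.0 and osd.2
```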

@badone Does this look good to you?

@pdvian pdvian requested a review from badone December 21, 2021 01:44
Prashant D and others added 3 commits January 5, 2022 10:08
Filestore will be deprecated in Quincy, considering
that BlueStore has been the default objectstore for
quite some time.

Fixes: https://tracker.ceph.com/issues/49275

Signed-off-by: Prashant D <pdhange@redhat.com>
…OSDs

The 'mclock_scheduler' is not supported for filestore OSDs. Enforce the
usage of 'wpq' scheduler for such OSDs to avoid issues.

Also, in this scenario, the overrides of various config settings for the
'mclock_scheduler' are not performed.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Signed-off-by: Prashant D <pdhange@redhat.com>
@neha-ojha
Member

jenkins test api

@neha-ojha
Member

@pdvian filestore is used by a lot of test suites; we should probably have @yuriw run this PR through sanity testing of all suites, including upgrades.

@sseshasa
Contributor

Teuthology Test report

RADOS Suite Runs

  1. http://pulpito.front.sepia.ceph.com/yuriw-2022-01-07_22:42:09-rados-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/
  2. http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:57:04-rados-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/

Unrelated Failures (Across both runs)

  1. http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:57:04-rados-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6599449
    Existing tracker: https://tracker.ceph.com/issues/45721

  2. http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:57:04-rados-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6599471
    Existing tracker: https://tracker.ceph.com/issues/52319

  3. http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:57:04-rados-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6599513
    New tracker created: https://tracker.ceph.com/issues/53827

  4. http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:57:04-rados-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6599589
    Existing tracker: https://tracker.ceph.com/issues/49287

  5. http://pulpito.front.sepia.ceph.com/yuriw-2022-01-07_22:42:09-rados-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6602050
    Existing tracker: https://tracker.ceph.com/issues/52124

  6. http://pulpito.front.sepia.ceph.com/yuriw-2022-01-07_22:42:09-rados-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6602060
    Existing tracker: https://tracker.ceph.com/issues/49287

Upgrade Suite Run

http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:53:32-upgrade-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/

Unrelated Failures

  1. http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:53:32-upgrade-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6598852
    http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:53:32-upgrade-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6598902
    http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:53:32-upgrade-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6598951
    http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:53:32-upgrade-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6599001

    Failure Reason: rgw multisite test failures
    Existing tracker: https://tracker.ceph.com/issues/52653

  2. http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:53:32-upgrade-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6598858
    http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:53:32-upgrade-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6598880
    http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:53:32-upgrade-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6598931
    http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:53:32-upgrade-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6598959
    http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:53:32-upgrade-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6598981
    http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:53:32-upgrade-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6599010
    http://pulpito.front.sepia.ceph.com/yuriw-2022-01-06_15:53:32-upgrade-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6599030

    Failure Reason: Command failed on smithi* with status 2: 'cd /home/ubuntu/cephtest/ragweed && ./bootstrap'
    New tracker created: https://tracker.ceph.com/issues/53829

@yuriw yuriw merged commit a8bb49d into ceph:master Jan 12, 2022
8 checks passed