quincy: osd: Apply randomly selected scheduler type across all OSD shards #54980
Conversation
mClockPriorityQueue (the mClockQueue class) is an older mClock implementation of the OpQueue abstraction. It was replaced by a simpler implementation of the OpScheduler abstraction as part of ceph#30650, and that simpler mClockScheduler implementation is the one currently in use. This commit removes the unused src/common/mClockPriorityQueue.h along with the associated unit test file, test_mclock_priority_queue.cc.

Other miscellaneous changes:
- Remove the cmake references to the unit test file
- Remove the inclusion of the header file in mClockScheduler.h

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 28a26f7)
…ards

Originally, the choice of 'debug_random' for osd_op_queue resulted in the selection of a random scheduler type for each OSD shard. A more realistic test scenario is to apply the randomly selected scheduler type globally, so that all shards of an OSD employ the same scheduler type. For example, this scenario would arise during upgrades when the scheduler type has changed between releases.

The following changes are made as part of this commit:

1. Introduce enum class op_queue_type_t within osd_types.h that holds the various op queue types supported. This header is included by OpQueue.h. Add helper functions in osd_types.cc to return the op_queue_type_t as an enum or as a string representing the enum member.
2. Determine the scheduler type before initializing the OSD shards in the OSD class constructor.
3. Pass the determined op_queue_type_t to the OSDShard's make_scheduler() method for each shard. This ensures all shards of the OSD are initialized with the same scheduler type.
4. Rename and modify the unused OSDShard::get_scheduler_type() method to return the op_queue_type_t set for the queue.
5. Introduce OpScheduler::get_type() and OpQueue::get_type() pure virtual functions and define them within the respective queue implementations. They return a value identifying the op queue type and are called by OSDShard::get_op_queue_type().
6. Add the OSD::osd_op_queue_type() method for determining the scheduler type set on the OSD shards. Since all OSD shards are set to use the same scheduler type, the shard with the lowest id is used to get the scheduler type via OSDShard::get_op_queue_type().
7. Improve the comment describing the 'osd_op_queue' option in common/options/osd.yaml.in.
Call Flow
---------

  OSD                    OSDShard                 OpScheduler/OpQueue
  ---                    --------                 -------------------
  osd_op_queue_type() -> get_op_queue_type() ->   get_type()

Fixes: https://tracker.ceph.com/issues/62171
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 96df279)
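The call flow above can be sketched as a minimal, self-contained model. This is a hypothetical sketch based on the commit description, not the actual Ceph code (the real definitions live in src/osd/osd_types.h and src/osd/scheduler/); the enumerator names and the string helper are assumptions for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Sketch of the enum introduced in osd_types.h (item 1); member names
// are assumed, not copied from the source tree.
enum class op_queue_type_t {
  WeightedPriorityQueue,
  mClockScheduler,
};

// Sketch of a helper like those added to osd_types.cc, returning a
// string representing the enum member.
std::string get_op_queue_type_name(op_queue_type_t t) {
  switch (t) {
  case op_queue_type_t::WeightedPriorityQueue: return "wpq";
  case op_queue_type_t::mClockScheduler:       return "mclock_scheduler";
  }
  return "unknown";
}

// Pure virtual getter as described in item 5.
struct OpScheduler {
  virtual ~OpScheduler() = default;
  virtual op_queue_type_t get_type() const = 0;
};

// Each concrete queue implementation reports its own type.
struct mClockScheduler : OpScheduler {
  op_queue_type_t get_type() const override {
    return op_queue_type_t::mClockScheduler;
  }
};

// A shard delegates to its scheduler (item 4).
struct OSDShard {
  std::unique_ptr<OpScheduler> scheduler;
  op_queue_type_t get_op_queue_type() const { return scheduler->get_type(); }
};

// The OSD queries the lowest-id shard: since every shard was
// constructed with the same type, one lookup suffices (item 6).
op_queue_type_t osd_op_queue_type(const std::vector<OSDShard>& shards) {
  return shards.front().get_op_queue_type();
}
```

The key design point mirrored here is that the type is decided once, before shard construction, and thereafter only read back through the virtual getter chain.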
…system

All OSD shards are guaranteed to use the same scheduler type. Therefore, OSD::osd_op_queue_type() is used where applicable to determine the scheduler type. This results in the appropriate setting of other config options based on the randomly selected scheduler type when the global 'osd_op_queue' config option is set to 'debug_random' (for example, in CI tests).

Note: If 'osd_op_queue' is set to 'debug_random', the PG-specific code (PGPeering, PrimaryLogPG) continues to use the existing mechanism of querying the config option key (osd_op_queue) via get_val().

Fixes: https://tracker.ceph.com/issues/62171
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit fadc097)

Conflicts:
  src/osd/OSD.cc - Removed the OSD::maybe_override_cost_for_qos() definition, which is yet to be backported to quincy.
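The point of the change above can be illustrated with a small sketch: dependent settings are derived from the resolved scheduler type rather than by re-reading the "osd_op_queue" config key, which may say "debug_random" and therefore not name the queue actually in use. Everything here is hypothetical illustration; the enum, function, and profile strings are assumptions, not Ceph APIs.

```cpp
#include <cassert>
#include <string>

// Assumed enum for illustration (mirrors the type added in osd_types.h).
enum class op_queue_type_t {
  WeightedPriorityQueue,
  mClockScheduler,
};

// Hypothetical example of "setting other config options based on the
// selected scheduler type": branch on the resolved type, not on the
// raw config string, so "debug_random" is handled correctly.
std::string pick_recovery_tuning(op_queue_type_t resolved_type) {
  if (resolved_type == op_queue_type_t::mClockScheduler) {
    return "mclock_profile";   // assumed label, not a real option value
  }
  return "classic_limits";     // assumed label, not a real option value
}

// Contrast: branching on the config string would misfire here, because
// the string does not identify the scheduler actually chosen.
std::string raw_config_value() { return "debug_random"; }
```

The design choice is that consumers ask the OSD what was actually instantiated (via osd_op_queue_type()) instead of re-deriving it from configuration.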
Determine the op priority cutoff for an OSD once and apply it to all the OSD shards, which is a more realistic scenario. Previously, the cutoff value was randomized between OSD shards, leading to issues in testing.

The IO priority cutoff is first determined before initializing the OSD shards. The cutoff value is then passed to the OpScheduler implementations, which are modified accordingly to apply the value during initialization.

Fixes: https://tracker.ceph.com/issues/62171
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit bfbc6b6)
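The decide-once-then-apply-everywhere pattern described above can be sketched as follows. This is a hedged illustration, not the Ceph implementation: the constant values, the function name, and the Scheduler struct are assumptions standing in for the real CEPH_MSG_PRIO_* cutoffs and OpScheduler constructors.

```cpp
#include <cstdlib>
#include <vector>

// Assumed stand-ins for the real priority cutoff constants.
constexpr unsigned CUTOFF_LOW  = 64;
constexpr unsigned CUTOFF_HIGH = 196;

// Decided once, before shard initialization. With "debug_random" the
// value is picked randomly, but the same pick is reused for every shard,
// instead of each shard randomizing independently (the old behavior).
unsigned determine_op_priority_cutoff(bool debug_random) {
  if (debug_random) {
    return (std::rand() % 2) ? CUTOFF_HIGH : CUTOFF_LOW;
  }
  return CUTOFF_HIGH;
}

// Stand-in for an OpScheduler implementation that now receives the
// cutoff at construction time and applies it during initialization.
struct Scheduler {
  unsigned cutoff;
  explicit Scheduler(unsigned c) : cutoff(c) {}
};

// All shards are constructed with the single pre-determined value.
std::vector<Scheduler> make_shards(int n, unsigned cutoff) {
  std::vector<Scheduler> shards;
  for (int i = 0; i < n; ++i) {
    shards.emplace_back(cutoff);
  }
  return shards;
}
```

The testing issue this avoids: with per-shard randomization, two shards of the same OSD could classify the same op priority differently, producing behavior no real deployment would exhibit.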
Note to self: The commit related to cutoff_priority (28a940f) should not be applied to the mClockScheduler code, since the high-priority queue implementation was not backported to quincy (PR: #49482); it was introduced only in reef. I will either close or modify the PR once I get clarification on this. Until then I am marking this as DNM.
Closing this backport as per comment #54980 (comment), and since it primarily affects CI tests. These changes will only be available in Reef releases and beyond.
backport tracker: https://tracker.ceph.com/issues/63873
backport of #53524
parent tracker: https://tracker.ceph.com/issues/62171
this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/master/src/script/ceph-backport.sh