
quincy: osd: Apply randomly selected scheduler type across all OSD shards #54980

Closed
wants to merge 4 commits

Conversation

sseshasa
Contributor

backport tracker: https://tracker.ceph.com/issues/63873


backport of #53524
parent tracker: https://tracker.ceph.com/issues/62171

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/master/src/script/ceph-backport.sh

mClockPriorityQueue (mClockQueue class) is an older mClock implementation
of the OpQueue abstraction. This was replaced by a simpler implementation
of the OpScheduler abstraction as part of
ceph#30650.

The simpler mClockScheduler implementation is the one currently in use.
This commit removes the unused src/common/mClockPriorityQueue.h along
with the associated unit test file: test_mclock_priority_queue.cc.

Other miscellaneous changes:
 - Remove the cmake references to the unit test file
 - Remove the inclusion of the header file in mClockScheduler.h

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 28a26f7)
…ards

Originally, setting 'osd_op_queue' to 'debug_random' resulted in a random
scheduler type being selected independently for each OSD shard. A more
realistic scenario for testing is to select the random scheduler type once
and apply it globally to all shards of an OSD, so that every shard employs
the same scheduler type. For example, this scenario arises during upgrades
when the scheduler type has changed between releases.

The following changes are made as part of the commit:
 1. Introduce enum class op_queue_type_t within osd_types.h that holds the
    various op queue types supported. This header is included by OpQueue.h.
    Add helper functions in osd_types.cc to return the op_queue_type_t either
    as an enum value or as a string representing the enum member (a rough
    illustrative sketch follows the call-flow diagram below).
 2. Determine the scheduler type before initializing the OSD shards in
    OSD class constructor.
 3. Pass the determined op_queue_type_t to the OSDShard's make_scheduler()
    method for each shard. This ensures all shards of the OSD are
    initialized with the same scheduler type.
 4. Rename & modify the unused OSDShard::get_scheduler_type() method to
    return op_queue_type_t set for the queue.
 5. Introduce OpScheduler::get_type() and OpQueue::get_type() pure
    virtual functions and define them within the respective queue
    implementations. These return the op queue type of the implementation
    and are called by OSDShard::get_op_queue_type().
 6. Add OSD::osd_op_queue_type() method for determining the scheduler
    type set on the OSD shards. Since all OSD shards are set to use
    the same scheduler type, the shard with the lowest id is used to
    get the scheduler type using OSDShard::get_op_queue_type().
 7. Improve the comment describing the 'osd_op_queue' option in
    common/options/osd.yaml.in.

Call Flow
--------
OSD                     OSDShard                 OpScheduler/OpQueue
---                     --------                 -------------------
osd_op_queue_type() ->
                        get_op_queue_type() ->
                                                 get_type()
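
The shape of these changes can be illustrated with a small C++ sketch. This
is not the backported code: the enum members, the concrete scheduler, the
helper name and the simplified factory are assumptions for illustration;
only op_queue_type_t, make_scheduler(), OpScheduler::get_type(),
OSDShard::get_op_queue_type() and OSD::osd_op_queue_type() are names taken
from the list above.

```cpp
// Illustrative sketch only -- not the actual Ceph classes.
#include <memory>
#include <string_view>
#include <vector>

enum class op_queue_type_t {
  WeightedPriorityQueue,   // member names are assumptions ("wpq")
  mClockScheduler,         // ("mclock_scheduler")
};

// Helper of the kind item 1 describes: map the enum member to a string.
constexpr std::string_view op_queue_type_name(op_queue_type_t t) {
  switch (t) {
  case op_queue_type_t::WeightedPriorityQueue: return "wpq";
  case op_queue_type_t::mClockScheduler:       return "mclock_scheduler";
  }
  return "unknown";
}

struct OpScheduler {                       // stand-in for the real interface
  virtual ~OpScheduler() = default;
  virtual op_queue_type_t get_type() const = 0;   // pure virtual, per item 5
};

struct WpqScheduler : OpScheduler {        // minimal concrete stand-in
  op_queue_type_t get_type() const override {
    return op_queue_type_t::WeightedPriorityQueue;
  }
};

struct OSDShard {
  std::unique_ptr<OpScheduler> scheduler;
  // Item 3: every shard is constructed with the type chosen by the OSD.
  explicit OSDShard(op_queue_type_t t) : scheduler(make_scheduler(t)) {}
  // Item 4: report the queue type that was set for this shard's queue.
  op_queue_type_t get_op_queue_type() const { return scheduler->get_type(); }

private:
  static std::unique_ptr<OpScheduler> make_scheduler(op_queue_type_t /*t*/) {
    return std::make_unique<WpqScheduler>();   // simplified factory
  }
};

struct OSD {
  std::vector<OSDShard> shards;
  // Item 2: the scheduler type is determined once, before the shards exist.
  explicit OSD(unsigned num_shards, op_queue_type_t chosen) {
    for (unsigned i = 0; i < num_shards; ++i)
      shards.emplace_back(chosen);             // same type on every shard
  }
  // Item 6: all shards agree, so the lowest-id shard is authoritative.
  op_queue_type_t osd_op_queue_type() const {
    return shards.front().get_op_queue_type();
  }
};
```

The key point is that any random choice happens exactly once, in the OSD
constructor, before the shards and their schedulers are built.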

Fixes: https://tracker.ceph.com/issues/62171
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 96df279)
…system

All OSD shards are guaranteed to use the same scheduler type. Therefore,
OSD::osd_op_queue_type() is used where applicable to determine the
scheduler type. This ensures that other config options which depend on the
scheduler are set appropriately for the randomly selected scheduler type
when the global 'osd_op_queue' config option is set to 'debug_random'
(for example, in CI tests); see the sketch below.

Note: If 'osd_op_queue' is set to 'debug_random', the PG-specific code
(PGPeering, PrimaryLogPG) continues to use the existing mechanism of
querying the config option key (osd_op_queue) via get_val(), as before.
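
As a hedged illustration of the point above (reusing the stand-in OSD and
op_queue_type_t types from the earlier sketch), scheduler-dependent
configuration is keyed off the resolved type rather than the possibly
'debug_random' option string. The branches and the options they would
adjust are placeholders, not the backported logic.

```cpp
// Builds on the stand-ins from the previous sketch; placeholder logic only.
void apply_scheduler_dependent_config(const OSD& osd) {
  switch (osd.osd_op_queue_type()) {
  case op_queue_type_t::mClockScheduler:
    // e.g. apply mclock-related overrides here (illustrative only)
    break;
  case op_queue_type_t::WeightedPriorityQueue:
    // e.g. keep wpq defaults untouched (illustrative only)
    break;
  }
}
```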

Fixes: https://tracker.ceph.com/issues/62171
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit fadc097)

Conflicts:
        src/osd/OSD.cc
- Removed the OSD::maybe_override_cost_for_qos() definition, which has yet
  to be backported to quincy.
Determine the op priority cutoff for an OSD and apply it to all the OSD
shards, which is a more realistic scenario. Previously, the cutoff value
was randomized between OSD shards, leading to issues in testing. The IO
priority cutoff is now determined before the OSD shards are initialized.
The cutoff value is then passed to the OpScheduler implementations, which
are modified to apply it during initialization (a rough sketch follows
this paragraph).
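
A rough sketch of that ordering, with hypothetical names throughout
(cutoff_t, ShardScheduler and make_shards() are illustrative stand-ins, not
the backported code):

```cpp
// Illustrative only; names and values are assumptions.
#include <memory>
#include <random>
#include <vector>

enum class cutoff_t { low, high };   // hypothetical stand-in for the cutoff

struct ShardScheduler {              // stand-in for an OpScheduler impl
  explicit ShardScheduler(cutoff_t c) : cutoff(c) {}
  cutoff_t cutoff;                   // applied during initialization
};

std::vector<std::unique_ptr<ShardScheduler>>
make_shards(unsigned num_shards, bool debug_random) {
  // Determine the cutoff once for the whole OSD ...
  std::mt19937 rng{std::random_device{}()};
  const cutoff_t cutoff =
      debug_random
          ? (std::uniform_int_distribution<int>(0, 1)(rng) ? cutoff_t::high
                                                           : cutoff_t::low)
          : cutoff_t::high;
  // ... then hand the same value to every shard's scheduler, so no shard
  // re-randomizes it on its own.
  std::vector<std::unique_ptr<ShardScheduler>> shards;
  shards.reserve(num_shards);
  for (unsigned i = 0; i < num_shards; ++i)
    shards.push_back(std::make_unique<ShardScheduler>(cutoff));
  return shards;
}
```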

Fixes: https://tracker.ceph.com/issues/62171
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit bfbc6b6)
@sseshasa sseshasa requested a review from a team as a code owner December 21, 2023 07:11
@sseshasa sseshasa added this to the quincy milestone Dec 21, 2023
@sseshasa sseshasa added the core label Dec 21, 2023
@sseshasa sseshasa added the DNM label Dec 21, 2023
@sseshasa
Contributor Author

Note to self: The commit related to cutoff_priority (28a940f) should not be applied to the mClockScheduler code, since the high-priority queue implementation was not backported to quincy (PR: #49482); it was introduced only in Reef. I will either close or modify the PR once I get clarification on this. Until then I am marking this as DNM.

@sseshasa
Contributor Author

Closing this backport as per comment #54980 (comment) and since it primarily affects CI tests. These changes will only be available in Reef releases and beyond.

@sseshasa sseshasa closed this Jan 24, 2024