
osd: add an external operation queue #18280

Closed
wants to merge 1 commit into from

Conversation

@bspark8 (Contributor) commented Oct 13, 2017

Following the "number of shards in the op queue" issue raised in #16369, this PR introduces an external operation queue to address the dmClock queue depth problem caused by having multiple sharded op queues (https://www.slideshare.net/ssusercee823/implementing-distributed-mclock-in-ceph#13).

Measured performance with the external operation queue shows no significant difference from the baseline without it:

  1. FIO 4 KB random write without the external operation queue, BlueStore (original)
    137352 IOPS

  2. FIO 4 KB random write with the external operation queue, BlueStore (external opqueue: WPQ, internal sharded opqueue: WPQ)
    136914 IOPS

The external operation queue is responsible for external client requests only.
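Roughly, the data flow looks like this (a standalone sketch with illustrative names only, not the actual Ceph classes): external client ops pass through a single QoS queue before being distributed to the existing sharded op queues, so the shard/thread structure itself is left unchanged.

```cpp
// Illustrative sketch only -- not the actual Ceph classes. A single external
// queue orders client ops (where one dmClock instance would see the full
// queue depth) and then hands them to the existing per-shard queues.
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

struct ClientOp {
  uint64_t client_id;
  uint64_t hash;   // used to pick a shard, just as the sharded queue does today
};

// Single queue seen by all external client requests.
class ExternalOpQueue {
  std::mutex lock;
  std::deque<ClientOp> q;   // stand-in for a dmClock/WPQ priority queue
public:
  void enqueue(ClientOp op) {
    std::lock_guard<std::mutex> l(lock);
    q.push_back(op);        // a real QoS queue would order by dmClock tags here
  }
  bool dequeue(ClientOp *out) {
    std::lock_guard<std::mutex> l(lock);
    if (q.empty()) return false;
    *out = q.front();
    q.pop_front();
    return true;
  }
};

// Existing per-shard queues: after the external QoS ordering, ops are sharded
// exactly as before, so background ops and the shard workers are untouched.
class ShardedQueues {
  std::vector<std::deque<ClientOp>> shards;
public:
  explicit ShardedQueues(std::size_t n) : shards(n) {}
  void enqueue(const ClientOp &op) {
    shards[op.hash % shards.size()].push_back(op);
  }
};

int main() {
  ExternalOpQueue external;
  ShardedQueues internal(5);     // e.g. the default of 5 op shards

  external.enqueue({1, 42});     // client ops enter via the external queue
  external.enqueue({2, 7});

  ClientOp op;
  while (external.dequeue(&op))  // drain in QoS order, then shard as usual
    internal.enqueue(op);
  return 0;
}
```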

Signed-off-by: bspark <bspark8@sk.com>
@myoungwon (Member) commented Oct 17, 2017

@liewegas @ivancich We need to discuss this PR. Does a single external operation queue really have no negative effect on performance?

@liewegas (Member)

Can you explain how this is different than configuring one shard with many threads servicing that shard?

@bspark8 (Contributor, Author) commented Oct 17, 2017

When applying dmClock to Ceph, there are some problems, as mentioned in #16369.

Problems

  1. The identifier of dmClock ServiceTracker
  2. The number of shards in Op Queue
  3. Weight Control's Delta/Rho for Background I/O

We are currently working on the following two approaches (A and B) in parallel, because A is not considered a complete solution.

A. (#16369) + one shard with many threads servicing that shard
B. Apply an external op queue (this PR)

Approach A solves problems 1 and 2. However, it forces the user to configure one shard with many threads servicing that shard, which effectively removes the existing Ceph shard structure. Problem 3 is also not solved (an alternative is currently being proposed: use normalized delta/rho values derived from what the current OSD observes).

Approach B (the external op queue) solves problems 1, 2, and 3 at the same time. The user does not need to be aware of the shards, and the external queue operates independently of the existing Ceph shard structure. As for problem 3, the external op queue performs QoS only among client I/Os, so there is no need to handle QoS between client I/Os and background I/Os there.
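For reference, approach A corresponds roughly to a configuration like this (option names as exposed for the sharded op queue around this time; the values are only an example, not a recommendation):

```ini
[osd]
# Approach A: collapse the op queue to a single shard so one dmClock queue
# sees every op, and give that shard many worker threads instead.
osd_op_num_shards = 1
osd_op_num_threads_per_shard = 16
```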

@yuyuyu101 (Member)

It looks like this introduces another queue and thread pool. I don't think it's a free lunch, especially in this critical path...

@liewegas (Member)

Yeah, I'm very interested in hearing whether we can normalize delta/rho instead...

@bspark8 (Contributor, Author) commented Oct 18, 2017

(https://www.slideshare.net/ssusercee823/implementing-distributed-mclock-in-ceph#12)
As mentioned in the slide above, when applying per-client QoS in Ceph, the current op queue structure has to handle client I/O and background I/O in the same queue at the same time.

For client I/O, delta/rho values are required to calculate the dmClock tag values. Background I/O therefore also needs delta/rho values so that QoS between client I/O and background I/O can be performed fairly.

However, for background I/O the location of the dmClock ServiceTracker that would calculate delta/rho is somewhat ambiguous, and a direct comparison with the delta/rho of client I/O is also difficult.

Therefore, using a normalized delta/rho value (e.g. a per-unit-time average) derived from what the current OSD observes could be the way forward.
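Roughly, something like the following could maintain such a normalized value (hypothetical names, not the dmclock library API): the OSD keeps a running average of the delta/rho carried by external client requests and hands that average to background ops that have no ServiceTracker of their own.

```cpp
// Illustrative sketch of "normalized" delta/rho for background I/O.
#include <cstdint>
#include <utility>

class DeltaRhoNormalizer {
  uint64_t delta_sum = 0, rho_sum = 0, samples = 0;
public:
  // Called for every external client request with the delta/rho it carried.
  void sample(uint64_t delta, uint64_t rho) {
    delta_sum += delta;
    rho_sum += rho;
    ++samples;
  }
  // Called when tagging a background op that has no delta/rho of its own.
  std::pair<uint64_t, uint64_t> normalized() const {
    if (samples == 0)
      return {1, 1};   // assumed neutral single-server values when nothing has been sampled
    return {delta_sum / samples, rho_sum / samples};
  }
  void reset() { delta_sum = rho_sum = samples = 0; }   // e.g. once per averaging window
};
```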

@bspark8 (Contributor, Author) commented Oct 18, 2017

Then, following the opinions of Sage and Haomai, work on the external op queue is paused, and we will try to proceed with the approach described above first.

Thank you.

@ivancich (Member)

@bspark8 I would like to hear more about delta/rho normalization. My understanding is that you feel we need this because the delta and rho values that come in with client requests disadvantage external client ops relative to background ops. But the reason for this is that those external clients are also getting some of their ops serviced by different OSDs, and delta and rho exist to factor that in. So it's not clear to me why this is an issue.

Perhaps a related and overarching issue is how to combine the different (but interrelated) purposes we see dmclock playing. We want dmclock to prioritize different classes of operations (client ops, background ops -- snaptrim, scrub, recovery). On top of that, we want to control the relative priority of different external client ops (initially by pool or by rbd image).

Right now we're flattening these various kinds of requests and priorities into a single dmclock queue on the OSD (assuming 1 shard). So each client competes on an equal basis (modulo dmclock params) with every other client and every background process. The more client requests there are, the more they could diminish necessary background processes, because each brings with it its own reservation and weight. Perhaps this is what we ultimately want, but I'm thinking about another possibility.

I'm thinking about a hierarchical dmclock. At the top level we could set "global" priorities of the various background processes against all client ops collectively. So there'd be perhaps four or five top-level categories (not sure whether replication ops would be separate at this level or not). Then, within the client ops category we'd have another dmclock queue where the clients would compete for their place using the dmclock priorities for, for example, the rbd image or pool.

Now we have global controls to weight the background processes against client requests. And we have separate control to prioritize various clients.

[Depending on our ultimate goals, perhaps it would be useful to consider the inversion of this hierarchy -- with each client at the top level along with a collective of all background ops, and then a lower level for the various types of background ops. That, however, seems to get us further from our goals as I understand them.]

I don't think the implementation would be that difficult. I imagine a dmclock queue at the top level where, as each client op comes in, it receives a proxy op with the global client configuration (reservation, weight, limit). Then, when we're ready to pull a client op for execution, we descend to the next-level dmclock queue to choose among the clients using client-specific configurations (reservation, weight, limit).
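Roughly, something like this (simple placeholder queues stand in for the real dmClock tag math at both levels; every type name here is hypothetical):

```cpp
// Rough sketch of the two-level ("hierarchical") dmclock idea described above.
#include <cstdint>
#include <deque>
#include <map>
#include <optional>
#include <string>

enum class Category { Client, Recovery, Scrub, SnapTrim };

struct Op { std::string payload; };

// Second level: chooses among external clients using per-client QoS parameters.
class ClientLevelQueue {
  std::map<uint64_t, std::deque<Op>> per_client;   // keyed by client / pool / rbd image
public:
  void enqueue(uint64_t client, Op op) { per_client[client].push_back(std::move(op)); }
  std::optional<Op> pull() {
    for (auto &kv : per_client) {                  // real code: pick by dmClock tag
      if (!kv.second.empty()) {
        Op op = std::move(kv.second.front());
        kv.second.pop_front();
        return op;
      }
    }
    return std::nullopt;
  }
  bool empty() const {
    for (auto &kv : per_client)
      if (!kv.second.empty()) return false;
    return true;
  }
};

// Top level: weights the client category as a whole against background categories.
class TopLevelQueue {
  std::map<Category, std::deque<Op>> background;   // recovery / scrub / snaptrim ops
  ClientLevelQueue clients;                        // all client ops, as one top-level entry
public:
  void enqueue_background(Category c, Op op) { background[c].push_back(std::move(op)); }
  void enqueue_client(uint64_t client, Op op) { clients.enqueue(client, std::move(op)); }

  std::optional<Op> pull() {
    // Real code: run dmClock across {Client, Recovery, Scrub, SnapTrim} here;
    // this placeholder simply prefers the client category when it is non-empty.
    if (!clients.empty())
      return clients.pull();                       // descend into the second level
    for (auto &kv : background) {
      if (!kv.second.empty()) {
        Op op = std::move(kv.second.front());
        kv.second.pop_front();
        return op;
      }
    }
    return std::nullopt;
  }
};
```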

I'd be curious what others think of this.

@bspark8 (Contributor, Author) commented Oct 25, 2017

Thank you for your feedback; I totally agree with you. Based on what you have said, I have reorganized my thoughts as follows.

The main reason a hierarchical dmclock is needed: if these various kinds of requests and priorities are flattened into a single dmclock queue on the OSD, then, as you say, the share available to necessary background ops diminishes as the number of individual clients grows.

The external and internal dmclock op queues in the current PR can also be considered a hierarchical dmclock.

The first thing to discuss is how to implement the hierarchical structure:

  1. Use two layers of dmclock queues directly in Ceph, as in the current PR.
  2. Implement the two-level hierarchy inside the dmclock library itself.

In addition, somewhat apart from this topic, my thoughts on delta/rho normalization are as follows (a small sketch follows the list):

  • client ops from clients: use the delta/rho generated by the client-side tracker
  • client ops from other OSDs (replication ops): use normalized delta/rho
  • background ops (snaptrim, scrub, recovery): use normalized delta/rho
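As a small illustration of the rule above (the names, including pick_delta_rho, are hypothetical and for illustration only):

```cpp
#include <cstdint>

enum class OpSource { ExternalClient, ReplicationFromOsd, Background };

struct ReqTag { uint64_t delta; uint64_t rho; };

// Choose which delta/rho a request carries into the dmClock queue, depending
// on where it came from: the client tracker's own values, or the OSD-wide
// normalized values sketched earlier in this thread.
ReqTag pick_delta_rho(OpSource src, ReqTag from_client_tracker, ReqTag normalized) {
  switch (src) {
  case OpSource::ExternalClient:
    return from_client_tracker;
  case OpSource::ReplicationFromOsd:   // replication ops from other OSDs
  case OpSource::Background:           // snaptrim, scrub, recovery
    return normalized;
  }
  return normalized;                   // unreachable; silences compiler warnings
}
```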

stale bot commented Oct 18, 2018

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
If you are a maintainer or core committer, please follow-up on this issue to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Oct 18, 2018
@liewegas liewegas closed this Oct 19, 2018
@liewegas liewegas reopened this Oct 19, 2018
@stale stale bot removed the stale label Oct 19, 2018
stale bot commented Dec 18, 2018

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
If you are a maintainer or core committer, please follow-up on this issue to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Dec 18, 2018
stale bot commented Apr 22, 2019

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@stale stale bot closed this Apr 22, 2019