osd: Make dmclock's anticipation timeout be configurable #18827

Merged
1 commit merged into ceph:master on Jan 7, 2018


Contributor

@TaewoongKim TaewoongKim commented Nov 9, 2017

This adds a configuration option that controls the anticipation timeout for dmclock.

This enables more accurate QoS or priority-based scheduling when dmclock is used.

With an appropriate value, a client or an operation type can keep its unused share of resources, which would otherwise be taken by more aggressive clients or operation types.

Signed-off-by: Taewoong Kim <taewoong.kim@sk.com>

mClockClientQueue::mClockClientQueue(CephContext *cct,
                                     double anticipation_timeout) :
  queue(std::bind(&mClockClientQueue::op_class_client_info_f, this, _1),
        anticipation_timeout),
Member

How is anticipation_timeout used?

Contributor Author

My bad. I missed something.
Thank you for pointing that out. I fixed it.

@myoungwon
Member

@TaewoongKim This PR seems to only set the anticipation timeout. Could you explain how this value is used?

@TaewoongKim
Contributor Author

TaewoongKim commented Nov 27, 2017

@myoungwon Yes, it just sets dmclock's parameter.
I submitted PRs to the dmclock project for the anticipation timeout.
They were merged a few weeks ago (ceph/dmclock#43, ceph/dmclock#34).
This PR enables that dmclock anticipation timeout in Ceph.

The dmclock scheduler supports IOPS reservation.
However, an aggressive worker can take a light worker's reserved share.
Assume that worker A is a light worker whose IOPS reservation is 100 IOPS, and worker B is an aggressive worker whose IOPS reservation is 1 IOPS.
Also assume that worker A generates just 10 IOs in one second, and that its IOs do not arrive exactly every 10 ms (they arrive with a very small variation).
In this case, worker A cannot get its 10 IOPS serviced, because worker B takes worker A's share.
If one of worker A's IOs is a little late (even 1 ms), worker B's IOs are processed ahead of worker A's, and worker A's IO is delayed.
(You can see an example in ceph/dmclock#34.)

This is because dmclock resets the time tag of worker A's IO.
Normally, dmclock sets the time tag of an IO based on the tag of the previous IO.
However, if an IO arrives more than 1/(reserved IOPS) seconds after the previous IO from the same worker (10 ms for a 100 IOPS reservation),
the time tag of the newly arrived IO is reset to the current time.
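
To make the reset concrete, here is a minimal sketch of a reservation tag computed as the later of the current time and the previous tag plus the reserved spacing. This is a simplified model for illustration only, not dmclock's actual code; the function name `next_reservation_tag` and its parameters are hypothetical, and times are plain seconds.

```cpp
#include <algorithm>
#include <cassert>

// Simplified, hypothetical model of a reservation tag (all times in seconds).
//   prev_tag         - reservation tag of the worker's previous IO
//   reservation_iops - the worker's reserved IOPS (must be positive)
//   now              - arrival time of the new IO
// Returns the tag for the new IO.
double next_reservation_tag(double prev_tag, double reservation_iops, double now) {
    assert(reservation_iops > 0.0);
    // Ideal spacing between consecutive IOs, e.g. 0.01 s for 100 IOPS.
    const double spacing = 1.0 / reservation_iops;
    // On time or early: the tag continues from the previous tag.
    // Late: the tag is "reset" to the current time.
    return std::max(now, prev_tag + spacing);
}
```

With a 100 IOPS reservation and a previous tag of 0.0, an IO arriving at 0.009 s continues from the previous tag (0.010 s), while an IO arriving at 0.011 s, only 1 ms late, gets a tag equal to its arrival time.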

Setting the anticipation timeout prevents this situation.
The reset is deferred by the anticipation timeout, so the time tag is still set based on the previous IO's tag.
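
The deferral described here can be sketched as follows. This is a simplified model based only on the description in this comment, and dmclock's real implementation may differ in detail; the function name `tag_with_anticipation` and its parameters are hypothetical, and times are plain seconds.

```cpp
#include <cassert>

// Simplified, hypothetical model of a reservation tag with an anticipation
// timeout (all times in seconds).
//   prev_tag             - reservation tag of the worker's previous IO
//   reservation_iops     - the worker's reserved IOPS (must be positive)
//   now                  - arrival time of the new IO
//   anticipation_timeout - how late an IO may be before its tag is reset
double tag_with_anticipation(double prev_tag, double reservation_iops,
                             double now, double anticipation_timeout) {
    assert(reservation_iops > 0.0);
    assert(anticipation_timeout >= 0.0);
    // Tag the IO would get if it continued from the previous IO's tag.
    const double expected = prev_tag + 1.0 / reservation_iops;
    // Within the grace window: keep the tag based on the previous IO.
    if (now <= expected + anticipation_timeout) {
        return expected;
    }
    // Later than the grace window: reset the tag to the current time.
    return now;
}
```

In this model, with a 100 IOPS reservation, a previous tag of 0.0 and a 2 ms timeout, an IO arriving at 0.011 s keeps the tag 0.010 s, whereas with a timeout of 0 the same IO is tagged 0.011 s.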

@myoungwon
Member

@TaewoongKim This needs a rebase.

Signed-off-by: Taewoong Kim <taewoong.kim@sk.com>
@TaewoongKim
Contributor Author

@myoungwon Rebased

@myoungwon
Member

@tchaikov @liewegas @ivancich Could you take a look? dmclock code related to this PR has already been merged.

@yuriw
Contributor

yuriw commented Jan 6, 2018

@ivancich ivancich merged commit 158f317 into ceph:master Jan 7, 2018