librbd: reduce the TokenBucket fill cycle and support bursting io configuration. #24214

irides · 2018-09-21T07:43:41Z

Firstly, the TokenBucket shall put one token per 1/cir second(one tick), and I set the minimum of tick to 50ms for good accuracy.
Secondly, the bursting limit configuration is supported. The default is 0 for disabling bursting io.

Signed-off-by: Shiyang Ruan ruansy.fnst@cn.fujitsu.com

irides · 2018-09-27T03:03:17Z

@dillaman @yangdongsheng Hi, pelase take a review. Thanks!

yangdongsheng · 2018-09-27T05:52:12Z

Please split your change into two commits, one for improvement and the other for bursting.

BTW, yes, in theory, it's better to put token smoothly, but could you give a test case to show what's the difference we can get from your change about "good accuracy"?

Thanx

irides · 2018-09-27T08:50:54Z

@yangdongsheng Thank you for your advice.

I've split it into two commits.

The "good accuracy" is why I set the minimum value of tick to 50ms. Because we use the SafeTimer ( a software timer), it is realized by judging whether the time has arrived and triggering the callback function. Therefore, too small a tick value will result in too many callbacks, which will eventually lead to a long time and increase the error.

The purpose of the first commit is to shorten the token filling period to conform to the definition of the token bucket. If the fill period is 1s, the network IO will be concentrated at the beginning of 1s, which may cause other client access delays.

The implementation of the bursting IO limit is therefore simple.

yangdongsheng · 2018-09-27T09:21:05Z

@iridescent-rsy

What I am concerning is we are using the timer_singleton in ImageCtx, then if we schedule too much events on this thread, that will affect the other operations on this thread.

You want to make the filling cycle smaller and smaller, but how small is good enough? 50ms? is there any calculation about this setting?

BTW, the first commit looks not focus on filling cycle improvement, I found lots of bursting implementation, which should be in the second commit. If you can move these change into commit 2, we can make our discussion and review faster.

Thanx :)

yangdongsheng · 2018-09-27T09:29:55Z

src/common/Throttle.cc

    got = remain;
    remain = 0;
+  } else {
+    got = c;
+    remain -= c;


Any reason for this change?

I rewrite the class Bucket. Not deliberately want to change it.

src/common/Throttle.cc

yangdongsheng · 2018-09-27T09:30:05Z

src/common/Throttle.cc

  m_token_ctx = new FunctionContext(
      [this](int r) {
        schedule_timer();
      });
+  m_timer->add_event_after(m_schedule_tick / 1000.0, m_token_ctx);


add event every ms?

m_schedule_tick is the filling period, and it is caculated by the Bucket.

src/common/Throttle.cc

yangdongsheng · 2018-09-27T09:30:11Z

src/common/Throttle.h

+    std::atomic<uint64_t> tick = { 50 };
+    std::atomic<uint64_t> ticks_per_second = { 0 };
+    std::atomic<uint64_t> tokens_per_tick = { 0 };
+    std::atomic<uint64_t> _divide_left = { 0 };


what does cir, cbs and _devide_left mean? and are they atomic?

CIR (Committed Information Rate) is the rate that tokens generated.
CBS (Committed Burst Size) is the size of the TokenBucket, which is also means the burst limit.

... also agree about the atomic question. Shouldn't these be protected already by the lock?

yangdongsheng · 2018-09-27T09:30:16Z

src/librbd/io/ImageRequestWQ.h

@@ -75,6 +75,7 @@ class ImageRequestWQ

  void apply_qos_limit(uint64_t limit, const uint64_t flag);

+  void apply_qos_limit(uint64_t limit, uint64_t bursting, const uint64_t flag);


Looks don't need a new define. just add a parameter into apply_qos_limit().

@yangdongsheng My bad. Define a new one is to avoid changing the calling in test case.
How about moving the flag to the first argument ?
such as:
void apply_qos_limit(const uint64_t flag, uint64_t limit, uint64_t bursting);

This looks clear when calling it.
Thanks.

I am not sure I understand all what you want to say, but no need for a new function. And you need to update the unit-test at the same time, rather than avoid to change it.

dillaman · 2018-09-28T00:45:21Z

src/librbd/io/ImageRequestWQ.h

@@ -75,6 +75,7 @@ class ImageRequestWQ

  void apply_qos_limit(uint64_t limit, const uint64_t flag);

+  void apply_qos_limit(uint64_t limit, uint64_t bursting, const uint64_t flag);


dillaman · 2018-09-28T00:47:19Z

src/common/Throttle.h

-	remain(m), max(m)
-    {
-    }
+    std::atomic<uint64_t> cir = { 0 };


Nit: I think all these acronym variables should be spelled out (i.e. committed_burst_size) for clarity

dillaman · 2018-09-28T00:47:45Z

src/common/Throttle.h

+    std::atomic<uint64_t> tick = { 50 };
+    std::atomic<uint64_t> ticks_per_second = { 0 };
+    std::atomic<uint64_t> tokens_per_tick = { 0 };
+    std::atomic<uint64_t> _divide_left = { 0 };


Nit: drop the _ prefix

OK. Thanks!

dillaman · 2018-09-28T00:52:41Z

src/common/Throttle.h

+    std::atomic<uint64_t> tick = { 50 };
+    std::atomic<uint64_t> ticks_per_second = { 0 };
+    std::atomic<uint64_t> tokens_per_tick = { 0 };
+    std::atomic<uint64_t> _divide_left = { 0 };


... also agree about the atomic question. Shouldn't these be protected already by the lock?

dillaman · 2018-09-28T00:53:24Z

src/librbd/io/ImageRequestWQ.cc

@@ -624,7 +624,27 @@ void ImageRequestWQ<I>::apply_qos_limit(uint64_t limit, const uint64_t flag) {
    }
  }
  ceph_assert(throttle != nullptr);
-  throttle->set_max(limit);
+  throttle->set_max(0);


Nit: shouldn't most of these librbd changes be moved to the next commit?

dillaman · 2018-09-28T00:54:48Z

src/librbd/ImageCtx.cc

-	"rbd_qos_write_iops_limit", false)(
-	"rbd_qos_read_bps_limit", false)(
-	"rbd_qos_write_bps_limit", false);
+        "rbd_qos_iops_limit", false)(


Make sure you also add these to librbd/api/Config.cc

dillaman · 2018-10-15T14:15:12Z

@yangdongsheng are you still requesting changes?

yangdongsheng · 2018-10-16T02:21:58Z

@dillaman sorry I did not get notification about the update in this PR, will review it today.

yangdongsheng · 2018-10-16T02:47:52Z

src/common/Throttle.h

-    {
-    }
+    // minimum of the filling period.
+    static const uint64_t tick_min = 50;


As what we discussed offline, make this configurable. hard code as 50ms is not a good idea for all usecase.

yangdongsheng · 2018-10-16T02:47:56Z

src/common/Throttle.h

+    uint64_t put();
+    // set the committed_information_rate
+    bool set_average(uint64_t c);
+    // set the committed_burst_size


@iridescent-rsy thanx for the detailed comment here, it's helpful for review. But there is a comment about the design. Class Bucket should just be a "bucket", it should not take so many works here. I mean, the ticks_per_second, tokens_per_tick, devide_left, set_argerage() should be in the TokenBucketThrottle.

Bucket should only have a simple design, name, max, remain, get() put(). that's all. But how to use this bucket, as you wish in upper user, TokenBucketThrottle. There is only a user of Bucket, tokenBucketThrottle, but if we want a leakyBucketThrottle in future, we will reuse the bucket.

So move the Throttling related information from Bucket to TokenBucketThrottle.

@yangdongsheng Thanks for your advice. I'll fix it immediately.

yangdongsheng · 2018-10-16T03:08:38Z

src/common/Throttle.cc

-  if (remain > m || max == 0)
-    remain = m;
-  max = m;
+  committed_burst_size = m;


We need to update remain, if remain is too large here. For example, if the max is 100, average is 10 and remain is 100. that means the burst size is 100. But if we wan to set the burst size to 10. average to 1. If you don't set the remain to 10, user can still get an bursting 100 tokens.

yangdongsheng · 2018-10-16T03:10:31Z

@iridescent-rsy Commit about burst and unittest looks good. About the first commit, there are some comments. Please feel free to @yangdongsheng when you updated this PR. Thanx :)

irides · 2018-10-16T03:13:00Z

@yangdongsheng All right. Thank you very much.

irides · 2018-10-18T01:13:54Z

Hi @yangdongsheng, I've updated the commits. Please take a review. Thanks.

yangdongsheng · 2018-10-18T03:15:43Z

src/common/Throttle.h


-class TokenBucketThrottle {
+  uint64_t max = 0;
+  uint64_t rate = 0;


Why a bucket should have a rate? I mean, what is the rate mean in a real Bucket? I don't think this should be in Bucket.

In my opinion, the role of the Bucket is speed limit. So, the rate is a property of the Bucket.

yangdongsheng · 2018-10-18T03:25:42Z

src/common/Throttle.h

+  /**
+   * Set the Bucket size
+   * @param m bucket size
+   * @return true if size is not zero


You did not get the return-val, I am not sure why you want to return bool here.

yangdongsheng · 2018-10-18T03:30:09Z

src/common/Throttle.h

+
+class TokenBucketThrottle {
+
+  struct TokenBucket : Bucket {


Yes, you got what I said. But should we have a subclass of TokenBucket ? I think the TokenBucketThrottle can take all the work of TokenBucket here. I mean, TokenBucketThrottle can use Bucket and decide how to use it. No need to split the management work into TokenBucketThrottle and TokenBucket.

yangdongsheng · 2018-10-18T03:53:54Z

On 10/18/2018 11:33 AM, MrRyan wrote: ***@***.**** commented on this pull request. ------------------------------------------------------------------------ In src/common/Throttle.h <#24214 (comment)>: > -class TokenBucketThrottle { + uint64_t max = 0; + uint64_t rate = 0; In my opinion, the role of the Bucket is speed limit. So, the |rate| is a property of the Bucket.

Actually, Bucket is just a Bucket, it can be fill with something and you can get from it or put into it. Let's come back to the topic of your commit, reduce the fill cycle, so you should focus on the TokenBucketThrottle who is controlling the token filling work. Just add your logic into TokenBucketThrottle.

…

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#24214 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AC35PYo9uAIav5Vjg2JM7w1QXZuIvKQlks5ul_akgaJpZM4Wznfa>.

irides · 2018-10-19T01:08:33Z

Hi @yangdongsheng, I've moved the control logic. Please take a review. Thanks : )

yangdongsheng · 2018-10-19T02:11:55Z

Thanx @iridescent-rsy , but sorry I will review it next Monday.

irides · 2018-10-23T07:57:22Z

Hi @yangdongsheng, I've updated the code. Please take a review. Thanks!!

yangdongsheng

@iridescent-rsy This version looks much better. Just few comments.

yangdongsheng · 2018-10-24T05:36:52Z

src/common/Throttle.cc

+  if (0 == m && 0 != m_avg) {
+    m_throttle.set_max(m_avg);
+  } else {
+    m_throttle.set_max(m);


So set_max(0) means no burst, right? What if I am going to set a max but the max is less than m_avg?

Yes. 0 is the default value for burst configuration.

The max should never less than m_avg. I'll correct it.

I prefer to return error if max < m_avg, rather than change the max silently. For example:
user want to set_max(100), but the m_avg is 1000, if we set the max to 1000 silently, user would think max is set to 100. but he will find some strange thing if the following code believe max is 100.

So I think we can return error in this case in set_max(), to tell user what happened.
Maybe we should do it like that:

class TokenBucketThrottle { m_avg; m_burst; int set_burst(burst) { if (burst > 0 && burst < m_avg) return -EINVAL; m_burst = burst; throttle.set_max(m_burst > 0? m_burst : m_avg); } int set_average(avg) { if (m_burst > 0 && avg > m_burst) return -EINVAL; ... ... } }

@yangdongsheng
I got it !
But I am not sure if I also need to handle the error in apply_qos_limit ?
Thanks.

@dillaman what's your opinion? when we are going to apply some metadata but they are invalid. should we fail in image opening?

No -- I don't think you should prevent the image from opening (since that would also prevent you from fixing it). I think librbd can log to lderr if it detects invalid configuration settings.

@dillaman Agreed. @iridescent-rsy Then just log to lderr .

@dillaman OK. Thanks.

Hi @dillaman @yangdongsheng
I'am adding the log. But there is a confusing problem.

As we know, the metadata will be applied twice when we set it by rbd config image set aaa bbb. The first time is when we are opening the image. And the second time is to apply the new config.

The problem is:
When we want to correct the value after we set an invalid one, it will still show the error log ( when opening the image, the image will still apply the old config, which contains the invalid value. )

I think this error log may confuse the users. Does that matter?

@iridescent-rsy seems like a necessary evil to me. I think if the log message includes the current values for average and burst it would be less confusing.

src/common/Throttle.cc

yangdongsheng · 2018-10-24T05:36:59Z

src/common/Throttle.cc

+      m_divide_left = avg % m_ticks_per_second;
+
+      if (0 == m_throttle.max) {
+        m_throttle.set_max(avg);


What if we are going to set a average larger than max?

src/common/Throttle.h

irides · 2018-10-24T09:22:44Z

@yangdongsheng Updated : )

yangdongsheng

@iridescent-rsy Just one comment. Others looks almost good to me. But one question, the calculation in set_average looks not elegant. can we make it more easy and safe?

yangdongsheng · 2018-10-25T01:11:24Z

src/common/Throttle.cc

+  if (0 == m && 0 != m_avg) {
+    m_throttle.set_max(m_avg);
+  } else {
+    m_throttle.set_max(m);


I prefer to return error if max < m_avg, rather than change the max silently. For example:
user want to set_max(100), but the m_avg is 1000, if we set the max to 1000 silently, user would think max is set to 100. but he will find some strange thing if the following code believe max is 100.

So I think we can return error in this case in set_max(), to tell user what happened.
Maybe we should do it like that:

class TokenBucketThrottle { m_avg; m_burst; int set_burst(burst) { if (burst > 0 && burst < m_avg) return -EINVAL; m_burst = burst; throttle.set_max(m_burst > 0? m_burst : m_avg); } int set_average(avg) { if (m_burst > 0 && avg > m_burst) return -EINVAL; ... ... } }

yangdongsheng · 2018-10-26T01:49:46Z

@iridescent-rsy Also, you need to update https://github.com/ceph/ceph/blob/master/qa/suites/rbd/thrash/workloads/rbd_fsx_rate_limit.yaml for more option case.

irides · 2018-10-26T02:18:46Z

@yangdongsheng All right. I'm working on it. Thank you.

irides · 2018-10-29T08:44:06Z

@yangdongsheng Updated. Please take a review. Thanks.

yangdongsheng · 2018-10-30T12:25:13Z

src/common/Throttle.h

+   */
+  uint64_t m_ticks_per_second = 0;
+  uint64_t m_tokens_per_tick = 0;
+  uint64_t m_divide_left = 0;


Can we make this more simple? Example:

uint64_t m_ticks_per_second = 0; uint64_t m_current_tick = 0;

And then

uint64_t TokenBucketThrottle::tokens_filled(double tick) { return (tick / m_ticks_per_second * m_avg); } uint64_t TokenBucketThrottle::tokens_this_tick() { return (tokens_filled(m_current_tick) - tokens_filled(m_current_tick - 1)); }

Same example as what you mentioned, 950 iops. Then m_ticks_per_second = 20.
tick 1: 1/20 * 950 - 0/20 * 950 = 47 - 0 = 47;
tick 2: 2/20 * 950 - 1/20 * 950 = 95 - 47 = 48;
tick 3: 3/20 * 950 - 2/20 * 950 = 142 - 95 = 47;

* tick | 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16|17|18|19|20| * tokens |47|48|47|48|47|48|47|48|47|48|47|48|47|48|47|48|47|48|47|48|

This way, we can make the tokens to be filled more smoothly, and we need only two variable m_ticks_per_second and m_current_tick.

Just FYI, you can think more ideas about it. What I am concerned is the m_divide_left way looks not simple enough.

What a great idea! Thank you very much. 👍

But I think it will take more time to calculate (two floating multiplication and division calculations within each tick). Or maybe it doesn't matter and I don't have to worry about it. 😄

I'll take your idea and think more about it. Thanks.

😉 because I think there was too many if and else in m_divide_left way, then I give you a suggestion above, maybe you can find some better idea.

I think we should not worry too much on the overhead on calculation in this case.

TokenBucket shall put one token per 1/cir second (one schedule tick for QoS, set the minimum 50ms here). The cir is Committed Information Rate, which is defined by TokenBucket. Also means the IOPS(or BPS) limit we set. Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>

irides · 2018-11-01T07:01:32Z

@yangdongsheng Updated.
Your solution is wonderful and I took it. Please take a review. Thanks.

yangdongsheng

Just one nit. others looks good.

yangdongsheng · 2018-11-02T01:30:56Z

src/common/options.cc

+
+    Option("rbd_qos_schedule_tick_min", Option::TYPE_UINT, Option::LEVEL_ADVANCED)
+    .set_default(50)
+    .set_min(50)


should be set_min(1) ?

@yangdongsheng
All right. Should leave the choice to users.
Code Updated.
Thanks.

yangdongsheng

Looks good. @dillaman Please take a look.

yangdongsheng · 2018-11-07T01:20:32Z

Hi @dillaman , this PR looks good to me now. What do you think about it?

dillaman · 2018-11-07T01:58:49Z

src/librbd/io/ImageRequestWQ.cc

-  throttle->set_max(limit);
-  throttle->set_average(limit);
+
+  int r_burst = throttle->set_burst(burst);


I'd recommend combining set_burst / set_average into a single method since otherwise you can potentially log an error about the burst being < average if the average hasn't been updated yet.

@dillaman OK. It's a good idea. I'll fix it.

Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>

irides · 2018-11-07T12:14:07Z

@dillaman Updated. Please take a review. Thanks.

dillaman

lgtm

batrick added rbd needs-review labels Sep 21, 2018

yangdongsheng added the performance label Sep 27, 2018

irides force-pushed the bursting_io branch from 49c346b to 170fd8a Compare September 27, 2018 08:12

yangdongsheng self-requested a review September 27, 2018 09:08

yangdongsheng suggested changes Sep 27, 2018

View reviewed changes

dillaman reviewed Sep 28, 2018

View reviewed changes

irides force-pushed the bursting_io branch 3 times, most recently from f33a22b to b427bda Compare October 15, 2018 10:25

yangdongsheng reviewed Oct 16, 2018

View reviewed changes

irides force-pushed the bursting_io branch from b427bda to ee86f91 Compare October 17, 2018 12:17

yangdongsheng reviewed Oct 18, 2018

View reviewed changes

irides force-pushed the bursting_io branch from ee86f91 to 124efc6 Compare October 18, 2018 11:22

yangdongsheng reviewed Oct 24, 2018

View reviewed changes

irides force-pushed the bursting_io branch from 541ec79 to ea310e3 Compare October 24, 2018 08:59

yangdongsheng reviewed Oct 25, 2018

View reviewed changes

irides force-pushed the bursting_io branch 2 times, most recently from afc5b82 to 9743b7f Compare October 29, 2018 07:14

yangdongsheng reviewed Oct 30, 2018

View reviewed changes

irides force-pushed the bursting_io branch from 9743b7f to 8bc385e Compare November 1, 2018 06:24

yangdongsheng reviewed Nov 2, 2018

View reviewed changes

irides force-pushed the bursting_io branch from 8bc385e to e3942a2 Compare November 2, 2018 01:59

yangdongsheng approved these changes Nov 2, 2018

View reviewed changes

yangdongsheng removed the needs-review label Nov 2, 2018

dillaman reviewed Nov 7, 2018

View reviewed changes

irides added 3 commits November 7, 2018 13:25

librbd: support burst limit configuration.

761393a

Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>

unittest: add testcase for librbd QosLimit.

1705b2c

Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>

librbd: make the minimum of schedule tick for QoS configurable.

fd613b1

Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>

irides force-pushed the bursting_io branch from e3942a2 to fd613b1 Compare November 7, 2018 06:33

dillaman approved these changes Nov 7, 2018

View reviewed changes

dillaman added needs-qa wip-jason-testing labels Nov 7, 2018

dillaman merged commit 6ac41cc into ceph:master Nov 10, 2018

irides deleted the bursting_io branch November 12, 2018 01:05

		@@ -75,6 +75,7 @@ class ImageRequestWQ

		void apply_qos_limit(uint64_t limit, const uint64_t flag);

		void apply_qos_limit(uint64_t limit, uint64_t bursting, const uint64_t flag);

librbd: reduce the TokenBucket fill cycle and support bursting io configuration. #24214

librbd: reduce the TokenBucket fill cycle and support bursting io configuration. #24214

Conversation

irides commented Sep 21, 2018

irides commented Sep 27, 2018

yangdongsheng commented Sep 27, 2018

irides commented Sep 27, 2018

yangdongsheng commented Sep 27, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dillaman commented Oct 15, 2018

yangdongsheng commented Oct 16, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yangdongsheng commented Oct 16, 2018 • edited Loading

irides commented Oct 16, 2018

irides commented Oct 18, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yangdongsheng commented Oct 18, 2018 via email

irides commented Oct 19, 2018

yangdongsheng commented Oct 19, 2018

irides commented Oct 23, 2018

yangdongsheng left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

irides commented Oct 24, 2018

yangdongsheng left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yangdongsheng commented Oct 26, 2018

irides commented Oct 26, 2018

irides commented Oct 29, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

irides commented Nov 1, 2018

yangdongsheng left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yangdongsheng left a comment

Choose a reason for hiding this comment

yangdongsheng commented Nov 7, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

irides commented Nov 7, 2018

dillaman left a comment

Choose a reason for hiding this comment

yangdongsheng commented Oct 16, 2018 •

edited

Loading