Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

librbd: reduce the TokenBucket fill cycle and support bursting io configuration. #24214

Merged
merged 4 commits into from
Nov 10, 2018

Conversation

irides
Copy link
Contributor

@irides irides commented Sep 21, 2018

Firstly, the TokenBucket shall put one token per 1/cir second(one tick), and I set the minimum of tick to 50ms for good accuracy.
Secondly, the bursting limit configuration is supported. The default is 0 for disabling bursting io.

Signed-off-by: Shiyang Ruan ruansy.fnst@cn.fujitsu.com

@irides
Copy link
Contributor Author

irides commented Sep 27, 2018

@dillaman @yangdongsheng Hi, pelase take a review. Thanks!

@yangdongsheng
Copy link
Contributor

Please split your change into two commits, one for improvement and the other for bursting.

BTW, yes, in theory, it's better to put token smoothly, but could you give a test case to show what's the difference we can get from your change about "good accuracy"?

Thanx

@irides
Copy link
Contributor Author

irides commented Sep 27, 2018

@yangdongsheng Thank you for your advice.

I've split it into two commits.

The "good accuracy" is why I set the minimum value of tick to 50ms. Because we use the SafeTimer ( a software timer), it is realized by judging whether the time has arrived and triggering the callback function. Therefore, too small a tick value will result in too many callbacks, which will eventually lead to a long time and increase the error.

The purpose of the first commit is to shorten the token filling period to conform to the definition of the token bucket. If the fill period is 1s, the network IO will be concentrated at the beginning of 1s, which may cause other client access delays.

The implementation of the bursting IO limit is therefore simple.

@yangdongsheng
Copy link
Contributor

@iridescent-rsy

What I am concerning is we are using the timer_singleton in ImageCtx, then if we schedule too much events on this thread, that will affect the other operations on this thread.

You want to make the filling cycle smaller and smaller, but how small is good enough? 50ms? is there any calculation about this setting?

BTW, the first commit looks not focus on filling cycle improvement, I found lots of bursting implementation, which should be in the second commit. If you can move these change into commit 2, we can make our discussion and review faster.

Thanx :)

got = remain;
remain = 0;
} else {
got = c;
remain -= c;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrite the class Bucket. Not deliberately want to change it.

src/common/Throttle.cc Show resolved Hide resolved
m_token_ctx = new FunctionContext(
[this](int r) {
schedule_timer();
});
m_timer->add_event_after(m_schedule_tick / 1000.0, m_token_ctx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add event every ms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_schedule_tick is the filling period, and it is caculated by the Bucket.

src/common/Throttle.cc Show resolved Hide resolved
std::atomic<uint64_t> tick = { 50 };
std::atomic<uint64_t> ticks_per_second = { 0 };
std::atomic<uint64_t> tokens_per_tick = { 0 };
std::atomic<uint64_t> _divide_left = { 0 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does cir, cbs and _devide_left mean? and are they atomic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CIR (Committed Information Rate) is the rate that tokens generated.
CBS (Committed Burst Size) is the size of the TokenBucket, which is also means the burst limit.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... also agree about the atomic question. Shouldn't these be protected already by the lock?

@@ -75,6 +75,7 @@ class ImageRequestWQ

void apply_qos_limit(uint64_t limit, const uint64_t flag);

void apply_qos_limit(uint64_t limit, uint64_t bursting, const uint64_t flag);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks don't need a new define. just add a parameter into apply_qos_limit().

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yangdongsheng My bad. Define a new one is to avoid changing the calling in test case.
How about moving the flag to the first argument ?
such as:
void apply_qos_limit(const uint64_t flag, uint64_t limit, uint64_t bursting);

This looks clear when calling it.
Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand all what you want to say, but no need for a new function. And you need to update the unit-test at the same time, rather than avoid to change it.

@@ -75,6 +75,7 @@ class ImageRequestWQ

void apply_qos_limit(uint64_t limit, const uint64_t flag);

void apply_qos_limit(uint64_t limit, uint64_t bursting, const uint64_t flag);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

remain(m), max(m)
{
}
std::atomic<uint64_t> cir = { 0 };

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think all these acronym variables should be spelled out (i.e. committed_burst_size) for clarity

std::atomic<uint64_t> tick = { 50 };
std::atomic<uint64_t> ticks_per_second = { 0 };
std::atomic<uint64_t> tokens_per_tick = { 0 };
std::atomic<uint64_t> _divide_left = { 0 };

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: drop the _ prefix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Thanks!

std::atomic<uint64_t> tick = { 50 };
std::atomic<uint64_t> ticks_per_second = { 0 };
std::atomic<uint64_t> tokens_per_tick = { 0 };
std::atomic<uint64_t> _divide_left = { 0 };

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... also agree about the atomic question. Shouldn't these be protected already by the lock?

@@ -624,7 +624,27 @@ void ImageRequestWQ<I>::apply_qos_limit(uint64_t limit, const uint64_t flag) {
}
}
ceph_assert(throttle != nullptr);
throttle->set_max(limit);
throttle->set_max(0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: shouldn't most of these librbd changes be moved to the next commit?

"rbd_qos_write_iops_limit", false)(
"rbd_qos_read_bps_limit", false)(
"rbd_qos_write_bps_limit", false);
"rbd_qos_iops_limit", false)(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure you also add these to librbd/api/Config.cc

@irides irides force-pushed the bursting_io branch 3 times, most recently from f33a22b to b427bda Compare October 15, 2018 10:25
@dillaman
Copy link

@yangdongsheng are you still requesting changes?

@yangdongsheng
Copy link
Contributor

@dillaman sorry I did not get notification about the update in this PR, will review it today.

{
}
// minimum of the filling period.
static const uint64_t tick_min = 50;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As what we discussed offline, make this configurable. hard code as 50ms is not a good idea for all usecase.

uint64_t put();
// set the committed_information_rate
bool set_average(uint64_t c);
// set the committed_burst_size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iridescent-rsy thanx for the detailed comment here, it's helpful for review. But there is a comment about the design. Class Bucket should just be a "bucket", it should not take so many works here. I mean, the ticks_per_second, tokens_per_tick, devide_left, set_argerage() should be in the TokenBucketThrottle.

Bucket should only have a simple design, name, max, remain, get() put(). that's all. But how to use this bucket, as you wish in upper user, TokenBucketThrottle. There is only a user of Bucket, tokenBucketThrottle, but if we want a leakyBucketThrottle in future, we will reuse the bucket.

So move the Throttling related information from Bucket to TokenBucketThrottle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yangdongsheng Thanks for your advice. I'll fix it immediately.

if (remain > m || max == 0)
remain = m;
max = m;
committed_burst_size = m;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to update remain, if remain is too large here. For example, if the max is 100, average is 10 and remain is 100. that means the burst size is 100. But if we wan to set the burst size to 10. average to 1. If you don't set the remain to 10, user can still get an bursting 100 tokens.

@yangdongsheng
Copy link
Contributor

yangdongsheng commented Oct 16, 2018

@iridescent-rsy Commit about burst and unittest looks good. About the first commit, there are some comments. Please feel free to @yangdongsheng when you updated this PR. Thanx :)

@irides
Copy link
Contributor Author

irides commented Oct 16, 2018

@yangdongsheng All right. Thank you very much.

@irides
Copy link
Contributor Author

irides commented Oct 18, 2018

Hi @yangdongsheng, I've updated the commits. Please take a review. Thanks.


class TokenBucketThrottle {
uint64_t max = 0;
uint64_t rate = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a bucket should have a rate? I mean, what is the rate mean in a real Bucket? I don't think this should be in Bucket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, the role of the Bucket is speed limit. So, the rate is a property of the Bucket.

/**
* Set the Bucket size
* @param m bucket size
* @return true if size is not zero
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You did not get the return-val, I am not sure why you want to return bool here.


class TokenBucketThrottle {

struct TokenBucket : Bucket {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you got what I said. But should we have a subclass of TokenBucket ? I think the TokenBucketThrottle can take all the work of TokenBucket here. I mean, TokenBucketThrottle can use Bucket and decide how to use it. No need to split the management work into TokenBucketThrottle and TokenBucket.

@yangdongsheng
Copy link
Contributor

yangdongsheng commented Oct 18, 2018 via email

@irides
Copy link
Contributor Author

irides commented Oct 19, 2018

Hi @yangdongsheng, I've moved the control logic. Please take a review. Thanks : )

@yangdongsheng
Copy link
Contributor

Thanx @iridescent-rsy , but sorry I will review it next Monday.

@irides
Copy link
Contributor Author

irides commented Oct 23, 2018

Hi @yangdongsheng, I've updated the code. Please take a review. Thanks!!

Copy link
Contributor

@yangdongsheng yangdongsheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iridescent-rsy This version looks much better. Just few comments.

if (0 == m && 0 != m_avg) {
m_throttle.set_max(m_avg);
} else {
m_throttle.set_max(m);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So set_max(0) means no burst, right? What if I am going to set a max but the max is less than m_avg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes. 0 is the default value for burst configuration.
  2. The max should never less than m_avg. I'll correct it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to return error if max < m_avg, rather than change the max silently. For example:
user want to set_max(100), but the m_avg is 1000, if we set the max to 1000 silently, user would think max is set to 100. but he will find some strange thing if the following code believe max is 100.

So I think we can return error in this case in set_max(), to tell user what happened.
Maybe we should do it like that:

class TokenBucketThrottle {
        m_avg;
        m_burst;

        int set_burst(burst) {
           if (burst > 0 && burst < m_avg)
              return -EINVAL;

           m_burst = burst;
           throttle.set_max(m_burst > 0? m_burst : m_avg);
        }

        int set_average(avg) {
            if (m_burst > 0 && avg > m_burst)
              return -EINVAL;

            ... ...
        }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yangdongsheng
I got it !
But I am not sure if I also need to handle the error in apply_qos_limit ?
Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dillaman what's your opinion? when we are going to apply some metadata but they are invalid. should we fail in image opening?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No -- I don't think you should prevent the image from opening (since that would also prevent you from fixing it). I think librbd can log to lderr if it detects invalid configuration settings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dillaman Agreed. @iridescent-rsy Then just log to lderr .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dillaman OK. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dillaman @yangdongsheng
I'am adding the log. But there is a confusing problem.

As we know, the metadata will be applied twice when we set it by rbd config image set aaa bbb. The first time is when we are opening the image. And the second time is to apply the new config.

The problem is:
When we want to correct the value after we set an invalid one, it will still show the error log ( when opening the image, the image will still apply the old config, which contains the invalid value. )

I think this error log may confuse the users. Does that matter?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iridescent-rsy seems like a necessary evil to me. I think if the log message includes the current values for average and burst it would be less confusing.

src/common/Throttle.cc Show resolved Hide resolved
m_divide_left = avg % m_ticks_per_second;

if (0 == m_throttle.max) {
m_throttle.set_max(avg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we are going to set a average larger than max?

src/common/Throttle.h Show resolved Hide resolved
@irides
Copy link
Contributor Author

irides commented Oct 24, 2018

@yangdongsheng Updated : )

Copy link
Contributor

@yangdongsheng yangdongsheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iridescent-rsy Just one comment. Others looks almost good to me. But one question, the calculation in set_average looks not elegant. can we make it more easy and safe?

if (0 == m && 0 != m_avg) {
m_throttle.set_max(m_avg);
} else {
m_throttle.set_max(m);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to return error if max < m_avg, rather than change the max silently. For example:
user want to set_max(100), but the m_avg is 1000, if we set the max to 1000 silently, user would think max is set to 100. but he will find some strange thing if the following code believe max is 100.

So I think we can return error in this case in set_max(), to tell user what happened.
Maybe we should do it like that:

class TokenBucketThrottle {
        m_avg;
        m_burst;

        int set_burst(burst) {
           if (burst > 0 && burst < m_avg)
              return -EINVAL;

           m_burst = burst;
           throttle.set_max(m_burst > 0? m_burst : m_avg);
        }

        int set_average(avg) {
            if (m_burst > 0 && avg > m_burst)
              return -EINVAL;

            ... ...
        }
}

@yangdongsheng
Copy link
Contributor

@iridescent-rsy Also, you need to update https://github.com/ceph/ceph/blob/master/qa/suites/rbd/thrash/workloads/rbd_fsx_rate_limit.yaml for more option case.

@irides
Copy link
Contributor Author

irides commented Oct 26, 2018

@yangdongsheng All right. I'm working on it. Thank you.

@irides irides force-pushed the bursting_io branch 2 times, most recently from afc5b82 to 9743b7f Compare October 29, 2018 07:14
@irides
Copy link
Contributor Author

irides commented Oct 29, 2018

@yangdongsheng Updated. Please take a review. Thanks.

*/
uint64_t m_ticks_per_second = 0;
uint64_t m_tokens_per_tick = 0;
uint64_t m_divide_left = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this more simple? Example:

uint64_t m_ticks_per_second = 0;
uint64_t m_current_tick = 0;

And then

uint64_t TokenBucketThrottle::tokens_filled(double tick) {
    return (tick / m_ticks_per_second * m_avg);
}

uint64_t TokenBucketThrottle::tokens_this_tick() {
    return (tokens_filled(m_current_tick) - tokens_filled(m_current_tick - 1));
}

Same example as what you mentioned, 950 iops. Then m_ticks_per_second = 20.
tick 1: 1/20 * 950 - 0/20 * 950 = 47 - 0 = 47;
tick 2: 2/20 * 950 - 1/20 * 950 = 95 - 47 = 48;
tick 3: 3/20 * 950 - 2/20 * 950 = 142 - 95 = 47;

   * tick    | 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16|17|18|19|20|
   * tokens  |47|48|47|48|47|48|47|48|47|48|47|48|47|48|47|48|47|48|47|48|

This way, we can make the tokens to be filled more smoothly, and we need only two variable m_ticks_per_second and m_current_tick.

Just FYI, you can think more ideas about it. What I am concerned is the m_divide_left way looks not simple enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a great idea! Thank you very much. 👍

But I think it will take more time to calculate (two floating multiplication and division calculations within each tick). Or maybe it doesn't matter and I don't have to worry about it. 😄

I'll take your idea and think more about it. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😉 because I think there was too many if and else in m_divide_left way, then I give you a suggestion above, maybe you can find some better idea.

I think we should not worry too much on the overhead on calculation in this case.

TokenBucket shall put one token per 1/cir second (one schedule tick for QoS, set the minimum 50ms here).
The cir is Committed Information Rate, which is defined by TokenBucket. Also means the IOPS(or BPS) limit we set.

Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
@irides
Copy link
Contributor Author

irides commented Nov 1, 2018

@yangdongsheng Updated.
Your solution is wonderful and I took it. Please take a review. Thanks.

Copy link
Contributor

@yangdongsheng yangdongsheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one nit. others looks good.


Option("rbd_qos_schedule_tick_min", Option::TYPE_UINT, Option::LEVEL_ADVANCED)
.set_default(50)
.set_min(50)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be set_min(1) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yangdongsheng
All right. Should leave the choice to users.
Code Updated.
Thanks.

Copy link
Contributor

@yangdongsheng yangdongsheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. @dillaman Please take a look.

@yangdongsheng
Copy link
Contributor

Hi @dillaman , this PR looks good to me now. What do you think about it?

throttle->set_max(limit);
throttle->set_average(limit);

int r_burst = throttle->set_burst(burst);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend combining set_burst / set_average into a single method since otherwise you can potentially log an error about the burst being < average if the average hasn't been updated yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dillaman OK. It's a good idea. I'll fix it.

Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
@irides
Copy link
Contributor Author

irides commented Nov 7, 2018

@dillaman Updated. Please take a review. Thanks.

Copy link

@dillaman dillaman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@dillaman dillaman merged commit 6ac41cc into ceph:master Nov 10, 2018
@irides irides deleted the bursting_io branch November 12, 2018 01:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants