Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow the dmclock library to handle the "cost" of an operation/request #46

Merged
merged 2 commits into from May 11, 2018

Conversation

ivancich
Copy link
Member

@ivancich ivancich commented Jan 25, 2018

Allow the dmclock library to handle the "cost" of an operation/request. This is in contrast to just measuring ops. If the cost is set at the value 1, then it's equivalent to the old code.

The cost is calculated on the server side and affects the tag values for the request. Furthermore, the cost is then sent back to the client, so the ServiceTracker can use the cost to properly calculate delta and rho values.

We now allow the delta and rho values sent with a request to be zero, since the request cost is included on the server-side tag calculation, and the request cost must be positive guaranteeing the advancement of tags.

The OrigTracker has now been updated so as not to add one when computing delta and rho.

The unit tests have been updated to use request costs. And the simulator has been updated to use cost, and the cost of a client group's requests can be configured.

The simulation has been changed to user the OrigTracker as the default tracker.

Signed-off-by: J. Eric Ivancich ivancich@redhat.com

@ivancich
Copy link
Member Author

@bspark8 @TaewoongKim Please let me know your thoughts on this PR. Thanks!

}
return std::max(time, prev + increment);
double tag_increment = increment * (dist_req_val + cost);
return std::max(time, prev + tag_increment);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not digging the code in depth yet. Sorry about that.
I just looked the code roughly and I have a question about the manner dealing the cost.
In the code, the cost is usually added to rho or delta value. However, I think cost is somewhat needed to be multiplied. Because cost is a relative value. So it should represent how much times consume the resource than the unit request(e.g. 8KB write request.)
In the dmclock paper, author said "So, for tagging purposes, a single request of IO size S is treated as equivalent to: (1+S/(Tm×Bpeak)) IO requests"
I understood that sentence as (1+S/(Tm×Bpeak)) is something like a cost and it should be multiplied with tag incremental value to deal the IO request as multiple IO requests.

Copy link
Collaborator

@tchaikov tchaikov Feb 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i feel the same. cost is quite the same as the number of unit requests. and both \rho_{i} and \delta_{i} denote the number of served requests. if this certain request is treated a multiple of unit request, we should certainly use something like
ql_b2f61a2495fb7623243337d1135b7ea0_l3

Copy link
Member Author

@ivancich ivancich Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both for your detailed look into this. This is the type of discussion I was hoping we'd have!

So cost is calculated on the server side by a formula as discussed in the ceph-devel discussion. So yes, on a hard disk it would have to incorporate seek times and read/write times.

One thing to keep in mind is that cost, rho, and delta are all in the same units. On the client side, when the response comes back, delta and possibly rho (if the request was handled during the reservation phase) are incremented by the cost value. See dmclock_client.h:dmclock::SimpleTracker::resp_update.

So delta (or rho) represents costs of services already received, and cost represents cost of services being scheduled to be received. Because they are in the same unit, they should not be multiplied but instead be added. It's not clear to me what cost^2 units would represent.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cost doesn't include unit concept. It's relative value like weight or ratio value that don't have any unit info.
Therefore, we need another unit concept that represents an abstract amount of resources that would be needed to process a unit request(e.g 8KB write) so that a cost can be multiplied with this unit and the result can have a meaning in the real world. This means a cost for a specific request will be represented as a ratio to the unit cost(e.g. cost for an 8KB write).
Rho & delta is also a kind of ratio value and these don't have a unit also.
So, we can multiply these values with each other multiple times.

Copy link
Contributor

@yzhan298 yzhan298 Apr 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I reproduce the experiment from the mclock paper, I find the behavior of the dmclock with cost integrated is not the same as expected (when cost=1). The reason for that is due to line 224: double tag_increment = increment * (dist_req_val + cost). My understanding is that cost should replace the delta and rho (or multiply it). In dmclock_client.h, we already added the cost into delta and rho, so here at line 224, we should multiply increment with cost directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tchaikov yes, I have updated the comment. Thanks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to update everyone with some conversations that @yzhan298 and I had. @yzhan298's suggested fix cannot be correct, because it would mean that rho and delta would have no impact on tag calculation, and that cannot be correct. Once we eliminate delta and rho from tag calculation we no longer have "dmclock" but instead "mclock". The modification did produce results from the simulation more in line with expectations. But that should serve as a reminder for the importance of having a theoretical grounding for changes.

But working through that with @yzhan298 made me question the calculation of delta and rho, and that's done in the client-side trackers. So I re-evaluated that and since we're adding cost in on the server side, we no longer have to insure delta and rho are at least 1. So I modified the OrigTracker to get rid of the "1 +" and @yzhan298 was kind enough to test it out, and it produced the expected results.

In the past we'd migrated from the OrigTracker to the BorrowingTracker. The OrigTracker was written to implement the algorithm in the dmclock paper. It subtracts out responses it gets from the same server in the calculation of delta and rho.

But it seems like the new OrigTracker is the way to go.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TaewoongKim You are absolutely correct that cost is a scalar. That is why when we discussed the addition of cost, we decided that we should re-think how we conceptualize how we think about reservation and limit. Currently they are in the unit of ops per second. But it would likely be more appropriate to think of them as cost units per second. So a simple/small operation might be the foundation as one cost unit. Operations that require more data to be written/read would represent multiple cost units; the same would be the case for operations that ultimately turn into multiple sub-operations.

We are even contemplating using cost units per time unit that is not seconds in order to make sure reservations and limits are expressed in positive integers given some issues in sending doubles in messages.

For the initial announcement and follow-up conversation, please see the ceph-devel RFC in:
https://marc.info/?l=ceph-devel&m=151362386931501&w=2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tchaikov Yes, rho and delta are defined to count operations. But in the transition to thinking about costs and operations of widely varying costs, we need to include cost in rho and delta, which this PR does. They are measures of work done for a client by other servers in a given time interval (time since last request to the same server). And they should include not just # of ops but combined cost of those ops. That is why in the tracker the client increments delta by cost (and rho by cost when done in the reservation phase).

So when we calculate the tag for a new request, we need to incorporate the cost of work done by other servers and the calculated cost of this operation. We can think of it as cost_recent_past and cost_this and we therefore must add them together. Multiplying them together would result in "units" of cost_squared, which is not meaningful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TaewoongKim The part of the dmclock paper that you quote deals specifically with characteristics of a spinning hard disk. We need something more general as we're not working strictly with hard disks, but SSDs and non-volatile memory. Plus the cost of operations sent to the OSD will vary widely. So we need to work with a more general concept of cost, it seems.

@ivancich
Copy link
Member Author

@yzhan298 This is exciting and I can't wait to spend some time looking over your suggested changes. I'm on vacation, but I think I'll have time this weekend. Thanks for your help on this!!

@ivancich ivancich force-pushed the wip-add-cost branch 2 times, most recently from 030ee12 to 30f66fc Compare May 4, 2018 15:26
@ivancich ivancich requested a review from cbodley May 4, 2018 17:47
double reservation;
double proportion;
double limit;
uint64_t cost;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider exposing a typedef for Cost as part of the interface. this would help disambiguate it from other uint64_t parameters, and calling code would be less likely to rely on its exact representation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice idea. Latest version does this.

ready(false),
arrival(time)
{
assert(cost > 0.0);
Copy link
Contributor

@cbodley cbodley May 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cost is no longer floating point, so cost > 0 here? (and again on line 180)

for requests that pass cost=0, it's probably better to fail with EINVAL error instead of aborting

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 0 literal issues.

I don't know that I want to take on improving error returning results in this PR. But I think it's an excellent idea. At the moment there are a lot of asserts that'd need to be tackled.

@ivancich ivancich force-pushed the wip-add-cost branch 3 times, most recently from 0b08294 to 373be44 Compare May 9, 2018 17:36
@@ -33,20 +33,20 @@ namespace crimson {

struct ReqParams {
// count of all replies since last request; MUSTN'T BE 0
uint32_t delta;
uint64_t delta;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am just curious why we want to use uint64_t for delta and rho now? because we could be running into integer overflow with uint32_t. as IMO, uint32_t is able to represent a fairly large number already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good point. I think we had a mention that rho and delta would be u64s on our etherpad when we discussed it. But that seems to large. 4 billion should be more than enough to count the number of replies received from other servers. I'll switch it back.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That raises a related question. Should Cost (which is a type currently defined to be uint64_t) also be uint32_t? If we use cost=1 as a base, would there be an op that had a relative cost of more than 4 billion? If not, uint32_t would be sufficient.

@cbodley @tchaikov Your thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can imagine a strategy that sets cost=bytes in a request, and radosgw will happily serve requests bigger than 4G. but given a Cost=uint32_t, you could easily adapt the strategy to some fraction of the bytes instead

are we doing much math with these values, where there'd be risk of an intermediate multiplication overflowing?

it seems like the benefits of uint32_t would be less overhead in osd messages, and less per-request memory for the dmclock prio queue? those saving could definitely add up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can imagine someone trying that strategy as well, although I'd hope people would think in terms of blocks rather than bytes. Assuming 4K blocks a request of 4G would result in a cost of 1M. You'd then need a request of 16T to overflow cost.

So I think we're piecing together a case for uint32_t for cost. @tchaikov -- your thoughts?

// future changes; we're assuming that Cost is relatively small
// and that it would be more efficient to pass-by-value than
// by-reference.
using Cost = uint64_t;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tchaikov's question regarding whether delta and rho should be uint64_t's raises a similar question w.r.t. Cost. Should Cost also be uint32_t? If we use cost=1 as a base, would there be an op that had a relative cost of more than 4 billion? If not, uint32_t would be sufficient.

@cbodley @tchaikov Your thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think 4G is a very large number in the context of a networking protocol at this moment. we can cap the cost at std::numeric_limit<uint32>::max() (which is 4G). if its value is actually greater than it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you've convinced me that uint32_t is sufficient here

increment *= dist_req_val;
}
return std::max(time, prev + increment);
double tag_increment = increment * (dist_req_val + cost);
Copy link
Member Author

@ivancich ivancich May 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbodley This is the key calculation with cost. If type Cost is changed to uint32_t, then we'd be adding two uint32_t's and then multiplying by a double. If we're concerned about overflow, we could cast one of the addends to a uint64_t to force uint64_t addition, as in:
double tag_increment = increment * (uint64_t(dist_req_val) + cost);

@ivancich
Copy link
Member Author

@tchaikov I'd like to get this merged. You had indicated doubts about how cost was integrated. Have your doubts been addressed?

@tchaikov
Copy link
Collaborator

@ivancich yes. and regarding to #46 (comment) , i don't feel strong either way.

sim/src/config.h Outdated
@@ -37,6 +37,7 @@ namespace crimson {
double client_reservation;
double client_limit;
double client_weight;
uint64_t client_req_cost;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can stuff in sim/ use the Cost typedef?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up doing an odd work-around here. The simulator defines its own Cost and then I assert upon start-up that the sizeofs of the two are the same. I played a bit with SFINAE but couldn't get it to work and didn't want to spend more time on it. Your thoughts?

++the_delta;
++my_delta;
Counter& the_rho,
Counter cost) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counter -> Cost?

// future changes; we're assuming that Cost is relatively small
// and that it would be more efficient to pass-by-value than
// by-reference.
using Cost = uint64_t;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you've convinced me that uint32_t is sufficient here

@@ -45,6 +45,7 @@ TEST(test_client, full_bore_timing) {
sim::TestResponse resp(0);
dmc::PhaseType resp_params = dmc::PhaseType::priority;
test::DmcClient* client;
const uint64_t request_cost = 1u;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint64_t -> Cost

@@ -45,3 +46,13 @@ void test::dmc_client_accumulate_f(test::DmcAccum& a,
++a.proportion_count;
}
}


// Note: this would be better if we could use std::is_same and SFINAE
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbodley Here's where I played with SFINAE....

@ivancich ivancich force-pushed the wip-add-cost branch 2 times, most recently from ee91f28 to 9cbf6c3 Compare May 11, 2018 19:42
operation/request. This is in contrast to just measuring ops. If the
cost is set at the value 1, then it's equivalent to the old code.

The cost is calculated on the server side and affects the tag values
for the request. Furthermore, the cost is then sent back to the
client, so the ServiceTracker can use the cost to properly calculate
delta and rho values.

We now allow the delta and rho values sent with a request to be zero,
since the request cost is included on the server-side tag calculation,
and the request cost must be positive guaranteeing the advancement of
tags.

The OrigTracker has now been updated so as not to add one when
computing delta and rho.

The unit tests have been updated to use request costs. And the
simulator has been updated to use cost, and the cost of a client
group's requests can be configured.

The simulation has been changed to user the OrigTracker as the default
tracker.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
@ivancich ivancich merged commit effea49 into ceph:master May 11, 2018
// Note: if this static_assert fails then our two definitions of Cost
// do not match; change crimson::qos_simulation::Cost to match the
// definition of crimson::dmclock::Cost.
static_assert(std::is_same<crimson::qos_simulation::Cost,crimson::dmclock::Cost>::value,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, could use std::is_same_v next time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks; I did not know about it.

theanalyst added a commit to theanalyst/ceph that referenced this pull request Jul 5, 2018
This was introduced in ceph/dmclock#46, let the default
behaviour be that cost is 1 as we assert this.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
theanalyst added a commit to theanalyst/ceph that referenced this pull request Aug 7, 2018
This was introduced in ceph/dmclock#46, let the default
behaviour be that cost is 1 as we assert this.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
theanalyst added a commit to theanalyst/ceph that referenced this pull request Aug 7, 2018
This was introduced in ceph/dmclock#46, let the default
behaviour be that cost is 1 as we assert this.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
theanalyst added a commit to theanalyst/ceph that referenced this pull request Aug 30, 2018
This was introduced in ceph/dmclock#46, let the default
behaviour be that cost is 1 as we assert this.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
theanalyst added a commit to theanalyst/ceph that referenced this pull request Sep 7, 2018
This was introduced in ceph/dmclock#46, let the default
behaviour be that cost is 1 as we assert this.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
theanalyst added a commit to theanalyst/ceph that referenced this pull request Sep 18, 2018
This was introduced in ceph/dmclock#46, let the default
behaviour be that cost is 1 as we assert this.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
theanalyst added a commit to theanalyst/ceph that referenced this pull request Jan 4, 2019
This was introduced in ceph/dmclock#46, let the default
behaviour be that cost is 1 as we assert this.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
theanalyst added a commit to theanalyst/ceph that referenced this pull request Jan 7, 2019
This was introduced in ceph/dmclock#46, let the default
behaviour be that cost is 1 as we assert this.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
theanalyst added a commit to theanalyst/ceph that referenced this pull request Jan 30, 2019
This was introduced in ceph/dmclock#46, let the default
behaviour be that cost is 1 as we assert this.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
theanalyst added a commit to theanalyst/ceph that referenced this pull request Jan 30, 2019
This was introduced in ceph/dmclock#46, let the default
behaviour be that cost is 1 as we assert this.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
theanalyst added a commit to theanalyst/ceph that referenced this pull request Jan 31, 2019
This was introduced in ceph/dmclock#46, let the default
behaviour be that cost is 1 as we assert this.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants