Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

common: OpTracker age histogram calculation is not correct #5065

Merged
merged 1 commit into from Oct 10, 2015

Conversation

wonzhq
Copy link
Contributor

@wonzhq wonzhq commented Jun 24, 2015

While the TrackedOps in each sharded list are ordered, there are no
order among different shared lists. Looping the sharded list one by one
won't be able to get the right histogram.

Signed-off-by: Zhiqiang Wang zhiqiang.wang@intel.com

@wonzhq
Copy link
Contributor Author

wonzhq commented Aug 13, 2015

@tchaikov do you have time to take a look at this one? I'm wondering if I could also include this one in the teuthology run together with my others.

@tchaikov tchaikov self-assigned this Aug 13, 2015
@ghost
Copy link

ghost commented Sep 1, 2015

@wonzhq do you have access to a teuthology cluster ?

@wonzhq
Copy link
Contributor Author

wonzhq commented Sep 2, 2015

@dachary yes, we have a local teuthology cluster. And I can also access sepia.

@tchaikov
Copy link
Contributor

tchaikov commented Sep 2, 2015

@wonzhq will review it shortly. it fell off the cracks =(

@tchaikov
Copy link
Contributor

tchaikov commented Sep 9, 2015

@wonzhq the fix makes sense to me. the only concern is that the size of all_ms could be large when something goes wrong, probably an alternative is to

  1. update calc_bits_of(int t) as following to avoid the tight loop:
    return x ? (sizeof(x) * 8 - __builtin_clz(x)) : 0;
  1. have a std::map<bin,count> m. and replace all_ms.insert(ms); with m[calc_bits_of(ms)]++; and dump it to pow2_hist_t after collecting the ops in all shards.

what do you think?

@wonzhq
Copy link
Contributor Author

wonzhq commented Sep 9, 2015

@tchaikov thanks for taking a look!

update calc_bits_of(int t) as following to avoid the tight loop

I'd like to leave this to someone else since I'm not quite clear which one performs better and if it is worth doing this.

have a std::map<bin,count> m. and replace all_ms.insert(ms); with m[calc_bits_of(ms)]++; and dump it to pow2_hist_t after collecting the ops in all shards.

Good idea! But why not do h->add(ms) directly?

@tchaikov
Copy link
Contributor

tchaikov commented Sep 9, 2015

Good idea! But why not do h->add(ms) directly?

i just don't like the overhead of

_expand_to(bin + 1);
///
_contract();

calls. but yeah, i'd admit that's a sign of premature optimisation. as this function is called every 6 seconds.

@tchaikov
Copy link
Contributor

tchaikov commented Sep 9, 2015

I'd like to leave this to someone else since I'm not quite clear which one performs better and if it is worth doing this.

yeah, makes sense to me. i will do some benchmark on my side.

@wonzhq
Copy link
Contributor Author

wonzhq commented Sep 9, 2015

i just don't like the overhead of

_expand_to(bin + 1);
///
_contract();

calls. but yeah, i'd admit that's a sign of premature optimisation. as this function is called every 6 seconds.

Since we loops from the oldest to the newest, most likely we just need to expand/contract the vector only once, which is the same as set_bin.

@tchaikov
Copy link
Contributor

tchaikov commented Sep 9, 2015

agreed, they will be just no-ops.

@wonzhq
Copy link
Contributor Author

wonzhq commented Sep 10, 2015

@tchaikov i've updated this PR, please take another look, thanks!

While the TrackedOps in each sharded list are ordered, there are no
order among different shared lists. Looping the sharded list one by one
won't be able to get the right histogram. Instead, we can increment the
bin count for each of the TrackedOp. Btw, for those TrackedOp whose age
is >= 2^30 ms, they are now been put in the respective number of bits
bin, instead of been put in a single bin.

Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
@wonzhq
Copy link
Contributor Author

wonzhq commented Sep 10, 2015

Looks like after updating the commit message, @tchaikov 's comment is gone. I'm pasting his previous comment below:

this changes the behaviour of get_age_ms_histogram(), it puts all ops which were initiated before 2^30 ms into a single bin. and now it puts them into (2^(n-1), 2^n] buckets. i'm okay with it. but could you put a note in the commit message?

@gregsfortytwo , are you good with this change?

@gregsfortytwo
Copy link
Member

I don't know how the histogram bits work. @athanatos is back beginning of next week, ask him. ;)

tchaikov added a commit that referenced this pull request Oct 10, 2015
common: OpTracker age histogram calculation is not correct

Reviewed-by: Kefu Chai <kchai@redhat.com>
@tchaikov tchaikov merged commit ec64463 into ceph:master Oct 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants