New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mgr: reweight analysis tool #16361
mgr: reweight analysis tool #16361
Conversation
src/mon/PGMap.cc
Outdated
@@ -4602,16 +4607,38 @@ int reweight::by_utilization( | |||
osd_util.second = (double)p.second.kb_used / (double)p.second.kb; | |||
} | |||
util_by_osd.push_back(osd_util); | |||
|
|||
// Overweighted devices and their weights are factors to | |||
// calculate reweight_emergency. One 10% underfilled device |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/reweight_emergency/reweight_urgency/
src/mon/PGMap.cc
Outdated
|
||
// cdf of normal distribution:https://stackoverflow.com/a/18786808 | ||
reweight_urgency += | ||
weight * (erfc(-((osd_util.second - average_util)/average_util) * M_SQRT1_2)-1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we can model the distribution of the deviation from average utilization using normal distribution?
the cdf of normal distribution is: \Phi(x) = 1 / 2 erfc(-x / \sqrt{2}), per https://en.wikipedia.org/wiki/Error_function#Cumulative_distribution_function
assuming the random variable in this case is (osd_util.second - average_util)/average_util
, the cdf should be:
1 / 2 * erfc(-((osd_util.second - average_util) / average_util) * M_SQRT1_2)
but not
weight * (erfc(-((osd_util.second - average_util)/average_util) * M_SQRT1_2)-1);
so, could you elaborate on how reweight_urgency
is related to CDF of a normal distribution. and what the normal distribution is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
I used a shifted and scaled version of cdf of standard normal distribution.
-
I wanted to have a function F(x), where x =
(osd_util.second - average_util) / average_util
, which is bounded between 0 and 1, so that ultimatelyreweight_urgency
will also be bounded. A larger value of x, should imply more urgency to reweight. Also, the difference between F(x) when x is large, should be minimal. The value of F(x) should get close to 1 (highest urgency to reweight) with steeply. I had also thought of using F(x) = (1 - e^(-x)). But that had slower convergence to 1, compared to the function currently in use. -
Since I am only considering the over-utilized devices, so
(osd_util.second - average_util) / average_util
will be greater than 0. The \Phi(0) = 0.5 . I shifted the function by 0.5 and scaled it by 2 times, so that, F(0) = 0 and F(x) asymptotically tends to 1. So, the function I am using is 2*(Phi(x) - 0.5). I will include it in a comment.
src/mon/PGMap.cc
Outdated
// sort by absolute deviation from the mean utilization, | ||
// in descending order. | ||
std::sort(util_by_osd.begin(), util_by_osd.end(), | ||
[average_util](std::pair<int, float> l, std::pair<int, float> r) { | ||
return abs(l.second - average_util) > abs(r.second - average_util); | ||
} | ||
); | ||
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the trailing space.
src/mon/PGMap.cc
Outdated
@@ -4559,6 +4560,9 @@ int reweight::by_utilization( | |||
// but aggressively adjust weights up whenever possible. | |||
const double underload_util = average_util; | |||
|
|||
// a measure of how uneven the data distribution is and hence how urgently reweighting is needed. | |||
double reweight_urgency = 0.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please move the definition down to b576d88#diff-ecab4c883be988760d61a8a883ddc23fR4585. we can postpone the definition to where it is used for the first time.
src/osd/OSDMap.cc
Outdated
@@ -3501,7 +3501,9 @@ int OSDMap::summarize_mapping_stats( | |||
OSDMap *newmap, | |||
const set<int64_t> *pools, | |||
std::string *out, | |||
Formatter *f) const | |||
Formatter *f, | |||
int *pg_moved = NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, s/NULL/nullptr/
src/mon/OSDMonitor.cc
Outdated
@@ -4067,7 +4067,7 @@ bool OSDMonitor::preprocess_command(MonOpRequestRef op) | |||
} | |||
} else if (prefix == "osd utilization") { | |||
string out; | |||
osdmap.summarize_mapping_stats(NULL, NULL, &out, f.get()); | |||
osdmap.summarize_mapping_stats(NULL, NULL, &out, f.get(), NULL, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, s/NULL/nullptr/
src/mon/PGMap.cc
Outdated
@@ -4707,9 +4709,12 @@ int reweight::by_utilization( | |||
double expected_data_movement = (double) pgm.osd_sum.kb_used; | |||
expected_data_movement *= (double) pg_moved / (double) pg_total; | |||
if (f) { | |||
f->dump_float("Expected data flow in KB: ", expected_data_movement); | |||
f->dump_float("Expected data flow (in KB): ", expected_data_movement); | |||
f->dump_int("Time for optization (in s): ", std::difftime(end, start)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: optimization.
BTW, why would our user be interested in the time used to perform the re-weight? especially, it has already happened. if the user is really interested in time spend on reweight, he/she can just
time ceph osd test-reweight-by-pg
src/mon/PGMap.cc
Outdated
@@ -4501,6 +4502,9 @@ int reweight::by_utilization( | |||
|
|||
vector<int> pgs_by_osd(osdmap.get_max_osd()); | |||
|
|||
std::time_t start, end; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use ceph::mono_clock::now()
.
could you prefix the title of your commit message with the subcomponent your are changing ? see https://github.com/ceph/ceph/blob/master/SubmittingPatches.rst#3-describe-your-changes. |
src/mon/PGMap.cc
Outdated
std::time_t start, end; | ||
start = std::time(nullptr); | ||
std::clock_t time; | ||
time = std::clock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change should be folded into the previous commit.
6cb4f77
to
b12de2e
Compare
Rebased. |
b12de2e
to
7a5ca5f
Compare
* implements 'analyze-reweight-by-pg' and 'analyze-reweight-by-utilization' * reweight_urgency is between 0 to 100. * use cdf of normal distribution to prioritize highly over-weighted device. * consider only over-weighted devices to calculate reweight_urgency. Signed-off-by: Spandan Kumar Sahu <spandankumarsahu@gmail.com>
7a5ca5f
to
cbec9cf
Compare
src/mon/PGMap.cc
Outdated
@@ -4550,8 +4556,41 @@ int reweight::by_utilization( | |||
osd_util.second = (double)p.second.kb_used / (double)p.second.kb; | |||
} | |||
util_by_osd.push_back(osd_util); | |||
|
|||
// Overweighted devices and their weights are factors to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SpandanKumarSahu multi-line comments should be inside /* */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amitkumar50 could you please explain the rationale behind it? i asked the same question at #16715 (comment), but no response so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tchaikov I guess, @amitkumar50 is correct.
This (the note at the bottom) explains, why we should shy away from using single line comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i believe Ceph uses https://google.github.io/styleguide/cppguide.html#Comment_Style instead of the one on wikibooks. see https://github.com/ceph/ceph/blob/master/CodingStyle#L21
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using both single line and multi-line comments. I changes everything to multi-line comments, to be consistent, as the Ceph coding style guide mentions.
src/mon/PGMap.cc
Outdated
convergence to 1, compared to the function currently in use. | ||
*/ | ||
|
||
// cdf of normal distribution:https://stackoverflow.com/a/18786808 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SpandanKumarSahu Don't know how good it is to include stackoverflow link.
This may change over the time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amitkumar50 Would a Wikipedia link for the same be okay?
adds 'expected data movement' as a metric in mgr commands of analyze-reweight-by-pg and in analyze-reweight-by-utilization. Expected data movement = fraction of PGs moved * total data Signed-off-by: Spandan Kumar Sahu <spandankumarsahu@gmail.com>
cbec9cf
to
7ec8722
Compare
A similar version of this has gone into the balancer branch! |
Implements a simple tool to show how "uneven" is the current distribution is, and what would be the expected data flow over the network, if we apply "reweight-by-utilization" or "reweight-by-pg"
@tchaikov @liewegas