Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr: reweight analysis tool #16361

Closed

Conversation

SpandanKumarSahu
Copy link
Contributor

Implements a simple tool to show how "uneven" is the current distribution is, and what would be the expected data flow over the network, if we apply "reweight-by-utilization" or "reweight-by-pg"

@tchaikov @liewegas

src/mon/PGMap.cc Outdated
@@ -4602,16 +4607,38 @@ int reweight::by_utilization(
osd_util.second = (double)p.second.kb_used / (double)p.second.kb;
}
util_by_osd.push_back(osd_util);

// Overweighted devices and their weights are factors to
// calculate reweight_emergency. One 10% underfilled device
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/reweight_emergency/reweight_urgency/

src/mon/PGMap.cc Outdated

// cdf of normal distribution:https://stackoverflow.com/a/18786808
reweight_urgency +=
weight * (erfc(-((osd_util.second - average_util)/average_util) * M_SQRT1_2)-1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we can model the distribution of the deviation from average utilization using normal distribution?

the cdf of normal distribution is: \Phi(x) = 1 / 2 erfc(-x / \sqrt{2}), per https://en.wikipedia.org/wiki/Error_function#Cumulative_distribution_function

assuming the random variable in this case is (osd_util.second - average_util)/average_util, the cdf should be:
1 / 2 * erfc(-((osd_util.second - average_util) / average_util) * M_SQRT1_2)
but not

weight * (erfc(-((osd_util.second - average_util)/average_util) * M_SQRT1_2)-1);

so, could you elaborate on how reweight_urgency is related to CDF of a normal distribution. and what the normal distribution is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I used a shifted and scaled version of cdf of standard normal distribution.

  2. I wanted to have a function F(x), where x = (osd_util.second - average_util) / average_util, which is bounded between 0 and 1, so that ultimately reweight_urgency will also be bounded. A larger value of x, should imply more urgency to reweight. Also, the difference between F(x) when x is large, should be minimal. The value of F(x) should get close to 1 (highest urgency to reweight) with steeply. I had also thought of using F(x) = (1 - e^(-x)). But that had slower convergence to 1, compared to the function currently in use.

  3. Since I am only considering the over-utilized devices, so
    (osd_util.second - average_util) / average_util will be greater than 0. The \Phi(0) = 0.5 . I shifted the function by 0.5 and scaled it by 2 times, so that, F(0) = 0 and F(x) asymptotically tends to 1. So, the function I am using is 2*(Phi(x) - 0.5). I will include it in a comment.

src/mon/PGMap.cc Outdated
// sort by absolute deviation from the mean utilization,
// in descending order.
std::sort(util_by_osd.begin(), util_by_osd.end(),
[average_util](std::pair<int, float> l, std::pair<int, float> r) {
return abs(l.second - average_util) > abs(r.second - average_util);
}
);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the trailing space.

src/mon/PGMap.cc Outdated
@@ -4559,6 +4560,9 @@ int reweight::by_utilization(
// but aggressively adjust weights up whenever possible.
const double underload_util = average_util;

// a measure of how uneven the data distribution is and hence how urgently reweighting is needed.
double reweight_urgency = 0.0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please move the definition down to b576d88#diff-ecab4c883be988760d61a8a883ddc23fR4585. we can postpone the definition to where it is used for the first time.

@@ -3501,7 +3501,9 @@ int OSDMap::summarize_mapping_stats(
OSDMap *newmap,
const set<int64_t> *pools,
std::string *out,
Formatter *f) const
Formatter *f,
int *pg_moved = NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, s/NULL/nullptr/

@@ -4067,7 +4067,7 @@ bool OSDMonitor::preprocess_command(MonOpRequestRef op)
}
} else if (prefix == "osd utilization") {
string out;
osdmap.summarize_mapping_stats(NULL, NULL, &out, f.get());
osdmap.summarize_mapping_stats(NULL, NULL, &out, f.get(), NULL, NULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, s/NULL/nullptr/

src/mon/PGMap.cc Outdated
@@ -4707,9 +4709,12 @@ int reweight::by_utilization(
double expected_data_movement = (double) pgm.osd_sum.kb_used;
expected_data_movement *= (double) pg_moved / (double) pg_total;
if (f) {
f->dump_float("Expected data flow in KB: ", expected_data_movement);
f->dump_float("Expected data flow (in KB): ", expected_data_movement);
f->dump_int("Time for optization (in s): ", std::difftime(end, start));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: optimization.

BTW, why would our user be interested in the time used to perform the re-weight? especially, it has already happened. if the user is really interested in time spend on reweight, he/she can just

time ceph osd test-reweight-by-pg

src/mon/PGMap.cc Outdated
@@ -4501,6 +4502,9 @@ int reweight::by_utilization(

vector<int> pgs_by_osd(osdmap.get_max_osd());

std::time_t start, end;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could use ceph::mono_clock::now().

@tchaikov
Copy link
Contributor

could you prefix the title of your commit message with the subcomponent your are changing ? see https://github.com/ceph/ceph/blob/master/SubmittingPatches.rst#3-describe-your-changes.

src/mon/PGMap.cc Outdated
std::time_t start, end;
start = std::time(nullptr);
std::clock_t time;
time = std::clock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change should be folded into the previous commit.

@jcsp
Copy link
Contributor

jcsp commented Jul 17, 2017

This seems like something that would be a good fit for a python module rather than adding to the C++ bits. @liewegas has a work in progress that exposes some extra bits of OSDMap for use cases like this in #16272

@SpandanKumarSahu SpandanKumarSahu force-pushed the ReweightAnalysis branch 3 times, most recently from 6cb4f77 to b12de2e Compare July 18, 2017 15:37
@SpandanKumarSahu
Copy link
Contributor Author

Rebased.

* implements 'analyze-reweight-by-pg' and 'analyze-reweight-by-utilization'
* reweight_urgency is between 0 to 100.
* use cdf of normal distribution to prioritize highly over-weighted device.
* consider only over-weighted devices to calculate reweight_urgency.

Signed-off-by: Spandan Kumar Sahu <spandankumarsahu@gmail.com>
src/mon/PGMap.cc Outdated
@@ -4550,8 +4556,41 @@ int reweight::by_utilization(
osd_util.second = (double)p.second.kb_used / (double)p.second.kb;
}
util_by_osd.push_back(osd_util);

// Overweighted devices and their weights are factors to

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SpandanKumarSahu multi-line comments should be inside /* */

Copy link
Contributor

@tchaikov tchaikov Aug 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amitkumar50 could you please explain the rationale behind it? i asked the same question at #16715 (comment), but no response so far.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tchaikov I guess, @amitkumar50 is correct.
This (the note at the bottom) explains, why we should shy away from using single line comments.

Copy link
Contributor

@tchaikov tchaikov Aug 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using both single line and multi-line comments. I changes everything to multi-line comments, to be consistent, as the Ceph coding style guide mentions.

src/mon/PGMap.cc Outdated
convergence to 1, compared to the function currently in use.
*/

// cdf of normal distribution:https://stackoverflow.com/a/18786808

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SpandanKumarSahu Don't know how good it is to include stackoverflow link.
This may change over the time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amitkumar50 Would a Wikipedia link for the same be okay?

adds 'expected data movement' as a metric in mgr commands of
analyze-reweight-by-pg and in analyze-reweight-by-utilization.
Expected data movement = fraction of PGs moved * total data

Signed-off-by: Spandan Kumar Sahu <spandankumarsahu@gmail.com>
@liewegas
Copy link
Member

A similar version of this has gone into the balancer branch!

@liewegas liewegas closed this Aug 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants