New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hammer: mon: improve reweight_by_utilization() logic #9416
Conversation
@chardan This looks really good - thanks for helping with this non-trivial backport! I have just two "nits", both regarding the commit message in a3d5114 First, please remove the first line so the title is the same as the original commit. The second "nit" is more of a judgment call - would it be nicer/more useful to expand the conflict resolution sentence to include the "Important note" that you put in the PR description? (Hypothetical future forensic investigations will typically be only looking at the git commits and their descriptions, so all important information should be included.) |
Marking DNM until commit message is fixed and we get a successful make check |
test this please |
1e68abb
to
bf14d65
Compare
…y_utilization() logic Reviewed-by: Nathan Cutler <ncutler@suse.com>
…y_utilization() logic Reviewed-by: Nathan Cutler <ncutler@suse.com>
…y_utilization() logic Reviewed-by: Nathan Cutler <ncutler@suse.com>
@liewegas This PR is in the latest round of hammer-backports integration tests, which passed a rados run (the only failures are a valgrind false positive that has since been fixed by ceph/teuthology#915 and http://tracker.ceph.com/issues/15139 which is an infrastructure issue with two of the tests) - for details, see: http://tracker.ceph.com/issues/15895#note-18 Do you think this PR is OK to merge? |
@liewegas @athanatos This PR passed a /200 rados run on Ubuntu. None of the failures were reproducible. For details see http://tracker.ceph.com/issues/15895#note-18 Do you think it's OK to merge? |
@@ -466,7 +466,7 @@ struct Sorter { | |||
|
|||
double average_util; | |||
|
|||
Sorter(const average_util_) | |||
Sorter(const double average_util_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have been in the previous commit?
There seems to be a problem with the commits. If you can fix it and verify that the git diff between the two resulting branches is empty, I'm ok with merging it without another round of testing. |
By calling reweight_by_utilization() method, we are aiming at an evener result of utilization among all osds. To achieve this, we shall decrease weights of osds which are currently overloaded, and try to increase weights of osds which are currently underloaded when it is possible. However, we can't do this all at a time in order to avoid a massive pg migrations between osds. Thus we introduce a max_osds limit to smooth the progress. The problem here is that we have sorted the utilization of all osds in a descending manner and we always try to decrease the weights of the most overloaded osds since they are most likely to encounter a nearfull/full transition soon, but we won't increase the weights from the most underloaded(least utilized by contrast) at the same time, which I think is not quite reasonable. Actually, the best thing would probably be to iterate over teh low and high osds in parallel, and do the ones that are furthest from the average first. Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn> (cherry picked from commit e7a3253) Conflicts: src/mon/OSDMonitor.cc Resolved by picking the lambda implemenation. NOTE: Because hammer does not support C++11, the lambda functionality from the current master has been moved into the "Sorter" function object.
The grace calculation during check_failure() is now very complicated and time-consuming. Therefore we shall skip this when it is possible. Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn> (cherry picked from commit 3557903) Conflicts: src/mon/OSDMonitor.cc Resolved by choosing the move-to-top implementation. Removed unused vars.
bf14d65
to
0d22121
Compare
rebased |
@athanatos We fixed it in
Shall I merge wip-15770-hammer-2 manually, then? |
@athanatos On second thought, it'll be better to reset this branch to |
@chardan: Please do
|
0d22121
to
23498a9
Compare
Done! |
lgtm |
http://tracker.ceph.com/issues/15770
Important note: The logic from the lambda function appearing in master was ported into the C++98 friendly function object because hammer does not support C++11.