mgr/balancer: cast config vals to int or float #19493

Merged
merged 1 commit into ceph:master on Dec 15, 2017

@dvanders
Contributor

commented Dec 13, 2017

upmap_max_iterations and other config vals need to be numeric.
Cast them appropriately.

Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
Fixes: http://tracker.ceph.com/issues/22429
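
For readers unfamiliar with the problem, here is a minimal sketch of the kind of cast this change is about. It is not the patch itself: `get_numeric_config`, `fake_get_config`, and the in-memory `store` are hypothetical stand-ins for the manager's config lookup, and the second key name is only illustrative. The assumption is that a value set by an operator comes back as a string, while the built-in default is already numeric.

```python
def get_numeric_config(get_config, key, default, cast):
    """Read a config value and coerce it to the requested numeric type."""
    return cast(get_config(key, default))


# Hypothetical in-memory store standing in for the real config backend.
# Values set at runtime are stored as strings.
store = {"upmap_max_iterations": "20"}


def fake_get_config(key, default):
    """Return the stored string if the key is set, else the default."""
    return store.get(key, default)


# Set by the operator: arrives as the string "20", cast to the int 20.
max_iterations = get_numeric_config(
    fake_get_config, "upmap_max_iterations", 10, int)

# Not set: the float default passes through the cast unchanged.
max_deviation = get_numeric_config(
    fake_get_config, "upmap_max_deviation", 0.01, float)

# Arithmetic such as a remaining-iterations counter now behaves numerically.
left = max_iterations - 1
```

Without the cast, `"20" - 1` raises a `TypeError` in Python 3, which is why every numeric option needs to be converted at the point where it is read.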

@liewegas
Member

left a comment

yes!

@liewegas added the needs-qa label Dec 13, 2017

@liewegas

Member

commented Dec 13, 2017

Note that there is a fair bit of work that can be done to improve the current balancer behavior (although what is there now works reasonably well!). I'm very excited to see how well it does on a large cluster. I only had our janky lab cluster to test against (which was a real mess weight-wise but only ~100 OSDs).

@dvanders

Contributor Author

commented Dec 13, 2017

Yeah I'm slowly testing it now -- so far so good :) (except this other small learning-curve issue http://tracker.ceph.com/issues/22424)

crush-compat seemed to work well enough on an older 3-host, 72-OSD cluster, but that cluster has only one root and is uniform. Here's how it managed there:

# ceph osd utilization
avg 172.667
stddev 0.471405 (expected baseline 13.0487)
min osd.15 with 172 pgs (0.996139 * mean)
max osd.1 with 173 pgs (1.00193 * mean)

And with upmap on a newer 381-OSD, 127-host cluster, which is much less uniform (but still single root), it managed:

# ceph osd utilization
avg 96.7559
stddev 0.728678 (expected baseline 9.82354)
min osd.0 with 96 pgs (0.992188 * mean)
max osd.1 with 98 pgs (1.01286 * mean)

Both were quite imbalanced to start.
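
The summary figures above can be reproduced from per-OSD PG counts. The sketch below is an interpretation, not the code behind `ceph osd utilization`: it assumes `stddev` is the population standard deviation and that the "expected baseline" is the binomial value sqrt(N * p * (1 - p)) with p = 1/n for N total placements over n OSDs. Both assumptions agree with the numbers reported for the two clusters. The function name and the sample distribution (48 OSDs at 173 PGs, 24 at 172) are hypothetical, chosen because they match the 72-OSD output.

```python
import math


def utilization_summary(pgs_per_osd):
    """Summarize a list of per-OSD PG counts."""
    n = len(pgs_per_osd)
    total = sum(pgs_per_osd)
    avg = total / n
    # Population standard deviation of the observed counts.
    stddev = math.sqrt(sum((c - avg) ** 2 for c in pgs_per_osd) / n)
    # Assumed baseline: spread expected if every placement picked an OSD
    # uniformly at random (binomial with p = 1/n).
    p = 1.0 / n
    baseline = math.sqrt(total * p * (1.0 - p))
    return {
        "avg": avg,
        "stddev": stddev,
        "baseline": baseline,
        "min_ratio": min(pgs_per_osd) / avg,
        "max_ratio": max(pgs_per_osd) / avg,
    }


# Hypothetical distribution consistent with the 72-OSD output above.
summary = utilization_summary([173] * 48 + [172] * 24)
```

For this input the summary comes out at roughly avg 172.667, stddev 0.4714, baseline 13.0487, min ratio 0.99614 and max ratio 1.00193. Read this way, a stddev far below the baseline means the balancer placed PGs much more evenly than random placement would.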

One potential problem I noticed is that upmap works sequentially by pool: in my test cluster I have 3 (currently empty) pools, and it managed to balance things by upmapping only the first pool. I guess spreading the work across pools was the intention of random.shuffle(pools) in do_upmap, but it doesn't seem to work.
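
To make the intended behavior concrete, here is a rough sketch of a shuffled, per-pool loop that spends a shared iteration budget. It is only an illustration of the idea, not the actual `do_upmap` code, and it does not try to explain the behavior reported above. `plan_upmaps`, `optimize_pool`, and `fake_optimize` are hypothetical names, and the per-pool optimizer is a stand-in that reports how many changes it made.

```python
import random


def plan_upmaps(pools, max_iterations, optimize_pool, rng=None):
    """Visit pools in random order, spending a shared iteration budget."""
    rng = rng or random.Random()
    order = list(pools)   # copy so the caller's list is not reordered
    rng.shuffle(order)    # shuffles in place and returns None
    left = max_iterations
    changes = {}
    for pool in order:
        if left <= 0:
            break
        did = optimize_pool(pool, left)  # number of changes made for this pool
        changes[pool] = did
        left -= did
    return order, changes


# Hypothetical optimizer: each pool "needs" 4 changes, capped by the budget.
needs = {"a": 4, "b": 4, "c": 4}


def fake_optimize(pool, budget):
    return min(needs[pool], budget)


order, changes = plan_upmaps(["a", "b", "c"], 10, fake_optimize,
                             rng=random.Random(0))
```

With a budget of 10 and three pools that each need 4 changes, whichever pool is visited last gets only the remaining 2, so the shuffle decides which pool is shortchanged on a given run.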

And BTW it was very nice that crush-compat mode gradually increased the osd reweights back to 1.0; this will greatly simplify the phase-in on our already-reweighted clusters. The last thing I need is a way to slowly phase in new hardware (start with 0 PGs, then let the balancer slowly map PGs to them). Possibly it's already doable with the existing balancer; otherwise it shouldn't be too complicated.

@liewegas

Member

commented Dec 13, 2017

That was definitely the intention of the shuffle. Curious why it didn't work!

Note that the upmap mode doesn't do the reweights back to 1.0. :(

@dvanders

Contributor Author

commented Dec 13, 2017

http://tracker.ceph.com/issues/22431 for that shuffle issue.

@jcsp

jcsp approved these changes Dec 14, 2017

@tchaikov merged commit 5043e4b into ceph:master Dec 15, 2017

5 checks passed

Docs: build check: OK, docs built
Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
make check: make check succeeded
make check (arm64): make check succeeded