
mgr/balancer: mgr module to automatically balance PGs across OSDs #16272

Merged
merged 15 commits into master from wip-balancer on Sep 8, 2017

Conversation

@liewegas (Member) commented Jul 11, 2017

  • back off when degraded, unknown, inactive
  • throttle against misplaced ratio
  • upmap (luminous+)
  • crush-legacy (compat with pre-luminous)
  • crush (luminous+)
  • osd_weights (legacy osd weight-based approach) (probably not worth doing this!)
  • phase out balance optimizations from other modes (e.g., phase out osd_weight if we are optimizing crush weights)
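
For illustration, here is a rough sketch of how the items above could fit together in the module's main loop. Everything in it (the Balancer class, get_pg_status(), max_misplaced, sleep_interval, optimize(), mode) is hypothetical naming, not this PR's actual code: wake up periodically, back off while PGs are degraded/unknown/inactive, throttle on the misplaced ratio, and otherwise apply one optimization step in the configured mode.

import time

class Balancer(object):
    def serve(self):
        while self.active:
            status = self.get_pg_status()   # counts of degraded/unknown/inactive PGs plus misplaced ratio
            if status['unknown'] or status['inactive'] or status['degraded']:
                self.log.info('backing off: PGs are not healthy enough to balance')
            elif status['misplaced_ratio'] >= self.max_misplaced:
                self.log.info('throttling: misplaced ratio %f >= %f',
                              status['misplaced_ratio'], self.max_misplaced)
            else:
                self.optimize(self.mode)    # e.g. 'upmap' or 'crush-compat'
            time.sleep(self.sleep_interval)  # e.g. 60 seconds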

@liewegas (Member, Author):

There's enough here that this actually works! The real win is the 'crush-legacy' mode.

'prefix': 'osd rm-pg-upmap-items',
'format': 'json',
'pgid': pgid,
}), 'foo')
Contributor:

the tag ("foo") is used to identify the command upon completion. see MonCommandCompletion::finish() and PyModules::notify_all(). if we don't care about it, probably we can just pass an empty string here. but we can also pass something more meaningful here, like "upmap", in case we want to check it in future.

def __init__(self, handle):
    self._handle = handle

def get_epoch(self):
Contributor:

nit, could be

@property
def epoch(self):
  return ceph_osdmap.get_epoch(self._handle)

@liewegas force-pushed the wip-balancer branch 2 times, most recently from 21c47e8 to a26ba1b on July 27, 2017 14:09
@liewegas force-pushed the wip-balancer branch 2 times, most recently from 51bfe83 to 930ca8b on August 4, 2017 22:06
# adjust/normalize by weight
adjusted = float(v) / target[k] / float(num)
dev += (avg - adjusted) * (avg - adjusted)
stddev = math.sqrt(dev / float(max(num - 1, 1)))

Could we keep the actual deviation here? We have small numbers, and having them reflect the actual difference will make things easier to read and easier to use when working toward the optimum.
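
A simplified sketch of what that could look like (the names here are hypothetical, not the PR's code): return the per-item deviations alongside the aggregate stddev, so the small actual differences stay visible.

import math

def weighted_deviation(count, target):
    # count: actual amount per item (e.g. PGs per OSD)
    # target: desired fraction per item; the fractions should sum to 1.0
    total = float(sum(count.values())) or 1.0
    deviation = {k: count.get(k, 0) / total - frac for k, frac in target.items()}
    num = max(len(target), 1)
    stddev = math.sqrt(sum(d * d for d in deviation.values()) / float(max(num - 1, 1)))
    return deviation, stddev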

self.log.debug('pools %s' % pools)
self.log.debug('pool_rule %s' % pool_rule)

# get expected distributions by root

Maybe this should support multiple roots per rule by looping over the rules instead of the roots?
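
A sketch of the loop-over-rules approach suggested above, assuming the rule dicts look like the output of 'ceph osd crush rule dump' (a 'steps' list whose 'take' ops name the roots); the helper name is hypothetical:

def roots_by_rule(crush_rules):
    # map each rule name to the set of roots its 'take' steps draw from,
    # so a rule that spans several roots is handled too
    result = {}
    for rule in crush_rules:
        takes = set()
        for step in rule.get('steps', []):
            if step.get('op') == 'take':
                takes.add(step.get('item_name', step.get('item')))
        result[rule['rule_name']] = takes
    return result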

total_did += did
left -= did
if left <= 0:
    break
self.log.info('prepared %d/%d changes' % (total_did, max_iterations))

incdump = inc.dump()
self.log.debug('resulting inc is %s' % incdump)
def do_crush_compat(self):

I don't get why compat needs a specific strategy. If the conditions are right (i.e., mostly one rule per pool) the crushmap can be converted to a pre-luminous compatible format. If not, there is no way to rebalance so that it can be exported, and manual intervention is required.

@jcsp (Contributor) commented Aug 28, 2017

The fix for the MonCommandCompletion crash is #17308 -- anyone testing this PR should pull that patch in too

liewegas and others added 14 commits September 6, 2017 16:45
This is summary info, same as what's in 'ceph status'.

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
- wake up every minute
- back off when unknown, inactive, degraded
- throttle against misplaced ratio
- apply some optimization step
  - initially implement 'upmap' only

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
…ts too

Allow us to specify a root node in the hierarchy instead of a rule.
This way we can use it in conjunction with find_takes().

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
These let us identify distinct CRUSH hierarchies that rules distribute
data over, and create relative weight maps for the OSDs they map to.

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
* score lies in [0, 1), 0 being perfect distribution
* use shifted and scaled cdf of normal distribution
  to prioritize highly over-weighted devices.
* consider only over-weighted devices to calculate score

Signed-off-by: Spandan Kumar Sahu <spandankumarsahu@gmail.com>
(with upmap at least)

Signed-off-by: Sage Weil <sage@redhat.com>
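
For illustration, a sketch of the scoring idea described in the score commit above ("score lies in [0, 1), 0 being perfect distribution", shifted and scaled normal CDF, over-weighted devices only). The constants and shaping here are illustrative assumptions, not the module's exact math, and the function names are hypothetical:

import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def distribution_score(actual, expected):
    # actual/expected: device id -> fraction of the data (each sums to 1.0, expected > 0)
    over = {k: actual.get(k, 0.0) - v for k, v in expected.items()
            if actual.get(k, 0.0) > v}
    if not over:
        return 0.0                      # perfectly balanced (or only under-weighted devices)
    # Phi(0) = 0.5, so 2 * (Phi(x) - 0.5) shifts and scales the CDF onto [0, 1)
    scores = [2.0 * (normal_cdf(d / expected[k]) - 0.5) for k, d in over.items()]
    return sum(scores) / len(expected)  # stays in [0, 1); heavy outliers dominate
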
@liewegas changed the title from "WIP: pybind/mgr/balancer: mgr module to automatically balance PGs across OSDs" to "mgr/balancer: mgr module to automatically balance PGs across OSDs" on Sep 6, 2017
@liewegas (Member, Author) commented Sep 6, 2017

@jcsp the 'upmap' mode is fully functional. I'm thinking it makes sense to merge this into master now, and later (once the more useful crush-compat mode is working) backport it to luminous.

I'm thinking it'd be good to get the mgr glue bits in sooner rather than later, though. But some of that probably needs some rework to align with the native object stuff you have in flight?

@jcsp (Contributor) commented Sep 7, 2017

@liewegas I'm completely happy for this to merge to master (luminous should wait, as you say); it makes my life easier with the other branches.
