pybind/mgr/balancer: fix pool-deletion vs auto-optimization race
This patch fixes the error below:
```
File "/usr/lib/ceph/mgr/balancer/module.py", line 722, in optimize
  return self.do_crush_compat(plan)
File "/usr/lib/ceph/mgr/balancer/module.py", line 781, in do_crush_compat
  pe = self.calc_eval(ms, plan.pools)
File "/usr/lib/ceph/mgr/balancer/module.py", line 570, in calc_eval
  objects_by_osd[osd] += ms.pg_stat[pgid]['num_objects']
KeyError: ('5.1b',)
```

The root cause is that the balancer collects cluster information
from two separate maps (OSDMap and PGMap), so there is a small
window in which the pool statistics in the two maps can diverge.
For example:
1) auto-optimization begins
2) the balancer gets the osdmap
3) an admin deletes a pool, and pg_dump is refreshed
4) the balancer gets pg_dump (it now holds the newest pg_dump
   but an obsolete osdmap)
5) optimization runs, and the balancer complains that some PGs
   are missing from the pg_dump map.
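The sequence above can be reproduced in isolation with plain dicts. This is a hypothetical, heavily simplified sketch: the dict shapes only mimic the field names visible in the patch (`pools`/`pool`, `pg_stats`/`pgid`/`stat_sum`), the pool names are made up, each pool is assumed to have a single PG `'<poolid>.0'`, and `count_objects` is an illustrative helper, not balancer code.

```python
# Step 2: the OSDMap is read while pool 5 still exists (hypothetical values).
osdmap_dump = {
    'pools': [
        {'pool': 1, 'pool_name': 'rbd'},
        {'pool': 5, 'pool_name': 'scratch'},
    ],
}

# Steps 3-4: pool 5 is deleted, and the freshly fetched pg_dump no longer
# lists any of its PGs.
pg_dump = {
    'pg_stats': [{'pgid': '1.0', 'stat_sum': {'num_objects': 10}}],
}

# Same construction as self.pg_stat in the patched __init__.
pg_stat = {i['pgid']: i['stat_sum'] for i in pg_dump.get('pg_stats', [])}

# Hypothetical: one PG per pool, derived from the (stale) OSDMap.
stale_pgids = ['%d.0' % p['pool'] for p in osdmap_dump.get('pools', [])]


def count_objects(pgids):
    """Sum num_objects per PG, indexing pg_stat like the failing line in the traceback."""
    total = 0
    for pgid in pgids:
        total += pg_stat[pgid]['num_objects']  # KeyError for a vanished PG
    return total


try:
    count_objects(stale_pgids)
except KeyError as e:
    print('KeyError:', e)  # prints: KeyError: '5.0'
```

Step 5 corresponds to the `except` branch: the PG list comes from the older map, while the statistics come from the newer one.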

Fix the above problem by tracking only the pools that exist in both maps.
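A minimal sketch of the fix, assuming the same simplified dict shapes as the patch's field names (`pools`/`pool` in the OSDMap dump, `pool_stats`/`poolid` and `pg_stats`/`pgid` in the pg_dump). The helper name `common_poolids` and the one-PG-per-pool layout are hypothetical. Because the result is an intersection, it also drops a pool that appears in pg_dump before the OSDMap the balancer holds (the "not-yet-ready" case mentioned in the patch's comment).

```python
# Hypothetical simplified dumps: pool 5 has been deleted, so only the
# (stale) OSDMap still lists it.
osdmap_dump = {
    'pools': [{'pool': 1, 'pool_name': 'rbd'},
              {'pool': 5, 'pool_name': 'scratch'}],
}
pg_dump = {
    'pg_stats': [{'pgid': '1.0', 'stat_sum': {'num_objects': 10}}],
    'pool_stats': [{'poolid': 1}],
}


def common_poolids(osdmap_dump, pg_dump):
    """Return the ids of pools present in BOTH the OSDMap and the pg_dump."""
    osd_poolids = [p['pool'] for p in osdmap_dump.get('pools', [])]
    pg_poolids = [p['poolid'] for p in pg_dump.get('pool_stats', [])]
    return set(osd_poolids) & set(pg_poolids)


poolids = common_poolids(osdmap_dump, pg_dump)
pg_stat = {i['pgid']: i['stat_sum'] for i in pg_dump.get('pg_stats', [])}

# Only PGs of pools known to both maps are looked up (one PG per pool,
# hypothetically), so the deleted pool 5 no longer causes a KeyError.
pgids = ['%d.0' % pid for pid in sorted(poolids)]
total = sum(pg_stat[pgid]['num_objects'] for pgid in pgids)
print(sorted(poolids), total)  # prints: [1] 10
```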

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit a57b803)
xiexingguo committed Mar 10, 2018
1 parent e50a50c commit dcdeae3
Showing 1 changed file with 6 additions and 1 deletion: src/pybind/mgr/balancer/module.py
```diff
@@ -31,7 +31,9 @@ def __init__(self, osdmap, pg_dump, desc=''):
         self.pg_stat = {
             i['pgid']: i['stat_sum'] for i in pg_dump.get('pg_stats', [])
         }
-        self.poolids = [p['pool'] for p in self.osdmap_dump.get('pools', [])]
+        osd_poolids = [p['pool'] for p in self.osdmap_dump.get('pools', [])]
+        pg_poolids = [p['poolid'] for p in pg_dump.get('pool_stats', [])]
+        self.poolids = set(osd_poolids) & set(pg_poolids)
         self.pg_up = {}
         self.pg_up_by_poolid = {}
         for poolid in self.poolids:
@@ -408,6 +410,9 @@ def calc_eval(self, ms, pools):
         for p in ms.osdmap_dump.get('pools',[]):
             if len(pools) and p['pool_name'] not in pools:
                 continue
+            # skip dead or not-yet-ready pools too
+            if p['pool'] not in ms.poolids:
+                continue
             pe.pool_name[p['pool']] = p['pool_name']
             pe.pool_id[p['pool_name']] = p['pool']
             pool_rule[p['pool_name']] = p['crush_rule']
```
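The second hunk adds a matching filter to the pool-selection loop at the top of `calc_eval`. The sketch below isolates that loop under simplifying assumptions: `select_pools` is a hypothetical name, it returns a plain `{pool_name: pool_id}` dict, and the real loop's `pe`/`pool_rule` bookkeeping is omitted.

```python
# Hypothetical stale OSDMap dump: pool 5 was deleted after it was read.
osdmap_dump = {
    'pools': [{'pool': 1, 'pool_name': 'rbd'},
              {'pool': 5, 'pool_name': 'scratch'}],
}


def select_pools(osdmap_dump, poolids, pools):
    """Mimic calc_eval's pool selection after the patch.

    pools:   pool names requested by the caller (empty means all pools)
    poolids: ids of pools present in both the OSDMap and the pg_dump
    """
    selected = {}
    for p in osdmap_dump.get('pools', []):
        # Existing filter: honour an explicit list of pool names.
        if len(pools) and p['pool_name'] not in pools:
            continue
        # New filter: skip dead or not-yet-ready pools too.
        if p['pool'] not in poolids:
            continue
        selected[p['pool_name']] = p['pool']
    return selected


print(select_pools(osdmap_dump, {1}, []))           # prints: {'rbd': 1}
print(select_pools(osdmap_dump, {1}, ['scratch']))  # prints: {}
```

With both checks in place, explicitly naming a pool that was just deleted yields an empty selection instead of reaching the `pg_stat` lookup that raised the KeyError in the traceback.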
