luminous: mon, osd: fix potential collided *Up Set* after PG remapping #20829

Merged: 3 commits merged into ceph:luminous from xiexingguo:backport-pr-20653 on Apr 11, 2018

Conversation

@xiexingguo (Member) commented Mar 10, 2018

The mgr balancer module basically performs optimizations based on
snapshots of the OSDMap taken at certain moments, which turns out to be
the culprit of data loss, since in upmap mode it can sometimes produce
bad PG mapping results.
For example:
1) original cluster topology:

-5       2.00000     host host-a
 0   ssd 1.00000         osd.0       up  1.00000 1.00000
 1   ssd 1.00000         osd.1       up  1.00000 1.00000
-7       2.00000     host host-b
 2   ssd 1.00000         osd.2       up  1.00000 1.00000
 3   ssd 1.00000         osd.3       up  1.00000 1.00000
-9       2.00000     host host-c
 4   ssd 1.00000         osd.4       up  1.00000 1.00000
 5   ssd 1.00000         osd.5       up  1.00000 1.00000

2) the mgr balancer applies an optimization for PG 3.f:

            pg-upmap-items[3.f : 1->4]
3.f [1 3] + -------------------------> [4 3]

3) osd.3 is out/reweighted etc., so the original crush mapping of 3.f changes
   (while pg-upmap-items does not):

            pg-upmap-items[3.f : 1->4]
3.f [1 5] + -------------------------> [4 5]

4) we are now mapping PG 3.f to two OSDs (osd.4 & osd.5) on the same host
   (host-c); see the sketch below.
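
To make the failure mode concrete, here is a minimal, self-contained sketch of how a stale pg-upmap-items entry rewrites a freshly computed CRUSH up set. It is an illustration only, not the OSDMap::_apply_upmap implementation; the OSD ids and the 1->4 pair come from the example above, everything else is assumed.

```cpp
// Minimal sketch (not the Ceph implementation): a pg-upmap-items entry
// is a list of [from, to] replacement pairs applied to the up set that
// CRUSH just computed.
#include <iostream>
#include <utility>
#include <vector>

using osd_id_t = int;

std::vector<osd_id_t> apply_upmap_items(
    std::vector<osd_id_t> up,
    const std::vector<std::pair<osd_id_t, osd_id_t>>& items) {
  for (const auto& [from, to] : items) {
    for (auto& osd : up) {
      if (osd == from)
        osd = to;  // substitute, keeping the position in the up set
    }
  }
  return up;
}

int main() {
  // pg-upmap-items[3.f : 1->4], recorded when CRUSH mapped 3.f to [1, 3].
  std::vector<std::pair<osd_id_t, osd_id_t>> items = {{1, 4}};

  // After osd.3 is out/reweighted, CRUSH maps 3.f to [1, 5] instead.
  std::vector<osd_id_t> crush_up = {1, 5};

  // The stale entry still fires and yields [4, 5]: both OSDs live on
  // host-c, so the failure-domain guarantee is silently broken.
  for (auto osd : apply_upmap_items(crush_up, items))
    std::cout << "osd." << osd << " ";
  std::cout << "\n";  // prints: osd.4 osd.5
}
```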

Fix the above problem by putting a guard procedure in place before these
*unsafe* upmap remappings are finally encoded into the OSDMap.
If any of them turns out to be inappropriate, we can simply cancel it,
since the balancer can re-calculate and re-generate it later if necessary.
A sketch of the check follows.
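
The guard boils down to a duplicate-parent check over the post-upmap up set: if two OSDs in the resulting up set resolve to the same failure-domain parent, the pending remapping is dropped. The sketch below shows only that idea under simplified assumptions; parent_of() is a hypothetical stand-in for the CRUSH ancestor lookup, not the actual maybe_remove_pg_upmaps code.

```cpp
// Hedged sketch of the guard idea (not the actual maybe_remove_pg_upmaps
// code). parent_of() stands in for resolving an OSD's failure-domain
// ancestor (e.g. its host bucket) via CRUSH.
#include <cassert>
#include <functional>
#include <set>
#include <vector>

using osd_id_t = int;

// True if the remapping should be cancelled because two OSDs in the
// resulting up set share the same failure-domain parent.
bool up_set_collides(const std::vector<osd_id_t>& up,
                     const std::function<int(osd_id_t)>& parent_of) {
  std::set<int> parents;
  for (auto osd : up) {
    if (!parents.insert(parent_of(osd)).second)
      return true;  // same parent seen twice -> collided up set
  }
  return false;
}

int main() {
  // Host bucket ids taken from the example topology:
  // host-a = -5, host-b = -7, host-c = -9.
  auto parent_of = [](osd_id_t osd) {
    if (osd <= 1) return -5;
    if (osd <= 3) return -7;
    return -9;
  };
  assert(up_set_collides({4, 5}, parent_of));   // both on host-c: cancel
  assert(!up_set_collides({4, 3}, parent_of));  // distinct hosts: keep
  return 0;
}
```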

Fixes: http://tracker.ceph.com/issues/23118
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 5fa467e)

Conflicts:
	src/crush/CrushWrapper.cc

drop the **const** qualifier of CrushWrapper::_get_leaves from master
For testing purposes, we may create pools whose failure domain is the OSD.
In this case get_parent_of_type() will always return 0, all parents of
these PGs' *up sets* are considered collided, and hence the relevant
pg_upmap/pg_upmap_items entries are incorrectly removed during the
maybe_remove_pg_upmaps() procedure.

Fix the above problem by skipping these PGs, and also be more specific
when we are unable to load the parent of a specified PG (see the sketch
below).

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit f9a35aa)
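
For illustration, here is a hedged sketch of how that skip could look. The UpmapVerdict enum, check_up_set() and the optional-returning parent lookup are assumptions made for this example, not the CrushWrapper/OSDMap API.

```cpp
// Sketch of the extra guard behaviour (assumed semantics, not the Ceph
// source): when the failure domain is the OSD itself, the parent-of-type
// lookup yields 0 for every OSD, so every pair would look collided.
// Skip the check in that case, and report a specific error when a parent
// genuinely cannot be resolved.
#include <functional>
#include <iostream>
#include <optional>
#include <set>
#include <vector>

using osd_id_t = int;

enum class UpmapVerdict { keep, cancel, skip_check };

UpmapVerdict check_up_set(
    const std::vector<osd_id_t>& up,
    // Hypothetical lookup: the failure-domain parent bucket id, 0 meaning
    // "no parent of that type" (failure domain is the OSD), std::nullopt
    // meaning the lookup itself failed.
    const std::function<std::optional<int>(osd_id_t)>& parent_of) {
  std::set<int> parents;
  for (auto osd : up) {
    auto parent = parent_of(osd);
    if (!parent) {
      std::cerr << "unable to load parent of osd." << osd << "\n";
      return UpmapVerdict::skip_check;  // be specific, don't cancel blindly
    }
    if (*parent == 0)
      return UpmapVerdict::skip_check;  // failure domain is the OSD itself
    if (!parents.insert(*parent).second)
      return UpmapVerdict::cancel;      // two OSDs under the same parent
  }
  return UpmapVerdict::keep;
}

int main() {
  // Failure domain "osd": the lookup returns 0 for every OSD, so with the
  // fix the PG is skipped instead of having its upmap entries removed.
  auto osd_domain = [](osd_id_t) { return std::optional<int>(0); };
  bool skipped = check_up_set({4, 5}, osd_domain) == UpmapVerdict::skip_check;
  std::cout << (skipped ? "skipped" : "checked") << "\n";
}
```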
@xiexingguo added this to the luminous milestone Mar 10, 2018
In quite a lot of mgr/balancer related tests I can observe the following
logged error:
```
2018-03-03 08:14:00.359946 7fdda18cc700 -1 maybe_remove_pg_upmaps unable to load crush-rule of pg 5.b
```
which turns out to be a *pool-deletion* vs *balancer-auto-injected-upmap-changes*
race issue.
The root cause is that we don't properly clean up the *pending*
pg_upmap/pg_upmap_items changes injected by the mgr/balancer when the
corresponding pool is gone, and hence the above problem can be fixed by
(see the sketch below):
1. clean up any pending upmap changes too if the corresponding pool is gone
2. re-check pending pool removal queue before we can safely apply any new upmap changes

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit bd828fc)
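
A minimal sketch of the two-part fix, assuming simplified stand-in structures; PendingUpmaps, pg_id_t and the method names are inventions for this example, not the monitor's real pending-change bookkeeping.

```cpp
// Sketch only: pending balancer-injected upmap changes keyed by PG are
// dropped when their pool is queued for deletion, and new changes are
// refused for pools already pending removal.
#include <cstdint>
#include <iostream>
#include <map>
#include <set>
#include <tuple>
#include <utility>
#include <vector>

struct pg_id_t {
  int64_t pool;
  uint32_t seed;
  bool operator<(const pg_id_t& o) const {
    return std::tie(pool, seed) < std::tie(o.pool, o.seed);
  }
};

struct PendingUpmaps {
  std::map<pg_id_t, std::vector<std::pair<int, int>>> new_pg_upmap_items;
  std::set<int64_t> pending_pool_removals;

  // 1. When a pool is queued for deletion, drop any pending upmap
  //    changes that reference it.
  void queue_pool_removal(int64_t pool) {
    pending_pool_removals.insert(pool);
    for (auto it = new_pg_upmap_items.begin();
         it != new_pg_upmap_items.end();) {
      if (it->first.pool == pool)
        it = new_pg_upmap_items.erase(it);
      else
        ++it;
    }
  }

  // 2. Re-check the pending removal queue before accepting a new
  //    balancer-injected upmap change.
  bool try_add_upmap(const pg_id_t& pg,
                     std::vector<std::pair<int, int>> items) {
    if (pending_pool_removals.count(pg.pool))
      return false;  // pool is going away; refuse the change
    new_pg_upmap_items[pg] = std::move(items);
    return true;
  }
};

int main() {
  PendingUpmaps pending;
  pending.try_add_upmap({5, 0xb}, {{1, 4}});  // accepted while pool 5 exists
  pending.queue_pool_removal(5);              // pool 5 is being deleted
  // A late balancer injection for pool 5 is now refused (prints 0).
  std::cout << pending.try_add_upmap({5, 0xc}, {{2, 3}}) << "\n";
}
```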
@theanalyst (Member)
@xiexingguo can you add the ticket number and the original PR this backport is from?

@xiexingguo (Member, Author)
@theanalyst The first commit includes the original ticket number.

@yuriw (Contributor) commented Apr 6, 2018

@liewegas (Member) commented Apr 9, 2018

does not look good:

2018-04-06T23:06:35.044 INFO:tasks.ceph.mgr.x.smithi118.stderr:*** Caught signal (Segmentation fault) **
2018-04-06T23:06:35.044 INFO:tasks.ceph.mgr.x.smithi118.stderr: in thread 7f84692bd700 thread_name:balancer
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: ceph version 12.2.4-372-g3ec4869 (3ec4869d8a1d6ef06aa5698b52347ce41aa0d068) luminous (stable)
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: 1: (()+0x40c194) [0x557093897194]
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: 2: (()+0x11390) [0x7f8478180390]
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: 3: (OSDMap::_apply_upmap(pg_pool_t const&, pg_t, std::vector >*) const+0x1af) [0x5570939c551f]
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: 4: (OSDMap::_pg_to_up_acting_osds(pg_t const&, std::vector >*, int*, std::vector >*, int*, bool) const+0x257) [0x5570939c9f97]
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: 5: (OSDMap::calc_pg_upmaps(CephContext*, float, int, std::set, std::allocator > const&, OSDMap::Incremental*)+0x367) [0x5570939db527]
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: 6: (()+0x2efef8) [0x55709377aef8]
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: 7: (PyEval_EvalFrameEx()+0x8a51) [0x7f847925c971]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 8: (PyEval_EvalCodeEx()+0x85c) [0x7f847939205c]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 9: (PyEval_EvalFrameEx()+0x6ffd) [0x7f847925af1d]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 10: (PyEval_EvalFrameEx()+0x7124) [0x7f847925b044]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 11: (PyEval_EvalFrameEx()+0x7124) [0x7f847925b044]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 12: (PyEval_EvalCodeEx()+0x85c) [0x7f847939205c]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 13: (()+0x13e370) [0x7f84792e8370]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 14: (PyObject_Call()+0x43) [0x7f84792bb273]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 15: (()+0x1853ac) [0x7f847932f3ac]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 16: (PyObject_Call()+0x43) [0x7f84792bb273]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 17: (PyObject_CallMethod()+0xf4) [0x7f84792bc444]
2018-04-06T23:06:35.047 INFO:tasks.ceph.mgr.x.smithi118.stderr: 18: (PyModuleRunner::serve()+0x5c) [0x5570937780dc]
2018-04-06T23:06:35.047 INFO:tasks.ceph.mgr.x.smithi118.stderr: 19: (PyModuleRunner::PyModuleRunnerThread::entry()+0x1b8) [0x5570937788e8]
2018-04-06T23:06:35.047 INFO:tasks.ceph.mgr.x.smithi118.stderr: 20: (()+0x76ba) [0x7f84781766ba]
2018-04-06T23:06:35.047 INFO:tasks.ceph.mgr.x.smithi118.stderr: 21: (clone()+0x6d) [0x7f84771e241d]
2018-04-06T23:06:35.047 INFO:tasks.ceph.mgr.x.smithi118.stderr:2018-04-06 23:06:35.044306 7f84692bd700 -1 *** Caught signal (Segmentation fault) **

@xiexingguo do you mind looking?
/a/yuriw-2018-04-06_21:35:03-rados-wip-yuri2-testing-2018-04-06-1946-luminous-distro-basic-smithi/2365151

@xiexingguo (Member, Author)
> @xiexingguo do you mind looking?

Sure, I'm on it.

@xiexingguo (Member, Author) commented Apr 10, 2018

@liewegas I believe the above test failure is not related to this change or #20840.
See #21325

@yuriw (Contributor) commented Apr 10, 2018

@yuriw (Contributor) left a comment

👍

Reviewed-by: Sage Weil <sage@redhat.com>

@yuriw merged commit 719e13c into ceph:luminous on Apr 11, 2018
@xiexingguo deleted the backport-pr-20653 branch Apr 12, 2018 00:42