luminous: mon, osd: fix potential collided *Up Set* after PG remapping #20829
The mgr balancer module basically performs optimizations based on snapshots of the OSDMap taken at certain moments, which turns out to be a potential cause of data loss: while in upmap mode it can sometimes produce bad PG mapping results. For example:

1) Original cluster topology:

```
-5       2.00000     host host-a
 0   ssd 1.00000         osd.0    up  1.00000  1.00000
 1   ssd 1.00000         osd.1    up  1.00000  1.00000
-7       2.00000     host host-b
 2   ssd 1.00000         osd.2    up  1.00000  1.00000
 3   ssd 1.00000         osd.3    up  1.00000  1.00000
-9       2.00000     host host-c
 4   ssd 1.00000         osd.4    up  1.00000  1.00000
 5   ssd 1.00000         osd.5    up  1.00000  1.00000
```

2) The mgr balancer applies an optimization for PG 3.f:

```
          pg-upmap-items [3.f : 1->4]
3.f [1 3] -------------------------> [4 3]
```

3) osd.3 is marked out, reweighted, etc., so the original CRUSH mapping of 3.f changes (while the pg-upmap-items entry does not):

```
          pg-upmap-items [3.f : 1->4]
3.f [1 5] -------------------------> [4 5]
```

4) PG 3.f is now mapped to two OSDs (osd.4 and osd.5) on the same host (host-c).

Fix the above problem by adding a guard procedure before these *unsafe* upmap remappings can finally be encoded into the OSDMap. If any of them turns out to be inappropriate, we can simply cancel it, since the balancer can re-calculate and re-generate it later if necessary.

Fixes: http://tracker.ceph.com/issues/23118

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 5fa467e)

Conflicts:
	src/crush/CrushWrapper.cc: drop **const** constraint of CrushWrapper::_get_leaves from master
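The guard idea can be illustrated with a minimal Python sketch (illustrative only, not Ceph's actual code; `OSD_TO_HOST`, `apply_upmap_items`, and `upmap_is_safe` are made-up names): before committing a pending pg-upmap-items entry, recompute the resulting *up set* against the current raw CRUSH mapping and cancel the remapping if two of its OSDs share the same failure-domain parent (here, the host).

```python
# Toy topology mirroring the example above (assumed, not from Ceph).
OSD_TO_HOST = {
    0: "host-a", 1: "host-a",
    2: "host-b", 3: "host-b",
    4: "host-c", 5: "host-c",
}

def apply_upmap_items(raw_up, upmap_items):
    """Apply [(from_osd, to_osd), ...] substitutions to a raw CRUSH up set."""
    remap = dict(upmap_items)
    return [remap.get(osd, osd) for osd in raw_up]

def upmap_is_safe(raw_up, upmap_items):
    """Return False if the remapped up set lands two OSDs on one host."""
    final_up = apply_upmap_items(raw_up, upmap_items)
    parents = [OSD_TO_HOST[osd] for osd in final_up]
    return len(parents) == len(set(parents))

# Step 2: raw mapping [1, 3] plus upmap 1->4 gives [4, 3]: distinct hosts, safe.
assert upmap_is_safe([1, 3], [(1, 4)])

# Step 3: CRUSH now maps the PG to [1, 5]; the stale upmap yields [4, 5],
# both on host-c, so the guard would cancel the entry.
assert not upmap_is_safe([1, 5], [(1, 4)])
```

The real check in `maybe_remove_pg_upmaps()` walks the CRUSH hierarchy rather than a flat dict, but the invariant it enforces is the same: no two members of the final up set may resolve to the same failure-domain bucket.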
For testing purposes, we may create pools with a failure domain of OSD. In this case get_parent_of_type() will always return 0, all parents of these PGs' *up set* are considered collided, and hence the relevant pg_upmap/pg_upmap_items entries are incorrectly removed during the maybe_remove_pg_upmaps() procedure.

Fix the above problem by skipping these PGs, and also try to be more specific when we are unable to load the parent of a specified PG.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit f9a35aa)
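A rough Python sketch of this second fix (again illustrative; `get_parent_of_type` here is a stand-in for the CrushWrapper call, and the `topology` dict is an assumed stand-in for the CRUSH map): when the parent lookup yields 0, the PG is skipped instead of being treated as a collision.

```python
def get_parent_of_type(osd, bucket_type, topology):
    """Stand-in for CrushWrapper::get_parent_of_type.

    topology maps osd -> {bucket_type: bucket_id}; returning 0 models
    "no parent of that type", e.g. a pool whose failure domain is the
    OSD itself.
    """
    return topology.get(osd, {}).get(bucket_type, 0)

def should_cancel_upmap(up_set, failure_domain_type, topology):
    """Return True only on a genuine failure-domain collision."""
    parents = []
    for osd in up_set:
        parent = get_parent_of_type(osd, failure_domain_type, topology)
        if parent == 0:
            # Parent unknown / failure domain is the OSD itself:
            # nothing meaningful to compare, so keep the upmap.
            return False
        parents.append(parent)
    return len(parents) != len(set(parents))

topology = {4: {"host": -9}, 5: {"host": -9}}
assert should_cancel_upmap([4, 5], "host", topology)     # real collision
assert not should_cancel_upmap([4, 5], "osd", topology)  # failure domain = osd: skip
```

Before the fix, the two zero "parents" in the second case would have compared equal and every such PG's upmap would have been dropped.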
In quite a lot of mgr/balancer related tests I can observe the following logged error:

```
2018-03-03 08:14:00.359946 7fdda18cc700 -1 maybe_remove_pg_upmaps unable to load crush-rule of pg 5.b
```

which turns out to be a *pool-deletion* vs *balancer-auto-injected-upmap-changes* race. The root cause is that we don't properly clean up the *pending* pg_upmap/pg_upmap_items injected by the mgr/balancer when the corresponding pool is deleted. Hence the above problem can be fixed by:

1. cleaning up any pending upmap changes too when the corresponding pool is gone;
2. re-checking the pending pool removal queue before any new upmap changes can safely be applied.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit bd828fc)
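The two steps above can be sketched as follows (a simplified Python model, not Ceph's actual data structures; `pending_pool_removals`, `pending_pg_upmap_items`, and the helper names are assumptions for illustration):

```python
# Pending monitor state, modelled as plain containers.
pending_pool_removals = set()
pending_pg_upmap_items = {}   # pgid "pool.seed" -> [(from_osd, to_osd), ...]

def pool_of(pgid):
    """Extract the pool id from a pgid string like '5.b'."""
    return int(pgid.split(".")[0])

def queue_pool_removal(pool_id):
    pending_pool_removals.add(pool_id)
    # Fix 1: purge any pending upmap changes for the dying pool,
    # so they are never encoded against a pool that no longer exists.
    for pgid in [p for p in pending_pg_upmap_items if pool_of(p) == pool_id]:
        del pending_pg_upmap_items[pgid]

def propose_upmap(pgid, items):
    # Fix 2: re-check the removal queue before accepting new changes.
    if pool_of(pgid) in pending_pool_removals:
        return False
    pending_pg_upmap_items[pgid] = items
    return True

# Balancer injects a change for pg 5.b, then pool 5 is deleted:
pending_pg_upmap_items["5.b"] = [(1, 4)]
queue_pool_removal(5)
assert "5.b" not in pending_pg_upmap_items      # stale change purged
assert not propose_upmap("5.1a", [(0, 2)])      # late change refused
```

Without either step, a pg_upmap_items entry for a deleted pool survives into the OSDMap, and `maybe_remove_pg_upmaps` later fails to load the CRUSH rule for a PG whose pool is gone, which is exactly the logged error above.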
@xiexingguo can you add the ticket number and original PR this backport is from?
@theanalyst The first commit includes the original ticket number.
does not look good:

```
2018-04-06T23:06:35.044 INFO:tasks.ceph.mgr.x.smithi118.stderr:*** Caught signal (Segmentation fault) **
2018-04-06T23:06:35.044 INFO:tasks.ceph.mgr.x.smithi118.stderr: in thread 7f84692bd700 thread_name:balancer
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: ceph version 12.2.4-372-g3ec4869 (3ec4869d8a1d6ef06aa5698b52347ce41aa0d068) luminous (stable)
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: 1: (()+0x40c194) [0x557093897194]
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: 2: (()+0x11390) [0x7f8478180390]
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: 3: (OSDMap::_apply_upmap(pg_pool_t const&, pg_t, std::vector >*) const+0x1af) [0x5570939c551f]
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: 4: (OSDMap::_pg_to_up_acting_osds(pg_t const&, std::vector >*, int*, std::vector >*, int*, bool) const+0x257) [0x5570939c9f97]
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: 5: (OSDMap::calc_pg_upmaps(CephContext*, float, int, std::set, std::allocator > const&, OSDMap::Incremental*)+0x367) [0x5570939db527]
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: 6: (()+0x2efef8) [0x55709377aef8]
2018-04-06T23:06:35.045 INFO:tasks.ceph.mgr.x.smithi118.stderr: 7: (PyEval_EvalFrameEx()+0x8a51) [0x7f847925c971]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 8: (PyEval_EvalCodeEx()+0x85c) [0x7f847939205c]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 9: (PyEval_EvalFrameEx()+0x6ffd) [0x7f847925af1d]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 10: (PyEval_EvalFrameEx()+0x7124) [0x7f847925b044]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 11: (PyEval_EvalFrameEx()+0x7124) [0x7f847925b044]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 12: (PyEval_EvalCodeEx()+0x85c) [0x7f847939205c]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 13: (()+0x13e370) [0x7f84792e8370]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 14: (PyObject_Call()+0x43) [0x7f84792bb273]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 15: (()+0x1853ac) [0x7f847932f3ac]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 16: (PyObject_Call()+0x43) [0x7f84792bb273]
2018-04-06T23:06:35.046 INFO:tasks.ceph.mgr.x.smithi118.stderr: 17: (PyObject_CallMethod()+0xf4) [0x7f84792bc444]
2018-04-06T23:06:35.047 INFO:tasks.ceph.mgr.x.smithi118.stderr: 18: (PyModuleRunner::serve()+0x5c) [0x5570937780dc]
2018-04-06T23:06:35.047 INFO:tasks.ceph.mgr.x.smithi118.stderr: 19: (PyModuleRunner::PyModuleRunnerThread::entry()+0x1b8) [0x5570937788e8]
2018-04-06T23:06:35.047 INFO:tasks.ceph.mgr.x.smithi118.stderr: 20: (()+0x76ba) [0x7f84781766ba]
2018-04-06T23:06:35.047 INFO:tasks.ceph.mgr.x.smithi118.stderr: 21: (clone()+0x6d) [0x7f84771e241d]
2018-04-06T23:06:35.047 INFO:tasks.ceph.mgr.x.smithi118.stderr:2018-04-06 23:06:35.044306 7f84692bd700 -1 *** Caught signal (Segmentation fault) **
```

@xiexingguo do you mind looking?
Sure, I'm on it. |
👍
Reviewed-by: Sage Weil <sage@redhat.com>
http://tracker.ceph.com/issues/23118