mgr/MgrStandby: respawn when deactivated #15557

Merged
merged 1 commit into from Jun 8, 2017

@liewegas
Member

liewegas commented Jun 7, 2017

  • It is ugly to unwind all of the Mgr state so that we can reactivate
    later.
  • It is perhaps impossible to shut down the Python state reliably.
  • Respawning provides a clean state and is reliable.

This mostly just copies MDSServer::respawn().

Fixes: http://tracker.ceph.com/issues/19595
Fixes: http://tracker.ceph.com/issues/19549
Signed-off-by: Sage Weil sage@redhat.com
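
For readers unfamiliar with the respawn approach, here is a minimal sketch of the general technique: rather than tearing down in-process state, the daemon re-executes its own binary with its original arguments. This is not the code from this PR or from the MDS; `build_exec_argv` and `respawn` are hypothetical names, the saved argument vector is assumed to have been captured in `main()`, and `/proc/self/exe` is Linux-specific.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>
#include <unistd.h>

// Hypothetical helper: turn the saved command line into the
// NULL-terminated char* array that execv() expects. The pointers
// refer to storage owned by `args`, which must outlive the result.
std::vector<char*> build_exec_argv(std::vector<std::string>& args) {
  std::vector<char*> argv;
  argv.reserve(args.size() + 1);
  for (auto& a : args) {
    argv.push_back(&a[0]);
  }
  argv.push_back(nullptr);
  return argv;
}

// Hypothetical respawn: replace the current process image with a
// fresh copy of the same binary. On success execv() never returns,
// so all heap, thread, and interpreter state is simply discarded.
[[noreturn]] void respawn(std::vector<std::string> args) {
  assert(!args.empty());
  std::vector<char*> argv = build_exec_argv(args);

  // Prefer the kernel's link to the running executable (Linux only).
  execv("/proc/self/exe", argv.data());

  // Only reached if the first execv() failed; retry with argv[0].
  execv(argv[0], argv.data());

  // Both attempts failed: report and exit without running destructors.
  std::perror("execv");
  _exit(1);
}
```

A daemon would call something like `respawn(saved_args)` at the point where it learns it is no longer active. Because the process image is replaced, no shutdown path has to be correct for the new instance to start clean.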

@liewegas liewegas requested review from jcsp and tchaikov Jun 7, 2017

@liewegas

Member

liewegas commented Jun 7, 2017

I just hit another Mgr-shutdown bug in my last run:

2017-06-07T18:29:47.046 INFO:tasks.ceph.mgr.x.smithi116.stderr:src/tcmalloc.cc:278] Attempt to free invalid pointer 0x1f
2017-06-07T18:29:47.046 INFO:tasks.ceph.mgr.x.smithi116.stderr:*** Caught signal (Aborted) **
2017-06-07T18:29:47.046 INFO:tasks.ceph.mgr.x.smithi116.stderr: in thread 7f8da56cf700 thread_name:fn_anonymous
2017-06-07T18:29:47.047 INFO:tasks.ceph.mgr.x.smithi116.stderr: ceph version  12.0.2-2485-gc8340cd (c8340cde85674f8d9506d602368c2fd9a6307580) luminous (dev)
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 1: (()+0x393172) [0x56490d9f6172]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 2: (()+0x113e0) [0x7f8dacb8f3e0]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 3: (gsignal()+0x38) [0x7f8dabb20428]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 4: (abort()+0x16a) [0x7f8dabb2202a]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 5: (tcmalloc::Log(tcmalloc::LogMode, char const*, int, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem)+0x22e) [0x7f8dad7625ce]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 6: (()+0x1375f) [0x7f8dad75675f]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 7: (operator delete[](void*)+0x1fd) [0x7f8dad77966d]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 8: (std::_Rb_tree, std::allocator > >, std::pair, std::allocator > >, std::_Identity, std::allocator > > >, std::less, std::allocator > > >, std::allocator, std::allocator > > > >::erase(std::pair, std::allocator > > const&)+0x63) [0x56490d903723]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 9: (MetadataUpdate::finish(int)+0x43) [0x56490d905fb3]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 10: (Context::complete(int)+0x9) [0x56490d8cab79]
2017-06-07T18:29:47.049 INFO:tasks.ceph.mgr.x.smithi116.stderr: 11: (Finisher::finisher_thread_entry()+0x460) [0x56490da35480]
2017-06-07T18:29:47.049 INFO:tasks.ceph.mgr.x.smithi116.stderr: 12: (()+0x770a) [0x7f8dacb8570a]
2017-06-07T18:29:47.049 INFO:tasks.ceph.mgr.x.smithi116.stderr: 13: (clone()+0x6d) [0x7f8dabbf182d]
2017-06-07T18:29:47.049 INFO:tasks.ceph.mgr.x.smithi116.stderr:2017-06-07 18:29:47.048186 7f8da56cf700 -1 *** Caught signal (Aborted) **
2017-06-07T18:29:47.049 INFO:tasks.ceph.mgr.x.smithi116.stderr: in thread 7f8da56cf700 thread_name:fn_anonymous

but fixing these feels like a waste of time.

@liewegas

Member

liewegas commented Jun 8, 2017

tests look okay...

@jcsp

jcsp approved these changes Jun 8, 2017

This looks fine. I was wondering if we could avoid the copy-paste by putting the respawn bit somewhere common, but it's hardly essential.

@liewegas liewegas merged commit f05a34a into ceph:master Jun 8, 2017

3 checks passed

Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
default: Build finished.

@liewegas liewegas deleted the liewegas:wip-mgr-respawn branch Jun 8, 2017