[vm-node] Deadlock happened while resetting #18

Open
likebreath opened this issue Feb 20, 2017 · 2 comments

@likebreath
Collaborator

likebreath commented Feb 20, 2017

vm-node deadlocks when two resets happen at the same time: the reset it triggers itself (as a result of a QEMU crash) and the reset requested by dispatch (as a result of finishing the current target test).

The deadlock also stalls dispatch, because dispatch blocks while waiting for communication with vm-node.
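
For illustration only: a minimal sketch of one way to stop two reset requests that arrive together from both running. It assumes each path notes an "epoch" when it decides a reset is needed. `ResettableNode`, `reset_if_current` and `simulate_concurrent_resets` are hypothetical names, not CRETE code, and the actual mechanism behind the deadlock has not been established in this report.

```cpp
#include <mutex>
#include <thread>

// Hypothetical stand-in for a node that can be reset from two places.
// Not the CRETE VMNode; it only illustrates coalescing duplicate resets.
class ResettableNode {
public:
    // Epoch the caller should remember when it decides a reset is needed.
    unsigned epoch() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return epoch_;
    }

    // Performs the reset only if nobody has reset since `observed_epoch`.
    // Returns true if this call did the reset, false if it was skipped.
    bool reset_if_current(unsigned observed_epoch) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (observed_epoch != epoch_) {
            return false;  // another path already reset the node
        }
        ++epoch_;
        ++resets_performed_;
        return true;
    }

    unsigned resets_performed() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return resets_performed_;
    }

private:
    mutable std::mutex mutex_;
    unsigned epoch_ = 0;
    unsigned resets_performed_ = 0;
};

// Simulates a crash-triggered and a dispatch-triggered reset arriving together.
unsigned simulate_concurrent_resets() {
    ResettableNode node;
    const unsigned seen = node.epoch();
    std::thread crash_path([&node, seen] { node.reset_if_current(seen); });
    std::thread dispatch_path([&node, seen] { node.reset_if_current(seen); });
    crash_path.join();
    dispatch_path.join();
    return node.resets_performed();
}
```

Whichever path takes the lock first performs the reset, and the other sees a newer epoch and backs off, so the state machine is only torn down and rebuilt once per trigger.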

likebreath added the bug label Feb 20, 2017
likebreath self-assigned this Feb 20, 2017
@likebreath
Collaborator Author

VM-NODE Output:

leaving: NextTest
entering: Testing
+++++++++++++++++++++++++++++++++++++
Exception Caught:
/home/chenbo/crete/crete-dev/lib/cluster/vm_node_fsm.cpp(1026): Throw in function bool crete::cluster::node::vm::fsm::QemuFSM_::is_finished::operator()(const EVT &, FSM &, SourceState &, TargetState &) [EVT = crete::cluster::node::vm::ev::poll, FSM = boost::msm::back::state_machine<crete::cluster::node::vm::fsm::QemuFSM_, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>, SourceState = crete::cluster::node::vm::fsm::QemuFSM_::Testing, TargetState = crete::cluster::node::vm::fsm::QemuFSM_::StoreTrace]
Dynamic exception type: boost::exception_detail::clone_impl<crete::cluster::node::vm::VMException>
std::exception::what: std::exception
[crete::err::tag_process_exited*] = pid_
Node: VM
Target: /home/test/auto_configs/auto-coreutils-0131/auto.join.328.xml
VM dir: "vm/1"
inputs_auto.join.328.xml_crash_848.bin being dumped
VMException thrown.
++++++++++++++++++++++++++++
fsm.dispatch_options_.mode.distributed
fsm.dispatch_options_.mode.distributed
leaving: Valid
entering: Error
pushing error!
reset()
leaving: Active
entering: Terminated
entering: QemuFSM_
entering: Start
entering: Active
entering: Valid
ev.dispatch_options_.mode.distributed
ev.dispatch_options_.mode.distributed
leaving: Start
entering: ValidateImage

GDB backtrace:

Thread 2 (Thread 0x7f6ebb711700 (LWP 13219)):
0 crete::ScopeLock<crete::cluster::VMNode>::~ScopeLock (this=0x7ffe07f00da8)
at /home/chenbo/crete/crete-dev/lib/include/crete/atomic_guard.h:39
1 0x00007f6ec1a00fdb in crete::cluster::transmit_guest_data (node=..., client=...) at /home/chenbo/crete/crete-dev/lib/cluster/vm_node.cpp:353
2 0x00007f6ec1a007c1 in crete::cluster::process (node=..., request=...) at /home/chenbo/crete/crete-dev/lib/cluster/vm_node.cpp:324
3 0x0000000000532d22 in crete::cluster::NodeDriver<crete::cluster::VMNode>::run_listener (this=0x7ffe07f00d68)
at /home/chenbo/crete/crete-dev/lib/include/crete/cluster/node_driver.h:151
4 0x0000000000546152 in std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()>::operator()<, void>(crete::cluster::NodeDriver<crete::cluster::VMNode>*) const (this=0x7f6ebb710de0, __object=0x7ffe07f00d68)
at /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/functional:601
5 0x00000000005460c3 in std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)>::__call<void, , 0ul>(std::tuple<>&&, std::_Index_tuple<0ul>) (this=0x7f6ebb710de0,
__args=<unknown type in /home/chenbo/crete/debug/bin/crete-vm-node, CU 0x0, DIE 0xa1887>)
at /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/functional:1295
6 0x0000000000545f66 in std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)>::operator()<, void>() (this=0x7f6ebb710de0)
at /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/functional:1353
7 0x0000000000545e2f in operator()(std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)>) const (this=0x1cd6ca8, f=...) at /home/chenbo/crete/crete-dev/lib/include/crete/async_task.h:56
8 0x0000000000545de6 in boost::_bi::list1<boost::_bi::value<std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)> > >::operator()<crete::AsyncTask::AsyncTask<std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)>>(std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)>)::{lambda(std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)>)#1}, boost::_bi::list0>(boost::_bi::type<void>, crete::AsyncTask::AsyncTask<std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)>>(std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)>)::{lambda(std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)>)#1}&, boost::_bi::list0&, int) (this=0x1cd6cb0, f=..., a=...)
at /home/chenbo/crete/debug/lib/boost/boost-prefix/src/boost_1_59_0/boost/bind/bind.hpp:255
9 0x0000000000545d3f in boost::_bi::bind_t<void, crete::AsyncTask::AsyncTask<std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)>>(std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)>)::{lambda(std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)>)#1}, boost::_bi::list1<boost::_bi::value<std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)> > > >::operator()() (this=0x1cd6ca8)
at /home/chenbo/crete/debug/lib/boost/boost-prefix/src/boost_1_59_0/boost/bind/bind.hpp:895
10 0x000000000054503c in boost::detail::thread_data<boost::_bi::bind_t<void, crete::AsyncTask::AsyncTask<std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)>>(std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)>)::{lambda(std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)>)#1}, boost::_bi::list1<boost::_bi::value<std::_Bind<std::_Mem_fn<void (crete::cluster::NodeDriver<crete::cluster::VMNode>::*)()> (crete::cluster::NodeDriver<crete::cluster::VMNode>*)> > > > >::run() (this=0x1cd6af0)
at /home/chenbo/crete/debug/lib/boost/boost-prefix/src/boost_1_59_0/boost/thread/detail/thread.hpp:116
11 0x00007f6ec082b42f in thread_proxy ()
from /home/chenbo/crete/debug/lib/boost/boost-prefix/src/boost_1_59_0/stage/lib/libboost_thread.so.1.59.0
12 0x00007f6ebde19184 in start_thread (arg=0x7f6ebb711700) at pthread_create.c:312
13 0x00007f6ebe12937d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 1 (Thread 0x7f6ec20d8780 (LWP 13218)):
0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
1 0x00007f6ec08306eb in boost::condition_variable::wait(boost::unique_lock<boost::mutex>&) ()
from /home/chenbo/crete/debug/lib/boost/boost-prefix/src/boost_1_59_0/stage/lib/libboost_thread.so.1.59.0
2 0x00007f6ec082bc7c in boost::thread::join_noexcept() ()
from /home/chenbo/crete/debug/lib/boost/boost-prefix/src/boost_1_59_0/stage/lib/libboost_thread.so.1.59.0
3 0x0000000000533951 in boost::thread::join (this=0x7ffe07f00cd0)
at /home/chenbo/crete/debug/lib/boost/boost-prefix/src/boost_1_59_0/boost/thread/detail/thread.hpp:765
4 0x00000000005337d7 in crete::AsyncTask::~AsyncTask (this=0x7ffe07f00cd0) at /home/chenbo/crete/crete-dev/lib/include/crete/async_task.h:103
5 0x00000000004dd435 in crete::cluster::NodeDriver<crete::cluster::VMNode>::run_all (this=0x7ffe07f00d68)
at /home/chenbo/crete/crete-dev/lib/include/crete/cluster/node_driver.h:80
6 0x00000000004dc3e2 in crete::cluster::VMNodeUI::run (this=0x7ffe07f01280)
at /home/chenbo/crete/crete-dev/back-end/manager/vm-node/node_vm.cpp:48
7 0x00000000004dc4c8 in main (argc=3, argv=0x7ffe07f014a8) at /home/chenbo/crete/crete-dev/back-end/manager/vm-node/node_vm.cpp:139

@likebreath
Collaborator Author

vm-node has two running threads:

Thread 1 (main thread): it has returned from run_node() at node_driver.h:79 and is blocked at node_driver.h:80, waiting for thread 2 to finish.
Thread 2 (communication thread with dispatch): it is active and looping in transmit_guest_data() at vm_node.cpp:346.

What remains unclear is why thread 1 returned from run_node(). It should only return from this function when vm-node is about to finish.
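
The blocked join in thread 1 can be pictured with a small sketch: an owner that joins its worker only gets the thread back if the worker's loop has an exit condition the owner can trigger. `StoppableListener` and `listener_shuts_down` are hypothetical names, and the stop flag is an assumption. How run_listener() actually decides to exit is not shown in this report.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical listener that loops until its owner asks it to stop.
// Not CRETE's AsyncTask or NodeDriver; a sketch of the join pattern only.
class StoppableListener {
public:
    StoppableListener()
        : stop_(false), iterations_(0), worker_([this] { run(); }) {}

    // Joining without first setting the flag would block forever,
    // which is the shape of the hang seen in thread 1.
    ~StoppableListener() { stop(); }

    void stop() {
        stop_.store(true);
        if (worker_.joinable()) {
            worker_.join();
        }
    }

    bool stopped() const { return !worker_.joinable(); }
    long iterations() const { return iterations_.load(); }

private:
    void run() {
        while (!stop_.load()) {
            ++iterations_;  // stands in for one round of request handling
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    // Declared before worker_ so they are initialized before the thread starts.
    std::atomic<bool> stop_;
    std::atomic<long> iterations_;
    std::thread worker_;
};

// Starts a listener, lets it spin briefly, then shuts it down.
bool listener_shuts_down() {
    StoppableListener listener;
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
    listener.stop();
    return listener.stopped();
}
```

If the main thread leaves its own loop without the listener ever seeing a stop condition, the join waits indefinitely, and dispatch waits with it.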

Also, a race condition on VMNode::vms_ is confirmed:
Thread 1 accesses it at vm_node.cpp:72;
Thread 2 accesses it at vm_node.cpp:258.
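
A minimal sketch of the usual remedy for this kind of race, assuming the shared container can be wrapped so that every access goes through one mutex. `GuardedVmList` and `fill_from_two_threads` are hypothetical names, and the `int` elements stand in for whatever VMNode::vms_ really holds.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper around a shared VM list; not CRETE's VMNode.
// Every read and write takes the same mutex.
class GuardedVmList {
public:
    void add(int vm_id) {
        std::lock_guard<std::mutex> lock(mutex_);
        vms_.push_back(vm_id);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return vms_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<int> vms_;  // placeholder element type
};

// Two threads append concurrently; with the lock, no update is lost.
std::size_t fill_from_two_threads(int per_thread) {
    GuardedVmList list;
    auto writer = [&list, per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            list.add(i);
        }
    };
    std::thread a(writer);
    std::thread b(writer);
    a.join();
    b.join();
    return list.size();
}
```

Without the lock, concurrent push_back calls on the same std::vector are undefined behavior, so the guarded version is the only one whose result can be relied on.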

Note: all source line numbers mentioned refer to commit b88add3.
