Framework shutdown race #117

saschazelzer · 2016-08-05T15:41:09Z

@CppMicroServices/developers please review.

jeffdiclemente · 2016-08-05T21:48:04Z

@saschazelzer What was the root cause of the race condition?

saschazelzer · 2016-08-05T23:28:09Z

If client code creates a framework instance, calls Stop(), and the instance is then destroyed (e.g. goes out of scope), there is a race on the weak ptr member of the FrameworkPrivate class (inherited from std::enable_shared_from_this): the shutdown thread reads that member while the thread that called Stop() destructs the FrameworkPrivate object, which writes to (destroys) the same member.

This happens quite often in our unit tests.

The second major issue was that shared_from_this() for the CoreBundleContext class was called (when calling MakeBundle for passing a Bundle instance to events) implicitly from its destructor, which is not well defined.
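To illustrate the second issue in isolation, here is a minimal sketch that does not use any CppMicroServices code. `Context` and `DestructorSeesExpiredSelf` are hypothetical names. By the time the destructor runs, the last owning shared_ptr is already gone, so the internal weak ptr has expired. In C++17 and later `shared_from_this()` then throws `std::bad_weak_ptr`; earlier standards left this case undefined.

```cpp
#include <memory>

// Hypothetical stand-in for a class that calls shared_from_this()
// (directly or indirectly) from its own destructor.
struct Context : std::enable_shared_from_this<Context> {
  explicit Context(bool* sawBadWeakPtr) : sawBadWeakPtr_(sawBadWeakPtr) {}

  ~Context() {
    try {
      // The owning shared_ptr has already released us, so the
      // weak ptr inside enable_shared_from_this is expired here.
      auto self = shared_from_this();
      (void)self;
    } catch (const std::bad_weak_ptr&) {
      *sawBadWeakPtr_ = true;
    }
  }

private:
  bool* sawBadWeakPtr_;
};

// Returns true if the destructor could not obtain a shared_ptr to itself.
bool DestructorSeesExpiredSelf() {
  bool flag = false;
  {
    auto ctx = std::make_shared<Context>(&flag);
  }  // last owner goes away, ~Context() runs
  return flag;
}
```

The point of the sketch is only that a destructor cannot rely on `shared_from_this()`; any code path reached from the destructor (such as creating a Bundle for an event) inherits the same limitation.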

Here is the complete tsan output for the race condition:

9: Test command: /home/sascha/builds/CppMicroServices-dev-shared-tsan-debug/bin/usCoreTestDriver "usFrameworkTest"
9: Test timeout computed to be: 9.99988e+06
9: Test Framework instantiation [PASSED]
9: Test for default threading option [PASSED]
9: Test for empty default base storage property [PASSED]
9: Test default diagnostic logging [PASSED]
9: Test Framework instantiation with custom diagnostic logger [PASSED]
9: Test installation of library TestBundleA [PASSED]
9: Test that the default logger captured data. [PASSED]
9: Test Framework instantiation with custom configuration [PASSED]
9: Test Framework custom launch properties [PASSED]
9: Test Framework custom launch properties [PASSED]
9: Test Framework custom launch properties [PASSED]
9: Test Framework custom launch properties [PASSED]
9: Test Framework custom launch properties [PASSED]
9: Test for enabled diagnostic logging [PASSED]
9: Test for custom base storage path [PASSED]
9: Test for attempt to change threading option [PASSED]
9: Test Framework instantiation with custom diagnostic logger [PASSED]
9: ==================
9: WARNING: ThreadSanitizer: data race (pid=3836)
9: Read of size 8 at 0x7d640001b308 by thread T4:
9: #0 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__weak_count<(__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/6.1.1/bits/shared_ptr_base.h:827 (libCppMicroServices.so.2.99.0+0x000000087bf3)
9: #1 std::__shared_ptr<us::CoreBundleContext, (__gnu_cxx::_Lock_policy)2>::__shared_ptrus::CoreBundleContext(std::__weak_ptr<us::CoreBundleContext, (__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/6.1.1/bits/shared_ptr_base.h:952 (libCppMicroServices.so.2.99.0+0x000000087bf3)
9: #2 std::shared_ptrus::CoreBundleContext::shared_ptrus::CoreBundleContext(std::weak_ptrus::CoreBundleContext const&) /usr/include/c++/6.1.1/bits/shared_ptr.h:251 (libCppMicroServices.so.2.99.0+0x000000087bf3)
9: #3 std::enable_shared_from_thisus::CoreBundleContext::shared_from_this() /usr/include/c++/6.1.1/bits/shared_ptr.h:573 (libCppMicroServices.so.2.99.0+0x000000087bf3)
9: #4 us::Bundle::Bundle(std::shared_ptrus::BundlePrivate const&) /home/sascha/git/code/CppMicroServices/core/src/bundle/usBundle.cpp:107 (libCppMicroServices.so.2.99.0+0x000000087bf3)
9: #5 us::MakeBundle(std::shared_ptrus::BundlePrivate const&) /home/sascha/git/code/CppMicroServices/core/src/bundle/usBundlePrivate.cpp:51 (libCppMicroServices.so.2.99.0+0x0000000a7378)
9: #6 us::FrameworkPrivate::Shutdown0(bool, bool) /home/sascha/git/code/CppMicroServices/core/src/util/usFrameworkPrivate.cpp:189 (libCppMicroServices.so.2.99.0+0x000000027f90)
9: #7 std::thread::_State_impl<std::_Bind_simple<std::_Bind<std::Mem_fn<void (us::FrameworkPrivate::*)(bool, bool)> (us::FrameworkPrivate, bool, bool)> ()> >::_M_run() (libCppMicroServices.so.2.99.0+0x000000028806)
9: #8 (libstdc++.so.6+0x0000000baaae)
9:
9: Previous write of size 8 at 0x7d640001b308 by main thread:
9: #0 std::__weak_count<(__gnu_cxx::_Lock_policy)2>::operator=(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/6.1.1/bits/shared_ptr_base.h:766 (libCppMicroServices.so.2.99.0+0x0000000e06ee)
9: #1 std::__weak_ptr<us::CoreBundleContext, (__gnu_cxx::_Lock_policy)2>::M_assign(us::CoreBundleContext, std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/6.1.1/bits/shared_ptr_base.h:1474 (libCppMicroServices.so.2.99.0+0x0000000e06ee)
9: #2 void std::enable_shared_from_thisus::CoreBundleContext::M_weak_assignus::CoreBundleContext(us::CoreBundleContext, std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) const /usr/include/c++/6.1.1/bits/shared_ptr.h:583 (libCppMicroServices.so.2.99.0+0x0000000e06ee)
9: #3 void std::__enable_shared_from_this_helper<us::CoreBundleContext, us::CoreBundleContext>(std::__shared_count<(_gnu_cxx::Lock_policy)2> const&, std::enable_shared_from_thisus::CoreBundleContext const, us::CoreBundleContext const) /usr/include/c++/6.1.1/bits/shared_ptr.h:601 (libCppMicroServices.so.2.99.0+0x0000000e06ee)
9: #4 _shared_ptr<us::CoreBundleContext, us::CoreBundleContext::~CoreBundleContext()::<lambda(us::CoreBundleContext)> > /usr/include/c++/6.1.1/bits/shared_ptr_base.h:899 (libCppMicroServices.so.2.99.0+0x0000000e06ee)
9: #5 shared_ptr<us::CoreBundleContext, us::CoreBundleContext::CoreBundleContext()::<lambda(us::CoreBundleContext*)> > /usr/include/c++/6.1.1/bits/shared_ptr.h:134 (libCppMicroServices.so.2.99.0+0x0000000e06ee)
9: #6 us::CoreBundleContext::CoreBundleContext() /home/sascha/git/code/CppMicroServices/core/src/bundle/usCoreBundleContext.cpp:83 (libCppMicroServices.so.2.99.0+0x0000000e06ee)
9: #7 std::_Sp_counted_ptr<us::CoreBundleContext*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/include/c++/6.1.1/bits/shared_ptr_base.h:372 (libCppMicroServices.so.2.99.0+0x000000024339)
9: #8 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /usr/include/c++/6.1.1/bits/shared_ptr_base.h:150 (libCppMicroServices.so.2.99.0+0x0000000861cc)
9: #9 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count() /usr/include/c++/6.1.1/bits/shared_ptr_base.h:662 (libCppMicroServices.so.2.99.0+0x0000000861cc)
9: #10 std::__shared_ptr<us::CoreBundleContext, (__gnu_cxx::_Lock_policy)2>::__shared_ptr() /usr/include/c++/6.1.1/bits/shared_ptr_base.h:928 (libCppMicroServices.so.2.99.0+0x0000000861cc)
9: #11 std::shared_ptrus::CoreBundleContext::~shared_ptr() /usr/include/c++/6.1.1/bits/shared_ptr.h:93 (libCppMicroServices.so.2.99.0+0x0000000861cc)
9: #12 us::Bundle::~Bundle() /home/sascha/git/code/CppMicroServices/core/src/bundle/usBundle.cpp:111 (libCppMicroServices.so.2.99.0+0x0000000861cc)
9: #13 us::Framework::~Framework() /home/sascha/git/code/CppMicroServices/core/include/usFramework.h:63 (usCoreTestDriver+0x00000044b058)
9: #14 TestCustomLogSink /home/sascha/git/code/CppMicroServices/core/test/usFrameworkTest.cpp:148 (usCoreTestDriver+0x00000044b058)
9: #15 usFrameworkTest(int, char**) /home/sascha/git/code/CppMicroServices/core/test/usFrameworkTest.cpp:473 (usCoreTestDriver+0x00000044feba)
9: #16 main /home/sascha/builds/CppMicroServices-dev-shared-tsan-debug/core/test/usCoreTestDriver.cpp:285 (usCoreTestDriver+0x00000041b49a)
9:
9: Location is heap block of size 1200 at 0x7d640001b300 allocated by main thread:
9: #0 operator new(unsigned long) (libtsan.so.0+0x000000069e83)
9: #1 us::FrameworkFactory::NewFramework(std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, us::Any, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, us::Any> > > const&, std::ostream*) /home/sascha/git/code/CppMicroServices/core/src/util/usFrameworkFactory.cpp:32 (libCppMicroServices.so.2.99.0+0x000000023d26)
9: #2 TestCustomLogSink /home/sascha/git/code/CppMicroServices/core/test/usFrameworkTest.cpp:148 (usCoreTestDriver+0x00000044af16)
9: #3 usFrameworkTest(int, char**) /home/sascha/git/code/CppMicroServices/core/test/usFrameworkTest.cpp:473 (usCoreTestDriver+0x00000044feba)
9: #4 main /home/sascha/builds/CppMicroServices-dev-shared-tsan-debug/core/test/usCoreTestDriver.cpp:285 (usCoreTestDriver+0x00000041b49a)
9:
9: Thread T4 (tid=3841, running) created by main thread at:
9: #0 pthread_create (libtsan.so.0+0x000000028380)
9: #1 std::thread::M_start_thread(std::unique_ptr<std::thread::State, std::default_deletestd::thread::_State >, void ()()) (libstdc++.so.6+0x0000000badc4)
9: #2 us::Framework::Stop() /home/sascha/git/code/CppMicroServices/core/src/util/usFramework.cpp:123 (libCppMicroServices.so.2.99.0+0x0000000206aa)
9: #3 TestCustomLogSink /home/sascha/git/code/CppMicroServices/core/test/usFrameworkTest.cpp:152 (usCoreTestDriver+0x00000044b036)
9: #4 usFrameworkTest(int, char*) /home/sascha/git/code/CppMicroServices/core/test/usFrameworkTest.cpp:473 (usCoreTestDriver+0x00000044feba)
9: #5 main /home/sascha/builds/CppMicroServices-dev-shared-tsan-debug/core/test/usCoreTestDriver.cpp:285 (usCoreTestDriver+0x00000041b49a)
9:
9: SUMMARY: ThreadSanitizer: data race /usr/include/c++/6.1.1/bits/shared_ptr_base.h:827 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__weak_count<(__gnu_cxx::_Lock_policy)2> const&)
9: ==================

jeffdiclemente · 2016-08-08T18:05:00Z

@saschazelzer, still reviewing...

jeffdiclemente · 2016-08-08T19:54:33Z

core/src/util/usFrameworkFactory.cpp

+    auto fwCtx = ctx.get();
+    std::shared_ptr<CoreBundleContext> holder(std::make_shared<CoreBundleContextHolder>(std::move(ctx)), fwCtx);
+    holder->SetThis(holder);
+    holder->systemBundle->Shutdown(false);


The following is not caused by these changes, however I've noticed a different issue in the same area as these changes which I'd like to discuss.

I've run into a crash on Windows when a Framework instance is stored in a DLL as a global. On static destruction (assuming Framework::Stop() hasn't already been called), the std::thread creation inside FrameworkPrivate::Shutdown will throw an exception. This happens because while in DllMain, you shouldn't do much of anything (as stressed in Windows documentation), including spawning threads. Of course this only occurs if no one calls Framework::Stop prior to static destruction.

While I don't want to condone the overuse of global variables/singletons, I still think it's worth trying to make sure static destruction occurs smoothly. Within our use at the MathWorks we only want a single Framework instance for the entire process (for now).

What are your thoughts on making a "synchronous shutdown" which is only called during implicit Framework destruction (i.e. when a Framework instance goes out of scope)? Explicit calls to Framework::Stop() would still be asynchronous.
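A rough sketch of the proposed split, assuming a hypothetical `MiniFramework` class (this is not the CppMicroServices API): explicit `Stop()` stays asynchronous, while the destructor runs the shutdown inline and never creates a thread.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-in for a framework object.
class MiniFramework {
public:
  explicit MiniFramework(std::atomic<int>& shutdownCount)
    : shutdownCount_(shutdownCount) {}

  // Explicit stop: asynchronous, the shutdown runs on a worker thread.
  // Assumes a single caller; Stop() itself is not thread-safe here.
  void Stop() {
    if (!worker_.joinable()) {
      worker_ = std::thread([this] { ShutdownOnce(); });
    }
  }

  // Implicit stop: synchronous, no new thread is spawned.
  ~MiniFramework() {
    if (worker_.joinable()) {
      worker_.join();  // wait for a pending asynchronous stop
    }
    ShutdownOnce();    // no-op if the shutdown already ran
  }

private:
  void ShutdownOnce() {
    if (!stopped_.exchange(true)) {
      ++shutdownCount_;
    }
  }

  std::atomic<int>& shutdownCount_;
  std::atomic<bool> stopped_{false};
  std::thread worker_;
};

// Number of shutdowns observed when the object simply goes out of scope.
int ShutdownsAfterImplicitStop() {
  std::atomic<int> n{0};
  { MiniFramework fw(n); }
  return n.load();
}

// Number of shutdowns observed when Stop() is called before destruction.
int ShutdownsAfterExplicitStop() {
  std::atomic<int> n{0};
  {
    MiniFramework fw(n);
    fw.Stop();
  }
  return n.load();
}
```

Both helper functions should report exactly one shutdown. Note that the sketch only avoids thread creation in the destructor; joining a pending worker during static destruction on Windows may still be unsafe, which is one more reason to stop explicitly beforehand.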

saschazelzer · 2016-08-08

I was aware of the limitations while executing code in DllMain. That was one reason why I was happy to move away from registering modules and calling their activators during static initialization of shared libraries.

You have a valid point here and I think a synchronous shutdown makes sense in this case (we are waiting for it to complete anyway). But we still cannot ensure that client code executed in Activator::Stop() functions (which run during framework stop) plays by the rules for DllMain.

So:

1. I agree on "synchronous shutdown" during implicit framework destruction.
2. Question: Should we ensure that all CppMicroServices bundles can be stopped safely in the context of 1.?
3. This looks like a good topic for the "Best Practices" document.

jeffdiclemente · 2016-08-09

> Question: Should we ensure that all CppMicroServices bundles can be stopped safely in the context of 1.?

Yes, I think this has to be done. At a minimum, guaranteeing that CppMicroServices itself creates no threads (in the context of 1.) is a good first step.

> This looks like a good topic for the "Best Practices" document

I agree. I was planning on adding it to the doc.
We should stress that the framework should be stopped explicitly.

saschazelzer · 2016-08-09T12:30:20Z

I was looking at how to make the framework shutdown synchronous but unfortunately, it is a bit involved. It is basically half of the outstanding task to make the "single threaded" build configuration work again (The other half is the framework startup).

During shutdown, there is not only the shutdown thread itself: bundle events sent during bundle stop actions are also delivered on a different thread (to avoid locking issues). This makes such a change non-trivial and I don't want to rush it. I guess it would take a few more days to get it working and fully tested.

jeffdiclemente · 2016-08-09T12:38:23Z

I agree, I don't want to rush it.

As a workaround, until synchronous framework shutdown is implemented, clients can ensure that they explicitly stop the framework.

saschazelzer · 2016-08-09T12:40:42Z

Yes, or use a non-static Framework instance.

saschazelzer · 2016-08-19T05:47:40Z

Is this still being reviewed? Any questions I can help with?

jeffdiclemente · 2016-08-19T11:15:43Z

Sorry, I haven't been able to get back to reviewing this yet. It's near the top of my todo list.

I left off trying to wrap my head around how void SetThis(const std::shared_ptr<CoreBundleContext>& self); is being used (i.e. its purpose) in this solution.

saschazelzer · 2016-08-24T05:48:06Z

The CoreBundleContext used to inherit from std::enable_shared_from_this and hence held a weak ptr to itself. This is not the case anymore, instead we use a more flexible approach. The SetThis method is similar to what a std::shared_ptr constructor calls when the class being pointed to inherits from enable_shared_from_this.

The advantage here is this: even though we are already destructing the CoreBundleContextHolder (because all shared ptrs pointing to the CoreBundleContext via the alias constructor went away), we can still create a new shared ptr that points to the not yet destructed CoreBundleContext instance but is managed by a new CoreBundleContextHolder instance. We implant this new shared ptr into the CoreBundleContext in a thread-safe way using SetThis, so that shared_from_this() (same name, but implemented on our own instead of using enable_shared_from_this) works as expected.

This way the CoreBundleContext can be used normally in event listeners during framework shutdown. The listeners could even start the framework again.
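The mechanism described above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: `Context`, `ContextHolder`, `MakeContext`, `Shutdown`, and the `stopped` flag are hypothetical stand-ins, and only the use of the aliasing constructor plus `SetThis` mirrors the diff excerpt quoted earlier in the review.

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for CoreBundleContext with a hand-written
// shared_from_this() instead of std::enable_shared_from_this.
class Context {
public:
  explicit Context(bool* selfValidAtShutdown)
    : selfValidAtShutdown_(selfValidAtShutdown) {}

  void SetThis(const std::shared_ptr<Context>& self) {
    std::lock_guard<std::mutex> lock(mutex_);
    self_ = self;
  }

  std::shared_ptr<Context> shared_from_this() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return self_.lock();
  }

  // Records whether a valid shared_ptr to this object was obtainable.
  void Shutdown() {
    stopped = true;
    *selfValidAtShutdown_ = (shared_from_this().get() == this);
  }

  bool stopped = false;

private:
  bool* selfValidAtShutdown_;
  mutable std::mutex mutex_;
  std::weak_ptr<Context> self_;
};

// Owns the Context; clients only see aliasing shared_ptrs to the Context.
struct ContextHolder {
  explicit ContextHolder(std::unique_ptr<Context> c) : ctx(std::move(c)) {}

  ~ContextHolder() {
    if (!ctx || ctx->stopped) return;  // nothing left to shut down
    // Re-home the still-alive Context under a fresh holder so that
    // shared_from_this() works while the shutdown is running.
    Context* raw = ctx.get();
    std::shared_ptr<Context> fresh(
      std::make_shared<ContextHolder>(std::move(ctx)), raw);
    fresh->SetThis(fresh);
    fresh->Shutdown();
  }  // 'fresh' dies here; the new holder then deletes the Context

  std::unique_ptr<Context> ctx;
};

std::shared_ptr<Context> MakeContext(bool* selfValidAtShutdown) {
  std::unique_ptr<Context> ctx(new Context(selfValidAtShutdown));
  Context* raw = ctx.get();
  // Aliasing constructor: shares ownership of the holder, points at raw.
  std::shared_ptr<Context> handle(
    std::make_shared<ContextHolder>(std::move(ctx)), raw);
  handle->SetThis(handle);
  return handle;
}

// True if the Context could obtain a shared_ptr to itself during the
// shutdown triggered by the last client handle going away.
bool SelfIsValidDuringImplicitShutdown() {
  bool valid = false;
  { auto ctx = MakeContext(&valid); }
  return valid;
}
```

In this sketch the first holder's destructor hands the Context to a second holder before shutting it down, so code running during `Shutdown()` sees a live, owning shared ptr rather than an expired one.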

jeffdiclemente · 2016-08-24T14:15:48Z

Thanks for the explanation, it helps.

I think a test which minimally tests that event listeners which start the Framework during framework stop produce the expected result would be a good idea.

saschazelzer · 2016-08-24T16:50:54Z

You are right, I will add a test.

saschazelzer · 2016-08-25T20:50:33Z

It is good you insisted on a test. It revealed a bug in our event classes.

Use an internal data structure to hold on to bundle event data, avoiding holding on to Bundle and CoreBundleContext shared pointers. This restores the original behavior (the framework itself must not hold on to such data in bundle threads, since that could extend the framework's lifetime and/or trigger its shutdown from that thread).

saschazelzer · 2016-08-28T15:15:40Z

I think the last commit finally fixes this issue. If there are no objections, it would be good to merge this for 3.0 because the issue causes test failures for all PRs and pushed branches.

jeffdiclemente · 2016-08-29T18:24:14Z

Abhinay has some comments that he'll add.

For myself: +1 to merge.
Thanks for adding the test!

karthikreddy09 · 2016-08-29T18:28:13Z

framework/src/util/FrameworkFactory.cpp

+      return;
+    }
+
+    // Create a new CoreBundleContext holder, in case some event listener


Could we add a test point for the use case mentioned in this comment?

saschazelzer · 2016-08-29

The added test case already relies on that code path. There is no standard way to query whether the Bundle object internally holds a "new" CoreBundleContextHolder, though. The test assumes that this just "works"; otherwise the test would crash with a segmentation fault (which it did) when we try to access an invalid pointer to the CoreBundleContext.

karthikreddy09 · 2016-08-29

I see the restart is essentially the same as what would happen in the listener. No more concerns from me ...

karthikreddy09 · 2016-08-29T18:46:20Z

+1 for merge

saschazelzer · 2016-08-29T20:30:04Z

Thanks for your input!

saschazelzer added 2 commits August 5, 2016 15:53

Enable move operators for the Framework class.

55ee11e

Avoid race in shared_from_this and do not call it from the destructor.

a933726

jeffdiclemente reviewed Aug 8, 2016

saschazelzer added 2 commits August 25, 2016 10:42

Merge remote-tracking branch 'origin/121-system-bundle-not-usable' in…

3124c0f

…to framework-shutdown-race # Conflicts: # core/src/util/usFramework.cpp

Fix event classes to implicitly hold on to the framework; added test

ca9bf81

saschazelzer self-assigned this Aug 28, 2016

saschazelzer added this to the Release 3.0 milestone Aug 28, 2016

saschazelzer added the bug label Aug 28, 2016

karthikreddy09 reviewed Aug 29, 2016

saschazelzer merged commit 72b9857 into development Aug 29, 2016

saschazelzer deleted the framework-shutdown-race branch August 29, 2016 20:30

jeffdiclemente mentioned this pull request Aug 30, 2016

Make implicit framework shutdown synchronous #134

Closed
