Framework shutdown race #117

Merged
merged 5 commits into from
Aug 29, 2016

saschazelzer
Member

@CppMicroServices/developers please review.

@jeffdiclemente
Member

@saschazelzer What was the root cause of the race condition?

@saschazelzer
Member Author

If client code creates a framework instance, calls Stop(), and then destroys the instance (e.g. it goes out of scope), there is a race on the weak ptr member that the FrameworkPrivate class inherits from std::enable_shared_from_this: the shutdown thread reads it while the thread that called Stop() destroys it (a write) as part of destructing the FrameworkPrivate class.

This happens quite often in our unit tests.

The second major issue was that shared_from_this() for the CoreBundleContext class was called implicitly from its destructor (when calling MakeBundle to pass a Bundle instance to events), which is not well defined.
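
To illustrate the second issue in isolation, here is a minimal sketch using only the standard library. `Node` and the helper function are hypothetical names, not CppMicroServices code. By the time the destructor runs, the last owning `std::shared_ptr` is gone, so the internal weak pointer has expired. C++17 specifies that `shared_from_this()` then throws `std::bad_weak_ptr`; earlier standards only state it as a precondition violation, although common standard libraries behave the same way.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that tries to obtain a shared_ptr to itself
// while it is being destroyed.
struct Node : std::enable_shared_from_this<Node> {
  explicit Node(bool* threw) : threw_(threw) {}

  ~Node() {
    try {
      // The owning shared_ptr count is already zero here, so the
      // internal weak pointer cannot be promoted any more.
      auto self = shared_from_this();
      (void)self;
      *threw_ = false;
    } catch (const std::bad_weak_ptr&) {
      *threw_ = true;
    }
  }

  bool* threw_;
};

// Returns true if shared_from_this() failed inside the destructor.
bool SharedFromThisFailsInDestructor() {
  bool threw = false;
  {
    auto node = std::make_shared<Node>(&threw);
  }  // last owner released, ~Node() runs here
  return threw;
}
```

Any code path that is reachable from a destructor and needs a shared pointer to the object therefore needs a different mechanism than `std::enable_shared_from_this`.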

Here is the complete tsan output for the race condition:

9: Test command: /home/sascha/builds/CppMicroServices-dev-shared-tsan-debug/bin/usCoreTestDriver "usFrameworkTest"
9: Test timeout computed to be: 9.99988e+06
9: Test Framework instantiation [PASSED]
9: Test for default threading option [PASSED]
9: Test for empty default base storage property [PASSED]
9: Test default diagnostic logging [PASSED]
9: Test Framework instantiation with custom diagnostic logger [PASSED]
9: Test installation of library TestBundleA [PASSED]
9: Test that the default logger captured data. [PASSED]
9: Test Framework instantiation with custom configuration [PASSED]
9: Test Framework custom launch properties [PASSED]
9: Test Framework custom launch properties [PASSED]
9: Test Framework custom launch properties [PASSED]
9: Test Framework custom launch properties [PASSED]
9: Test Framework custom launch properties [PASSED]
9: Test for enabled diagnostic logging [PASSED]
9: Test for custom base storage path [PASSED]
9: Test for attempt to change threading option [PASSED]
9: Test Framework instantiation with custom diagnostic logger [PASSED]
9: ==================
9: WARNING: ThreadSanitizer: data race (pid=3836)
9: Read of size 8 at 0x7d640001b308 by thread T4:
9: #0 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__weak_count<(__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/6.1.1/bits/shared_ptr_base.h:827 (libCppMicroServices.so.2.99.0+0x000000087bf3)
9: #1 std::__shared_ptr<us::CoreBundleContext, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<us::CoreBundleContext>(std::__weak_ptr<us::CoreBundleContext, (__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/6.1.1/bits/shared_ptr_base.h:952 (libCppMicroServices.so.2.99.0+0x000000087bf3)
9: #2 std::shared_ptr<us::CoreBundleContext>::shared_ptr<us::CoreBundleContext>(std::weak_ptr<us::CoreBundleContext> const&) /usr/include/c++/6.1.1/bits/shared_ptr.h:251 (libCppMicroServices.so.2.99.0+0x000000087bf3)
9: #3 std::enable_shared_from_this<us::CoreBundleContext>::shared_from_this() /usr/include/c++/6.1.1/bits/shared_ptr.h:573 (libCppMicroServices.so.2.99.0+0x000000087bf3)
9: #4 us::Bundle::Bundle(std::shared_ptr<us::BundlePrivate> const&) /home/sascha/git/code/CppMicroServices/core/src/bundle/usBundle.cpp:107 (libCppMicroServices.so.2.99.0+0x000000087bf3)
9: #5 us::MakeBundle(std::shared_ptr<us::BundlePrivate> const&) /home/sascha/git/code/CppMicroServices/core/src/bundle/usBundlePrivate.cpp:51 (libCppMicroServices.so.2.99.0+0x0000000a7378)
9: #6 us::FrameworkPrivate::Shutdown0(bool, bool) /home/sascha/git/code/CppMicroServices/core/src/util/usFrameworkPrivate.cpp:189 (libCppMicroServices.so.2.99.0+0x000000027f90)
9: #7 std::thread::_State_impl<std::_Bind_simple<std::_Bind<std::_Mem_fn<void (us::FrameworkPrivate::*)(bool, bool)> (us::FrameworkPrivate*, bool, bool)> ()> >::_M_run() (libCppMicroServices.so.2.99.0+0x000000028806)
9: #8 (libstdc++.so.6+0x0000000baaae)
9:
9: Previous write of size 8 at 0x7d640001b308 by main thread:
9: #0 std::__weak_count<(__gnu_cxx::_Lock_policy)2>::operator=(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/6.1.1/bits/shared_ptr_base.h:766 (libCppMicroServices.so.2.99.0+0x0000000e06ee)
9: #1 std::__weak_ptr<us::CoreBundleContext, (__gnu_cxx::_Lock_policy)2>::_M_assign(us::CoreBundleContext*, std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/6.1.1/bits/shared_ptr_base.h:1474 (libCppMicroServices.so.2.99.0+0x0000000e06ee)
9: #2 void std::enable_shared_from_this<us::CoreBundleContext>::_M_weak_assign<us::CoreBundleContext>(us::CoreBundleContext*, std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) const /usr/include/c++/6.1.1/bits/shared_ptr.h:583 (libCppMicroServices.so.2.99.0+0x0000000e06ee)
9: #3 void std::__enable_shared_from_this_helper<us::CoreBundleContext, us::CoreBundleContext>(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&, std::enable_shared_from_this<us::CoreBundleContext> const*, us::CoreBundleContext const*) /usr/include/c++/6.1.1/bits/shared_ptr.h:601 (libCppMicroServices.so.2.99.0+0x0000000e06ee)
9: #4 __shared_ptr<us::CoreBundleContext, us::CoreBundleContext::~CoreBundleContext()::<lambda(us::CoreBundleContext*)> > /usr/include/c++/6.1.1/bits/shared_ptr_base.h:899 (libCppMicroServices.so.2.99.0+0x0000000e06ee)
9: #5 shared_ptr<us::CoreBundleContext, us::CoreBundleContext::~CoreBundleContext()::<lambda(us::CoreBundleContext*)> > /usr/include/c++/6.1.1/bits/shared_ptr.h:134 (libCppMicroServices.so.2.99.0+0x0000000e06ee)
9: #6 us::CoreBundleContext::~CoreBundleContext() /home/sascha/git/code/CppMicroServices/core/src/bundle/usCoreBundleContext.cpp:83 (libCppMicroServices.so.2.99.0+0x0000000e06ee)
9: #7 std::_Sp_counted_ptr<us::CoreBundleContext*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/include/c++/6.1.1/bits/shared_ptr_base.h:372 (libCppMicroServices.so.2.99.0+0x000000024339)
9: #8 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /usr/include/c++/6.1.1/bits/shared_ptr_base.h:150 (libCppMicroServices.so.2.99.0+0x0000000861cc)
9: #9 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /usr/include/c++/6.1.1/bits/shared_ptr_base.h:662 (libCppMicroServices.so.2.99.0+0x0000000861cc)
9: #10 std::__shared_ptr<us::CoreBundleContext, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() /usr/include/c++/6.1.1/bits/shared_ptr_base.h:928 (libCppMicroServices.so.2.99.0+0x0000000861cc)
9: #11 std::shared_ptr<us::CoreBundleContext>::~shared_ptr() /usr/include/c++/6.1.1/bits/shared_ptr.h:93 (libCppMicroServices.so.2.99.0+0x0000000861cc)
9: #12 us::Bundle::~Bundle() /home/sascha/git/code/CppMicroServices/core/src/bundle/usBundle.cpp:111 (libCppMicroServices.so.2.99.0+0x0000000861cc)
9: #13 us::Framework::~Framework() /home/sascha/git/code/CppMicroServices/core/include/usFramework.h:63 (usCoreTestDriver+0x00000044b058)
9: #14 TestCustomLogSink /home/sascha/git/code/CppMicroServices/core/test/usFrameworkTest.cpp:148 (usCoreTestDriver+0x00000044b058)
9: #15 usFrameworkTest(int, char**) /home/sascha/git/code/CppMicroServices/core/test/usFrameworkTest.cpp:473 (usCoreTestDriver+0x00000044feba)
9: #16 main /home/sascha/builds/CppMicroServices-dev-shared-tsan-debug/core/test/usCoreTestDriver.cpp:285 (usCoreTestDriver+0x00000041b49a)
9:
9: Location is heap block of size 1200 at 0x7d640001b300 allocated by main thread:
9: #0 operator new(unsigned long) (libtsan.so.0+0x000000069e83)
9: #1 us::FrameworkFactory::NewFramework(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, us::Any, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, us::Any> > > const&, std::ostream*) /home/sascha/git/code/CppMicroServices/core/src/util/usFrameworkFactory.cpp:32 (libCppMicroServices.so.2.99.0+0x000000023d26)
9: #2 TestCustomLogSink /home/sascha/git/code/CppMicroServices/core/test/usFrameworkTest.cpp:148 (usCoreTestDriver+0x00000044af16)
9: #3 usFrameworkTest(int, char**) /home/sascha/git/code/CppMicroServices/core/test/usFrameworkTest.cpp:473 (usCoreTestDriver+0x00000044feba)
9: #4 main /home/sascha/builds/CppMicroServices-dev-shared-tsan-debug/core/test/usCoreTestDriver.cpp:285 (usCoreTestDriver+0x00000041b49a)
9:
9: Thread T4 (tid=3841, running) created by main thread at:
9: #0 pthread_create (libtsan.so.0+0x000000028380)
9: #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (libstdc++.so.6+0x0000000badc4)
9: #2 us::Framework::Stop() /home/sascha/git/code/CppMicroServices/core/src/util/usFramework.cpp:123 (libCppMicroServices.so.2.99.0+0x0000000206aa)
9: #3 TestCustomLogSink /home/sascha/git/code/CppMicroServices/core/test/usFrameworkTest.cpp:152 (usCoreTestDriver+0x00000044b036)
9: #4 usFrameworkTest(int, char**) /home/sascha/git/code/CppMicroServices/core/test/usFrameworkTest.cpp:473 (usCoreTestDriver+0x00000044feba)
9: #5 main /home/sascha/builds/CppMicroServices-dev-shared-tsan-debug/core/test/usCoreTestDriver.cpp:285 (usCoreTestDriver+0x00000041b49a)
9:
9: SUMMARY: ThreadSanitizer: data race /usr/include/c++/6.1.1/bits/shared_ptr_base.h:827 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__weak_count<(__gnu_cxx::_Lock_policy)2> const&)
9: ==================

@jeffdiclemente
Member

@saschazelzer, still reviewing...

auto fwCtx = ctx.get();
std::shared_ptr<CoreBundleContext> holder(std::make_shared<CoreBundleContextHolder>(std::move(ctx)), fwCtx);
holder->SetThis(holder);
holder->systemBundle->Shutdown(false);
Member

The following is not caused by these changes, however I've noticed a different issue in the same area as these changes which I'd like to discuss.

I've run into a crash on Windows when a Framework instance is stored in a DLL as a global. On static destruction (assuming Framework::Stop() hasn't already been called), the std::thread creation inside FrameworkPrivate::Shutdown will throw an exception. This happens because while in DllMain, you shouldn't do much of anything (as stressed in Windows documentation), including spawning threads. Of course this only occurs if no one calls Framework::Stop prior to static destruction.

While I don't want to condone the overuse of global variables/singletons, I still think it's worth trying to make sure static destruction occurs smoothly. Within our use at the MathWorks we only want a single Framework instance for the entire process (for now).

What are your thoughts on making a "synchronous shutdown" which is only called during implicit Framework destruction (i.e. when a Framework instance goes out of scope)? Explicit calls to Framework::Stop() would still be asynchronous.
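
A rough sketch of what this split could look like, using only the standard library. `MiniFramework`, its members, and the two helper functions are hypothetical and are not the CppMicroServices implementation; the sketch also ignores the separate event-delivery threads. An explicit `Stop()` hands the shutdown work to a worker thread, while the destructor joins any pending worker and otherwise runs the shutdown inline, without creating a thread.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical miniature framework, for illustration only.
class MiniFramework {
 public:
  // ranOn receives the id of the thread that performed the shutdown.
  explicit MiniFramework(std::thread::id* ranOn) : ranOn_(ranOn) {}

  ~MiniFramework() {
    // Implicit destruction: finish an explicit asynchronous stop if one
    // is pending, otherwise shut down on the calling thread.
    if (worker_.joinable()) worker_.join();
    Shutdown0();  // no-op if the worker already did the work
  }

  // Explicit stop: asynchronous, returns immediately.
  void Stop() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (stopped_ || worker_.joinable()) return;
    worker_ = std::thread([this] { Shutdown0(); });
  }

 private:
  void Shutdown0() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (stopped_) return;
    stopped_ = true;
    *ranOn_ = std::this_thread::get_id();
  }

  std::mutex mutex_;
  bool stopped_ = false;
  std::thread worker_;
  std::thread::id* ranOn_;
};

// True if destruction without Stop() shut down on the calling thread.
bool ImplicitDestructionShutsDownInline() {
  std::thread::id ranOn;
  { MiniFramework fw(&ranOn); }
  return ranOn == std::this_thread::get_id();
}

// True if an explicit Stop() shut down on a separate worker thread.
bool ExplicitStopShutsDownOnWorker() {
  std::thread::id ranOn;
  {
    MiniFramework fw(&ranOn);
    fw.Stop();
  }
  return ranOn != std::thread::id() && ranOn != std::this_thread::get_id();
}
```

With this shape, a static instance that was never stopped explicitly would not need to spawn a thread during static destruction.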

Member Author

I was aware of the limitations while executing code in DllMain. That was one reason why I was happy to move away from registering modules and calling their activators during static initialization of shared libraries.

You have a valid point here and I think a synchronous shutdown makes sense in this case (we are waiting for it to complete anyway). But we still cannot ensure that client code executed in Activator::Stop() functions (which run during framework stop) plays by the rules for DllMain.

So

  1. I agree on "synchronous shutdown" during implicit framework destruction
  2. Question: Should we ensure that all CppMicroServices bundles can be stopped safely in the context of 1. ?
  3. This looks like a good topic for the "Best Practices" document

Member

> Question: Should we ensure that all CppMicroServices bundles can be stopped safely in the context of 1. ?

Yes, I think this has to be done. At a minimum, guaranteeing that CppMicroServices itself creates no threads (in the context of 1.) would be a good first step.

> This looks like a good topic for the "Best Practices" document

I agree. I was planning on adding it to the doc.
We should stress that the framework should be stopped explicitly.

@saschazelzer
Member Author

I was looking at how to make the framework shutdown synchronous but, unfortunately, it is a bit involved. It is basically half of the outstanding task to make the "single threaded" build configuration work again (the other half is the framework startup).

During shutdown, there is not only the shutdown thread itself: bundle events that are sent during bundle stop actions are also sent using a different thread (to avoid locking issues). This makes such a change non-trivial and I don't want to rush it. I guess it would take a few more days to get it working and have it fully tested.

@jeffdiclemente
Member

I agree, I don't want to rush it.

As a workaround, until synchronous framework shutdown is implemented, clients can ensure that they explicitly stop the framework.

@saschazelzer
Member Author

Yes, or use a non-static Framework instance.

@saschazelzer
Member Author

Is this still being reviewed? Any questions I can help with?

@jeffdiclemente
Member

Sorry, I haven't been able to get back to reviewing this yet. It's near the top of my todo list.

I left off trying to wrap my head around how void SetThis(const std::shared_ptr<CoreBundleContext>& self); is being used (i.e. its purpose) in this solution.

@saschazelzer
Member Author

The CoreBundleContext used to inherit from std::enable_shared_from_this and hence held a weak ptr to itself. This is no longer the case; instead we use a more flexible approach. The SetThis method is similar to what a std::shared_ptr constructor calls when the class being pointed to inherits from enable_shared_from_this.

The advantage here is that although we are already destructing the CoreBundleContextHolder because all shared ptrs pointing to CoreBundleContext (using the alias constructor) went away, we can still create a new shared ptr pointing to the not yet destructed CoreBundleContext instance, but managed by a new CoreBundleContextHolder instance. We implant this new shared ptr in a thread-safe way into the CoreBundleContext using SetThis such that shared_from_this() (same name, but implemented on our own instead of using enable_shared_from_this) works as expected.

This way the CoreBundleContext can be used normally in event listeners during framework shutdown. The listeners could even start the framework again.
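
The mechanism described above can be sketched roughly as follows. This is a simplified, hypothetical reconstruction using only the standard library: `Context`, `ContextHolder`, `MakeContext`, and the `stopped` flag are stand-ins chosen for the example and do not mirror the actual CoreBundleContext code. The holder owns the context, clients hold aliasing shared pointers to the context, and the holder's destructor re-homes the context into a fresh holder before running the shutdown work.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for CoreBundleContext.
class Context {
 public:
  explicit Context(bool* usable) : usable_(usable) {}

  // Hand-rolled replacement for the weak pointer assignment that
  // enable_shared_from_this would normally receive.
  void SetThis(const std::shared_ptr<Context>& self) {
    std::lock_guard<std::mutex> lock(mutex_);
    self_ = self;
  }

  // Same name as the standard facility, but implemented by hand.
  std::shared_ptr<Context> shared_from_this() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return self_.lock();
  }

  // Stands in for shutdown work that passes the context to listeners.
  void Shutdown() {
    *usable_ = (shared_from_this().get() == this);
    stopped = true;
  }

  bool stopped = false;

 private:
  mutable std::mutex mutex_;
  std::weak_ptr<Context> self_;
  bool* usable_;
};

// Hypothetical stand-in for CoreBundleContextHolder.
struct ContextHolder {
  explicit ContextHolder(std::unique_ptr<Context> c) : ctx(std::move(c)) {}

  ~ContextHolder() {
    if (!ctx || ctx->stopped) return;  // nothing left to shut down
    // All client pointers are gone, but the Context is still alive.
    // Move it into a new holder and hand out a fresh aliasing pointer.
    Context* raw = ctx.get();
    std::shared_ptr<Context> holder(
        std::make_shared<ContextHolder>(std::move(ctx)), raw);
    holder->SetThis(holder);
    holder->Shutdown();
  }  // the new holder is released here and destroys the Context

  std::unique_ptr<Context> ctx;
};

// Creates a context that is owned by a holder via the aliasing constructor.
std::shared_ptr<Context> MakeContext(bool* usable) {
  std::unique_ptr<Context> ctx(new Context(usable));
  Context* raw = ctx.get();
  std::shared_ptr<Context> sp(
      std::make_shared<ContextHolder>(std::move(ctx)), raw);
  sp->SetThis(sp);
  return sp;
}

// True if shared_from_this() worked while the last owner was going away.
bool ContextUsableDuringShutdown() {
  bool usable = false;
  { auto ctx = MakeContext(&usable); }
  return usable;
}
```

The `stopped` check keeps the second holder's destructor from repeating the shutdown; in this sketch it is the only thing that ends the chain of holders.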

@jeffdiclemente
Member

Thanks for the explanation, it helps.

I think it would be a good idea to add a test that, at a minimum, checks that event listeners which start the Framework during framework stop produce the expected result.

@saschazelzer
Member Author

You are right, I will add a test.

@saschazelzer
Member Author

It is good you insisted on a test. It revealed a bug in our event classes.

Use an internal data structure to hold on to bundle event data, avoiding
holding on to Bundle and CoreBundleContext shared pointers. This restores
the original behavior (the framework itself must not hold on to such data
in bundle threads - it could extend the framework's lifetime and /or
trigger its shutdown from that thread).
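
As a rough illustration of the idea in this commit message, the sketch below queues a plain data record instead of a shared pointer. All names (`Bundle`, `BundleEventData`, `MakeEventData`) are hypothetical and chosen for this example; the actual internal data structure is not shown in this conversation. The point is only that a queued event which copies the data it needs does not add an owner to the bundle or its context.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical bundle type, for illustration only.
struct Bundle {
  long id;
  std::string name;
};

// Hypothetical event payload: plain copies, no shared_ptr members.
struct BundleEventData {
  long bundleId;
  std::string symbolicName;
  int type;
};

// Copies what listeners need out of the bundle at event creation time.
BundleEventData MakeEventData(const std::shared_ptr<Bundle>& bundle, int type) {
  return BundleEventData{bundle->id, bundle->name, type};
}

// Returns the number of owners of the bundle after an event was queued.
long OwnersAfterQueueing() {
  auto bundle = std::make_shared<Bundle>(Bundle{7, "TestBundleA"});
  std::vector<BundleEventData> queue;
  queue.push_back(MakeEventData(bundle, 1));
  return bundle.use_count();
}
```

Because the queue holds no owning pointer, draining it on another thread can neither extend the lifetime of the bundle nor release the last reference from that thread.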
@saschazelzer saschazelzer self-assigned this Aug 28, 2016
@saschazelzer saschazelzer added this to the Release 3.0 milestone Aug 28, 2016
@saschazelzer
Member Author

I think the last commit finally fixes this issue. If there are no objections, it would be good to merge this for 3.0 because the issue causes test failures for all PRs and pushed branches.

@jeffdiclemente
Member

Abhinay has some comments that he'll add.

For myself: +1 to merge.
Thanks for adding the test!

return;
}

// Create a new CoreBundleContext holder, in case some event listener
Member

Could we add a test point for the use case mentioned in this comment?

Member Author

The added test case already relies on that code path. There is no standard way to query the Bundle object about whether it internally holds a "new" CoreBundleContextHolder though. The test assumes that this just "works"; otherwise the OS should raise segmentation faults (which it did) when we try to access an invalid pointer to the CoreBundleContext in the test.

Member

I see the restart is essentially the same as what would happen in the listener. No more concerns from me ...

@karthikreddy09
Member

+1 for merge

@saschazelzer saschazelzer merged commit 72b9857 into development Aug 29, 2016
@saschazelzer
Member Author

Thanks for your input!

@saschazelzer saschazelzer deleted the framework-shutdown-race branch August 29, 2016 20:30