-
Notifications
You must be signed in to change notification settings - Fork 253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Framework shutdown race #117
Conversation
@saschazelzer What was the root cause of the race condition? |
If client code creates a framework instance, then calls This happens quite often in our unit tests. The second major issue was that Here is the complete tsan output for the race condition:
|
@saschazelzer, still reviewing... |
auto fwCtx = ctx.get(); | ||
std::shared_ptr<CoreBundleContext> holder(std::make_shared<CoreBundleContextHolder>(std::move(ctx)), fwCtx); | ||
holder->SetThis(holder); | ||
holder->systemBundle->Shutdown(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following is not caused by these changes, however I've noticed a different issue in the same area as these changes which I'd like to discuss.
I've run into a crash on Windows when a Framework
instance is stored in a DLL as a global. On static destruction (assuming Framework::Stop()
hasn't already been called), the std::thread
creation inside FrameworkPrivate::Shutdown
will throw an exception. This happens because while in DllMain, you shouldn't do much of anything (as stressed in Windows documentation), including spawning threads. Of course this only occurs if no one calls Framework::Stop
prior to static destruction.
While I don't want to condone the over-use of global variables/singletons, I still think its worth it to try and make sure static destruction occurs smoothly. Within our use at the MathWorks we only want a single Framework instance for the entire process (for now).
What are your thoughts on making a "synchronous shutdown" which is only called during implicit Framework destruction (i.e. when a Framework instance goes out of scope)? Explicit calls to Framework::Stop()
would still be asynchronous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was aware of the limitations while executing code in DllMain
. That was one reason why I was happy to move away from registering modules and calling their activators during static initialization of shared libraries.
You have a valid point here and I think a synchronous shut down makes sense in this case (we are waiting for it to complete anyways). But we still cannot ensure that client code that is executed in Activator::Stop()
functions (executed during framework stop) plays by the rules for DllMain
.
So
- I agree on "synchronous shutdown" during implicit framework destruction
- Question: Should we ensure that all CppMicroServices bundles can be stopped safely in the context of 1. ?
- This looks like a good topic for the "Best Practices" document
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Should we ensure that all CppMicroServices bundles can be stopped safely in the context of 1. ?
Yes, I think this has to be done. Minimally as long as we guarantee no threads are created by CppMicroServices (in the context of 1.) that is a good first step.
This looks like a good topic for the "Best Practices" document
I agree. I was planning on adding it to the doc.
We should stress that the framework should be stopped explicitly.
I was looking at how to make the framework shutdown synchronous but unfortunately, it is a bit involved. It is basically half of the outstanding task to make the "single threaded" build configuration work again (The other half is the framework startup). During shutdown, there is not only the shutdown thread itself, but bundle events that are send during bundle stop actions are also send using a different thread (to avoid locking issues). This makes such a change non-trivial and I don't want to rush it. I guess it would take a few more days to get it working and have it fully tested. |
I agree, I don't want to rush it. As a workaround, while synchronous framework shutdown is implemented, clients can ensure that they explicitly stop the framework. |
Yes, or use a non-static |
I this still being reviewed? Any questions I can help with? |
Sorry, I haven't been able to get back to reviewing this yet. Its near the top of my todo list. I left off trying to wrap my head around how |
The The advantage here is that although we are already destructing the This way the |
Thanks for the explanation, it helps. I think a test which minimally tests that event listeners which start the Framework during framework stop produce the expected result would be a good idea. |
You are right, I will add a test. |
…to framework-shutdown-race # Conflicts: # core/src/util/usFramework.cpp
It is you good you insisted on a test. It revealed a bug in our event classes. |
Use an internal data structure to hold on to bundle event data, avoiding holding on to Bundle and CoreBundleContext shared pointers. This restores the original behavior (the framework itself must not hold on to such data in bundle threads - it could extend the framework's lifetime and /or trigger its shutdown from that thread).
I think the last commit finally fixes this issue. If there are no objections, it would be good to merge this for 3.0 because the issue causes test failures for all PRs and pushed branches. |
Abhinay has some comments that he'll add. For myself; +1 to merge. |
return; | ||
} | ||
|
||
// Create a new CoreBundleContext holder, in case some event listener |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a test point for the use case mentioned in this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The added test case already relies on that code path. There is no standard way to query the Bundle
object if it internally holds a "new" CoreBundleContextHolder
though. The test assumes that this just "works", otherwise the OS should throw segmentation faults (which it did) when we try to access an invalid pointer to the CoreBundleContext
in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the restart is essentially the same as what would happen in the listener. No more concerns from me ...
+1 for merge |
Thanks for your input! |
@CppMicroServices/developers please review.