New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A race condition in ServiceReferenceBasePrivate causes a seg fault #202

Closed
jeffdiclemente opened this Issue Apr 20, 2017 · 6 comments

Comments

Projects
2 participants
@jeffdiclemente
Contributor

jeffdiclemente commented Apr 20, 2017

A race can occur when a client retrieves a service from a ServiceFactory using BundleContext::GetService(ref). The race leads to a seg fault which occurs as a result of trying to access a nullptr service object returned from an empty std::unordered_map of cached service objects, maintained by CppMicroServices.

The following function, when added to and called in framework/test/ServiceFactoryTest.cpp, will reproduce the issue on Linux and Mac. The issue doesn't reproduce on Windows.

#ifdef US_ENABLE_THREADING_SUPPORT

// test that concurrent calls to ServiceFactory::GetService and ServiceFactory::UngetService
// don't cause race conditions.
void TestConcurrentServiceFactory()
{
  auto framework = FrameworkFactory().NewFramework();
  framework.Start();

  auto bundle = testing::InstallLib(framework.GetBundleContext(), "TestBundleH");
  bundle.Start();

  std::vector<std::thread> worker_threads;
  for (size_t i = 0; i < 100; ++i)
  {
    worker_threads.push_back(std::thread([framework]()
        {
          for (int i = 0; i < 1000; ++i)
          {
              auto frameworkCtx = framework.GetBundleContext();
              if (frameworkCtx)
              {
                auto ref = frameworkCtx.GetServiceReference<cppmicroservices::TestBundleH>();
                if (ref)
                {
                  std::shared_ptr<cppmicroservices::TestBundleH> svc = frameworkCtx.GetService(ref);
                  US_TEST_CONDITION_REQUIRED(svc, "test for valid service object");
                }
              }
          }
        }));
  }

  for (auto& t : worker_threads) t.join();
}
#endif

The stack trace from the resulting seg fault is

0x00000000005d5929 in std::_Rb_tree<std::string, std::pair<std::string const, std::shared_ptr<void> >, std::_Select1st<std::pair<std::string const, std::shared_ptr<void> > >, std::less<std::string>, std::allocator<std::pair<std::string const, std::shared_ptr<void> > > >::_M_begin (this=0x0) at gcc-4.9.3/include/c++/4.9.3/bits/stl_tree.h:524
524		  (this->_M_impl._M_header._M_parent);
(gdb) bt
#0  0x00000000005d5929 in std::_Rb_tree<std::string, std::pair<std::string const, std::shared_ptr<void> >, std::_Select1st<std::pair<std::string const, std::shared_ptr<void> > >, std::less<std::string>, std::allocator<std::pair<std::string const, std::shared_ptr<void> > > >::_M_begin (this=0x0) at gcc-4.9.3/include/c++/4.9.3/bits/stl_tree.h:524
#1  0x00000000005d4a57 in std::_Rb_tree<std::string, std::pair<std::string const, std::shared_ptr<void> >, std::_Select1st<std::pair<std::string const, std::shared_ptr<void> > >, std::less<std::string>, std::allocator<std::pair<std::string const, std::shared_ptr<void> > > >::find (this=0x0, __k=...) at gcc-4.9.3/include/c++/4.9.3/bits/stl_tree.h:1926
#2  0x00000000005d3a60 in std::map<std::string, std::shared_ptr<void>, std::less<std::string>, std::allocator<std::pair<std::string const, std::shared_ptr<void> > > >::find (this=0x0, __x=...)
    at gcc-4.9.3/include/c++/4.9.3/bits/stl_map.h:875
#3  0x00007ffff769d6ae in cppmicroservices::ExtractInterface (map=..., interfaceId=...)
    at CppMicroServices/framework/include/cppmicroservices/ServiceInterface.h:325
#4  0x00007ffff769bf81 in cppmicroservices::ServiceReferenceBasePrivate::GetService (this=0x7fffe4000970, bundle=0x931a80)
    at CppMicroServices/framework/src/service/ServiceReferenceBasePrivate.cpp:106
#5  0x00007ffff76c5151 in cppmicroservices::BundleContext::GetService (this=0x7fffdfffee10, reference=...)
    at CppMicroServices/framework/src/bundle/BundleContext.cpp:282
#6  0x00000000005d4958 in cppmicroservices::BundleContext::GetService<cppmicroservices::TestBundleH> (this=0x7fffdfffee10, reference=...)
    at CppMicroServices/framework/include/cppmicroservices/BundleContext.h:581
#7  0x00000000005d0f25 in <lambda()>::operator()(void) const (__closure=0x935708) at ./CppMicroServices/framework/test/ServiceFactoryTest.cpp:198
#8  0x00000000005d3656 in std::_Bind_simple<TestConcurrentServiceFactory()::<lambda()>()>::_M_invoke<>(std::_Index_tuple<>) (this=0x935708)
    at gcc-4.9.3/include/c++/4.9.3/functional:1700
#9  0x00000000005d34be in std::_Bind_simple<TestConcurrentServiceFactory()::<lambda()>()>::operator()(void) (this=0x935708)
    at gcc-4.9.3/include/c++/4.9.3/functional:1688
#10 0x00000000005d33ad in std::thread::_Impl<std::_Bind_simple<TestConcurrentServiceFactory()::<lambda()>()> >::_M_run(void) (this=0x9356f0)
    at gcc-4.9.3/include/c++/4.9.3/thread:115
#11 0x00007ffff70d4970 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#12 0x00007ffff68f2064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007ffff662762d in clone () from /lib/x86_64-linux-gnu/libc.so.6

@jeffdiclemente jeffdiclemente added the bug label Apr 20, 2017

@jeffdiclemente jeffdiclemente self-assigned this May 26, 2017

@jeffdiclemente jeffdiclemente added this to In Progress in Release 3.2 May 26, 2017

@saschazelzer

This comment has been minimized.

Show comment
Hide comment
@saschazelzer

saschazelzer Jun 2, 2017

Member

Did you start working on this? I had a look at it a week ago and there are multiple issues involved here.

Member

saschazelzer commented Jun 2, 2017

Did you start working on this? I had a look at it a week ago and there are multiple issues involved here.

@jeffdiclemente

This comment has been minimized.

Show comment
Hide comment
@jeffdiclemente

jeffdiclemente Jun 2, 2017

Contributor

Yes, I've started working on this.

There is a race that involves how ServiceRegistrationBasePrivate::dependents is managed and used in ServiceReferenceBasePrivate::GetServiceInterfaceMap. ServiceReferenceBasePrivate::UngetService will clear the count in ServiceRegistrationBasePrivate::dependents and if this happens while another thread is somewhere in ServiceReferenceBasePrivate::GetServiceInterfaceMap the wrong conditional branch can be taken, leading to a nullptr being returned from ServiceReferenceBasePrivate::GetServiceInterfaceMap.

A related issue is thatoperator[] is used in ServiceReferenceBasePrivate.cpp to retrieve values from the std::unordered_map members. at() would be more appropriate in most cases since it wouldn't insert a new value where the code assumes there is already a value in the std::unordered_map to retrieve.

This is what I've found so far. What issues did you find?

Contributor

jeffdiclemente commented Jun 2, 2017

Yes, I've started working on this.

There is a race that involves how ServiceRegistrationBasePrivate::dependents is managed and used in ServiceReferenceBasePrivate::GetServiceInterfaceMap. ServiceReferenceBasePrivate::UngetService will clear the count in ServiceRegistrationBasePrivate::dependents and if this happens while another thread is somewhere in ServiceReferenceBasePrivate::GetServiceInterfaceMap the wrong conditional branch can be taken, leading to a nullptr being returned from ServiceReferenceBasePrivate::GetServiceInterfaceMap.

A related issue is thatoperator[] is used in ServiceReferenceBasePrivate.cpp to retrieve values from the std::unordered_map members. at() would be more appropriate in most cases since it wouldn't insert a new value where the code assumes there is already a value in the std::unordered_map to retrieve.

This is what I've found so far. What issues did you find?

@saschazelzer

This comment has been minimized.

Show comment
Hide comment
@saschazelzer

saschazelzer Jun 7, 2017

Member

I found that the service factory implementation used in the test cases is not thread-safe. The framework may concurrently call the GetService and UngetService functions. However, it must not (according to the specs) recursively call those two functions in the same thread. We do not guard against this. And there is the issue of incomplete locking / checking of dependents you found, leading to nullptr being returned in some cases and the caller not checking for it.

I looked at existing Java implementations and the specs and while the fixes are not trivial, it looks like not a lot of work. Are you on it already or do you want me to take a look?

Member

saschazelzer commented Jun 7, 2017

I found that the service factory implementation used in the test cases is not thread-safe. The framework may concurrently call the GetService and UngetService functions. However, it must not (according to the specs) recursively call those two functions in the same thread. We do not guard against this. And there is the issue of incomplete locking / checking of dependents you found, leading to nullptr being returned in some cases and the caller not checking for it.

I looked at existing Java implementations and the specs and while the fixes are not trivial, it looks like not a lot of work. Are you on it already or do you want me to take a look?

@jeffdiclemente

This comment has been minimized.

Show comment
Hide comment
@jeffdiclemente

jeffdiclemente Jun 7, 2017

Contributor

Yep, I'm already working on the issue involving the ServiceFactory race condition.
I will need your help to code review the changes.

I did not see the issue with recursive calls to GetService and UngetService. This sounds like a separate issue which should be tracked and fixed. What do you think?

Do you have a use case (e.g. sample code) which would cause the Framework to recursively call GetService and UngetService on the same thread?
This would be helpful to formalize into a test. Thanks.

Contributor

jeffdiclemente commented Jun 7, 2017

Yep, I'm already working on the issue involving the ServiceFactory race condition.
I will need your help to code review the changes.

I did not see the issue with recursive calls to GetService and UngetService. This sounds like a separate issue which should be tracked and fixed. What do you think?

Do you have a use case (e.g. sample code) which would cause the Framework to recursively call GetService and UngetService on the same thread?
This would be helpful to formalize into a test. Thanks.

@saschazelzer

This comment has been minimized.

Show comment
Hide comment
@saschazelzer

saschazelzer Jun 9, 2017

Member

The potential recursive calls to GetService and UngetService are a separate issue and we can handle that in another issue. I wasn't clear enough about how this could happen: When the framework calls GetService, it effectively calls into external code and the service factory implementation could (in theory) call GetService or UngetService on the same factory instance again (either explicitly or implicitly via calling back into the framwork API). This needs to be detected and handled.

Member

saschazelzer commented Jun 9, 2017

The potential recursive calls to GetService and UngetService are a separate issue and we can handle that in another issue. I wasn't clear enough about how this could happen: When the framework calls GetService, it effectively calls into external code and the service factory implementation could (in theory) call GetService or UngetService on the same factory instance again (either explicitly or implicitly via calling back into the framwork API). This needs to be detected and handled.

@jeffdiclemente

This comment has been minimized.

Show comment
Hide comment
@jeffdiclemente

jeffdiclemente Jun 9, 2017

Contributor

Ok. I will create a separate issue to track the recursive call use case.
If you have time, test code which triggers a recursive call would be very helpful.
Thanks!

Contributor

jeffdiclemente commented Jun 9, 2017

Ok. I will create a separate issue to track the recursive call use case.
If you have time, test code which triggers a recursive call would be very helpful.
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment