Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugs in program::get_library() and init_library() #968

Open
nyh opened this issue May 9, 2018 · 1 comment
Open

bugs in program::get_library() and init_library() #968

nyh opened this issue May 9, 2018 · 1 comment

Comments

@nyh
Copy link
Contributor

nyh commented May 9, 2018

Two potential problems in program::get_library(). I noticed these in a code review, I haven't seen any actual crash or error:

  1. init_library() is traditionally called inside get_library(), where it is protected with a SCOPE_LOCK(_mutex). Since it can now be called from outside get_library(), we should take the lock also in init_library() itself (because our mutex is recursive, it's ok to take the lock in both).

  2. get_library() stores the list of loaded objects needing initialization at the top of program's _loaded_objects_stack (the entire list is one item on this stack), and init_library() runs the last item (potentially for several objects loaded together). This is probably wrong - what if we call two get_library() for the same program almost in parallel, and then call two init_library()? We can have the init_library() initialize the wrong objects. For example consider we do this:

  • get_library(lib1.so) in thread 1
  • create thread 2, which at this point will have lib1.so's TLS variables.
  • get_library(lib2.so) in thread 3
  • In thread 2, run init_library(), thinking it will run lib1.so's init functions and we have its TLS variables. But, on the top of the stack, we'll have lib2.so's init functions! And we'll run those without their TLS variables - exactly what the separate init_library() feature was trying to avoid.

One wrong solution is not to have a stack of object lists, instead of in the elf::object (what get_library() returns) its own list. So the caller (app.cc) doesn't run program::init_library() - rather it runs _lib->init_library() (where _lib is the previous return from get_library). But unfortunately, this is not really a correct solution. Both lib1.so and lib2.so may be linked with (DT_NEEDED) the same third object, so it's important that although lib2.so did not need to load this object a second time, when we wait for its init_library(), we must wait for this shared object to have been initialized.

Instead we can expand the list we keep for each object to which objects it needed (not necessarily loaded!) - actually I think we already have this list (_needed)! We also need to remember which object has already been initialized, so we don't initialize again an object object which we already initialized.

We need to make sure that after changing this, the init order remains sane. Note that the existing init order wasn't perfect either (see issue #334), but we don't want it to become even worse.

@wkozaczuk
Copy link
Collaborator

I came across this stack trace when trying to run capstan with newest kernel and older cpiod.so (from Jan 2018). It only happened once which makes me think it is some kind of race condition described above.

OSv v0.24-536-gd44fe0f3
application::prepare_argv called!
eth0: 192.168.122.15
application::prepare_argv called!
/tools/cpiod.so: failed looking up symbol _ZNK5boost15program_options22error_with_option_name23substitute_placeholdersERKSs (boost::program_options::error_with_option_name::substitute_placeholders(std::string const&) const)

[backtrace]
0x0000000000344963 <elf::object::symbol(unsigned int)+227>
0x00000000003977d6 <elf::object::arch_relocate_rela(unsigned int, unsigned int, void*, long)+166>
0x00000000003429e4 <elf::object::relocate_rela()+148>
0x00000000003456f7 <elf::object::relocate()+199>
0x0000000000348c7c <elf::program::load_object(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<std::shared_ptr<elf::object>, std::allocator<std::shared_ptr<elf::object> > >&)+1452>
0x00000000003494d0 <elf::program::get_library(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, bool)+336>
0x000000000042383b <osv::application::application(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<0x00000000004240bc <osv::application::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>0x000000000042432b <osv::application::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)+91>
0x000000000021599c <do_main_thread(void*)+1964>
0x00000000004570d5 <???+4550869>
0x00000000003f1ba6 <thread_main_c+38>
0x00000000003935e2 <???+3749346>
0xd300ef2ce4f001ff <???+-454032897>
0x00000000003f159f <???+4134303>
0x4156415741e58947 <???+1105561927>

What is also interesting the symbol _ZNK5boost15program_options22error_with_option_name23substitute_placeholdersERKSs does not exist:
nm build/release/loader.elf | grep _ZNK5boost15program_options22error_with_option_name23substitute_placeholdersERKSs

but this one does:
0000000000602190 T _ZNK5boost15program_options22error_with_option_name23substitute_placeholdersERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

If expected symbol does not exist (or I am misreading it and it does) then how come almost always (except one occurrence) cpiod works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants