-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix gc crash #4944
Fix gc crash #4944
Conversation
This fixes a crash that looks like: ``` Thread 1 "nix-build" received signal SIGSEGV, Segmentation fault. 0x00007ffff7ad22a0 in GC_push_all_eager () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1 (gdb) bt 0 0x00007ffff7ad22a0 in GC_push_all_eager () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1 1 0x00007ffff7adeefb in GC_push_all_stacks () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1 2 0x00007ffff7ad5ac7 in GC_mark_some () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1 3 0x00007ffff7ad77bd in GC_stopped_mark () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1 4 0x00007ffff7adbe3a in GC_try_to_collect_inner.part.0 () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1 5 0x00007ffff7adc2a2 in GC_collect_or_expand () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1 6 0x00007ffff7adc4f8 in GC_allocobj () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1 7 0x00007ffff7adc88f in GC_generic_malloc_inner () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1 8 0x00007ffff7ae1a04 in GC_generic_malloc_many () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1 9 0x00007ffff7ae1c72 in GC_malloc_kind () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1 10 0x00007ffff7e003d6 in nix::EvalState::allocValue() () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so 11 0x00007ffff7e04b9c in nix::EvalState::callPrimOp(nix::Value&, nix::Value&, nix::Value&, nix::Pos const&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so 12 0x00007ffff7e0a773 in nix::EvalState::callFunction(nix::Value&, nix::Value&, nix::Value&, nix::Pos const&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so 13 0x00007ffff7e0a91d in nix::ExprApp::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so 14 0x00007ffff7e0a8f8 in nix::ExprApp::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so 15 0x00007ffff7e0e0e8 in nix::ExprOpNEq::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so 16 0x00007ffff7e0d708 in nix::ExprOpOr::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so 17 0x00007ffff7e0d695 in nix::ExprOpOr::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so 18 0x00007ffff7e0d695 in nix::ExprOpOr::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so 19 0x00007ffff7e0d695 in nix::ExprOpOr::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so 20 0x00007ffff7e0d695 in nix::ExprOpOr::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so 21 0x00007ffff7e09e19 in nix::ExprOpNot::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so 22 0x00007ffff7e0a792 in nix::EvalState::callFunction(nix::Value&, nix::Value&, nix::Value&, nix::Pos const&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so 23 0x00007ffff7e8cba0 in nix::addPath(nix::EvalState&, nix::Pos const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Value*, nix::FileIngestionMethod, std::optional<nix::Hash>, nix::Value&)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so 24 0x00007ffff752e6f9 in nix::dump(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Sink&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixutil.so 25 0x00007ffff752e8e2 in nix::dump(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Sink&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixutil.so 26 0x00007ffff752e8e2 in nix::dump(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Sink&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixutil.so 27 0x00007ffff752e8e2 in nix::dump(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Sink&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixutil.so 28 0x00007ffff752e8e2 in nix::dump(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Sink&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixutil.so 29 0x00007ffff752e8e2 in nix::dump(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Sink&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixutil.so 30 0x00007ffff757f8c0 in void boost::context::detail::fiber_entry<boost::context::detail::fiber_record<boost::context::fiber, nix::VirtualStackAllocator, boost::coroutines2::detail::pull_coroutine<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::control_block::control_block<nix::VirtualStackAllocator, nix::sinkToSource(std::function<void (nix::Sink&)>, std::function<void ()>)::SinkToSource::read(char*, unsigned long)::{lambda(boost::coroutines2::detail::push_coroutine<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&)#1}>(boost::context::preallocated, nix::VirtualStackAllocator&&, nix::sinkToSource(std::function<void (nix::Sink&)>, std::function<void ()>)::SinkToSource::read(char*, unsigned long)::{lambda(boost::coroutines2::detail::push_coroutine<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&)#1}&&)::{lambda(boost::context::fiber&&)#1}> >(boost::context::detail::transfer_t) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixutil.so 31 0x00007ffff6f331ef in make_fcontext () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libboost_context.so.1.69.0 32 0x0000000000000000 in ?? () ```
Fixes the problem where a stack pointer outside the original thread causes the collector to crash. It could be made more accurate by recording the stack pointer every time we switch to a coroutine. We can use this information to update our own coroutine stacks like normal data. When the stack pointer is on a thread, we can add a field to GC_thread "fallback_sp" to be used when the thread sp is outside the original thread range.
should the boehm-gc patch be upstreamed? looks like it addresses some of the |
The existing fixmes in boehm are for an unrelated feature on very rare architectures where the stack grows in the other direction. I did not implement those fixmes because we don't need it. Otoh, I couldn't resist writing the code to support these rare archs in my own part. I'll remove it, so the next person won't think it matters to Nix. |
As it is, it is part of a very specific solution that works for Nix, which isn't ideal because it scans garbage stack sections sometimes. That's ok because it's a conservative garbage collector and in the case of Nix those garbage sections will not be scanned anymore when the coroutine has ended. This is the main reason I have doubts about upstreaming. |
Can confirm that this fixes a segfault we experience when building |
That's fair. Could we open an issue and explain what we did, so they're aware of our use case? |
Really great work, thanks! Yeah it would be good to get upstream's ideas about this. |
While formulating my question I've found issues I hadn't seen before, with some recommendations. I'll summarize my findings here. |
As has been done in NixOS/nix#4944 This introduces the boehmgc_nix and boehmgc_nixUnstable attributes which are useful for external packages that link with Nix and its boehmgc.
As has been done in NixOS/nix#4944 This introduces the boehmgc_nix and boehmgc_nixUnstable attributes which are useful for external packages that link with Nix and its boehmgc. (cherry picked from commit 596ac24)
The boehmgc patch only applies to Linux. Boehm GC has a separate "stop the world and push thread stacks" implementation for Darwin (see |
This was an oversight. See #4919
It's more frequent than that. The main cause is that an OS thread has its stack pointer in a coroutine, which is not supported by an unpatched GC. Which thread (or stack) invokes the GC doesn't really matter beyond that. |
I've figured out another cause of gc crashes, related to coroutines.
This now solves
I'll run
nixUnstable
with this PR and #4895 to see if any other crashes occur.