Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory corruption caused by GC-invisible coroutine stacks #4206

Merged
merged 2 commits into from Nov 5, 2020

Conversation

@roberth
Copy link
Member

@roberth roberth commented Oct 30, 2020

Crucially this introduces BoehmGCStackAllocator, but it also
adds a bunch of wiring to avoid making libutil depend on bdw-gc.
The extra wiring, more than half of the PR, can be removed if it's ok to depend on bdw-gc.

Part of the solutions for #4178, #4200

Ping @edolstra

Crucially this introduces BoehmGCStackAllocator, but it also
adds a bunch of wiring to avoid making libutil depend on bdw-gc.

Part of the solutions for NixOS#4178, NixOS#4200
@roberth roberth mentioned this pull request Oct 30, 2020
The default stack size was not based on the normal stack size and
was too small.
@edolstra
Copy link
Member

@edolstra edolstra commented Nov 2, 2020

Yes, it's okay to make libutil depend on bdw-gc. The only downside is that it makes the footprint of nix-daemon a bit bigger, if we ever want to have a separate daemon executable (NixOS/rfcs#68).

Another solution would be to not have GC roots on coroutine stacks. Where are we doing that?

8 MiB sounds like quite a lot for coroutines. So long as we're not evaluating in coroutines (which we shouldn't, since the stack overflow detection probably doesn't work there), then a much smaller stack should be sufficient. How much does Boost allocate by default?

@roberth
Copy link
Member Author

@roberth roberth commented Nov 2, 2020

have GC roots on coroutine stacks. Where are we doing that?

We've started evaluating in coroutines with #4030. It's not so obvious because of the indirections, but the source filter in eg filterSource needed to run while creating the dump, which is now interleaved with sending the dump.

8 MiB sounds like quite a lot for coroutines

I think I agree, but for evaluation it seems like a safe number. It's only virtual memory, so a bigger value shouldn't hurt.

which we shouldn't, since the stack overflow detection probably doesn't work there

This works. Boost provides protected_fixedsize_stack for this purpose.
The default stack size, 128K iirc, is indeed too small for evaluation. That had the benefit of immediately confirming that overflow detection worked :)

Yes, it's okay to make libutil depend on bdw-gc.

Does that mean you want to remove the DefaultStackAllocator etc boilerplate? Keeping it helps with the NixOS/rfcs#68 goal.

@mkaito
Copy link
Contributor

@mkaito mkaito commented Nov 2, 2020

Figured I'd drop by to report that this fixed the malloc issues that had been plaguing me for months on nix unstable.

@roberth
Copy link
Member Author

@roberth roberth commented Nov 4, 2020

@edolstra Not sure if you saw my response. I think I've addressed your concerns.

@edolstra edolstra merged commit 387f824 into NixOS:master Nov 5, 2020
2 checks passed
@edolstra
Copy link
Member

@edolstra edolstra commented Nov 5, 2020

@roberth Thanks!

@wizeman
Copy link
Member

@wizeman wizeman commented Nov 9, 2020

I suspect that this PR has resulted in hydra-queue-runner locking up and consuming 100% CPU usage for me (possibly due to an unrelated bug, or maybe because this PR is incomplete?).

Could you take a look at the stack trace in NixOS/hydra#816 (comment) and see whether it's related?

@wizeman
Copy link
Member

@wizeman wizeman commented Nov 9, 2020

Do you think the following patch would fix the 100% CPU usage / infinite stack overflow correctly? I don't understand the code (way too many C++isms for me) so I'm not sure if it's correct...

diff --git a/src/libutil/serialise.cc b/src/libutil/serialise.cc
index 28f6968d0..038ede049 100644
--- a/src/libutil/serialise.cc
+++ b/src/libutil/serialise.cc
@@ -195,7 +195,7 @@ class DefaultStackAllocator : public StackAllocator {
     }
 
     void deallocate(boost::context::stack_context sctx) {
-        deallocate(sctx);
+        stack.deallocate(sctx);
     }
 };

@wizeman
Copy link
Member

@wizeman wizeman commented Nov 10, 2020

I've submitted the above patch as PR #4242.

edolstra pushed a commit that referenced this issue Nov 10, 2020
edolstra added a commit that referenced this issue Nov 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants