Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update boehmgc-coroutine-sp-fallback.diff for darwin #6987

Conversation

matthewbauer
Copy link
Member

@matthewbauer matthewbauer commented Sep 1, 2022

The darwin_stop_world implementation is slightly different. sp goes to
altstack_lo instead of lo in this case. Assuming that is an
implementation detail.

But the fix is the same, when we detect alstack_lo outside of the
expected stack range, we reset it to hi - stack_limit.

Here stack_limit is calculated with pthread_get_stacksize_np since
that is the BSD equivalent to pthread_attr_getstacksize.

Fixes #4919

cc @roberth @edolstra

The darwin_stop_world implementation is slightly different. sp goes to
altstack_lo instead of lo in this case. Assuming that is an
implementation detail.

But the fix is the same, when we detect alstack_lo outside of the
expected stack range, we reset it to hi - stack_limit.

Here stack_limit is calculated with pthread_get_stacksize_np since
that is the BSD equivalent to pthread_attr_getstacksize.
@edolstra
Copy link
Member

edolstra commented Sep 1, 2022

Isn't this potentially bad for performance, since (if I read it correctly) it causes Boehm to scan the entire stack? Especially on 64-bit systems, the stack can potentially be very large, and reading it can cause uninitialized parts of the stack to become committed. (Nix uses a 64 MB stack, see src/nix/main.cc.)

But maybe it's not too bad.

@matthewbauer
Copy link
Member Author

matthewbauer commented Sep 1, 2022

Isn't this potentially bad for performance, since (if I read it correctly) it causes Boehm to scan the entire stack? Especially on 64-bit systems, the stack can potentially be very large, and reading it can cause uninitialized parts of the stack to become committed. (Nix uses a 64 MB stack, see src/nix/main.cc.)

Yeah maybe need to clarify with Robert on that, not really sure of the ramifications. I think this is definitely not an ideal solution since it relies on some guesswork of where the stack is. But it definitely beats the segfault you'd otherwise get!

@roberth
Copy link
Member

roberth commented Sep 1, 2022

since (if I read it correctly) it causes Boehm to scan the entire stack?

You've read correctly, but this penalty is only paid when GC happens while a coroutine is active, which seems to be only during filterSource, which might be rare.
This a known flaw and it's been present in linux builds since we have the current gc/coroutine integration #4944.

slightly different. sp goes to
altstack_lo instead of lo in this case. Assuming that is an
implementation detail.

The altstack is for signal handling, so I'm almost certain that this is a wrong assumption.

On reading the file more thoroughly it seems that we might need to support segmented stacks on darwin? I'm not sure, because that would also affect our stack protection page solution (detectStackOverflow() and such). If we need it, we'll have to replace the simple "put the stack in a gc region" by a solution that hooks into the gc and calls GC_push_all_stack_sections like crystal-lang anyway.

@roberth
Copy link
Member

roberth commented Sep 1, 2022

It seems that things can be made simpler by having coroutine building blocks directly in bdwgc. I've (re)started the discussion here ivmai/bdwgc#362 (comment).
We might want to pursue this PR's quick-ish "good enough" solution in the meanwhile, so as not to leave macOS users behind.

Another potentially "good enough" idea is to disable GC while any co-routine exists. This only works well if the number of co-routines reaches 0 quickly and often. Afaik this may be the case, but would need to be confirmed.

@matthewbauer
Copy link
Member Author

The altstack is for signal handling, so I'm almost certain that this is a wrong assumption.

Yeah, not sure what is going on. What gave me that impression was this block, which is the only place paltstack_lo is set:

https://github.com/ivmai/bdwgc/blob/af1eede4362a53a32b25c4063df6c2ccc8ed489b/darwin_stop_world.c#L328-L339

(and noticing that the callstack from the segfaults involves the if (altstack_log) condition) Perhaps it doesn't really matter for the purpose of the patch though.

@edolstra edolstra merged commit a9af12e into NixOS:master Sep 2, 2022
Minion3665 pushed a commit to Minion3665/nix that referenced this pull request Feb 23, 2023
…ine-sp-fallback-for-darwin

Update boehmgc-coroutine-sp-fallback.diff for darwin
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Random segfaults with aarch64-darwin on flakes based project
3 participants