Ignore SEGV during profiler unwind on Unix #28291

Merged 3 commits on Jul 30, 2018

Member

@maleadt maleadt commented Jul 26, 2018

Unix equivalent of #4159. Unpolished, works for my use case (libcuda tripping up libunwind).

@maleadt maleadt requested a review from Keno July 26, 2018 13:29
        bt_size_cur += rec_backtrace_ctx((uintptr_t*)bt_data_prof + bt_size_cur,
                bt_size_max - bt_size_cur - 1, signal_context);
    } else {
        jl_safe_printf("WARNING: profiler attempt to access an invalid memory location\n");
Sponsor Member

Is there any risk of printing this message on each profile interval?

Member Author

Just mimicking this line:

jl_safe_printf("WARNING: profiler attempt to access an invalid memory location\n");

I agree it's not an interesting message for users, but I've only seen it print at most a couple of times during nontrivial (1-10s) profile invocations.

Sponsor Member

Ok.

@yuyichao
Contributor

We already have a mechanism for this.

ptls->safe_restore = &buf;

@maleadt
Member Author

maleadt commented Jul 26, 2018

OK, changed the approach. Will squash on merge.

@@ -225,6 +225,18 @@ static void segv_handler(int sig, siginfo_t *info, void *context)
    jl_ptls_t ptls = jl_get_ptls_states();
    assert(sig == SIGSEGV || sig == SIGBUS);

    // if we're profiling, this segfault is likely caused by the unwinder.
    // ignore the signal and jump back to where we came from.
    if (running && ptls->safe_restore) {
Contributor

Why do you need this? Does it not work with the condition below?

Member Author

Just being conservative, only affecting the case where the profiler is running. Do it unconditionally then?

Contributor

I mean you shouldn't need any code here in the segfault handler. Have you tested that it doesn't work without this but with the condition a few lines below?

Contributor

(Or in other words, it is meant to do this unconditionally.)

Member Author

Ah OK. No, it doesn't work: looking closer, it triggers a segfault in jl_call_in_ctx (via jl_throw_in_ctx..., jl_stackovf_exception, ...).
Inferring from the function names, doesn't that behave differently from the plain longjmp I do here? Would I need to catch an exception somehow?

Contributor

Hmm, segfault in jl_call_in_ctx? Did you get a NULL context?

Contributor

Ah, I guess this thread doesn't have signal_stack allocated. I believe this should fix it (you can merge this with the fallback ifdef below if you want).

diff --git a/src/signals-unix.c b/src/signals-unix.c
index 0fafe121cd..8da89b5fc4 100644
--- a/src/signals-unix.c
+++ b/src/signals-unix.c
@@ -89,6 +89,14 @@ static void jl_call_in_ctx(jl_ptls_t ptls, void (*fptr)(void), int sig, void *_c
     // checks that the syscall is made in the signal handler and that
     // the ucontext address is valid. Hopefully the value of the ucontext
     // will not be part of the validation...
+    if (!ptls->signal_stack) {
+        sigset_t sset;
+        sigemptyset(&sset);
+        sigaddset(&sset, sig);
+        sigprocmask(SIG_UNBLOCK, &sset, NULL);
+        fptr();
+        return;
+    }
     uintptr_t rsp = (uintptr_t)ptls->signal_stack + sig_stack_size;
     assert(rsp % 16 == 0);
 #if defined(_OS_LINUX_) && defined(_CPU_X86_64_)

Member Author

Yeah, that works. Thanks!
Why isn't this used for OSX btw? Mimicking profiler_segv_handler which does thread_set_state is what got me here in the first place.

Contributor

We don't handle segfaults the same way on OSX. I don't really know whether the two ways could be used together.

@maleadt maleadt changed the title WIP/RFC: Ignore SEGV during profiler unwind on Unix Ignore SEGV during profiler unwind on Unix Jul 27, 2018
@maleadt
Member Author

maleadt commented Jul 28, 2018

@yuyichao any further comments?

@JeffBezanson JeffBezanson merged commit 9bb2273 into master Jul 30, 2018
@JeffBezanson JeffBezanson deleted the tb/profiler_segv_unix branch July 30, 2018 06:30