Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix profiling on OS X #4159

Merged
merged 5 commits into from
Aug 28, 2013
Merged

Fix profiling on OS X #4159

merged 5 commits into from
Aug 28, 2013

Conversation

Keno
Copy link
Member

@Keno Keno commented Aug 26, 2013

Fixes #3971 .

Apple's version of libunwind does not handle the case when we're in a prologue or epilogue. This introduces a modified version that adds support for that where possible. Note that you will still experience segfaults if you are using Clang < 3.3 as there was a bug in the way LLVM generates the Compact Unwind Info in prior versions.

Apart from that, this also replaces the SIGALRM based version of the profiler with a Mach-based one to reduce interference with libraries that may make use of SIGALRM as well as preventing the profiler from being blocked by blocking that signal.

@StefanKarpinski
Copy link
Sponsor Member

Seems to work on my system. Do we have any code that we know this didn't work for before?

@blakejohnson
Copy link
Contributor

I'll try it on some previously failing cases.

@Keno
Copy link
Member Author

Keno commented Aug 26, 2013

@blakejohnson Make sure you're running a version of Clang > 3.2. Otherwise, you might still see segfaults.

@blakejohnson
Copy link
Contributor

Indeed, with latest Apple build of Clang (Apple LLVM version 4.2 (clang-425.0.24)), I still see segfaults.

Is the homebrew recipe up to date for Clang?

@Keno
Copy link
Member Author

Keno commented Aug 27, 2013

@blakejohnson You can just set BUILD_LLVM_CLANG = 1 in your Make.user and use that clang.

@blakejohnson
Copy link
Contributor

Once I have an up to date Clang, what needs to be recompiled? libosxunwind, libjulia, ...?

@Keno
Copy link
Member Author

Keno commented Aug 27, 2013

Everything you want to profile.

@ViralBShah
Copy link
Member

All the failures I have seen are in openblas.

@blakejohnson
Copy link
Contributor

I haven't had time this evening to rebuild everything, but my previously failing test spends most of its time in OpenBLAS, so I started with that. Unfortunately, it still segfaults at least 50% of the time. I'll try running it through GDB tomorrow to see if it is still crashing inside an OpenBLAS stack trace.

@Keno
Copy link
Member Author

Keno commented Aug 27, 2013

Can you give me a testcase?

@ViralBShah
Copy link
Member

See: timholy/IProfile.jl#14

@blakejohnson
Copy link
Contributor

@loladiro try https://gist.github.com/blakejohnson/6353018

@blakejohnson
Copy link
Contributor

The backtraces are not very enlightening. Here's an example

Program received signal SIGSEGV, Segmentation fault.
[Switching to process 47725 thread 0x2503]
0x00000001009ecc13 in libunwind::CompactUnwinder_x86_64<libunwind::LocalAddressSpace>::stepWithCompactEncodingRBPFrame ()
(gdb) bt
#0  0x00000001009ecc13 in libunwind::CompactUnwinder_x86_64<libunwind::LocalAddressSpace>::stepWithCompactEncodingRBPFrame ()
#1  0x00000001009eacf1 in libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::step ()
#2  0x00000001000e2a55 in rec_backtrace_ctx (data=0x104036d68, maxsize=971922, uc=0x107fd0a38) at task.c:561
#3  0x00000001000f5bd5 in mach_profile_listener (arg=0x0) at profile.c:155
#4  0x00007fff8d16d8bf in _pthread_start ()
#5  0x00007fff8d170b75 in thread_start ()

@Keno
Copy link
Member Author

Keno commented Aug 27, 2013

I know what the problem is. I'll have a patch momentarily.

See the comment in src/profile.c for an explanation for what this is doing
@Keno
Copy link
Member Author

Keno commented Aug 27, 2013

PR Updated

@blakejohnson
Copy link
Contributor

That does seem to do the trick. I now see "unrooted" backtraces like what we have on linux #3469, but this is worlds better than segfaulting.

Awesome work, Keno.

@Keno
Copy link
Member Author

Keno commented Aug 27, 2013

I'll have a look at the truncated backtraces issue.

@Keno
Copy link
Member Author

Keno commented Aug 27, 2013

Alright, re: truncated backtraces, I know what the issue is, but there is now way to fix it in general without implementing a full x86_64 emulator (which is what e.g. lldb does - and to some extent so do I in libosxunwind). I can however fix at least OpenBlas to generate the necessary debug information (on Mac at least, on linux you'd have to create DWARF debug info).

@StefanKarpinski
Copy link
Sponsor Member

Why the #@$! is generating backtraces so damned hard?!? There's a standard for this and it's just a linked list. Why can't compilers just respect that when you ask them to? Ugh.

@timholy
Copy link
Sponsor Member

timholy commented Aug 27, 2013

Really nice work, @loladiro!

@Keno
Copy link
Member Author

Keno commented Aug 27, 2013

@StefanKarpinski there's two problems with that:

  1. You might currently be at an instruction modifying the linked list (what now?)
  2. x86_64 doesn't actually have a frame chain

Both issues are solvable if you know the function boundaries and what registers are being saved since you can use some highly successful heuristics (I'm doing this in libosxunwind). However, as mentioned, that requires some debug information about the function. Libunwind (the non-OSX version), has some heuristics for doing unwinding without debug info if the function does happen to use a frame chain, but that is really just in there because of signal handlers and PLT as in general if your function doesn't have debug info it's also not very likely to adhere to the frame chain.

@blakejohnson
Copy link
Contributor

I previously noticed that OpenBLAS uses bizarre prologue/epilogue conventions in its Mac asm kernels. Is that what you are looking to fix?

Regarding truncated backtraces in general: where there is a straightforward fix, I would grab it. But, they are generally rare enough in practice to not make a significant difference in profiling or other uses. So, I don't think it would be worth the effort to fix all instances.

On Tuesday, August 27, 2013 at 6:16 PM, Keno Fischer wrote:

Alright, re: truncated backtraces, I know what the issue is, but there is now way to fix it in general without implementing a full x86_64 emulator (which is what e.g. lldb does - and to some extent so do I in libosxunwind). I can however fix at least OpenBlas to generate the necessary debug information (on Mac at least, on linux you'd have to create DWARF debug info).


Reply to this email directly or view it on GitHub (#4159 (comment)).

@Keno
Copy link
Member Author

Keno commented Aug 27, 2013

@blakejohnson It's a question of adding an extra Mach-0 section (32bits/ function so I think it's ok).

Keno added a commit that referenced this pull request Aug 28, 2013
@Keno Keno merged commit 924fb0f into master Aug 28, 2013
@blakejohnson
Copy link
Contributor

@loladiro I'm rather naive about these things, but OpenBLAS defines PROLOGUE and EPILOGUE like so

#define PROLOGUE .text;.align 5; .globl REALNAME; REALNAME:
#define EPILOGUE    .subsections_via_symbols

and I guess I was expecting to see .cfi_startproc at the end of the PROLOGUE, and .cfi_endproc in the EPILOGUE, as I've read that those are necessary to get the compiler to generate .eh_frame information.

@Keno
Copy link
Member Author

Keno commented Aug 28, 2013

That's one thing that's necessary, but it might not be sufficient, since I don't know if the system assembler will generate compact unwind info and unfortunately due to (what I assume to be) a liker bug, the unwinder expects the function to have compact unwind info.

if (forceDwarf == 0) {
// Save the backtrace
bt_size_cur += rec_backtrace_ctx((ptrint_t*)bt_data_prof+bt_size_cur, bt_size_max-bt_size_cur-1, &uc);
} else if(forceDwarf == 1) {
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think you want else here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do. It jumps back to unw_getcontext on segv.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, i see you are using unw_getcontext/unw_setcontext profiler_uc as a setjmp/longjmp jmp_buf. so this is correct. perhaps a comment here may be useful?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that was covered by the if that results in a segfault, we retry with DWARF info.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I guess I can add a comment explicitly mentioning it being a jump buffer.

@blakejohnson
Copy link
Contributor

I guess we'll have to to try it and find out what the assembler does.

Do you have any interest in submitting a bug report to Apple w.r.t. their libunwind implementation? I had started writing something a while back, but had only gotten as far as recognizing that there was a problem, whereas you have actually fixed it :).

@BobPortmann
Copy link
Contributor

I had to revert commit 924fb0f on Mac OSX 10.6.8 in order to compile julia. Otherwise I get the error:

make[3]: Circular src/libuwind.cxx <- src/libuwind.cxx.o dependency dropped.
cc1plus: error: unrecognized command line option "-std=c++11"
make[3]: *** [src/libuwind.cxx.o] Error 1
make[2]: *** [libosxunwind-0.0.1-rc3/libosxunwind.dylib] Error 2
make[1]: *** [julia-release] Error 2
make: *** [release] Error 2

@Keno
Copy link
Member Author

Keno commented Sep 3, 2013

Try updating gcc.

@staticfloat
Copy link
Sponsor Member

@loladiro would clang maybe work as well?

@Keno
Copy link
Member Author

Keno commented Sep 3, 2013

Yes, clang does work, but I thought clang was the default on OS X (meaning that @BobPortmann would have had to explicitly change it, hence I didn't suggest it).

@BobPortmann
Copy link
Contributor

I am on OSX 10.6 where gcc is the default. I was under the impression that clang did not work on 10.6.

@staticfloat
Copy link
Sponsor Member

@BobPortmann it depends on what version of clang you have installed, which depends on how old your Xcode/CLI tools are. It's been a while since I've been on 10.6 so I can't verify that it will work, but that is probably easier than building/installing a new GCC.

@BobPortmann
Copy link
Contributor

When I try using gcc 4.7.3 from Macports I get the error

src/unw_getcontext.s:219:invalid character '_' in mnemonic
src/unw_getcontext.s:220:invalid character '_' in mnemonic
src/unw_getcontext.s:221:invalid character '_' in mnemonic
src/unw_getcontext.s:223:no such instruction: `li r3, 0'
src/unw_getcontext.s:223:no such instruction: `return UNW_ESUCCESS'
src/unw_getcontext.s:224:no such instruction: `blr'
make[3]: *** [src/unw_getcontext.s.o] Error 1
make[2]: *** [libosxunwind-0.0.1-rc3/libosxunwind.dylib] Error 2
make[1]: *** [julia-release] Error 2
make: *** [release] Error 2

I have not tried using clang yet but the version on 10.6 is very old. I could try the Macports version. I thought there was a reason for the default being gcc on 10.6?

@StefanKarpinski
Copy link
Sponsor Member

I'm pretty sure that's the reason – because 10.6's clang is so old.

@ViralBShah
Copy link
Member

10.6 does not even have clang, IIRC.

@BobPortmann
Copy link
Contributor

Crow julia> uname -v
10.6 does seem to have clang:

uname -v
Darwin Kernel Version 10.8.0: Tue Jun 7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386
where clang
clang is /usr/bin/clang
clang -v
Apple clang version 1.6 (tags/Apple/clang-70)
Target: x86_64-apple-darwin10
Thread model: posix

@BobPortmann
Copy link
Contributor

sorry that got munged:

> uname -v
Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386
> where clang
clang is /usr/bin/clang
> clang -v
Apple clang version 1.6 (tags/Apple/clang-70)
Target: x86_64-apple-darwin10
Thread model: posix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segfault in profiler
8 participants