New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backtrace line numbers #3303

Merged
merged 8 commits into from Nov 18, 2016

Conversation

Projects
None yet
6 participants
@ysbaddaden
Member

ysbaddaden commented Sep 14, 2016

Alternative to #3170 which implements ELF, Mach-O and DWARF parsers (instead on relying on the external addr2line executable) to parse the running program, looking for the .debug_line section to be decompressed to print a backtrace which includes the originating file line and columns for each line of the backtrace.

For example:

0x452124: ??? at /home/github/crystal/test_addr2line.cr 4:5
0x42be4e: ??? at /home/github/crystal/test_addr2line.cr 13:13
0x428043: ??? at /home/github/crystal/src/signal.cr 129:1
0x42bd29: main at /home/julien/.cache/crystal/macro67505680.cr 12:15
0x7fcffd0b9f45: __libc_start_main at ??
0x4279f9: ??? at ??
0x0: ??? at ??

As you may notice, the symbol names are sometimes not properly found. Work as begun to parse the DWARF address to symbol debug information. Since the binaries generated on Linux don't contain the .debug_aranges section (neither with gcc or clang), I begun to reconstruct it from the .debug_info (which relies on the .debug_abbrev and .debug_str sections) instead of creating a parser for DWARF aranges.

TODO:

  • parse ELF files (Linux, FreeBSD)
  • parse Mach-O files (OSX)
    • locate the "companion dSYM file" with the DWARF symbols (OSX)
    • determinate the "image slide" of the loaded program (ASLR) to offset the addresses
    • read DWARF sections from .o files (mapping offsets) (abandoned, see #3303 (comment))
  • parse DWARF line number section
  • parse DWARF info, abbrev and str sections
  • print file line:column information when printing a backtrace (#1270)
  • skip implementation details when raising an exception (#3161)

Hopefully OSX now uses DWARF instead of STABS, so we won't have to add support for another debug format to support OSX. OSX keep the DWARF sections in the .o files but will NEVER put them into the executable 馃槶

POSTPONED (follow up pull request):

  • print the function name from the DWARF info (then fallback on seeking the symbol with dladdr)
  • parse symtabs section of ELF files (when DWARF info is missing)
  • parse symtabs section of Mach-O files (when DWARF info is missing)
Show outdated Hide outdated src/callstack.cr
@ysbaddaden

This comment has been minimized.

Show comment
Hide comment
@ysbaddaden

ysbaddaden Sep 16, 2016

Member

Here comes support for parsing Mach-O files... except that the DWARF sections aren't located in the executable Mach-O file itself (unlike ELF), but in its companion dSYM file (which is also a Mach-O file) which isn't even generated unless we ask for it using dsymutil (or so it seems).

I'm not exactly sure what to do.

  • The dumbest idea would be for crystal to call dsymutil and save a foo.dsym file alongside a foo executable
  • The craziest idea would be to call dsymutil, parse th resulting dSYM, and... reinject the relevant __debug_* sections back into the executable :neckbeard:
Member

ysbaddaden commented Sep 16, 2016

Here comes support for parsing Mach-O files... except that the DWARF sections aren't located in the executable Mach-O file itself (unlike ELF), but in its companion dSYM file (which is also a Mach-O file) which isn't even generated unless we ask for it using dsymutil (or so it seems).

I'm not exactly sure what to do.

  • The dumbest idea would be for crystal to call dsymutil and save a foo.dsym file alongside a foo executable
  • The craziest idea would be to call dsymutil, parse th resulting dSYM, and... reinject the relevant __debug_* sections back into the executable :neckbeard:
@ysbaddaden

This comment has been minimized.

Show comment
Hide comment
@ysbaddaden

ysbaddaden Sep 16, 2016

Member

Note that Go actually generates its own sections (__gopclntab for pc -> line numbers and __gosymtab for symbols) using a Plan9 inspired format, and injects them somehow into the compiled executable (for all platforms it seems).

Member

ysbaddaden commented Sep 16, 2016

Note that Go actually generates its own sections (__gopclntab for pc -> line numbers and __gosymtab for symbols) using a Plan9 inspired format, and injects them somehow into the compiled executable (for all platforms it seems).

@ysbaddaden

This comment has been minimized.

Show comment
Hide comment
@ysbaddaden

ysbaddaden Sep 17, 2016

Member

Injecting segments and sections at link time is relatively easy. We just have to pass -createsect segname sectname file. Hence I believe this is what Go does. I couldn't find tools to inject sections after the executable is linked.

Calling dsymutil --flat foo creates the foo.dwarf dSYM Mach-O companion file with the wanted __DWARF segment.

Member

ysbaddaden commented Sep 17, 2016

Injecting segments and sections at link time is relatively easy. We just have to pass -createsect segname sectname file. Hence I believe this is what Go does. I couldn't find tools to inject sections after the executable is linked.

Calling dsymutil --flat foo creates the foo.dwarf dSYM Mach-O companion file with the wanted __DWARF segment.

@ysbaddaden

This comment has been minimized.

Show comment
Hide comment
@ysbaddaden

ysbaddaden Sep 19, 2016

Member

@waj I investigated parsing the DWARF directly from the .o files: it's hard!

The .o files are Mach-O object files (easy) and they indeed have a __DWARF segment with the expected sections (eg: __debug_line, yay). The DWARF sections are now segmented into many files instead of having a single section for the executable, and we need to manually map/offset them (argh). I found the debug map in the executable as stabs entries in the __symtab section, but this is a mapping of symbols for each .o file, but I got lost understanding how to remap the __debug_line section... (which maps address ranges to file:line:column, not symbols).

The best information I could find (not actual documentation):

I propose to:

  1. run dsymutil to generate a .dwarf file in crystal's cache folder (alongside the .o files);
  2. look for a DSYM file, respecting Apple's standard, that is look for a .dSYM folder a .dwarf file (checking the UUID to invalidate older copies), first alongside the binary (if the user ran dsymutil herself, for shipping OSX software for example), then fallback to look inside the crystal cache folder.

I'm too lazy to invest more time in redoing what the dsymutil tool is doing (for a platform I don't use), so I propose to merely use it, open a new issue about mapping the DWARF from the .o files. Maybe someone will be motivated enough to work on that.

Line Numbers in optimized code

As per the LLVM documentation, the DWARF information should be as proper in optimized builds as in non-optimized builds, so line number information (and everything else) should still be relevant when shipping software built with --release. See http://llvm.org/docs/SourceLevelDebugging.html#debugging-optimized-code

Member

ysbaddaden commented Sep 19, 2016

@waj I investigated parsing the DWARF directly from the .o files: it's hard!

The .o files are Mach-O object files (easy) and they indeed have a __DWARF segment with the expected sections (eg: __debug_line, yay). The DWARF sections are now segmented into many files instead of having a single section for the executable, and we need to manually map/offset them (argh). I found the debug map in the executable as stabs entries in the __symtab section, but this is a mapping of symbols for each .o file, but I got lost understanding how to remap the __debug_line section... (which maps address ranges to file:line:column, not symbols).

The best information I could find (not actual documentation):

I propose to:

  1. run dsymutil to generate a .dwarf file in crystal's cache folder (alongside the .o files);
  2. look for a DSYM file, respecting Apple's standard, that is look for a .dSYM folder a .dwarf file (checking the UUID to invalidate older copies), first alongside the binary (if the user ran dsymutil herself, for shipping OSX software for example), then fallback to look inside the crystal cache folder.

I'm too lazy to invest more time in redoing what the dsymutil tool is doing (for a platform I don't use), so I propose to merely use it, open a new issue about mapping the DWARF from the .o files. Maybe someone will be motivated enough to work on that.

Line Numbers in optimized code

As per the LLVM documentation, the DWARF information should be as proper in optimized builds as in non-optimized builds, so line number information (and everything else) should still be relevant when shipping software built with --release. See http://llvm.org/docs/SourceLevelDebugging.html#debugging-optimized-code

@ysbaddaden

This comment has been minimized.

Show comment
Hide comment
@ysbaddaden

ysbaddaden Oct 9, 2016

Member

Now with support for OSX... somewhat... that is only if there is an up-to-date dSYM bundle generated with dsymutil, either as foo.dwarf (--flat) or as foo.dSYM/Contents/Resources/DWARF/foo (default).

Example:

$ crystal build --debug foo.cr
$ dsymutil --flat foo
$ foo
OH NO (DEF)! (Exception)
0x10b260c0b: *Kls#callee1:NoReturn at /Users/julien/test_addr2line.cr 4:5
0x10b2396f1: *callee3:NoReturn at /Users/julien/test_addr2line.cr 13:13
0x10b235c5f: __crystal_main at /Users/julien/crystal/src/signal.cr 129:1
0x10b2395c8: main at /Users/julien/.cache/crystal/macro4378508480.cr 12:15

Maybe Crystal could run dsymutil by itself whenever compiling with --debug (either in build, run or eval), or maybe we could call it when decoding a backtrace? Thought we'll have the same issue than in #3170 (comment)

Member

ysbaddaden commented Oct 9, 2016

Now with support for OSX... somewhat... that is only if there is an up-to-date dSYM bundle generated with dsymutil, either as foo.dwarf (--flat) or as foo.dSYM/Contents/Resources/DWARF/foo (default).

Example:

$ crystal build --debug foo.cr
$ dsymutil --flat foo
$ foo
OH NO (DEF)! (Exception)
0x10b260c0b: *Kls#callee1:NoReturn at /Users/julien/test_addr2line.cr 4:5
0x10b2396f1: *callee3:NoReturn at /Users/julien/test_addr2line.cr 13:13
0x10b235c5f: __crystal_main at /Users/julien/crystal/src/signal.cr 129:1
0x10b2395c8: main at /Users/julien/.cache/crystal/macro4378508480.cr 12:15

Maybe Crystal could run dsymutil by itself whenever compiling with --debug (either in build, run or eval), or maybe we could call it when decoding a backtrace? Thought we'll have the same issue than in #3170 (comment)

@asterite

This comment has been minimized.

Show comment
Hide comment
@asterite

asterite Oct 11, 2016

Contributor

It seems for 32 bits a type is wrong. After that I think this should be green :-)

Contributor

asterite commented Oct 11, 2016

It seems for 32 bits a type is wrong. After that I think this should be green :-)

@luislavena

This comment has been minimized.

Show comment
Hide comment
@luislavena

luislavena Oct 11, 2016

Contributor

Hello @ysbaddaden, this is pure gold! 馃槃

I did a quick test and compared with lldb (how I use it right now) and your changes:

def foo
  bar
end

def bar
  baz
end

def baz
  raise "BOOM"
end

foo
$ bin/crystal build --debug foo.cr
Using compiled compiler at .build/crystal

Current: https://gist.github.com/luislavena/b323cbd50600171f355cad4c31e1e0f1#file-run-foo-log
LLDB: https://gist.github.com/luislavena/b323cbd50600171f355cad4c31e1e0f1#file-lldb-foo-log
This: https://gist.github.com/luislavena/b323cbd50600171f355cad4c31e1e0f1#file-core-trace-foo-log

Noticed this misses some of the symbols from the callstack (foo & bar), but got the last one correctly.

Wondering if there are chances we can learn something from Rust and backtraces-rs

Contributor

luislavena commented Oct 11, 2016

Hello @ysbaddaden, this is pure gold! 馃槃

I did a quick test and compared with lldb (how I use it right now) and your changes:

def foo
  bar
end

def bar
  baz
end

def baz
  raise "BOOM"
end

foo
$ bin/crystal build --debug foo.cr
Using compiled compiler at .build/crystal

Current: https://gist.github.com/luislavena/b323cbd50600171f355cad4c31e1e0f1#file-run-foo-log
LLDB: https://gist.github.com/luislavena/b323cbd50600171f355cad4c31e1e0f1#file-lldb-foo-log
This: https://gist.github.com/luislavena/b323cbd50600171f355cad4c31e1e0f1#file-core-trace-foo-log

Noticed this misses some of the symbols from the callstack (foo & bar), but got the last one correctly.

Wondering if there are chances we can learn something from Rust and backtraces-rs

@ysbaddaden

This comment has been minimized.

Show comment
Hide comment
@ysbaddaden

ysbaddaden Oct 11, 2016

Member

@luislavena I don't decode the function names from the DWARF sections (yet). It's still using dladdr, which fails to get some symbols. Hence the difference in *baz:NoReturn (symbol) in the backtrace vs baz (dwarf) in LLDB.

Member

ysbaddaden commented Oct 11, 2016

@luislavena I don't decode the function names from the DWARF sections (yet). It's still using dladdr, which fails to get some symbols. Hence the difference in *baz:NoReturn (symbol) in the backtrace vs baz (dwarf) in LLDB.

@ysbaddaden

This comment has been minimized.

Show comment
Hide comment
@ysbaddaden

ysbaddaden Oct 11, 2016

Member

I still have more work to achieve because of ASLR on some platforms. I already fixed it manually for OSX (using _dyld_image_slice) but OpenBSD also has ASLR which I fixed with dladdr. I hoped it would be a cross platform solution, but I don't have the same behavior in Linux (weak ASLR) than OpenBSD. I'll have to test on Alpine which has a patched kernel with strong ASLR, and do other round of tests on the different platforms...

Member

ysbaddaden commented Oct 11, 2016

I still have more work to achieve because of ASLR on some platforms. I already fixed it manually for OSX (using _dyld_image_slice) but OpenBSD also has ASLR which I fixed with dladdr. I hoped it would be a cross platform solution, but I don't have the same behavior in Linux (weak ASLR) than OpenBSD. I'll have to test on Alpine which has a patched kernel with strong ASLR, and do other round of tests on the different platforms...

@RX14 RX14 referenced this pull request Oct 25, 2016

Open

Improve backtraces #2785

@ysbaddaden

This comment has been minimized.

Show comment
Hide comment
@ysbaddaden

ysbaddaden Nov 6, 2016

Member

Almost done:

I fixed the ASLR issue on Linux with the PaX patch applied (e,g, Alpine Linux): just use dladdr to offset the address, like in OpenBSD and other platforms it would seem. ~~Maybe OSX doesn't need a special treatment too. I must verify that (EDIT: it needs special treatment). I must also check that FreeBSD and OpenBSD didn't break.

I also found an issue in DWARF::LineNumbers which wrongly expected the matrix' sequences to have growing addresses 鈥攆ound on Alpine Linux.

Symbols will come in another pull request. OSX prints all symbols, but linux is missing a lot, and on ARM, it doesn't print a single one :'(

Member

ysbaddaden commented Nov 6, 2016

Almost done:

I fixed the ASLR issue on Linux with the PaX patch applied (e,g, Alpine Linux): just use dladdr to offset the address, like in OpenBSD and other platforms it would seem. ~~Maybe OSX doesn't need a special treatment too. I must verify that (EDIT: it needs special treatment). I must also check that FreeBSD and OpenBSD didn't break.

I also found an issue in DWARF::LineNumbers which wrongly expected the matrix' sequences to have growing addresses 鈥攆ound on Alpine Linux.

Symbols will come in another pull request. OSX prints all symbols, but linux is missing a lot, and on ARM, it doesn't print a single one :'(

@ysbaddaden

This comment has been minimized.

Show comment
Hide comment
@ysbaddaden

ysbaddaden Nov 6, 2016

Member

It looks like Travis eventually fails because crystal tool format wants to format some generated macro files in .crystal 馃槙

Member

ysbaddaden commented Nov 6, 2016

It looks like Travis eventually fails because crystal tool format wants to format some generated macro files in .crystal 馃槙

@ysbaddaden ysbaddaden changed the title from Core backtrace line numbers [WIP] to Backtrace line numbers Nov 6, 2016

@ysbaddaden

This comment has been minimized.

Show comment
Hide comment
@ysbaddaden
Member

ysbaddaden commented Nov 6, 2016

@asterite done!

@asterite

This comment has been minimized.

Show comment
Hide comment
@asterite

asterite Nov 6, 2016

Contributor

So cool!!! I probably won't have time to try this until the next week (I'll be speaking at oredev), so feel free to ping others to try this, maybe @waj would like to give it a shot.

Contributor

asterite commented Nov 6, 2016

So cool!!! I probably won't have time to try this until the next week (I'll be speaking at oredev), so feel free to ping others to try this, maybe @waj would like to give it a shot.

ysbaddaden added some commits Sep 13, 2016

Add line numbers to backtraces
Supports all currently supported platforms: Darwin, FreeBSD, Linux
and OpenBSD. Takes care to offset the unwinded addresses to match
the fixed addresses of the DWARF sections whtn the platform uses
ASLR.
Skip implementation details in backtraces
Skips any call in the callstack that relates to creating the
exception, unwinding the stack or raising the exception, so the
first backtrace entry is the call that did raise an exception.

closes #3161
@ysbaddaden

This comment has been minimized.

Show comment
Hide comment
@ysbaddaden

ysbaddaden Nov 7, 2016

Member

Rebased on master with a clean commit history, and removed Debug::DWARF::Info and Debug::DWARF::Abbrev parsers, that will come in an upcoming pull request, where I'll replace the raw symbols for the nice function name stored in the .debug_aranges sections or from .debug_info if the former is missing (it happens).

Member

ysbaddaden commented Nov 7, 2016

Rebased on master with a clean commit history, and removed Debug::DWARF::Info and Debug::DWARF::Abbrev parsers, that will come in an upcoming pull request, where I'll replace the raw symbols for the nice function name stored in the .debug_aranges sections or from .debug_info if the former is missing (it happens).

require "tempfile"
describe "Backtrace" do
it "prints file line:colunm" do

This comment has been minimized.

@r-obert

r-obert Nov 7, 2016

line:column 馃槈

@r-obert

r-obert Nov 7, 2016

line:column 馃槈

@ysbaddaden ysbaddaden merged commit bdd3170 into crystal-lang:master Nov 18, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@ysbaddaden ysbaddaden deleted the ysbaddaden:core-backtrace-line-numbers branch Nov 18, 2016

@Sija

This comment has been minimized.

Show comment
Hide comment
@Sija

Sija Dec 7, 2016

Contributor

Backtrace lines for Proc are not consistent with the rest.

Instead of showing the file lineno:colno pattern after at, it shows @file:lineno string after method signature. See below:

0x106371d56: ~procProc(Nil)@/usr/local/Cellar/crystal-lang/0.20.1/src/http/server.cr:148 at ??
0x106361819: ~proc2Proc(Fiber, (IO::FileDescriptor | Nil))@/usr/local/Cellar/crystal-lang/0.20.1/src/fiber.cr:29 at ??
0x10636e3fb: ~proc4Proc(HTTP::Server::Context, String)@lib/kemal/src/kemal/route.cr:11 at ??
Contributor

Sija commented Dec 7, 2016

Backtrace lines for Proc are not consistent with the rest.

Instead of showing the file lineno:colno pattern after at, it shows @file:lineno string after method signature. See below:

0x106371d56: ~procProc(Nil)@/usr/local/Cellar/crystal-lang/0.20.1/src/http/server.cr:148 at ??
0x106361819: ~proc2Proc(Fiber, (IO::FileDescriptor | Nil))@/usr/local/Cellar/crystal-lang/0.20.1/src/fiber.cr:29 at ??
0x10636e3fb: ~proc4Proc(HTTP::Server::Context, String)@lib/kemal/src/kemal/route.cr:11 at ??
@asterite

This comment has been minimized.

Show comment
Hide comment
@asterite

asterite Dec 7, 2016

Contributor

That is actually the name of the proc that is being generated, and not part of the backtrace that @ysbaddaden is generating

Contributor

asterite commented Dec 7, 2016

That is actually the name of the proc that is being generated, and not part of the backtrace that @ysbaddaden is generating

@Sija

This comment has been minimized.

Show comment
Hide comment
@Sija

Sija Dec 7, 2016

Contributor

@asterite Would it be possible to move it into the backtrace part of the line?

Contributor

Sija commented Dec 7, 2016

@asterite Would it be possible to move it into the backtrace part of the line?

@RX14

This comment has been minimized.

Show comment
Hide comment
@RX14

RX14 Dec 7, 2016

Member

@Sija But it's part of the function symbol name, and we should display the exact function symbol names to aid using C debuggers and the likes.

Member

RX14 commented Dec 7, 2016

@Sija But it's part of the function symbol name, and we should display the exact function symbol names to aid using C debuggers and the likes.

@ysbaddaden

This comment has been minimized.

Show comment
Hide comment
@ysbaddaden

ysbaddaden Dec 7, 2016

Member

@Sija I suspect the compiler doesn't set the location when generating the LLVM IR for procs, because since it's a symbol, it should have a location in DWARF.

@RX14 actually we should display the function name from DWARF, instead of the mangled name. I don't know what name are given to procs, though.

Member

ysbaddaden commented Dec 7, 2016

@Sija I suspect the compiler doesn't set the location when generating the LLVM IR for procs, because since it's a symbol, it should have a location in DWARF.

@RX14 actually we should display the function name from DWARF, instead of the mangled name. I don't know what name are given to procs, though.

@asterite

This comment has been minimized.

Show comment
Hide comment
@asterite

asterite Dec 7, 2016

Contributor

Some time ago the name given to procs like ->{ ... } or captured blocks was ~proc1, ~proc2, etc. Not very descriptive. Now, since that's not very helpful, I decided to add their full location to the name. That way when they are shown in the backtrace you can trace back their location, even without compiling with -d. For example:

def foo
  a = ->{
    raise "OH NO!"
  }
  a.call
end

foo

Running it we get:

OH NO! (Exception)
0x108bbe662: *CallStack::unwind:Array(Pointer(Void)) at ??
0x108bbe601: *CallStack#initialize:Array(Pointer(Void)) at ??
0x108bbe5d8: *CallStack::new:CallStack at ??
0x108bba7a1: *raise<Exception>:NoReturn at ??
0x108bba781: *raise<String>:NoReturn at ??
0x108bbdcb3: ~procProc(NoReturn)@foo.cr:2 at ??
0x108bbdc87: *foo:NoReturn at ??
0x108bba348: __crystal_main at ??
0x108bbdb28: main at ??

Note how we can go back from the stack trace to every line in our program, except the line that effectively threw the exception. For methods, we'll, we just search def foo in our program in the corresponding type. For a proc, the info is right there.

We can change it back to ~procN, but then this info will be lost even without compiling with -d, so I'm not sure. I don't think those names bother much (on the contrary, I think they are very useful)

Contributor

asterite commented Dec 7, 2016

Some time ago the name given to procs like ->{ ... } or captured blocks was ~proc1, ~proc2, etc. Not very descriptive. Now, since that's not very helpful, I decided to add their full location to the name. That way when they are shown in the backtrace you can trace back their location, even without compiling with -d. For example:

def foo
  a = ->{
    raise "OH NO!"
  }
  a.call
end

foo

Running it we get:

OH NO! (Exception)
0x108bbe662: *CallStack::unwind:Array(Pointer(Void)) at ??
0x108bbe601: *CallStack#initialize:Array(Pointer(Void)) at ??
0x108bbe5d8: *CallStack::new:CallStack at ??
0x108bba7a1: *raise<Exception>:NoReturn at ??
0x108bba781: *raise<String>:NoReturn at ??
0x108bbdcb3: ~procProc(NoReturn)@foo.cr:2 at ??
0x108bbdc87: *foo:NoReturn at ??
0x108bba348: __crystal_main at ??
0x108bbdb28: main at ??

Note how we can go back from the stack trace to every line in our program, except the line that effectively threw the exception. For methods, we'll, we just search def foo in our program in the corresponding type. For a proc, the info is right there.

We can change it back to ~procN, but then this info will be lost even without compiling with -d, so I'm not sure. I don't think those names bother much (on the contrary, I think they are very useful)

@RX14

This comment has been minimized.

Show comment
Hide comment
@RX14

RX14 Dec 7, 2016

Member

@ysbaddaden I think we should have the option to display mangled names at least

Member

RX14 commented Dec 7, 2016

@ysbaddaden I think we should have the option to display mangled names at least

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment