Skip to content
This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

Fix Issue 18068 - No file names and line numbers in stack trace #2172

Closed
wants to merge 3 commits into from

Conversation

wilzbach
Copy link
Member

@wilzbach wilzbach commented May 3, 2018

Turns out all that changed was PIE and the addresses with ASLR.

Example:

import std.exception;
void main() {
    enforce(0);
}

So we finally get our line number back in the stack trace on Linux:

> dmddev -g hello.d && ./hello
/home/seb/dlang/dmd/generated/linux/release/64/../../../../../phobos/std/exception.d:515 pure @safe void std.exception.bailOut!(Exception).bailOut(immutable(char)[], ulong, const(char[])) [0x555869fe]
/home/seb/dlang/dmd/generated/linux/release/64/../../../../../phobos/std/exception.d:436 pure @safe int std.exception.enforce!().enforce!(int).enforce(int, lazy const(char)[], immutable(char)[], ulong) [0x55586979]
hello.d:3 _Dmain [0x555868f7]

CC @JinShil @ZombineDev

This is a PoC because I'm not sure on the best way to get the base offset of the binary.

@wilzbach wilzbach requested a review from andralex as a code owner May 3, 2018 02:49
@dlang-bot
Copy link
Contributor

dlang-bot commented May 3, 2018

Thanks for your pull request, @wilzbach!

Bugzilla references

Auto-close Bugzilla Severity Description
18068 regression No file names and line numbers in stack trace

⚠️⚠️⚠️ Warnings ⚠️⚠️⚠️

To target stable perform these two steps:

  1. Rebase your branch to upstream/stable:
git rebase --onto upstream/stable upstream/master
  1. Change the base branch of your PR to stable

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + druntime#2172"

@dlang-bot dlang-bot added the Bug Fix Include reference to corresponding bugzilla issue label May 3, 2018
@JinShil
Copy link
Contributor

JinShil commented May 3, 2018

Nice work! I think you can use https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/baselib-dladdr-3.html

It's already bound in the druntime at

int dladdr(in void* __address, Dl_info* __info);

@wilzbach
Copy link
Member Author

wilzbach commented May 3, 2018

The behavior of dladdr() is only specified in dynamically linked programs.

However, using the first result from dl_iterate_phdr seems to work too.

@JinShil
Copy link
Contributor

JinShil commented May 3, 2018

Tested locally. DMD test17559 finally passes. In fact for the first time, the entire DMD test suite passes when run locally on my PC. Great!

@wilzbach wilzbach changed the title [PoC] Fix Issue 18068 - No file names and line numbers in stack trace Fix Issue 18068 - No file names and line numbers in stack trace May 3, 2018
@JinShil
Copy link
Contributor

JinShil commented May 3, 2018

Did a little more testing locally. This is working great.

@kinke
Copy link
Contributor

kinke commented May 3, 2018

I had a go at this too as one of the fixes in #2151. I was unsure about how to obtain the base address too and came up with cumbersome parsing of /proc/self/maps. Other differences:

  • As only the executable's DWARF lineinfo data is used, I check the ELF header whether it's a DSO, and only then use its base address.
  • I add the base address to the relative DWARF address, making sure the virtual address is unique, which may not be the case in this version here, where the base offset is subtracted from each frame address (incl. frames in other DSOs). If unlucky, you'll get wrong file/line infos for such frames. Erm nope that's not possible, there won't be any lineinfos for addresses outside the executable's memory region.

}
ElfObj elfObj;
dl_iterate_phdr(&dl_iterate_phdr_cb_ngc_tracehandler, &elfObj);
ElfW!"Addr" offset = elfObj.l_addr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto?

else
{
// non-ASLR fallback
size_t offset;
Copy link
Member

@ibuclaw ibuclaw May 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= 0 ?

@wilzbach
Copy link
Member Author

wilzbach commented May 4, 2018

I was unsure about how to obtain the base address too

FYI @MartinNowak and I looked at the implementation in FreeBSD and Linux and dl_iterate_phdr will always iterate over the main binary for the first iteration.

FreeBSD:

https://github.com/freebsd/freebsd/blob/d9ac9c210e73d5141195a5497d3e96dfc755d1dc/libexec/rtld-elf/rtld.c#L572
https://github.com/freebsd/freebsd/blob/d9ac9c210e73d5141195a5497d3e96dfc755d1dc/libexec/rtld-elf/rtld.c#L637
https://github.com/freebsd/freebsd/blob/d9ac9c210e73d5141195a5497d3e96dfc755d1dc/libexec/rtld-elf/rtld.c#L3723

Linux:

https://github.com/bminor/glibc/blob/15c19147a8361b7281519b9784b3b823c447cf7a/elf/dl-iteratephdr.c#L32
https://github.com/bminor/glibc/blob/master/elf/rtld.c#L1114

(LM_ID_BASE is always 0 - see https://github.com/bminor/glibc/blob/09533208febe923479261a27b7691abef297d604/dlfcn/dlfcn.h#L47)


I had a go at this too as one of the fixes in #2151

That's a shame to have this work duplicated :/

As only the executable's DWARF lineinfo data is used, I check the ELF header whether it's a DSO, and only then use its base address.

@MartinNowak recommended to only to use the executable code segment (and not the entire DSO), but apparently that currently fails on the CIs. I'm not sure that's actually necessary. Can the binary segment order really be dynamically changed by the loader?

@kinke
Copy link
Contributor

kinke commented May 4, 2018

FYI MartinNowak and I looked at the implementation in FreeBSD and Linux and dl_iterate_phdr will always iterate over the main binary for the first iteration.

Good, my main point was that the fix shouldn't be Linux-only.

That's a shame to have this work duplicated

I expected the bot to add something to the dlang issue. - Glad the maps-file parsing can be avoided, so it did pay off. :)

@MartinNowak recommended to only to use the executable code segment (and not the entire DSO), but apparently that currently fails on the CIs. I'm not sure that's actually necessary. Can the binary segment order really be dynamically changed by the loader?

No idea, but I'd have considered it unnecessary as well. My concern was that offsetting the addresses for non-relocatable executables may likely be wrong, but I see that you added the check now. - CI fails because of missing .sys in BSD imports.

I'd suggest applying the offset to the DWARF addresses in resolveAddresses() as I do. You don't have to touch the frame addresses this way, which are later used in the stack trace string, so you don't end up with invalid relative addresses for frames outside the executable & keep all frame addresses absolute.
You'll also have to implement (dummy) baseAddress() for new rt.backtrace.macho. And I recommend returning a void* or size_t instead of the platform-dependent ElfBaseAddress struct.

@wilzbach wilzbach force-pushed the debug_line-aslr branch 2 times, most recently from cee17a2 to 65cbc66 Compare May 7, 2018 14:47
@wilzbach wilzbach force-pushed the debug_line-aslr branch 2 times, most recently from 309ddda to a0abca3 Compare May 7, 2018 18:50
@wilzbach
Copy link
Member Author

wilzbach commented May 7, 2018

No idea, but I'd have considered it unnecessary as well. My concern was that offsetting the addresses for non-relocatable executables may likely be wrong, but I see that you added the check now.

I actually removed the check for now as it didn't seem to make a difference.

BTW I also added a test for shared libraries and it seems like debug symbols from shared libraries don't work at all (and have never worked). I still added the test, s.t. it's easier to fix this in a follow-up.

@@ -58,6 +58,62 @@ struct Image

return null;
}

@property size_t baseAddress()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an awful lot of logic for a property, and would be run needlessly if called multiple times by a caller that isn't aware of the implementation. The baseAddress shouldn't change during the lifetime of an Image, so wouldn't it be better to put this in the constructor, and store the result as a field?

@kinke
Copy link
Contributor

kinke commented May 16, 2018

BTW I also added a test for shared libraries and it seems like debug symbols from shared libraries don't work at all (and have never worked).

I see 2 remaining problems with the current backtrace code (in general, unrelated to this PR):

  • As mentioned above, only the executable's DWARF lineinfo section is scanned, so no file/line infos for frames in other DSOs.
  • The function names are derived from the dynamic symbols table (via backtrace_symbols()), requiring the ld flag -export-dynamic (added implicitly by DMD, not by LDC), which may significantly increase the binary file size. I guess DWARF debuginfos could be used here as well (if available via -g, falling back to backtrace_symbols() for release binaries).

@jacob-carlborg
Copy link
Contributor

What's the status of this?

@wilzbach
Copy link
Member Author

Sorry. I haven't had much time for this lately, but I will get to this once I am back from my hiking trip (~end of next week).
IIRC not much was missing here, shared libraries have never been supported for getting the debug info.

@jacob-carlborg
Copy link
Contributor

Sorry. I haven't had much time for this lately, but I will get to this once I am back from my hiking trip (~end of next week).

Why are you replying here during your hike 😃.

@wilzbach
Copy link
Member Author

(Because it takes about 12h to get there and luckily there's free roaming in Europe nowadays :) )

@JinShil
Copy link
Contributor

JinShil commented Jun 8, 2018

Ping! Is this ready to go? It'd be awesome to get this in the upcoming release.

@wilzbach
Copy link
Member Author

Superseded by #2230

@wilzbach wilzbach closed this Jun 26, 2018
@wilzbach wilzbach deleted the debug_line-aslr branch August 20, 2018 12:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Bug Fix Include reference to corresponding bugzilla issue
Projects
None yet
6 participants