Expose debug-id via API, or include in stacktrace #43274

Closed
bruno-garcia opened this issue Sep 1, 2020 · 11 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.


@bruno-garcia
Contributor

bruno-garcia commented Sep 1, 2020

Today, the way to symbolicate stack traces when the DWARF information is split from the library is to pass the stack trace string, such as:

$ cat stacktrace.txt

Warning: This VM has been configured to produce stack traces that violate the Dart standard.
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 29278, tid: 29340, name 1.ui
isolate_dso_base: 6fe9d64000, vm_dso_base: 6fe9d64000
isolate_instructions: 6fe9d74000, vm_instructions: 6fe9d66000
    #00 abs 0000006fe9f4e87b virt 00000000001ea87b _kDartIsolateSnapshotInstructions+0x1da87b
    #01 abs 0000006fe9f4e4a3 virt 00000000001ea4a3 _kDartIsolateSnapshotInstructions+0x1da4a3
    #02 abs 0000006fe9d83ca3 virt 000000000001fca3 _kDartIsolateSnapshotInstructions+0xfca3
    #03 abs 0000006fe9f06513 virt 00000000001a2513 _kDartIsolateSnapshotInstructions+0x192513
    #04 abs 0000006fe9f0b457 virt 00000000001a7457 _kDartIsolateSnapshotInstructions+0x197457
    #05 abs 0000006fe9f5150f virt 00000000001ed50f _kDartIsolateSnapshotInstructions+0x1dd50f
    #06 abs 0000006fe9d83d07 virt 000000000001fd07 _kDartIsolateSnapshotInstructions+0xfd07
    #07 abs 0000006fe9f06513 virt 00000000001a2513 _kDartIsolateSnapshotInstructions+0x192513
    #08 abs 0000006fe9f0b457 virt 00000000001a7457 _kDartIsolateSnapshotInstructions+0x197457
    #09 abs 0000006fe9f5150f virt 00000000001ed50f _kDartIsolateSnapshotInstructions+0x1dd50f
    #10 abs 0000006fe9d82fc7 virt 000000000001efc7 _kDartIsolateSnapshotInstructions+0xefc7
    #11 abs 0000006fe9d82eab virt 000000000001eeab _kDartIsolateSnapshotInstructions+0xeeab

To the native_stack_trace tool, for example:

 decode translate -d app.android-arm64.symbols -v -i stacktrace.txt
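For reference, the address columns in the trace above are related by simple arithmetic, inferred here from the printed values rather than from the tool's source: virt is abs minus isolate_dso_base, and the _kDartIsolateSnapshotInstructions offset is abs minus isolate_instructions. A quick Python check against frames #00 and #02 (decode is a hypothetical helper name):

```python
from typing import Tuple

# Header values copied from the stack trace above.
ISOLATE_DSO_BASE = 0x6FE9D64000
ISOLATE_INSTRUCTIONS = 0x6FE9D74000


def decode(abs_address: int) -> Tuple[int, int]:
    """Return (virt, offset into _kDartIsolateSnapshotInstructions) for an abs address."""
    virt = abs_address - ISOLATE_DSO_BASE
    insns_offset = abs_address - ISOLATE_INSTRUCTIONS
    return virt, insns_offset


for frame_abs in (0x0000006FE9F4E87B, 0x0000006FE9D83CA3):  # frames #00 and #02
    virt, insns_offset = decode(frame_abs)
    print(hex(virt), hex(insns_offset))
# prints:
# 0x1ea87b 0x1da87b
# 0x1fca3 0xfca3
```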

Even though this is very straightforward, it has some limitations:

  1. It assumes the developer knows exactly which file contains the correct DWARF information.
  2. The tool is built to parse the whole stack trace (including the header with the isolate/VM addresses), so it assumes you log the whole string.

To address the first point, I'd like to ask (or propose) that the stack trace include the relevant debug_id.
A debug ID was added to generated ELF files and their debug files (when split) through this change; it would be helpful if that debug ID were included in the stack trace.

native_stack_trace could even check against that debug_id to make sure it in fact received the correct file as a parameter and warn the user otherwise.

If that's not an option (i.e., you don't want to change the exception string format), please provide an API that we can query in Dart to get the debug ID. Crash reporting tools (like https://sentry.io) would use it to report which debug_id to use when symbolicating the stack trace on the server.

Point number 2 could be addressed by a more fine-grained API in the native_stack_trace package that returns a set of objects describing the stack trace (with frames and addresses). Or, ideally, a way to convert the stack trace at runtime into such a representation, to avoid having to parse the string on the client before reporting the frames to the server.
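For comparison, this is roughly the client-side string parsing that such an API would make unnecessary. A minimal Python sketch written against the text format shown at the top of this issue; Frame, NonSymbolicTrace and parse_trace are hypothetical names, not part of the native_stack_trace package:

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Frame:
    index: int
    absolute: int  # "abs" column: runtime address
    virtual: int   # "virt" column: address relative to the DSO base
    symbol: str    # e.g. "_kDartIsolateSnapshotInstructions"
    offset: int    # offset from that symbol


@dataclass
class NonSymbolicTrace:
    header: Dict[str, int] = field(default_factory=dict)
    frames: List[Frame] = field(default_factory=list)


# Frame lines, e.g. "#00 abs 0000006fe9f4e87b virt 00000000001ea87b _kDart...+0x1da87b"
FRAME_RE = re.compile(
    r"#(\d+)\s+abs\s+([0-9a-f]+)\s+virt\s+([0-9a-f]+)\s+(\S+)\+0x([0-9a-f]+)"
)
# Hex-valued header fields, e.g. "isolate_dso_base: 6fe9d64000, vm_dso_base: ..."
HEADER_RE = re.compile(r"(\w+_(?:base|instructions)): ([0-9a-f]+)\b")


def parse_trace(text: str) -> NonSymbolicTrace:
    trace = NonSymbolicTrace()
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            trace.frames.append(Frame(
                index=int(match.group(1)),
                absolute=int(match.group(2), 16),
                virtual=int(match.group(3), 16),
                symbol=match.group(4),
                offset=int(match.group(5), 16),
            ))
        else:
            # Lines such as "pid: ..." or the warning banner yield no matches.
            for key, value in HEADER_RE.findall(line):
                trace.header[key] = int(value, 16)
    return trace


SAMPLE = """\
isolate_dso_base: 6fe9d64000, vm_dso_base: 6fe9d64000
isolate_instructions: 6fe9d74000, vm_instructions: 6fe9d66000
    #00 abs 0000006fe9f4e87b virt 00000000001ea87b _kDartIsolateSnapshotInstructions+0x1da87b
    #01 abs 0000006fe9f4e4a3 virt 00000000001ea4a3 _kDartIsolateSnapshotInstructions+0x1da4a3
"""

trace = parse_trace(SAMPLE)
print(len(trace.frames), hex(trace.header["isolate_dso_base"]))  # 2 0x6fe9d64000
```

Every line of this depends on the exact string format, which is the fragility that a structured API would remove.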

This is a blocker for this Flutter issue.

@mraleph
Member

mraleph commented Sep 1, 2020

This looks like a reasonable request to me. I see that @sstrickl has already commented on the original Flutter issue - the main complication seems to be accessing build-id (LC_UUID) in iOS builds.

Tentatively assigning to @sstrickl

@mraleph mraleph added the area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. label Sep 1, 2020
@sstrickl
Contributor

Just a note: I have started on this, but it requires some rework of how we do an end-run around the embedder API for extra information that only exists for precompiled snapshots, and I still need to look into how to handle build ID generation for assembly snapshots. I'll update here if I run into any blockers (none expected at the moment, though).

@sstrickl
Contributor

sstrickl commented Sep 17, 2020

CL 163207 is now under review. That CL adds build IDs to non-symbolic stack traces for ELF-compiled snapshots.

Having build IDs be consistent between assembly snapshots and their separate ELF debugging information (by generating them ourselves for assembly output) remains to be done.

dart-bot pushed a commit that referenced this issue Sep 22, 2020
Since we've run out of room for more fields in the Image object header
on 64-bit architectures, the serializer instead creates an ImageHeader
object for precompiled snapshots that is placed at the start of text
segments. The new ImageHeader object contains the following information:

* The offset of the BSS segment from the text segment, previously
  stored in the Image object header.

* The relocated address of the text segment in the dynamic shared
  object. Due to restrictions when generating assembly snapshots, this
  field is only set for ELF snapshots, and so it can also be used to
  detect whether a snapshot was compiled to assembly or ELF.

* The offset of the build ID description field from the text segment.

* The length of the build ID description field.

We replace the BSS offset in the Image object header with the offset of
the ImageHeader object within the text segment, so that we can detect
when a given Image has an ImageHeader object available.

There are no methods available on ImageHeader objects, but instead the
Image itself controls access to the information. In particular, the
relocated address method either returns the relocated address
information from the ImageHeader object or from the initialized BSS
depending on the type of snapshot, so the caller need not do this work.
Also, instead of returning the raw offset to the BSS section and having
the caller turn that into an appropriate pointer, the method for
accessing the BSS segment now returns a pointer to the segment.

Bug: #43274
Cq-Include-Trybots: luci.dart.try:vm-precomp-ffi-qemu-linux-release-arm-try,vm-kernel-precomp-android-release-arm64-try,vm-kernel-precomp-android-release-arm_x64-try
Change-Id: I15eae4ad0a088260b127f3d07da79374215b7f56
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/163207
Commit-Queue: Tess Strickland <sstrickl@google.com>
Reviewed-by: Daco Harkes <dacoharkes@google.com>
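The accessor design described in the commit message above can be modeled in a few lines. This is a conceptual Python sketch only, not the VM's C++ code: field names follow the message, while using 0 to mean "relocated address not set" and the bss_relocated_address field for assembly snapshots are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ImageHeader:
    bss_offset: int         # offset of the BSS segment from the text segment
    relocated_address: int  # only set for ELF snapshots (0 stands for "unset" here)
    build_id_offset: int    # offset of the build ID description from the text segment
    build_id_length: int    # length of the build ID description


@dataclass(frozen=True)
class Image:
    text_start: int                 # address of the loaded text segment
    header: Optional[ImageHeader]   # None models "no ImageHeader available"
    bss_relocated_address: int = 0  # hypothetical: value initialized in the BSS at load time

    def compiled_to_elf(self) -> bool:
        return self.header is not None and self.header.relocated_address != 0

    def relocated_address(self) -> int:
        # The Image picks the right source, so callers need not know the snapshot type.
        if self.compiled_to_elf():
            return self.header.relocated_address
        return self.bss_relocated_address

    def bss(self) -> Optional[int]:
        # Returns an address (a "pointer") rather than a raw offset.
        if self.header is None:
            return None
        return self.text_start + self.header.bss_offset

    def build_id(self) -> Optional[Tuple[int, int]]:
        # (address, length) of the build ID description, if present.
        if self.header is None or self.header.build_id_length == 0:
            return None
        return self.text_start + self.header.build_id_offset, self.header.build_id_length


elf = Image(0x10000, ImageHeader(0x5000, 0x70000000, 0x40, 20))
asm = Image(0x10000, ImageHeader(0x5000, 0, 0, 0), bss_relocated_address=0x71000000)
print(hex(elf.relocated_address()), hex(asm.relocated_address()))  # 0x70000000 0x71000000
```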
@sstrickl
Contributor

The CL mentioned above has just landed (and I believe I've tested with enough trybots that I don't expect a revert, but of course we'll see if anything surprising shows up).

CL 163585 is a follow-up that does the same for assembly, including generating our own build ID section in assembly for ELF-native platforms. I plan to put it up for review now. It does leave open the question of what to do about snapshots that are split into multiple loading units (right now no build IDs are generated for those), but I've created #43516 for thinking about that, as I don't think that's a primary workflow yet.

@marandaneto

Hi. We'd like to know if there's anything we can help with. This is a blocker for flutter/flutter#59321, and we'd like to try symbolicating on the server as well. Let us know if we can be of any help. Thanks!

@sstrickl
Contributor

So just to clarify the current state, ELF snapshots should already include a build ID that's also reported in non-symbolic stack traces. Assembly ones do not, for two major reasons:

  • When assembling to ELF, clang and gcc both include their own GNU build ID unless specifically told not to.
  • When assembling to Mach-O, I don't know if there's a different standard for build IDs there.

The CL I mentioned above is still valid (though needs updating due to changes since), so I'll return to it now and see about getting it in. I think I'll actually split it into two parts:

  • An initial CL that adds the information to assembly snapshots to print a build ID in non-symbolic stack traces which matches a build ID section in the separately generated debugging information (which is always in ELF format currently and thus we can add the GNU build ID note section to it).
  • A followup CL that, for ELF-native targets (e.g., Linux), adds directives in the assembly to generate a GNU build ID note section with the same information.

With only the first, you can correlate non-symbolic stack traces with the separate debugging information for both ELF and assembly snapshots, but you can only correlate them with unstripped snapshots for direct-to-ELF snapshots. That might be enough for all needs, in which case the second CL may not be necessary.

If we do the second as well, there will still be some work that I'll need to coordinate with Flutter and internal customers to change the assembly process for snapshots in their toolchains so that it elides the assembler-added build ID. Until that happens, there'll be two build ID sections in each assembled ELF snapshot, only one of which will match the build ID reported in non-symbolic stack traces, and I'm not sure how well external tools will handle multiple build ID sections.

@jan-auer

jan-auer commented Nov 26, 2020

When assembling to Mach-O, I don't know if there's a different standard for build IDs there.

Compilers should place it in the LC_UUID load command. As the name suggests, it has to be a UUID, which differs from ELF, where either an MD5 or SHA1 hash of the code (.text section) is used. I've seen compilers produce reproducible UUIDs, which suggests they are digesting the code as well.

When dsymutil generates the dSYM structure, it also ensures that the LC_UUID header is copied into the dSYM file, along with all moved debug sections.

but you can only correlate them with unstripped snapshots for direct-to-ELF snapshots

Can you elaborate on this? As far as I know, most tools in this space recognize the build ID note (NT_GNU_BUILD_ID, reachable through a PT_NOTE program header) and section (.note.gnu.build-id). Both strip and objcopy ensure that these sections are never stripped but always copied over.
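For tools that want to read the build ID directly, the note layout is simple: three 32-bit words (namesz, descsz, type), followed by the name "GNU\0" and the descriptor, each padded to 4 bytes; NT_GNU_BUILD_ID is note type 3. A minimal Python sketch, assuming the caller already has the raw bytes of the .note.gnu.build-id section (or a PT_NOTE segment), a little-endian file, and 4-byte note alignment; find_gnu_build_id is a hypothetical name, and readelf -n prints the same value for cross-checking:

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # note type for GNU build IDs


def _align4(n: int) -> int:
    return (n + 3) & ~3


def find_gnu_build_id(notes: bytes, byteorder: str = "<") -> Optional[str]:
    """Return the GNU build ID (hex) from raw ELF note data, or None.

    Each note is namesz, descsz, type (three 32-bit words), then the name
    and the descriptor, each padded to a 4-byte boundary.
    """
    pos = 0
    while pos + 12 <= len(notes):
        namesz, descsz, note_type = struct.unpack_from(byteorder + "III", notes, pos)
        pos += 12
        name = notes[pos:pos + namesz]
        pos += _align4(namesz)
        desc = notes[pos:pos + descsz]
        pos += _align4(descsz)
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None


# Example: an ABI-tag note (type 1) followed by a 20-byte (SHA1-sized) build ID note.
abi_note = struct.pack("<III", 4, 16, 1) + b"GNU\x00" + bytes(16)
build_id_note = struct.pack("<III", 4, 20, NT_GNU_BUILD_ID) + b"GNU\x00" + bytes(range(20))
print(find_gnu_build_id(abi_note + build_id_note))
# 000102030405060708090a0b0c0d0e0f10111213
```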

I still have to look into how your assembly process works, so please forgive me if I'm making wrong assumptions. Ideally, your assembly process does not care about build ids and lets standard tooling like the linker take care of that. If you have custom tooling for this, then the only thing to do is ensure the sections do not get removed or get copied (depending on whether you strip or split).

Since the libraries are loaded into readable process memory at runtime, you need no further modifications to the binary. Usually, debuggers and crash reporting tools obtain a list of all loaded libraries that includes:

  • the path and name of the library
  • the memory address at which the library is loaded
  • the platform-dependent identifier

You can find examples of this in the Sentry Native SDK here: getsentry/sentry-native/src/modulefinder. We have actually started using this code for Flutter in the meantime, as it lets us easily symbolicate frames, even from third-party and system libraries, using the standard approach for native symbolication:

  • Take the absolute address reported by the Dart VM
  • Subtract the base address from the library the address points into
  • Look up the relative address in the DWARF information resolved using the build ID
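The first two steps amount to a range lookup over the loaded-module list followed by a subtraction. A minimal Python sketch; LoadedModule and to_relative are hypothetical names, the module list is made up apart from reusing isolate_dso_base from the trace at the top of this issue as the base of libapp.so, and the final DWARF lookup keyed by code_id is not shown:

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class LoadedModule:
    name: str     # path and name of the library
    base: int     # address at which the library is loaded
    size: int     # size of the loaded memory range
    code_id: str  # platform-dependent identifier (GNU build ID, LC_UUID, ...)


def to_relative(address: int, modules: Sequence[LoadedModule]) -> Optional[Tuple[LoadedModule, int]]:
    """Find the module containing an absolute address; return (module, relative address)."""
    for module in modules:
        if module.base <= address < module.base + module.size:
            return module, address - module.base
    return None


# Hypothetical module list (sizes and code IDs are placeholders).
modules = [
    LoadedModule("libapp.so", 0x6FE9D64000, 0x400000, "example-build-id-1"),
    LoadedModule("libflutter.so", 0x6FEA200000, 0x800000, "example-build-id-2"),
]

module, relative = to_relative(0x6FE9F4E87B, modules)  # abs address of frame #00
print(module.name, hex(relative), module.code_id)  # libapp.so 0x1ea87b example-build-id-1
```

The resulting 0x1ea87b matches the virt column that the VM printed for frame #00.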

@jan-auer

By the way, I'm happy to contribute to this if you like, but I may need some guidance on where to look in the Dart project.

I thought it might be nice to share an example of how the information could be reported and how it looks in Sentry. Let's start with the library list read with the code I linked above. It includes the name of the library, the absolute memory range, and the build ID, labeled "Code ID":

[Screenshot: list of loaded libraries in Sentry, showing each library's name, memory range, and Code ID]

It's important to note that this list is much longer and includes all loaded libraries. It's definitely worth exposing all of them in case the stack trace includes system frames, or calls to third-party native modules.

The reported and symbolicated stack trace then looks like this. We report the absolute addresses only:

[Screenshot: symbolicated stack trace in Sentry, reported with absolute addresses]

The corresponding relative addresses actually used for the lookup in debug information would then be these, for instance:

[Screenshot: the corresponding relative addresses used for the debug information lookup]

@yanivshaked

Hi @sstrickl,
Is there any progress on this issue?
Thanks

@fzyzcjy
Contributor

fzyzcjy commented Jul 25, 2021

Hi, are there any updates? Thanks!

@bruno-garcia
Contributor Author

Closing, since we already get the right debug files on each platform.

@bruno-garcia bruno-garcia closed this as not planned Mar 29, 2023