DebugID for the auto generated obfuscation mapping file #51941

Closed
marandaneto opened this issue Apr 4, 2023 · 14 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.

@marandaneto

marandaneto commented Apr 4, 2023

Hi, thanks for all the hard work.

If you're compiling an AOT app, such as a Flutter app, with obfuscation enabled, you can't deobfuscate the runtime types (see caveat).

Example:

flutter build ios --split-debug-info=symbols --obfuscate

For tools such as sentry.io this is important because we'd like to offer the best user experience, but if the collected breadcrumbs, view hierarchy, etc. are all obfuscated, they are not very useful.

You can, however, export the mapping file needed to remap the obfuscated types by passing the extra param --extra-gen-snapshot-options=--save-obfuscation-map=mapping.json.

flutter build ios --split-debug-info=symbols --obfuscate --extra-gen-snapshot-options=--save-obfuscation-map=mapping.json

The problem is that right now you can't associate the version of the app with the generated file, since the version isn't unique enough.

For some other platforms, we can hook into the build pipeline and auto-generate a DebugID before uploading the mapping file to sentry.io, then inject the DebugID into the app so that it is also accessible at runtime. With this association, the obfuscated types can be remapped on the server.

The problem is that Dart/Flutter itself does not have build hooks and hooking up into each build tool of each platform isn't scalable/too complicated/fragile since new versions might break the build hooks, etc.

We request that the obfuscation tooling does that automatically, by generating and appending a unique/deterministic DebugID to the mapping file, plus that the Dart language gives a way to access the DebugID at runtime through an API (maybe the dart:developer package?).

import 'dart:developer';

String? get debugId;

We're trying to solve this problem with some other platforms as well, for example, source maps, see our proposal for more details.

More context on this thread as well.

@srawlins srawlins added the area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. label Apr 4, 2023
@srawlins
Member

srawlins commented Apr 4, 2023

Tentatively triaging to area-vm...

@mraleph
Member

mraleph commented May 9, 2023

I think this is a reasonable request - it's not entirely clear to me where this should be done though and what kind of UUID we should be attaching. One option here is to somehow tie this to build-id of the binary, but this requires some post processing because for Mach-O LC_UUID is currently generated by external toolchain.

/cc @sstrickl @mkustermann

@mkustermann
Member

It seems the purpose here is to de-obfuscate symbol names that were obtained at runtime (e.g. via <obj>.runtimeType.toString() or #<my-symbol-name>) rather than from symbolic stack traces.

As @mraleph points out, our AOT compiler has a mode where it outputs assembly and as such doesn't know at compilation time the build-id. I'd say that this is already a little problematic for stack traces, as they

  • in elf mode: include build_id in stack traces, making it easy to identify the binary & find corresponding offline symbol information
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 1531081, tid: 140380622812864, name DartWorker
os: linux arch: x64 comp: no sim: no
build_id: '4c9d4f6cf2c9b64423b1d561fd810cb6'
isolate_dso_base: 7face9ea4000, vm_dso_base: 7face9ea4000
isolate_instructions: 7face9f354c0, vm_instructions: 7face9f30000
    #00 abs 00007face9fe6002 virt 0000000000142002 _kDartIsolateSnapshotInstructions+0xb0b42
    #01 abs 00007face9f8cfe0 virt 00000000000e8fe0 _kDartIsolateSnapshotInstructions+0x57b20
    #02 abs 00007face9f8ca37 virt 00000000000e8a37 _kDartIsolateSnapshotInstructions+0x57577
  • in assembly mode: do not include build_id, leaving no way to go back to binary & symbolic information (purely from stack):
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 1530943, tid: 140060745266880, name DartWorker
os: linux arch: x64 comp: no sim: no
isolate_dso_base: 7f626fbf0000, vm_dso_base: 7f626fbf0000
isolate_instructions: 7f626fbf65c0, vm_instructions: 7f626fbf1100
    #00 abs 00007f626fca7102 _kDartIsolateSnapshotInstructions+0xb0b42
    #01 abs 00007f626fc4e0e0 _kDartIsolateSnapshotInstructions+0x57b20
    #02 abs 00007f626fc4db37 _kDartIsolateSnapshotInstructions+0x57577

=> Possibly we could make our runtime find the build_id from the loaded ELF/MachO file.
=> Then we could have a dart:developer API to return the build_id of the app
=> Whoever invokes the build can save the obfuscation map and associate it with the build_id

Doing this would make the non-symbolic stack traces in assembly mode better, provide programmatic access to build_id and avoid inventing another UUID.

If we used another UUID for the obfuscation map, we'd need to surface it at runtime via an API and also include it in the obfuscation map. As the current format of the obfuscation map is a simple array of strings, adding a UUID would break the format (or would need a hack: emitting an obfuscation entry that we misuse to encode the UUID).

@mraleph @sstrickl wdyt about making runtime find build_id?

@marandaneto
Author

Consider source maps as well, it'd be awesome if the generated source maps append the debugId, so we don't have to manually associate the file to the app's version which is problematic.

@sstrickl
Contributor

sstrickl commented Jun 2, 2023

So I've made CL 306640. In that CL, when printing out a non-symbolic stack trace, the runtime still uses the build ID information from the app isolate instructions image if that information is available. If not, the runtime then looks in the DSO loaded for the app isolate instructions for its build ID (ELF) or UUID (Mach-O) information and uses that instead.

I'll do a followup CL that exposes the build ID via dart:developer, so the build ID can be retrieved programmatically. For example, the built application can take a command line option which tells it to just print the build ID/UUID and exit, so the flow for building an application can retrieve the build ID that will be used at runtime (whether VM-generated or assembler-generated) and then save that information somewhere for later use.
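
As a rough illustration of the "print the build ID and exit" flow described above, here is a minimal sketch. It assumes the API ends up as NativeRuntime.buildId in dart:developer (the name reported later in this thread); the --print-build-id flag and the wantsBuildId helper are hypothetical names made up for this example and are not part of any Dart or Flutter tooling.

```dart
import 'dart:developer';

/// Hypothetical flag name, chosen only for this sketch.
const printBuildIdFlag = '--print-build-id';

/// Returns true when the caller only wants the build ID printed.
bool wantsBuildId(List<String> args) => args.contains(printBuildIdFlag);

void main(List<String> args) {
  if (wantsBuildId(args)) {
    // buildId is nullable: the runtime may not be able to determine one.
    print(NativeRuntime.buildId ?? 'unknown');
    return;
  }
  // ... normal application startup would go here ...
}
```

A build script could run the compiled executable once with that flag and store the printed value next to the obfuscation map. This only helps where the build machine can execute the binary.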

copybara-service bot pushed a commit that referenced this issue Jun 12, 2023
For direct-to-ELF snapshots, the story remains the same as before,
as we use the information from the Image header if available.

If it isn't, then we fall back to dladdr to get the dynamic shared
object containing the app snapshot and then walk the ELF or Mach-O
headers to find the build ID or UUID information.

TEST=vm/dart/use_dwarf_stack_traces_flag

Issue: #51941
Change-Id: I3705ed244d1b4a1255e75fffd238a29fc2a60800
Cq-Include-Trybots: luci.dart.try:vm-aot-dwarf-linux-product-x64-try,vm-aot-linux-debug-simarm_x64-try,vm-aot-linux-debug-x64-try,vm-aot-linux-release-x64-try,vm-aot-mac-product-arm64-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-aot-linux-product-x64-try,vm-aot-win-release-x64-try,vm-aot-win-product-x64-try,vm-aot-win-debug-x64c-try,vm-aot-android-release-arm_x64-try,vm-aot-android-release-arm64c-try,vm-fuchsia-release-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/306640
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Tess Strickland <sstrickl@google.com>
copybara-service bot pushed a commit that referenced this issue Jun 22, 2023
TEST=vm/dart/build_id

Issue: #51941
CoreLibraryReviewExempt: Native runtime only API
Change-Id: Ib3757480f0eab6d147385a87adf657f4f709ec4e
Cq-Include-Trybots: luci.dart.try:vm-aot-dwarf-linux-product-x64-try,vm-aot-linux-debug-simarm_x64-try,vm-aot-linux-debug-x64-try,vm-aot-linux-release-x64-try,vm-aot-mac-product-arm64-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-aot-linux-product-x64-try,vm-aot-win-release-x64-try,vm-aot-win-product-x64-try,vm-aot-win-debug-x64c-try,vm-aot-android-release-arm_x64-try,vm-fuchsia-release-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/307122
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Tess Strickland <sstrickl@google.com>
Reviewed-by: Lasse Nielsen <lrn@google.com>
@marandaneto
Author

> So I've made CL 306640. In that CL, when printing out a non-symbolic stack trace, the runtime still uses the build ID information from the app isolate instructions image if that information is available. If not, the runtime then looks in the DSO loaded for the app isolate instructions for its build ID (ELF) or UUID (Mach-O) information and uses that instead.
>
> I'll do a followup CL that exposes the build ID via dart:developer, so the build ID can be retrieved programmatically. For example, the built application can take a command line option which tells it to just print the build ID/UUID and exit, so the flow for building an application can retrieve the build ID that will be used at runtime (whether VM-generated or assembler-generated) and then save that information somewhere for later use.

I've checked the beta channel and can see the NativeRuntime.buildId, thank you for that.
I'm afraid that the suggested approach wouldn't work for Mobile at least.
Mobile apps can't execute a CLI command and return something AFAIK, APK is just a zip file.

Right now I'm able to read the debugId at runtime but not after the build, what's the chance that this debugId gets injected into the generated file (--save-obfuscation-map=mapping.json), or even outputting a new file that contains the debugId?

What we'd need is a way to retrieve the debugId after the app is compiled since Flutter does not have build hooks, so I can upload the mapping.json and associate it with the debugId.

Also, is there a way to generate a debugId for Flutter web as well? We'd need that to symbolicate class types there too.
Another option is an extra build parameter that avoids obfuscation of class types; JS, for example, does not minify class types by default.
E.g. flutter build web --source-maps --no-class-types-obfuscation

Let me know your thoughts.

@marandaneto
Author

Actually, I found a way to match the buildId after being compiled.

❯ sentry-cli debug-files check app.android-arm64.symbols
Debug Info File Check
Type: elf debug companion
Contained debug identifiers:
> Debug ID: 6ff64056-41c6-7a20-4067-e263a1b01ced
Code ID: 5640f66fc641207a4067e263a1b01ced
Arch: arm64
Contained debug information:
> symtab, debug
Usable: yes

Using sentry-cli's check command, the Code ID matches the NativeRuntime.buildId at runtime.

So sentry-cli can associate the generated file (--extra-gen-snapshot-options=--save-obfuscation-map=mapping.json) with the Code ID from the debug symbols.
Is that always the case @sstrickl ?
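
In the sample output above, the Debug ID and the Code ID look like the same 16 bytes, with the first three fields byte-swapped in the usual little-endian GUID layout. The sketch below reproduces that relationship for the values shown here. It is inferred from this one sample only, not taken from sentry-cli's source, and the function names are made up for illustration.

```dart
/// Reverses the byte order of a hex string, e.g. '5640f66f' -> '6ff64056'.
String swapBytes(String hex) {
  final bytes = [
    for (var i = 0; i < hex.length; i += 2) hex.substring(i, i + 2),
  ];
  return bytes.reversed.join();
}

/// Formats the first 16 bytes of a hex build ID as a little-endian GUID.
/// Assumption: this mirrors how the Debug ID above relates to the Code ID.
String debugIdFromCodeId(String codeId) {
  if (codeId.length < 32) {
    throw ArgumentError('expected at least 16 bytes of hex');
  }
  final h = codeId.substring(0, 32).toLowerCase();
  return [
    swapBytes(h.substring(0, 8)),
    swapBytes(h.substring(8, 12)),
    swapBytes(h.substring(12, 16)),
    h.substring(16, 20),
    h.substring(20, 32),
  ].join('-');
}

void main() {
  // Code ID copied from the sentry-cli output above.
  print(debugIdFromCodeId('5640f66fc641207a4067e263a1b01ced'));
  // prints "6ff64056-41c6-7a20-4067-e263a1b01ced"
}
```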

@sstrickl
Contributor

I would assume so, since I assume sentry-cli is using the same standard ELF note segment that we are to retrieve this information. The only issue would be if multiple build ID note segments were created (so you'd have to figure out which is the "right" one re: what the Dart VM would pick), but since we only generate one ourselves if compiling direct to ELF and otherwise let the assembler handle generating the build ID (which should only generate one), it should be safe to depend on any tool for ELF files that produces a build ID, like sentry-cli or readelf:

$ readelf -n bar.so

Displaying notes found in: .note.gnu.build-id
  Owner                Data size 	Description
  GNU                  0x00000010	NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 852bbe58cb4f654b7165bb316e1a3528
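
For a build script that wants the value from the readelf output above without manual copying, a small parser is enough. This is a sketch under assumptions: parseBuildId and readBuildId are hypothetical helpers, readelf is assumed to be on the PATH, and the output is assumed to contain a "Build ID:" line in the format shown above.

```dart
import 'dart:io';

/// Extracts the hex build ID from `readelf -n` output, or null if absent.
String? parseBuildId(String readelfOutput) {
  final match =
      RegExp(r'Build ID:\s*([0-9a-fA-F]+)').firstMatch(readelfOutput);
  return match?.group(1)?.toLowerCase();
}

/// Runs `readelf -n` on [path] (assumes readelf is installed).
String? readBuildId(String path) {
  final result = Process.runSync('readelf', ['-n', path]);
  if (result.exitCode != 0) return null;
  return parseBuildId(result.stdout as String);
}

void main() {
  // Sample text copied from the readelf output above.
  const sample = '''
Displaying notes found in: .note.gnu.build-id
  Owner                Data size   Description
  GNU                  0x00000010  NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 852bbe58cb4f654b7165bb316e1a3528
''';
  print(parseBuildId(sample)); // prints "852bbe58cb4f654b7165bb316e1a3528"
}
```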

@marandaneto
Author

@sstrickl thanks.
So the only missing bit to unblock full symbolication of class types is an identifier (e.g. similar to build_id) for Flutter web.
Right now we cannot associate the compiled Flutter web app and the generated mapping file.

Any plans or ideas on how to tackle this?

@mkustermann
Member

> So the only missing bit to unblock full symbolication of class types is an identifier (e.g. similar to build_id) for Flutter web.
> Right now we cannot associate the compiled Flutter web app and the generated mapping file.
>
> Any plans or ideas on how to tackle this?

Is the code that prints the names of classes (e.g. via <obj>.runtimeType.toString()) Dart code? Do those prints eventually end up in some logging system?

What one can do right now is e.g.

import 'dart:developer';

main() {
  log('App startup, build-id: ${NativeRuntime.buildId}');
  ...
  foo(...);
}
void foo(...) {
  log('Got object ${object.runtimeType}');
  try {
    ...
  } catch (e, s) {
    log('Got exception ${e.runtimeType}');
  }
}

One can then post-process all this logging information: find the printed build-id, get the obfuscation map for that build-id, then replace all obfuscated names in the logs with the unobfuscated names (e.g. regexp-replace, or a simple tokenizer & replacing, ...).

It wouldn't be precise, as one may intentionally print a string that also happens to match an obfuscated symbol name, but the probability of that happening is quite low, and for logs this edge case may not matter at all.
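
The post-processing step could look roughly like the sketch below. It assumes the obfuscation map is a flat JSON array that alternates original name and obfuscated name, which is how the Flutter obfuscation documentation describes it; the helper names and the sample data are made up for illustration.

```dart
import 'dart:convert';

/// Builds an obfuscated -> original lookup from the map file contents.
/// Assumes the flat array alternates [original, obfuscated, ...].
Map<String, String> parseObfuscationMap(String json) {
  final entries = (jsonDecode(json) as List).cast<String>();
  final lookup = <String, String>{};
  for (var i = 0; i + 1 < entries.length; i += 2) {
    lookup[entries[i + 1]] = entries[i];
  }
  return lookup;
}

/// Replaces every identifier-like token that appears in [lookup].
/// Tokens that are not in the map are left untouched.
String deobfuscate(String log, Map<String, String> lookup) {
  final token = RegExp(r'[A-Za-z_$][A-Za-z0-9_$]*');
  return log.replaceAllMapped(token, (m) => lookup[m[0]!] ?? m[0]!);
}

void main() {
  // Made-up sample map, not taken from a real build.
  final lookup =
      parseObfuscationMap('["MaterialApp", "ex", "Scaffold", "ey"]');
  print(deobfuscate('Got object ex', lookup));
  // prints "Got object MaterialApp"
}
```

As noted above, a whole-token replacement like this will also rewrite an ordinary word that happens to equal an obfuscated name.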

@marandaneto would that work for you?

@marandaneto
Author

Hi @mkustermann yes, we'd like to symbolicate the minified names gotten by <obj>.runtimeType.toString().
Getting the NativeRuntime.buildId isn't a problem (at runtime), nor is doing the symbolication (e.g. regexp-replace, or a simple tokenizer & replacing, ...).
The problem is to associate the log with a given buildId and the generated mapping file.
If I have multiple releases of an app, I also have multiple mapping files, so which one do I look for?

The solution would be that the mapping file would contain the buildId that matches the NativeRuntime.buildId.
Or I'd be able to read the NativeRuntime.buildId somehow (via compilation hooks, or similarly to AOT apps using sentry-cli/readelf), so I can make the association of the mapping file myself during the upload process (Think about 3rd party solutions that upload those mapping files from multiple releases).

The problem is that a compiled version of a Flutter web app does not have any reference to the buildId.

@mkustermann
Member

mkustermann commented Jul 24, 2023

> The problem is to associate the log with a given buildId and the generated mapping file.
> If I have multiple releases of an app, I also have multiple mapping files, so which one do I look for?

The log will contain obfuscated symbols + build-id needed to symbolize (similar to our stack traces).
When creating the app, one gets an ELF/MachO file containing the build-id, along with the obfuscation map and the debug symbols file. These have to be preserved somewhere, e.g. in a database storing the debug symbol file and obfuscation map for each app version. The database can use the build-id as the key for looking up both files.
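
A minimal in-memory sketch of such a store, keyed by build-id, is shown below. BuildArtifacts and ArtifactStore are hypothetical names for this example; a real service would persist the entries rather than keep them in a map.

```dart
/// Files produced by one build (paths are illustrative).
class BuildArtifacts {
  const BuildArtifacts({
    required this.symbolsPath,
    required this.obfuscationMapPath,
  });

  final String symbolsPath;
  final String obfuscationMapPath;
}

/// Maps a build-id to the artifacts saved for that build.
class ArtifactStore {
  final _byBuildId = <String, BuildArtifacts>{};

  void register(String buildId, BuildArtifacts artifacts) {
    // Normalise case so lookups do not depend on hex formatting.
    _byBuildId[buildId.toLowerCase()] = artifacts;
  }

  BuildArtifacts? lookup(String buildId) => _byBuildId[buildId.toLowerCase()];
}

void main() {
  final store = ArtifactStore();
  // Build-id and file names reuse the examples from earlier in this thread.
  store.register(
    '5640f66fc641207a4067e263a1b01ced',
    const BuildArtifacts(
      symbolsPath: 'app.android-arm64.symbols',
      obfuscationMapPath: 'mapping.json',
    ),
  );
  print(store.lookup('5640F66FC641207A4067E263A1B01CED')?.obfuscationMapPath);
  // prints "mapping.json"
}
```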

So maybe the issue here is that you have trouble getting the build-id out of the generated binary at compile time. There are native tools that can be used for this (e.g. objdump -s -j .note.gnu.build-id /bin/ls). But as we already have ELF/MachO parsing code in package:native_stack_traces, we could add a tool there to dump the build-id to avoid a dependency on the native tools, e.g.

% pub global activate native_stack_traces
% pub global run native_stack_traces:dump_build_id libapp.so

or make programmatic API in package:native_stack_traces, that you can call. Would that be helpful?

> The problem is that a compiled version of a Flutter web app does not have any reference to the buildId.

@marandaneto could you open a separate issue for web support?

@marandaneto
Author

marandaneto commented Jul 24, 2023

@mkustermann Sure, for AOT compiled apps, we have all we need, the given API NativeRuntime.buildId was the only missing bit, thanks for that.

I will open an issue for Flutter Web support.
This issue could be closed as done since there are no blockers anymore for non-Web apps.

@marandaneto
Author

@mkustermann #53027
