reduce end-to-end AOT compilation time for large applications #43299

Open
mraleph opened this issue Sep 2, 2020 · 16 comments
Labels
area-front-end, area-vm, customer-google3, Epic

Comments

@mraleph
Member

mraleph commented Sep 2, 2020

This is an umbrella issue to track efforts to reduce the critical path when building large Flutter applications in AOT mode.

@mraleph added the area-vm, Epic, and area-front-end labels Sep 2, 2020
@alexmarkov
Contributor

This looks like a duplicate of #42442.

@mraleph
Member Author

mraleph commented Sep 2, 2020

@alexmarkov this is more of an umbrella issue, not limited to just TFA but to general end-to-end performance.

dart-bot pushed a commit that referenced this issue Sep 25, 2020
This change improves TFA speed by adding
* Cache of dispatch targets.
* Identical types fast path to union and intersection of set and cone
  types.
* Subset fast path in the union of set types.
* More efficient ConcreteType.raw.

AOT step 2 (TFA):
app #1: 200s -> 140s (-30%)
app #2: 208s -> 150s (-27%)

Issue: #42442
Issue: #43299
b/154155290

Change-Id: Ie9039a6448a7655d2aed5f5260473c28b1d917d9
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/164480
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
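
A minimal Python sketch of two of the ideas listed in the commit message above: a cache of dispatch targets and the identical/subset fast paths in the union of set types. The real TFA is written in Dart; `union_sets`, `DispatchCache`, and the frozenset-of-class-ids representation are illustrative assumptions, not the SDK's code.

```python
from typing import Callable, Dict, FrozenSet, Tuple


def union_sets(a: FrozenSet[int], b: FrozenSet[int]) -> FrozenSet[int]:
    """Union of two set types, each modelled as a frozenset of class ids."""
    # Identical-types fast path: same object, nothing to compute.
    if a is b:
        return a
    # Subset fast path: reuse the larger operand instead of allocating.
    if len(a) <= len(b):
        if a <= b:
            return b
    elif b <= a:
        return a
    # Slow path: build a new set.
    return a | b


class DispatchCache:
    """Memoizes (class id, selector) -> target lookups."""

    def __init__(self, resolve: Callable[[int, str], str]) -> None:
        self._resolve = resolve  # hypothetical slow lookup function
        self._cache: Dict[Tuple[int, str], str] = {}
        self.misses = 0

    def target(self, class_id: int, selector: str) -> str:
        key = (class_id, selector)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._resolve(class_id, selector)
        return self._cache[key]
```

Both tricks avoid repeated work on inputs that recur many times during a whole-program analysis, which is presumably where the reported savings come from.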
dart-bot pushed a commit that referenced this issue Oct 16, 2020
…uppress extra frontend_server output

Front-end server prints all dependencies after compilation, which could
result in a lot of output when AOT compiling a large application.

This change adds --no-print-incremental-dependencies option which
suppresses extra output. This skips printing dependencies which
takes time and avoids I/O which may be blocked.

Issue: #43299
Change-Id: I7779d3b5f1b513c2370978a5384a71cff371f017
b/154155290
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/167860
Reviewed-by: Alexander Aprelev <aam@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
dart-bot pushed a commit that referenced this issue Jan 19, 2021
In certain cases involving auto-generated Dart sources there could be a
huge number of allocated classes which are subtypes of a certain class.
Specializing such cone types to set types, as well as intersection and
union operations on such types may be slow and may severely affect
compilation time. Also, gradually discovering allocated classes in
such cone types may cause a lot of invalidations during analysis.

In order to avoid severe degradation of compilation time in such cases,
this change adds WideConeType which works like a ConeType when number of
allocated classes is large, but it doesn't specialize to a SetType and
has more efficient but approximate implementation of union and
intersection.

Uncompressed size of Flutter gallery AOT snapshot (android/arm64):
WideConeType approximation for types with
>32 allocated subtypes: +0.1176%
>64 allocated subtypes: +0.0956%
>128 allocated subtypes: +0.0027%

For now conservative approximation is used when number of allocated
types >128.

TFA time of large app #1: 175s -> 119s (-32%)
TFA time of large app #2: 211s -> 81s (-61.6%)
Snapshot size changes are insignificant.

TEST=Stress tested on precomp bots with
maxAllocatedTypesInSetSpecializations = 3 and 0.

Issue: #42442
Issue: #43299
Change-Id: Idae33205ddda81714e4aeccc7ae292e0164be651
b/154155290, b/177498788, b/177497864
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/179200
Commit-Queue: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Reviewed-by: Aske Simon Christensen <askesc@google.com>
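
The thresholding described in the commit message above can be sketched roughly as follows. This is a simplified Python model, not the Dart implementation: `SetType` and `WideConeType` mirror names from the commit message, while `specialize_cone` and the string class names are hypothetical. The approximate union and intersection of wide cones are not modelled here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Union

# Threshold taken from the commit message (conservative approximation
# when the number of allocated types is > 128).
MAX_ALLOCATED_TYPES_IN_SET_SPECIALIZATIONS = 128


@dataclass(frozen=True)
class SetType:
    """Precise type: an explicit set of allocated classes."""
    classes: FrozenSet[str]


@dataclass(frozen=True)
class WideConeType:
    """Approximate type: 'any subtype of base', never enumerated."""
    base: str


def specialize_cone(
    base: str,
    allocated_subtypes: Iterable[str],
    limit: int = MAX_ALLOCATED_TYPES_IN_SET_SPECIALIZATIONS,
) -> Union[SetType, WideConeType]:
    """Specialize a cone type, staying approximate when it is too wide."""
    subtypes = frozenset(allocated_subtypes)
    if len(subtypes) > limit:
        # Too many allocated subtypes: skip the expensive enumeration.
        return WideConeType(base)
    return SetType(subtypes)
```

The trade-off is the one reported in the commit message: the wide cone loses some precision (slightly larger snapshots) in exchange for avoiding large set operations and repeated invalidations.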
dart-bot pushed a commit that referenced this issue Sep 13, 2021
* When testing for pragmas in the inliner, call function.has_pragma()
  early to avoid more expensive Library::FindPragma query.

* When scanning through object pool entries in
  Precompiler::AddCalleesOfHelper, skip over OneByteString and null
  objects quickly. They are leaf and there could be a huge number of
  those objects.

AOT gen_snapshot time of a large Flutter application built in
flutter/release mode for arm64 (best of 5 runs):
Before: 81.589s
After:  74.415s (-8.79%)

TEST=ci

Issue: #43299
Change-Id: I960451c73b42dab9845f0e0eafacaa9bb23720e3
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/213288
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
@dnfield
Contributor

dnfield commented Oct 25, 2021

One way to do this would be to support breaking things up into smaller increments of compilation.

This would allow for fanning out compilation steps to many different worker machines, and would also potentially help with incrementally compiling smaller changes between builds.

I'm more interested in faster incremental builds for the sake of performance tuning/debugging - right now, changing a single line of code requires repeating the entire process of kernel compiling/tfa/snapshotting, which can be very expensive on large codebases.

@mraleph
Member Author

mraleph commented Oct 25, 2021

We could parallelize gen_snapshot step in a relatively straightforward fashion though it would potentially require us to forgo some of the late stage tree shaking (might not be that relevant for code quality after all).

Parallelizing TFA is an open problem - it is an intrinsically non-modular global step. You should compare it to similar global optimization steps like LTO in LLVM or ProGuard et al.

@xster
Contributor

xster commented Oct 25, 2021

Google concerns: b/203690870

@dnfield
Contributor

dnfield commented Oct 25, 2021

Perhaps something like what LLVM did with ThinLTO would be applicable for TFA? http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html

copybara-service bot pushed a commit that referenced this issue Jan 17, 2022
…eduplication

On a large Flutter app, compiled in release mode for arm64:
Total gen_snapshot time 112.762s -> 89.595s (-20.5%)
(Dedup pass 34.58s -> 11.03s)

TEST=ci

Issue: #43299
Bug: b/154155290
Change-Id: If5ce4cf6a26e4a0300de6bc1864854f4deedffa3
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/228281
Auto-Submit: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Slava Egorov <vegorov@google.com>
copybara-service bot pushed a commit that referenced this issue Jan 21, 2022
This change improves hash code implementations in multiple places in
the compiler. That reduces number of probes during lookups in hash maps
and improves AOT compilation time of large applications.

On a large Flutter app, compiled in release mode for arm64:
Total gen_snapshot time 89.184s -> 60.736s (-31.9%)

Also, this change adds --hash_map_probes_limit=N option which sets
a hard limit for the number of probes in hash maps. This option
makes it easy to find hash maps where there are many collisions
due to poor hash code implementation.

TEST=ci

Issue: #43299
Bug: b/154155290
Change-Id: Ibf6f37d4b9f3bf42dd6731bfb4095a7305b98b2d
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/229240
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
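
To illustrate why a probe limit such as `--hash_map_probes_limit=N` helps locate poor hash codes, here is a small Python sketch of an open-addressing set with linear probing. The class and exception names are hypothetical, and the VM's hash maps are implemented differently; only the idea of counting probes and failing past a limit is taken from the commit message above.

```python
from typing import Callable, Hashable, List, Optional


class ProbeLimitExceeded(Exception):
    """Raised when a single operation needs more probes than allowed."""


class ProbingSet:
    """Fixed-capacity open-addressing set with linear probing."""

    def __init__(
        self,
        capacity: int,
        hash_fn: Callable[[Hashable], int],
        probes_limit: Optional[int] = None,
    ) -> None:
        self._slots: List[Optional[Hashable]] = [None] * capacity
        self._hash = hash_fn
        self._limit = probes_limit
        self.max_probes = 0  # worst case observed so far

    def insert(self, key: Hashable) -> int:
        """Insert key and return the number of probes it took."""
        n = len(self._slots)
        index = self._hash(key) % n
        for probes in range(1, n + 1):
            self.max_probes = max(self.max_probes, probes)
            if self._limit is not None and probes > self._limit:
                raise ProbeLimitExceeded(f"{probes} probes for {key!r}")
            if self._slots[index] is None or self._slots[index] == key:
                self._slots[index] = key
                return probes
            index = (index + 1) % n  # collision: try the next slot
        raise RuntimeError("table is full")
```

With a well-distributed hash every insert takes about one probe; with a degenerate hash (for example one that always returns 0) the probe count grows with the number of elements, and a hard limit turns that silent slowdown into an immediate, easy-to-find failure.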
copybara-service bot pushed a commit that referenced this issue Jan 26, 2022
…t hashes

This change improves a few implementations of hashcodes in compiler.
Slightly improves AOT compilation time:
gen_snapshot 60.736s -> 58.920s (-2.9%)
(on a large Flutter app, compiled in release mode for arm64).

Also, the same large app can be now compiled with
--hash_map_probes_limit=1500, meaning that all hash maps in
the compiler perform less than 1500 probes when looking for an
element.

This change also adds a test which verifies that kernel compiler
itself can be compiled with --hash_map_probes_limit=1000.
This is a sanity check to ensure we do not have a very
badly distributed hashcode in the compiler.

TEST=ci

Issue: #43299
Change-Id: I7a802709727a33760c4f1d13f7b2c8cb263852d7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/229940
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
copybara-service bot pushed a commit that referenced this issue Aug 18, 2022
This change adds binary serialization/deserialization of flow graphs.
It supports all IL instructions and certain objects which can be
referenced from IL instructions. IL binary serialization is a useful
mechanism which would allow us to split compilation into multiple parts
in order to parallelize AOT compilation.

The program structure (libraries/classes/functions/fields) is not
serialized. It is assumed that reader and writer use the same
program structure.

Caveats:
* FFI callbacks are not supported yet.
* Closure functions are not re-created when reading flow graph.
* Flow graph should be in SSA form (unoptimized flow graphs are not
  supported).
* JIT mode is not supported (serializer currently assumes lazy
  linking of native methods and empty ICData).

In order to test IL serialization, --test_il_serialization VM option is
added to serialize and deserialize flow graph before generating code.

TEST=vm/dart/splay_test now runs with --test_il_serialization.

TEST=Manual run of vm-kernel-precomp-linux-debug-x64-try with
--test_il_serialization enabled (only ffi tests failed).

Issue: #43299
Change-Id: I7bbfd9e3a301e00c9cfbffa06b8f1f6c78a78470
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/254941
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
copybara-service bot pushed a commit that referenced this issue Aug 18, 2022
This reverts commit 9700458.

Reason for revert: breaks Dart SDK build using gcc and dart-sdk-linux-main bot.

Original change's description:
> [vm/compiler] Initial implementation of IL binary serialization
>
> This change adds binary serialization/deserialization of flow graphs.
> It supports all IL instructions and certain objects which can be
> referenced from IL instructions. IL binary serialization is a useful
> mechanism which would allow us to split compilation into multiple parts
> in order to parallelize AOT compilation.
>
> The program structure (libraries/classes/functions/fields) is not
> serialized. It is assumed that reader and writer use the same
> program structure.
>
> Caveats:
> * FFI callbacks are not supported yet.
> * Closure functions are not re-created when reading flow graph.
> * Flow graph should be in SSA form (unoptimized flow graphs are not
>   supported).
> * JIT mode is not supported (serializer currently assumes lazy
>   linking of native methods and empty ICData).
>
> In order to test IL serialization, --test_il_serialization VM option is
> added to serialize and deserialize flow graph before generating code.
>
> TEST=vm/dart/splay_test now runs with --test_il_serialization.
>
> TEST=Manual run of vm-kernel-precomp-linux-debug-x64-try with
> --test_il_serialization enabled (only ffi tests failed).
>
> Issue: #43299
> Change-Id: I7bbfd9e3a301e00c9cfbffa06b8f1f6c78a78470
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/254941
> Reviewed-by: Ryan Macnak <rmacnak@google.com>
> Commit-Queue: Alexander Markov <alexmarkov@google.com>
> Reviewed-by: Slava Egorov <vegorov@google.com>

TBR=vegorov@google.com,kustermann@google.com,rmacnak@google.com,alexmarkov@google.com

Change-Id: Iae4e4868f183815a8fc3cd79597141b3896e23d7
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Issue: #43299
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/255780
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
copybara-service bot pushed a commit that referenced this issue Aug 22, 2022
This is a reland of commit 9700458

Original change's description:
> [vm/compiler] Initial implementation of IL binary serialization
>
> This change adds binary serialization/deserialization of flow graphs.
> It supports all IL instructions and certain objects which can be
> referenced from IL instructions. IL binary serialization is a useful
> mechanism which would allow us to split compilation into multiple parts
> in order to parallelize AOT compilation.
>
> The program structure (libraries/classes/functions/fields) is not
> serialized. It is assumed that reader and writer use the same
> program structure.
>
> Caveats:
> * FFI callbacks are not supported yet.
> * Closure functions are not re-created when reading flow graph.
> * Flow graph should be in SSA form (unoptimized flow graphs are not
>   supported).
> * JIT mode is not supported (serializer currently assumes lazy
>   linking of native methods and empty ICData).
>
> In order to test IL serialization, --test_il_serialization VM option is
> added to serialize and deserialize flow graph before generating code.

TEST=vm/dart/splay_test now runs with --test_il_serialization.
TEST=Manual run of vm-kernel-precomp-linux-debug-x64-try with
--test_il_serialization enabled (only ffi tests failed).
TEST=gcc build on dart-sdk-linux-try bot.

Issue: #43299

> Change-Id: I7bbfd9e3a301e00c9cfbffa06b8f1f6c78a78470
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/254941
> Reviewed-by: Ryan Macnak <rmacnak@google.com>
> Commit-Queue: Alexander Markov <alexmarkov@google.com>
> Reviewed-by: Slava Egorov <vegorov@google.com>

Change-Id: I64ff9747f761496a096371e490ef070a14023256
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/255840
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
copybara-service bot pushed a commit that referenced this issue Aug 26, 2022
TEST=tests/ffi/function_callbacks_test.dart
TEST=Manual run of vm-kernel-precomp-linux-debug-x64-try with --test_il_serialization enabled.

Issue: #43299
Change-Id: Ia57021d9091e8a80de3645cb4723ebdbb5a3d33d
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/256371
Commit-Queue: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Daco Harkes <dacoharkes@google.com>
@quyenvsp

Hi,

Could you suggest a way to debug the AOT build? Our banking application takes too long to build in AOT mode.

It gets stuck at the frontend_server.dart.snapshot step.

debug without AOT:
51 seconds (with 16 Gb RAM)

release with AOT:
51 minutes (with 16 Gb RAM)
31 minutes (with 32 Gb RAM)

The problem is that frontend_server.dart.snapshot --verbose doesn't print anything helpful, so we cannot tell where the problem is. Is it possible to print which Dart file is being processed, so we can see where it gets stuck?

Thank you.

@mraleph
Member Author

mraleph commented Sep 22, 2022

@quyenvsp which Flutter version are you on? (could you post flutter doctor -v?).

It is probably hitting some pathological case in the TFA.

Currently the best way to diagnose that requires a bit of manual work - it's described in #47546 (comment)

  1. Get the Dart SDK sources (as explained at dart-lang/sdk/wiki/Building#getting-the-source)

  2. Checkout the version of Dart matching the version of Flutter you are using. I would be able to tell you the version if you post flutter doctor -v output.

  3. Get exact command line arguments for frontend_server.dart.snapshot (e.g. by doing flutter build apk --release -v) and then run it from sources in the following way ($DART is the path to Dart SDK sources):

    # $DART is the path to Dart SDK sources
    # $ARGS are the arguments that are normally passed to frontend_server.dart.snapshot
    $ dart -Dglobal.type.flow.print.stats=true --disable-dart-dev --packages=$DART/.packages $DART/pkg/frontend_server/bin/frontend_server_starter.dart $ARGS
    

/cc @alexmarkov (we should consider allowing users to set global.type.flow.print.stats=true through OS environment variables, rather than through Dart defines).

@quyenvsp

quyenvsp commented Sep 25, 2022

@mraleph

Thanks so much for the guidance.

We have two banking application projects that are almost identical (one is just a little smaller), but their TFA times differ greatly: 377947 ms vs. 1997843 ms (~5 times slower).

Both were built on the same Ubuntu machine (32 GB of RAM) using Flutter stable 2.10.5 with Dart 2.16.2 (Dart SDK sources), and memory was freed before each build.
One project's build uses only 8 GB of RAM, while the slow project takes 17 GB.

While we continue looking for the problem, we hope the global.type.flow.print.stats=true results below for the two projects are helpful.
(We also have perf reports, but they are 200 MB and 600 MB in size; we will upload them if you need them.)

fine_project_result.txt
slow_project_result.txt

We see a huge difference in invocations queried in cache (258907308 vs. 738198250), but the output is not detailed enough. Is there any way to get counters per package or class? Something like: package/class A 1000 invocations, package/class B 2000 invocations, ... That way we could easily see where the problem is and dig into it.

@alexmarkov
Contributor

@quyenvsp Thank you for the report. These huge compilation times are usually caused by a large amount of auto-generated Dart source code. Do you use Dart source auto-generation in your app?

Judging by the logs, the second app seems to be ~3x larger (10676547 vs. 28089433 summaries analyzed). That may explain the difference in the compilation times. So far I do not see anything out of the ordinary in those logs, except maybe that in both apps the analysis hits the hard limit of 5000 maximum invocations cached per selector. You can try to vary (decrease) the constants in the _SelectorApproximation class to see if it helps:

/// Approximation [_Invocation] with raw arguments is created and used
/// after number of [_Invocation] objects with same selector but
/// different arguments reaches this limit.
static const int maxInvocationsPerSelector = 5000;
/// [_DirectInvocation] can be approximated with raw arguments
/// if number of operations in its summary exceeds this threshold.
static const int largeSummarySize = 300;
/// If summary exceeds [largeSummarySize] and number of
/// [_DirectInvocation] objects with same selector but
/// different arguments exceeds this limit, then approximate
/// [_DirectInvocation] with raw arguments is created and used.
static const int maxDirectInvocationsPerSelector = 10;

Also, you can try to run frontend server with -Dglobal.type.flow.print.timings=true option (as @mraleph explained above) to get a list of methods where analysis spent the most time. This option should print 3 tables of methods - Top summaries by number of times analyzed, Top summaries by dirty analysis time (including callees), in microseconds and Top summaries by pure analysis time (excluding callees), in microseconds. The most interesting are the last and the first tables.
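
As a rough illustration of what the `maxInvocationsPerSelector` limit quoted above does, here is a simplified Python model. The class `InvocationCache`, the `RAW_ARGS` marker, and the tuple representation of invocations are assumptions made for this sketch; the actual logic lives in the Dart TFA sources and is more involved.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple

# Marker standing in for "raw arguments" (the least precise argument types).
RAW_ARGS = "<raw arguments>"

Invocation = Tuple[str, Hashable]


class InvocationCache:
    """Caches invocations per selector, approximating past a limit."""

    def __init__(self, max_invocations_per_selector: int = 5000) -> None:
        self._limit = max_invocations_per_selector
        self._exact: Dict[str, Dict[Hashable, Invocation]] = defaultdict(dict)
        self.approximated = 0  # how often the fallback was used

    def get_invocation(self, selector: str, args: Hashable) -> Invocation:
        known = self._exact[selector]
        if args in known:
            return known[args]
        if len(known) >= self._limit:
            # Limit reached: reuse one shared, less precise invocation
            # instead of creating yet another exact one.
            self.approximated += 1
            return (selector, RAW_ARGS)
        known[args] = (selector, args)
        return known[args]
```

Lowering the limit trades analysis precision for fewer distinct invocations to analyze, which is why decreasing these constants may shorten TFA time on very large inputs.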

@yaminet1024

yaminet1024 commented Oct 24, 2023

Hi, I'm facing the same issue. Most of my time is spent in gen_snapshot, which takes nearly 8 minutes.
2023-10-24 16:31:32:218 : [ +1 ms] executing: gen_snapshot_arm64 --deterministic --save-obfuscation-map=build/ios/framework/Release/app_iOS_symbols.json --snapshot_kind=app-aot-assembly --assembly=/Volumes/data/workspace/.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/snapshot_assembly.S --dwarf-stack-traces --resolve-dwarf-paths --save-debugging-info=build/ios/framework/Release/app.ios-arm64.symbols --obfuscate /Volumes/data/workspace/.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/app.dill
2023-10-24 16:33:47:397 : ##[error][+135179 ms] Warning: The generated assembly code contains unobfuscated DWARF debugging information.
2023-10-24 16:33:47:398 : ##[error][ ] To avoid this, use --strip to remove it.
2023-10-24 16:33:47:646 : [ +247 ms] executing: xcrun cc -arch arm64 -miphoneos-version-min=11.0 -isysroot /Applications/Xcode_14.3.1.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS16.4.sdk -c /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/snapshot_assembly.S -o /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/snapshot_assembly.o
2023-10-24 16:38:04:903 : [+257257 ms] executing: xcrun clang -arch arm64 -miphoneos-version-min=11.0 -isysroot /Applications/Xcode_14.3.1.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS16.4.sdk -dynamiclib -Xlinker -rpath -Xlinker @executable_path/Frameworks -Xlinker -rpath -Xlinker @loader_path/Frameworks -fapplication-extension -install_name @rpath/App.framework/App -o /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/App.framework/App /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/snapshot_assembly.o
2023-10-24 16:38:05:763 : [ +859 ms] executing: xcrun dsymutil -o /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/App.framework.dSYM /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/App.framework/App
2023-10-24 16:38:08:158 : [+2395 ms] executing: xcrun strip -x /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/App.framework/App -o /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/App.framework/App
2023-10-24 16:38:09:312 : [+1153 ms] aot_assembly_release: Complete
Is there a good way to output trace information so I can better identify where the longest delays occur? Also, does gen_snapshot currently support incremental compilation? If not, are there other ways to improve compilation speed? Currently, our project takes close to 30 minutes to compile. Thank you.

@mraleph
Member Author

mraleph commented Oct 24, 2023

@yaminet1024 how big is your code? You can try to run gen_snapshot_arm64 with --print_precompiler_timings and paste the output. You can also run gen_snapshot_arm64 with --trace-compiler, redirect stderr with 2>/tmp/compilation_trace.txt, and then process the log using the following command:

$ cat /tmp/compilation_trace.txt | perl -n -e 'm/--> \'([^\']+)\'.*time: (\d+)/ && do{$n=$1;$t=$2; $n =~ s/ /_/g; print ($n, " ", $t, "\n")}' | sort -n -k 2 > /tmp/sorted.txt

You can look at the methods that take the longest to compile. If there are some outliers - we would like to know if there is something special about them (e.g. huge autogenerated code is one common suspect).

It would also be nice to know the histogram of compilation timings and how many methods are compiled in total (wc -l /tmp/sorted.txt should give that).
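
One possible way to get such a histogram from /tmp/sorted.txt is sketched below in Python. It assumes each line has the form `name time` with an integer time in whatever unit the trace reports; the function name and the bucketing by number of decimal digits are choices made for this sketch, not something gen_snapshot provides.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def timing_histogram(lines: Iterable[str]) -> Tuple[int, Dict[int, int]]:
    """Return (total methods, {number of digits in time: method count}).

    A bucket key of 3 means the time was in the range 100..999.
    """
    buckets: Counter = Counter()
    total = 0
    for line in lines:
        # Split off the last whitespace-separated field as the time.
        parts = line.rsplit(None, 1)
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        total += 1
        buckets[len(parts[1].lstrip("0") or "0")] += 1
    return total, dict(sorted(buckets.items()))


if __name__ == "__main__":
    import sys

    # Usage: python histogram.py /tmp/sorted.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as trace:
            count, histogram = timing_histogram(trace)
        print("methods compiled:", count)
        for digits, methods in histogram.items():
            print(f"10^{digits - 1}..10^{digits} - 1: {methods}")
```

Order-of-magnitude buckets make outliers stand out: a handful of methods several buckets above the bulk are the ones worth inspecting first.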

@michael-golfi

After some searching I believe this is the right issue to comment on... I am seeing large compile times for a Flutter app using quite a bit of generated serializer code. I'm not sure what is considered big in terms of an app.

I am seeing 25 minute release builds, here is output from verbose logs:

[  +86 ms] gen_dart_plugin_registrant: Complete
[   +7 ms] kernel_snapshot: Starting due to {}
[  +10 ms] Embedding native assets mapping /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/native_assets.yaml in kernel.
[   +6 ms] /Users/michaelgolfi/development/flutter/bin/cache/dart-sdk/bin/dartaotruntime --disable-dart-dev /Users/michaelgolfi/development/flutter/bin/cache/dart-sdk/bin/snapshots/frontend_server_aot.dart.snapshot --sdk-root /Users/michaelgolfi/development/flutter/bin/cache/artifacts/engine/common/flutter_patched_sdk_product/ --target=flutter --no-print-incremental-dependencies -Ddart.vm.profile=false -Ddart.vm.product=true --delete-tostring-package-uri=dart:ui --delete-tostring-package-uri=package:flutter --aot --tfa --target-os ios --packages /Users/michaelgolfi/work/mobile/.dart_tool/package_config.json --output-dill /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/app.dill --depfile /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/kernel_snapshot.d --source file:///Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/dart_plugin_registrant.dart --source package:flutter/src/dart_plugin_registrant.dart -Dflutter.dart_plugin_registrant=file:///Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/dart_plugin_registrant.dart --native-assets /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/native_assets.yaml --verbosity=error package:phyla/main.dart
[+1096958 ms] kernel_snapshot: Complete
[+1782 ms] aot_assembly_release: Starting due to {InvalidatedReasonKind.inputChanged: The following inputs have updated contents: /Users/michaelgolfi/development/flutter/packages/flutter_tools/lib/src/build_system/targets/ios.dart,/Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/app.dill,/Users/michaelgolfi/development/flutter/bin/internal/engine.version,/Users/michaelgolfi/development/flutter/bin/internal/engine.version}
[   +1 ms] targetingApplePlatform = true
[        ] extractAppleDebugSymbols = true
[        ] Will strip AOT snapshot manually after build and dSYM generation.
[        ] executing: /Users/michaelgolfi/development/flutter/bin/cache/artifacts/engine/ios-release/gen_snapshot_arm64 --deterministic --snapshot_kind=app-aot-assembly --assembly=/Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.S /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/app.dill
[+59343 ms] executing: sysctl hw.optional.arm64
[   +3 ms] Exit code 0 from: sysctl hw.optional.arm64
[        ] hw.optional.arm64: 1
[        ] executing: /usr/bin/arch -arm64e xcrun cc -arch arm64 -miphoneos-version-min=12.0 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.5.sdk -c /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.S -o /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.o
[+105579 ms] executing: /usr/bin/arch -arm64e xcrun clang -arch arm64 -miphoneos-version-min=12.0 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.5.sdk -dynamiclib -Xlinker -rpath -Xlinker @executable_path/Frameworks -Xlinker -rpath -Xlinker @loader_path/Frameworks -fapplication-extension -install_name @rpath/App.framework/App -o /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/App.framework/App /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.o
[ +100 ms] executing: /usr/bin/arch -arm64e xcrun dsymutil -o /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/App.framework.dSYM /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/App.framework/App
[ +751 ms] executing: /usr/bin/arch -arm64e xcrun strip -x /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/App.framework/App -o /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/App.framework/App
[ +124 ms] aot_assembly_release: Complete
[        ] release_ios_bundle_flutter_assets: Starting due to {}

@mraleph I will try your suggestions above. So far I have tried moving most of the serializers into their own path dependency, in the hope that their builds would be cached, since they change less often than the main app.

@michael-golfi

I found that much time was spent on serializers generated from ferry_generator. I managed to reduce some of the generated code with some config and shaved 500s from release build time, an improvement of about 40%.

The shell script above has a missing quote somewhere. I gave up on it and wrote some Python to sort the files by time instead.

"""
  /Users/michaelgolfi/development/flutter/bin/cache/artifacts/engine/ios-release/gen_snapshot_arm64 \
  --print_precompiler_timings \
  --trace-compiler \
  --deterministic \
  --snapshot_kind=app-aot-assembly \
  --assembly=/Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.S \
    /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/app.dill 2>compilation_trace.txt
"""

def process_file(input_file, output_file):
    data = []

    with open(input_file, "r") as file:
        for line in file:
            if "-->" in line:
                parts = line.strip().split("time:")
                if len(parts) > 1:
                    time_part = (
                        parts[1].strip().split(" ")[0]
                    )  # Extract the time before 'us'
                    name_part = parts[0].split("'")[1]  # Extract the function name
                    name_part = name_part.replace(
                        " ", "_"
                    )  # Replace spaces with underscores
                    data.append((name_part, int(time_part)))

    # Sort data by the time, which is the second item in the tuple
    data.sort(key=lambda x: x[1])

    with open(output_file, "w") as file:
        for name, time in data:
            file.write(f"{name} {time}\n")


# Usage
process_file("compilation_trace.txt", "sorted.txt")

@mraleph
Member Author

mraleph commented Jun 28, 2024

The ferry_generator generated code does seem to hit some sort of non-linearity in TFA.

I extracted a benchmark core from another issue I was looking at before (related to build_runner performance): https://github.com/mraleph/flutter_ferry_aot_stress_test. I can clearly see that as input grows TFA speed (input size divided by time taken) decreases: 0.49 kb/ms, then 0.25 kb/ms, then 0.11 kb/ms
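
To make the "speed decreases as input grows" observation concrete: if time grew linearly with input size, throughput would stay constant. The Python sketch below estimates a growth exponent from two measurements. The input sizes in the usage note are made up for illustration (the thread only reports throughputs), and `scaling_exponent` is a hypothetical helper, not part of any Dart tooling.

```python
import math


def throughput(size_kb: float, time_ms: float) -> float:
    """Input size divided by time taken, in kb/ms."""
    return size_kb / time_ms


def scaling_exponent(
    size_a: float, time_a: float, size_b: float, time_b: float
) -> float:
    """Exponent k such that time grows roughly like size**k.

    k close to 1 means linear scaling; k close to 2 means quadratic.
    """
    return math.log(time_b / time_a) / math.log(size_b / size_a)
```

For example, with hypothetical inputs of 100 kb taking 200 ms and 200 kb taking 800 ms, throughput halves from 0.5 kb/ms to 0.25 kb/ms and `scaling_exponent(100, 200, 200, 800)` evaluates to 2, i.e. quadratic growth. Whether the real benchmark behaves this way depends on its actual input sizes.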

@alexmarkov could you take this for a spin?

@michael-golfi

If helpful, I posted a repro on a build_runner issue a couple years ago.
