reduce end-to-end AOT compilation time for large applications #43299
Comments
This looks like a duplicate of #42442. |
@alexmarkov this is more of an umbrella issue - not limited to just TFA, but covering general end-to-end performance. |
This change improves TFA speed by adding * Cache of dispatch targets. * Identical types fast path to union and intersection of set and cone types. * Subset fast path in the union of set types. * More efficient ConcreteType.raw. AOT step 2 (TFA): app #1: 200s -> 140s (-30%) app #2: 208s -> 150s (-27%) Issue: #42442 Issue: #43299 b/154155290 Change-Id: Ie9039a6448a7655d2aed5f5260473c28b1d917d9 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/164480 Reviewed-by: Martin Kustermann <kustermann@google.com> Reviewed-by: Vyacheslav Egorov <vegorov@google.com> Commit-Queue: Alexander Markov <alexmarkov@google.com>
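The union fast paths listed in this CL can be sketched as follows. This is a hypothetical Python simplification (the real code operates on TFA's SetType/ConeType lattice, not plain sets), showing why identical-inputs and subset checks avoid allocating new sets:

```python
def union_set_types(a: frozenset, b: frozenset) -> frozenset:
    """Union of two 'set types' (sets of concrete class ids), with the
    fast paths mentioned in the CL. Illustrative only - names and
    representation are not the actual TFA implementation."""
    if a is b or a == b:   # identical-types fast path: reuse one input
        return a
    if a <= b:             # subset fast path: the union is just b
        return b
    if b <= a:
        return a
    return a | b           # general case: allocate the merged set
```

The point of the fast paths is that in TFA the same types are unioned repeatedly, so cheap identity/subset checks frequently skip the allocating path.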
…uppress extra frontend_server output Front-end server prints all dependencies after compilation, which could result in a lot of output when AOT compiling a large application. This change adds --no-print-incremental-dependencies option which suppresses extra output. This skips printing dependencies which takes time and avoids I/O which may be blocked. Issue: #43299 Change-Id: I7779d3b5f1b513c2370978a5384a71cff371f017 b/154155290 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/167860 Reviewed-by: Alexander Aprelev <aam@google.com> Commit-Queue: Alexander Markov <alexmarkov@google.com>
In certain cases involving auto-generated Dart sources there can be a huge number of allocated classes which are subtypes of a certain class. Specializing such cone types to set types, as well as intersection and union operations on such types, may be slow and may severely affect compilation time. Also, gradually discovering allocated classes in such cone types may cause a lot of invalidations during analysis. In order to avoid severe degradation of compilation time in such cases, this change adds WideConeType, which works like a ConeType when the number of allocated classes is large, but it doesn't specialize to a SetType and has a more efficient but approximate implementation of union and intersection. Uncompressed size of Flutter gallery AOT snapshot (android/arm64): WideConeType approximation for types with >32 allocated subtypes: +0.1176% >64 allocated subtypes: +0.0956% >128 allocated subtypes: +0.0027% For now, the conservative approximation is used when the number of allocated types is >128. TFA time of large app #1: 175s -> 119s (-32%) TFA time of large app #2: 211s -> 81s (-61.6%) Snapshot size changes are insignificant. TEST=Stress tested on precomp bots with maxAllocatedTypesInSetSpecializations = 3 and 0. Issue: #42442 Issue: #43299 Change-Id: Idae33205ddda81714e4aeccc7ae292e0164be651 b/154155290, b/177498788, b/177497864 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/179200 Commit-Queue: Alexander Markov <alexmarkov@google.com> Reviewed-by: Vyacheslav Egorov <vegorov@google.com> Reviewed-by: Aske Simon Christensen <askesc@google.com>
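The WideConeType threshold idea can be sketched like this (Python; the constant name comes from the CL's TEST line, but the function and representation are illustrative, not the VM's code):

```python
MAX_ALLOCATED_TYPES_IN_SET_SPECIALIZATIONS = 128  # threshold used by the CL

def specialize_cone(allocated_subtype_ids):
    """A cone type ('class C and all its allocated subtypes') is only
    specialized to an explicit set of class ids when the subtype count
    is small. Above the threshold a 'wide cone' marker is kept instead,
    so unions/intersections stay O(1) at the cost of precision.
    Hypothetical sketch of the mechanism described above."""
    ids = frozenset(allocated_subtype_ids)
    if len(ids) > MAX_ALLOCATED_TYPES_IN_SET_SPECIALIZATIONS:
        return "WideConeType"  # approximate: don't enumerate subtypes
    return ids                 # precise SetType specialization
```

This also explains the snapshot-size numbers in the CL: a higher threshold means fewer approximations, so less precision is lost and the size regression shrinks.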
* When testing for pragmas in the inliner, call function.has_pragma() early to avoid more expensive Library::FindPragma query. * When scanning through object pool entries in Precompiler::AddCalleesOfHelper, skip over OneByteString and null objects quickly. They are leaf and there could be a huge number of those objects. AOT gen_snapshot time of a large Flutter application built in flutter/release mode for arm64 (best of 5 runs): Before: 81.589s After: 74.415s (-8.79%) TEST=ci Issue: #43299 Change-Id: I960451c73b42dab9845f0e0eafacaa9bb23720e3 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/213288 Reviewed-by: Ryan Macnak <rmacnak@google.com> Reviewed-by: Slava Egorov <vegorov@google.com> Commit-Queue: Alexander Markov <alexmarkov@google.com>
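The object-pool scanning optimization can be sketched as follows (Python stand-in: `None` and `str` model the VM's null and OneByteString leaf objects; names are illustrative, not the VM's API):

```python
def callees_of_pool(pool):
    """Sketch of the Precompiler::AddCalleesOfHelper change: while
    scanning constant-pool entries, leaf entries are skipped with a
    cheap type check before any expensive processing. Leaf objects
    cannot reference callees, and there may be a huge number of them."""
    callees = []
    for entry in pool:
        if entry is None or isinstance(entry, str):
            continue  # leaf object: skip quickly
        callees.append(entry)  # expensive path: record for further scanning
    return callees
```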
One way to do this would be to support breaking things up into smaller increments of compilation. This would allow for fanning out compilation steps to many different worker machines, and would also potentially help with incrementally compiling smaller changes between builds. I'm more interested in faster incremental builds for the sake of performance tuning/debugging - right now, changing a single line of code requires repeating the entire process of kernel compiling/tfa/snapshotting, which can be very expensive on large codebases. |
"We could parallelize" - Parallelizing TFA is an open problem: it is an intrinsically non-modular, global step. You should compare it to similar global optimization steps like LTO in LLVM or ProGuard et al. |
Google concerns: b/203690870 |
Perhaps something like what LLVM did with ThinLTO would be applicable for TFA? http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html |
…eduplication On a large Flutter app, compiled in release mode for arm64: Total gen_snapshot time 112.762s -> 89.595s (-20.5%) (Dedup pass 34.58s -> 11.03s) TEST=ci Issue: #43299 Bug: b/154155290 Change-Id: If5ce4cf6a26e4a0300de6bc1864854f4deedffa3 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/228281 Auto-Submit: Alexander Markov <alexmarkov@google.com> Reviewed-by: Martin Kustermann <kustermann@google.com> Commit-Queue: Martin Kustermann <kustermann@google.com> Reviewed-by: Slava Egorov <vegorov@google.com> Commit-Queue: Slava Egorov <vegorov@google.com>
This change improves hash code implementations in multiple places in the compiler. That reduces the number of probes during lookups in hash maps and improves AOT compilation time of large applications. On a large Flutter app, compiled in release mode for arm64: Total gen_snapshot time 89.184s -> 60.736s (-31.9%) Also, this change adds a --hash_map_probes_limit=N option which sets a hard limit on the number of probes in hash maps. This option makes it easy to find hash maps with many collisions due to a poor hash code implementation. TEST=ci Issue: #43299 Bug: b/154155290 Change-Id: Ibf6f37d4b9f3bf42dd6731bfb4095a7305b98b2d Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/229240 Reviewed-by: Martin Kustermann <kustermann@google.com> Commit-Queue: Alexander Markov <alexmarkov@google.com>
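The probe-limit debugging trick can be illustrated with a toy open-addressing lookup (Python; the flag name `--hash_map_probes_limit` is from the CL, but this code is a simplified illustration, not the VM's hash map):

```python
class ProbeLimitError(Exception):
    pass

def lookup(table, key, probes_limit=1500):
    """Linear-probing lookup that counts probes and fails hard once the
    limit is exceeded - the idea behind --hash_map_probes_limit=N, which
    turns a poorly distributed hash code into a visible error instead of
    a silent slowdown. table is a list of (key, value) or None slots."""
    n = len(table)
    i = hash(key) % n
    for _ in range(probes_limit):
        slot = table[i]
        if slot is None:
            return None        # empty slot: key absent
        if slot[0] == key:
            return slot[1]     # found
        i = (i + 1) % n        # linear probing: try the next slot
    raise ProbeLimitError(f"more than {probes_limit} probes for {key!r}")
```

Running a whole compile under a tight limit pinpoints exactly which map (and which hash function) is degenerating into a linear scan.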
…t hashes This change improves a few hash code implementations in the compiler. Slightly improves AOT compilation time: gen_snapshot 60.736s -> 58.920s (-2.9%) (on a large Flutter app, compiled in release mode for arm64). Also, the same large app can now be compiled with --hash_map_probes_limit=1500, meaning that all hash maps in the compiler perform fewer than 1500 probes when looking for an element. This change also adds a test which verifies that the kernel compiler itself can be compiled with --hash_map_probes_limit=1000. This is a sanity check to ensure we do not have a very badly distributed hash code in the compiler. TEST=ci Issue: #43299 Change-Id: I7a802709727a33760c4f1d13f7b2c8cb263852d7 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/229940 Reviewed-by: Martin Kustermann <kustermann@google.com> Reviewed-by: Slava Egorov <vegorov@google.com> Commit-Queue: Alexander Markov <alexmarkov@google.com>
This change adds binary serialization/deserialization of flow graphs. It supports all IL instructions and certain objects which can be referenced from IL instructions. IL binary serialization is a useful mechanism which would allow us to split compilation into multiple parts in order to parallelize AOT compilation. The program structure (libraries/classes/functions/fields) is not serialized. It is assumed that reader and writer use the same program structure. Caveats: * FFI callbacks are not supported yet. * Closure functions are not re-created when reading flow graph. * Flow graph should be in SSA form (unoptimized flow graphs are not supported). * JIT mode is not supported (serializer currently assumes lazy linking of native methods and empty ICData). In order to test IL serialization, the --test_il_serialization VM option is added to serialize and deserialize the flow graph before generating code. TEST=vm/dart/splay_test now runs with --test_il_serialization. TEST=Manual run of vm-kernel-precomp-linux-debug-x64-try with --test_il_serialization enabled (only ffi tests failed). Issue: #43299 Change-Id: I7bbfd9e3a301e00c9cfbffa06b8f1f6c78a78470 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/254941 Reviewed-by: Ryan Macnak <rmacnak@google.com> Commit-Queue: Alexander Markov <alexmarkov@google.com> Reviewed-by: Slava Egorov <vegorov@google.com>
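The round-trip idea - writer and reader share the same program structure and exchange only a compact binary encoding of the graph - can be illustrated with a toy encoder. This is an assumption-laden sketch (a flat list of `(opcode, argument)` pairs packed with fixed-width integers), not the VM's actual IL format:

```python
import struct

def serialize_graph(nodes):
    """Encode a toy 'flow graph' (list of (op, arg) uint32 pairs) as
    little-endian binary: a count header followed by the pairs."""
    out = struct.pack("<I", len(nodes))
    for op, arg in nodes:
        out += struct.pack("<II", op, arg)
    return out

def deserialize_graph(data):
    """Reconstruct the node list; the reader relies on the same schema
    the writer used, mirroring the CL's shared-program-structure
    assumption."""
    (count,) = struct.unpack_from("<I", data, 0)
    nodes, off = [], 4
    for _ in range(count):
        op, arg = struct.unpack_from("<II", data, off)
        nodes.append((op, arg))
        off += 8
    return nodes
```

A round trip (`deserialize_graph(serialize_graph(g)) == g`) is exactly what the `--test_il_serialization` flag exercises before code generation, at full IL fidelity.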
This reverts commit 9700458. Reason for revert: breaks Dart SDK build using gcc and dart-sdk-linux-main bot. Original change's description: > [vm/compiler] Initial implementation of IL binary serialization > > This change adds binary serialization/deserialization of flow graphs. > It supports all IL instructions and certain objects which can be > referenced from IL instructions. IL binary serialization is a useful > mechanism which would allow us to split compilation into multiple parts > in order to parallelize AOT compilation. > > The program structure (libraries/classes/functions/fields) is not > serialized. It is assumed that reader and writer use the same > program structure. > > Caveats: > * FFI callbacks are not supported yet. > * Closure functions are not re-created when reading flow graph. > * Flow graph should be in SSA form (unoptimized flow graphs are not > supported). > * JIT mode is not supported (serializer currently assumes lazy > linking of native methods and empty ICData). > > In order to test IL serialization, --test_il_serialization VM option is > added to serialize and deserialize flow graph before generating code. > > TEST=vm/dart/splay_test now runs with --test_il_serialization. > > TEST=Manual run of vm-kernel-precomp-linux-debug-x64-try with > --test_il_serialization enabled (only ffi tests failed). 
> > Issue: #43299 > Change-Id: I7bbfd9e3a301e00c9cfbffa06b8f1f6c78a78470 > Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/254941 > Reviewed-by: Ryan Macnak <rmacnak@google.com> > Commit-Queue: Alexander Markov <alexmarkov@google.com> > Reviewed-by: Slava Egorov <vegorov@google.com> TBR=vegorov@google.com,kustermann@google.com,rmacnak@google.com,alexmarkov@google.com Change-Id: Iae4e4868f183815a8fc3cd79597141b3896e23d7 No-Presubmit: true No-Tree-Checks: true No-Try: true Issue: #43299 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/255780 Reviewed-by: Alexander Markov <alexmarkov@google.com> Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com> Commit-Queue: Alexander Markov <alexmarkov@google.com>
This is a reland of commit 9700458 Original change's description: > [vm/compiler] Initial implementation of IL binary serialization > > This change adds binary serialization/deserialization of flow graphs. > It supports all IL instructions and certain objects which can be > referenced from IL instructions. IL binary serialization is a useful > mechanism which would allow us to split compilation into multiple parts > in order to parallelize AOT compilation. > > The program structure (libraries/classes/functions/fields) is not > serialized. It is assumed that reader and writer use the same > program structure. > > Caveats: > * FFI callbacks are not supported yet. > * Closure functions are not re-created when reading flow graph. > * Flow graph should be in SSA form (unoptimized flow graphs are not > supported). > * JIT mode is not supported (serializer currently assumes lazy > linking of native methods and empty ICData). > > In order to test IL serialization, --test_il_serialization VM option is > added to serialize and deserialize flow graph before generating code. TEST=vm/dart/splay_test now runs with --test_il_serialization. TEST=Manual run of vm-kernel-precomp-linux-debug-x64-try with --test_il_serialization enabled (only ffi tests failed). TEST=gcc build on dart-sdk-linux-try bot. Issue: #43299 > Change-Id: I7bbfd9e3a301e00c9cfbffa06b8f1f6c78a78470 > Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/254941 > Reviewed-by: Ryan Macnak <rmacnak@google.com> > Commit-Queue: Alexander Markov <alexmarkov@google.com> > Reviewed-by: Slava Egorov <vegorov@google.com> Change-Id: I64ff9747f761496a096371e490ef070a14023256 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/255840 Reviewed-by: Ryan Macnak <rmacnak@google.com> Commit-Queue: Alexander Markov <alexmarkov@google.com>
TEST=tests/ffi/function_callbacks_test.dart TEST=Manual run of vm-kernel-precomp-linux-debug-x64-try with --test_il_serialization enabled. Issue: #43299 Change-Id: Ia57021d9091e8a80de3645cb4723ebdbb5a3d33d Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/256371 Commit-Queue: Alexander Markov <alexmarkov@google.com> Reviewed-by: Daco Harkes <dacoharkes@google.com>
Hi, could you please suggest a way to debug AOT build progress? Our bank application takes too long to build in AOT mode and gets stuck at the frontend_server.dart.snapshot step. debug without AOT: release with AOT: The problem is that frontend_server.dart.snapshot --verbose doesn't print anything helpful, so we cannot tell where the problem is. Is it possible to print which Dart file is being processed, so we can see where it gets stuck? Thank you. |
@quyenvsp which Flutter version are you on? (could you post …) It is probably hitting some pathological case in the TFA. Currently the best way to diagnose that requires a bit of manual work - it's described in #47546 (comment)
/cc @alexmarkov (we should consider allowing users to set |
Thanks so much for the guidance. We have 2 bank application projects that are almost identical (one is just a little smaller), but their TFA times differ greatly: 377947ms vs. 1997843ms (~5x slower). Both are built on the same Ubuntu machine (32GB of RAM), using Flutter stable 2.10.5 / Dart 2.16.2 (Dart SDK sources), with memory freed before each build. While we continue to look for the problem: fine_project_result.txt We see a huge difference - 258907308 vs. 738198250 invocations queried in cache - but it is not detailed enough yet. Is there any way to get these counters per package/class? Something like package/class A: 1000 invocations, package/class B: 2000 invocations, ... That way we could easily see where the problem is and investigate further.
@quyenvsp Thank you for the report. These huge compilation times are usually caused by a large amount of auto-generated Dart source code. Do you use Dart source auto-generation in your app? Looking at the logs, it seems the second app is ~3x larger (10676547 vs. 28089433 summaries analyzed). That may explain the difference in compilation times. So far I do not see anything out of the ordinary in those logs, except maybe that in both apps the analysis hits the hard limit of 5000 maximum invocations cached per selector. You can try to vary (decrease) the constants in sdk/pkg/vm/lib/transformations/type_flow/analysis.dart Lines 762 to 775 in 17bd00d
Also, you can try to run frontend server with |
Hi, I'm facing the same issue; most of my time is spent in gen_snapshot, which takes nearly 8 minutes. |
@yaminet1024 how big is your code? You can try to run
You can look at the methods that take longest to compile. If there are some outliers, we would like to know if there is something special about them (e.g. huge autogenerated code is one common suspect). It would be nice to know the histogram of compilation timings and how many methods are compiled in total ( |
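A sketch of such a histogram, assuming the `--> ... time: N us` line shape that `--trace-compiler` output appears to have (the same shape a script later in this thread parses); the format is an assumption here, not a documented contract:

```python
import math
from collections import Counter

def timing_histogram(lines):
    """Bucket per-method compile times (microseconds) into power-of-two
    bins, making outlier methods stand out at a glance. Assumes trace
    lines of the form \"--> ... time: N us\"."""
    hist = Counter()
    for line in lines:
        if "-->" in line and "time:" in line:
            us = int(line.split("time:")[1].strip().split(" ")[0])
            # Round up to the next power of two for the bucket label.
            bucket = 1 << max(0, math.ceil(math.log2(max(us, 1))))
            hist[bucket] += 1
    return dict(hist)
```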
After some searching I believe this is the right issue to comment on... I am seeing large compile times for a Flutter app using quite a bit of generated serializer code. I'm not sure what is considered big in terms of an app, but I am seeing 25-minute release builds; here is output from verbose logs:
@mraleph I will try your suggestions above. So far I have tried to move most of the serializers into their own path dependency in the past, in the hopes it would cache the builds since they change less often than the main app. |
I found that much time was spent on serializers generated from ferry_generator. I managed to reduce some of the generated code with some config and shaved 500s from release build time, an improvement of about 40%. The shell script above has a missing quote somewhere. I gave up on it and wrote some Python to sort the files by time instead.

```python
"""
/Users/michaelgolfi/development/flutter/bin/cache/artifacts/engine/ios-release/gen_snapshot_arm64 \
    --print_precompiler_timings \
    --trace-compiler \
    --deterministic \
    --snapshot_kind=app-aot-assembly \
    --assembly=/Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.S \
    /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/app.dill 2>compilation_trace.txt
"""

def process_file(input_file, output_file):
    data = []
    with open(input_file, "r") as file:
        for line in file:
            if "-->" in line:
                parts = line.strip().split("time:")
                if len(parts) > 1:
                    time_part = parts[1].strip().split(" ")[0]  # Extract the time before 'us'
                    name_part = parts[0].split("'")[1]  # Extract the function name
                    name_part = name_part.replace(" ", "_")  # Replace spaces with underscores
                    data.append((name_part, int(time_part)))
    # Sort data by the time, which is the second item in the tuple
    data.sort(key=lambda x: x[1])
    with open(output_file, "w") as file:
        for name, time in data:
            file.write(f"{name} {time}\n")

# Usage
process_file("compilation_trace.txt", "sorted.txt")
```
I extracted a benchmark core from another issue I was looking at before (related to …). @alexmarkov could you take this for a spin? |
If helpful, I posted a repro on a build_runner issue a couple years ago. |
This is an umbrella issue to track efforts around reducing the critical path when building large Flutter applications in AOT mode.