Reduce time spent in TFA during AOT compilation #42442

Open
2 of 3 tasks
alexmarkov opened this issue Jun 22, 2020 · 3 comments
Labels: area-vm, P3, triaged, vm-tfa

alexmarkov commented Jun 22, 2020

We should reduce AOT compilation time, as it is starting to hit a certain limit on a large internal app (b/158101962). There are several things we can do:

  • The vast majority of compilation time is spent in TFA, so we can investigate whether the analysis can be done more efficiently.

  • The app uses the protobuf tree shaker and repeats TFA after applying it, which means TFA is performed twice. We can integrate the protobuf-aware tree shaker with TFA more closely and avoid the second round of TFA; see "Consider integrating protobuf-aware treeshaking in TFA directly instead of running two rounds of TFA" (#40785).

  • We can split compilation into multiple steps, serializing state after each step and resuming from the serialized state. This would reduce the time spent on each step and avoid hitting the limit.
    https://dart-review.googlesource.com/c/sdk/+/150272 added the --from-dill option, which allows splitting the front-end part of the compilation into a separate step.

alexmarkov added the area-vm and vm-tfa labels Jun 22, 2020
alexmarkov self-assigned this Jun 22, 2020
dart-bot pushed a commit that referenced this issue Jun 23, 2020
This change introduces handling of protobufs while doing type flow
analysis. Metadata in protobuf message classes is updated dynamically
according to the set of called accessors, invalidating and rebuilding
TFA summaries as needed.

Previously, the protobuf-aware tree shaker required a second run of TFA
in order to do the actual tree-shaking after protobuf messages were
pruned. This significantly increased compilation time.

AOT compilation time of a large app (--from-dill): 274s -> 152s

The new tree shaker is available in kernel compilers under the
--protobuf-tree-shaker-v2 flag.

Issue #42442
Fixes #40785

Change-Id: I4347896737b9b0f7407b845e614dda9ba7621921
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/152100
Commit-Queue: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Clement Skau <cskau@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>

mraleph commented Sep 2, 2020

It seems that this is mostly done (both 1 and 2 are addressed to an extent now). Do you think it is worth closing this issue?

alexmarkov commented

1 and 3 are not addressed. Also, the fact that you created another similar issue indicates that we should continue working on compilation time and it is too early to close. If you prefer to keep your issue open, then you can close this one.

mraleph changed the title from "Reduce AOT compilation time" to "Reduce time spent in TFA during AOT compilation" Sep 3, 2020

mraleph commented Sep 3, 2020

Renamed the issue to reflect the fact that it is mostly concerned with TFA. Let me know if this makes sense.

dart-bot pushed a commit that referenced this issue Sep 25, 2020
This change improves TFA speed by adding:
* A cache of dispatch targets.
* An identical-types fast path for union and intersection of set and
  cone types.
* A subset fast path for the union of set types.
* A more efficient ConcreteType.raw.

AOT step 2 (TFA):
app #1: 200s -> 140s (-30%)
app #2: 208s -> 150s (-27%)

Issue: #42442
Issue: #43299
b/154155290

Change-Id: Ie9039a6448a7655d2aed5f5260473c28b1d917d9
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/164480
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
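
The identical-types and subset fast paths named in the commit message above can be illustrated with a minimal sketch. It assumes a set type is modeled as a `frozenset` of class names; the function name `union_set_types` is hypothetical and this is not the TFA implementation, only an illustration of the idea of returning an existing operand instead of building a new set.

```python
from typing import FrozenSet


def union_set_types(a: FrozenSet[str], b: FrozenSet[str]) -> FrozenSet[str]:
    """Union of two set types, each modeled as a frozenset of class names.

    Hypothetical illustration only; not the actual TFA code.
    """
    # Identical-types fast path: the same object needs no merging.
    if a is b:
        return a
    # Subset fast paths: reuse the larger operand rather than allocating.
    if b <= a:
        return a
    if a <= b:
        return b
    # General case: build a new set.
    return a | b
```

Returning an existing operand also preserves object identity, which lets a later identity check succeed on the result.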
dart-bot pushed a commit that referenced this issue Jan 19, 2021
In certain cases involving auto-generated Dart sources, there can be a
huge number of allocated classes which are subtypes of a certain class.
Specializing such cone types to set types, as well as intersection and
union operations on such types, may be slow and may severely affect
compilation time. Also, gradually discovering allocated classes in
such cone types may cause a lot of invalidations during analysis.

In order to avoid severe degradation of compilation time in such cases,
this change adds WideConeType, which works like a ConeType when the
number of allocated classes is large, but doesn't specialize to a
SetType and has a more efficient but approximate implementation of
union and intersection.

Uncompressed size of Flutter gallery AOT snapshot (android/arm64):
WideConeType approximation for types with
>32 allocated subtypes: +0.1176%
>64 allocated subtypes: +0.0956%
>128 allocated subtypes: +0.0027%

For now, the conservative approximation is used when the number of
allocated types is >128.

TFA time of large app #1: 175s -> 119s (-32%)
TFA time of large app #2: 211s -> 81s (-61.6%)
Snapshot size changes are insignificant.

TEST=Stress tested on precomp bots with
maxAllocatedTypesInSetSpecializations = 3 and 0.

Issue: #42442
Issue: #43299
Change-Id: Idae33205ddda81714e4aeccc7ae292e0164be651
b/154155290, b/177498788, b/177497864
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/179200
Commit-Queue: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Reviewed-by: Aske Simon Christensen <askesc@google.com>
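
The wide-cone idea from the commit message above can be sketched roughly as follows. This is an assumption-laden illustration, not the SDK code: a set type is modeled as a `frozenset` of class names, a wide cone as a `("wide_cone", base)` tuple, and the functions `specialize_cone` and `intersect` are hypothetical. The only details taken from the commit message are the threshold of 128 and the fact that the wide cone's operations are approximate.

```python
from typing import FrozenSet

# Threshold quoted in the commit message (>128 allocated types).
MAX_ALLOCATED = 128


def specialize_cone(base: str, allocated: FrozenSet[str],
                    limit: int = MAX_ALLOCATED):
    """Specialize a cone to a set type unless it has too many allocated subtypes."""
    if len(allocated) > limit:
        # Too wide: keep an opaque cone instead of enumerating classes.
        return ("wide_cone", base)
    return allocated


def intersect(a, b):
    """Intersection that never enumerates the classes of a wide cone.

    Assumption: returning the other operand unchanged is an acceptable
    over-approximation, because the true intersection is a subset of it.
    """
    a_wide = isinstance(a, tuple)
    b_wide = isinstance(b, tuple)
    if a_wide and b_wide:
        return a
    if a_wide:
        return b
    if b_wide:
        return a
    return a & b
```

The approximate result may be larger than the exact one, which is consistent with the small snapshot size increases reported above.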
dart-bot pushed a commit that referenced this issue Jan 19, 2021
While investigating AOT compilation time, it can be useful to
understand which members were analyzed the most in TFA.

This change adds the "global.type.flow.print.timings" environment flag
to measure and show the top TFA summaries (which roughly correspond to
members) that were analyzed the greatest number of times and that took
the most time to analyze.

TEST=manual testing

Issue #42442

Change-Id: I07d3253d1e6eb390074b7edf7c21686124a938d1
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/179600
Commit-Queue: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Aske Simon Christensen <askesc@google.com>
copybara-service bot pushed a commit that referenced this issue Sep 24, 2021
When looking for a dispatch target, also cache selectors that were
not found, to avoid repeated lookups.

Improves time of AOT compilation step 2 (TFA) on a large
Flutter application 189s -> 171s (-9.5%).

TEST=ci

Issue: #42442
Change-Id: I21686e1f40a09ef62abf010bfa3670615c108942
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/214342
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
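
Caching "not found" results, as described in the commit message above, is commonly called negative caching. The sketch below shows the general technique under stated assumptions: the class `DispatchCache`, its `slow_lookup` callback, and the `slow_calls` counter are all hypothetical names, and a missing target is modeled as `None`.

```python
# Sentinel that distinguishes "never looked up" from a cached None result.
_MISSING = object()


class DispatchCache:
    """Hypothetical cache of dispatch targets, including failed lookups."""

    def __init__(self, slow_lookup):
        self._slow_lookup = slow_lookup  # returns a target or None
        self._cache = {}
        self.slow_calls = 0  # counts how often the slow path runs

    def find_target(self, cls, selector):
        key = (cls, selector)
        cached = self._cache.get(key, _MISSING)
        if cached is not _MISSING:
            # May be None: a remembered "not found" result.
            return cached
        self.slow_calls += 1
        target = self._slow_lookup(cls, selector)
        # Store None too, so a failed lookup is not repeated.
        self._cache[key] = target
        return target
```

The sentinel is the key design choice: with a plain `dict.get(key)` a cached `None` would be indistinguishable from a cache miss, and every failed lookup would hit the slow path again.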
copybara-service bot pushed a commit that referenced this issue Oct 8, 2021
…ClosedWorldClassHierarchy from CFE

Previously, TFA used ClosedWorldClassHierarchy from CFE to look for
dispatch targets of invocations, and cached those lookups. However,
doing lookups via ClosedWorldClassHierarchy for the first time is still
slow. Now TFA builds maps of dispatch targets from AST and no longer
uses ClosedWorldClassHierarchy for the lookup.

Improves time of AOT compilation step 2 (TFA) on a large
Flutter application 170s -> 152s (-10.5%).

TEST=ci

Issue: #42442
Change-Id: I1a22d298e5b2c0ead57c38ddfbf5ebbd1876732f
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/215985
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
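
As a rough illustration of building dispatch maps up front rather than querying a class hierarchy on demand, the sketch below computes one selector-to-target map per class from a simplified class table. The input shape (`name -> (superclass, selectors)`) and the function `build_dispatch_maps` are assumptions for this example; it ignores mixins, interfaces, and abstract members, and is not how TFA represents the AST.

```python
def build_dispatch_maps(classes):
    """Build a selector -> target map for every class.

    classes: dict mapping class name to (superclass name or None,
             set of selectors declared in that class).
    """
    maps = {}

    def build(name):
        if name in maps:
            return maps[name]
        superclass, declared = classes[name]
        # Start from a copy of the inherited map, then apply overrides.
        table = dict(build(superclass)) if superclass is not None else {}
        for selector in declared:
            table[selector] = "%s.%s" % (name, selector)
        maps[name] = table
        return table

    for name in classes:
        build(name)
    return maps
```

With such maps in place, each lookup is a single dictionary access, and a missing key directly answers the "no target" case.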
copybara-service bot pushed a commit that referenced this issue Oct 26, 2021
This change contains a few small improvements which reduce
time of type flow analysis (AOT compilation step 2):

* Faster construction of _DirectInvocation objects
* Eager approximation of arguments of operator==
* Eager approximation of arguments of identical

Improves time of AOT compilation step 2 (TFA) on a large
Flutter application 137s -> 124s (-9.4%).

Doesn't affect size of Flutter gallery AOT snapshot.

TEST=ci
Issue: #42442

Change-Id: I9da0b0e68d1ee8062d86094fb5cdb9462fb7ea6b
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/217741
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
copybara-service bot pushed a commit that referenced this issue Dec 10, 2021
This change introduces Rapid Type Analysis (RTA) and uses it to
calculate the initial set of allocated classes for Type Flow Analysis
(TFA). As a result, TFA converges much faster and AOT compilation time
improves.

RTA is less precise than TFA, so its set of allocated classes is
larger than the one calculated by TFA. However, this has only a
marginal effect on the size of the resulting AOT snapshots.

Time of AOT compilation step 2 on a large Flutter application:
118.652s -> 59.907s (-49.5%, a speedup of 1.98x)
Snapshot size on armv8: -0.13%

Flutter gallery snapshot size in release and release-sizeopt modes
armv7 +0.19%, armv8 +0.2%

If needed, RTA can be disabled using the --no-rta option.

TEST=ci
Issue: #42442
Issue: b/154155290

Change-Id: Iffbdabe7d486cad2e138f7592bffcb70474ddc34
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/222500
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
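
For readers unfamiliar with Rapid Type Analysis, the following is a minimal sketch of the textbook algorithm on a toy program model. It is not the Dart implementation: the input shapes (`functions` with `allocates`, `calls`, and `virtual_calls` entries, and `methods` mapping each class to its selector table) and the function name `rapid_type_analysis` are assumptions made for this example.

```python
from collections import deque


def rapid_type_analysis(functions, methods, entry):
    """Return (allocated classes, reachable functions) starting from entry.

    functions: name -> {"allocates": [...], "calls": [...], "virtual_calls": [...]}
    methods:   class name -> {selector: function name}
    """
    allocated = set()
    called_selectors = set()
    reachable = {entry}
    worklist = deque([entry])

    def enqueue(fn):
        if fn is not None and fn not in reachable:
            reachable.add(fn)
            worklist.append(fn)

    while worklist:
        body = functions[worklist.popleft()]
        for cls in body.get("allocates", ()):
            if cls not in allocated:
                allocated.add(cls)
                # A newly allocated class can receive earlier virtual calls.
                for sel in called_selectors:
                    enqueue(methods.get(cls, {}).get(sel))
        for callee in body.get("calls", ()):
            enqueue(callee)
        for sel in body.get("virtual_calls", ()):
            if sel not in called_selectors:
                called_selectors.add(sel)
                # A virtual call may target any class allocated so far.
                for cls in allocated:
                    enqueue(methods.get(cls, {}).get(sel))
    return allocated, reachable
```

Because the sketch tracks one global set of allocated classes rather than per-call-site types, its result is coarser than a flow-based analysis, which matches the precision trade-off described in the commit message.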
alexmarkov added the P3 and triaged labels Oct 18, 2023