Reduce time spent in TFA during AOT compilation #42442
This change introduces handling of protobufs while doing type flow analysis. Metadata in protobuf message classes is updated dynamically according to the set of called accessors, invalidating and rebuilding TFA summaries as needed.

Previously, the protobuf-aware tree shaker required a 2nd run of TFA in order to do the actual tree shaking after protobuf messages are pruned, which significantly increased compilation time.

AOT compilation time of a large app (--from-dill): 274s -> 152s

The new tree shaker is available in kernel compilers under the flag --protobuf-tree-shaker-v2.

Issue #42442
Fixes #40785
Change-Id: I4347896737b9b0f7407b845e614dda9ba7621921
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/152100
Commit-Queue: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Clement Skau <cskau@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
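The invalidation scheme described above can be sketched roughly as follows. This is an illustrative Python model with invented names, not the SDK's implementation: the analysis records which protobuf field accessors have been called, and when that set grows, every summary built against the old set is marked for re-analysis — so pruning unused fields happens inside the single TFA pass instead of requiring a second round.

```python
# Hypothetical sketch: dependency tracking between the set of called
# protobuf accessors and the TFA summaries that depend on that set.

class ProtobufFieldUsage:
    def __init__(self):
        self.used_accessors = {}   # message class -> set of called accessors
        self.dependents = {}       # message class -> summaries depending on it
        self.dirty = set()         # summaries whose results are stale

    def depend(self, summary, message_class):
        # Record that `summary` was built against the current accessor set.
        self.dependents.setdefault(message_class, set()).add(summary)

    def record_call(self, message_class, accessor):
        used = self.used_accessors.setdefault(message_class, set())
        if accessor not in used:
            used.add(accessor)
            # The set of live fields grew: summaries built against the old
            # set must be invalidated and rebuilt.
            self.dirty |= self.dependents.get(message_class, set())

usage = ProtobufFieldUsage()
usage.depend("FooMessage._ctor", "FooMessage")
usage.record_call("FooMessage", "get_bar")
print(sorted(usage.dirty))
```

Calling `record_call` again with an already-seen accessor leaves `dirty` untouched, which is what keeps the analysis from thrashing.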
It seems that this is mostly done (both 1 and 2 are addressed to an extent now). Do you think it's worth closing this issue?
1 and 3 are not addressed. Also, the fact that you created another similar issue indicates that we should continue working on compilation time, and it is too early to close. If you prefer to keep your issue open, then you can close this one.
Renamed the issue to reflect the fact that it is mostly concerned with TFA. Let me know if this makes sense.
This change improves TFA speed by adding:
* Cache of dispatch targets.
* Identical-types fast path to union and intersection of set and cone types.
* Subset fast path in the union of set types.
* More efficient ConcreteType.raw.

AOT step 2 (TFA):
app #1: 200s -> 140s (-30%)
app #2: 208s -> 150s (-27%)

Issue: #42442
Issue: #43299
b/154155290
Change-Id: Ie9039a6448a7655d2aed5f5260473c28b1d917d9
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/164480
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
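The identical-types and subset fast paths can be illustrated with a toy model. Here class sets are modeled as `frozenset`s (the real TFA set and cone types are richer); the point is that identity and subset checks let the union return an existing operand instead of allocating a merged type.

```python
# Illustrative sketch (invented names, not the TFA classes): fast paths
# for union/intersection of types represented as sets of classes.

def union(a: frozenset, b: frozenset) -> frozenset:
    if a is b:       # identical-types fast path: nothing to merge
        return a
    if a <= b:       # subset fast path: reuse the larger operand as-is
        return b
    if b <= a:
        return a
    return a | b     # general case: allocate a merged set type

def intersection(a: frozenset, b: frozenset) -> frozenset:
    if a is b:       # identical-types fast path
        return a
    return a & b

x = frozenset({"A", "B"})
y = frozenset({"A", "B", "C"})
assert union(x, y) is y   # subset fast path returned an existing object
assert union(x, x) is x   # identity fast path
```

Returning an existing operand matters beyond saving an allocation: downstream identity checks on the result stay cheap, compounding the speedup across the analysis.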
In certain cases involving auto-generated Dart sources there could be a huge number of allocated classes which are subtypes of a certain class. Specializing such cone types to set types, as well as intersection and union operations on such types, may be slow and may severely affect compilation time. Also, gradually discovering allocated classes in such cone types may cause a lot of invalidations during analysis.

In order to avoid severe degradation of compilation time in such cases, this change adds WideConeType, which works like a ConeType when the number of allocated classes is large, but it doesn't specialize to a SetType and has a more efficient but approximate implementation of union and intersection.

Uncompressed size of Flutter gallery AOT snapshot (android/arm64) with WideConeType approximation for types with:
>32 allocated subtypes: +0.1176%
>64 allocated subtypes: +0.0956%
>128 allocated subtypes: +0.0027%

For now the conservative approximation is used when the number of allocated types is >128.

TFA time of large app #1: 175s -> 119s (-32%)
TFA time of large app #2: 211s -> 81s (-61.6%)
Snapshot size changes are insignificant.

TEST=Stress tested on precomp bots with maxAllocatedTypesInSetSpecializations = 3 and 0.

Issue: #42442
Issue: #43299
Change-Id: Idae33205ddda81714e4aeccc7ae292e0164be651
b/154155290, b/177498788, b/177497864
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/179200
Commit-Queue: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Reviewed-by: Aske Simon Christensen <askesc@google.com>
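The threshold behavior can be sketched in a few lines. This is a toy model (the class names mirror the commit's terminology, but the logic is invented for illustration): below the threshold a cone specializes to an explicit set of allocated subtypes; above it, the cone is kept as-is and answers queries approximately.

```python
# Hypothetical sketch of the "wide cone" idea: stop materializing huge
# sets of allocated subtypes beyond a threshold.

MAX_ALLOCATED_TYPES_IN_SET_SPECIALIZATIONS = 128  # threshold from the commit

class ConeType:
    def __init__(self, root, allocated_subtypes):
        self.root = root
        self.allocated = allocated_subtypes

    def specialize(self):
        if len(self.allocated) > MAX_ALLOCATED_TYPES_IN_SET_SPECIALIZATIONS:
            # Wide cone: keep the cone representation instead of building a
            # huge set type; unions/intersections on it stay cheap but
            # conservative (approximate).
            return self
        return frozenset(self.allocated)

small = ConeType("Widget", {f"W{i}" for i in range(10)})
wide = ConeType("Message", {f"M{i}" for i in range(1000)})
assert isinstance(small.specialize(), frozenset)
assert wide.specialize() is wide
```

The snapshot-size numbers above show why 128 was chosen: at that threshold the approximation costs almost nothing in precision (+0.0027%) while still capping the worst-case analysis cost.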
While investigating AOT compilation time it could be useful to understand which members were analyzed the most in TFA. This change adds the "global.type.flow.print.timings" environment flag to measure and show the top TFA summaries (roughly corresponding to members) which were analyzed the most times and which took the most time to analyze.

TEST=manual testing
Issue #42442
Change-Id: I07d3253d1e6eb390074b7edf7c21686124a938d1
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/179600
Commit-Queue: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Aske Simon Christensen <askesc@google.com>
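The measurement the flag performs amounts to two counters per summary: how many times it was (re-)analyzed and how much wall time those analyses took. A minimal sketch of that instrumentation (illustrative only; the real code lives in the TFA analysis loop):

```python
# Hypothetical sketch: per-member analysis counters, as a fixed-point
# analysis re-analyzes some members many times.
import time
from collections import Counter

analysis_count = Counter()   # member -> number of analyses
analysis_time = Counter()    # member -> cumulative seconds spent

def analyze(member, work):
    start = time.perf_counter()
    result = work()          # stand-in for analyzing the member's summary
    analysis_count[member] += 1
    analysis_time[member] += time.perf_counter() - start
    return result

def print_top(n=10):
    print("Analyzed most often:", analysis_count.most_common(n))
    print("Most time spent:    ", analysis_time.most_common(n))

analyze("Foo.bar", lambda: sum(range(1000)))
analyze("Foo.bar", lambda: sum(range(1000)))  # re-analysis after invalidation
analyze("Baz.qux", lambda: 0)
print_top()
```

The two rankings usually differ — a summary analyzed many times may still be cheap overall — which is why the flag reports both.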
When looking for a dispatch target, also cache selectors that were not found, to avoid repeated lookups.

Improves time of AOT compilation step 2 (TFA) on a large Flutter application: 189s -> 171s (-9.5%).

TEST=ci
Issue: #42442
Change-Id: I21686e1f40a09ef62abf010bfa3670615c108942
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/214342
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
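Negative caching needs a way to distinguish "looked up and not found" from "never looked up". A common way to do that, sketched here with invented names (not the SDK's code), is a private sentinel object stored in the cache:

```python
# Hypothetical sketch of negative caching for dispatch-target lookups.

_NOT_FOUND = object()   # sentinel: "lookup was done, target doesn't exist"

class DispatchCache:
    def __init__(self, resolver):
        self._resolver = resolver   # the slow class-hierarchy walk
        self._cache = {}
        self.misses = 0             # how many slow lookups actually ran

    def lookup(self, cls, selector):
        key = (cls, selector)
        if key not in self._cache:
            self.misses += 1
            target = self._resolver(cls, selector)   # may return None
            self._cache[key] = _NOT_FOUND if target is None else target
        cached = self._cache[key]
        return None if cached is _NOT_FOUND else cached

cache = DispatchCache(lambda cls, sel: None)   # resolver that never finds
assert cache.lookup("A", "foo") is None
assert cache.lookup("A", "foo") is None
assert cache.misses == 1   # second failing lookup served from the cache
```

Without the sentinel, storing `None` directly would be indistinguishable from an absent entry only if the code checks membership rather than truthiness — the sentinel makes the intent explicit and robust.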
…ClosedWorldClassHierarchy from CFE

Previously, TFA used ClosedWorldClassHierarchy from CFE to look for dispatch targets of invocations, and cached those lookups. However, doing lookups via ClosedWorldClassHierarchy for the first time is still slow. Now TFA builds maps of dispatch targets from the AST and no longer uses ClosedWorldClassHierarchy for the lookup.

Improves time of AOT compilation step 2 (TFA) on a large Flutter application: 170s -> 152s (-10.5%).

TEST=ci
Issue: #42442
Change-Id: I1a22d298e5b2c0ead57c38ddfbf5ebbd1876732f
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/215985
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
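Building dispatch maps up front, rather than resolving each selector on demand, is a single pass over the class hierarchy where each class inherits its superclass's table and overlays its own members. A minimal sketch under the assumption (made explicit in the comment) that superclasses are processed before subclasses:

```python
# Illustrative sketch (invented names): one pass over the classes builds a
# per-class map of selector -> dispatch target, replacing per-lookup
# hierarchy walks.

def build_dispatch_maps(classes):
    """classes: list of (name, superclass_or_None, {selector: member}),
    ordered so that every superclass precedes its subclasses."""
    maps = {}
    for name, super_name, members in classes:
        table = dict(maps.get(super_name, {}))  # inherit superclass targets
        table.update(members)                   # local declarations override
        maps[name] = table
    return maps

maps = build_dispatch_maps([
    ("Object", None, {"toString": "Object.toString"}),
    ("A", "Object", {"foo": "A.foo"}),
    ("B", "A", {"foo": "B.foo", "bar": "B.bar"}),
])
assert maps["B"]["foo"] == "B.foo"                  # override wins
assert maps["B"]["toString"] == "Object.toString"   # inherited target
```

After this pass every dispatch lookup is a constant-time dictionary access, which is why the first lookup is no longer the slow case.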
This change contains a few small improvements which reduce the time of type flow analysis (AOT compilation step 2):
* Faster construction of _DirectInvocation objects
* Eager approximation of arguments of operator==
* Eager approximation of arguments of identical

Improves time of AOT compilation step 2 (TFA) on a large Flutter application: 137s -> 124s (-9.4%). Doesn't affect the size of the Flutter gallery AOT snapshot.

TEST=ci
Issue: #42442
Change-Id: I9da0b0e68d1ee8062d86094fb5cdb9462fb7ea6b
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/217741
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
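One way to read "eager approximation" here — this is an interpretive sketch with invented names, not the SDK's code — is that arguments flowing into `operator==` and `identical` are widened to a static approximation immediately instead of being tracked precisely, since these calls only produce a bool and precise argument types would otherwise keep triggering re-analysis as new callers are discovered:

```python
# Hypothetical model: widen argument types eagerly for selectors whose
# results don't benefit from precise argument tracking.

ANY = "any"   # top of the type lattice in this toy model

EAGERLY_APPROXIMATED = {"operator==", "identical"}

def argument_type_for_call(selector, inferred_type, static_type=ANY):
    if selector in EAGERLY_APPROXIMATED:
        # The summary built for this call never needs invalidating when the
        # inferred argument type later grows.
        return static_type
    return inferred_type

assert argument_type_for_call("operator==", "{A}") == ANY
assert argument_type_for_call("foo", "{A}") == "{A}"
```

The trade-off is standard for flow analyses: give up precision where it cannot affect downstream results, and buy convergence speed in return.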
This change introduces Rapid Type Analysis (RTA) and uses it to calculate the initial set of allocated classes for Type Flow Analysis (TFA). As a result, TFA converges much faster and AOT compilation time improves.

RTA is less precise than TFA, so the set of allocated classes is larger compared to the one calculated by TFA. However, it has only a marginal effect on the size of the resulting AOT snapshots.

Time of AOT compilation step 2 on a large Flutter application: 118.652s -> 59.907s (-49.5% / improved by a factor of 1.98x)
Snapshot size on armv8: -0.13%
Flutter gallery snapshot size in release and release-sizeopt modes: armv7 +0.19%, armv8 +0.2%

Just in case, RTA can be disabled using the --no-rta option.

TEST=ci
Issue: #42442
Issue: b/154155290
Change-Id: Iffbdabe7d486cad2e138f7592bffcb70474ddc34
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/222500
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
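Textbook RTA is a single worklist pass: starting from the entry point, collect every class allocated in reachable code, without tracking how values flow between variables. A toy version (standard RTA shape, not the SDK code — here call edges are given directly rather than resolved through dispatch):

```python
# Toy Rapid Type Analysis: one cheap reachability pass that over-approximates
# the set of allocated classes, used to seed the more precise TFA.

def rta(entry, bodies):
    """bodies: function -> (set of called functions, set of allocated classes)."""
    reachable, allocated = set(), set()
    worklist = [entry]
    while worklist:
        fn = worklist.pop()
        if fn in reachable:
            continue
        reachable.add(fn)
        calls, allocs = bodies.get(fn, (set(), set()))
        allocated |= allocs            # every allocation in reachable code counts
        worklist.extend(calls)
    return allocated

bodies = {
    "main":   ({"helper"}, {"App"}),
    "helper": (set(), {"Widget"}),
    "dead":   (set(), {"Unused"}),     # unreachable: its allocation is ignored
}
assert rta("main", bodies) == {"App", "Widget"}
```

Because RTA never removes a class that TFA would keep, seeding TFA with this set is safe: TFA starts near its fixed point instead of discovering allocated classes one invalidation at a time, which is where the ~2x step-2 speedup comes from.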
We should reduce AOT compilation time as it is beginning to hit a certain limit on an internal large app (b/158101962). There are multiple things we can do:
1. The vast majority of compilation time is spent in TFA, so we can investigate whether we can do the analysis more efficiently.
2. The app uses the protobuf tree shaker and repeats TFA after applying it, which means TFA is performed 2 times. We can integrate the protobuf-aware tree shaker with TFA more closely and avoid the 2nd round of TFA: Consider integrating protobuf-aware treeshaking in TFA directly instead of running two rounds of TFA #40785
3. We can split compilation into multiple steps, serializing state after each step and resuming from the serialized state. This will allow us to reduce time spent on each step and avoid hitting the limit.
https://dart-review.googlesource.com/c/sdk/+/150272 added the --from-dill option, which allows splitting the front-end part of the compilation into a separate step.