Improve UI isolate startup for large Flutter apps #46116
Comments
Avoid calling SetTypeTestingStub from PostLoad because it uses a write lock. Saves ~10-15% on ReadProgramSnapshot on a large Flutter app.

Issue #46116
TEST=ci
Cq-Include-Trybots: luci.dart.try:vm-kernel-precomp-linux-release-x64-try,vm-kernel-precomp-linux-product-x64-try,vm-kernel-precomp-linux-debug-x64-try
Change-Id: If843828661e68f18df19824af204df326bf016a0
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/201180
Auto-Submit: Vyacheslav Egorov <vegorov@google.com>
Commit-Queue: Vyacheslav Egorov <vegorov@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
I'm surprised by that ratio of ReadAlloc to ReadFill. ReadAlloc is a bunch of bump pointer allocation with no initialization; it ought to be a smaller fraction of ReadFill.
@rmacnak-google I would not be surprised if most of the time is just spent touching and paging memory (given what we see for the ROData cluster).
Interestingly enough I am seeing different numbers on newer builds:
It seems that ReadAlloc is now significantly faster (the size of the snapshot is also smaller). I wonder what caused this.
It seems the numbers fluctuate quite a bit. Here is a bunch of numbers from a Redmi 9A:
I would like to investigate an approach where we entirely eliminate the need to deserialise some of these objects and instead have various binary search tables in
@mraleph Consider updating this bug with instructions on how you measured this.
This alone seems very surprising, since @alexmarkov has worked on shaking most
@sstrickl's work on function shaking might improve this (both closure functions as well as static initializer functions)
In the event that this info helps, here's some timing from a recent trace:
I'm happy to get the original trace to you.
This change removes the extra pass over the global object pool after the AOT snapshot is loaded by adding extra kSwitchableCallMissEntryPoint and kMegamorphicCallEntryPoint object pool entry kinds which are handled during the ReadFill phase.

On a low-end phone and a large Flutter app compiled in release mode with dwarf_stack_traces, FullSnapshotReader::ReadProgramSnapshot time:

Before: 232.41 ms
After: 202.43 ms (-12.8%)

Also, this change adds a PrintTimeScope utility class which can be used to measure and print time in release mode without timeline and profiling tools:

```cpp
ApiErrorPtr FullSnapshotReader::ReadProgramSnapshot() {
  PrintTimeScope tm("FullSnapshotReader::ReadProgramSnapshot");
  ...
}
```

TEST=ci
Issue: #46116
Change-Id: I42bd46761eac8fc1e52ca695cacd2b86705034d4
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/215500
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
Are there any concrete items folks are working on with regard to this issue?
Both #41052 and #45138 are indirectly affecting startup (as well as code size and memory) positively due to reducing the number of objects in the AOT snapshot that have to be deserialized. Though there's more direct work for startup that could happen (e.g. avoid building mapping tables at runtime between PC and stackmaps/pc-descriptors/...): @mraleph is doing some experiments in this direction AFAIK
This change bakes a binary search table which maps PC ranges to the corresponding stack maps and Code objects (if still present in the snapshot) into the RO data section of the snapshot, instead of constructing it at load time. This allows us to considerably reduce the amount of work done when loading the Code cluster for programs which have the majority of their Code objects discarded (i.e. in DWARF stack traces mode), as we no longer write/read any information for discarded Code objects.

This CL also changes the program visitor to deduplicate Code objects if their instructions are deduplicated in AOT mode. Only a single Code object can be chosen as a representative for a given PC range, so it does not make sense to write multiple Code objects into the snapshot which refer to the same Instructions.

The overall improvement is hard to quantify, but ReadProgramSnapshot shows the following improvement when starting a large Flutter application on a slow Android device:

before 223.55±59.94 (192.02 .. 391.74) ms
after 178.06±47.03 (151.31 .. 291.34) ms

This CL packs CompressedStackMaps next to the binary search table itself, allowing us to address them via offsets instead of pointers. Snapshot sizes are actually affected positively by this change. On the same large Flutter application I see:

DWARF stack traces on: -1.34% total SO size
DWARF stack traces off: -1.63% total SO size

Issue #46116
TEST=ci
Cq-Include-Trybots: luci.dart.try:vm-kernel-precomp-dwarf-linux-product-x64-try,vm-kernel-precomp-linux-debug-simarm64c-try,vm-kernel-precomp-linux-debug-simarm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-linux-product-x64-try,vm-kernel-precomp-linux-release-simarm-try,vm-kernel-precomp-linux-release-simarm64-try,vm-kernel-precomp-linux-release-simarm_x64-try,vm-kernel-precomp-linux-release-x64-try
Change-Id: Ic997045a33daa81ec68df462a0792915885df66b
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/220766
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Slava Egorov <vegorov@google.com>
This is an umbrella issue to track investigation into UI isolate startup cost for a large Flutter application.
Initial measurements on a Pixel 4 downclocked to 1 GHz (4 cores) reveal around 400-500 ms in ReadProgramSnapshot (internal app with a 7 MB isolate snapshot).
Here are the clusters that take longer than 10 ms:
I have investigated PostLoad cost and have found that it is mostly caused by the usage of locked operations there.