Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve ui isolate startup of for large Flutter apps #46116

Open
mraleph opened this issue May 24, 2021 · 8 comments
Open

improve ui isolate startup of for large Flutter apps #46116

mraleph opened this issue May 24, 2021 · 8 comments
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. type-performance Issue relates to performance or code size

Comments

@mraleph
Copy link
Member

mraleph commented May 24, 2021

This is an umbrella issue to track investigation in ui isolate startup cost for a large Flutter application.

Initial measurements on a Pixel 4 downclocked to 1Ghz (4 cores) reveal around 400-500ms in ReadProgramSnapshot (internal app with 7mb isolate snapshot).

Here are clusters that take longer than 10ms:

E/DartVM  (17186): Size of isolate snapshot = 7144194
E/DartVM  (17186): ROData  41.28 ms
E/DartVM  (17186): >>>STARTUP<<<: ReadAlloc  78.58 ms
E/DartVM  (17186): Array  51.12 ms
E/DartVM  (17186): Code  33.77 ms
E/DartVM  (17186): Class  32.67 ms
E/DartVM  (17186): Function  25.40 ms
E/DartVM  (17186): ObjectPool  15.72 ms
E/DartVM  (17186): Array  23.06 ms
E/DartVM  (17186): >>>STARTUP<<<: ReadFill 238.15 ms
E/DartVM  (17186): Type  34.45 ms
E/DartVM  (17186): FunctionType  19.74 ms
E/DartVM  (17186): >>>STARTUP<<<: PostLoad  72.39 ms
E/DartVM  (17186): >>>STARTUP<<< ReadProgramSnapshot: 440.50 ms

I have investigated PostLoad cost and have found that it is mostly caused by usage of locked operations there.

@mraleph mraleph added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. type-performance Issue relates to performance or code size labels May 24, 2021
@mraleph mraleph self-assigned this May 24, 2021
dart-bot pushed a commit that referenced this issue May 25, 2021
Avoid calling SetTypeTestingStub from PostLoad because it
uses write lock.

Saves ~10-15% on ReadProgramSnapshot on a large Flutter app.

Issue #46116

TEST=ci

Cq-Include-Trybots: luci.dart.try:vm-kernel-precomp-linux-release-x64-try,vm-kernel-precomp-linux-product-x64-try,vm-kernel-precomp-linux-debug-x64-try
Change-Id: If843828661e68f18df19824af204df326bf016a0
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/201180
Auto-Submit: Vyacheslav Egorov <vegorov@google.com>
Commit-Queue: Vyacheslav Egorov <vegorov@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
@rmacnak-google
Copy link
Contributor

I'm surprised by that ratio of ReadAlloc to ReadFill. ReadAlloc is a bunch of bump pointer allocation with no initialization; it ought to be a smaller fraction of ReadFill.

@mraleph
Copy link
Member Author

mraleph commented May 31, 2021

@rmacnak-google I would not be surprised if most of the time is just spent touching and paging memory (given what we see for ROData cluster).

@mraleph
Copy link
Member Author

mraleph commented Jun 3, 2021

Interestingly enough I am seeing different numbers on newer builds:

E/DartVM  (22816): Size of isolate snapshot = 6454403
E/DartVM  (22816): >>>STARTUP<<<: ReadAlloc  39.46 ms
E/DartVM  (22816): Type  12.81 ms
E/DartVM  (22816): Code  32.08 ms
E/DartVM  (22816): Function  30.70 ms
E/DartVM  (22816): FunctionType  14.25 ms
E/DartVM  (22816): Array  26.26 ms
E/DartVM  (22816): Class  21.48 ms
E/DartVM  (22816): ObjectPool  15.96 ms
E/DartVM  (22816): Array  21.15 ms
E/DartVM  (22816): >>>STARTUP<<<: ReadFill 216.14 ms
E/DartVM  (22816): >>>STARTUP<<<: PostLoad  16.59 ms
E/DartVM  (22816): >>>STARTUP<<< ReadProgramSnapshot: 315.58 ms

It seems that ReadAlloc is now significantly faster (the size of the snapshot is also smaller). I wonder what caused this. ReadAlloc is still around 1/5th of ReadFill.

@mraleph
Copy link
Member Author

mraleph commented Sep 16, 2021

It seems number fluctuate quite a bit. Here is a bunch of numbers from Redmi 9A

09-16 13:38:59.955  3443  3480 E DartVM  : Size of isolate snapshot = 11363273
09-16 13:39:00.756  3443  3480 E DartVM  : >>>STARTUP<<<: ReadAlloc 786.67 ms
09-16 13:39:01.462  3443  3480 E DartVM  : >>>STARTUP<<<: ReadFill 704.78 ms
09-16 13:39:01.501  3443  3480 E DartVM  : >>>STARTUP<<<: PostLoad  26.06 ms
09-16 13:39:01.536  3443  3480 E DartVM  : >>>STARTUP<<< ReadProgramSnapshot   1.58 s
09-16 13:39:01.536  3443  3480 E DartVM  : >>>STARTUP<<< ReadProgramSnapshot:   1.58 s
09-16 13:50:38.038  4134  4235 E DartVM  : Size of isolate snapshot = 11363273
09-16 13:50:38.098  4134  4235 E DartVM  : ROData  14.23 ms
09-16 13:50:38.120  4134  4235 E DartVM  : Code  17.46 ms
09-16 13:50:38.152  4134  4235 E DartVM  : Array  10.94 ms
09-16 13:50:38.153  4134  4235 E DartVM  : >>>STARTUP<<<: ReadAlloc  68.95 ms
09-16 13:50:38.243  4134  4235 E DartVM  : Code  78.24 ms
09-16 13:50:38.306  4134  4235 E DartVM  : Function  63.51 ms
09-16 13:50:38.343  4134  4235 E DartVM  : FunctionType  35.16 ms
09-16 13:50:38.465  4134  4235 E DartVM  : Array 112.49 ms
09-16 13:50:38.551  4134  4235 E DartVM  : Class  62.67 ms
09-16 13:50:38.639  4134  4235 E DartVM  : ObjectPool  63.94 ms
09-16 13:50:38.696  4134  4235 E DartVM  : Array  55.86 ms
09-16 13:50:38.716  4134  4235 E DartVM  : TypedData  15.21 ms
09-16 13:50:38.716  4134  4235 E DartVM  : >>>STARTUP<<<: ReadFill 563.36 ms
09-16 13:50:38.766  4134  4235 E DartVM  : >>>STARTUP<<<: PostLoad  33.10 ms
09-16 13:50:38.798  4134  4235 E DartVM  : >>>STARTUP<<< ReadProgramSnapshot 759.18 ms
09-16 13:50:38.798  4134  4235 E DartVM  : >>>STARTUP<<< ReadProgramSnapshot: 759.55 ms
09-16 14:04:10.059  4795  4847 E DartVM  : Size of isolate snapshot = 11363273
09-16 14:04:10.080  4795  4847 E DartVM  : ROData  12.86 ms
09-16 14:04:10.133  4795  4847 E DartVM  : int  52.87 ms
09-16 14:04:10.146  4795  4847 E DartVM  : Type  13.12 ms
09-16 14:04:10.186  4795  4847 E DartVM  : TypeArguments  39.65 ms
09-16 14:04:10.487  4795  4847 E DartVM  : Code 300.13 ms
09-16 14:04:10.502  4795  4847 E DartVM  : Function  15.69 ms
09-16 14:04:10.544  4795  4847 E DartVM  : GrowableObjectArray  11.20 ms
09-16 14:04:10.544  4795  4847 E DartVM  : >>>STARTUP<<<: ReadAlloc 477.34 ms
09-16 14:04:10.555  4795  4847 E DartVM  : Type  10.08 ms
09-16 14:04:10.703  4795  4847 E DartVM  : Code 143.52 ms
09-16 14:04:10.798  4795  4847 E DartVM  : Function  95.17 ms
09-16 14:04:10.854  4795  4847 E DartVM  : Array  47.35 ms
09-16 14:04:10.885  4795  4847 E DartVM  : Class  19.27 ms
09-16 14:04:10.920  4795  4847 E DartVM  : ObjectPool  17.85 ms
09-16 14:04:10.950  4795  4847 E DartVM  : Array  28.93 ms
09-16 14:04:10.952  4795  4847 E DartVM  : >>>STARTUP<<<: ReadFill 407.47 ms
09-16 14:04:10.983  4795  4847 E DartVM  : >>>STARTUP<<<: PostLoad  23.73 ms
09-16 14:04:11.011  4795  4847 E DartVM  : >>>STARTUP<<< ReadProgramSnapshot 951.74 ms
09-16 14:04:11.011  4795  4847 E DartVM  : >>>STARTUP<<< ReadProgramSnapshot: 952.08 ms
09-16 15:14:58.459  6459  6616 E DartVM  : Size of isolate snapshot = 8210925
09-16 15:14:58.515  6459  6616 E DartVM  : ROData  47.94 ms
09-16 15:14:58.535  6459  6616 E DartVM  : Type  14.21 ms
09-16 15:14:58.763  6459  6616 E DartVM  : Instance  45.41 ms
09-16 15:14:58.834  6459  6616 E DartVM  : Class  57.31 ms
09-16 15:14:58.926  6459  6616 E DartVM  : PatchClass  92.04 ms
09-16 15:14:58.968  6459  6616 E DartVM  : ClosureData  41.68 ms
09-16 15:14:58.981  6459  6616 E DartVM  : Script  13.04 ms
09-16 15:14:58.996  6459  6616 E DartVM  : ExceptionHandlers  11.31 ms
09-16 15:14:59.037  6459  6616 E DartVM  : Array  40.31 ms
09-16 15:14:59.059  6459  6616 E DartVM  : TypedData  21.82 ms
09-16 15:14:59.059  6459  6616 E DartVM  : >>>STARTUP<<<: ReadAlloc 592.14 ms
09-16 15:14:59.306  6459  6616 E DartVM  : Code 240.19 ms
09-16 15:14:59.398  6459  6616 E DartVM  : Function  92.23 ms
09-16 15:14:59.413  6459  6616 E DartVM  : FunctionType  14.33 ms
09-16 15:14:59.548  6459  6616 E DartVM  : Array 134.55 ms
09-16 15:14:59.588  6459  6616 E DartVM  : Instance  14.66 ms
09-16 15:14:59.673  6459  6616 E DartVM  : Class  83.56 ms
09-16 15:14:59.696  6459  6616 E DartVM  : Library  14.59 ms
09-16 15:14:59.716  6459  6616 E DartVM  : ObjectPool  18.81 ms
09-16 15:14:59.747  6459  6616 E DartVM  : Array  30.29 ms
09-16 15:14:59.749  6459  6616 E DartVM  : >>>STARTUP<<<: ReadFill 690.36 ms
09-16 15:14:59.787  6459  6616 E DartVM  : >>>STARTUP<<<: PostLoad  26.69 ms
09-16 15:14:59.829  6459  6616 E DartVM  : >>>STARTUP<<< ReadProgramSnapshot   1.37 s
09-16 15:14:59.829  6459  6616 E DartVM  : >>>STARTUP<<< ReadProgramSnapshot:   1.37 s
09-16 15:17:18.136  6946  6991 E DartVM  : Size of isolate snapshot = 8210925
09-16 15:17:18.209  6946  6991 E DartVM  : ROData  61.33 ms
09-16 15:17:18.245  6946  6991 E DartVM  : int  35.83 ms
09-16 15:17:18.265  6946  6991 E DartVM  : Type  20.11 ms
09-16 15:17:18.288  6946  6991 E DartVM  : Code  17.06 ms
09-16 15:17:18.299  6946  6991 E DartVM  : Function  10.55 ms
09-16 15:17:18.314  6946  6991 E DartVM  : FunctionType  15.19 ms
09-16 15:17:18.342  6946  6991 E DartVM  : Array  28.10 ms
09-16 15:17:18.386  6946  6991 E DartVM  : Class  18.03 ms
09-16 15:17:18.395  6946  6991 E DartVM  : >>>STARTUP<<<: ReadAlloc 247.07 ms
09-16 15:17:18.496  6946  6991 E DartVM  : Code  94.92 ms
09-16 15:17:18.541  6946  6991 E DartVM  : Function  44.98 ms
09-16 15:17:18.623  6946  6991 E DartVM  : Array  74.56 ms
09-16 15:17:18.698  6946  6991 E DartVM  : Class  38.61 ms
09-16 15:17:18.745  6946  6991 E DartVM  : ObjectPool  23.62 ms
09-16 15:17:18.776  6946  6991 E DartVM  : Array  30.36 ms
09-16 15:17:18.787  6946  6991 E DartVM  : >>>STARTUP<<<: ReadFill 392.07 ms
09-16 15:17:18.834  6946  6991 E DartVM  : >>>STARTUP<<<: PostLoad  30.27 ms
09-16 15:17:18.926  6946  6991 E DartVM  : >>>STARTUP<<< ReadProgramSnapshot 789.22 ms
09-16 15:17:18.930  6946  6991 E DartVM  : >>>STARTUP<<< ReadProgramSnapshot: 793.71 ms
09-16 15:20:43.078  7406  7448 E DartVM  : Size of isolate snapshot = 8210925
09-16 15:20:43.247  7406  7448 E DartVM  : ROData  76.41 ms
09-16 15:20:43.263  7406  7448 E DartVM  : int  15.68 ms
09-16 15:20:43.274  7406  7448 E DartVM  : Type  10.51 ms
09-16 15:20:43.309  7406  7448 E DartVM  : Code  34.58 ms
09-16 15:20:43.407  7406  7448 E DartVM  : Function  97.96 ms
09-16 15:20:43.437  7406  7448 E DartVM  : FunctionType  29.48 ms
09-16 15:20:43.537  7406  7448 E DartVM  : >>>STARTUP<<<: ReadAlloc 366.53 ms
09-16 15:20:43.575  7406  7448 E DartVM  : Type  37.66 ms
09-16 15:20:43.887  7406  7448 E DartVM  : Code 309.56 ms
09-16 15:20:43.959  7406  7448 E DartVM  : Function  72.40 ms
09-16 15:20:43.971  7406  7448 E DartVM  : FunctionType  10.60 ms
09-16 15:20:44.065  7406  7448 E DartVM  : Array  93.78 ms
09-16 15:20:44.124  7406  7448 E DartVM  : Class  23.16 ms
09-16 15:20:44.143  7406  7448 E DartVM  : TypeParameters  16.87 ms
09-16 15:20:44.186  7406  7448 E DartVM  : ObjectPool  25.09 ms
09-16 15:20:44.218  7406  7448 E DartVM  : Array  31.15 ms
09-16 15:20:44.227  7406  7448 E DartVM  : >>>STARTUP<<<: ReadFill 689.89 ms
09-16 15:20:44.256  7406  7448 E DartVM  : >>>STARTUP<<<: PostLoad  20.81 ms
09-16 15:20:44.307  7406  7448 E DartVM  : >>>STARTUP<<< ReadProgramSnapshot   1.22 s
09-16 15:20:44.307  7406  7448 E DartVM  : >>>STARTUP<<< ReadProgramSnapshot:   1.23 s

I would like to investigate approach where we entirely eliminate a need to deserialise some of these objects and instead have various binary search tables in .rodata.

/cc @mkustermann @alexmarkov

@mkustermann
Copy link
Member

@mraleph Consider updating this bug with instructions on how you measured this.

09-16 15:20:43.887  7406  7448 E DartVM  : Code 309.56 ms

This alone seems very surprising, since @alexmarkov has worked on shaking most Code objects. Was this measured in flutter-release mode?

09-16 15:20:43.959  7406  7448 E DartVM  : Function  72.40 ms

@sstrickl 's work on function shaking might improve this (both closure functions as well as static initializer functions)

@devoncarew
Copy link
Member

In the event that this info helps, here's some timing from a recent trace:

Name                           Wall duration (ms)	Avg Wall duration (ms)	Occurrences
MessageLoop::FlushTasks	        974.255	974.255	1
DartIsolate::CreateRootIsolate	970.848	970.848	1
DartIsolate::CreateDartIsolateGroup 970.709001	970.709001	1
Dart_CreateIsolateGroup	        574.485	574.485	1
InitializeIsolate	        573.255	573.255	1
ReadProgramSnapshot	        572.640	572.641001	1
DartIsolate::InitializeIsolate	396.192	396.192	1
DartIsolate::LoadLibraries	395.922	395.922	1
ReadFill	                361.061	361.061	1
Code	                        111.112	 37.037333	3
Dart_SetField	                108.295	 21.659	5
Function	                104.423	 34.807999	3
PostLoad	                 77.870	 77.870999	1
ReadAlloc	                 77.857	 77.857	1
Instance	                 61.342	  0.00725	8460
Array	                         44.473	  7.412167	6
Class	                         20.669	  6.889999	3
ObjectPool	                 18.062	  6.020667	3
FunctionType	                 17.852	  5.950666	3
Library	                         15.037	  5.012334	3
Field	                         11.957	  3.985999	3
Type	                          7.207	  2.402333	3

I'm happy to get the original trace to you.

copybara-service bot pushed a commit that referenced this issue Oct 5, 2021
This change removes the extra pass over global object pool after AOT
snapshot is loaded by adding extra kSwitchableCallMissEntryPoint and
kMegamorphicCallEntryPoint object pool entry kinds which are handled
during ReadFill phase.

On a low-end phone and large Flutter app compiled in release mode
with dwarf_stack_traces, FullSnapshotReader::ReadProgramSnapshot time
Before: 232.41 ms
After:  202.43 ms (-12.8%)

Also, this change adds PrintTimeScope utility class which can be used
to measure and print time in release mode without timeline and profiling
tools:

ApiErrorPtr FullSnapshotReader::ReadProgramSnapshot() {
  PrintTimeScope tm("FullSnapshotReader::ReadProgramSnapshot");
  ...
}

TEST=ci

Issue: #46116
Change-Id: I42bd46761eac8fc1e52ca695cacd2b86705034d4
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/215500
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
@a-siva
Copy link
Contributor

a-siva commented Oct 18, 2021

Are there any concrete items folks are working on with regard to this issue ?

@mkustermann
Copy link
Member

Both #41052 and #45138 are indirectly affecting startup (as well as code size and memory) positively due to reducing the number of objects in the AOT snapshot that have to be deserialized.

Though there's more direct work for startup that could happen (e.g. avoid building mapping tables at runtime between PC and stackmaps/pc-descriptors/...): @mraleph is doing some experiments in this direction AFAIK

copybara-service bot pushed a commit that referenced this issue Dec 16, 2021
This change bakes binary search table which maps PC ranges
to corresponding stack maps and Code objects (if still present
in the snapshot) into RO data section of the snapshot - instead
of constructing it at load time.

This allows to considerably reduce amount of work done when
loading Code cluster for programs which have majority of their
Code objects discarded (i.e. in DWARF stack traces mode): as
we no longer write / read any information for discarded Code
objects.

This CL also changes program visitor to deduplicate Code objects
if their instructions are deduplicated in AOT mode. Only a single
Code object can be choose as a representative for the given
PC range so it does not make sense to write multiple Code objects
into the snapshot which refer to the same Instructions.

The overall improvement is hard to quantify but ReadProgramSnapshot
shows the following improvement when starting a large
Flutter application on a slow Android device:

  before  223.55±59.94 (192.02 .. 391.74) ms
  after   178.06±47.03 (151.31 .. 291.34) ms

This CL packs CompressedStackMaps next to the binary search table
itself allowing us to address them via offsets instead of
pointers.

Snapshot sizes are actually affected positively by this change. On
the same large Flutter application I see

  DWARF stack traces on:  -1.34% total SO size
  DWARF stack traces off: -1.63% total SO size

Issue #46116

TEST=ci

Cq-Include-Trybots: luci.dart.try:vm-kernel-precomp-dwarf-linux-product-x64-try,vm-kernel-precomp-linux-debug-simarm64c-try,vm-kernel-precomp-linux-debug-simarm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-linux-product-x64-try,vm-kernel-precomp-linux-release-simarm-try,vm-kernel-precomp-linux-release-simarm64-try,vm-kernel-precomp-linux-release-simarm_x64-try,vm-kernel-precomp-linux-release-x64-try
Change-Id: Ic997045a33daa81ec68df462a0792915885df66b
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/220766
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Slava Egorov <vegorov@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. type-performance Issue relates to performance or code size
Projects
None yet
Development

No branches or pull requests

5 participants