Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the quality of unoptimized code #36409

Closed
a-siva opened this issue Apr 1, 2019 · 5 comments
Closed

Improve the quality of unoptimized code #36409

a-siva opened this issue Apr 1, 2019 · 5 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. customer-fuchsia type-performance Issue relates to performance or code size

Comments

@a-siva
Copy link
Contributor

a-siva commented Apr 1, 2019

Focus on improving the quality of unoptimized code. Look at code quality and look for quick ways to improve the code

  • avoid unnecessary back to back store/load
  • investigate how method dispatch can be improved
@a-siva a-siva added area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. customer-fuchsia labels Apr 1, 2019
dart-bot pushed a commit that referenced this issue Apr 3, 2019
Rationale:
The "unoptimized code" representation is really a stack
machine that pushes and pops arguments. The intermediate
representation did not show the correct labeling for
"temporaries", in particular around push arguments.
This CL fixes that bug, making the human readable form
of the intermediate much easier to understand. Also,
it allows for proper stack depth computations.

First use:
Sharpen condition on OSR points a bit.
Simple control flow collections will now allow OSR.


#36421
#36409

Change-Id: I8be707af08462a22687b355b747ba85e42431ad5
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/98500
Commit-Queue: Aart Bik <ajcbik@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
dart-bot pushed a commit that referenced this issue Apr 18, 2019
Bug: #36409
Change-Id: Id1a6a65b26c95fb3f56bf844943aa09cc4f13a2a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/99728
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Aart Bik <ajcbik@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
dart-bot pushed a commit that referenced this issue Apr 25, 2019
… calls.

This is preparation for switching the call's stub when polymorphism gets too high.

Bug: #26780
Bug: #29294
Bug: #36409
Change-Id: If601ea97a1942a567bee0847afeb3b4e9c11481a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/99705
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
dart-bot pushed a commit that referenced this issue May 10, 2019
…res.

Bug: #26780
Bug: #36409
Change-Id: I7865e6fb033ad2a4b1fd185d38c1b5a4c96dd6e7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/101921
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
@aartbik
Copy link
Contributor

aartbik commented May 11, 2019

I have started exploring some ideas. The first is a basic peephole optimizer, which does not add any significant overhead to compile-time, but reduces code size and possibly runtime. For example,

        ;; t0 <- Constant(#0)
0x7f45d0cb84e4    6a00                   push 0
        ;; StoreLocal(i @-2, t0)
0x7f45d0cb84e6    58                     pop rax
0x7f45d0cb84e7    488945d8               movq [rbp-0x28],rax

becomes the more "compact" (saving one byte in this case, but when combined with others, it may add up)

        ;; t0 <- Constant(#0)
        ;; StoreLocal(i @-2, t0)
0x7f29804f83a3    33c0                   xorl rax,rax
0x7f29804f83a5    488945d8               movq [rbp-0x28],rax

On ARM, the unoptimized code is

      ;; t0 <- Constant(#0)
0x7f4de1286408    d2800000               movz r0, 0x0 
0x7f4de128640c    f81f8de0               strx r0, [r15, #-8] !
       ;; StoreLocal(k @-1, t0)
0x7f4de1286410    f84085e0               ldrx r0, [r15], #8 !
0x7f4de1286414    f81e03a0               strx r0, [fp, #-32]

with peephole, saving 8 bytes

        ;; t0 <- Constant(#0)
0x7f1ff8bc2888    d2800000               movz r0, 0x0 
        ;; Peephole
        ;; StoreLocal(k @-1, t0)
0x7f1ff8bc288c    f81e03a0               strx r0, [fp, #-32]

@aartbik
Copy link
Contributor

aartbik commented May 16, 2019

Performance gains look promising too, even for very basic peephole optimization with window size 1. On a toy function with a loop and some operations, I get the following timings (with unoptimized referring to "--optimization_counter_threshold=-1"):

default JIT            :   0.30s
unoptimized            :   2.14s
unoptimized + peephole :   1.76s

So full optimization is about 7x faster, but peephole optimizations yields about 20% faster performance on unoptimized code.

dart-bot pushed a commit that referenced this issue May 16, 2019
Remove the optimization function stub and self function from the constant pool of every unoptimized function.

Remove one PP move from unoptimized frame entry.

Bug: #36409
Change-Id: Idbe37ce36b57dc316a131e6c83742ee2b3df4a0d
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/102660
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Aart Bik <ajcbik@google.com>
dart-bot pushed a commit that referenced this issue May 21, 2019
Function entry points have been subject to block scheduling since 06f9a9e. Block scheduling in AOT never reorders the entries.

Bug: #36409
Bug: #36731
Change-Id: Id6725fed4d9bbd381631fa88c5397435a35acc9a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/102463
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
dart-bot pushed a commit that referenced this issue May 21, 2019
Rationale:
A basic peephole optimizer with window size one that avoids
redundant push-pop sequences already results is substantial
savings in code size and runtime, without any noticeable impact
on compile time (the peephole is very, very fast).

Performance:
Golem unoptimized code (which typically runs -90% compared to
optimized code, sees 5-60% improvements

#36409

Change-Id: I08db4b3dbc92377d89340a4969db6e664e54bceb
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/102980
Commit-Queue: Aart Bik <ajcbik@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
@aartbik
Copy link
Contributor

aartbik commented May 21, 2019

Next idea to explore is using a very cheap analysis to lower some of the instance calls for basic arithmetic, mainly to make e.g. loop logic faster.

dart-bot pushed a commit that referenced this issue May 21, 2019
When an instance call in unoptimized code creates more than FLAG_max_polymorphic_checks cases, switch the call to use a MegamorphicCache instead of ICData. The prevents unbounded collection of type feedback, and gives improvements on microbenchmarks in the 3-8% range for unoptimized code.

It also leads to a loss of target frequency information for the optimizer, leading to different ordering for range checks in polymorphic inlining. This leads to changes on megamorphic microbenchmarks from -31% to +60%, weighted toward the negative end.

In practice the frequency information seems unimportant, as dart2js has 4.01% geomean improvement.

This is a step toward direct monomorphic calls in unoptimized code, which will also make use of the patching and type feedback extraction added here.

Bug: #26780
Bug: #36409
Bug: #36731
Change-Id: I29f53f23b6794c5f5f0db8b8184788cee16fd9c5
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/99270
Reviewed-by: Alexander Markov <alexmarkov@google.com>
dart-bot pushed a commit that referenced this issue May 22, 2019
This reverts commit fde6a59.

Reason for revert: This commit looks to be causing new test failures on dart2js. While there are some odd results on some builders that look like infra failures, some builders (for example, dart2js-minified-strong-linux-x64-d8) show the new failing tests clearly.

Original change's description:
> [vm, compiler] Unoptimized megamorphic calls.
> 
> When an instance call in unoptimized code creates more than FLAG_max_polymorphic_checks cases, switch the call to use a MegamorphicCache instead of ICData. The prevents unbounded collection of type feedback, and gives improvements on microbenchmarks in the 3-8% range for unoptimized code.
> 
> It also leads to a loss of target frequency information for the optimizer, leading to different ordering for range checks in polymorphic inlining. This leads to changes on megamorphic microbenchmarks from -31% to +60%, weighted toward the negative end.
> 
> In practice the frequency information seems unimportant, as dart2js has 4.01% geomean improvement.
> 
> This is a step toward direct monomorphic calls in unoptimized code, which will also make use of the patching and type feedback extraction added here.
> 
> Bug: #26780
> Bug: #36409
> Bug: #36731
> Change-Id: I29f53f23b6794c5f5f0db8b8184788cee16fd9c5
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/99270
> Reviewed-by: Alexander Markov <alexmarkov@google.com>

TBR=rmacnak@google.com,alexmarkov@google.com,ajcbik@google.com

Change-Id: Icad46b93cdf8541a00563f49da6b4ac0a4df1ba1
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: #26780, #36409, #36731
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/103440
Reviewed-by: Teagan Strickland <sstrickl@google.com>
Commit-Queue: Teagan Strickland <sstrickl@google.com>
dart-bot pushed a commit that referenced this issue Jun 12, 2019
dart-bytecode, arm64:            +4.742% geomean
dart-bytecode-jit-unopt, arm64: +12.73% geomean
dart2js-compile, x64:            +3.635% geomean

In the polymorphic and unlinked cases, call to a stub the does a linear scan against an ICData.

In the monomorphic case, call to a prologue of the expected target function that checks the expected receiver class. There is additional indirection in the JIT version compared to the AOT version to also tick a usage counter so the inliner can make good decisions.

In the megamorphic case, call to a stub that does a hash table lookup against a MegamorphicCache.

Megamorphic call sites face a loss of precision in usage counts. The call site count is not recorded and the usage counter of the target function is used as an approximation.

Monomorphic and megamorphic calls sites are reset to the polymorphic/unlinked state on hot reload.

Monomorphic and megamorphic calls sites do not check the stepping state, so they are reset to the polymorphic/unlinked state when stepping begins and disabled.

Back-edges now increment the usage counter in addition to checking it. This ensures function with loops containing monomorphic calls will eventually cross the optimization threshold.

Fixed backwards use of kMonomorphicEntryOffset and kPolymorphicEntryOffset.

Fixed C stack overflow when bouncing between the KBC interpreter and a simulator.

Bug: #26780
Bug: #36409
Bug: #36731
Change-Id: I78a49cccd962703a459288e71ce246ed845df474
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/102820
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
@aartbik aartbik assigned bkonyi and aartbik and unassigned bkonyi and aartbik Sep 11, 2019
@fizaaluthra fizaaluthra self-assigned this Sep 12, 2019
@mraleph mraleph added type-performance Issue relates to performance or code size and removed vm-perf-or-size-improvement labels Sep 18, 2019
dart-bot pushed a commit that referenced this issue Oct 24, 2019
…tion logic

Rationale:
1) Centralize constant expression evaluation logic.
2) Avoid code duplication with constant folding in unoptimized code.

Bug:
#36409


Change-Id: I0ee4916e9e731df3dd8eb30dd5f182bc40e1dfa3
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/122172
Reviewed-by: Aart Bik <ajcbik@google.com>
Commit-Queue: Fizaa Luthra <fizaaluthra@google.com>
tekknolagi pushed a commit to tekknolagi/dart-assembler that referenced this issue Nov 3, 2020
… calls.

This is preparation for switching the call's stub when polymorphism gets too high.

Bug: dart-lang#26780
Bug: dart-lang#29294
Bug: dart-lang#36409
Change-Id: If601ea97a1942a567bee0847afeb3b4e9c11481a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/99705
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
tekknolagi pushed a commit to tekknolagi/dart-assembler that referenced this issue Nov 3, 2020
Remove the optimization function stub and self function from the constant pool of every unoptimized function.

Remove one PP move from unoptimized frame entry.

Bug: dart-lang#36409
Change-Id: Idbe37ce36b57dc316a131e6c83742ee2b3df4a0d
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/102660
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Aart Bik <ajcbik@google.com>
tekknolagi pushed a commit to tekknolagi/dart-assembler that referenced this issue Nov 3, 2020
Function entry points have been subject to block scheduling since 06f9a9e. Block scheduling in AOT never reorders the entries.

Bug: dart-lang#36409
Bug: dart-lang#36731
Change-Id: Id6725fed4d9bbd381631fa88c5397435a35acc9a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/102463
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
tekknolagi pushed a commit to tekknolagi/dart-assembler that referenced this issue Nov 3, 2020
dart-bytecode, arm64:            +4.742% geomean
dart-bytecode-jit-unopt, arm64: +12.73% geomean
dart2js-compile, x64:            +3.635% geomean

In the polymorphic and unlinked cases, call to a stub the does a linear scan against an ICData.

In the monomorphic case, call to a prologue of the expected target function that checks the expected receiver class. There is additional indirection in the JIT version compared to the AOT version to also tick a usage counter so the inliner can make good decisions.

In the megamorphic case, call to a stub that does a hash table lookup against a MegamorphicCache.

Megamorphic call sites face a loss of precision in usage counts. The call site count is not recorded and the usage counter of the target function is used as an approximation.

Monomorphic and megamorphic calls sites are reset to the polymorphic/unlinked state on hot reload.

Monomorphic and megamorphic calls sites do not check the stepping state, so they are reset to the polymorphic/unlinked state when stepping begins and disabled.

Back-edges now increment the usage counter in addition to checking it. This ensures function with loops containing monomorphic calls will eventually cross the optimization threshold.

Fixed backwards use of kMonomorphicEntryOffset and kPolymorphicEntryOffset.

Fixed C stack overflow when bouncing between the KBC interpreter and a simulator.

Bug: dart-lang#26780
Bug: dart-lang#36409
Bug: dart-lang#36731
Change-Id: I78a49cccd962703a459288e71ce246ed845df474
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/102820
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
@bkonyi
Copy link
Contributor

bkonyi commented Feb 23, 2021

I'm not actively working on this, so I'm going to unassign myself. cc @a-siva and @mraleph to determine a new owner or close.

@bkonyi bkonyi removed their assignment Feb 23, 2021
@a-siva
Copy link
Contributor Author

a-siva commented Feb 24, 2021

I am going to close this as the original issue of ephemeral apps is not being worked on.

@a-siva a-siva closed this as completed Feb 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. customer-fuchsia type-performance Issue relates to performance or code size
Projects
None yet
Development

No branches or pull requests

5 participants