Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the quality of unoptimized code #36409

Open
a-siva opened this issue Apr 1, 2019 · 3 comments

Comments

Projects
None yet
2 participants
@a-siva
Copy link
Contributor

commented Apr 1, 2019

Focus on improving the quality of unoptimized code. Look at code quality and look for quick ways to improve the code

  • avoid unnecessary back to back store/load
  • investigate how method dispatch can be improved

dart-bot pushed a commit that referenced this issue Apr 3, 2019

[vm/compiler] Fix incorrect labeling of stack slots
Rationale:
The "unoptimized code" representation is really a stack
machine that pushes and pops arguments. The intermediate
representation did not show the correct labeling for
"temporaries", in particular around push arguments.
This CL fixes that bug, making the human readable form
of the intermediate much easier to understand. Also,
it allows for proper stack depth computations.

First use:
Sharpen condition on OSR points a bit.
Simple control flow collections will now allow OSR.


#36421
#36409

Change-Id: I8be707af08462a22687b355b747ba85e42431ad5
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/98500
Commit-Queue: Aart Bik <ajcbik@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>

dart-bot pushed a commit that referenced this issue Apr 18, 2019

[vm, jit] The hottest Smi ops are + < == not + - ==.
Bug: #36409
Change-Id: Id1a6a65b26c95fb3f56bf844943aa09cc4f13a2a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/99728
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Aart Bik <ajcbik@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>

dart-bot pushed a commit that referenced this issue Apr 25, 2019

[vm, jit] Make unoptimized instance calls compatible with megamorphic…
… calls.

This is preparation for switching the call's stub when polymorphism gets too high.

Bug: #26780
Bug: #29294
Bug: #36409
Change-Id: If601ea97a1942a567bee0847afeb3b4e9c11481a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/99705
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>

dart-bot pushed a commit that referenced this issue May 10, 2019

[vm, compiler] Make IA32 megamorphic calls match the other architectu…
…res.

Bug: #26780
Bug: #36409
Change-Id: I7865e6fb033ad2a4b1fd185d38c1b5a4c96dd6e7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/101921
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
@aartbik

This comment has been minimized.

Copy link
Contributor

commented May 11, 2019

I have started exploring some ideas. The first is a basic peephole optimizer, which does not add any significant overhead to compile-time, but reduces code size and possibly runtime. For example,

        ;; t0 <- Constant(#0)
0x7f45d0cb84e4    6a00                   push 0
        ;; StoreLocal(i @-2, t0)
0x7f45d0cb84e6    58                     pop rax
0x7f45d0cb84e7    488945d8               movq [rbp-0x28],rax

becomes the more "compact" (saving one byte in this case, but when combined with others, it may add up)

        ;; t0 <- Constant(#0)
        ;; StoreLocal(i @-2, t0)
0x7f29804f83a3    33c0                   xorl rax,rax
0x7f29804f83a5    488945d8               movq [rbp-0x28],rax

On ARM, the unoptimized code is

      ;; t0 <- Constant(#0)
0x7f4de1286408    d2800000               movz r0, 0x0 
0x7f4de128640c    f81f8de0               strx r0, [r15, #-8] !
       ;; StoreLocal(k @-1, t0)
0x7f4de1286410    f84085e0               ldrx r0, [r15], #8 !
0x7f4de1286414    f81e03a0               strx r0, [fp, #-32]

with peephole, saving 8 bytes

        ;; t0 <- Constant(#0)
0x7f1ff8bc2888    d2800000               movz r0, 0x0 
        ;; Peephole
        ;; StoreLocal(k @-1, t0)
0x7f1ff8bc288c    f81e03a0               strx r0, [fp, #-32]
@aartbik

This comment has been minimized.

Copy link
Contributor

commented May 16, 2019

Performance gains look promising too, even for very basic peephole optimization with window size 1. On a toy function with a loop and some operations, I get the following timings (with unoptimized referring to "--optimization_counter_threshold=-1"):

default JIT            :   0.30s
unoptimized            :   2.14s
unoptimized + peephole :   1.76s

So full optimization is about 7x faster, but peephole optimizations yields about 20% faster performance on unoptimized code.

dart-bot pushed a commit that referenced this issue May 16, 2019

[vm, compiler] Shorter invocation counter check.
Remove the optimization function stub and self function from the constant pool of every unoptimized function.

Remove one PP move from unoptimized frame entry.

Bug: #36409
Change-Id: Idbe37ce36b57dc316a131e6c83742ee2b3df4a0d
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/102660
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Aart Bik <ajcbik@google.com>

dart-bot pushed a commit that referenced this issue May 21, 2019

[vm, compiler] Fix Code checked entry offsets.
Function entry points have been subject to block scheduling since 06f9a9e. Block scheduling in AOT never reorders the entries.

Bug: #36409
Bug: #36731
Change-Id: Id6725fed4d9bbd381631fa88c5397435a35acc9a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/102463
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>

dart-bot pushed a commit that referenced this issue May 21, 2019

[dart/vm] Peephole optimizer (window size one) on stack code
Rationale:
A basic peephole optimizer with window size one that avoids
redundant push-pop sequences already results is substantial
savings in code size and runtime, without any noticeable impact
on compile time (the peephole is very, very fast).

Performance:
Golem unoptimized code (which typically runs -90% compared to
optimized code, sees 5-60% improvements

#36409

Change-Id: I08db4b3dbc92377d89340a4969db6e664e54bceb
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/102980
Commit-Queue: Aart Bik <ajcbik@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
@aartbik

This comment has been minimized.

Copy link
Contributor

commented May 21, 2019

Next idea to explore is using a very cheap analysis to lower some of the instance calls for basic arithmetic, mainly to make e.g. loop logic faster.

dart-bot pushed a commit that referenced this issue May 21, 2019

[vm, compiler] Unoptimized megamorphic calls.
When an instance call in unoptimized code creates more than FLAG_max_polymorphic_checks cases, switch the call to use a MegamorphicCache instead of ICData. The prevents unbounded collection of type feedback, and gives improvements on microbenchmarks in the 3-8% range for unoptimized code.

It also leads to a loss of target frequency information for the optimizer, leading to different ordering for range checks in polymorphic inlining. This leads to changes on megamorphic microbenchmarks from -31% to +60%, weighted toward the negative end.

In practice the frequency information seems unimportant, as dart2js has 4.01% geomean improvement.

This is a step toward direct monomorphic calls in unoptimized code, which will also make use of the patching and type feedback extraction added here.

Bug: #26780
Bug: #36409
Bug: #36731
Change-Id: I29f53f23b6794c5f5f0db8b8184788cee16fd9c5
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/99270
Reviewed-by: Alexander Markov <alexmarkov@google.com>

dart-bot pushed a commit that referenced this issue May 22, 2019

Revert "[vm, compiler] Unoptimized megamorphic calls."
This reverts commit fde6a59.

Reason for revert: This commit looks to be causing new test failures on dart2js. While there are some odd results on some builders that look like infra failures, some builders (for example, dart2js-minified-strong-linux-x64-d8) show the new failing tests clearly.

Original change's description:
> [vm, compiler] Unoptimized megamorphic calls.
> 
> When an instance call in unoptimized code creates more than FLAG_max_polymorphic_checks cases, switch the call to use a MegamorphicCache instead of ICData. The prevents unbounded collection of type feedback, and gives improvements on microbenchmarks in the 3-8% range for unoptimized code.
> 
> It also leads to a loss of target frequency information for the optimizer, leading to different ordering for range checks in polymorphic inlining. This leads to changes on megamorphic microbenchmarks from -31% to +60%, weighted toward the negative end.
> 
> In practice the frequency information seems unimportant, as dart2js has 4.01% geomean improvement.
> 
> This is a step toward direct monomorphic calls in unoptimized code, which will also make use of the patching and type feedback extraction added here.
> 
> Bug: #26780
> Bug: #36409
> Bug: #36731
> Change-Id: I29f53f23b6794c5f5f0db8b8184788cee16fd9c5
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/99270
> Reviewed-by: Alexander Markov <alexmarkov@google.com>

TBR=rmacnak@google.com,alexmarkov@google.com,ajcbik@google.com

Change-Id: Icad46b93cdf8541a00563f49da6b4ac0a4df1ba1
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: #26780, #36409, #36731
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/103440
Reviewed-by: Teagan Strickland <sstrickl@google.com>
Commit-Queue: Teagan Strickland <sstrickl@google.com>

dart-bot pushed a commit that referenced this issue Jun 12, 2019

[vm, compiler] Specialize unoptimized monomorphic and megamorphic calls.
dart-bytecode, arm64:            +4.742% geomean
dart-bytecode-jit-unopt, arm64: +12.73% geomean
dart2js-compile, x64:            +3.635% geomean

In the polymorphic and unlinked cases, call to a stub the does a linear scan against an ICData.

In the monomorphic case, call to a prologue of the expected target function that checks the expected receiver class. There is additional indirection in the JIT version compared to the AOT version to also tick a usage counter so the inliner can make good decisions.

In the megamorphic case, call to a stub that does a hash table lookup against a MegamorphicCache.

Megamorphic call sites face a loss of precision in usage counts. The call site count is not recorded and the usage counter of the target function is used as an approximation.

Monomorphic and megamorphic calls sites are reset to the polymorphic/unlinked state on hot reload.

Monomorphic and megamorphic calls sites do not check the stepping state, so they are reset to the polymorphic/unlinked state when stepping begins and disabled.

Back-edges now increment the usage counter in addition to checking it. This ensures function with loops containing monomorphic calls will eventually cross the optimization threshold.

Fixed backwards use of kMonomorphicEntryOffset and kPolymorphicEntryOffset.

Fixed C stack overflow when bouncing between the KBC interpreter and a simulator.

Bug: #26780
Bug: #36409
Bug: #36731
Change-Id: I78a49cccd962703a459288e71ce246ed845df474
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/102820
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.