Dart AOT performance issue #37455

Open
renatoathaydes opened this issue Jul 6, 2019 · 3 comments


@renatoathaydes

commented Jul 6, 2019

  • Dart SDK Version (dart --version)

Dart VM version: 2.4.0 (Unknown timestamp) on "linux_x64"

  • Whether you are using Windows, MacOSX, or Linux (if applicable)

Linux renato 4.4.0-154-generic #181-Ubuntu SMP Tue Jun 25 05:29:03 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Sample code

The code I am using to benchmark different Dart compilation modes is a simple brainfuck implementation.

This is the bf program I am using for benchmarking.

When running with dart as a script, I get the following output:

➜  dart-bf git:(master) ✗ time dart bin/dart-bf.dart bf/example.bf
dart bin/dart-bf.dart bf/example.bf   3,21s  user 0,13s system 112% cpu 2,956 total
avg shared (code):         0 KB
avg unshared (data/stack): 0 KB
total (sum):               0 KB
max memory:                144 MB
page faults from disk:     0
other page faults:         39866

With dartaotruntime:

➜  dart-bf git:(master) ✗ dart2aot bin/dart-bf.dart dart-bf.aot
➜  dart-bf git:(master) ✗ time dartaotruntime dart-bf.aot bf/example.bf
dartaotruntime dart-bf.aot bf/example.bf   11,42s  user 0,00s system 99% cpu 11,421 total
avg shared (code):         0 KB
avg unshared (data/stack): 0 KB
total (sum):               0 KB
max memory:                15 MB
page faults from disk:     0
other page faults:         2680

With dart --snapshot-kind=kernel:

➜  dart-bf git:(master) ✗ dart --snapshot=main.snapshot --snapshot-kind=kernel bin/dart-bf.dart
➜  dart-bf git:(master) ✗ time dart main.snapshot bf/example.bf
dart main.snapshot bf/example.bf   2,47s  user 0,04s system 106% cpu 2,348 total
avg shared (code):         0 KB
avg unshared (data/stack): 0 KB
total (sum):               0 KB
max memory:                51 MB
page faults from disk:     0
other page faults:         10598

Summary:

Method                       User time (s)  Max memory (MB)
dart script                  3.21           144
dartaotruntime               11.42          15
dart --snapshot-kind=kernel  2.47           51

dartaotruntime is about 3.6x slower than running the script directly and about 4.6x slower than the kernel snapshot. Is that because, unlike the other options, it has no JIT? Even so, I think further optimization would be reasonable to make AOT performance more predictable.

BTW: the snapshot performance is amazing, pretty close to Go and Java!

@mraleph

Contributor

commented Jul 8, 2019

Thank you for the detailed bug report and the reproduction.

Depending on the workload, AOT is known to perform worse (sometimes much worse) than JIT.

I took a quick look, and I think there are two main reasons why the AOT version is slower:

  • TFA (type flow analysis) fails to infer the type of Loop.ops, which should be inferred as _GrowableList. (/cc @alexmarkov: is this because Program._parseOps is recursive?) As a result, nothing called on ops gets inlined. For example, the loop below allocates closures and calls forEach dynamically, which makes it considerably more expensive than a normal loop. (JIT would inline forEach and eliminate the closure allocation.)
  @override
  void call(Tape tape) {
    while (tape.current > 0) {
      ops.forEach((op) => op(tape));
    }
  }
  • AOT does not do polymorphic inlining at op.call, so this call becomes a major performance sink compared to JIT, which does. Realistically, I think this can only be addressed by introducing faster dispatch mechanisms (e.g. virtual / interface dispatch tables), because polymorphic inlining would go against AOT's goal of keeping code size under control.
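
To make the first point concrete, here is a minimal sketch of the same loop written with an indexed for loop instead of forEach, so that no closure is allocated and no forEach call is made. Only the names Tape, Loop, ops and call come from the issue; the Op, Inc and Dec classes and the Tape field layout are assumptions made for illustration, not the actual dart-bf code. It does not address the second point: ops[i].call is still a polymorphic call site.

```dart
// Hypothetical minimal model of the interpreter. Only Tape, Loop, ops and
// call are named in the issue; everything else here is assumed.
class Tape {
  int current = 0;
}

abstract class Op {
  void call(Tape tape);
}

class Inc implements Op {
  @override
  void call(Tape tape) => tape.current++;
}

class Dec implements Op {
  @override
  void call(Tape tape) => tape.current--;
}

class Loop implements Op {
  final List<Op> ops;
  Loop(this.ops);

  @override
  void call(Tape tape) {
    while (tape.current > 0) {
      // Indexed loop: no closure allocation and no forEach call.
      // The call below is still dispatched on the runtime type of ops[i].
      for (var i = 0; i < ops.length; i++) {
        ops[i].call(tape);
      }
    }
  }
}

void main() {
  final tape = Tape()..current = 5;
  // Each pass over the ops changes the cell by +1 - 1 - 1 = -1.
  Loop([Inc(), Dec(), Dec()]).call(tape);
  print(tape.current); // prints 0
}
```

Whether this rewrite helps in practice would have to be measured with dart2aot on the real benchmark; the sketch only shows the shape of the change.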

/cc @mkustermann

@mraleph added the area-vm label Jul 8, 2019

@alexmarkov

Contributor

commented Jul 8, 2019

I can confirm that the type of Loop.ops is not inferred because Program._parseOps is recursive. Specifically, while analyzing Program._parseOps, TFA approximates the result of the recursive call to Program._parseOps using its static type. That result is then passed to the constructor of Loop, where it initializes Loop.ops, so the field gets an approximate type. The approximation for recursive calls is suboptimal, but it lets us cut down compilation time.

However, in this particular case the result type of Program._parseOps does not depend on the incoming parameters, so it might be possible to use it for recursive calls even without extra iterations of the analysis.
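
The code shape being described can be sketched as follows. This is a hypothetical reconstruction, not the actual dart-bf source: the issue only names Program._parseOps, Loop and Loop.ops, so the parse method, the _pos field, the Inc class and the reduced instruction set are all assumptions. The sketch also tracks the position in a field rather than passing parameters, purely to keep it short.

```dart
abstract class Op {}

class Inc implements Op {}

class Loop implements Op {
  // The field discussed above: its declared (static) type is List<Op>.
  final List<Op> ops;
  Loop(this.ops);
}

class Program {
  final String source;
  int _pos = 0; // assumed: current read position in the source
  Program(this.source);

  List<Op> parse() => _parseOps();

  // Every return path yields the same growable list literal, so the
  // concrete result type does not depend on the input being parsed.
  List<Op> _parseOps() {
    final ops = <Op>[];
    while (_pos < source.length) {
      final c = source[_pos++];
      if (c == '+') {
        ops.add(Inc());
      } else if (c == '[') {
        // Recursive call whose result flows into the Loop constructor.
        // As described above, the analysis approximated this result by
        // the static type List<Op>.
        ops.add(Loop(_parseOps()));
      } else if (c == ']') {
        return ops;
      }
    }
    return ops;
  }
}

void main() {
  final ops = Program('+[+[+]]').parse();
  print(ops.length); // prints 2
  print((ops[1] as Loop).ops.length); // prints 2
}
```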

dart-bot pushed a commit that referenced this issue Jul 10, 2019

[vm/aot/tfa] Improve handling of recursive calls in TFA
In general case, TFA approximates results of recursive calls using static
types.

However, if result type of a function does not depend on the flow inside its
body, it cannot change and it can be used in case of recursive calls
instead of a static type.

This improves micro-benchmark from #37455:
Before: 0m11.506s
After: 0m7.324s

Issue: #37455
Change-Id: I967d7add906c8dbd59dbbea1b993e1b4e1733514
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/108500
Commit-Queue: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
@alexmarkov

Contributor

commented Jul 10, 2019

318a482 fixed the first problem mentioned by @mraleph: the actual type of Loop.ops is now inferred.
This improves the benchmark on Dart AOT from 11.506s to 7.324s on my machine.
