Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VM: 25% of dart2js run time in MegamorphicLookup #26017

Open
rakudrama opened this issue Mar 16, 2016 · 10 comments
Open

VM: 25% of dart2js run time in MegamorphicLookup #26017

rakudrama opened this issue Mar 16, 2016 · 10 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. type-performance Issue relates to performance or code size

Comments

@rakudrama
Copy link
Member

Run dart2js on a large program with this command:

DART_VM_OPTIONS='--observe=8181 --profile-period=5000 --pause-isolates-on-exit --old_gen_growth_rate=9999' sdk/bin/dart2js ~/app/main.dart -v --out=g1.js

At isolate exit, get the cpu profile, sort by Tags: VM > User

75% of time is spend in "Dart"
Of this time, 34% is in "Type Inference"
Of this time, 31% is in [Stub] MegamorphicLookup

This seems like an excessive overhead.

@a-siva

@rakudrama rakudrama added area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) labels Mar 16, 2016
@fsc8000
Copy link
Contributor

fsc8000 commented Mar 19, 2016

Apart from making megamorphic calls faster in the VM: Is there any way to avoid some of these megamorphic call sites in dart2js? - They will always be slower than monomorphic calls.

@bwilkerson
Copy link
Member

Note that this strongly impacts the dartanalyzer and the analysis server as well. Between those three tools, it impacts the lives of every Dart developer every day in some way or another.

As I understand it, at least one source of these megamorphic call sites is as follows. Design a class hierarchy in which a method defined at the root of the hierarchy is overridden in a large enough number of subclasses, then write a general method that can, and does, take an instance of any class in that hierarchy and invoke this method. (By the way, how many implementations of a method does it take to make it megamorphic?)

If that's true, then, yes, there might be a way of avoiding some of them, but only by writing worse code. Support for polymorphism is near the heart of what it means to be an object-oriented language with inheritance. Anything you can do to improve the performance of these cases would benefit every developer every day.

@rakudrama
Copy link
Member Author

@fsc8000 I can get rid of some of them by copying methods from mixins and base classes to the classes containing definitions of the other members used by the method.
I would rather the runtime do that when it can tell it is worthwhile.

Consider:

abstract class ListMixin<E> implements List<E> {
  // int get length;  in List interface.
  bool get isEmpty => length == 0;
  bool get isNotEmpty => !isEmpty;

Once ListMixin is used above some number of times, with some clients defining isEmpty, ListMixin.isNotEmpty contains a megamorphic call to isEmpty which contains a megamorphic call to length.
We have the equivalent structure in our Element model.

@rakudrama
Copy link
Member Author

I can help categorize cases of MegamorphicLookup:

[A]. Calls from a method in a base class to a method that is defined on one or more derived classes.
[B]. Calls from a method in a mixin class to method provided at or below a mixin application.
These are really the same issue. We are failing to capitalize on information inherent in the receiver. It can be very effective to manually clone the caller to the class that defines the callees. The runtime should do this for small methods.

[C]. Overgeneralization. (This is what Brian describes)

class C {
   foo() {...}
}
class C1 extends C {}
...
class C20 extends C3 {}

method(C p) {
   p.foo();
}

None of the dozen or so subclasses of C declares anything called foo.
What I see is that method gets optimized after seeing a few classes, gets deoptimized, and then recompiled. The recompiled code contains the MegamorphicLookup.
The problem is that there is no intermediate step between a few classes and all classes (megamorphic call). There is a huge hint in the declared type. If the call is monomorphic for the declared type, develop an efficient predicate for the 'is C' check and widen the test to that. In many cases of OO code, 'is C' could be a range check.

Given that p.foo is monomorphic, the initial compilation should have widened the receiver check as much as possible and prevented the deoptimization.

In dart2js, WorkQueue.add has three field accesses that are each a MegamorphicLookup. They should be single x64 load and store instructions. The cid values in the megamorphic cache are {2870-1, 2873-5, 2877-9, 2881-2887, 2889}. The holes are probably abstract classes in the hierarchy.

[D] Calling get:hashCode and ==.
3% of dart2js runtime is in MegamorphicLookup called from hashtable implementations to find get:hashCode.
It is much harder to estimate == since it is called from so many places.
MegamorphicLookup of toString is 15% of the time in StringBuffer.write
Perhaps there ought to be a more vtable-like lookup of these methods defined on Object.


An example that combines [A] and [D] is using small integers as hash table keys.

class Object {
  int get hashCode => _identityHashCode;
}
class _Smi {
  int get _identityHashCode => this;
}
the hashtable calls:
  MegamorphicLooup (get:hashCode)
  Object.hashCode
Object.hashCode calls:
  MegamorphicLookup (get:_identityHashCode)
  _Smi._identityHashCode

If the VM had cloned Object.hashCode into _Smi, Object.hashCode@_Smi could be optimized by inlining _Smi._identityHashCode.
Tracking polymorphic call targets might be a reasonable way of deciding which methods should be considered cloning to the target class.

@rakudrama
Copy link
Member Author

dartfmt (third_party/pkg_tested/dart_style/bin/format.dart) has some interesting case.
Run it on the 1.8M file sdk/lib/html/dartium/html_dartium.dart

dartfmt spends 28% of Dart time in MegamorphicLookup. @munificent

10% of the time is in MegamorphicLookup for get:hashCode. This is issue [D].
(Another 3% is spent in the hash function which returns a field and would be faster if frameless.)

There is a closure that calls MegamorphicLookup accounting for 5% of total Dart time.

(outer) {
      if (!outer.splitsOnInnerRules) return;
      rule.imply(outer);
    }

There is a 'Rule' hierarchy of 7 classes
Both outer and rule only ever are subclasses of Rule.
get:splitsOnInnerRules and imply() both use megamorphic caches.
There is only one definition of imply(), so this is like case [C] above.

There is a default implementation of get splitsOnInnerRules => true and and one override (2 classes) that returns an adjacently declared field. These two implementations are inlined when outer has only three classes are known, but the code gets deoptimized and five classes causes a megamorphic lookup.
There are no type declarations to help here but the class check and dispatch should have been extended to all classes in the same hierarchy that use the definitions in the inline cache.

@fsc8000 fsc8000 self-assigned this Apr 2, 2016
@rakudrama
Copy link
Member Author

More details about the MegamorphicLookup in dartfmt, at .splitsOnInnerRules.

(outer) {
      if (!outer.splitsOnInnerRules) return;
      rule.imply(outer);
    }

There are 5 classes in the ICData and megamorphic cache (one class is not instantiated, one class is abstract).
There are two targets.
The class-ids look a little bit irregular:

cid Class In cache? Notes
1572 ::
1573 ArgumentRule abstract
1574 PositionalRule Yes
1575 NamedRule Yes
1576 ::
1577 CombinatorRule Yes
1578 ::
1579 MetadataRule not instantiated in this run
1580 ::
1581 Rule Yes base class, not abstract
1582 ::
1583 TypeArgumentRule Yes
1584 ::

The :: names are the classes that implement top-level library scopes. This is because the class hierarchy is spread over several libraries.
What we can learn from this is that class hierarchies do not currently always have nice ranges of class id values.

@fsc8000
Copy link
Contributor

fsc8000 commented Apr 4, 2016

Thanks. The issue with :: classes should be easily addressable. In this case though there would still be a hole in the range because of MetaDataRule. We'd need to expand the set of cids using CHA to make it a 100% dense range.

btw: My CL (f6c19ee) brought time spent megamorphic lookup for dartfmt down from 28% to 22%.

@fsc8000
Copy link
Contributor

fsc8000 commented Apr 6, 2016

The improvement with f6c19ee on dartfmt is around 5%.

Time before and after my change:

5.235
5.289
5.288
5.230
5.211
5.298
5.228
5.282
5.235
5.242

4.983
4.982
5.009
5.026
5.027
4.962
5.018
4.957
5.037
5.053

@rakudrama
Copy link
Member Author

I have opened separate tracking issues for each example that I have seen

#26217
#26218
#26219
#26220
#26221
#26222
#26223

@zanderso zanderso added type-performance Issue relates to performance or code size and removed type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) labels Jun 23, 2016
@leafpetersen
Copy link
Member

See also #26858

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. type-performance Issue relates to performance or code size
Projects
None yet
Development

No branches or pull requests

5 participants