
DRT_InstanceOf uses 49% of total CPU #47680

Open
scheglov opened this issue Nov 11, 2021 · 11 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. type-performance Issue relates to performance or code size

Comments

@scheglov
Contributor

Interestingly, I observe this only when I'm running benchmarks, i.e. running Dart from Dart and driving the Dart Analysis Server with requests. Maybe the workload is too intensive and the VM does not get time to do some of its background optimization work?

With var extensions = exportElements.whereType<ExtensionElement>().toList();
[profiler screenshot]

With

      var extensions = <ExtensionElement>[];
      for (var element in exportElements) {
        if (element is ExtensionElement) {
          extensions.add(element);
        }
      }

[profiler screenshot]
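
For completeness, a minimal, self-contained micro-benchmark sketch along these lines (Element and ExtensionElement below are placeholders, not the analyzer's classes) can reproduce the same contrast between the two variants outside the analysis server:

    // Micro-benchmark sketch: placeholder classes standing in for the
    // analyzer's Element/ExtensionElement hierarchy.
    class Element {}

    class ExtensionElement extends Element {}

    final elements = List<Element>.generate(
        1000, (i) => i.isEven ? ExtensionElement() : Element());

    List<ExtensionElement> viaWhereType() =>
        elements.whereType<ExtensionElement>().toList();

    List<ExtensionElement> viaManualLoop() {
      final extensions = <ExtensionElement>[];
      for (final element in elements) {
        if (element is ExtensionElement) {
          extensions.add(element);
        }
      }
      return extensions;
    }

    void main() {
      final variants = <String, List<ExtensionElement> Function()>{
        'whereType().toList()': viaWhereType,
        'manual is-check loop': viaManualLoop,
      };
      variants.forEach((name, run) {
        final sw = Stopwatch()..start();
        for (var i = 0; i < 100000; i++) {
          run();
        }
        print('$name: ${sw.elapsedMilliseconds} ms');
      });
    }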

@scheglov added the area-vm and type-performance labels on Nov 11, 2021
@a-siva
Contributor

a-siva commented Nov 12, 2021

@scheglov can you provide the command line to reproduce this situation?
//cc @alexmarkov

@scheglov
Contributor Author

You will need https://dart-review.googlesource.com/c/sdk/+/220070

I run it with time dart /Users/scheglov/Source/Dart/sdk.git/sdk/pkg/analysis_server/benchmark/benchmarks.dart run das-flutter --flutter-repository=/Users/scheglov/Source/flutter.

FYI

scheglov@scheglov-macbookpro2:~/Source/Dart/sdk.git/sdk (ss-completion-benchmark-output)$ dart --version
Dart SDK version: 2.16.0-edge.b34f39985971d099994e93012cb3b2319712e6b6 (be) (Wed Nov 10 20:38:34 2021 +0000) on "macos_x64"

@alexmarkov
Contributor

In this code whereType<ExtensionElement>().toList() is not fully inlined, so WhereTypeIterator.moveNext ends up testing _source.current is T for an unknown type argument T. This kind of type test currently involves the subtype test cache and a slow call into the runtime. I'm not sure why the cache doesn't work in this case, but it is possible that it simply overflows, since the cache is shared among all invocations of WhereTypeIterator.moveNext and thus among all whereType calls with different types.

The manual loop performs element is ExtensionElement against the known class ExtensionElement, for which much more efficient code is generated.
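
For readers following along, a simplified sketch (not the actual dart:_internal source) of where that generic is T test sits:

    // Simplified sketch of the iterator behind whereType<T>(); not the real
    // SDK implementation.
    class SketchWhereTypeIterator<T> implements Iterator<T> {
      final Iterator<Object?> _source;
      SketchWhereTypeIterator(this._source);

      @override
      bool moveNext() {
        while (_source.moveNext()) {
          // T is only known at run time here, so this `is T` cannot be lowered
          // to a simple class-id range check; it consults the shared subtype
          // test cache and, on a miss or overflow, calls into the runtime
          // (DRT_InstanceOf).
          if (_source.current is T) return true;
        }
        return false;
      }

      @override
      T get current => _source.current as T;
    }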

@alexmarkov
Contributor

I tried passing the --max_subtype_cache_entries=1000 VM option to the analysis server, and it eliminates DRT_InstanceOf from the profile. This means that previously the runtime was being called due to an overflow of the subtype test cache.

@alexmarkov
Contributor

I see the following ways of improving performance in this case:

  1. Make sure whereType<Foo>().toList() is fully inlined and the is T type test is optimized once the type argument T becomes known after inlining. Note that this may increase code size if we always inline all involved methods.
  2. Improve performance of generic is T type tests by generating specialized type testing stubs for is tests (currently they are used only for as tests and parameter type checks).
  3. Improve performance of generic is T type tests by using a smarter caching strategy: we could use a hash table for large caches to avoid the linear lookup and make the cache size unlimited (a toy model of this idea is sketched below).
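
A toy Dart model of the idea in option 3 only (the real subtype test cache lives in the VM, keys on more than the receiver's class, and is consulted from generated code): replacing a bounded, linearly scanned cache with an unbounded hashed one makes repeated lookups O(1) and removes the overflow failure mode.

    // Toy model only; not VM code.
    class LinearBoundedCache {
      final List<MapEntry<Type, bool>> _entries = [];
      final int maxEntries;
      LinearBoundedCache(this.maxEntries);

      bool? lookup(Type cls) {
        for (final e in _entries) {
          if (e.key == cls) return e.value; // O(n) scan
        }
        return null; // miss: caller falls back to the slow runtime path
      }

      void insert(Type cls, bool result) {
        // Once the cache is full, new classes are never admitted, so every
        // later query for them keeps taking the slow path.
        if (_entries.length < maxEntries) _entries.add(MapEntry(cls, result));
      }
    }

    class HashedUnboundedCache {
      final Map<Type, bool> _entries = {}; // average O(1) lookup, no overflow
      bool? lookup(Type cls) => _entries[cls];
      void insert(Type cls, bool result) => _entries[cls] = result;
    }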

@mraleph @mkustermann @sstrickl WDYT?

@mkustermann
Member

To maybe rephrase @alexmarkov's suggestions (and add one) for improving general is checks:

  • Pay with code size: similar to TTS, we emit specialized machine code per type (possibly composable, which TTSes currently are not)
    => @sstrickl has made a prototype of making TTS work for is checks; we could evaluate the code-size cost of that on some of our apps (and on the analyzer in this case)
  • Pay with memory: similar to the STC, use a faster caching mechanism
    => Using hashing instead of linear search would improve performance.
    => Allowing unlimited cache size risks hard-to-debug/reproduce "memory leaks" in real-world apps
  • Increase the speed of the actual type checks: it would be good to understand why they are so slow
    => Possibly by faster calls into our runtime type system (e.g. using FFI calls to avoid heavy native transitions)
    => Possibly by having a runtime type system implementation in generated code (e.g. to avoid the handle overhead we pay in the C++ implementation)

copybara-service bot pushed a commit that referenced this issue Jan 25, 2022
Bug: #47680
Change-Id: I7e90f52e7658eef4b9280f485b457d86ac0f4e9a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/229820
Reviewed-by: Brian Wilkerson <brianwilkerson@google.com>
Commit-Queue: Konstantin Shcheglov <scheglov@google.com>
@sstrickl
Contributor

I expect that https://dart-review.googlesource.com/c/sdk/+/308941 and its recent predecessors have probably improved this situation, since STCs are now larger and are hash-based at most sizes for quick lookup. A TTS-like solution for is checks, to avoid hitting the STC in the first place (versus the currently generated inline checks, which are limited because they are per call site), would probably finish it off if we find in the future that InstanceOf is still a chokepoint.

@mkustermann
Member

Thanks for the update, @sstrickl !

The one issue we have with <obj> is <type> checks is that, compared to as checks, the STCs will fill up with both positive and negative answers. So there can be many different classes of <obj> flowing into the check, making the caches very large.

A concrete example that makes the CFE slow (/cc @jensjoha @johnniwinther) is a stack implementation that's on the hot path (see here):

  Object? pop(NullValue? nullValue) {
    final Object? value = array[--arrayLength];
    ...
    if (value is! NullValue) {
      return value;
    }
    ...
  }

There are many different kinds of objects being pop()ed from the stack (subtypes of NullValue may even be the rare case). So the STC caches would fill up considerably.

Though for this case (and other similar situations) we could do an exhaustive inline cid-range check for the NullValue class, as there are only two types implementing it (the NullValue class itself and enum NullValues implements NullValue<Object>). Currently this is NullValue check requires 2 cid ranges (due to the implements), which makes our compiler not optimize it at the moment, but we could change the compiler to emit exhaustive cid-range checks even when there is more than 1 cid range.
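
A hedged sketch of the shape of that hierarchy (placeholder member names, not the actual CFE declarations):

    // Placeholder declarations mirroring the shape described above. Because
    // NullValues only *implements* NullValue instead of extending it, its
    // class id is not contiguous with NullValue's, so an exhaustive
    // `is NullValue` test needs 2 cid ranges rather than 1.
    class NullValue<T> {
      const NullValue();
    }

    enum NullValues implements NullValue<Object> {
      placeholderA,
      placeholderB,
    }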

@sstrickl
Contributor

sstrickl commented Jul 3, 2023

It actually should do an exhaustive cid-range check for the NullValue class, if I'm reading the block at

    if (ranges.length() <= kMaxNumberOfCidRangesToTest) {

correctly, unless the number of cid ranges is greater than kMaxNumberOfCidRangesToTest (currently 4).

(The fact that we saw it hit the STC, given that we needed 0304832 to reduce the overhead until stubs handled hash-based caches, makes it clear that it doesn't, and now I'm curious as to why, since I'd expect at most 2 cid ranges here unless I'm forgetting something... maybe there is an additional range for the bottom types as well, but those should be consecutive and thus a single additional range, making it 3 ranges.)

@mkustermann
Member

> It actually should do an exhaustive cid-range check for the NullValue class, if I'm reading the block at

I think the code you refer to is for FlowGraphCompiler::GenerateCallerChecksForAssertAssignable, i.e. as checks, but the CFE example is using is checks against NullValue.
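
For context on the distinction, a minimal sketch (hypothetical helpers, not code from the CFE or the VM):

    // `is T` against a generically known type compiles to an InstanceOf
    // (subtype test cache, DRT_InstanceOf on the slow path); `as T` compiles
    // to an AssertAssignable, the path covered by
    // GenerateCallerChecksForAssertAssignable and type testing stubs.
    T? castOrNull<T>(Object? value) {
      if (value is T) return value; // is-check: yields a bool, never throws
      return null;
    }

    T cast<T>(Object? value) => value as T; // as-check: throws a TypeError on failure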

@sstrickl
Contributor

sstrickl commented Jul 3, 2023

... right. Yes. Of course. 🤦‍♀️
