
DRT_InstanceOf uses 49% of total CPU #47680

Open
scheglov opened this issue Nov 11, 2021 · 11 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. type-performance Issue relates to performance or code size

Comments

@scheglov
Contributor

Interestingly, I observe this only when I'm running benchmarks, i.e. running Dart from Dart and driving the Dart Analysis Server with requests. Maybe the workload is too intensive and the VM does not get time to do some of its background optimization work?

With var extensions = exportElements.whereType<ExtensionElement>().toList();
[profiler screenshot]

With

      var extensions = <ExtensionElement>[];
      for (var element in exportElements) {
        if (element is ExtensionElement) {
          extensions.add(element);
        }
      }

[profiler screenshot]
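
For completeness, a minimal, self-contained micro-benchmark sketch along these lines (Element and ExtensionElement below are placeholders, not the analyzer's classes) can reproduce the same contrast between the two variants outside the analysis server:

    // Micro-benchmark sketch: placeholder classes standing in for the
    // analyzer's Element/ExtensionElement hierarchy.
    class Element {}

    class ExtensionElement extends Element {}

    final elements = List<Element>.generate(
        1000, (i) => i.isEven ? ExtensionElement() : Element());

    List<ExtensionElement> viaWhereType() =>
        elements.whereType<ExtensionElement>().toList();

    List<ExtensionElement> viaManualLoop() {
      final extensions = <ExtensionElement>[];
      for (final element in elements) {
        if (element is ExtensionElement) {
          extensions.add(element);
        }
      }
      return extensions;
    }

    void main() {
      final variants = <String, List<ExtensionElement> Function()>{
        'whereType().toList()': viaWhereType,
        'manual is-check loop': viaManualLoop,
      };
      variants.forEach((name, run) {
        final sw = Stopwatch()..start();
        for (var i = 0; i < 100000; i++) {
          run();
        }
        print('$name: ${sw.elapsedMilliseconds} ms');
      });
    }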

@scheglov added the area-vm and type-performance labels on Nov 11, 2021
@a-siva
Contributor

a-siva commented Nov 12, 2021

@scheglov can you provide the command line to reproduce this situation?
//cc @alexmarkov

@scheglov
Contributor Author

You will need https://dart-review.googlesource.com/c/sdk/+/220070

I run it with time dart /Users/scheglov/Source/Dart/sdk.git/sdk/pkg/analysis_server/benchmark/benchmarks.dart run das-flutter --flutter-repository=/Users/scheglov/Source/flutter.

FYI

scheglov@scheglov-macbookpro2:~/Source/Dart/sdk.git/sdk (ss-completion-benchmark-output)$ dart --version
Dart SDK version: 2.16.0-edge.b34f39985971d099994e93012cb3b2319712e6b6 (be) (Wed Nov 10 20:38:34 2021 +0000) on "macos_x64"

@alexmarkov
Contributor

In this code whereType<ExtensionElement>().toList() is not fully inlined, so WhereTypeIterator.moveNext ends up testing _source.current is T for an unknown type argument T. This kind of type test currently involves the subtype test cache and a slow call into the runtime. I'm not sure why the cache doesn't work in this case, but it is possible that it simply overflows, since the cache is shared among all invocations of WhereTypeIterator.moveNext and thus among all whereType calls with different types.

The manual loop performs element is ExtensionElement against the known class ExtensionElement, for which much more efficient code is generated.
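
For readers following along, a simplified sketch (not the actual dart:_internal source) of where that generic is T test sits:

    // Simplified sketch of the iterator behind whereType<T>(); not the real
    // SDK implementation.
    class SketchWhereTypeIterator<T> implements Iterator<T> {
      final Iterator<Object?> _source;
      SketchWhereTypeIterator(this._source);

      @override
      bool moveNext() {
        while (_source.moveNext()) {
          // T is only known at run time here, so this `is T` cannot be lowered
          // to a simple class-id range check; it consults the shared subtype
          // test cache and, on a miss or overflow, calls into the runtime
          // (DRT_InstanceOf).
          if (_source.current is T) return true;
        }
        return false;
      }

      @override
      T get current => _source.current as T;
    }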

@alexmarkov
Contributor

I tried passing the --max_subtype_cache_entries=1000 VM option to the analysis server, and it eliminates DRT_InstanceOf from the profile. This means that previously the runtime was being called due to an overflow of the subtype test cache.

@alexmarkov
Contributor

I see the following ways of improving performance in this case:

  1. Make sure whereType<Foo>().toList() is fully inlined and the is T type test is optimized once the type argument T becomes known after inlining. Note that this may increase code size if we always inline all involved methods.
  2. Improve performance of generic is T type tests by generating specialized type testing stubs for is tests (currently they are used only for as tests and parameter type checks).
  3. Improve performance of generic is T type tests by using a smarter caching strategy: we could use a hash table for large caches to avoid the linear lookup and make the cache size unlimited (a toy model of this idea is sketched below).
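
A toy Dart model of the idea in option 3 only (the real subtype test cache lives in the VM, keys on more than the receiver's class, and is consulted from generated code): replacing a bounded, linearly scanned cache with an unbounded hashed one makes repeated lookups O(1) and removes the overflow failure mode.

    // Toy model only; not VM code.
    class LinearBoundedCache {
      final List<MapEntry<Type, bool>> _entries = [];
      final int maxEntries;
      LinearBoundedCache(this.maxEntries);

      bool? lookup(Type cls) {
        for (final e in _entries) {
          if (e.key == cls) return e.value; // O(n) scan
        }
        return null; // miss: caller falls back to the slow runtime path
      }

      void insert(Type cls, bool result) {
        // Once the cache is full, new classes are never admitted, so every
        // later query for them keeps taking the slow path.
        if (_entries.length < maxEntries) _entries.add(MapEntry(cls, result));
      }
    }

    class HashedUnboundedCache {
      final Map<Type, bool> _entries = {}; // average O(1) lookup, no overflow
      bool? lookup(Type cls) => _entries[cls];
      void insert(Type cls, bool result) => _entries[cls] = result;
    }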

@mraleph @mkustermann @sstrickl WDYT?

@mkustermann
Member

To maybe rephrase @alexmarkov's suggestions (and add one) for improving general is checks:

  • Pay with code size: similar to TTS, we emit specialized machine code per type (possibly composable, which TTSes currently are not)
    => @sstrickl has made a prototype of making TTS work for is checks; we could evaluate the code-size cost of that on some of our apps (and on the analyzer in this case)
  • Pay with memory: similar to the STC, use a faster caching mechanism
    => Using hashing instead of linear search would improve performance.
    => Allowing unlimited cache size risks hard-to-debug/reproduce "memory leaks" in real-world apps
  • Increase the speed of the actual type checks: it would be good to understand why they are so slow
    => Possibly by faster calls into our runtime type system (e.g. using FFI calls to avoid heavy native transitions)
    => Possibly by having a runtime type system implementation in generated code (e.g. to avoid the handle overhead we pay in the C++ implementation)

copybara-service bot pushed a commit that referenced this issue Jan 25, 2022
Bug: #47680
Change-Id: I7e90f52e7658eef4b9280f485b457d86ac0f4e9a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/229820
Reviewed-by: Brian Wilkerson <brianwilkerson@google.com>
Commit-Queue: Konstantin Shcheglov <scheglov@google.com>
@sstrickl
Contributor

I expect that https://dart-review.googlesource.com/c/sdk/+/308941 and its recent predecessors have probably improved this situation, since STCs are now larger and are hash-based at most sizes for quick lookup. A TTS-like solution for is checks, to avoid hitting the STC in the first place (versus the currently generated inline checks, which are limited because they are per call site), would probably finish it off if we find in the future that InstanceOf is still a chokepoint.

@mkustermann
Member

Thanks for the update, @sstrickl !

The one issue we have with <obj> is <type> checks is that, compared to as checks, the STCs will fill up with both positive and negative answers. So there can be many different classes of <obj> flowing into the check, making the caches very large.

A concrete example that makes the CFE slow (/cc @jensjoha @johnniwinther) is a stack implementation that's on the hot path (see here):

  Object? pop(NullValue? nullValue) {
    final Object? value = array[--arrayLength];
    ...
    if (value is! NullValue) {
      return value;
    }
    ...
  }

There are many different kinds of objects being pop()ed from the stack (subtypes of NullValue may even be the rare case). So the STC caches would fill up considerably.

Though for this case (and other similar situations) we could do an exhaustive inline cid-range check for the NullValue class, as there are only two types implementing it (the NullValue class itself and enum NullValues implements NullValue<Object>). Currently this is NullValue check requires 2 cid ranges (due to the implements), which makes our compiler not optimize it at the moment, but we could change the compiler to emit exhaustive cid-range checks even when there is more than 1 cid range.
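
A hedged sketch of the shape of that hierarchy (placeholder member names, not the actual CFE declarations):

    // Placeholder declarations mirroring the shape described above. Because
    // NullValues only *implements* NullValue instead of extending it, its
    // class id is not contiguous with NullValue's, so an exhaustive
    // `is NullValue` test needs 2 cid ranges rather than 1.
    class NullValue<T> {
      const NullValue();
    }

    enum NullValues implements NullValue<Object> {
      placeholderA,
      placeholderB,
    }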

@sstrickl
Contributor

sstrickl commented Jul 3, 2023

It actually should do an exhaustive cid-range check for the NullValue class, if I'm reading the block at

    if (ranges.length() <= kMaxNumberOfCidRangesToTest) {

correctly, unless the number of cid ranges is greater than kMaxNumberOfCidRangesToTest (currently 4).

(The fact that we saw it hit the STC, given that we needed 0304832 to reduce the overhead until stubs handled hash-based caches, makes it clear that it doesn't, and now I'm curious as to why, since I'd expect at most 2 cid ranges here unless I'm forgetting something... maybe there is an additional range for the bottom types as well, but those should be consecutive and thus a single additional range, making it 3 ranges.)

@mkustermann
Member

> It actually should do an exhaustive cid-range check for the NullValue class, if I'm reading the block at

I think the code you refer to is for FlowGraphCompiler::GenerateCallerChecksForAssertAssignable, i.e. as checks, but the CFE example is using is checks against NullValue.
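
For context on the distinction, a minimal sketch (hypothetical helpers, not code from the CFE or the VM):

    // `is T` against a generically known type compiles to an InstanceOf
    // (subtype test cache, DRT_InstanceOf on the slow path); `as T` compiles
    // to an AssertAssignable, the path covered by
    // GenerateCallerChecksForAssertAssignable and type testing stubs.
    T? castOrNull<T>(Object? value) {
      if (value is T) return value; // is-check: yields a bool, never throws
      return null;
    }

    T cast<T>(Object? value) => value as T; // as-check: throws a TypeError on failure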

@sstrickl
Contributor

sstrickl commented Jul 3, 2023

... right. Yes. Of course. 🤦‍♀️
