Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UTF8.decode is slow (performance) #31954

Closed
Hixie opened this issue Jan 21, 2018 · 18 comments
Closed

UTF8.decode is slow (performance) #31954

Hixie opened this issue Jan 21, 2018 · 18 comments
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends.

Comments

@Hixie
Copy link
Contributor

Hixie commented Jan 21, 2018

UTF8.decode seems surprisingly slow for large buffers.

Using this Flutter program:

import 'dart:async';
import 'dart:convert';
import 'dart:typed_data';

import 'package:flutter/services.dart';

Future<Null> main() async {
  ByteData data = await rootBundle.load('LICENSE');
  for (int i = 1024; i < data.lengthInBytes; i += 1024) {
    final Stopwatch watch = new Stopwatch()..start();
    UTF8.decode(data.buffer.asUint8List(0, i));
    int time = watch.elapsedMilliseconds;
    print('${"$i bytes".padLeft(15)} ${"${time}ms".padLeft(10)} ${(1000 / 1024 * i / time).toStringAsFixed(1)} KB per second');
  }
}

...I find that on a Pixel 2 XL, 4KB is parsed in about 1ms, 40KB in about 12ms, and 400KB in about 135ms. This is only about 3MB per second. Then again, I'm not sure what would be considered good throughput on this device. Maybe this is as good as we can get?

Presumably this could be fixed in the most radical fashion by supporting UTF-8 as a native data type.

@mraleph
Copy link
Member

mraleph commented Jan 21, 2018

All of the decoding is implemented in Dart so performance suffers in AOT mode.

JIT mode (debug) does around 15-20 MB per second vs 1.7-2MB in AOT mode on Nexus 5X.

  • Short term this can be addressed by moving UTF8 decoding into C++, if it is a bottleneck.

We expect things to become better in AOT mode post Dart 2 transition as AOT code quality should improve considerably. However it is not trivial to reach the parity with JIT on this benchmark.

LICENSE is a Latin1 file so it essentially tests how fast we can determine that and create a string out of it.

Most important loops for the benchmark are in scanOneByteCharacters, _scanCodeUnits and _createOneByteString.

  • Note: _scanCodeUnits does thing that was already done by scanOneByteCharacters so it is obviously wasteful and needs to be fixed first thing.

Here is how all those functions look:

    int scanOneByteCharacters(List<int> units, int from) {
      final to = endIndex;
      final mask = _ONE_BYTE_LIMIT;
      for (var i = from; i < to; i++) {
        final unit = units[i];
        if ((unit & mask) != unit) return i - from;
      }
      return to - from;
    }

List<int> units does not really tell much to the AOT compiler (even in Dart 2 mode) because there are multiple types implementing lists, while JIT is capable of establishing that units is a Uint8 view (by observing the type feedback collected in runtime) and completely inlining units[i] because the code is monomorphic.

In Dart 2 mode we might be able to establish the same thing using type flow analysis, but I am cautiously optimistic about that because it is likely that people invoke UTF8.decode on various List<int> implementations.

It might be that we need to implement something heavier as a real solution - e.g. some mechanism to duplicate / "version" performance sensitive code that manipulates typed lists.

@mraleph mraleph added the area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. label Jan 21, 2018
@rakudrama
Copy link
Member

We have already solved this problem for dart2js/browser.

There is a hook for host-specific acceleration.
The dart2js runtime uses this hook to do decoding via the browser's TextDecoder, which provided a 10x speedup on browsers that support TextDecoder.
VM/Flutter could also use this hook to detect byte arrays and use an alternative implementation.

If you hook to a second implementation, you could use C++, or a clone of the current implementation and modify it to use the byte array implementation type. If you do this, I would advise bailing out to the main algorithm to handle error cases since the Dart runtime is incompatible with the browser implementation in error cases. We want to fix the discrepancy by changing the Dart library to match the browser as that would make the in-browser conversion more efficient by avoiding a validation scan. If you fall-back to the main implementation on edge cases, that will ensure that we need to fix edge cases in only one implementation.

@lrhn fyi.

@mraleph
Copy link
Member

mraleph commented Jan 21, 2018

Thanks for the tip about _convertIntercepted @rakudrama. We could use it if we decide to do a short term solution for just JSON.decode.

The real problem however is much larger - if the user writes similar code they will see similar slowness (especially comparing JIT vs AOT), so I would like us to think about it in that context too.

@Hixie
Copy link
Contributor Author

Hixie commented Jan 22, 2018

it is likely that people invoke UTF8.decode on various List<int> implementations.

It seems extremely likely to me that any large inputs are going to be from ByteData instances.

FWIW, I would be perfectly fine with specialized ByteData or Uint8List variants of this API (and other similar APIs that take List<int>). In general in Flutter land there are definitely codepaths where we don't even support the non-ByteData case.

Hixie added a commit to Hixie/flutter that referenced this issue Jan 22, 2018
This reduces the jank of bringing up the license screen further.
The remaining lost frame or two are caused by Dart itself, see:

   dart-lang/sdk#31954
   dart-lang/sdk#31959
   dart-lang/sdk#31960

Fixes flutter#5187
Hixie added a commit to flutter/flutter that referenced this issue Jan 25, 2018
This reduces the jank of bringing up the license screen further.
The remaining lost frame or two are caused by Dart itself, see:

   dart-lang/sdk#31954
   dart-lang/sdk#31959
   dart-lang/sdk#31960

Fixes #5187
DaveShuckerow pushed a commit to DaveShuckerow/flutter that referenced this issue May 14, 2018
This reduces the jank of bringing up the license screen further.
The remaining lost frame or two are caused by Dart itself, see:

   dart-lang/sdk#31954
   dart-lang/sdk#31959
   dart-lang/sdk#31960

Fixes flutter#5187
@mraleph
Copy link
Member

mraleph commented Jun 12, 2018

12c461b by @aartbik improved performance of this microbenchmark on 64-bit platforms (4-5x on my measurements). I have not measured ARM32 yet.

Before the change:

I/flutter (22444):    622592 bytes      337ms 1804.2 KB per second

after the change:

I/flutter (22623):    622592 bytes       66ms 9212.1 KB per second

@aartbik aartbik self-assigned this Jun 14, 2018
@Hixie Hixie changed the title UTF8.decode is slow UTF8.decode is slow (performance) Jun 26, 2018
@aartbik
Copy link
Contributor

aartbik commented Jul 23, 2018

Focusing on just the Dart part with a benchmark that does just

String unpack(Uint8List data) {
  return utf8.decode(data);
}

using a synthetic Uint8List. This yields still a big JIT vs AOT difference:

JIT executed in 0:00:12.020120
AOT executed in 0:01:12.535360

Here is a quick profile:

+   35.62%    35.62%  dart_precompile  libb.so                   [.] Precompiled__Utf8Decoder_8003594_convert_scanOneByteCharacters
+   29.98%    29.98%  dart_precompile  libb.so                   [.] Precompiled__Uint8List_6027147____1007                   
+   22.21%    22.02%  dart_precompile  libb.so                   [.] Precompiled__StringBase_0150898__createOneByteString

@aartbik
Copy link
Contributor

aartbik commented Jul 24, 2018

Just to make sure we are measuring the same thing, here is a stand-alone toy benchmark that operates on long lists of ASCII 'X' bytes, with JIT/AOT timings below.

JIT: executed in 0:00:02.853522 (verify="XXXXXXXXX")
AOT: executed in 0:00:16.729649 (verify="XXXXXXXXX")

benchmark:

import 'dart:convert';
import 'dart:typed_data';

String unpack(Uint8List data) {
  return utf8.decode(data);
}

main(List<String> args) {
  var data = new Uint8List.fromList(
    new List<int>.generate(1000000, (int i) => 88)
  );
  String x = null;
  Stopwatch stopwatch = new Stopwatch()..start();
  for (int i = 0; i < 1000; i++) {
    x = unpack(data);
  }
  print('executed in ${stopwatch.elapsed} (verify="${x.substring(1,10)}")');
}

@mraleph
Copy link
Member

mraleph commented Jul 24, 2018

@aartbik It might be also good to have a benchmark that measures Uint8ListView performance because when decoding binary formats people pass views instead of copying bytes out.

@aartbik
Copy link
Contributor

aartbik commented Jul 24, 2018

Sure, I made it "your" format as follows:

main(List<String> args) {
  var data = new Uint8List.fromList(
    new List<int>.generate(1000000, (int i) => 88)
  );
  if (args.contains('--use-view')) {
    data = new Uint8List.view(data.buffer);
  }
  // AS BEFORE
}

which gives me

JIT-direct/view:
executed in 0:00:02.555726 (verify="XXXXXXXXX")
executed in 0:00:05.349252 (verify="XXXXXXXXX")

AOT-direct/view
executed in 0:00:16.817085 (verify="XXXXXXXXX")
executed in 0:00:26.025103 (verify="XXXXXXXXX")

@aartbik
Copy link
Contributor

aartbik commented Jul 24, 2018

In the long run, something like cloning for type may help here. In the short run, we could get the utf performance up by specializing a bit more in the library code. The _convertIntercepted() idea is very interesting, but perhaps done a bit too high in the calling tree. Specialization seems only necessary for the leaf methods where most work is done. Still, let me play around with this a bit....

@aartbik
Copy link
Contributor

aartbik commented Jul 25, 2018

Ah, I thought that using generics would make it extremely easy to specialize library code in a non-intrusive way, but Dart does not follow the C++ way of generating different versions for each instance. Without this sort of generics, the code duplication of such a solution seem to become a bit excessive (any best practices on this?).

@alexmarkov @miyoyo

@mraleph
Copy link
Member

mraleph commented Jul 25, 2018

One of the ideas I entertained is that we can do a Kernel to Kernel transformation based on TFA analysis results and an explicit pragma hint, e.g. in core libraries we do

@pragma('vm.type-specialize')
void convert(List<int> codeUnits, int startIndex, int endIndex) {
  /* body */
}

which would be a hint for a subsequent transformation to look at TFA analysis results and see which types can reach codeUnits. Then if for example _List<int> and Uint8List are the reaching types it can transform the code like this:

@pragma('vm.type-specialize')
void convert(List<int> codeUnits, int startIndex, int endIndex) {
  if (codeUnits is Uint8List) {
    _convert$Uint8List(codeUnits, ...);
  } else if (codeUnits is _List) {
    _convert$_List(codeUnits, ...);
  }
}

void  _convert$Uint8List(Uint8List codeUnits, ...) {
  // copy of body specialized for Uint8List
}

This is one possible approach that works for library code that people are willing to annotate with hints.

Another approach would be to try and change how we do []= and [] in loops, currently when we do:

for (var i = from; i < to; i++) {
  final unit = units[i];  // this will be an inline cache based dispatch (unless call-site if monomorphic)
  if ((unit & mask) != unit) return i - from;
}

maybe the cost of invoking [] can be reduced, e.g. if lookup of the [] can be hoisted out of the loop.

// pseudo-code
final getIndexedPtr = getIndexingStubFor(units);
for (var i = from; i < to; i++) {
  final unit = getIndexedPtr(units, i);  // fast invocation, maybe even without setting up a pool pointer, etc.
  if ((unit & mask) != unit) return i - from;
}

Sidenote: it might be good to check that the rest of the code in hot functions actually looks good in AOT mode (i.e. that [] access is the only slow operation).

@mkustermann mkustermann assigned mkustermann and unassigned aartbik Nov 5, 2018
@mkustermann
Copy link
Member

mkustermann commented Nov 5, 2018

@aartbik I was asked to help with some protobuf decoding speed and it seems like we happen to spend significant time in string decoding. Hope it's ok with you if I look into this?

One aspect might be to unify the object layouts of typed data views and non-views. Currently if code is specialized via if (data is Uint8List) { ... } the compiler cannot inline accesses to data since there are two implementations of it, namely _Uint8ArrayList or _Uint8ArrayView where the views have different layout than the non-views.

dart-bot pushed a commit that referenced this issue Nov 16, 2018
…generally

Currently the @pramga('vm:exact-result-type') annotation only works if
the function is a recognized method.  This change changes that to make
the VM just look if a function has the annotation (no matter if it's
also in the list of recognized methods or not).

Furthermore this CL adds a "has_pragma" bit to [Field] objects, similar
to how we have it on [Function]/[Class]es.  This allows annotating
fields with types, as we do with function return types.

Furthermore this CL lets the type propgagator use
@pragma('vm:exact-result-type') annotations to narrow the [CompileType]
set on [LoadFieldInstr]s.

Since the @pragma is a general feature, this CL moves the
`Function::FindPragma()` to `Library::FindPragma` (which is where any
other metadata lookup happens).  We also let the `FindPragma` accept any
of Class/Function/Field objects.

Furthermore this CL adds a bailout if we try to evaluate metadata in
the background compiler, since the background compiler is not allowed
to execute generated code. The bailout should trigger a re-compilation
on the mutator thread.

Furthermore the `FindPragma()` function is fixed to handle the case
when the evaluation of the metadata results in e.g. a language error.
In this case we simply claim to not have found a pragma annotation.

Issue #31954

Change-Id: I0900a80d5ae0f3e8d09baf13cba1b20dd974df31
Reviewed-on: https://dart-review.googlesource.com/c/84037
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
dart-bot pushed a commit that referenced this issue Nov 16, 2018
…ns work generally"

This reverts commit 2de63ac.

Reason for revert: This triggers shutdown deadlocks for some reason:

#0  0xf7782cd9 __kernel_vsyscall
#1  0xf7753d0b pthread_cond_wait@@GLIBC_2.3.2
#2  0x0108a515 dart::Monitor::WaitMicros(long long)
#3  0x0108a467 dart::Monitor::Wait(long long)
#4  0x00fe9048 dart::KernelIsolate::Shutdown()
#5  0x00f99ae4 dart::Dart::Cleanup()
#6  0x01278fce Dart_Cleanup
#7  0x00dd347e dart::bin::main(int, char**)
#8  0x00dd3d64 main
#9  0xf745caf3 __libc_start_main
#10 0x00dd2021 _start
TID 30443:
#0  0xf7782cd9 __kernel_vsyscall
#1  0xf7531ee6 epoll_wait
#2  0x00ddab61 dart::bin::EventHandlerImplementation::Poll(unsigned int)
#3  0x00dfb2d4 dart::bin::ThreadStart(void*)
#4  0xf774ff72 start_thread
#5  0xf753143e __clone
TID 30444:
#0  0xf7782cd9 __kernel_vsyscall
#1  0xf7753d0b pthread_cond_wait@@GLIBC_2.3.2
#2  0x0108a515 dart::Monitor::WaitMicros(long long)
#3  0x0108a467 dart::Monitor::Wait(long long)
#4  0x012484d8 dart::BackgroundCompiler::Run()
#5  0x01248a37 dart::BackgroundCompilerTask::Run()
#6  0x0112203a dart::ThreadPool::Worker::Loop()
#7  0x01121efe dart::ThreadPool::Worker::Main(unsigned int)
#8  0x01089e73 dart::ThreadStart(void*)
#9  0xf774ff72 start_thread
#10 0xf753143e __clone

Original change's description:
> [VM] Add @pragma annotations on Field, make @pragma annotations work generally
> 
> Currently the @pramga('vm:exact-result-type') annotation only works if
> the function is a recognized method.  This change changes that to make
> the VM just look if a function has the annotation (no matter if it's
> also in the list of recognized methods or not).
> 
> Furthermore this CL adds a "has_pragma" bit to [Field] objects, similar
> to how we have it on [Function]/[Class]es.  This allows annotating
> fields with types, as we do with function return types.
> 
> Furthermore this CL lets the type propgagator use
> @pragma('vm:exact-result-type') annotations to narrow the [CompileType]
> set on [LoadFieldInstr]s.
> 
> Since the @pragma is a general feature, this CL moves the
> `Function::FindPragma()` to `Library::FindPragma` (which is where any
> other metadata lookup happens).  We also let the `FindPragma` accept any
> of Class/Function/Field objects.
> 
> Furthermore this CL adds a bailout if we try to evaluate metadata in
> the background compiler, since the background compiler is not allowed
> to execute generated code. The bailout should trigger a re-compilation
> on the mutator thread.
> 
> Furthermore the `FindPragma()` function is fixed to handle the case
> when the evaluation of the metadata results in e.g. a language error.
> In this case we simply claim to not have found a pragma annotation.
> 
> Issue #31954
> 
> Change-Id: I0900a80d5ae0f3e8d09baf13cba1b20dd974df31
> Reviewed-on: https://dart-review.googlesource.com/c/84037
> Commit-Queue: Martin Kustermann <kustermann@google.com>
> Reviewed-by: Alexander Markov <alexmarkov@google.com>
> Reviewed-by: Vyacheslav Egorov <vegorov@google.com>

TBR=vegorov@google.com,kustermann@google.com,alexmarkov@google.com,sjindel@google.com

Change-Id: Ic0d22d32b0eea3a76ec245cabab0006f97ca1b05
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://dart-review.googlesource.com/c/84622
Reviewed-by: Martin Kustermann <kustermann@google.com>
dart-bot pushed a commit that referenced this issue Nov 22, 2018
Currently the @pramga('vm:exact-result-type') annotation only works if
the function is a recognized method.  This change changes that to make
the VM just look if a function has the annotation (no matter if it's
also in the list of recognized methods or not).

Furthermore this CL lets the type propgagator use
@pragma('vm:exact-result-type') annotations to narrow the [CompileType]
set on [LoadFieldInstr]s.

Since the @pragma is a general feature, this CL moves the
`Function::FindPragma()` to `Library::FindPragma` (which is where any
other metadata lookup happens).  We also let the `FindPragma` accept any
of Class/Function/Field objects.

Furthermore the `FindPragma()` function is fixed to handle the case
when the evaluation of the metadata results in an error.
In this case we simply claim to not have found a pragma annotation.

Issue #31954

Change-Id: If03f566e334cd53549985823ee3dd6b5e9672969
Reviewed-on: https://dart-review.googlesource.com/c/85163
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
dart-bot pushed a commit that referenced this issue Nov 22, 2018
…ys initialized with _Smi's

Until now a typed data view was first constructed and then afterwards the
constructor body validated that the offsetInBytes/length are non-null
and in range.

This CL moves the range check to the call sites of the constructor,
thereby guarenteeing that we will never initialize the
_TypedListView.{offsetInBytes,length} fields with non-_Smi values. (The
views are constructed only within the "dart:typed_data" library.  Neither the
embedder nor the runtime construct them.)

Further more this CL removes the redundant _defaultIfNull() call, since
the call sites already handled the `length == null` case.

This CL also starts annotating _TypedListView.{offsetInBytes,length}
with @pragma("vm:exact-result-type", "dart:core#_Smi") to ensure any
[LoadFieldInstr] will have the right type attached, thereby avoiding any
null-checks and mint-handling.

This improves dart-aot

On arm7hf:
    MD5: +38%, SHA256: +87%, SHA1: +89% (plus various typed data microbenchmarks)
    JsonParseDefaultReviver/StringBuffer/StringIdiomatic/JsonParseCustomReviver: -5-8% (probably due to not inlining static calls, will find out)

On arm8:
    MD5: +6.5%, SHA256: +12%, SHA1: 3.6%, JsonUtf8RoundTrip: 8% (plus various typed data microbenchmarks)
    DeltaBlue: -6.7% (probably due to not inlining static calls, will find out)

Issue #35154
Issue #31954

Change-Id: I37c822e6879f5a2d17fd9650a68cf2eee4326b01
Reviewed-on: https://dart-review.googlesource.com/c/84241
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
dart-bot pushed a commit that referenced this issue Nov 22, 2018
A field/function annotated with this pragma must be guaranteed to not
return `null` at runtime.

Make use of this non-nullable annotation in the VM's type propagator.

Annotates the "_TypedListView._typedData" field to ensure the VM knows it
returns a non-nullable _TypedListView.

Furthermore annotates methods on the integer implementation.  Those particular
methods are recognized methods with a "dynamic" return type.  This caused
the type propagator to use CompileType::Dynamic() as result type.  Since a
previous CL started to only utilize the annotated type if it is better than
"dynamic" more integer operations got handled in-line, though with null-checks.
Annotating those methods to return non-null improves the in-line handling of
integer operations.

This improves dart-aot

On arm7hf:
  SHA256: +5%, SHA: +6%, JsonObjectRoundTrip: +7%, ...

On arm8:
  SHA1: +28%, MD5: +25%, SHA256: +15%, TypedData.Int16ListViewBench: +18.5%, StringInterpolation: +18%, ...


Issue #31954
Issue #35154

Change-Id: Ia4263a37241a36c9dc35e8a48893297effa6f4b2
Reviewed-on: https://dart-review.googlesource.com/c/84421
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
dart-bot pushed a commit that referenced this issue Nov 26, 2018
…the end

If we inline something inside a loop which might throw, the throwing blocks
are currently spliced in the middle of the loop.  This CL moves those blocks
to the very end.

This improves a number of typed data benchmarks in dart-aot mode by 5-10%.

Issue #31954

Change-Id: I5dc86291240d8dac61798ff873ffa7205edc0007
Reviewed-on: https://dart-review.googlesource.com/c/85263
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
dart-bot pushed a commit that referenced this issue Nov 27, 2018
… recognize it, use it to special case ascii decoding

This change extends/fixes the exiting "pattern recognition" which tries
to recognize the pattern

  v2 <- LoadClassIdInstr(v1)
  BranchIf v2 == IntegerConstant(cid)

Furthermore we start inlining the recognized `ClassID.getID` method very
early in the pipeline.  This allows the VM to recognize the above
pattern and insert redefinitions before the actual inlining pass.

Furthermore we special-case two very hot methods in utf8 decoding by
manually having two loops, one of which is guarded by a class-id check
against the _Uint8ArrayView class, which is most common.  (In the future
we would like to unify the typed data layouts so we no longer need to
use `ClassId.getID`, thereby also allowing non core library code to use
this).

This improves dart-aot by
  * 31%+ for a protobuf decoding benchmark we care about


Issue #31954

Change-Id: Ia567b92b7e76ff28eda1726deaafda32732ed8f5
Reviewed-on: https://dart-review.googlesource.com/c/85281
Reviewed-by: Johnni Winther <johnniwinther@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Reviewed-by: Jenny Messerly <jmesserly@google.com>
Commit-Queue: Martin Kustermann <kustermann@google.com>
dart-bot pushed a commit that referenced this issue Nov 27, 2018
…logic to recognize it, use it to special case ascii decoding"

This reverts commit d7cf695.

Reason for revert: This causes crashes on the vm-obfuscate builder at https://ci.chromium.org/p/dart/builders/luci.dart.ci.sandbox/vm-kernel-precomp-obfuscate-linux-release-x64
like

FAILED: dartkp-dart_precompiled release_x64 corelib_2/linked_hash_map_test
Expected: Pass
Actual: Crash
Unexpected compile error.

--- Command "vm_compile_to_kernel" (took 11.000189s):
DART_CONFIGURATION=ReleaseX64 /b/s/w/ir/cache/builder/sdk/pkg/vm/tool/gen_kernel --aot --platform=out/ReleaseX64/vm_platform_strong.dill -o /b/s/w/ir/cache/builder/sdk/out/ReleaseX64/generated_compilations/dartkp/tests_corelib_2_linked_hash_map_test/out.dill /b/s/w/ir/cache/builder/sdk/tests/corelib_2/linked_hash_map_test.dart --packages=/b/s/w/ir/cache/builder/sdk/.packages -Ddart.developer.causal_async_stacks=true

exit code:
0

--- Command "precompiler" (took 583ms):
DART_CONFIGURATION=ReleaseX64 out/ReleaseX64/gen_snapshot --snapshot-kind=app-aot-assembly --assembly=/b/s/w/ir/cache/builder/sdk/out/ReleaseX64/generated_compilations/dartkp/tests_corelib_2_linked_hash_map_test/out.S --obfuscate --sync-async --ignore-unrecognized-flags --packages=/b/s/w/ir/cache/builder/sdk/.packages /b/s/w/ir/cache/builder/sdk/out/ReleaseX64/generated_compilations/dartkp/tests_corelib_2_linked_hash_map_test/out.dill

exit code:
-6

stderr:
Warning: This VM has been configured to obfuscate symbol information which violates the Dart standard.
         See dartbug.com/30524 for more information.

===== CRASH =====
version=2.2.0-edge.83712405657ff736033380cb24b0bd5a62fc3692 (Tue Nov 27 16:56:07 2018 +0000) on "linux_x64"
si_signo=Segmentation fault(11), si_code=1, si_addr=0xd1400050e5f
Dumping native stack trace for thread 2b71
  [0x000056001fc426c7] Unknown symbol
  [0x000056001fc426c7] Unknown symbol
  [0x000056001fe5763d] Unknown symbol
  [0x000056001fe4c99c] Unknown symbol
  [0x000056001fe6166a] Unknown symbol
  [0x000056001fe59c0c] Unknown symbol
 

Original change's description:
> [VM] Inline ClassID.getID() eagerly, extend pattern matching logic to recognize it, use it to special case ascii decoding
> 
> This change extends/fixes the exiting "pattern recognition" which tries
> to recognize the pattern
> 
>   v2 <- LoadClassIdInstr(v1)
>   BranchIf v2 == IntegerConstant(cid)
> 
> Furthermore we start inlining the recognized `ClassID.getID` method very
> early in the pipeline.  This allows the VM to recognize the above
> pattern and insert redefinitions before the actual inlining pass.
> 
> Furthermore we special-case two very hot methods in utf8 decoding by
> manually having two loops, one of which is guarded by a class-id check
> against the _Uint8ArrayView class, which is most common.  (In the future
> we would like to unify the typed data layouts so we no longer need to
> use `ClassId.getID`, thereby also allowing non core library code to use
> this).
> 
> This improves dart-aot by
>   * 31%+ for a protobuf decoding benchmark we care about
> 
> 
> Issue #31954
> 
> Change-Id: Ia567b92b7e76ff28eda1726deaafda32732ed8f5
> Reviewed-on: https://dart-review.googlesource.com/c/85281
> Reviewed-by: Johnni Winther <johnniwinther@google.com>
> Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
> Reviewed-by: Jenny Messerly <jmesserly@google.com>
> Commit-Queue: Martin Kustermann <kustermann@google.com>

TBR=vegorov@google.com,kustermann@google.com,jmesserly@google.com,johnniwinther@google.com,sra@google.com

Change-Id: I912b0768c32cbb00297ce48db29dbdbea44c14fa
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://dart-review.googlesource.com/c/85441
Reviewed-by: William Hesse <whesse@google.com>
Commit-Queue: William Hesse <whesse@google.com>
dart-bot pushed a commit that referenced this issue Nov 28, 2018
…logic to recognize it, use it to special case ascii decoding"

This change extends/fixes the exiting "pattern recognition" which tries
to recognize the pattern

  v2 <- LoadClassIdInstr(v1)
  BranchIf v2 == IntegerConstant(cid)

Furthermore we start inlining the recognized `ClassID.getID` method very
early in the pipeline.  This allows the VM to recognize the above
pattern and insert redefinitions before the actual inlining pass.

Furthermore we special-case two very hot methods in utf8 decoding by
manually having two loops, one of which is guarded by a class-id check
against the _Uint8ArrayView class, which is most common.  (In the future
we would like to unify the typed data layouts so we no longer need to
use `ClassId.getID`, thereby also allowing non core library code to use
this).

This improves dart-aot by
  * 31%+ for a protobuf decoding benchmark we care about

Issue #31954

Change-Id: I7181bbf096aabe303634fd3b2bff9cc96d69719c
Reviewed-on: https://dart-review.googlesource.com/c/85443
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Commit-Queue: Martin Kustermann <kustermann@google.com>
dart-bot pushed a commit that referenced this issue Feb 20, 2019
…coding speed

Issue #31954

Change-Id: I525157d1c91276d35f6678a9066ac72900cc11ed
Reviewed-on: https://dart-review.googlesource.com/c/93434
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Sigurd Meldgaard <sigurdm@google.com>
Reviewed-by: Lasse R.H. Nielsen <lrn@google.com>
dart-bot pushed a commit that referenced this issue Mar 15, 2019
This removes the 3 fields from the classes in Dart and instead describes
the layout in C++ via a RawTypedDataView class (as already do with
normal RawTypedData). The existing "semi" TypedDataView handle class is
changed to be a real handle class.

This decreases performance of some microbenchmarks due to field guards
not being used anymore (before the JIT could add a field guard to view classes
guarding that only normal typed data is used as backing store (and not e.g. external
typed data)

Issue #35154
Issue #31954

Change-Id: I7a0022b843a4c0fa69f53dedcc4c7bd2117cdc37
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/96806
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
@mkustermann
Copy link
Member

mkustermann commented Apr 16, 2019

Recently I've worked on unifying the heap object layout of our typed data classes so that our AOT compiler can directly access the bytes of typed data members (e.g. of Uint8List) independent of whether the object is internal/external typed data array or a view on those (**)

All of the work combined has lead to the following numbers (measured on the benchmark mentioned above in #31954 (comment) on my workstation):

JIT non-view/view
executed in 0:00:03.429451 (verify="XXXXXXXXX")
executed in 0:00:06.607781 (verify="XXXXXXXXX")

AOT (view and non-view roughly the same)
executed in 0:00:05.855320 (verify="XXXXXXXXX")

So the AOT is almost on par with the JIT (for this particular benchmark)

There is still room for improvement and we continue working on it!

(**) Notice of Warning: Our AOT compiler will only directly access the bytes (e.g. in foo(Uint8List bytes) => bytes[0];) if there are no 3rd party user classes implementing these interfaces.

@mkustermann
Copy link
Member

The AOT typed data optimizations had a huge performance gain. Though the Dart code is only taking advantage of it for the ascii case atm.

@askeksa-google is now going to extend this fast case to arbitrary unicode.

We'll try to keep optimizing the Dart implementation - though still consider intensifying it afterwards.

@kmigo
Copy link

kmigo commented Jun 18, 2022

@aartbik I was asked to help with some protobuf decoding speed and it seems like we happen to spend significant time in string decoding. Hope it's ok with you if I look into this?

One aspect might be to unify the object layouts of typed data views and non-views. Currently if code is specialized via if (data is Uint8List) { ... } the compiler cannot inline accesses to data since there are two implementations of it, namely _Uint8ArrayList or _Uint8ArrayView where the views have different layout than the non-views.

@kmigo
Copy link

kmigo commented Jun 18, 2022

how to convert a _uint8arrayview object to image ?

@lrhn
Copy link
Member

lrhn commented Jun 20, 2022

I recommend asking such questions on Stack Overflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends.
Projects
Development

No branches or pull requests

8 participants