-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
Before https://dart-review.googlesource.com/c/sdk/+/385722 a compile of the CFE (an aot compile of pkg/front_end/tool/_fasta/compile.dart is what I'll be using below) includes two scanners: The Utf8BytesScanner and StringScanner. After the CL landed it it only included one.
Before, or indeed now if I include it again, say by applying this patch (here done on top of 329760d):
diff --git a/pkg/front_end/lib/src/kernel_generator_impl.dart b/pkg/front_end/lib/src/kernel_generator_impl.dart
index 2daf9ed7002..86673e57c33 100644
--- a/pkg/front_end/lib/src/kernel_generator_impl.dart
+++ b/pkg/front_end/lib/src/kernel_generator_impl.dart
@@ -5,6 +5,7 @@
/// Defines the front-end API for converting source code to Dart Kernel objects.
library front_end.kernel_generator_impl;
+import 'package:_fe_analyzer_shared/src/scanner/scanner.dart';
import 'package:_fe_analyzer_shared/src/messages/severity.dart' show Severity;
import 'package:kernel/ast.dart';
import 'package:kernel/class_hierarchy.dart';
@@ -77,6 +78,11 @@ Future<InternalCompilerResult> generateKernelInternal(
ProcessedOptions options = compilerContext.options;
assert(options.haveBeenValidated, "Options have not been validated");
+ if (1+1==3) {
+ ScannerResult scannerResult = scanString("void main(){}");
+ print(scannerResult);
+ }
+
options.reportNullSafetyCompilationModeInfo();
FileSystem fs = options.fileSystem;
looking at an AOT compile on the function AbstractScanner.select at what happens to advance we can see that it checks the class id and then inlines both of the two versions:
[...]
903547 v91 <- LoadClassId(v2) [2073, 2074] int64
0x00000000006337ea <+34>: mov -0x1(%rdx),%ebx
0x00000000006337ed <+37>: shr $0xc,%ebx
903548 ParallelMove fp[-1] <- rbx
0x00000000006337f0 <+40>: mov %rbx,-0x8(%rbp)
903549 Branch if EqualityCompare:10(v91 == v52) goto (17, 27)
0x00000000006337f4 <+44>: cmp $0x819,%rbx
0x00000000006337fb <+51>: jne 0x633855 <AbstractScanner.select+141>
903550 B17
903551 v58 <- LoadField(v2 T{Utf8BytesScanner} . byteOffset) [-9223372036854775808, 9223372036854775807] int64
0x0000000000633801 <+57>: mov 0x9f(%rdx),%rdi
[...]
903564 B27
903565 v74 <- LoadField(v2 T{StringScanner} . scanOffset) [-9223372036854775808, 9223372036854775807] int64
0x0000000000633855 <+141>: mov 0x9f(%rdx),%rdi
[...]
With only one included, though, it no longer inlines it, and the method disassembly looks like this:
[...]
901662 v6 <- StaticCall:10( advance<0> v2, using unchecked entrypoint, result_type = T{_Smi}) [-4611686018427387904, 4611686018427387903] int64
0x0000000000631559 <+49>: call 0x62fe60 <Utf8BytesScanner.advance>
[...]
I can ask the VM nicely for it to be inlined by applying @pragma("vm:prefer-inline"), making it in fact inline it:
diff --git a/pkg/_fe_analyzer_shared/lib/src/scanner/utf8_bytes_scanner.dart b/pkg/_fe_analyzer_shared/lib/src/scanner/utf8_bytes_scanner.dart
index 2ea201ed8f5..bedc0edc015 100644
--- a/pkg/_fe_analyzer_shared/lib/src/scanner/utf8_bytes_scanner.dart
+++ b/pkg/_fe_analyzer_shared/lib/src/scanner/utf8_bytes_scanner.dart
@@ -126,6 +126,7 @@ class Utf8BytesScanner extends AbstractScanner {
@override
@pragma('vm:unsafe:no-bounds-checks')
+ @pragma("vm:prefer-inline")
int advance() {
// Always increment so byteOffset goes past the end.
++byteOffset;
[...]
901884 v32 <- LoadField(v2 T{Utf8BytesScanner} . byteOffset) [-9223372036854775808, 9223372036854775807] int64
0x00000000006318ea <+34>: mov 0x9f(%rdx),%rbx
901885 ParallelMove rbx <- rbx
901886 v34 <- BinaryInt64Op(+ [tr], v32, v72 T{_Smi}) [-9223372036854775808, 9223372036854775807] int64
0x00000000006318f1 <+41>: add $0x1,%rbx
901887 StoreField(v2 T{Utf8BytesScanner} . byteOffset = v34 T{int} <int64>)
0x00000000006318f5 <+45>: mov %rbx,0x9f(%rdx)
[...]
In fact inlining it is (slightly) faster --- here the difference having it inlined when compiling a fixed version of the CFE with itself:
instructions:u: -0.1007% +/- 0.0006% (-21314554.00 +/- 122170.23)
Knowing that @mraleph for instance is sceptical about the instruction count, looking instead on the scanner benchmark pkg/_fe_analyzer_shared/test/scanner_benchmark.dart this too has two live scanners, and they're inlined by default (.1 below). Out-commenting the string one (leaving only one) does StaticCalls (.2 below). Asking nicely for it to be inlines and it is (.3 below) compiled like out/ReleaseX64/dart-sdk/bin/dart compile aot-snapshot pkg/_fe_analyzer_shared/test/scanner_benchmark.dart (and moved to .1 etc), run like e.g. out/ReleaseX64/dart-sdk/bin/dartaotruntime pkg/_fe_analyzer_shared/test/scanner_benchmark.aot.1 pkg/kernel/lib/src/equivalence.dart --bytes gives these runtimes:
.1: ~103.1 bytes per microsecond
.2: ~108.7 bytes per microsecond
.3: ~112.9 bytes per microsecond
So having only one non-inlined one processes 5.47075% +/- 1.30016% more bytes per time unit compared to having two inlined ones. So far so good. Inlining advance makes it process 3.8906% +/- 0.812575% more bytes still per time unit though.
I guess my questions are this:
- Why did it go from being inlined to not being inlined? That's quite a surprising result and as we've seen it would have been faster had it still been inlined.
Depending on the answer to that, - why wasn't the before a class-id check and two (specific)
StaticCalls instead? (probably that would be slower, which is why it depends on the answer to 1 above).
and also - If - when in the situation where we have two options and it inlines by default - I mark both as
@pragma("vm:never-inline")it doesn't check class id's and instead does aDispatchTableCallwhich makes sense. But if I only mark one of them with@pragma("vm:never-inline")it inlines the non-marked on and does aDispatchTableCallon the other one --- which seems to me like it could have been aStaticCallconsidering it should now know exactly what is left. Why doesn't it?