Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vm] Records of primitives requires boxing #51637

Open
ds84182 opened this issue Mar 6, 2023 · 7 comments
Open

[vm] Records of primitives requires boxing #51637

ds84182 opened this issue Mar 6, 2023 · 7 comments
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. P3 A lower priority bug or feature request triaged Issue has been triaged by sub team type-performance Issue relates to performance or code size

Comments

@ds84182
Copy link
Contributor

ds84182 commented Mar 6, 2023

Godbolt link

When passing a record of primitives between functions, the values are boxed. Since these are value types without identity, they could exist as an unboxed record shape until they escape via covariance, similar to how int and double primitives work today (int -> num or double -> num require boxing).

Records also lose static type information, resulting in a runtime call to _Double operator+ instead of unboxing.

@navyakarna
Copy link

can i work on this issue ?

@mraleph mraleph added the area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. label Mar 6, 2023
@mraleph
Copy link
Member

mraleph commented Mar 6, 2023

/cc @alexmarkov

@mraleph mraleph added the type-performance Issue relates to performance or code size label Mar 6, 2023
@alexmarkov
Copy link
Contributor

There are multiple optimizations missing here:

  1. Unboxing record instances when passing them as parameters (remove record instance allocation and pass record fields as separate arguments).
  2. Unboxing fields inside record instances (remove int/double boxing for record fields when they are passed or returned separately).
  3. Global type inference should infer actual types through record instances and fields.
  4. Local type inference should keep static types of parameters as actual type (_Record) could be less informative. That should allow the compiler to replace call to _Double.+.

Currently AOT compiler can unbox record instances (of 2 fields) in return values, as returning multiple values would be probably the most common use of records. At this point we are not sure how records would be used in practice and what other scenarios would be performance critical. Do you have a real use case where performance of passing record instances as arguments would be critical?

(1) and (2) is not that hard to implement. However, we should try to find a motivating example / real use case before adding these optimizations, as they would increase complexity of the implementation.

(3) may severely affect AOT compilation time, so I'd like to avoid adding it unless absolutely necessary.

(4) looks like a bug and I'd like to fix it.

@alexmarkov alexmarkov self-assigned this Mar 6, 2023
@ds84182
Copy link
Contributor Author

ds84182 commented Mar 6, 2023

My primary use-case for records is to support vector value types via records + inline classes. It's much nicer to deal with compared to vector_math since the underlying storage is immutable and const.

inline class Vec4 {
  final (double, double, double, double) xyzw;
  const Vec4.raw(this.xyzw);
  const Vec4(double x, double y, double z, double w) : super((x, y, z, w));

  double get x => xyzw.$1;
  double get y => xyzw.$2;
  double get z => xyzw.$3;
  double get w => xyzw.$4;

  Vec4 operator+(Vec4 rhs) {
    var (lx, ly, lz, lw) = xyzw;
    var (rx, ry, rz, rw) = rhs.xyzw;
    return Vec4(lx + rx, ly + ry, lz + rz, lw + rw);
  }

  // ...
}

Another example is 128-bit ints via inline class:

inline class Int128 {
  final (int, int) _lohi;

  Int128.from(int value) : _lohi = (value, 0);

  // ... implementations here
}

I'd still expect something as large as a 4x4 matrix to be heap allocated though. But these small records are the perfect candidate for unboxing.

copybara-service bot pushed a commit that referenced this issue Mar 9, 2023
…gation

Types of record field accesses are based on static record types,
so it is useful to keep and propagate static types even when
concrete class id is known.

TEST=runtime/tests/vm/dart/records_field_operations_il_test.dart

Issue: #49719
Issue: #51637
Cq-Include-Trybots: luci.dart.try:vm-kernel-precomp-nnbd-linux-release-x64-try,vm-kernel-precomp-linux-release-x64-try
Change-Id: I268e3d519b07e12d1e2f8929cbd704a6995e2053
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/287222
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Alexander Markov <alexmarkov@google.com>
@ds84182
Copy link
Contributor Author

ds84182 commented Mar 26, 2023

There's some weird interactions with inlining.

These two programs are not equivalent:

class Box2<T> { final T x, y; const Box2(this.x, this.y); }

void main(List<String> args) {
    final record = Box2(1.0, 1.0);
    inline(record);
}

@pragma('vm:prefer-inline')
void inline(Box2<double> xy) {
    var Box2(:x, :y) = xy;
    print(x + y);
}
*** BEGIN CFG
After AllocateRegisters
==== file://<source>_::_main (RegularFunction)
  0: B0[graph]:0 {
      v0 <- Constant(#null) T{Null?}
      v26 <- UnboxedConstant(#2.0 double) T{_Double}
}
  2: B1[function entry]:2
ParallelMove xmm0 <- C {
      v2 <- Parameter(0) T{List<String>}
}
  4:     CheckStackOverflow:8(stack=0, loop=0)
  6:     MoveArgument(v26, SP+0)
  8:     StaticCall:48( print<0> v26)
  9:     ParallelMove rax <- C
 10:     Return:16(v0)
*** END CFG
void main(List<String> args) {
    final record = (1.0, 1.0);
    inline(record);
}

@pragma('vm:prefer-inline')
void inline((double, double) xy) {
    var (x, y) = xy;
    print(x + y);
}
*** BEGIN CFG
After AllocateRegisters
==== file://<source>_::_main (RegularFunction)
  0: B0[graph]:0 {
      v0 <- Constant(#null) T{Null?}
      v3 <- Constant(#1.0) T{_Double}
}
  2: B1[function entry]:2 {
      v2 <- Parameter(0) T{List<String>}
}
  4:     CheckStackOverflow:8(stack=0, loop=0)
  6:     MoveArgument(v3, SP+1)
  8:     MoveArgument(v3, SP+0)
 10:     v14 <- StaticCall:42( +<0> v3, v3, using unchecked entrypoint, recognized_kind = Double_add, result_type = T{_Double}) T{_Double}
 12:     v16 <- Unbox(v14) T{_Double}
 14:     MoveArgument(v16, SP+0)
 16:     StaticCall:44( print<0> v16)
 17:     ParallelMove rax <- C
 18:     Return:14(v0)
*** END CFG

Without inlining, it unboxes x and y. But with inlining + load-store forwarding, it loses type information. Guessing it trusts the static type of the parameter when not inlined, but when inlined it only has AllocateSmallRecord to go off of?

copybara-service bot pushed a commit that referenced this issue Apr 4, 2023
…the same cid

Previously, a type with known cid was immediately selected, even
if another type also has a known cid. Now, if cids match, the static
types are also compared. This is useful for record types, because
static record type is more accurate than known kRecordCid.

TEST=vm/dart/records_field_operations_il_test

Issue: #51637
Cq-Include-Trybots: luci.dart.try:vm-aot-linux-release-x64-try,vm-aot-linux-debug-x64-try
Change-Id: I4e528d80a355a79d428bf3f03212c5a65af0b661
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/292983
Auto-Submit: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Martin Kustermann <kustermann@google.com>
@alexmarkov
Copy link
Contributor

Here is what was done so far:

  • Global type inference is improved to infer record shape (110b0f0) and the inferred record shape is now used in the unboxing (295e210). This is a necessary groundwork for unboxing record parameters, but they are not unboxed yet.
  • Local type inference is improved to better propagate static types (8c7d47c). This allows compiler to recognize int/double operations performed on record fields. The first example now uses BinaryDoubleOp instead of a call to Double.+.
  • Argument and parameter types are combined during inlining (5117c45 and 9c3e294 which further improves handling of record types). These changes fix the problem during inlining which was reported in the previous comment [vm] Records of primitives requires boxing #51637 (comment).

@ds84182
Copy link
Contributor Author

ds84182 commented May 29, 2023

Thanks! I accidentally stumbled upon another place this comes up, looks like type propagation does not work for records that are allocated then immediately destructured: Godbolt

This causes the compiler to switch to doing table calls for Uint8List.[] which increases code size by a lot. Maybe an earlier CSE pass could help here, maybe directly after inlining?

@alexmarkov alexmarkov added P3 A lower priority bug or feature request triaged Issue has been triaged by sub team labels Oct 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. P3 A lower priority bug or feature request triaged Issue has been triaged by sub team type-performance Issue relates to performance or code size
Projects
None yet
Development

No branches or pull requests

4 participants