Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vm/ffi] Array accesses & asTypedList both require static calls #51976

Open
ds84182 opened this issue Apr 6, 2023 · 2 comments
Open

[vm/ffi] Array accesses & asTypedList both require static calls #51976

ds84182 opened this issue Apr 6, 2023 · 2 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. library-ffi P3 A lower priority bug or feature request triaged Issue has been triaged by sub team type-performance Issue relates to performance or code size

Comments

@ds84182
Copy link
Contributor

ds84182 commented Apr 6, 2023

Ideally these three functions should generate the same IL:

import 'dart:ffi';

final class OAM extends Struct {
    @Array((40 * 4) ~/ 8)
    external Array<Uint64> data;
}

int main(List<String> s, Pointer<OAM> o) => o.ref.data[0];
// int main(List<String> s, Pointer<Uint64> o) => o[0];
// int main(List<String> s, Pointer<Uint64> o) => o.asTypedList((40 * 4) ~/ 8)[0];

IL as of Dart SDK version: 3.0.0-400.0.dev (dev) (Sun Apr 2 19:08:38 2023 -0700) on "linux_x64"

Pointer<OAM>:

*** BEGIN CFG
After AllocateRegisters
==== file://<source>_::_main (RegularFunction)
  0: B0[graph]:0 {
}
  2: B1[function entry]:2 {
      v2 <- Parameter(0) T{List<String>}
      v3 <- Parameter(1) T{Pointer}
}
  4:     CheckStackOverflow:8(stack=0, loop=0)
  6:     v4 <- AllocateObject:10(cls=OAM) T{OAM}
  7:     ParallelMove rcx <- rax, rax <- S+2
  8:     StoreField(v4 . _typedDataBase@8050071 = v3, NoStoreBarrier)
 10:     MoveArgument(v4, SP+0)
 12:     v5 <- StaticCall:14( get:data<0> v4, result_type = T{Array}) T{Array}
 14:     MoveArgument(v5, SP+0)
 16:     v6 <- StaticCall:16( Uint64Array|[]<0> v5, result_type = T{int}) [-9223372036854775808, 9223372036854775807] T{int}
 18:     Return:18(v6)
*** END CFG

Pointer<Uint64> + asTypedList:

*** BEGIN CFG
After AllocateRegisters
==== file://<source>_::_main (RegularFunction)
  0: B0[graph]:0 {
      v13 <- UnboxedConstant(#20 int64) [20, 20] T{_Smi}
      v21 <- UnboxedConstant(#0 int64) [0, 0] T{_Smi}
}
  2: B1[function entry]:2
ParallelMove rax <- C {
      v2 <- Parameter(0) T{List<String>}
      v3 <- Parameter(1) T{Pointer}
}
  4:     CheckStackOverflow:8(stack=0, loop=0)
  6:     MoveArgument(v3, SP+1)
  8:     MoveArgument(v13, SP+0)
 10:     v9 <- StaticCall:14( Uint64Pointer|asTypedList<0> v3, v13, result_type = T{!null}) T{Uint64List}
 11:     ParallelMove rcx <- rax
 12:     v14 <- LoadField(v9 . TypedDataBase.length {final}) [0, 4611686018427387903] T{_Smi}
 14:     v19 <- UnboxInt64([non-speculative], v14) [0, 4611686018427387903] T{int}
 15:     ParallelMove rax <- rdx, rbx <- C
 16:     GenericCheckBound(v19, v21) [-4611686018427387904, 4611686018427387903] T{_Smi}
 18:     v16 <- LoadUntagged(v9, 8) T{*?}
 20:     v22 <- LoadIndexed(v16, v21) [-9223372036854775808, 9223372036854775807] T{int}
 22:     Return:18(v22 T{int})
*** END CFG

Pointer<Uint64>:

*** BEGIN CFG
After AllocateRegisters
==== file://<source>_::_main_main (ImplicitClosureFunction)
  0: B0[graph]:0 {
      v38 <- UnboxedConstant(#0 int64) [0, 0] T{_Smi}
}
  2: B2[function entry]:2 {
      v2 <- Parameter(0) T{*?}
      v3 <- Parameter(1) T{List<String>}
      v4 <- Parameter(2) T{Pointer}
}
  3:     ParallelMove rcx <- S+2
  4:     v32 <- LoadUntagged(v4, 8) T{*?}
  6:     v34 <- LoadIndexed(v32, v38) [-9223372036854775808, 9223372036854775807] T{int}
  8:     v35 <- BoxInt64(v34) [-9223372036854775808, 9223372036854775807] T{int}
 10:     Return:12(v35 T{int})
*** END CFG

The issues stem from lack of inlining for struct getters & setters, asTypedList, and Uint64Array operator[]. Ideally the object allocations and bounds checks can be dropped once the compiler is able to see through them. Not a VM expert, but I was thinking that there could be a graph intrinsic in dart:_internal that can output a GenericCheckBound instruction, which could then be used in ffi_patch to make the resulting IL for Uint64Array operator[] smaller.

cc @dcharkes

@lrhn lrhn added area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. library-ffi labels Apr 6, 2023
@dcharkes
Copy link
Contributor

dcharkes commented Apr 7, 2023

Hey @ds84182!

Are you running into this as a performance bottleneck? If so, is it possible to provide a program that exercises this as a benchmark?

Another potential performance improvement with structs currently is that we allocate Pointer objects on all the offset calculations of fields.

Regarding inlining, it would be interesting to see why we don't inline, if you have a debug build you can pass --trace-inlining.

@ds84182
Copy link
Contributor Author

ds84182 commented Apr 7, 2023

Not specifically a performance bottleneck for me right now, but it does inform some of my performance related decisions around FFI.

I don't have a debug build of Dart currently, but I can try a bit later today.

@mraleph mraleph added the type-performance Issue relates to performance or code size label Apr 11, 2023
@a-siva a-siva added P3 A lower priority bug or feature request triaged Issue has been triaged by sub team labels Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. library-ffi P3 A lower priority bug or feature request triaged Issue has been triaged by sub team type-performance Issue relates to performance or code size
Projects
None yet
Development

No branches or pull requests

5 participants