Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attaching many sized objects to NativeFinalizer #50537

Closed
nielsenko opened this issue Nov 23, 2022 · 11 comments
Closed

Attaching many sized objects to NativeFinalizer #50537

nielsenko opened this issue Nov 23, 2022 · 11 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. type-performance Issue relates to performance or code size vm-gc Related to the VM's garbage collector

Comments

@nielsenko
Copy link

nielsenko commented Nov 23, 2022

I have created a barebone reproduction of an issue we first observed wrt. Realm.

You can find it on github here: https://github.com/nielsenko/weird

It basically allocates and leave as garbage a lot of PointHandles that holds a Pointer<Point> where Point is a struct bound via ffi.

In the ctor of PointHandle I allocate a Point struct and attach the pointer to a NativeFinalizer to ensure the native memory is freed when the handle is GC'ed. On attach I set externalSize : 16 which is important (I dont see the issue if II forget to set externalSize)

When I allocate (and leave as garbage) a lot of PointHandles in a loop, I see a drop in the speed of allocations after a while.

I would expect the GC to kick in, clean out some handles, causing speed to degrade while it happens, but I would also expect the speed to recover after the GC has run. The latter does not seem to happen, at least not to the pre-first-gc speed.

Example output:

     8585000 all: 0:00:10.510152, lap:      0ms, live points:       305499
     8586000 all: 0:00:10.510682, lap:      0ms, live points:       306499
     8587000 all: 0:00:10.511202, lap:      0ms, live points:       307499
     8588000 all: 0:00:10.511721, lap:      0ms, live points:       308499
     8589000 all: 0:00:10.533678, lap:     21ms, live points:       309499
     8590000 all: 0:00:10.534269, lap:      0ms, live points:       310499
     8591000 all: 0:00:10.534927, lap:      0ms, live points:       311499
     8592000 all: 0:00:10.535493, lap:      0ms, live points:       312499
     8593000 all: 0:00:10.536042, lap:      0ms, live points:       313499
     8594000 all: 0:00:10.536587, lap:      0ms, live points:       314499
     8595000 all: 0:00:10.537132, lap:      0ms, live points:       315499
     8596000 all: 0:00:10.537685, lap:      0ms, live points:       316499
     8597000 all: 0:00:10.538228, lap:      0ms, live points:       317499
     8598000 all: 0:00:10.538781, lap:      0ms, live points:       318499
     8599000 all: 0:00:10.539314, lap:      0ms, live points:       319499
     8600000 all: 0:00:11.967296, lap:   1427ms, live points:       320499 // <--- I find this jump weird
     8601000 all: 0:00:14.060393, lap:   2093ms, live points:       321499
     8602000 all: 0:00:16.150930, lap:   2090ms, live points:       322499
     8603000 all: 0:00:18.215032, lap:   2064ms, live points:       323499
     8604000 all: 0:00:20.282250, lap:   2067ms, live points:       324499
     8605000 all: 0:00:22.361571, lap:   2079ms, live points:       325499
     8606000 all: 0:00:24.432646, lap:   2071ms, live points:       326499

First row is number of objects (PointHandle + Point) allocated, then total time, time since previous line, and last the current number of live Point structs - these are counted on the native side.

Environment:
Dart SDK version: 2.18.4 (stable) (Tue Nov 1 15:15:07 2022 +0000) on "macos_arm64"
Flutter 3.3.8 • channel stable • https://github.com/flutter/flutter.git
Framework • revision 52b3dc25f6 (2 weeks ago) • 2022-11-09 12:09:26 +0800
Engine • revision 857bd6b74c
Tools • Dart 2.18.4 • DevTools 2.15.0

@mraleph
Copy link
Member

mraleph commented Nov 23, 2022

/cc @dcharkes @rmacnak-google

@mraleph mraleph added area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. type-performance Issue relates to performance or code size labels Nov 23, 2022
@nielsenko
Copy link
Author

After 1.5 hour..

    12231000 all: 1:31:48.095638, lap:   1447ms, live points:       553549
    12232000 all: 1:31:49.485101, lap:   1389ms, live points:       554549
    12233000 all: 1:31:50.818911, lap:   1333ms, live points:       555549
    12234000 all: 1:31:52.134033, lap:   1315ms, live points:       556549
    12235000 all: 1:31:53.459152, lap:   1325ms, live points:       557549
    12236000 all: 1:31:55.660166, lap:   2200ms, live points:        19907
    12237000 all: 1:31:56.852459, lap:   1192ms, live points:        20907
    12238000 all: 1:31:58.028367, lap:   1175ms, live points:        21907
    12239000 all: 1:31:59.174411, lap:   1146ms, live points:        22907
    12240000 all: 1:32:00.307212, lap:   1132ms, live points:        23907
    12241000 all: 1:32:01.640688, lap:   1333ms, live points:        24907
    12242000 all: 1:32:03.154265, lap:   1513ms, live points:        25907
    12243000 all: 1:32:04.687900, lap:   1533ms, live points:        26907
    12244000 all: 1:32:06.179470, lap:   1491ms, live points:        27907
    12245000 all: 1:32:07.774116, lap:   1594ms, live points:        28907

@dcharkes
Copy link
Contributor

On MacOS arm64 2.18.5 this reproduces: https://gist.github.com/dcharkes/352aee3d486c3bd5fba0b828ac3042df

On MacOS arm64 version 2.17.5 stable (in Flutter), it becomes slower at a point, but around 100ms rather than 2000ms: https://gist.github.com/dcharkes/9b9e063dbb6d8859025c917085bc7612

     8750000 all: 0:00:10.328055, lap:      0ms, live points:       149461
     8751000 all: 0:00:10.619408, lap:    291ms, live points:        84000
     8752000 all: 0:00:10.725501, lap:    106ms, live points:        85000

On Flutter master it goes quickly out of memory: https://gist.github.com/dcharkes/b80cce1f1a1a91cf848581c9b8c3deb6
On Linux in Dart standalone master, it goes quickly out of memory: https://gist.github.com/dcharkes/35b8d341a6ab5fbf84c54779a4f6a674
On MacOS in Dart standalone master, it goes quickly out of memory: https://gist.github.com/dcharkes/09e5334efcd1439aac47a7bd5abcb214

I do not believe I did any changes to native finalizers, so it might be GC changes related.

@nielsenko can you reproduce this on Flutter master?
@rmacnak-google do the behavior changes between 2.17.5, 2.18.5 and master ring a bell? Did we change GC behavior recently?

@rmacnak-google
Copy link
Contributor

The OOM seems to be caused by c9f0120.

If I comment out the OOM, I can reproduce something similar that involves GC thrashing with a negative total external size.

@dcharkes
Copy link
Contributor

Will you investigate and fix @rmacnak-google ?

@rmacnak-google
Copy link
Contributor

Yeah, this feels like a GC-accounting issue rather than a finalizer issuer.

@dcharkes
Copy link
Contributor

Thanks @rmacnak-google !

@nielsenko
Copy link
Author

@dcharkes & @rmacnak-google

With dart from flutter master (Dart SDK version: 2.19.0-424.0.dev on "macos_arm64"), I quickly hit OoM with dart run

      779000 all: 0:00:00.801511, lap:      0ms, live points:       124499
      780000 all: 0:00:00.801850, lap:      0ms, live points:       125499
      781000 all: 0:00:00.802264, lap:      0ms, live points:       126499
Unhandled exception:
Out of Memory
#0      FfiTrampoline__setExternalSize$Method$FfiNative$Ptr (dart:ffi)
#1      _NativeFinalizer.attach (dart:ffi-patch/ffi_native_finalizer_patch.dart:58:13)
#2      main.<anonymous closure> (package:weird/main.dart:15:33)
#3      new _GrowableList.generate (dart:core-patch/growable_array.dart:136:28)
#4      main (package:weird/main.dart:15:10)
#5      main (file:///Users/kasper/Projects/mongodb/reproductions/weird/bin/weird.dart:4:11)
#6      _delayEntrypointInvocation.<anonymous closure> (dart:isolate-patch/isolate_patch.dart:297:19)
#7      _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:192:26)

I have also setup the project with a flutter example. Again on flutter master

Flutter 3.6.0-7.0.pre.41 • channel master • https://github.com/flutter/flutter.git
Framework • revision f5c14807fd (47 minutes ago) • 2022-11-23 17:08:10 -0500
Engine • revision 8a40e8324b
Tools • Dart 2.19.0 (build 2.19.0-424.0.dev) • DevTools 2.19.0

This time with flutter run -d macos I see the same:

flutter:       700000 all: 0:00:00.639087, lap:      0ms, live points:        58773
flutter:       701000 all: 0:00:00.639506, lap:      0ms, live points:        59773
flutter:       702000 all: 0:00:00.639909, lap:      0ms, live points:        60773
flutter:       703000 all: 0:00:00.640327, lap:      0ms, live points:        61773
flutter:       704000 all: 0:00:00.640747, lap:      0ms, live points:        62773
flutter:       705000 all: 0:00:00.641131, lap:      0ms, live points:        63773
[ERROR:flutter/runtime/dart_vm_initializer.cc(41)] Unhandled Exception: Out of Memory
#0      FfiTrampoline__setExternalSize$Method$FfiNative$Ptr (dart:ffi)
#1      _NativeFinalizer.attach (dart:ffi-patch/ffi_native_finalizer_patch.dart:58:13)
#2      main.<anonymous closure> (package:weird/main.dart:15:33)
#3      new _GrowableList.generate (dart:core-patch/growable_array.dart:136:28)
#4      main (package:weird/main.dart:15:10)
#5      main (package:weird_flutter/main.dart:4:11)
#6      _runMain.<anonymous closure> (dart:ui/hooks.dart:131:23)
#7      _delayEntrypointInvocation.<anonymous closure> (dart:isolate-patch/isolate_patch.dart:297:19)
#8      _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:192:26)

With stable I see the weird behaviour as before, also on flutter, but 300ms cliff instead of 2000ms for some reason:

flutter:      4386000 all: 0:00:05.007203, lap:      0ms, live points:       353959
flutter:      4387000 all: 0:00:05.007675, lap:      0ms, live points:       354959
flutter:      4388000 all: 0:00:05.008136, lap:      0ms, live points:       355959
flutter:      4389000 all: 0:00:05.008624, lap:      0ms, live points:       356959
flutter:      4390000 all: 0:00:05.009097, lap:      0ms, live points:       357959
flutter:      4391000 all: 0:00:05.275879, lap:    266ms, live points:       358959
flutter:      4392000 all: 0:00:05.648689, lap:    372ms, live points:       359959
flutter:      4393000 all: 0:00:06.022679, lap:    373ms, live points:       360959
flutter:      4394000 all: 0:00:06.396466, lap:    373ms, live points:       361959
flutter:      4395000 all: 0:00:06.769583, lap:    373ms, live points:       362959

@blagoev
Copy link
Contributor

blagoev commented Nov 24, 2022

We don't hit this if we implement our own NativeFinalizers with

void handle_finalizer(void* isolate_callback_data, void* realmPtr) {
  realm_release(realmPtr);
}

RLM_API void* realm_attach_finalizer(Dart_Handle handle, void* realmPtr, int size) {
  return Dart_NewFinalizableHandle_DL(handle, realmPtr, size, handle_finalizer);
}

RLM_API void realm_dettach_finalizer(void* finalizableHandle, Dart_Handle handle) {
  Dart_FinalizableHandle finalHandle = reinterpret_cast<Dart_FinalizableHandle>(finalizableHandle);
  return Dart_DeleteFinalizableHandle_DL(finalHandle, handle);
}

Where we are still using a non-zero size for Dart_NewFinalizableHandle_DL

copybara-service bot pushed a commit that referenced this issue Nov 30, 2022
…e incrementing external size.

When a FinalizerEntry's target gets promoted, the associated external size needs to also get promoted. We were handling the cases where the FinalizerEntry itself was either already old or remained new, but not the case where it was promoted. Failing to promote the external size meant that when the finalizer was collected, external size was subtraced from old space that was still being attributed to new space, so the old space total external size became negative.

TEST=ci
Bug: #50537
Change-Id: Id2fe2d748311de73f04de367c9cd153d87b74ad1
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/272350
Reviewed-by: Daco Harkes <dacoharkes@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
@rmacnak-google rmacnak-google added the vm-gc Related to the VM's garbage collector label Nov 30, 2022
@blaugold
Copy link
Contributor

I just ran into the Out of Memory exception on stable, so Dart 2.19.4. Was the fix meant to go into 2.19?

@dcharkes
Copy link
Contributor

There was a comment on https://dart-review.googlesource.com/c/sdk/+/272350 about cherry picking.

But it looks like the CP was never done https://dart-review.googlesource.com/q/%22Add+missing+promotion%22

cc @rmacnak-google .

copybara-service bot pushed a commit that referenced this issue Mar 21, 2023
…move race incrementing external size.

When a FinalizerEntry's target gets promoted, the associated external size needs to also get promoted. We were handling the cases where the FinalizerEntry itself was either already old or remained new, but not the case where it was promoted. Failing to promote the external size meant that when the finalizer was collected, external size was subtraced from old space that was still being attributed to new space, so the old space total external size became negative.

TEST=ci
Bug: #50537
Change-Id: I83316636dea13415f38343212157c32f5ef19857
Cherry-pick: https://dart-review.googlesource.com/c/sdk/+/272350
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/289941
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. type-performance Issue relates to performance or code size vm-gc Related to the VM's garbage collector
Projects
None yet
Development

No branches or pull requests

6 participants