Hot-Reload can cause crashes with old closures in the heap #41482

mkustermann · 2020-04-14T19:26:31Z

The following flutter issues reporting SEGVs

Hot reload crash: ../../third_party/dart/runtime/vm/object.h: 3491: error: Handle check failed: saw 1101507569947902338 expected Function flutter/flutter#50270
Unable get flutter attach to work on iOS platform, dart vm seems to crash? flutter/flutter#54424
Flutter Mobile App Hot Reload Crash with Android Studio flutter/flutter#54730

might be caused by this problem.

This example vm/cc test can cause a segfault:

TEST_CASE(IsolateReload_ShapeChangeClosure) {
  const char* kScript = R"(
    class A {
      final x;
      final y;
      const A(this.x, this.y);
    }

    dynamic closure;

    main() {
      closure = () => A(1, 2);
      return "okay";
    }
  )";

  Dart_Handle lib = TestCase::LoadTestScript(kScript, NULL);
  EXPECT_VALID(lib);
  EXPECT_STREQ("okay", SimpleInvokeStr(lib, "main"));

  const char* kReloadScript = R"(
    class A {
      final x;
      const A(this.x);
    }

    dynamic closure;

    main() {
      // Call the old closure, which will try to call A(1, 2).
      closure();

      return "okay";
    }
  )";

  lib = TestCase::ReloadTestScript(kReloadScript);
  EXPECT_VALID(lib);
  EXPECT_STREQ("okay", SimpleInvokeStr(lib, "main"));
}

% ninja -C out/ReleaseX64 runtime_kernel
ninja: Entering directory `out/ReleaseX64'
ninja: no work to do.
%  out/ReleaseX64/run_vm_tests --dfe=$PWD/out/ReleaseX64/gen/kernel-service.dart.snapshot IsolateReload_ShapeChangeClosure
Running test: IsolateReload_ShapeChangeClosure
RELOAD REPORT:
{"type":"ReloadReport","success":true,"details":{"finalLibraryCount":20,"receivedLibraryCount":0,"receivedLibrariesBytes":0,"receivedClassesCount":0,"receivedProceduresCount":0,"savedLibraryCount":19,"loadedLibraryCount":1,"shapeChangeMappings":[]}}

===== CRASH =====
si_signo=Segmentation fault(11), si_code=1, si_addr=0x9
version=2.8.0-edge.22ad590c6829cfdf5ef246ecb7344482ca976dcb (be) (Tue Apr 14 14:09:56 2020 +0000) on "linux_x64"
pid=223413, thread=223413, isolate=isolate(0x560d9ee54900)
isolate_instructions=560d9bd7b1e0, vm_instructions=560d9bd7b1e0
  pc 0x00007f6451100672 fp 0x00007ffc807c0b20 Unknown symbol
  pc 0x00007f6451100596 fp 0x00007ffc807c0b80 Unknown symbol
  pc 0x00007f6451100447 fp 0x00007ffc807c0bb8 Unknown symbol
  pc 0x00007f6451100366 fp 0x00007ffc807c0be8 Unknown symbol
  pc 0x00007f645198174c fp 0x00007ffc807c0c48 Unknown symbol
  pc 0x0000560d9bf74652 fp 0x00007ffc807c0ce0 dart::DartEntry::InvokeCode(dart::Code const&, dart::Array const&, dart::Array const&, dart::Thread*)+0x112
  pc 0x0000560d9bf743b9 fp 0x00007ffc807c0d70 dart::DartEntry::InvokeFunction(dart::Function const&, dart::Array const&, dart::Array const&, unsigned long)+0x2e9
  pc 0x0000560d9c021f49 fp 0x00007ffc807c0de0 dart::Library::Invoke(dart::String const&, dart::Array const&, dart::Array const&, bool, bool) const+0x279
  pc 0x0000560d9c50f107 fp 0x00007ffc807c0ec0 Dart_Invoke+0x5d7
  pc 0x0000560d9bc2f143 fp 0x00007ffc807c0ef0 dart::SimpleInvokeStr(_Dart_Handle*, char const*)+0x23
  pc 0x0000560d9bc40f4f fp 0x00007ffc807c0fb0 dart::Dart_TestIsolateReload_ShapeChangeClosure()+0x27f
  pc 0x0000560d9bbabf3e fp 0x00007ffc807c0fd0 dart::TestCase::Run()+0x1e
  pc 0x0000560d9bbac083 fp 0x00007ffc807c1000 dart::TestCaseBase::RunTest()+0x53
  pc 0x0000560d9bcd2ae7 fp 0x00007ffc807c1020 dart::TestCaseBase::RunAll()+0x27
  pc 0x0000560d9bbac6d1 fp 0x00007ffc807c10b0 out/ReleaseX64/run_vm_tests+0x37f96d1
  pc 0x0000560d9bbac239 fp 0x00007ffc807c10c0 main+0x9
-- End of DumpStackTrace
zsh: abort      out/ReleaseX64/run_vm_tests  IsolateReload_ShapeChangeClosure

Tentatively assigning to @rmacnak-google due to hot-reload.


rmacnak-google · 2020-04-14T22:36:04Z

This looks like an easier version of TEST_CASE(IsolateReload_PendingStaticCall_DefinedToNSM). This was definitely handled before, but likely broken by kernel.

mkustermann · 2020-04-15T09:57:44Z

I don't know how it was handled before, but it could be handled in the flow graph building code: once hot-reload is done we clear out all code. Then we continue running, which will invoke the closure. That will trigger an unoptimized compile of the closure. The flow graph builder can then see that the call to the constructor (or any static method) is invalid due to mismatch in signatures (or types) and can call NSM directly.

/cc @alexmarkov @mraleph WDYT?

mraleph · 2020-04-15T11:52:20Z

> This looks like an easier version of TEST_CASE(IsolateReload_PendingStaticCall_DefinedToNSM). This was definitely handled before, but likely broken by kernel.

Out of curiosity I did a bit of archeology: IsolateReload_PendingStaticCall_DefinedToNSM had a Fail, Crash expectation when hot reload initially landed (48c8ffa), then it morphed into Fail, Crash, Timeout, about which I complained. Then it became Skip and SkipSlow (unclear why?). Then @bkonyi triaged these tests as aspirational and reinstated the Fail, Crash expectations for them.

So it seems we have never really supported the case when the code is not recompiled from scratch.

> The flow graph builder can then see that the call to the constructor (or any static method) is invalid due to mismatch in signatures (or types) and can call NSM directly.

Checking the number of parameters is easy (though it would still require changing the flow graph builder to stop trusting Kernel, which is a bit of a slippery slope), but with types things would get complicated: we would essentially need to duplicate some of the front end code (especially because static types of expressions are not serialised into kernel).

How about a different approach: in unoptimized code our static calls already go through a stub. There is even code that is supposed to rebind the static target on reload, and this code even does some argument descriptor validation, which does not actually do anything if the argument descriptors do not match (there is a TODO to redirect to NSM).

Can it be that fixing this TODO would actually fix the crash?
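
A minimal sketch of that idea in C++, not the VM's actual rebinding code: `ArgsDescriptor`, `Target`, `Binding`, `RebindStaticCall`, and the `"NoSuchMethodStub"` name are hypothetical stand-ins, and only positional argument counts are modelled (named arguments, type arguments, and argument types are ignored).

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for a call site's argument descriptor.
struct ArgsDescriptor {
  int positional_count;  // number of positional arguments the call site passes
};

// Hypothetical, simplified stand-in for the callee found after the reload.
struct Target {
  std::string name;
  int min_positional;  // required positional parameters
  int max_positional;  // required plus optional positional parameters
};

// True if a call site passing `args` can legally invoke `target`.
bool AreValidArguments(const Target& target, const ArgsDescriptor& args) {
  return args.positional_count >= target.min_positional &&
         args.positional_count <= target.max_positional;
}

// Outcome of rebinding one static call site after a reload.
struct Binding {
  bool call_nsm;       // true: the call is routed to a stub that throws NSM
  std::string callee;  // what the call site targets from now on
};

// Rebinds a static call site to `new_target`, or to an NSM-throwing stub when
// the call site's arguments no longer fit the new target.
Binding RebindStaticCall(const ArgsDescriptor& site_args,
                         const Target& new_target) {
  assert(site_args.positional_count >= 0);
  if (!AreValidArguments(new_target, site_args)) {
    return Binding{true, "NoSuchMethodStub"};
  }
  return Binding{false, new_target.name};
}
```

With the test case from the original report, the old closure's call site passes two positional arguments while the reloaded constructor accepts exactly one, so in this model the call site would be bound to the NSM stub instead of the new constructor.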

mkustermann · 2020-04-15T12:07:24Z

@mraleph What about types? If we keep the number of positional/named parameters but change the types of the parameters, it becomes unsound, because old callers do not provide the types the new callee expects.

mraleph · 2020-04-15T12:09:01Z

@mkustermann The rebinder has access to the old target, so it can compare the signatures and check whether the new one is a valid override of the old one.
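
A rough sketch of what such a signature comparison could look like, assuming a toy type model: types are plain class names, the hierarchy is single-inheritance only, and `Hierarchy`, `Signature`, `IsSubtype`, and `IsValidOverride` are hypothetical names rather than VM APIs. The hierarchy passed in is meant to be the one from after the reload.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Class name -> direct superclass name. Hypothetical model: single
// inheritance only, no interfaces, mixins, or generics.
using Hierarchy = std::map<std::string, std::string>;

// True if `sub` is `super` or inherits from it. Assumes `h` is acyclic.
bool IsSubtype(const Hierarchy& h, std::string sub, const std::string& super) {
  while (sub != super) {
    auto it = h.find(sub);
    if (it == h.end()) return false;  // reached a root without finding `super`
    sub = it->second;
  }
  return true;
}

struct Signature {
  std::vector<std::string> params;  // positional parameter types
  std::string result;               // return type
};

// True if call sites compiled against `old_sig` can safely call a target with
// `new_sig`: same arity, parameters may only widen, the result may only narrow.
bool IsValidOverride(const Hierarchy& h, const Signature& old_sig,
                     const Signature& new_sig) {
  if (old_sig.params.size() != new_sig.params.size()) return false;
  for (std::size_t i = 0; i < old_sig.params.size(); ++i) {
    // Old callers pass values of the old parameter type.
    if (!IsSubtype(h, old_sig.params[i], new_sig.params[i])) return false;
  }
  // Old callers expect a value of the old result type.
  return IsSubtype(h, new_sig.result, old_sig.result);
}
```

Under this model, a rebinder would keep the call site bound to the new target only when `IsValidOverride` holds, and would otherwise fall back to an error path.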

alexmarkov · 2020-04-15T17:34:50Z

It is not sufficient to check static calls. Old code was compiled with a different class hierarchy and different surrounding code, and it may not be valid with hot-reloaded code. Executing old code of a closure is totally unsound and may cause crashes.

For example, consider the following code:

class A {}
class B extends A {}

main() {
  A foo() => new B();
  // hot-reload here
  A a = foo();
}

In the closure code the instance of class B is returned as a value of type A without any casts.

It could be hot-reloaded to

class A {}
class B {}
...

Executing old code of a closure breaches soundness, although no signatures changed.

Some time ago we discussed this problem in the context of hot-reloading bytecode, and I remember that @rmacnak-google had a prototype to match closure instances to the new hot-reloaded code if possible, and otherwise throw an error if an unmatched closure is executed. This would guarantee that we will not attempt to execute invalid old code. @rmacnak-google could you remind us what the result of that experiment was?
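
To illustrate the general shape of "match or fail on execution", here is a small C++ sketch. It is not based on the prototype's code: the stable `function_key`, the `Program` map, and the function names are all assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model: a closure names its function by a stable key, for
// example the enclosing member plus an index. Not the VM's representation.
struct Closure {
  std::string function_key;
  bool invalidated;  // set when no matching function exists after a reload
};

// Functions in the reloaded program: key -> value the body would return
// (a stand-in for real code).
using Program = std::map<std::string, int>;

// After a reload, keep the closure usable only if its function still exists.
void RemapAfterReload(Closure& closure, const Program& reloaded) {
  assert(!closure.function_key.empty());
  closure.invalidated =
      reloaded.find(closure.function_key) == reloaded.end();
}

// Runs the closure against the current program; an unmatched closure fails
// with an error instead of executing stale code.
int Invoke(const Closure& closure, const Program& program) {
  if (closure.invalidated) {
    throw std::runtime_error("closure was invalidated by hot reload");
  }
  return program.at(closure.function_key);
}
```

The point of the design is that a matched closure always runs code from the reloaded program, while an unmatched closure produces a catchable error at the call rather than a crash.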

rmacnak-google · 2020-04-15T18:32:12Z

@mraleph Right, IsolateReload_PendingStaticCall_DefinedToNSM never passed. It is a more difficult version of the test Martin provided, which historically did pass in the VM's frontend.

Executing old code, either through a stored closure or activations still on the stack, or executing additional code, such as loaded through eval or loadUri, can breach soundness because they assume but do not verify the world of types is the same.

The experiment @alexmarkov mentioned is in https://dart-review.googlesource.com/c/sdk/+/111316/. It seemed okay for the few cases I tried in Flutter Gallery, but my intuition is still that this will reject more valid changes than it will prevent invalid changes, which is par for Dart 2. Soundness will also require disallowing reload except at the event loop boundary, which means fix-and-continue debugging will be impossible.

franklinyow · 2020-04-15T20:13:14Z

Following.
We should try to have a fix before Friday if we want to cherrypick this to Beta.

Does not verify argument types, which remains a source of unsoundness.

Bug: #37517
Bug: #41482
Change-Id: I2ebfd36370bc86f9f8a0e2c2463a2f5b72a50388
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/143541
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>

a-siva · 2020-07-30T17:58:55Z

Is this issue fixed now after 847a127? Can it be closed?

mkustermann · 2020-07-31T08:10:38Z

> Is this issue fixed now after 847a127? Can it be closed?

No - there are still potential issues with old closures. They were compiled to kernel under the old static knowledge (e.g. class hierarchy). Even after hot-reload, the re-compilation of old closures in the VM will use the old kernel and therefore assume the old static knowledge is still valid, which may no longer be the case after hot-reload. (Alex provided an example in #41482 (comment).)

We should find someone to look into this, possibly in collaboration with CFE team.

Ideally the CFE (which has the static knowledge) could give the VM a hot-reload diff containing additional information about closures. The VM could then refuse the reload if there is an old closure in the heap that the CFE has told us is no longer valid.
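
As a sketch of the decision the VM would make under this proposal (in C++, with an entirely hypothetical diff format: `ReloadDiff`, the closure identifiers, and both function names are assumptions, since no such CFE output is described in this thread):

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical reload diff: identifiers of closures whose old code the front
// end considers invalid under the new program.
struct ReloadDiff {
  std::set<std::string> invalidated_closures;
};

// Returns the closures still alive in the heap that the diff marks as
// invalid. An empty result means nothing blocks the reload.
std::vector<std::string> BlockingClosures(
    const ReloadDiff& diff, const std::vector<std::string>& live_closures) {
  std::vector<std::string> blocking;
  for (const std::string& id : live_closures) {
    if (diff.invalidated_closures.count(id) != 0) blocking.push_back(id);
  }
  return blocking;
}

// The reload is accepted only if no live closure was invalidated.
bool ShouldAcceptReload(const ReloadDiff& diff,
                        const std::vector<std::string>& live_closures) {
  return BlockingClosures(diff, live_closures).empty();
}
```

Returning the blocking closures, rather than just a boolean, would let a rejected reload report which closures caused the rejection.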

franklinyow · 2020-08-10T23:34:06Z

@mkustermann
Please re-evaluate this and do one of the following:

Add it to a milestone if it is still P1
Change priority label and add milestone if no longer P1
Close, if this issue is no longer needed

a-siva · 2020-08-11T19:29:30Z

@johnniwinther is going to take a look at the suggestion made in #41482 and see if it is feasible for CFE to provide that information. Based on that investigation we will decide on how to proceed on this issue.

johnniwinther · 2020-08-12T07:13:47Z

@jensjoha Can you look into whether we can provide information as suggested in #41482 (comment) ?

jensjoha · 2020-08-12T09:13:25Z

Either I don't understand, or it's also a problem for non-closures. E.g. if I have this flutter library:

import 'package:flutter/material.dart';

void main() {
  runApp(MyApp());
}

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'Flutter Demo',
      theme: ThemeData(
        primarySwatch: Colors.blue,
        visualDensity: VisualDensity.adaptivePlatformDensity,
      ),
      home: MyHomePage(title: 'Flutter Demo Home Page'),
    );
  }
}

class MyHomePage extends StatefulWidget {
  MyHomePage({Key key, this.title}) : super(key: key);

  final String title;

  @override
  _MyHomePageState createState() => _MyHomePageState();
}

class A {
  String foo() {
    print("Hello from A!");
    return "foo!";
  }
}

class B extends A {}

A Function() lookAtMe;
A lookAtMe2;

foo() {
  print("Setting data to B's");
  lookAtMe = () => new B();
  lookAtMe2 = new B();
}

bar() {
  A a = lookAtMe();
  print(a);
  if (a is A) {
    print("Closure returns an A as promised :)");
  } else {
    print("Closure doesn't return an A as promised :(");
  }
  print("");

  print(lookAtMe2);
  if (lookAtMe2 is A) {
    print("Field is an A as promised :)");
  } else {
    print("Field is not an A as promised :(");
  }
  print("");

  try {
    print(a.foo());
  } catch (e) {
    print(" => trying to print a.foo crashed :(");
  }
  try {
    print(lookAtMe2.foo());
  } catch (e) {
    print(" => trying to print lookAtMe2.foo crashed :(");
  }
}

class _MyHomePageState extends State<MyHomePage> {
  int _counter = 0;

  void _incrementCounter() {
    if (_counter.isEven) {
      foo();
    } else {
      bar();
    }

    setState(() {
      _counter++;
    });
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: Text(widget.title),
      ),
      body: Center(
        child: Column(
          mainAxisAlignment: MainAxisAlignment.center,
          children: <Widget>[
            Text(
              'You have pushed the button this many times:',
            ),
            Text(
              '$_counter',
              style: Theme.of(context).textTheme.headline4,
            ),
          ],
        ),
      ),
      floatingActionButton: FloatingActionButton(
        onPressed: _incrementCounter,
        tooltip: 'Increment',
        child: Icon(Icons.add),
      ),
    );
  }
}

and run it, pressing the + 3 times I get this terminal output:

flutter: Setting data to B's
flutter: Instance of 'B'
flutter: Closure returns an A as promised :)
flutter: 
flutter: Instance of 'B'
flutter: Field is an A as promised :)
flutter: 
flutter: Hello from A!
flutter: foo!
flutter: Hello from A!
flutter: foo!
flutter: Setting data to B's

(Looking at the code, notice that lookAtMe contains a closure that claims to return an A but returns a B (which at this point is an A), and that lookAtMe2 is a field of declared type A containing a B (which at this point is an A).)

Now apply these changes:

diff --git a/lib/main.dart b/lib/main.dart
index e73638d..8dfb383 100644
--- a/lib/main.dart
+++ b/lib/main.dart
@@ -34,15 +34,15 @@ class A {
   }
 }
 
-class B extends A {}
+class B /*extends A*/ {}
 
 A Function() lookAtMe;
 A lookAtMe2;
 
 foo() {
-  print("Setting data to B's");
-  lookAtMe = () => new B();
-  lookAtMe2 = new B();
+  print("Setting data to A's");
+  lookAtMe = () => new A();
+  lookAtMe2 = new A();
 }
 
 bar() {

hot reload, press the + 1 time and the terminal now says this:

Performing hot reload...                                                
Reloaded 1 of 526 libraries in 225ms.
flutter: Instance of 'B'
flutter: Closure doesn't return an A as promised :(
flutter: 
flutter: Instance of 'B'
flutter: Field is not an A as promised :(
flutter: 
flutter:  => trying to print a.foo crashed :(
flutter:  => trying to print lookAtMe2.foo crashed :(

Isn't this the situation - both for closures and fields - that is expressed in #41482 (comment) and thus a problem not centered around closures? (neither of these seem to crash the VM though --- is that the only concern?)

(pressing it twice more gives

flutter: Setting data to A's
flutter: Instance of 'A'
flutter: Closure returns an A as promised :)
flutter: 
flutter: Instance of 'A'
flutter: Field is an A as promised :)
flutter: 
flutter: Hello from A!
flutter: foo!
flutter: Hello from A!
flutter: foo!

i.e. the old stuff is cleared and the new stuff is set and everything is fine again)

franklinyow · 2020-09-28T17:09:45Z

What's the status on this?

a-siva · 2020-09-28T23:36:52Z

After further discussions with @rmacnak-google and @alexmarkov, it is clear that fixing this will require some prototyping to evaluate the available options, and the fix is going to take longer. I am therefore moving this issue out of the October milestone; the plan is to complete it in Q4.

vsmenon · 2021-08-04T15:42:11Z

Any updates here? Is this issue still coming up in practice? (I.e., should this be a P1?)

a-siva · 2021-08-04T15:56:51Z

I don't think this needs to be a P1.

a-siva · 2022-11-04T21:31:18Z

/cc @rmacnak-google @alexmarkov Do we still plan to fix this in the near term? Should it still be a P2?

mkustermann added the area-vm, vm-hot-reload, and crash labels Apr 14, 2020

mkustermann assigned rmacnak-google Apr 14, 2020

franklinyow added the P1 label Apr 16, 2020

mkustermann mentioned this issue Apr 17, 2020

Please merge #2cf981f354e15a9a88cec0b5c534c01aa55828c9 into dev channel #41550

Closed

franklinyow assigned johnniwinther Aug 11, 2020

a-siva added the area-front-end label Aug 11, 2020

vsmenon added this to the October Release 2020 milestone Aug 18, 2020

a-siva modified the milestones: October Release 2020, Future Sep 28, 2020

mkustermann mentioned this issue Apr 15, 2021

Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 in tid 31140 (1.ui), pid 30963 flutter/flutter#80469

Closed

a-siva added the P2 label and removed the P1 label Aug 9, 2021

a-siva added the vm-triaged label Nov 4, 2022

a-siva added the triaged label and removed the vm-triaged label Dec 20, 2022

a-siva removed the P2 label Jun 7, 2024
