Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hot-Reload can cause crashes with old closures in the heap #41482

Open
mkustermann opened this issue Apr 14, 2020 · 19 comments
Open

Hot-Reload can cause crashes with old closures in the heap #41482

mkustermann opened this issue Apr 14, 2020 · 19 comments
Assignees
Labels
area-front-end Use area-front-end for front end / CFE / kernel format related issues. area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. crash Process exits with SIGSEGV, SIGABRT, etc. An unhandled exception is not a crash. triaged Issue has been triaged by sub team vm-hot-reload
Milestone

Comments

@mkustermann
Copy link
Member

mkustermann commented Apr 14, 2020

The following flutter issues reporting SEGVs

might be caused by this problem.

This example vm/cc test can cause a segfault:

TEST_CASE(IsolateReload_ShapeChangeClosure) {
  const char* kScript = R"(
    class A {
      final x;
      final y;
      const A(this.x, this.y);
    }

    dynamic closure;

    main() {
      closure = () => A(1, 2);
      return "okay";
    }
  )";

 Dart_Handle lib = TestCase::LoadTestScript(kScript, NULL);
  EXPECT_VALID(lib);
  EXPECT_STREQ("okay", SimpleInvokeStr(lib, "main"));

  const char* kReloadScript = R"(
    class A {
      final x;
      const A(this.x);
    }

    dynamic closure;

    main() {
      // Call the old closure, which will try to call A(1, 2).
      closure();

      return "okay";
    }
  )";

  lib = TestCase::ReloadTestScript(kReloadScript);
  EXPECT_VALID(lib);
  EXPECT_STREQ("okay", SimpleInvokeStr(lib, "main"));
}
% ninja -C out/ReleaseX64 runtime_kernel
ninja: Entering directory `out/ReleaseX64'
ninja: no work to do.
%  out/ReleaseX64/run_vm_tests --dfe=$PWD/out/ReleaseX64/gen/kernel-service.dart.snapshot IsolateReload_ShapeChangeClosure
Running test: IsolateReload_ShapeChangeClosure
RELOAD REPORT:
{"type":"ReloadReport","success":true,"details":{"finalLibraryCount":20,"receivedLibraryCount":0,"receivedLibrariesBytes":0,"receivedClassesCount":0,"receivedProceduresCount":0,"savedLibraryCount":19,"loadedLibraryCount":1,"shapeChangeMappings":[]}}

===== CRASH =====
si_signo=Segmentation fault(11), si_code=1, si_addr=0x9
version=2.8.0-edge.22ad590c6829cfdf5ef246ecb7344482ca976dcb (be) (Tue Apr 14 14:09:56 2020 +0000) on "linux_x64"
pid=223413, thread=223413, isolate=isolate(0x560d9ee54900)
isolate_instructions=560d9bd7b1e0, vm_instructions=560d9bd7b1e0
  pc 0x00007f6451100672 fp 0x00007ffc807c0b20 Unknown symbol
  pc 0x00007f6451100596 fp 0x00007ffc807c0b80 Unknown symbol
  pc 0x00007f6451100447 fp 0x00007ffc807c0bb8 Unknown symbol
  pc 0x00007f6451100366 fp 0x00007ffc807c0be8 Unknown symbol
  pc 0x00007f645198174c fp 0x00007ffc807c0c48 Unknown symbol
  pc 0x0000560d9bf74652 fp 0x00007ffc807c0ce0 dart::DartEntry::InvokeCode(dart::Code const&, dart::Array const&, dart::Array const&, dart::Thread*)+0x112
  pc 0x0000560d9bf743b9 fp 0x00007ffc807c0d70 dart::DartEntry::InvokeFunction(dart::Function const&, dart::Array const&, dart::Array const&, unsigned long)+0x2e9
  pc 0x0000560d9c021f49 fp 0x00007ffc807c0de0 dart::Library::Invoke(dart::String const&, dart::Array const&, dart::Array const&, bool, bool) const+0x279
  pc 0x0000560d9c50f107 fp 0x00007ffc807c0ec0 Dart_Invoke+0x5d7
  pc 0x0000560d9bc2f143 fp 0x00007ffc807c0ef0 dart::SimpleInvokeStr(_Dart_Handle*, char const*)+0x23
  pc 0x0000560d9bc40f4f fp 0x00007ffc807c0fb0 dart::Dart_TestIsolateReload_ShapeChangeClosure()+0x27f
  pc 0x0000560d9bbabf3e fp 0x00007ffc807c0fd0 dart::TestCase::Run()+0x1e
  pc 0x0000560d9bbac083 fp 0x00007ffc807c1000 dart::TestCaseBase::RunTest()+0x53
  pc 0x0000560d9bcd2ae7 fp 0x00007ffc807c1020 dart::TestCaseBase::RunAll()+0x27
  pc 0x0000560d9bbac6d1 fp 0x00007ffc807c10b0 out/ReleaseX64/run_vm_tests+0x37f96d1
  pc 0x0000560d9bbac239 fp 0x00007ffc807c10c0 main+0x9
-- End of DumpStackTrace
zsh: abort      out/ReleaseX64/run_vm_tests  IsolateReload_ShapeChangeClosure

Tentatively assigning to @rmacnak-google due to hot-reload.

@rmacnak-google
Copy link
Contributor

This looks like an easier version of TEST_CASE(IsolateReload_PendingStaticCall_DefinedToNSM). This was definitely handled before, but likely broken by kernel.

@mkustermann
Copy link
Member Author

I don't know how it was handled before, but it could be handled in the flow graph building code: Once hot-reload is done we clear out all code. Then we continue running, which will invoke the closure. That will trigger unoptimized compile of closure. The flow graph builder can then see that the call to the constructor (or any static method) is invalid due to mismatch in signatures (or types) and can call NSM directly.

/cc @alexmarkov @mraleph WDYT?

@mraleph
Copy link
Member

mraleph commented Apr 15, 2020

This looks like an easier version of TEST_CASE(IsolateReload_PendingStaticCall_DefinedToNSM). This was definitely handled before, but likely broken by kernel.

Out of curiosity I did a bit of archeology: IsolateReload_PendingStaticCall_DefinedToNSM had Fail, Crash expectation when hot reload was initially landed 48c8ffa, then it morphed into Fail, Crash, Timeout, about which I complained. Then it became Skip and SkipSlow(unclear why?). Then @bkonyi triaged these tests as aspirational and reinstated Fail, Crash expectations for them.

So it seems we have never really supported the case when the code is not recompiled from scratch.

The flow graph builder can then see that the call to the constructor (or any static method) is invalid due to mismatch in signatures (or types) and can call NSM directly.

Checking number of parameters is easy (though it would still require changing flow graph builder to stop trusting Kernel, it's a bit of a slippery slope) - but with types things would get complicated, we would essentially need to duplicate some of the front end code (especially because static types of expressions are not serialised into kernel).

How about a different approach: in unoptimized code our static calls are already going through a stub. There is even a code which is supposed to rebind static target on reload, and this code even does some argument descriptor validation, which does not actually do anything if argument descriptors does not match (there is TODO to redirect to NSM).

Can it be that fixing this TODO would actually fix the crash?

@mkustermann
Copy link
Member Author

@mraleph What about types? If we keep number of positional/named parameters but change the types of parameters, it becomes unsound, because we have old callers not providing the right type the new callee expects.

@mraleph
Copy link
Member

mraleph commented Apr 15, 2020

@mkustermann rebinder has access to the old target - so it can compare the signature and check if new one is a valid override for the old one.

@alexmarkov
Copy link
Contributor

alexmarkov commented Apr 15, 2020

It is not sufficient to check static calls. Old code was compiled with different class hierarchy and different surrounding code, and it may not be valid with hot-reloaded code. Executing old code of a closure is totally unsound and may cause crashes.

For example, consider the following code:

class A {}
class B extends A {}

main() {
  A foo() => new B();
  // hot-reload here
  A a = foo();
}

In the closure code the instance of class B is returned as a value of type A without any casts.

It could be hot-reloaded to

class A {}
class B {}
...

Executing old code of a closure breaches soundness, although no signatures changed.

Some time ago we discussed this problem in the context of hot-reloading bytecode, and I remeber that @rmacnak-google had a prototype to match closure instances to the new hot-reloaded code if possible, otherwise throw an error if unmatched closure is executed. This would guarantee that we will not attempt to execute invalid old code. @rmacnak-google could you remind what was the result of that experiment?

@rmacnak-google
Copy link
Contributor

@mraleph Right, IsolateReload_PendingStaticCall_DefinedToNSM never passed. It is a more difficult version of the test Martin provided, which historically did pass in the VM's frontend.

Executing old code, either through a stored closure or activations still on the stack, or executing additional code, such as loaded through eval or loadUri, can breach soundness because they assume but do not verify the world of types is the same.

The experiment @alexmarkov mentioned is in https://dart-review.googlesource.com/c/sdk/+/111316/ It seemed okay for the few cases I tried in Flutter Gallery, but my intuition is still that this will reject more valid changes that than it will prevent invalid changes, which is par for Dart 2. Soundness will also require disallowing reload except at the event loop boundary, which means fix-and-continue debugging will be impossible.

@franklinyow
Copy link
Contributor

Following.
We should try to have a fix before Friday if we want to cherrypick this to Beta.

dart-bot pushed a commit that referenced this issue Apr 15, 2020
Does not verify argument types, which remains a source of unsoundness.

Bug: #37517
Bug: #41482
Change-Id: I2ebfd36370bc86f9f8a0e2c2463a2f5b72a50388
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/143541
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
@franklinyow franklinyow added the P1 A high priority bug; for example, a single project is unusable or has many test failures label Apr 16, 2020
dart-bot pushed a commit that referenced this issue Apr 20, 2020
Does not verify argument types, which remains a source of unsoundness.

Bug: #37517
Bug: #41482
Change-Id: I2ebfd36370bc86f9f8a0e2c2463a2f5b72a50388
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/143541
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
@a-siva
Copy link
Contributor

a-siva commented Jul 30, 2020

Is this issue fixed now after 847a127 ? can it be closed ?

@mkustermann
Copy link
Member Author

Is this issue fixed now after 847a127 ? can it be closed ?

No - there are still potential issues with old closures. They were compiled to kernel under the old static knowledge (e.g. class hierarchy). Even after hot-reload, the re-compilation of old closures in the VM will use the old kernel and therefore assume the old static knowledge is still valid, which can be no longer the case after hot-reload. (Alex provided an example in #41482 (comment) )

We should find someone to look into this, possibly in collaboration with CFE team.

Ideally the CFE (which knows about the static knowledge) could give the vm a hot-reload diff, containing additional information about closures. The VM could then refuse the reload if there is an old closure in the heap which the CFE told us is no longer valid.

@franklinyow
Copy link
Contributor

@mkustermann
Please re-evaluate this and do one of the following:

  1. Add it to a milestone if it is still P1
  2. Change priority label and add milestone if no longer P1
  3. Close, if this issue is no longer needed

@a-siva a-siva added the area-front-end Use area-front-end for front end / CFE / kernel format related issues. label Aug 11, 2020
@a-siva
Copy link
Contributor

a-siva commented Aug 11, 2020

@johnniwinther is going to take a look at the suggestion made in #41482 and see if it is feasible for CFE to provide that information. Based on that investigation we will decide on how to proceed on this issue.

@johnniwinther
Copy link
Member

@jensjoha Can you look into whether we can provide information as suggested in #41482 (comment) ?

@jensjoha
Copy link
Contributor

Either I don't understand, or it's also a problem for non-closures. E.g. if I have this flutter library:

import 'package:flutter/material.dart';

void main() {
  runApp(MyApp());
}

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'Flutter Demo',
      theme: ThemeData(
        primarySwatch: Colors.blue,
        visualDensity: VisualDensity.adaptivePlatformDensity,
      ),
      home: MyHomePage(title: 'Flutter Demo Home Page'),
    );
  }
}

class MyHomePage extends StatefulWidget {
  MyHomePage({Key key, this.title}) : super(key: key);

  final String title;

  @override
  _MyHomePageState createState() => _MyHomePageState();
}

class A {
  String foo() {
    print("Hello from A!");
    return "foo!";
  }
}

class B extends A {}

A Function() lookAtMe;
A lookAtMe2;

foo() {
  print("Setting data to B's");
  lookAtMe = () => new B();
  lookAtMe2 = new B();
}

bar() {
  A a = lookAtMe();
  print(a);
  if (a is A) {
    print("Closure returns an A as promised :)");
  } else {
    print("Closure doesn't return an A as promised :(");
  }
  print("");

  print(lookAtMe2);
  if (lookAtMe2 is A) {
    print("Field is an A as promised :)");
  } else {
    print("Field is not an A as promised :(");
  }
  print("");

  try {
    print(a.foo());
  } catch (e) {
    print(" => trying to print a.foo crashed :(");
  }
  try {
    print(lookAtMe2.foo());
  } catch (e) {
    print(" => trying to print lookAtMe2.foo crashed :(");
  }
}

class _MyHomePageState extends State<MyHomePage> {
  int _counter = 0;

  void _incrementCounter() {
    if (_counter.isEven) {
      foo();
    } else {
      bar();
    }

    setState(() {
      _counter++;
    });
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: Text(widget.title),
      ),
      body: Center(
        child: Column(
          mainAxisAlignment: MainAxisAlignment.center,
          children: <Widget>[
            Text(
              'You have pushed the button this many times:',
            ),
            Text(
              '$_counter',
              style: Theme.of(context).textTheme.headline4,
            ),
          ],
        ),
      ),
      floatingActionButton: FloatingActionButton(
        onPressed: _incrementCounter,
        tooltip: 'Increment',
        child: Icon(Icons.add),
      ),
    );
  }
}

and run it, pressing the + 3 times I get this terminal output:

flutter: Setting data to B's
flutter: Instance of 'B'
flutter: Closure returns an A as promised :)
flutter: 
flutter: Instance of 'B'
flutter: Field is an A as promised :)
flutter: 
flutter: Hello from A!
flutter: foo!
flutter: Hello from A!
flutter: foo!
flutter: Setting data to B's

(looking at the code, notice that lookAtMe contains a closure claiming to return an A returning a B (which at this point is an A) and a field of claimed type A containing a B (which at this point is an A).

Now apply these changes:

diff --git a/lib/main.dart b/lib/main.dart
index e73638d..8dfb383 100644
--- a/lib/main.dart
+++ b/lib/main.dart
@@ -34,15 +34,15 @@ class A {
   }
 }
 
-class B extends A {}
+class B /*extends A*/ {}
 
 A Function() lookAtMe;
 A lookAtMe2;
 
 foo() {
-  print("Setting data to B's");
-  lookAtMe = () => new B();
-  lookAtMe2 = new B();
+  print("Setting data to A's");
+  lookAtMe = () => new A();
+  lookAtMe2 = new A();
 }
 
 bar() {

hot reload, press the + 1 time and the terminal now says this:

Performing hot reload...                                                
Reloaded 1 of 526 libraries in 225ms.
flutter: Instance of 'B'
flutter: Closure doesn't return an A as promised :(
flutter: 
flutter: Instance of 'B'
flutter: Field is not an A as promised :(
flutter: 
flutter:  => trying to print a.foo crashed :(
flutter:  => trying to print lookAtMe2.foo crashed :(

Isn't this the situation - both for closures and fields - that is expressed in #41482 (comment) and thus a problem not centered around closures? (neither of these seem to crash the VM though --- is that the only concern?)

(pressing it twice more gives

flutter: Setting data to A's
flutter: Instance of 'A'
flutter: Closure returns an A as promised :)
flutter: 
flutter: Instance of 'A'
flutter: Field is an A as promised :)
flutter: 
flutter: Hello from A!
flutter: foo!
flutter: Hello from A!
flutter: foo!

i.e. the old stuff is cleared and the new stuff is set and everything is fine again)

@vsmenon vsmenon added this to the October Release 2020 milestone Aug 18, 2020
@franklinyow
Copy link
Contributor

What's the status on this?

@a-siva
Copy link
Contributor

a-siva commented Sep 28, 2020

After further discussions with @rmacnak-google and @alexmarkov it is clear that fixing this would require some prototyping to evaluate the options available and the fix is going to take longer so moving this issue out of the October milestone and the plan is to complete this in Q4.

@vsmenon
Copy link
Member

vsmenon commented Aug 4, 2021

Any updates here? Is this issue still coming up in practice? (I.e., should this be a P1?)

@a-siva
Copy link
Contributor

a-siva commented Aug 4, 2021

I don't think this needs to be a P1.

@a-siva a-siva added P2 A bug or feature request we're likely to work on and removed P1 A high priority bug; for example, a single project is unusable or has many test failures labels Aug 9, 2021
@a-siva
Copy link
Contributor

a-siva commented Nov 4, 2022

//cc @rmacnak-google @alexmarkov do we still think this is something that we will plan to fix in the near term, should it still be a 'p2' ?

@a-siva a-siva added triaged Issue has been triaged by sub team and removed vm-triaged labels Dec 20, 2022
@a-siva a-siva removed the P2 A bug or feature request we're likely to work on label Jun 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-front-end Use area-front-end for front end / CFE / kernel format related issues. area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. crash Process exits with SIGSEGV, SIGABRT, etc. An unhandled exception is not a crash. triaged Issue has been triaged by sub team vm-hot-reload
Projects
None yet
Development

No branches or pull requests

9 participants