Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AOT illegally removes write-only fields #50571

Open
rmacnak-google opened this issue Nov 28, 2022 · 8 comments
Open

AOT illegally removes write-only fields #50571

rmacnak-google opened this issue Nov 28, 2022 · 8 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends.

Comments

@rmacnak-google
Copy link
Contributor

Consider

class Node {
  ...
  var leftWeak;
  var leftStrong;
  var rightWeak;
  var rightStrong;
  Node get left => leftWeak?.target;
  set left(Node value) {
    leftWeak = value == null ? null : new WeakReference(value);
    leftStrong = value;
  }
  Node get right => rightWeak?.target;
  set right(Node value) {
    rightWeak = value == null ? null : new WeakReference(value);
    rightStrong = value;
  }
}

Whole-program analysis currently observes that leftStrong and rightStrong are written to but never read, and concludes it can eliminate them. This is incorrect because these fields also have an effect on the reachablity of their value. With the fields removed, left and right sometimes return null depending on GC timing.

@rmacnak-google rmacnak-google added the area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends. label Nov 28, 2022
@alexmarkov
Copy link
Contributor

@rmacnak-google Could you clarify what part of the spec is violated?

My understanding is that when WeakReference was introduced, we were careful enough to not prevent optimizations like tree-shaking of write-only fields. So API specification doesn't actually tell anything about reachability, roots etc.

From the API specification:

/// A _weak_ reference to the [target] object which may be cleared
/// (set to reference `null` instead) at any time
/// when there is no other way for the program to access the target object.

In the program above, leftStrong and rightStrong are never read so program can access objects stored in those fields only via WeakReferences. So, my understanding that it is correct to clear those WeakReference objects.

At the same time, there's no guarantee that they would be ever cleared, so WeakReferences can always return their targets in JIT mode; or sometimes, depending on the GC timing in AOT mode:

/// There are no guarantees that a weak reference will ever be cleared
/// even if all references to its target are weak references.

It looks like it is working as intended.
@dcharkes @mraleph Please correct me if I'm wrong.

@mraleph
Copy link
Member

mraleph commented Nov 28, 2022

Yes, the intent was to phrase WeakReference API contract in such a way that it makes such optimisation possible. My reading of the comment is the same as @alexmarkov's: there is no execution of the program that could observe this value without derefencing the weak reference itself.

We could clarify the wording if it is not clear. ECMA-262 makes it more explicit and says that the value is considered live only there is a possible execution of the program, which does not dereference weak references and observes this value.

@mraleph
Copy link
Member

mraleph commented Dec 1, 2022

@lrhn Any opinion on making the contract more explicit in the doc-comment? Otherwise I am inclined to close as WAI.

@rmacnak-google
Copy link
Contributor Author

there is no execution of the program that could observe this value without derefencing the weak reference itself.

Plenty of other ways: a different weak reference, an ephemeron, Finalizer, NativeFinalizer, OutOfMemoryError, Dart_NewWeakPersistentHandle, Dart_Invoke, getObject, etc. What WeakReference's doc claims is not relevant to most of these.

I anticipate you will claim these are unimportant and that allowing this class of optimizations is more important. I claim the straightforward semantics are more important. Global fields are reachable. Local fields that are in scope are reachable. The fields of a reachable object are reachable. Activation pruning is an illegal optimization. Closure environment pruning is an illegal optimization. Field pruning is an illegal optimization.

@dcharkes
Copy link
Contributor

dcharkes commented Dec 1, 2022

I claim the straightforward semantics are more important.

What we found important in practise is not having premature finalization (which leads to use after free bugs).

We introduced Finalizable for this.

If you want to keep things alive, use Finalizable, this prevents certain optimizations, including tree-shaking its fields.

class Node implements Finalizable {
  ...
  var leftWeak;
  var leftStrong;
  var rightWeak;
  var rightStrong;
  Node get left => leftWeak?.target;
  set left(Node value) {
    leftWeak = value == null ? null : new WeakReference(value);
    leftStrong = value;
  }
  Node get right => rightWeak?.target;
  set right(Node value) {
    rightWeak = value == null ? null : new WeakReference(value);
    rightStrong = value;
  }
}

Specifically for fields, see:

We do require using Finalizables for NativeFinalizers. So that should prevent premature finalization there.

With Finalizers, the desired semantics (I don't remember by who, we would need to dig through issues) was to allow premature finalization. The thinking here is that if you're still using an object, it will not be GCed. This is not true for native resources, of which the wrapper object can be GCed while the C function is still running.

Of course, that doesn't prevent people from using Finalizable for uses with Finalizer, WeakReference etc.

@rmacnak-google Did you run into a specific use case where this is an issue?

@alexmarkov
Copy link
Contributor

Yes, we should probably clarify docs to say that WeakReference can be cleared when "there is no other way for the program to access the target object except through a target of a WeakReference, Expando or a handle created with Dart_NewWeakPersistentHandle". Otherwise, according to the current wording of the spec it is not correct to clear a WeakReference if there is another WeakReference or Expando pointing to the same object.

How can you reach to an object via an OutOfMemoryError? Also, Finalizer and NativeFinalizer do not provide an access to the original object being finalized.

Dart_Invoke/Dart_GetField can access members only if they are annotated with @pragma('vm:entry-point') which disables tree-shaking of these members, so they should be fine.

@lrhn
Copy link
Member

lrhn commented Dec 2, 2022

I agree that it's working as intended. Mainly because the "intended" is very, very loose.

It's very hard to describe the intended behavior. The shortest I can come up with is:

If dereferencing a weak reference does not return the object it was created with, then there must be no future execution of the program where any expression evaluates to that object.

(That is, observing the weak reference to have been emptied, after previously having held an object o, must mean that there is no way for the program to ever reference o again.)

Finalizers are even trickier, because they act when the object goes away, not just when you try to observe the weak reference. So, for a finalizer, it must be:

If a finalizer invokes the finalization associated to a specific object, then there must be no future execution of the program where any expression evaluates to that object.

It's prescriptive, not descriptive, but it's hard to be anything but when abstracting over any possible garbage collection techniques.
Users must be ready for finalization whenever they are not absolutely certain that they'll ever look at that object directly again. Or than some other code will.

But even if we take that literally, it still kind-of assumes that we execute every expression in the source program.
Optimizations may change that. Write-only variables is one thing, but you can optimize away reads of variables too.
Take

/*final*/ class Resource {
  final ExternalResource _realResource;
  final Something otherStuff;
  Resource(this._realResource, this.otherStuff);
  void use() {
    staticHelper(_realResource);
  }
}
class Actor {
  final Resource _resource;
  Actor(this._resource);
  void act() {
     _resource.use();
  }
  ...
}

The Actor class doesn't use _resource in any other way. Because the Resource class is final (doesn't have any subclasses, which we can detect to day using global analysis, and might be able to just declare in the future) ... it seems like a safe optimization to inline the use method and unbox the Resource and only store the _realSource value. We don't need that otherStuff anyway.
It's indistinguishable from the outside, because nobody else can access the _resource field, so they can't see that the entire object has been optimized away.

That optimization would wreck havoc if there is a finalizer on the Resource object which disposes the _realResource stored in it.

The source has an expression which evaluates to the resource object, the _resource of act(), but we optimized that expression away, so the actual running program does not have the same "expressions" as the source.
Which makes it under-defined to talk about values of expressions at runtime.

So it's not easy to predict when a value is dead. That's usually not a problem, because if it is dead, you can't see that value anyway, and if you can, it's not dead. It doesn't matter whether it's dead when you're not looking.
Finalizers change that. Because finalization happens at a specific time and can have side effects, it needs some kind of predictability.

We can't just say that any optimization is fair game, and finalizers should be ready for their objects to die at any time. Object existence isn't that observable to begin with. If you don't call identical with the object, there are lots of possible optimizations that won't preserve an object with that identity. Being completely unpredictable is not useful.

We may want to make some kind of promise about which optimizations are off the table.

That is, I think we should say something about how much, or little, you can actually rely on. And again, it might just be "don't assume anything if the class isn't Finalizable".

What is it that Finalizable does?
If a type is Finalizable, we promise to not optimize away any field with that type, its life-time will always be that of the containing object. Any local variable of that type will have a life time that is its entire scope. Any static/top-level variable will stay alive for the duration of the entire isolate.

I noticed that ECMAScript promises that any value you read out of a weak reference will stay alive for the duration of the current execution, which is probably until you get back to the event loop.
Our (non-native) finalizers will also only run between events (so as events). Do we make any promises about weak references not being cleared in the middle of code?

@mraleph
Copy link
Member

mraleph commented Dec 12, 2022

@rmacnak-google

What WeakReference's doc claims is not relevant to most of these.

The contract in WeakReference documentation is precisely what defines which optimizations we can and can't do. Prior to us adding WeakReference and Finalizer to the dart:core pure Dart programs did not have any way to observe GC behaviour and consequently could not observe reachability properties and how they change at different optimisation levels.

I anticipate you will claim these are unimportant and that allowing this class of optimisations is more important.

I do think that unused field removal is an important optimisation. It allows you to have debug only fields which are stripped in release mode (see e.g. flutter/flutter#22915). Dart does not really provide any language feature to achieve the same thing (i.e. a preprocessor).

I claim the straightforward semantics are more important. Global fields are reachable. Local fields that are in scope are reachable. The fields of a reachable object are reachable. Activation pruning is an illegal optimization. Closure environment pruning is an illegal optimization. Field pruning is an illegal optimization.

I understand where you are coming from, but I think in reality we have to tread a fine line here. For example, currently we base our closure contexts on scopes. It's a very straightforward model. However we constantly see users being puzzled about this - because they expect closure to only close over referenced local variables.

As I see it, there are two forces here pulling in opposite direction: for runtime observability and debuggability you would prefer a very simplistic execution model (similar to what you describe), but for production deployment you would actually want an optimised execution model which is less straightforward and gives compiler more leeway to optimise things - but within well defined bounds.

e.g. for fields specifically this means that any field that is not read and does not have @pragma('vm:entry-point') can disappear in a deployed binary. I think this is straightforward enough and well-defined enough.

@lrhn

Do we make any promises about weak references not being cleared in the middle of code?

No we did not make such promise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, FFI, and the AOT and JIT backends.
Projects
None yet
Development

No branches or pull requests

5 participants