Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rules for when deinit is called #11534

Closed
mppf opened this issue Oct 31, 2018 · 11 comments
Closed

rules for when deinit is called #11534

mppf opened this issue Oct 31, 2018 · 11 comments

Comments

@mppf
Copy link
Member

mppf commented Oct 31, 2018

To the extent the language specification says anything about it, I think it says that variables local to a block are de-initialized at the end of the block. This issue proposes that we adjust the rule.

Here is what I want the rule to be:

  • Local variables, including temporaries are deinitialized at or before the end of the enclosing block
    • the exception is values that are returned; these are instead transferred to the caller
  • Global variables (possibly including temporaries - see module-scope variables and temporaries #11526) are deinitialized at or before program shutdown
  • What does "or before" mean in the above? The compiler can move the deinitialization from the end of the enclosing block / end of the program to an earlier point if it is valid to do so. A valid point for a variable has the following properties:
    • the variable is no longer used directly (i.e. the variable itself is dead)
    • there are no references potentially pointing to that variable
    • there are no borrows potentially from that variable
  • Because deinitializers can have user-visible side effects (such as flushing a buffer, printing something to standard out, updating a file), the exact point at which these run (as well as the program behavior) can vary across Chapel versions.

Implementing such analysis would allow the resolution of #11492.

Examples to think about:

I/O

For the case in #11492, we have a statement like f.writer().write(A); and would like the channel that is the result of writer to be deinited sooner than it is today. This approach would allow that.

Array Slices

proc makeArray() {
  var A:[1..100] int;
  return A;
}
proc main() {
  ref slice = makeArray()[1..10];
  // point 1
  writeln(slice);
  writeln();
  // point 2
}

In this example, there is a reference to a call-expression temporary storing the slice, which in turn refers to a call-expression temporary storing the whole array. As a result, the compiler may not deinitialize either the array or the slice array-view of it at point 1. It must perform this deinitialization after these are no longer used, at point 2.

Owned and Borrows

proc main() {
  var instance: borrowed C = new owned C(1);
  // point 1
  writeln(instance);
  writeln();
  // point 2
}

This case is similar to the array slice case. Here instance is a borrow from an unnamed temporary storing an owned C. The owned C may not be deinitialized until after instance is dead; that is only at point 2.

User Variables vs Temporaries

Many of the important cases to call deinit earlier apply to temporaries, but I'm proposing that this behavior also apply to user variables. E.g.:

var someRecord = new MyRecord();
... code not mentioning someRecord ...

Here someRecord could be deinitd immediately after it is created.

RAII Lock

Chapel doesn't have a C++-style RAII lock type and it's unclear if we will add that in the future. However if we did add it, the lock type could use an attribute (see #7355) to communicate to the compiler that deinit for that type should not be moved. What would a C++-style RAII lock look like in Chapel? E.g.

{
   var lock = takeLock();
   // lock deinit function releases lock, so lock released at the end of this block
}

I suspect that there are clearer ways of writing this pattern. Python with contexts come to mind as a good alternative.

@mppf mppf changed the title rules for deinitializing rules for when deinit is called Oct 31, 2018
@benharsh
Copy link
Member

Maybe instead of the array example you could use this code:

class Data {
  var x : [1..10] int;
}

record MyArr {
  var data = new owned Data();

  proc this(r : range(?,?,?)) {
    return new MySlice(r, data.borrow());
  }
}

record MySlice {
  var dom : domain(1);
  var data : borrowed Data;

  proc init(r : range(?,?,?), data : borrowed Data) {
    this.dom = {r};
    this.data = data;
  }

  proc writeThis(f) {
    f.write(data.x[dom]);
  }
}

proc arrayMaker() {
  return new MyArr();
}

proc main() {
  const ref slice = arrayMaker()[1..5];
  writeln(slice);
}

Which I think demonstrates the same issue but with less internal/special behind-the-scenes code (i.e. array views).

@benharsh
Copy link
Member

benharsh commented Oct 31, 2018

If we're using the rule "at or before" can you actually rely on f.writer().write(A); doing what you want it to do? It seems like we should either say that temporaries are deinit'd at the end of the statement (with the ref-to-call-expr case being an exception), or say they're deinit'd at the end of the scope.

Edit: Otherwise I think it would become difficult for users to reason about the lifetime of their temporary/user variables by reading the code.

@bradcray
Copy link
Member

bradcray commented Nov 1, 2018

I had the same reaction / question as @benharsh just above, but he typed it faster than I managed to.

@mppf
Copy link
Member Author

mppf commented Nov 1, 2018

If we're using the rule "at or before" can you actually rely on f.writer().write(A); doing what you want it to do? It seems like we should either say that temporaries are deinit'd at the end of the statement (with the ref-to-call-expr case being an exception), or say they're deinit'd at the end of the scope.

Yes, I think it could be relied upon in that case, because the compiler would do a certain minimal level of analysis once at the time we add this feature, and that would be enough to cover that case. In more complex cases, I'd expect the behavior of programs to potentially change over time.

We could certainly consider specifying in more detail beyond what I put there. In particular, one reasonable approach would be to specify that the temporaries are either deleted at the end of the statement or at the end of the block. (I.e. without necessarily saying which one the compiler will choose in a given case).

I think it'd be interesting to be able to say something like:

  • in a ref/borrow case, the a temporary is destroyed at the end of the block
  • otherwise it's destroyed at the end of the statement

but the question is, can we describe in enough detail to a user what "a ref/borrow case" actually means? I certainly am confident that we can get the compiler to detect such cases conservatively, but what about the boundary of analysis, where the compiler thinks it might be a ref/borrow case but it's actually not? It's easy to create aliasing situations, for example.

So, is there a simple conservative rule you'd use to differentiate between the two, that we can specify and explain to users? If not, I think we should allow some variability depending on the strength of the analysis in the implementation.

(One common trick from alias analysis is the "address not taken" rule. Maybe there's something similar we can do here. But, in the I/O example, the channel temporary is passed by const ref to write, so it's not going to be that simple).

@mppf
Copy link
Member Author

mppf commented Nov 1, 2018

Maybe instead of the array example you could use this code:

I'm happy that you posted that, but I wanted to emphasize problems users can run in to with simple use of the built-in types and functions.

@benharsh
Copy link
Member

benharsh commented Nov 1, 2018

I was thinking about a rule like "free temporaries at end of statement unless the statement is variable declaration". It seems like that would be enough for records*, but maybe not for owned/borrowed classes? But maybe those need to be a special case anyways? For example:

class C {
  var x : int;

  proc deinit() {
    writeln("destroy: ", x);
  }
}

proc makeOwnedC() {
  return new owned C(5);
}

proc store(ref lhs : borrowed C, rhs : borrowed C) {
  lhs = rhs;
}

proc main() {
  var x : borrowed C;
  writeln("---- before ----");
  store(x, makeOwnedC().borrow());
  writeln("---- after ----");
  writeln(x);
}

This program prints:

---- before ----
---- after ----
{x = 5}
destroy: 5

* "seems like that would be enough for records": if we can ignore passing a record to an extern function by ref...

@mppf
Copy link
Member Author

mppf commented Nov 1, 2018

@benharsh - your owned/borrow example is an interesting one. I'd personally lean against special-casing owned/borrowed because records can contain owned/borrowed and then we have to special-case the records and then it doesn't feel so special as much as what happens half the time :)

Anyway, for this code:

proc main() {
  var x : borrowed C;
  writeln("---- before ----");
  store(x, makeOwnedC().borrow());
  writeln("---- after ----");
  writeln(x);
}

we can observe that there was a borrow from the owned temporary. Maybe that would trigger destroying that temporary at the end of the block. Maybe the idea you have is sufficient for refs.

@mppf
Copy link
Member Author

mppf commented Nov 2, 2018

As far as I can tell at the moment - Brad, BenH, Vass and I are on the same page for the general direction here.

However Brad and BenH would like more specific rules that can be in the specification (and read, understood, and relied upon) than what I put in the issue description.

What remains to do is to come up with the exact specific rules.

@vasslitvinov
Copy link
Member

I suggest that "valid points" must be at statement boundaries. This should implicitly include the end of the block.

mppf added a commit that referenced this issue Jan 3, 2019
Destroy vars in scopeless blocks at end of parent block

PR #11917 added blocks around .type expressions so that these expressions
could be eliminated in general. However this caused certain uses of
runtime types to have the runtime type be destroyed before it could be
used. This problem is visible with a valgrind run of the test

       arrays/dinan/init_arraymember/test7

This PR adjusts the logic in addAutoDestroyCalls / walkBlock to add
destruction of variables in a scopeless block at the end of the parent
block (as if the scopeless block were not present). This matches the
scopeless nature of these blocks. If the block needed to be scopeless so
a variable declared within it could be used later, then that variable
could be used later in the parent block, and so it shouldn't be destroyed
until the end of the parent block.

Testing of this branch revealed that error handling constructs within
scopeless blocks do not function correctly. This is addressed by adding
an assert for that case and adjusting the build.cpp code for begins to
use a regular block instead of a scopeless one.

Note that we have some desire to adjust where variables are destroyed -
see #11534.

- [x] full local testing
- [x] test7 passes with -valgrindexe


Reviewed by @benharsh - thanks!
@mppf mppf self-assigned this Jan 23, 2020
@mppf
Copy link
Member Author

mppf commented Feb 13, 2020

So, we have several options identified:

I'd like to add another option:

  • deinit user variables at last mention. For call temporaries in a variable declaration, deinitialize just after the variable declared would be deinitialized (so e.g. for a slice captured by ref, the slice is deinitialized after the last mention of the ref). For call temporaries in other calls, deinit them after the statement.

mppf added a commit that referenced this issue Mar 4, 2020
Use simpler rule for deinit point

This PR updates the compiler to use simpler rules for when a variable is
dead. The expiring value analysis (proposed in #13704) did not end up
being as useful as initially expected for copy elision. As a result we
are shifting to a simpler design for when copy elision can occur -
#14874. 

That leaves the expiring value analysis impacting the deinitialization
point (#11534 #11492) and the ownership transfer from non-nilable
checking (#13600). However the ownership transfer checking is not
significantly improved with expiring value analysis (it is approximately
as effective when handled by the must-alias analysis in the nil checker).
That leaves the deinit points.

Since a copy elision can cause a variable to be effectively dead in the
current scope after it occurs, there is some relationship between deinit
points and copy elision. That is why #13704 tried to address these two
together.

For example:

``` chapel
proc acceptsInIntent( in arg ) { ... }

proc test() {
  var x: MyRecord;
  ...
  acceptsInIntent(x); // copy elision occurs here
  // x is effectively dead in this scope
  // accesses through refs/borrows hopefully are compilation errors
  // but could be use-after-frees.
}
```

However since the copy elision rule is relatively simple, this PR changes
the rule for when a variable is dead to simply include the copy elision
rule. Besides that, it is based upon the simpler rule in #13704 /
#11534 (comment) .

Here is the rule as of this PR:

> A variable is dead:
>  * after copy elision if it occurred (after the last mention is
>    used to copy-initialize a variable or in intent argument)
>  * for the results of nested call expressions not involved in initializing
>    a user variable or reference after the end of the statement
>  * otherwise, at the end of the scope in which it is declared

There are two important things to note about the above rule:
 1. "involved in initializing a user variable or reference" includes
    through split-init
 2. the above rule applies to module-scope variables (aka global
    variables) as well as local variables. For global variables

Also note that `--no-early-deinit` is available to make all
temps deinited end-of-block.

This PR takes the following steps to implement the change to
deinitialization points:
 * add a shared helper isInitOrReturn to CallExpr.cpp; use it in
   possiblyInitsDestroyedVariable
 * removes --report-expiring flag and the entire expiring value analysis
 * adjusts the normalizer to leave temps in module init functions
   (instead of making them global variables) so that this can be
   determined later (after split-inits are known)
 * fixes a few bugs with nil checking
 * adds MarkTempsVisitor / gatherTempsDeadLastMention /
   markTempsDeadLastMention which are called from fixPrimInitsAndAddCasts
   in resolveFunction.cpp. This analyzes temp variables by how they are
   used to determine if they are used to initialize a user variable.
   Besides adding `FLAG_DEAD_LAST_MENTION` or `FLAG_DEAD_END_OF_BLOCK`,
   if the variable is in a module initialization function, it will pull
   it out into global scope.
 * insertReturnTemps marks the new temps since this runs after
   MarkTempsVisitor
 * adjust several tests that have use-after-frees according to the new
   rules:
     * nbody_orig_1.chpl
     * assignRecord.chpl
     * owned-class-instantiation-types-user-init.chpl
     * owned-class-instantiation-types.chpl
 * (and these also which were added after checking in PR  #15072 detected an issue with them)
    * iter-return-first-borrowed-array-element.chpl
    * owned-record-instantiation-types-user-init.chpl 
    * classes/initializers/generics/inheritance/inherited.chpl 

Reviewed by @benharsh - thanks!

- [x] full local testing
@mppf
Copy link
Member Author

mppf commented Mar 10, 2020

PR #15041 implements a simpler rule for deinit points that enables this program to behave as originally expected.

A variable is dead:

  • after copy elision if it occurred (after the last mention is
    used to copy-initialize a variable or in intent argument)
  • for the results of nested call expressions not involved in initializing
    a user variable or reference after the end of the statement
  • otherwise, at the end of the scope in which it is declared

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants