
Evaluation of field initializers in constructor execution. #4572

@lrhn


The current execution order of invoking an initializing constructor of a class C with an argument list to initialize an object o (which is an instance of a subclass of C) is:

  • Bind actuals to formals (part of invocation).
    • Introduce the default value as the value of each parameter with no argument.
    • Create local variables for the parameters, holding the parameter values.
    • Create the parameter-scope and initializer-list-scope bindings for those variables.
  • Evaluate the initializer expressions of the instance variables of C (in source order) and initialize each variable to the resulting value.
  • Execute initializing formals (assign the parameter value to the instance variable).
    • May overwrite the value of the initializer expression of a non-final instance variable.
  • Execute the initializer list entries in source order (in the initializer-list scope).
    • Check asserts for assert entries if asserts are enabled.
    • Initialize instance variables for initializer entries.
      • May overwrite the value of the initializer expression of a non-final instance variable.
  • Evaluate the super-constructor argument list expressions, build the actual argument list (including super-parameters),
    and invoke the super-constructor with this argument list to initialize o.
  • Execute the body in the parameter scope with this bound to o.
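The order above can be observed with side-effecting expressions. The following is a minimal sketch using only ordinary (non-primary) constructors; the `trace` helper, the `log` list and the classes `Base` and `C` are hypothetical names made up for this illustration. The printed log follows the listed steps, and the final value of `a` shows that the initializing formal ran after the initializer expression.

```dart
// Hypothetical tracing helpers: every traced expression appends its label
// to `log`, which makes the evaluation order visible.
final log = <String>[];

T trace<T>(String label, T value) {
  log.add(label);
  return value;
}

class Base {
  Base(int v) {
    trace('Base body', v);
  }
}

class C extends Base {
  int a = trace('a initializer', 1); // overwritten by the initializing formal
  int b = trace('b initializer', 2);
  int c;

  C(this.a, int x)
      : c = trace('c list entry', x),
        super(trace('super argument', x)) {
    trace('C body', x);
  }
}

void main() {
  final o = C(10, 5);
  print(log);
  // [a initializer, b initializer, c list entry, super argument, Base body, C body]
  print(o.a); // 10: the initializing formal replaced the initializer's value 1
}
```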

The current specification, and implementation, allows initializing a non-final variable twice: once by its initializer expression and once by the constructor itself (an initializing formal or an initializer list entry). The specified order is that the constructor's value overwrites the initializer expression's value.

The order is visible. Instance variable initializer expressions can have side effects, and so can initializer list entries.
It's possible to see the order of the expressions being evaluated, and it's possible to see which assignment was last, since that's the value of the instance variable when this becomes available.
This only matters for classes with no const constructors: you can only initialize a variable twice if the variable is not final, and a class with a const constructor can only have final instance variables.
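A small sketch of a visible double initialization, using the same kind of hypothetical `trace` helper: the non-final variable `count` (in a made-up class `Counter`) is initialized by its initializer expression and then again by an initializer list entry. Both side effects happen, and the value assigned last is the one that remains.

```dart
// Hypothetical tracing helpers, as a way to observe evaluation order.
final log = <String>[];

T trace<T>(String label, T value) {
  log.add(label);
  return value;
}

class Counter {
  // Non-final, so the constructor may initialize it a second time.
  // Declaring it `final` would make the constructor below a compile-time
  // error, because a final variable with an initializer expression can't
  // also be initialized by a constructor.
  int count = trace('initializer expression', 0);

  Counter.start(int n) : count = trace('initializer list entry', n);
}

void main() {
  final c = Counter.start(42);
  print(log); // [initializer expression, initializer list entry]
  print(c.count); // 42: the initializer list entry was assigned last
}
```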

With primary constructors, instance variable initializer expressions can refer to the initializer-list scope, including the primary constructor's parameters.

As currently specified, primary constructors are defined by desugaring, which moves initializer expressions into the constructor as initializer-list entries.
That immediately makes it an error if a variable whose initializer expression was moved is now initialized twice inside the constructor (by an initializing formal or another initializer-list entry).
That may itself be surprising. If that error is suppressed, then the desugaring may move initializer expressions past the execution of initializing formals. If it only moves initializer expressions that refer to constructor parameters, then it changes the order between initializer expressions.
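To make the desugaring concrete, here is a sketch with a made-up `Temperature` class. The primary-constructor form appears only in a comment (the sketch assumes an SDK without primary constructors); the compiled class is an assumed desugared form, with the initializer expression moved into the initializer list where the parameter is in scope. The trailing comment describes the double-initialization case that becomes a compile-time error.

```dart
// Proposed primary-constructor source (not compiled here):
//
//   class Temperature(double celsius) {
//     final double fahrenheit = celsius * 9 / 5 + 32;
//   }
//
// Assumed desugaring: the initializer expression becomes an initializer
// list entry, where the parameter `celsius` is in scope.
class Temperature {
  final double fahrenheit;

  Temperature(double celsius) : fahrenheit = celsius * 9 / 5 + 32;
}

// If the class also declared `int readings = 0;` and the primary constructor
// had an initializing formal `this.readings`, the same desugaring would give
//
//   Temperature(double celsius, this.readings)
//       : readings = 0, fahrenheit = celsius * 9 / 5 + 32;
//
// which is a compile-time error: a variable can't be initialized by both an
// initializing formal and an initializer list entry.

void main() {
  print(Temperature(100).fahrenheit);
}
```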

In any case, we need to specify the new execution order, preferably explicitly and without using desugaring.

The current specification works only by desugaring, which means it doesn't, and cannot, have the same evaluation order as a similar non-primary constructor.

That may be enough: we preserve the existing behavior, then specify that the primary constructor behavior is the behavior of a desugaring. I fear that the result may show artifacts of that desugaring that can confuse users who think of instance variables by themselves, not as something that is moved into the constructor. I don't think it gives us the best language, but it may be the most easily implemented one.

We can also specify that we keep the current evaluation order, initializer expressions before initializing formals, allowing double-initialization. Then we can't specify it using desugaring, but nothing ever said we had to.

Or we can introduce restrictions (probably only on primary constructor classes) that ensure that there can't be double-initialization at all, in which case moving initializer expressions past initializing formals is unobservable.

If we try to specify a new general evaluation order for initializing constructors, one that includes primary constructors instead of having different evaluation orders for primary and non-primary constructors, different issues may arise.

  • If we move the instance variable initializer expression evaluation to after "executing initializing formals", we change the order of initialization, which would be a breaking change for any class that does initialize an instance variable twice.

  • If we move only instance variable initializers that refer to parameters into the initializer list (like desugaring), it changes evaluation order between instance variables, which can be confusing.

  • If we specify an order that doesn't actually correspond to a desugaring, it makes "lowering" of the feature harder.

To actually make desugaring of a primary constructor into a non-primary constructor valid, we need to either specify an order that is backwards compatible, even if it's a little more complicated than today, or change the general behavior to make order less relevant. (Not invisible; there are still side effects.)

Here are some possible general initializing constructor execution orders, which should apply to all initializing constructors, including primary constructors, and the pros/cons of each.

Goals that I think we should aim for:

  • Desugarable. The specified behavior of a primary constructor should be achievable by a normal constructor.
  • Initializer expression order. If a class contains instance variable initializer expressions, those should be evaluated in source order.
  • Backwards compatible (within reason). Shouldn't break or invalidate existing code (unless it's code we consider bad).
  • Consistent. Should use the same steps for primary and non-primary constructors. It's not two different semantics.
  • Understandable. Shouldn't be too complicated for users to understand.
  • Efficiently implementable. As always.

Not certain that we can get all of them.

1: Maximal backwards compatibility - Keep current order.

(I proposed this in another issue.)

  • Bind actuals to formals
  • Evaluate all instance variable initializer expressions (in the initializer list scope if class has primary constructor, class scope if not), and initialize the variables.
  • Execute initializing formals
  • Then execute initializer list, etc.

Pros:

  • Backwards compatible: Behaves exactly the same as today for existing classes.
  • Consistent. Same for primary constructors and non-primary constructors.
  • Single initialization step for initializer expressions. Doesn't split the initialization step.

Cons:

  • Not desugaring compatible.
    • Either the primary constructor desugaring moves all initializer expressions into the initializer list, but then it's an error for the constructor to initialize such a variable again, even one whose initializer expression doesn't refer to constructor parameters (and even if that were allowed, it moves the initializer expressions past the initializing formals).
    • Or it moves only the initializer expressions that refer to constructor parameters into the initializer list, but then it doesn't preserve the evaluation order.
      No semantics that allows a primary constructor to initialize twice is compatible with desugaring and preserves order. (Or at least, it'll be a non-trivial desugaring.)

2: Maximal Desugaring Compatibility - Simulate a desugaring

  • Bind actuals to formals.
  • Evaluate all instance variable initializer expressions that do not refer to constructor parameters, and initialize those variables.
  • Execute initializing formals.
  • Then evaluate and initialize all instance variables that do refer to constructor parameters.
    • It's a compile-time error if such a variable is also initialized by an initializing formal or initializer list entry.
  • Then continue with initializer list, etc.

This respects the desugaring, effectively making instance variables that refer to constructor parameters into
initializer list entries, adding them at the head of the initializer list.

Pros:

  • Matches desugaring. Effectively makes instance variables that refer to constructor parameters into
    initializer list entries, adding them at the start of the initializer list. The compile-time error for double-initialization-by-constructor is retained as well.
  • Backwards compatible. There are no existing instance variable initializer expressions referring to constructor parameters, and nothing changes other than handling those.
  • Migration safety (primary). If you make the (only) existing initializing constructor into a primary constructor, that alone doesn't change the behavior of anything.
  • Migration safety (initializer). If you move an initializer list initialization onto an instance variable, you'll have to replace any existing expression explicitly. (And hopefully realize that the expression was already useless.)

Cons:

  • Evaluation order: Splits initializer expressions into two groups and evaluates them out of syntactic order, which may (will!) confuse users.
  • (With augmentations, if an augmenting class does not repeat the primary constructor, we could allow initializer expressions to run in a scope without those variables, and refer to outside declarations of the same name. But most likely, we should just consider the variables to be in scope, and make it an error to refer to them without repeating the primary constructor.)
  • Const asserts: No way to put asserts in front of initializer expression evaluation. In a const constructor, no way to put asserts into those (potentially constant) expressions.

A highly hypothetical class:

```dart
class Counters(int x) {
  static int _idCtr = 0;
  final _counterIdsStart = _idCtr;
  final fromX = MutableVariable<int>(x, id: _idCtr++);
  final fromZero = MutableVariable<int>(0, id: _idCtr++);
}
```

Here a reader would expect that fromX has id == _counterIdsStart and fromZero has id == _counterIdsStart + 1,
but the two are not evaluated in source order: only fromX refers to a constructor parameter, so fromZero's initializer is evaluated first and gets the earlier id.
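A sketch of what option 2 would do to this class, written as the desugared non-primary constructor. `MutableVariable` is not defined in the issue, so a minimal hypothetical version with just `value` and `id` is assumed here. Only `fromX` refers to `x`, so only its initializer becomes an initializer list entry, and it runs after `fromZero`'s initializer.

```dart
// Minimal hypothetical stand-in for the issue's `MutableVariable<T>`;
// only `id` matters for this example.
class MutableVariable<T> {
  T value;
  final int id;
  MutableVariable(this.value, {required this.id});
}

// Assumed option-2 desugaring of `class Counters(int x) { ... }`: only the
// initializer that refers to the parameter `x` (fromX) moves into the
// initializer list; the other two stay instance variable initializers.
class Counters {
  static int _idCtr = 0;
  final _counterIdsStart = _idCtr;
  final MutableVariable<int> fromX;
  final fromZero = MutableVariable<int>(0, id: _idCtr++);

  Counters(int x) : fromX = MutableVariable<int>(x, id: _idCtr++);
}

void main() {
  final c = Counters(7);
  print(c.fromX.id - c._counterIdsStart); // 1, although fromX is declared first
  print(c.fromZero.id - c._counterIdsStart); // 0
}
```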

3: Make evaluation order irrelevant for primary constructors.

First make it a compile-time error for a primary constructor to initialize any instance variable twice.
That is: It's a compile-time error for a primary constructor to have an initializing formal or initializer list entry for an instance variable which has an initializer expression.

That doesn't restrict anything useful. If you had more than one initializing constructor, then some could overwrite an instance variable's initializer expression value, and some could choose not to. With a primary constructor, there is only one initializing constructor. If that one overwrites the initializer expression value, that expression could exist only for its side effect, but most likely it's just a mistake, and removing it is the right thing to do. If you want a side effect, put it into the initializer list instead.
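A sketch of the multi-constructor case that makes double initialization meaningful today, using a hypothetical `Connection` class: the initializer expression supplies the default for one constructor and is overwritten by the other. With only the first constructor (as with a single primary constructor), the `= 3` initializer would be dead code.

```dart
class Connection {
  // Non-final, so one constructor may keep this value while another
  // overwrites it.
  int retries = 3;

  // Always overwrites the initializer expression's value.
  Connection(this.retries);

  // Keeps the initializer expression's value.
  Connection.defaults();
}

void main() {
  print(Connection(5).retries); // 5
  print(Connection.defaults().retries); // 3
}
```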

Then execution order becomes:

  • Bind actuals to formals.
  • Evaluate instance variable initializers in the class scope (the initializer-list scope if the class has a primary constructor) and initialize the variables.
  • Execute initializing formals.
  • (If a primary constructor, swapping the two prior steps is unobservable, since they initialize non-overlapping sets of variables.)
  • Execute initializer list, etc.

Pros:

  • Matches desugaring. Effectively makes all instance variables into initializer list entries, also those which don't refer to the constructor parameters.
  • Backwards compatible. There are no existing primary constructors.
  • Migration safety (primary). If you make the (only) existing initializing constructor into a primary constructor, that may introduce an error, but it's for an instance variable initializer whose value is always overwritten. Having to remove it should be considered a feature.
  • Migration safety (initializer). If you move an initializer list initialization onto an instance variable, you may change evaluation order, but nothing else.
  • Evaluation order: All initializer expressions are evaluated in source order, and in the same scope.

Cons:

  • Not sure there are any. Winner!
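For contrast with the option 2 behavior described for the `Counters` class, here is an assumed option 3 desugaring of the same class: every initializer expression becomes an initializer list entry, kept in source order at the start of the list. `MutableVariable` is again a minimal hypothetical stand-in. Because the primary constructor may not also initialize these variables, nothing can observe that they were moved, and the ids come out in declaration order.

```dart
// Minimal hypothetical stand-in for the issue's `MutableVariable<T>`.
class MutableVariable<T> {
  T value;
  final int id;
  MutableVariable(this.value, {required this.id});
}

// Assumed option-3 desugaring of `class Counters(int x) { ... }`: all three
// initializer expressions become initializer list entries, in source order.
class Counters {
  static int _idCtr = 0;
  final int _counterIdsStart;
  final MutableVariable<int> fromX;
  final MutableVariable<int> fromZero;

  Counters(int x)
      : _counterIdsStart = _idCtr,
        fromX = MutableVariable<int>(x, id: _idCtr++),
        fromZero = MutableVariable<int>(0, id: _idCtr++);
}

void main() {
  final c = Counters(7);
  print(c.fromX.id - c._counterIdsStart); // 0
  print(c.fromZero.id - c._counterIdsStart); // 1
}
```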

4: Make evaluation order irrelevant for all constructors.

First change semantics of all initializing constructors so that they do not evaluate initializer expressions of variables that the constructor would overwrite anyway, by an initializing formal or initializer list entry for the same variable.
This is technically breaking, if someone has an initializer expression with a side effect, and the class relies on overwriting that value.
This is very hypothetical, and I don't mind forcing those people to rewrite the code into something actually understandable.
(Alternative is to always make it an error to initialize twice, but that is likely to be more breaking.)

Then execution order becomes:

  • Bind actuals to formals.
  • Evaluate instance variable initializers of variables not initialized by this constructor, and initialize those variables.
  • Execute initializing formals.
  • (Swapping the two prior steps is unobservable, since they initialize non-overlapping sets of variables.)
  • Execute initializer list, etc.

This preserves the resulting instance variable values of an existing class, even if initializing formals are executed before initializer expressions, because it doesn't evaluate initializer expressions whose values would overwrite the initializing formal's value.

Pros:

  • Avoids unnecessary computation. If an existing class has an initializer expression and also initializes the same field in a constructor, that constructor will waste time evaluating the expression before overwriting its value.
    • (Arguably, the author should just be told that they're doing something wasteful, so they can fix it themselves.)
  • Matches desugaring. Making all existing evaluated initializer expressions into initializer list entries preserves behavior.
  • Migration safety (primary). Making the (only) existing initializing constructor into a primary constructor changes nothing.
  • Migration safety (initializer). If you move an initializer list initialization onto an instance variable, you may change evaluation order, but nothing else.
  • Evaluation order: All initializer expressions are evaluated in source order, and in the same scope.

Cons:

  • Not backwards compatible. Technically a breaking change, even if it's unlikely to ever matter.
  • Makes field initialization per-constructor. Currently a class can use the same code path to run the initializer expressions
    of the class for every initializing constructor. With this change, each initializing constructor may need its own code path
    to run only some of the initializer expressions. (Probably rarely an issue, but the compiler complexity needs to be there
    for the cases where it does matter.)
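The per-constructor cost can be seen in a sketch of how a compiler might lower option 4. The hypothetical `Point` class shows, in a comment, source written under option 4 semantics, followed by an assumed lowering into ordinary constructors: each constructor evaluates only the initializer expressions of the variables it doesn't initialize itself, so the two constructors need different initialization code. The `trace` helper and `log` list are also made up for this illustration.

```dart
// Hypothetical tracing helpers: record which initializer expressions run.
final log = <String>[];

T trace<T>(String label, T value) {
  log.add(label);
  return value;
}

// Source under option 4 semantics (sketch):
//
//   class Point {
//     int x = trace('x initializer', 0);
//     int y = trace('y initializer', 0);
//     Point.fromX(this.x);
//     Point.fromY(int y) : y = y;
//   }
//
// Assumed lowering: each constructor gets its own list of initializer
// expressions, containing only the variables it does not initialize itself.
class Point {
  int x;
  int y;

  // Skips x's initializer expression; evaluates only y's.
  Point.fromX(int x)
      : y = trace('y initializer', 0),
        x = x;

  // Skips y's initializer expression; evaluates only x's.
  Point.fromY(int y)
      : x = trace('x initializer', 0),
        y = y;
}

void main() {
  final p = Point.fromX(5);
  print(log); // [y initializer]
  log.clear();
  final q = Point.fromY(7);
  print(log); // [x initializer]
  print('${p.x},${p.y} ${q.x},${q.y}'); // 5,0 0,7
}
```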

TL;DR: I recommend number 3: disallow double initialization of instance variables in classes with a primary constructor, then specify the same evaluation order as today.
(Aka: It's a compile-time error if a primary constructor initializes an instance variable with an initializing formal or initializer list entry, and that variable's declaration has an initializer expression. For non-primary constructors, that's still only an error if the variable is final.)

Then we can safely desugar instance variable initializers into initializer list entries at the start of the initializer list without changing observable behavior, because executing initializing formals cannot have side effects other than initializing variables.
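The recommended rule can be summarized as a toy predicate. This is a hypothetical model written for illustration, not code from any Dart tool: `FieldDecl` and `doubleInitErrors` are made-up names, and the model only tracks whether a variable is final, whether it has an initializer expression, and whether the constructor initializes it (by an initializing formal or an initializer list entry).

```dart
// Toy description of an instance variable declaration.
class FieldDecl {
  final String name;
  final bool isFinal;
  final bool hasInitializer;
  const FieldDecl(this.name, {this.isFinal = false, this.hasInitializer = false});
}

// Names of variables a constructor would be rejected for initializing.
// Primary constructors: any variable with an initializer expression.
// Other constructors: only final variables with an initializer expression.
List<String> doubleInitErrors(
  List<FieldDecl> fields,
  Set<String> initializedByConstructor, {
  required bool isPrimary,
}) {
  return [
    for (final f in fields)
      if (f.hasInitializer &&
          initializedByConstructor.contains(f.name) &&
          (isPrimary || f.isFinal))
        f.name,
  ];
}

void main() {
  const fields = [
    FieldDecl('a', hasInitializer: true),
    FieldDecl('b', isFinal: true, hasInitializer: true),
    FieldDecl('c'),
  ];
  final initialized = {'a', 'b', 'c'};
  print(doubleInitErrors(fields, initialized, isPrimary: false)); // [b]
  print(doubleInitErrors(fields, initialized, isPrimary: true)); // [a, b]
}
```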
