Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request: init/deinit annotations for containers of non-nilables #14273

Open
BryantLam opened this issue Oct 11, 2019 · 8 comments
Open

Request: init/deinit annotations for containers of non-nilables #14273

BryantLam opened this issue Oct 11, 2019 · 8 comments

Comments

@BryantLam
Copy link

BryantLam commented Oct 11, 2019

Related #14271 #14104 #8792
Feature request. Improvement. (This is more of a brainstorm than a concrete proposal.)

Provide a means to annotate locations of runtime-initializing a container of non-nilable types that the compiler cannot prove is initialized at compile-time.

Today, this code works when it arguably shouldn't. The compiler ignores the non-nilability of the element type.

var x: [0..999] owned MyClass;

// Initialize `x`, possibly in some complicated way
// that may depend on other arrays of non-nilables.
x[0] = new owned MyClass();
x[1] = new owned MyClass();
x[2] = new owned MyClass();
// ... Whoops. User should have fully initialized `x`.

for i in x.domain {
  writeln((i, x[i].borrow())); // halt at x[3] - argument to ! is nil
}

I propose that this idiom could continue working in a prototype module.

I propose that in a normal, production module, the code needs more annotations because it becomes difficult to reason about where something is broken.

var x: [0..999] owned MyClass = noinit;
var y: [0..999] owned MyClass = noinit;

// At this point, `x` and `y` are `noinit`. Using these arrays is illegal.

// We introduce the `init` block. This annotation tells the compiler to
// trust that the user has fully initialize `x` and `y` with non-nilable
// element types.
init(x, y) {
  // `x` and `y` could have an intermixed, complicated init order
  x[0] = new owned MyClass();
  x[1] = new owned MyClass();
  x[2] = new owned MyClass();
  ...
  y[0] = new owned MyClass();
  y[1] ...
}

// This code will halt or be UB if user doesn't fully initialize `x`.
for i in x.domain {
  writeln((i, x[i].borrow()));
}

// Modify `x` in some way, but we can't use assignment anymore because
// it's ambiguous with initialization and the compiler needs a way to
// detect between initialization and assignment. So we swap.
{
  var newElement = new owned MyClass();
  x[0] <=> newElement;
}

// The `deinit` block is implicitly called at end of scope,
// similar to a `defer` block.
deinit(x, y) {
  // deinit also needed because we might need to deinit the container in a
  // particular way that requires setting some elements to `nil` while we're
  // processing the deinit.
  x[100] = nil;
  y[1] = nil;
  ...
}
// at end of deinit block:
// (a) deinit the rest, or
// (b) do nothing and possibly leak

The overall goal is to allow the compiler to do more compile-time analysis of the validity of containers with non-nilable elements. If the user forgets to init the array x in its entirety, the compiler could treat it as undefined behavior. If the code breaks, there's fewer places to look for why it broke (namely, the init blocks and possibly the deinit blocks).

@mppf
Copy link
Member

mppf commented Oct 14, 2019

I would expect that in the near term, var x: [0..999] owned MyClass; will become an error, and you'd have to write var x: [0..999] owned MyClass = f();.

@BryantLam
Copy link
Author

BryantLam commented Oct 14, 2019

That's a reasonable approach for initializing one array, though one case I'm interested in resolving is when you have to initialize more than one non-nilable array at the same time in some dependent way. It isn't obvious how to do that with a function return value.

One workaround today is to create them as nilable arrays, do the initialization/assignment, and then copy them into non-nilable arrays (2x storage overhead) with e.g.,
var x_: [0..999] owned MyClass = for x_i in x do (x_i :owned MyClass);.

@mppf
Copy link
Member

mppf commented Oct 14, 2019

@BryantLam - can you show a better motivating example with two arrays that need to be initialized in some dependent way?

One workaround today is to create them as nilable arrays, do the initialization/assignment, and then copy them into non-nilable arrays (2x storage overhead)

The 2x storage overhead would only need to be during initialization, right?

@BryantLam
Copy link
Author

BryantLam commented Oct 14, 2019

The 2x storage overhead would only need to be during initialization, right?

Correct. This could be resolved by the expiring-values design #13704, but is still a concern if one array is approx. the size of a compute node's memory capacity. The second array couldn't be allocated unless #13704 was able to elide the non-nilable array's allocation with a move from the nilable array (this doesn't sound trivial to me even with #13704, but maybe it is).

@BryantLam
Copy link
Author

BryantLam commented Oct 14, 2019

Here's a contrived example that I came up with that seems relevant for compilers/interpreters:

class Token {
    var firstName: string;
    var lastName: string;
}

class FirstName {
    var name: string;
}

class LastName {
    var name: string;
}

class Database {
    var D;
    var firstNames: [D] owned FirstName;
    var lastNames: [D] owned LastName;

    proc init(in tokens: [?D] owned Token) {
        this.D = D;
        this.complete();

        for (token, idx) in zip(tokens, tokens.domain) {
            firstNames[idx] = new owned FirstName(token.firstName);
            lastNames[idx] = new owned LastName(token.lastName);
        }
    }
}

proc main() {
    var tokens = [ new Token("John", "Smith"), new Token("Bryant", "Lam") ];
    var database = new Database(tokens);
    writeln(database);
}

Despite the identifier names that I chose, it effectively does an array-of-structs to struct-of-arrays conversion.

@BryantLam
Copy link
Author

BryantLam commented Oct 17, 2019

Here's the closest Rust implementation.

It's quite ugly compared to the Chapel code. This Rust code uses the `generic-array` crate for a statically sized array because Rust doesn't have stable support for generic constants. A similar implementation using `std::vec::Vec` is *significantly* easier. For what it's worth, this code demonstrates a non-trivial use of `generic-array` traits.

In Chapel, a non-nilable class doesn't have a default value. One way that Rust handles the code above is to implement the Default trait on a custom type to give it a default value. The default value is then used in the initializing expression (... = Default::default()).

It's not always ideal if the arrays are large because all the elements are initialized with a default value only to be immediately replaced when the user starts initializing with their own values. Nevertheless, this is safer and doesn't use 2x memory. For performance, there's the unsafe MaybeUninit trait that would be notionally equivalent to Chapel's noinit #8792.

@BryantLam
Copy link
Author

Cross-linking #14362 (comment)

@mppf
Copy link
Member

mppf commented Aug 18, 2020

I've made #16250 to discuss pros/cons for different approaches to this general problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants