Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement noinit for arrays #15703

Open
bradcray opened this issue May 19, 2020 · 9 comments
Open

Implement noinit for arrays #15703

bradcray opened this issue May 19, 2020 · 9 comments

Comments

@bradcray
Copy link
Member

bradcray commented May 19, 2020

As a Chapel programmer who is sometimes more concerned with performance than safety, I'd like the ability to declare my arrays to be noinit to avoid the overhead required to initialize them in cases when I know that I will safely write to them before reading from them (this is arguably a specialized instance of issue #8792, particularly as arrays become less and less special).

For example, I'd like to be able to do:

var D = {1..10};
var A: [D] real = noinit;  // A's elements won't be initialized here
forall a in A do
  a = 0.0;
D = {1..20};  // the new elements, A[11..20], would not be initialized here
forall i in 11.20 do
  a = i / 10.0;
@ronawho
Copy link
Contributor

ronawho commented May 19, 2020

This is probably mentioned in some other thread, but note that in configurations where we do dynamic registration (ugni) we'll still want noinit to at least fault in the memory to get good numa affinity. Today we do:

  • allocate memory from the comm layer
  • have the module code to do parallel init (first touch)
  • register memory in the comm layer

If we don't do any init we'll end up with memory registration faulting in memory, which leaves everything on numa domain 0. So we'll probably want to touch the first byte in each page to fault in still.

edit: yeah, Greg mentioned this in #8792 (comment)

@mppf
Copy link
Member

mppf commented Jul 12, 2020

For example, I'd like to be able to do:

var D = {1..10};
var A: [D] real = noinit;  // A's elements won't be initialized here
forall a in A do
  a = 0.0;

This part makes sense to me.

D = {1..20};  // the new elements, A[11..20], would not be initialized here
forall i in 11.20 do
  a = i / 10.0;

I am not so sure about this part. It would mean that the array knows (at runtime) that it was initialized with noinit and then the D = {1..20} command would need to be aware of that. What is worrying to me about this is that the noinit at the array declaration seems to have more to do with how it is initialized than how it should behave arbitrarily long into the future. What if there is a whole lot of code / time passing between the array declaration and the domain assignment?

The second concern that I have with it is that generally speaking we would like to know at compile time whether or not something is default initialized. But, I wouldn't want to store that information in the array type - because arrays should generally behave the same / be passable to the same functions etc - regardless of how they were initialized. As a result, the way the array would know at runtime that it was initialized with noinit would be some sort of non-param field in the array implementation. But then, in the code supporting D = {1..20} using this field, it would not know at compile-time whether or not the new elements should be default-initialized.

Should code doing this kind of thing use a different type (like list)? Or, could DefaultRectangular arrays support a function to resize the array and its domain at the same time? Such a function could just have an argument about whether to initialize the elements. E.g.

var D = {1..10};
var A: [D] real = noinit;  // A's elements won't be initialized here
A.resizeArrayAndDomain(D, {1..20}, initNewElts=false);


// A.resizeArrayAndDomain might be declared like this
// MyDomain must match (pointer-identity wise) the array's domain
proc _array.resizeArrayAndDomain(ref MyDomain, NewDomain, param initNewElts=true)

@mppf
Copy link
Member

mppf commented Jul 12, 2020

Copied from #15673 (comment) -

  • When no-initing an array of non-POD records, the array "noinit" function will not take any action on the individual elements; it'll just allocate memory and leave it uninitialized. The user will call an explicit, low level "move initialize" function, to move a record into the noinited space for an element. If it is known to an array of POD types, the user can use = to initialize elements.

The reason that I am proposing this direction is that, without it, the noinit support cannot help to solve issue #15673. In particular, a data structure author might want to use a noinited array to implement part of the data structure, but the data structure author would probably not want using noinit to also add the requirement that the elements of the data structure support noinit. For the array case specifically, the noinit patterns come up in situations of move initialization, and we wouldn't want move initialization to fail in some way for arrays of records that don't support noinit. (Where the ways it might fail would be compilation errors "this type doesn't support noinit" or default initialize elements).

Note that the direction of leaving the allocated array memory totally uninitialized leads us to having 2 different things:

  1. variables declared like var x: SomeType = noinit; - these are opting in to "less initialization"
  2. storage that is entirely uninitialized (such as array elements after the array is no-inited).

Now, I would expect that how one operates on these would differ in general. For variables declared with noinit, the operations supported would need to be described by the type author. In particular, something like var x: MyType = noinit; x = y; may or may not be allowed.

In contrast, for storage that is uninitialized, I am proposing that the user explicitly use low-level moveInitialize functions that could always be used to initialize such memory. For POD types, it should also be possible to use = to initialize the memory (which is what the example in the issue description does, since int is POD).

Another way to put it is that a pattern like var x: SomeType = noinit; moveInitialize(x, something); would lead to a memory leak. This pattern confuses two different things - uninitialized memory and noinited memory.

@mppf
Copy link
Member

mppf commented Jul 12, 2020

To address #8792 (comment) and #15703 (comment) - I am proposing that the array implementation simply have a function that can be called to indicate that initialization is complete.

For example:

{  var A:[1..n] R = noinit;

  for i in 1..n {
    moveInitialize(A[i], new R(i));
  }

  A.elementInitializationComplete();
  // above call does any memory registration that is necessary.
  // Additionally, indicates that now when A is out of scope, it
  // should deinitialize the individual elements.

  // R(1) and R(2) are deinitialized here, then array memory freed
}

If the call is not made, the user is responsible for explicitly deinitializing the elements. For example, in this case, the array is created and used without ever actually initializing all elements.

{
  var A:[1..2] R = noinit;

  moveInitialize(A[2], new R(2));
  explicitDeinit(A[2]);

  // here array memory is freed but no R elements are deinitialized
}

We could consider having 2 different calls (for "register the memory now" and for "mark all elements initialized") but as far as I know, typical usage will initialize the entire array. Anyway the 2 different calls could be added later if needed.

@mppf
Copy link
Member

mppf commented Jul 12, 2020

While I understand that we have been reserving the noinit keyword for this, I don't think it's particularly important that this pattern be short in the code. In particular, we could just have a different function for constructing the array without initializing the elements.

What if, instead of writing:

var D = {1..10};
var A: [D] real = noinit;  // A's elements won't be initialized here

we wrote:

var D = {1..10};
D.buildArray(int, initElts=false);

(This exact pattern happens to compile and run now).

Arguably this would reduce confusion with the difference between a noinited variable and totally uninitialized memory (discussed in #15703 (comment) )

@bradcray
Copy link
Member Author

I am not so sure about this part [having new elements in later array reallocations also be no-inited]

My motivation for wanting this behavior was to have a simple way to avoid spending time initializing array elements when growing an array in cases where I knew that I didn't need the new elements to be initialized and didn't want to spend the time initializing them. Having the noinit become part of the array's ongoing property seemed like a natural way to get it, and the behavior also didn't seem particularly surprising to me. But I could also argue against it pretty easily if motivated to (e.g., "= noinit; seems to be saying something about the initial value of the array at its declaration point, not for all time").

But then, in the code supporting D = {1..20} using this field, it would not know at compile-time whether or not the new elements should be default-initialized.

I'm not seeing why this would need to be known at compile-time. I was imagining the array descriptor would store a non-param field indicating whether or not it was noinit, and then passing that flag along to the various ddata allocation/reallocation routines to say whether or not to default initialize any new elements.

but the data structure author would probably not want using noinit to also add the requirement that the elements of the data structure support noinit.

Could the presence of a noinitializer mean "call it since it's there" and the absence of it mean "there's no need to call it for this data structure (e.g., "I am a POD record and don't need anything special to be done to me to support noinit"?)

I don't think it's particularly important that this pattern be short in the code.

I can see that argument, though there are cases where I'd like to use noinit (like the shootout codes) where I probably wouldn't use the longer form (because I think it's less attractive). We've also had DOE users request noinit from the early days, where I think having a solution that feels natural / integrated into the language seems preferable if it's tractable (e.g., I believe some users proposed things equivalent to the = noinit; syntax without knowing that we were already considering it).

All that said, I'm not opposed to having longer-form / less-sugared ways of creating arrays, and at times have definitely wanted them for other reasons. I'm just not convinced that we should support this form instead of noinit or as the only way of getting to noinit.

Arguably this would reduce confusion with the difference between a noinited variable and totally uninitialized memory

I understand the difference between those two behaviors, but don't understand how the different syntactic form reduces the confusion. In both cases, isn't the user essentially choosing between two behaviors, simply using syntax in one case and a boolean argument in the other?

@mppf
Copy link
Member

mppf commented Jul 15, 2020

@bradcray - thanks for your thoughts.

I understand the difference between those two behaviors, but don't understand how the different syntactic form reduces the confusion. In both cases, isn't the user essentially choosing between two behaviors, simply using syntax in one case and a boolean argument in the other?

That's right. The potential for confusion I was seeing is one of terminology. If noinit and noinited mean It ran the proc noinit() to do less initialization then that might be confused with the English meaning of "no init" which is "not initialized" - i.e. totally uninitialized memory - which is more what the array elements would be (in the current designs I have been exploring, anyway). Since someone using noinit on an array would have to simultaneously be aware of both concepts, I am worried that the term noinit will contribute to confusion in a way that initElements=false would not.

but the data structure author would probably not want using noinit to also add the requirement that the elements of the data structure support noinit.

Could the presence of a noinitializer mean "call it since it's there" and the absence of it mean "there's no need to call it for this data structure (e.g., "I am a POD record and don't need anything special to be done to me to support noinit"?)

It might be the case that noinit needs to be a different feature from creating arrays that are totally uninitialized to solve issue #15673. However I am inclined to think that issue #15673 is a good motivating example for noinit. Anyway, for move initializing one array from another, we need the new array to allocate space for the elements but not initialize them at all - because the elements will be move initialized from the old array. So in that case I don't see how calling proc noinit() would help.

More broadly though, calling proc noinit() on the array elements for a noinit array is something that has implications for how the elements are set after the array is noinited. In the totally uninitialized array elements design, these elements are set by something like the call to moveInitialize shown in previous comments. Here the LHS is entirely replaced. In contrast, in a design where the array elements themselves are noinited, we would expect to set them with a call to =, which generally speaking will both read and write fields in the LHS.

Now, if for certain types, we call proc noinit() on the elements and use = to initialize; but for other types we call moveInitialize; doesn't that make it hard to write generic code? (Yes, we could make moveInitialize call = for types that provide proc noinit() - but that seems to me to be really complicating the behavior and I don't understand what we would gain from it. #8792 (comment) explores this kind of question a bit more)

@mppf
Copy link
Member

mppf commented Jul 30, 2020

PR #16064 makes some progress here by enabling no-init for arrays of non-POD only.

This is probably not enough as we have users requesting a way to more easily initialize an array of non-nilable class values. I personally think that supporting owned C as well as any other non-POD record (or arrays of arrays) is not particularly hard if we use the moveInitialize approach.

mppf added a commit that referenced this issue Jul 30, 2020
Adjust domain map implementations to no longer require pragma

For issue #15673.
Also for issues #15703 and #8792.

This PR:
 * adjusts the domain map interface to include
   `dsiElementDeinitializationComplete`
 * removes `pragma "no auto destroy"` from domain map implementations
 * adds a user-facing capability to `noinit` arrays of POD elements

The goal of this PR is to resolve issue #15673 by adjusting the domain
map implementations to no longer require the use of a `pragma "no auto
destroy"`. Originally I was expecting to do that with a `noinit`
syntactic construction but it turns out that `noinit` is not strictly
necessary in this case since the `buildArray(..., initElts=false)`
pattern already covers it. Either way, the array in question has memory
allocated for elements but the elements are totally uninitialized.

In particular, to remove the need for `pragma "no auto destroy"`, this PR
adds a `deinitElts:bool` field to DefaultRectangular to indicate if the
elements should be deinitialized when the array is destroyed. It adjusts
`dsiElementInitializationComplete` to set `deinitElts` to `true`. It adds
`dsiElementDeinitializationComplete` to set `deinitElts` to `false` (and
also allow similar behavior for other array types). It adjusts
`dsiDestroyArr` for to check `this.deinitElts`.

Even though `noinit` is not used in the distribution implementations,
this PR implements noinit for arrays of POD elements. For example,

``` chapel
var A:[1..10] int = noinit; // allocates space for elements but does not initialize them
for i in 1..10 {
  A[i] = i; // `=` is sufficient for initializing trivially copyable types
}
```

Users should also indicate when (if ever) the array's elements are
completely initialized so that the memory can be registered to support
communication. That is the subject of issue #16173.

In order to easily test no-init of arrays, this PR adds a flag
`--allow-noinit-array-not-pod` that will allow noinit of arrays of
non-POD elements. It uses this in several tests along with a version of a
low-level move module added to the test system. Issue #16172 discusses
the design of a user-facing low-level move module.

Reviewed by @vasslitvinov - thanks!

- [x] full local futures testing
- [x] primers pass with verify/valgrind and do not leak
@mppf
Copy link
Member

mppf commented Oct 1, 2020

// the new elements, A[11..20], would not be initialized here

PR #16525 adds a future for this case which does not work as this issue requests yet. I imagine it won't be too hard to adjust; the array resizing code can check to see if the array has been initialized yet.

mppf added a commit that referenced this issue Oct 1, 2020
Add noinit technote

Follow-up to PR #16064.

The 1.23 release includes a very basic and restricted version of `noinit`
because we were able to agree on the behavior of that case.

The `noinit` feature is discussed in #15703 and #8792. The future work is
discussed in #16172, #16173.

Reviewed by @lydia-duncan - thanks!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants