Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected behavior of task intents with distributed forall loops #19864

Open
twesterhout opened this issue May 23, 2022 · 7 comments
Open

Unexpected behavior of task intents with distributed forall loops #19864

twesterhout opened this issue May 23, 2022 · 7 comments

Comments

@twesterhout
Copy link
Contributor

Consider the following example:

use CyclicDist;

record R {
  var payload : int;

  proc init=(const ref other : R) {
    if other.locale == here then
      writeln("copy constructor called with a local object");
    else
      writeln("copy constructor called with a remote object");
    this.payload = other.payload;
  }
}

proc main() {
  const D = {0 ..# 4} dmapped Cyclic(startIdx=0);
  var xs : [D] int = [1, 2, 3, 4];
  var r = new R(123);

  writeln("Test #1");
  forall x in xs with (in r) {
    assert(r.payload == 123);
  }

  writeln("Test #2");
  forall x in xs {
    var localR = r;
    assert(localR.payload == 123);
  }
}

The idea here is to emulate the case when the user wants to implement their custom marshaling of objects. I expected tests one and two to behave very similarly (modulo the actual number of calls to the copy constructor), but they do not.
In the first test, the second branch in the constructor is never taken. The copy constructor is always invoked with a local object.
In the second test, however, the behavior is trivial: depending on the number of locales on which you run the test, up to 3 calls to the copy constructor will be made with remote data.

@bradcray provided some explanation on Gitter:

Without looking, my guess is that the copy being done for the task is happening locally and then being copied to a remote locale as a separate step. Most parallel iterators in Chapel have a structure of:

coforall loc in Locales do
  on loc do
    coforall tid in 0..<here.maxTaskPar do

So in a straightforward implementation, the in intent on the forall will translate into an in intent on the two coforall loops—but notably not the on-clause because (currently, at least) on-clauses do not support task intents (though this example suggests that perhaps they should).
Now, what I’m less certain about is why the inner coforall would think that the record it was copying was local—for example, does something else copy it across the on-clause so that it is local? Or is it copying the original record and mistaking it as local? Of these two, I’d guess the former. Unfortunately, the two people on the team who’d have the best chance of knowing this off the tops of their head are both out today.

@bradcray
Copy link
Member

@vasslitvinov / @mppf: My sense is that the two of you might have the most immediate grasp on what is going on here, though both of you are out today.

@mppf
Copy link
Member

mppf commented May 26, 2022

See also #15675 #15676. Our current expectation is that records can be moved across locales without notifying the record implementation in any way. (Unlike in C++, Chapel records can be moved without calling a move constructor or anything like that, and this happens all the time with the code generated by Chapel, such as when returning a record). In the future, however, there would be some way for a record to opt in to taking some action (like serialize/deserialize) in when moving/copying across locales. So, from that standpoint, this isn't necessarily a bug.

Note that we currently have chpl__serialize and chpl__deserialize methods that a type can implement to do custom marshalling / unmarshalling. For example here are the implementations of these for the string type:

proc chpl__serialize() {
var data : chpl__inPlaceBuffer;
if buffLen <= CHPL_SHORT_STRING_SIZE {
chpl_string_comm_get(chpl__getInPlaceBufferDataForWrite(data), locale_id, buff, buffLen);
}
return new __serializeHelper(buffLen, buff, buffSize, locale_id, data,
cachedNumCodepoints);
}
proc type chpl__deserialize(data) {
if data.locale_id != chpl_nodeID {
if data.buffLen <= CHPL_SHORT_STRING_SIZE {
return chpl_createStringWithNewBufferNV(
chpl__getInPlaceBufferData(data.shortData),
data.buffLen,
data.size,
data.cachedNumCodepoints);
} else {
var localBuff = bufferCopyRemote(data.locale_id, data.buff, data.buffLen);
return chpl_createStringWithOwnedBufferNV(localBuff,
data.buffLen,
data.size,
data.cachedNumCodepoints);
}
} else {
return chpl_createStringWithBorrowedBufferNV(data.buff,
data.buffLen,
data.size,
data.cachedNumCodepoints);
}
}

So far these are undocumented but I think that we are hoping to build upon the approach of them to resolve #15675 and #15676 (by making something similar that is user facing).

A different question is - in this particular case, does the compiler need to generate a "move" across locales before running the provided init=? In other words, how did r end up on the other locale, and if it was going to copy it anyway, why didn't it run init=? I think the way this behaves today is likely related to optimizations we have for arrays. But it might be possible for the user-defined records to do a full copy to the remote locale in this situation.

I.e., judging from your description of what you are seeing (I haven't run anything yet), for

  forall x in xs with (in r) {
    assert(r.payload == 123);
  }

the in intent seems to be behaving as:

  • move-initialize something on the target locale from the original record
  • copy-initialize to a per-task variable from that

But, it could be doing this instead:

  • copy-initialize something on the target locale from the original record
  • copy-initialize to a per-task variable from that

Arguably, it is not valid for the generated code to do the "move" above, because the original r still exists. So, in that way, I do think this is a bug.

@bradcray
Copy link
Member

@twesterhout : I always forget about chpl__[de]serialize. It would definitely be interesting to implement those for your record to see whether that would be a way to make progress with this pattern.

@twesterhout
Copy link
Contributor Author

@mppf & @bradcray sorry for the delay! Your explanation, Michael, was really helpful, but I'm still a bit confused about the semantics of chpl__[de]serialize. I've tried implementing these functions for my record, but the compiler seems to ignore them...

The following is a copy-paste of my actual code. Don't worry about the specific functions, the purpose is to illustrate the general idea:

    proc chpl__serialize() {
      logDebug("Calling chpl__serialize(" + this.locale:string + ") ...");
      return (this.locale.id, payload:c_void_ptr);
    }

    proc type chpl__deserialize(data) {
      const (loc, payload) = data;
      logDebug("Calling chpl__deserialize(" + data:string + ") ...");
      var json_string : string;
      on Locales[loc] {
        logDebug("To JSON ...");
        const c_str = ls_hs_basis_to_json(payload:c_ptr(ls_hs_basis));
        defer ls_hs_destroy_string(c_str);
        json_string = c_str:string;
      }
      return new Basis(json_string);
    }

What I'm trying to do is to persuade Chapel to use my custom ls_hs_basis_to_json and Basis.init (taking a JSON string) as functions for marshaling. It seems to work if I do:

var basis : Basis = ...;
forall x in <some distributed array> {
  const localBasis = basis;
  // use localBasis
}

Messages such as "Calling chpl__serialize" are displayed and the program behaves as expected. However, if I do

var basis : Basis = ...;
forall x in <some distributed array> with (in basis) {
  // use basis
}

chpl__[de]serialize functions are not invoked at all. Do you have an idea of why this is happening and how I should restructure my code to obtain the desired behavior?

@e-kayrakli
Copy link
Contributor

Does const in make a difference in the intent case above?

Remote value forwarding (RVF, the optimization that should fire serialize/deserialize) needs to know the thing that its forwarding is constant. There's no such guarantees with in intent. However, one can argue that we can expand RVF to handle non-const things as long as they come from an in-intent shadow variable.

@twesterhout
Copy link
Contributor Author

No, const in does not change the behavior. chpl__[de]serialize functions are still not invoked :(

@e-kayrakli
Copy link
Contributor

This still looks like an RVF limitation to me, and potentially an arbitrary one. But sometimes I'm too much of an optimist. I'm interested in looking into it, but can't promise much for next week :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants