Support "auto-read" on objects? #18872

Open
bradcray opened this issue Dec 15, 2021 · 6 comments

Comments

@bradcray
Member

bradcray commented Dec 15, 2021

This issue asks whether there should be a way for an object to say "When I'm used in a value/read context, yield a value other than myself."

Background / Motivation

I'm spinning this issue off of #17999 (comment) where I reported on a team's ability to express some nontrivial data access patterns in Chapel by having an object that served as an intermediary for the real data value. As background and motivation, imagine creating an array implementation in Chapel that is stored on disk, or in some other medium where implementing the array's accessor function (proc dsiAccess(indices: ...) ref) is complicated by the inability to return a Chapel reference since the element isn't "in memory".

The approach being taken by this team is to:

  • create a Chapel object that acts as a stand-in for the unavailable memory location
  • return such an object through dsiAccess() instead of the element itself
  • do all such returns through non-ref versions of dsiAccess() since a newly-created local class/record can't be returned by reference

This is generally working in the current prototype, but has the downside that if the object returned by dsiAccess() is in a RHS / read context, a method needs to be called on it explicitly in order to read the value. In the write context (the focus of #17999), the = operator can be overloaded for the object to make those cases work transparently. In the read context, the current prototype uses a 0-argument this() method to minimize the syntactic overhead that needs to be added. So for example, they can write things like:

myArrayOnDisk[i] = ...;  // no problem, calls `=` overload with object as LHS expression
... = myArrayOnDisk[i];  // problematic since it returns an object rather than an `eltType`
... = myArrayOnDisk[i]();  // the workaround: call a 0-argument `this()` method

This makes the prototype work but is obviously problematic in that (a) it makes uses of this array type different than any other which means that (b) existing generic array code can't be applied to these arrays.
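The pattern described above can be sketched in today's Chapel roughly as follows. This is a minimal illustration, not the team's actual code: `ElemProxy`, `ArrayOnDisk`, and the in-memory `backing` array that stands in for the disk are all hypothetical names, and the accessor is a plain `this(i)` method rather than a real `dsiAccess()` in a domain map. The write goes through a named variable so that the `=` overload receives an ordinary lvalue.

```chapel
// Simulated "disk": a plain in-memory array (assumption for this sketch).
var backing: [0..9] int;

// Hypothetical stand-in object for one element that isn't addressable.
record ElemProxy {
  var idx: int;

  // Read workaround: 0-argument `this()` returns the element's value.
  proc this(): int {
    return backing[idx];
  }
}

// Write context: `=` overloaded with the proxy as the LHS.
operator =(ref lhs: ElemProxy, rhs: int) {
  backing[lhs.idx] = rhs;
}

// Hypothetical array type whose accessor returns a proxy by value.
record ArrayOnDisk {
  proc this(i: int) {
    return new ElemProxy(i);
  }
}

var myArrayOnDisk: ArrayOnDisk;

var elem = myArrayOnDisk[3];   // an ElemProxy, not an int
elem = 42;                     // routed through the `=` overload
var x = myArrayOnDisk[3]();    // explicit 0-argument call needed to read
writeln(x);
```

The trailing `()` on the last read is exactly the syntactic difference that keeps generic array code from working with such a type.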

This leads to the question in the title and top of the OP: Should there be a way for an object to say "When I'm referenced in a value/read context, call this method / take this action rather than referring to me directly?"

Possible Solutions and Other Contexts

Paren-less this method?

My first "so crazy it just might work" idea here was to permit objects to support paren-less variants of the this method that would indicate exactly this. E.g., if my record were:

record R {
  var wrappedVal: t;
}

then I could write:

proc R.this { return wrappedVal; }

which would cause ... = myR; to essentially become ... = myR.wrappedVal. From an interface perspective this seems attractive in its orthogonality to paren-ful this, though the challenge is when the compiler would apply it.

Utility for Atomics

My next thought was to realize that, when using atomics recently, I've been thinking about our long-term hope to have simpler ways of applying operations to atomics (#16238) as well as potentially supporting direct reads/writes (#16237). Namely, I find myself wanting these all the time, and particularly recently. This made me wonder how hard they would be to implement, where I mostly think "quite easy" except for the read case, which feels like another instance of this pattern: When I have an atomic in a read situation, I simply want it to evaluate to its value, through its .read() method. So maybe a solution to this issue would help move that forward as well.

Wait, what about Syncs?

Next, I realized that, up until very recently, we've supported direct reads of syncs, which are similarly implemented under the hood using objects, which made me wonder whether we could leverage that approach here as well. That said, while it worked, that approach always felt a bit clunky and heavyweight (and wasn't designed to be user-facing). Specifically, IIRC, we wrapped every "read" expression with a routine that became a no-op for all types other than syncs(?) (right? Typing that, it sounds so ridiculously heavyweight that I find myself doubting that I'm remembering correctly...).

User-defined coercions?

So next, I started thinking about cleaner ways of addressing this issue without relying on the sync trick and realized that the big challenge in the paren-less this case seems to be "When does one apply this?" E.g., "try to resolve a function, and if you fail, see whether any of the arguments have paren-less this methods, and if they do, see whether that makes things resolve better"? And once I thought of it that way, it made me realize that this is effectively exactly what the compiler's coercion logic does, and that perhaps user-defined coercions are the way of expressing this pattern (#5054).

I know there's been reluctance around opening the door to user-defined coercions in the past because of its potential for abuse, but patterns like this seem to necessitate it—or something so similar to it that it's indistinguishable—making me think we should reconsider that reluctance.

Other ideas?

That's where my current thinking is, but I'm curious about other ideas as well, obviously. This is quickly becoming the main barrier to proceeding with this work.

Other use cases?

  • At various times in the project's history, we've believed it would be somewhere between useful and reassuring to consider implementing basic scalar types like int as a Chapel record rather than a primitive type. For example, imagine:
record int {
  param bits: int = 64;  // must be 8, 16, 32, 64 to match current behavior
  var val: bitWidthToCIntType(bits);
}

Such an approach would also require the ability to "read" the object in order to get at the underlying C value.

@bradcray
Member Author

Tagging @mppf and @aconsroe-hpe on this because of their involvement on #17999, @mppf because of his prior work on user-defined coercions, and @ronawho on it because of his interest (I believe) in direct reads-from/writes-to atomics.

@mppf
Member

mppf commented Dec 15, 2021

In my opinion, the main challenge here is, as you said, deciding when to do the "read".

If we have a type (call it wrapper) that can turn into another type (let's just say it is int for now), then we have to be able to, as programmers, predict where the wrapper type will turn into the other type. Just some examples to get the mind going:

  // suppose myWrapper has type wrapper
  var a = myWrapper; // should a have type `wrapper` or `int`?

  var b: int = myWrapper; // "implicit conversion" to int?
  
  proc f(arg: int) { }
  f(myWrapper); // "implicit conversion" to int?
  
  ... are there other interesting cases here?

Other than the var a = myWrapper case, I think all of these fit neatly into a user-defined implicit conversions story. I think the var a = myWrapper case is interesting to consider as well and is related to issue #14213.
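For comparison, here is a rough sketch of how these cases look today using only an explicit, user-defined cast. It assumes a trivial `wrapper` record with a single `val` field and a made-up body for `f`, neither of which comes from the discussion. It is not a proposal for the implicit form, only a baseline showing where the explicit `: int` currently has to appear.

```chapel
record wrapper {
  var val: int;
}

// Explicit cast from `wrapper` to `int` via the `:` operator.
operator :(from: wrapper, type t: int) {
  return from.val;
}

proc f(arg: int) {
  return arg + 1;
}

var myWrapper = new wrapper(41);

var a = myWrapper;             // `a` has type `wrapper`
var b: int = myWrapper: int;   // explicit cast in place of an implicit conversion
var c = f(myWrapper: int);     // likewise at the call site
writeln(a.type:string, " ", b, " ", c);
```

An implicit-conversion feature would, in effect, let the two `: int` casts be dropped, while still leaving the `var a = myWrapper` case to be decided.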

And once I thought of it that way, it made me realize that this is effectively exactly what the compiler's coercion logic does, and that perhaps user-defined coercions are the way of expressing this pattern (#5054).

That is my view.

I know there's been reluctance around opening the door to user-defined coercions in the past because of its potential for abuse, but patterns like this seem to necessitate it—or something so similar to it that it's indistinguishable—making me think we should reconsider that reluctance.

IMO even those reluctant to add them acknowledge that some patterns are impossible to implement without them.

That's where my current thinking is, but I'm curious about other ideas as well, obviously

I'm curious if you have any reaction to the later comments in #17999 -- starting from #17999 (comment) . But that is probably for that issue.

@aconsroe-hpe
Contributor

I'm intrigued and while I do find this related to implicit conversions because you are changing one type to another, I think I find it more related to the ideas of a lazy value or a future especially because we've pointed out the temporal and side-effecting aspect of this all.

I like the paren-less this because it is limited to only returning a single type (possibly more than one intent but only a single type (I think)). This makes the magic not too magic which is where a lot of the implicit conversion hesitation comes from.

I can imagine two different implementations for the arrayOnDisk example. I think they would both be possible with the paren-less this, but I'm just trying to wrap my head around what it would be like to work with this. Will you get bitten by accidentally reading the array from disk/network a bunch of times? For an array, it seems you would want Impl 1 so that you only read once; but for atomics you'd want something more like Impl 2. Do you need error handling if your file system is down (would this need to throw)?

/* Impl 1 */
record ArrayOnDiskView {
  var filename;
  var bounds;
  var arr;
  var fetched = false;

  proc this {
    if fetched {
      return arr;
    } else {
      arr = readFromDisk(filename, bounds);
      fetched = true;
      return arr;
    }
  }
}

/* Impl 2 */
record ArrayOnDiskView {
  var filename;
  var bounds;

  proc this {
    return readFromDisk(filename, bounds);
  }
}

I definitely see the motivation in a generic programming context.

Would there be a way to explicitly call the paren-less this if you end up in a situation where you're passing this thing to an overloaded function?

proc foo(x:ArrayOnDiskView) ...
proc foo(x:[])...

foo(myArrayOnDiskView) // ambiguous? can I force one or the other?

@mppf
Member

mppf commented Dec 16, 2021

proc foo(x:ArrayOnDiskView) ...
proc foo(x:[])...

foo(myArrayOnDiskView) // ambiguous? can I force one or the other?

If we view it as an implicit conversion, the proc foo(x:ArrayOnDiskView) is preferred because it needs fewer conversions.

@bradcray
Member Author

Would there be a way to explicitly call the paren-less this if you end up in a situation where you're passing this thing to an overloaded function?

I think you could say myArrayOnDiskView.this.

That said, though it's cute, I think my main hesitation about taking the paren-less this approach is that it would prevent overloads of paren-ful this accessors. So I might not get caught up on the precise name / signature right now, and instead view this approach as "maybe there's a well-defined method that, if you support it, enables these conversions." When I think about other signatures, I also wonder whether we'd want its signature to match that of : for symmetry, since it's a similar operation, simply implicit rather than explicit. I keep trying to think of a variant of : as the operator name that would indicate the "automatic" / "implicit" aspect of the conversion. E.g., operator :: or :? or <: or ...

because it is limited to only returning a single type

That's an interesting point that makes me worry that it would be too limiting (so potentially another reason to avoid paren-less this as an approach). E.g., real(32) can coerce to either real(64) or complex(64) (two real(32) values, one real, one imaginary).

@mppf
Member

mppf commented Dec 16, 2021

I might not get caught up on the precise name / signature right now, and instead view this approach as "maybe there's a well-defined method that, if you support it, enables these conversions."

The last effort I made towards language design of implicit conversions was #16729. Some of these questions we have already discussed there (and to some extent in #5054 -- #5054 (comment) being my favorite comment on that issue :) ).

Anyway I am still happy with the proposal in #16729 -- namely proc R.canImplicitlyConvertTo(type t) param which returns true if this can convert into t and false otherwise. Note that this can support generic type arguments passed to t and then allow the cast implementation to instantiate a version of the generic type. This conversion + instantiation case is tricky but IMO important (e.g. if you have a type that can implicitly convert to an array, it should work when passing to an argument like arg: [ ] which is generic -- this came up in one of our original motivating examples for implicit conversions -- Matrix -> 2D array).


3 participants