-
Notifications
You must be signed in to change notification settings - Fork 414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal for modestly redefining .indices
on arrays
#17883
Comments
.indices
on arrays.indices
on arrays
@gbtitus , @redhatturtle, @ben-albrecht , @e-kayrakli , @ronawho: You were all negative to skeptical to neutral on the proposal in #15103. Would you let me know whether this one resonates more positively with you? |
My objection in #15103 was entirely about getting rid of |
I think I am on the fence for this proposal. Whereas, like Greg, I was/am against retiring My main worry is around performance, which may not be the best criterion for deciding on interface matters. Brad already listed this as a disadvantage. But I'll add that I can add this bullet as another potential performance pitfall:
In general, I like to promote type-neutral ways of writing code, and this seems like a good example for this. But somehow it scares me that a user might write Another potential scenario that scares me is similar: for a single-locale application, the user uses All that being said, from the interface standpoint, I see the appeal of this proposal. One alternative would be to remove |
To Greg's points:
I agree with this, but don't worry about it a lot. Lots of queries aren't necessarily self-defining without reading the docs, and I think we can make the docs clear in this respect.
The downside of that is that it wouldn't be possible to write a generic routine across collections like this: proc printSerially(X) {
for i in X.indices do
write(X[i]);
} and have it work across arrays, lists, strings, tuples, etc. because if |
To Engin's:
I feel like there are a lot of things in Chapel already that can be this sort of multi-locale performance trap, so don't find this particularly compelling. We've talked about adding a
I find myself wanting to query the |
That could help, for sure. Although, I don't know whether we can do anything smart about it, and would worry that we'd have to report anything
I totally agree with this. As a dmap author, it is quite important. I am not so sure what an end user would want. While I am still OK with proceeding with this proposal; we can ask users, too. |
Thinking about this overnight, I'm realizing that in order to change the behavior of So my current thought is to deprecate it for this release for simplicity (maybe referring to these issues in the deprecation warning?). If nothing else, this removes the current redundancy and may cause users who have started using |
See chapel-lang#17883 (comment) for rationale --- Signed-off-by: Brad Chamberlain <bradcray@users.noreply.github.com>
So far, I've only run into one case where |
I'll add that |
That's good to know, thanks Ben. Do you have thoughts about this proposal in general (independent of how we get there?). If |
Your guess is correct. Cray HPO would be unaffected.
Like others, my biggest concern with the previous proposal was with deprecating |
OK, so then popping back to where I was before my comment above: As a proponent of this approach, if I were the only Chapel designer-developer, I'd propose that we update the behavior similar to what we did for .size on ranges, domains, and arrays:
I think the second best option (maybe the only one now?) would be to continue to make Advantages:
Disadvantages:
So then, of those who've weighed in on this, I think the questions are:
Thanks! |
I am in favor of what has been outlined above. It doesn't seem too disruptive to early adopters of indices and offers sufficient benefits. |
Nope. I am OK with this proposal. And prefer it over the status quo. |
My problems with #15103 were all about removing the The benefits of this proposal all seem quite reasonable and worthy. I also agree that while this new definition is useful the differences from The performance pitfall on the other hand worries me considerably less. While I can imagine people making this mistake I think it's reasonably easy to explain why it causes a performance problem (you 'lost' the array distribution so this loop running on a single locale) and having good examples on the documentation can alleviate the occurrence of these problems significantly.
I very much like this idea independently of whatever turns out from the rest of this proposal. |
I've posted asking for comments on this proposal here: https://chapel.discourse.group/t/design-indices-queries-on-arrays-local-representation-of-indices/6730 |
See PR #18274 for a draft of the proposal sketched out in #17883 (comment). In the end, I made the warning for all |
Start process of distinguishing array.indices from array.domain [reviewed by @e-kayrakli] In issue #17883, I proposed redefining `array.indices` to return/yield the indices of an array locally rather than simply being an alias for `array.domain` as it is today. This proposal was generally reviewed favorably and hasn't had any significant detractors. This PR starts taking the step of redefining it by creating a warning for current users of `array.indices`, informing them of the change, and suggesting that they either update to `array.domain` (or a copy thereof) if they want to preserve the current behavior, or to throw the `-sarrayIndicesAlwaysLocal` config param flag at compile-time if they want to opt into the new behavior now. This provides a gentle way to ease people into the new world without breaking their current code (in the event that it depends on getting a reference to the domain in question). It's expected that this change won't actually impact many users because `array.indices` was introduced relatively recently whereas `array.domain` is longstanding. The overall approach taken here was fairly simple: * added a warning to the current array.indices routine * added the config param and added a where clause to the current routine when it's false (the default) * created new overloads of array.indices for dense rectangular vs. irregular cases that are used when it's set to true * updated our internal and package module code, mason, and a few tests to use `array.domain` to preserve the current behavior; in one(?) case (test/types/collections/indices.compopts), threw the config param to opt into the new behavior One thing done here that almost certainly deserves more discussion is that for the irregular case, I used an iterator to yield the indices locally since—in general—if we have a distributed sparse or associative array, there's no guarantee that its domain's indices could be stored in a single local sparse/associative domain. So it seemed more appropriate to use an iterator for this case (where the user could then use that iterator to populate a local sparse/associative domain, or array, or whatever as they wish) rather than to presume a local storage format for the indices. However, I've forked off issue #18353 to discuss this further (or even question the use of a procedure returning a domain in the regular case). A few cases in this PR that perhaps deserve special mention: * the UnitTest.chpl code was one of the few places that was relying on using `.indices` across a multitude of types, but since it's a package module rather than a test, we wouldn't want it to generate a warning; meanwhile, the code itself seemed a bit questionable in that it assumed that if two things were being compared, they'd necessarily have the same indices (which might be true, but it wasn't obvious to me why it would be). So instead of splitting this code to handle arrays and other types distinctly, I switched it to use a zippered loop instead (which could have performance implications, but since it's currently used a serial loop for the comparison anyway, I'm guessing performance isn't the biggest concern here). The one other change I needed to be made is that the code to print out an array when there is an issue needed to be updated to use `.domain`, but could not apply `.domain` to other types, so I switched the string (!) argument that indicates what type is being compared to a param to make it safe. * test/types/collections/arrIndicesAreLoc.chpl: We used to have an `arrIndicesAreDist.chpl` test which make sure that querying `.indices` on an array would return a distributed result. Since this is no longer true, I renamed the test and expanded it to make sure that the indices received back are local to wherever we're running. * test/types/collections/indices.good: Because I've changed `.indices` on sparse domains to yield indices, the output format changes * test/studies/shootout/reverse-complement/bradc/revcomp-dowhile-bug.chpl: Here, I could've changed from `buf.indices` to `buf.domain` to avoid the warning, but the domain is declared just a few statements above the use-case, so it seemed simpler just to use that.
@vasslitvinov - I see in your range spreadsheet and in the 2.0 scheduling issue that we think this is below the bar and so should just get marked unstable. Do you or @ShreyasKhandekar have a task for doing that already? |
I actually think this could be considered stable and potentially even closed. I believe we did end up agreeing on and implementing the proposal in the OP for rectangular domains which led to #18274. There's a bit of a question in #18353 about behavior on sparse and associative, and I'm imagining I left this open due to that. But since they're unstable for 2.0 and that issue has its own issue now, I don't know that there's more to do here for 2.0. |
That'd be great! I just came across #17909, which looks related. If we remove the 2.0 label from this issue, would we close that one? |
I am game stabilizing the current behavior and closing this issue. |
As of PR #15039 or so, we added
.indices
queries to most indexable types as a means of writing code for them that refers to their indices abstractly rather than assuming the indices are, say,1..size
or0..<size
. At that time,.indices
became a near-synonym* for.domain
for arrays. Issue #15103 proposed that we retire.domain
in favor of.indices
on arrays, but was received fairly negatively.[* = a near synyonym? how so? Well, it actually returned a copy of the original domain rather than an alias to the original domain. Note that I didn't realize this when originally filing this issue in June, and only became aware when finalizing #18274 ].
[For historical purposes, this issue captures a fork in the discussion of #15103 that started at this point ]
This issue proposes another way to remove the redundancy: What if, rather than returning the array's actual domain, this query returned (or perhaps yielded in some cases?) the array's index set locally. For example, for an n x n Block-distributed array named
B
,B.domain
would return a (const
) reference toB
's actual Block-distributed domain as it does today, whereasB.indices
would return{1..n, 1..n}
(or{0..<n, 0..<n}
or whatever indicesB.domain
had) as a local default rectangular domain.This would have the benefits of:
.domain
and.indices
do the same thingB
's index set without getting a block-distributed thing back or querying on a per-dimension basis—the only two ways we can do so today I believe.An obvious disadvantage would be:
.indices
vs..domain
are similar, so learning to distinguish them is subtle and could have a significant performance impact when making the wrong choice (e.g.,var A: [B.indices] real;
vs.var A: [B.domain] real;
orforall i in B.indices
vs.forall i in B.domain
).One open question is whether
.indices
would have to return the indices in a closed form (say as a sparse domain for a sparse array or an associative domain for an associative array) or whether it would be reasonable for it to be implemented in a closed form in some cases (say, when the storage required is O(1), as for dense rectangular arrays) but as an iterator in others (say, when it's not, as for sparse / associative arrays). See #18353 for more on this topic.The text was updated successfully, but these errors were encountered: