Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Square bracket expressions can be ambiguous/confusing #21722

Open
bradcray opened this issue Mar 1, 2023 · 6 comments
Open

Square bracket expressions can be ambiguous/confusing #21722

bradcray opened this issue Mar 1, 2023 · 6 comments

Comments

@bradcray
Copy link
Member

bradcray commented Mar 1, 2023

In discussing #21633, one of the interesting cases that came up was the expression [1..n][1..n]. This is not an expression that any of us would be proud to write or consider exemplary Chapel style, but it is a legal expression, and one which has two possible interpretations:

  • [_ in 1..n] [1..n,]: That is, a square-bracket loop expression in which the index isn't needed, so can be omitted along with the in keyword, and where the body of the loop expression is a 1-element array of ranges
  • [1..n,][1..n]: That is, a 1-element array of ranges that is being sliced by the range 1..n (which would only be legal if n was <= 1, presumably, to avoid OOB errors)

At present, the compiler treats this expression more like the former, though its handling of it is inconsistent. A program like:

writeln([1..n][1..n]);

seems to work and behaves like writeln([i in 1..n] [1..n,]);. Yet a program like:

var A = [1..n] [1..n];

generates an error about the language not supporting arrays of arrays today (which I believe is very similar to issue #11039) and the futures:

  • test/arrays/indexing/arrLitIndexing.chpl
  • test/arrays/vass/literal-indexing.chpl

It was noted that if we encouraged/required 1-element array literals to have trailing commas, as implemented in #21337 before backing it out in #21633, this would break the ambiguity since the two cases would need to be written as [1..n][1..n,] and [1..n,][1..n], respectively. But while this is unambiguous, it doesn't seem to make the intention of the expressions particularly clearer. And the other downsides of requiring the trailing comma for one-element arrays (namely, the lack of need for it, and the divergence from what users coming from other languages like Python are likely to be accustomed to) seemed to outweigh the benefits, at least for us to not feel good about sticking with #21337 in the 1.30 timeframe.

Other techniques we discussed for clarifying or helping to resolve this ambiguity, which could be spun off into their own issues with their own champions if desired include:

  • We could require the in keyword for square bracket loops, with or without the index expression like i or _. The main downside to this is approach is that, today at least, array types and square-bracket loop expressions intentionally share the same syntax and parser rules to emphasize the relationship between "forall indices i in 1..n, evaluate this expression" and "forall indices in 1..n, store this element type." So we'd have to decide to either break that symmetry (and implement it) or to require changes to cases like var A: [1..n] int;.
  • We could have the compiler detect and complain about cases like this being poor style / in need of clarification (where the rewrites in the bullets at the top of this issue might be considered potential candidates for better clarity). This seems attractive, but some questions might be:
    • What is the level of effort for writing these warnings / determining which cases in the language deserve them?
    • Are users sufficiently likely to write patterns like this? (i.e., would the benefit be helping anyone given that most people don't necessarily even realize that [1..n] 10 is a legal expression)

Note that the choice of [1..n][1..n] in this issue is just a f'r'instance and that there are plenty of other cases that could be confusing, by putting anything iterable in the first part and anything that serves as a legal index into the second.

Also, note that with the change in #21633, [1..n][1..n] will warn by default for the second expression until rewritten as [1..n][1..n,] or the flag to squash the warning is thrown.

@e-kayrakli
Copy link
Contributor

I wanted to capture my current mental framework here mostly for posterity. I don't intend to push for any immediate changes today, and agree with Brad's summary above.

Today, [x..y] could mean too many things. It could be:

  1. part of an array type. e.g., var A: [x..y] int;
  2. a forall expression. e.g., var A = [x..y] 1;
  3. an indexing expression. e.g. var A = B[x..y];
  4. an array literal of ranges that are represented by literals. e.g. var A = [x..y].

One may argue that (1) and (2) are the same thing, but that interpretation is not very intuitive to me today. I believe that's the symmetry @bradcray is referring to above. While I am a big fan of (1) and (3), I am not very fond of (2) and (4). (4) seems like an easier target here as it is a bit of a wildcard. All the other cases are where [x..y] has a clearer meaning in context. And (4) is a more standalone case for this expression.

From a much more vague standpoint, the number of potential meanings scare me, because there might be more "uncool but legal" incantations like [1..10][1..10] that we aren't even aware of.

@bradcray
Copy link
Member Author

bradcray commented Mar 1, 2023

One may argue that (1) and (2) are the same thing, but that interpretation is not very intuitive to me today. I believe that's the symmetry @bradcray is referring to above.

I wouldn't say "same thing", but would agree that these forms were intentionally designed to use symmetric syntax (including the [i in x..y] form: var A: [i in x..y] [D[i]] int; and var A = [i in x..y] i; being examples). That said, if we wanted to consider making the in-less form of 2 be considered bad style and have the compiler warn about it by default (with the ability to squash), I'd be open to that (pending understanding the impact on existing code).

This probably won't make you any happier, but case 3 could arguably be viewed as a specialization of "any call expression". That is, I could write proc foo(x: int /* or range(?)*/) { ... } foo[x..y]; because Chapel treats parens and square brackets as being interchangeable in call expressions (and indexing expressions are just a syntactically special call expression).

Note that case 4 will generate a warning once #21633 is merged.

because there might be more "uncool but legal" incantations like [1..10][1..10] that we aren't even aware of.

I could tell you some of my favorite ones, but I'm afraid they would just terrify you... :D

@e-kayrakli
Copy link
Contributor

That said, if we wanted to consider making the in-less form of 2 be considered bad style and have the compiler warn about it by default (with the ability to squash), I'd be open to that (pending understanding the impact on existing code).

You know this already, but this is in alignment with the way I think about this. But much like other points here, not nearly the biggest fish that we need to fry today.

This probably won't make you any happier, but case 3 could arguably be viewed as a specialization of "any call expression". That is, I could write proc foo(x: int /* or range(?)*/) { ... } foo[x..y]; because Chapel treats parens and square brackets as being interchangeable in call expressions (and indexing expressions are just a syntactically special call expression).

OK, since you're the one who brought this up... :) I thought about this, but it lead to different ambiguities while I was thinking about it.

proc foo(r: range) { return "range arg foo"; }
proc foo { var A: [1..10] int; return A; }

writeln(foo[1..3]); // what do you expect to see?

First, this is tangential as it relates more to the equivalence of [] and (), but a problem nonetheless. Second, it is parsed as slicing the returned array today. Makes me wonder whether we break the tie intentionally towards interpreting [] as indexing expression as that's the more common way. Which would make "equivalence of [] and ()" an over-simplification. I can spin off an issue for this, unless it has been something discussed.

I could tell you some of my favorite ones, but I'm afraid they would just terrify you... :D

Don't jingle those keys in front of me, please.

@mppf
Copy link
Member

mppf commented Mar 1, 2023

proc foo(r: range) { return "range arg foo"; }
proc foo { var A: [1..10] int; return A; }

AFAIK, this should be a multiply-defined-symbol error.

@bradcray
Copy link
Member Author

bradcray commented Mar 1, 2023

Michael beat me to it by seconds! (edit: Though I'm remembering now that we don't generate the error as we should: #18177)

@mppf
Copy link
Member

mppf commented Mar 1, 2023

(Yes, and I am currently working on improving the multiply-defined symbol checking, for the dyno scope resolver).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants