Wrong error message in CFE when parsing `>?..` #47060

sgrekhov · 2021-08-31T15:51:31Z

The following test fails on CFE

void f(x, [y]) {}

main() {
  int a = 1;
  int b = 2;
  int c = 3;
  f(a<b, // fails here
      c>?..toString());
//      ^^^
// [analyzer] unspecified
// [cfe] unspecified
}

Error message is the following

- Unexpected error at line 45, column 4: Too many positional arguments: 2 allowed, but 3 found.

Why parser founds 3 positional arguments here when we have only one comma here?

Tested on Dart SDK version: 2.15.0-edge.8f9113d9f1cb11400c384a4ac68fc050636cf573 (be) (Tue Aug 31 00:10:08 2021 +0000) on "linux_x64"

The text was updated successfully, but these errors were encountered:

sgrekhov · 2021-09-03T09:40:43Z

Seems that the following failure caused by the same issue

void f(x, [y]) {}

main() {
  int a = 1;
  int b = 2;
  int c = 3;

  f(a<b,
      c> as int);
//       ^^
// [analyzer] unspecified
// [cfe] unspecified
}

Output is

- Unexpected error at line 47, column 4: Too many positional arguments: 2 allowed, but 3 found.

jensjoha · 2021-09-08T10:56:36Z

As for

f(a<b, c>?..toString());

it should - seen in regards to the Constructor-tear-offs feature in isolation - be parsed as

f((a<b), (c>?..toString()));

as the test also says. c>?..toString() doesn't parse though, so we go into recovery.
Ultimative it's recovered as something like

f((a<b), (c>*synthetic*), (toString()));

which is why the analyzer and/or CFE complains about 3 arguments vs 2 expected.
The parser itself - at least when I run it - gives these errors:

tmp.dart:7:12: Expected an identifier, but got '?..'.
  f(a<b, c>?..toString());
           ^^^

tmp.dart:7:15: Expected ',' before this.
  f(a<b, c>?..toString());
              ^^^^^^^^

The second example

f(a<b, c> as int);

is similar.
Seen in isolation it should be parsed as

f((a<b), (c> as int));

but c> as int doesn't make sense, it goes into recovery and recovers as

f((a<b), (c> as), (int));

The parser itself gives this error:

tmp.dart:8:16: Expected ',' before this.
  f(a<b, c> as int);
               ^^^

I'll say this is working as expected.

Johnni, Brian, Erik --- any comments on this? (/cc @johnniwinther @bwilkerson @eernstg )

Cf. also #47045.

eernstg · 2021-09-08T12:33:00Z

As far as I can see, the behavior of the parser is fine: We don't require anything specific about the nature of the recovery, and hence we can't complain because recovery introduces an extra comma.

Pragmatically, it may or may not be the best choice (isn't it just creating extra confusion to invent a comma? --- and is it likely that the real error is that the developer forgot a comma?), but it is certainly not obvious to me how the recovery could proceed if it doesn't make an attempt to reduce the big, wrong expression to a number of smaller (wrong or correct) expressions, so it makes a lot of sense to introduce that comma.

However, this would imply that a syntax error test should in general use a function declaring many optional positional parameters (void f(a, [b, c, d, e, f, g, h])) {}), such that the syntax error occurs on its own, rather than being paired up with a non-syntax compile-time error that complains about the number of arguments. It doesn't help anybody if we have that error as well as the "real" error about the syntax.

Tools like dart (or whatever it was that produced the error message mentioned here) should definitely report the syntax error, mentioning that ?.. cannot be parsed at that position. As far as I can see, this is also how the CFE behaves (looking at the example in dart-pad).

So, @jensjoha, unless you can see a way to reduce the confusion caused by unexpected extra arguments from parser recovery, I believe that you could close this issue.

@sgrekhov, it would be great if you could create a co19 issue to adjust the test such that it will tolerate some additional arguments.

bwilkerson · 2021-09-08T14:49:43Z

As I'm sure you're aware, I have strong opinions when it comes to recovery. :-) In large part that's because recovery has a huge impact on the UX.

The guiding principle for recovery should be that of least surprise. That is, the best recovery is to interpret the code the way the user is most likely to interpret it and/or intended it to be. Given something like the first example:

f(a<b, c>?..toString());

I expect that most users would interpret that to be an invocation of f with a single argument, where the argument is an invocation of toString that uses the ?.. operator and where the receiver of the invocation is a<b, c>.

In addition to being closest to what a user would expect, it has the advantage that we wouldn't need to add or remove any tokens in order to recover that way.

Unfortunately, unless we make it possible for the parser to backtrack as part of recovery (which I would love to see, but don't expect), it's probably too late to parse it that way by the time the parser has figured out that it needs to recover. (I'd love to find out that one or both of those isn't true.)

It doesn't help anybody if we have that error as well as the "real" error about the syntax.

I'd actually argue that that depends on the nature of the test. For the purposes of the co19 tests I suspect that you're right. But I do think it's worthwhile having tests that capture the behavior of recovery because I think that they help us improve it over time.

eernstg · 2021-09-08T20:46:49Z

As I'm sure you're aware, I have strong opinions when it comes to recovery. :-)

That's well justified, that's a topic where I'll just try to understand what's going on. ;-)

The difficulty here is that we have specified how to disambiguate terms where < and > are relational operators from the terms where they are the angle brackets at both ends of <typeArguments>, and the type arguments are only considered when the token after the > belongs to a very short list (stopToken and continuationToken here). In all other cases the < and > are parsed as relational operators.

The main reasons for adopting this kind of rule is that (1) we wish to avoid gobbling up too much syntactic language design space, and (2) terms like List<int> and myFunction<String> are rare enough to justify that they must occur in a limited set of syntactic contexts (in any other syntactic context they can be parenthesized).

So, following these rules, the parser has already decided that > is a relational operator way back when it looked at <, which makes it inconvenient to view a<b, c> as a single term.

However, the parser is free to have additional rules (that may need to change if we ever introduce an expression that starts with +, or .., and a bunch of other tokens that don't turn <...> into a <typeArguments>). For instance, with the current language it is guaranteed that '>' '?..' is a syntax error: > is a relational operator, and no expression starts with '?..'.

This means that it might be possible for the parser to lump all the tokens of a<b, c> together into a "wrong stuff" basket which is treated as an expression whose type is an ErrorType, and then ?.. could be treated as the connector of a conditional cascade (which is the only job done by that token).

I don't know whether it would give better results, though, or even how to measure such things...

bwilkerson · 2021-09-08T21:25:43Z

... we have specified how to disambiguate terms ...

Yes, I understand and agree with the rules for the disambiguation, but they only apply to valid code.

This means that it might be possible ...

I like that suggestion. I think the effect of doing that would be equivalent to what I was talking about in terms of backtracking. The way I was thinking of it was that the parser should use the disambiguation rules to determine what's going on, but when an error is found (prior to some point, such as the end of the expression), then it should back up and try making the other choice in spite of the disambiguation rules to see whether that yields better results.

The big problem with backtracking in the current parser is that we've generally already invoked methods on the listener that have committed the parser to a specific interpretation of the tokens. I don't know whether that would impact the approach you suggested or not.

I don't know whether it would give better results, though, or even how to measure such things...

One way to measure something like this is through a user study: ask users questions to determine whether they understand the diagnostics and whether the diagnostics seem right.

Unfortunately we don't always have the resources to run user studies, so we tend to use our own perceptions to try to guess how users would feel about the product. That isn't terrible because we are also users, though we're not always typical users.

There are also some heuristics that can be applied. One of them that often helps me measure such things is the number of diagnostics being produced for a single error. Ideally there will be one diagnostic per error. For example, if I type if x == null) {}, I expect to see one diagnostic (informing me that I'm missing a parenthesis). If the parser produces three diagnostics (for example "if statement missing condition", "missing semicolon", "extra parenthesis"), then it probably hasn't recovered well because none of those is likely to be the real problem.

eernstg · 2021-09-09T07:27:37Z

About backtracking: That shouldn't be necessary (for the approach that I was suggesting): At the point where the token sequence derived from <primary> <selector>* '<' has been consumed, the matching > is looked up. If it doesn't exist then the < is a relational operator. If it exists then we check the next token after that >; let's call it t. If t is a stopToken or a continuationToken, and the <...> can be parsed as a <typeList>, then we commit to parsing <...> as a <selector> which is a <typeArguments>.

When t is not a stop/continuation token, we know that there are several values for t that will inevitably give rise to a syntax error, in particular: Every token which cannot be the first token of an expression, including ?., .., ?.., and several binary operators. For some subset of these tokens, the best approach to recovery might be to gather the tokens of <...> into an "error token sequence" bucket (don't even bother trying to create a normal AST for <int, Map<int, 2>>), and then report the error based on the structure where everything up to the .. is considered to be an expression.

This is about parsing recovery (not about the grammar of the language), and it needs to change if we ever introduce an expression that starts with .., but that should be doable.

And even if we can't run full-fledged user studies, there are lots of developers out there, and many of them are willing to speak up, which is also a great resource!

sgrekhov added the area-front-end Use area-front-end for front end / CFE / kernel format related issues. label Aug 31, 2021

sgrekhov mentioned this issue Aug 31, 2021

Failures on [co19] Roll co19 to cc4df792d0f02f15bfe04911c1f856b90cb7cbf4 dart-lang/co19#1179

Closed

johnniwinther assigned jensjoha Aug 31, 2021

johnniwinther added the P2 A bug or feature request we're likely to work on label Aug 31, 2021

lrhn added the front-end-fasta-recovery label Sep 1, 2021

sgrekhov mentioned this issue Sep 3, 2021

Failures on [co19] Roll co19 to a3af05fb4ee10b065bf0f5bfe4e9546bc077d5e8 dart-lang/co19#1187

Closed

jensjoha mentioned this issue Sep 9, 2021

Parser doesn't support a<b>..c() #47045

Closed

johnniwinther added P3 A lower priority bug or feature request and removed P2 A bug or feature request we're likely to work on labels Oct 12, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong error message in CFE when parsing `>?..` #47060

Wrong error message in CFE when parsing `>?..` #47060

sgrekhov commented Aug 31, 2021

sgrekhov commented Sep 3, 2021

jensjoha commented Sep 8, 2021

eernstg commented Sep 8, 2021

bwilkerson commented Sep 8, 2021

eernstg commented Sep 8, 2021

bwilkerson commented Sep 8, 2021

eernstg commented Sep 9, 2021 •

edited

Wrong error message in CFE when parsing >?.. #47060

Wrong error message in CFE when parsing >?.. #47060

Comments

sgrekhov commented Aug 31, 2021

sgrekhov commented Sep 3, 2021

jensjoha commented Sep 8, 2021

eernstg commented Sep 8, 2021

bwilkerson commented Sep 8, 2021

eernstg commented Sep 8, 2021

bwilkerson commented Sep 8, 2021

eernstg commented Sep 9, 2021 • edited

Wrong error message in CFE when parsing `>?..` #47060

Wrong error message in CFE when parsing `>?..` #47060

eernstg commented Sep 9, 2021 •

edited