Parentheses without quantifier in parser rule lead to syntax error on non-root rule parsing #1545

rslemos · 2016-12-23T14:54:56Z

Consider the following grammar:

grammar OpenDeviceStatement;

program : statement+ '.' ;

statement : 'OPEN' ( 'DEVICE' (  OPT1  |  OPT2  |  OPT3  )? )+ ;

//statement : 'OPEN' ( 'DEVICE' ( (OPT1) |  OPT2  |  OPT3  )? )+ ;
//statement : 'OPEN' ( 'DEVICE' (  OPT1  | (OPT2) |  OPT3  )? )+ ;
//statement : 'OPEN' ( 'DEVICE' (  OPT1  |  OPT2  | (OPT3) )? )+ ;
//statement : 'OPEN' ( 'DEVICE' ( (OPT1) | (OPT2) |  OPT3  )? )+ ;
//statement : 'OPEN' ( 'DEVICE' ( (OPT1) |  OPT2  | (OPT3) )? )+ ;
//statement : 'OPEN' ( 'DEVICE' (  OPT1  | (OPT2) | (OPT3) )? )+ ;
//statement : 'OPEN' ( 'DEVICE' ( (OPT1) | (OPT2) | (OPT3) )? )+ ;

OPT1 : 'OPT-1';
OPT2 : 'OPT-2';
OPT3 : 'OPT-3';

WS : (' '|'\n')+ -> channel(HIDDEN);

When parsing the text OPEN DEVICE DEVICE (directly through statement(), and not through program()) everything works fine.

Trace output:

enter   statement, LT(1)=OPEN
consume [@0,0:3='OPEN',<2>,1:0] rule statement
consume [@2,5:10='DEVICE',<3>,1:5] rule statement
consume [@4,12:17='DEVICE',<3>,1:12] rule statement
exit    statement, LT(1)=<EOF>

Tree output:

(statement OPEN DEVICE DEVICE)

Exercise 1

Now consider substituting any of the alternative statement definitions (where one or more OPTx tokens are parenthesized).

The tree output is still the same, but now syntaxError() calls are reported to the ANTLRErrorListener. Here is the trace along with the ConsoleErrorListener output:

enter   statement, LT(1)=OPEN
consume [@0,0:3='OPEN',<2>,1:0] rule statement
consume [@2,5:10='DEVICE',<3>,1:5] rule statement
consume [@4,12:17='DEVICE',<3>,1:12] rule statement
line 1:18 no viable alternative at input '<EOF>'
exit    statement, LT(1)=<EOF>

Maybe ANTLR4 is expecting one of: any OPTx (an optional argument to the previous DEVICE), another DEVICE, OPEN (for a new statement), or . (period, to close the program). I understand that <EOF> would not be expected if we were parsing the text from the root rule program (here I am assuming that ANTLR4 breaks its promise of being able to parse from any rule, without giving any rule special status as the root or main rule).

Exercise 2 (on top of exercise 1)

Going one step further, let's parse the text OPEN DEVICE DEVICE. (notice the trailing period). Again, parsing starts at the statement rule.

What trace do we get?

enter   statement, LT(1)=OPEN
consume [@0,0:3='OPEN',<2>,1:0] rule statement
consume [@2,5:10='DEVICE',<3>,1:5] rule statement
consume [@4,12:17='DEVICE',<3>,1:12] rule statement
line 1:18 extraneous input '.' expecting {<EOF>, 'DEVICE', 'OPT-1', 'OPT-2', 'OPT-3'}
line 1:19 no viable alternative at input '<EOF>'
exit    statement, LT(1)=<EOF>

Weirdly, it is expecting <EOF> (and neither OPEN nor ., because they are outside the statement rule).

The output tree wrongly incorporates the .:

(statement OPEN DEVICE DEVICE .)

(well, at least I expected it to stop at the second DEVICE, leaving the . unparsed)

I think there are 3 bugs here:

1. the superfluous parentheses around OPTx should have no effect on the generated parser;
2. parsing through the non-root rule statement should not treat <EOF> as unexpected (after DEVICE or any OPTx token);
3. parsing through the non-root rule statement should not consume the . token (not sure about this one; this may only apply to lexers, not to parsers).

What do you think?


parrt · 2016-12-23T17:54:13Z

Hi. 1. is suspicious! 2. That is expected: when you call statement as the root rule, EOF can possibly follow, but it must look at the next token to see whether it should keep looping; hence it is not expecting '.' (statement is not called from program). 3. It consumes . as an error token, so no problem. Thanks for the detailed report!
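Point 2 can be illustrated with a small, hypothetical model (not ANTLR code) of the tokens that may legally appear right after a DEVICE inside statement: an optional OPTx, another DEVICE for the next loop iteration, or whatever follows statement. Invoked from program, that follow set is OPEN or '.'; invoked directly as the start rule, the only thing known to follow is end of input, so <EOF> takes their place. The class name, method name, and string token spellings are assumptions chosen to match the error messages in the traces.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model (not generated by ANTLR) of the tokens that may follow a
// 'DEVICE' inside
//   statement : 'OPEN' ( 'DEVICE' ( OPT1 | OPT2 | OPT3 )? )+ ;
class ExpectedAfterDevice {

    static Set<String> expected(boolean calledFromProgram) {
        Set<String> set = new LinkedHashSet<>();
        // The optional argument of the current DEVICE.
        set.addAll(List.of("'OPT-1'", "'OPT-2'", "'OPT-3'"));
        // Another iteration of the ( ... )+ loop.
        set.add("'DEVICE'");
        if (calledFromProgram) {
            // program : statement+ '.' ;  so another statement or the period may follow.
            set.add("'OPEN'");
            set.add("'.'");
        } else {
            // statement is the start rule: only end of input can follow it.
            set.add("<EOF>");
        }
        return set;
    }
}
```

For example, expected(false) contains exactly <EOF>, 'DEVICE', 'OPT-1', 'OPT-2' and 'OPT-3', the same set listed in the Exercise 2 error message, and it does not contain '.' or 'OPEN'.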

sharwell · 2016-12-23T18:24:03Z

Starting with #118 (a known issue), this seems to be a more detailed and slightly broader example of cases which can fail.

📝 I'm able to reproduce all three bugs as described in my fork, suggesting that this bug relates to a fairly fundamental aspect of how we treat both the end of the start rule and the EOF symbol.

parrt · 2016-12-23T19:32:44Z

Where is the emoji for "holy crap!"?

sharwell · 2016-12-23T20:46:31Z

😮 ?

⛪️ 💩 ?

sharwell · 2016-12-23T21:42:01Z

More information: For the scenario where the input is OPEN DEVICE DEVICE ., the error is actually getting reported by the call to sync. The implementation of DefaultErrorStrategy.sync doesn't understand decisions with an epsilon transition to the end of a start rule that doesn't end with an explicit EOF.
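The extraneous-input error in Exercise 2 comes from single-token deletion during that sync call: if the current token is not expected but the token after it is, the current token is reported as extraneous and consumed. In the Java runtime the parser is in error-recovery mode at that moment, so the consumed token is added to the tree as an error node, which matches the tree (statement OPEN DEVICE DEVICE .) and parrt's "consumes . as an error token". The sketch below is a simplified, hypothetical model of that check only; the class name, the string tokens, and the errors/errorNodes lists are assumptions, and the real runtime works on token types and IntervalSet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of single-token deletion during sync().
// Tokens are plain strings; the real runtime uses token types and IntervalSet.
class SyncSketch {
    static final String EOF = "<EOF>";

    private final List<String> tokens;   // remaining input; EOF is implied at the end
    private int p = 0;
    final List<String> errors = new ArrayList<>();
    final List<String> errorNodes = new ArrayList<>(); // tokens added to the tree as error nodes

    SyncSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Lookahead: la(1) is the current token, la(2) the one after it.
    String la(int i) {
        int index = p + i - 1;
        return index < tokens.size() ? tokens.get(index) : EOF;
    }

    // Returns true if an extraneous token was deleted.
    boolean sync(Set<String> expecting) {
        String current = la(1);
        if (expecting.contains(current)) {
            return false;                  // current token is fine, nothing to repair
        }
        if (expecting.contains(la(2))) {
            errors.add("extraneous input " + current + " expecting " + expecting);
            errorNodes.add(current);       // consumed while recovering -> error node
            p++;
            return true;
        }
        // At block/loop entry the real runtime throws InputMismatchException here.
        errors.add("mismatched input " + current);
        return false;
    }
}
```

With the remaining input '.' and the expecting set from Exercise 2, sync deletes the period and leaves LA(1) at <EOF>, where the optional block's decision then fails with "no viable alternative at input '<EOF>'".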

By using BailErrorStrategy instead of DefaultErrorStrategy, the first and third bugs disappear. This isn't a long-term solution but it should substantially constrain the scope of a fix.

The second bug appears to be caused by a failure in the template for LL1OptionalBlock, which contains code I can't explain; I'm guessing it needs to be modified as follows:

 LL1OptionalBlock(choice, alts, error) ::= <<
 setState(<choice.stateNumber>);
 _errHandler.sync(this);
 switch (_input.LA(1)) {
 <choice.altLook,alts:{look,alt| <cases(ttypes=look)>
 	<alt>
 	break;}; separator="\n">
 default:
-	<error>
+	break;
 }
 >>
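The effect of that template change can be sketched with a hand-written, hypothetical approximation of the generated Java for a parenthesized variant of statement. This is not ANTLR output: the class name, the bypassOnDefault flag, the errors list (errors are recorded instead of thrown), and the OPTx token numbers are assumptions (OPEN and DEVICE use the <2> and <3> from the traces). Consistent with the traces, which report an error only at <EOF>, the sketch assumes the empty bypass alternative is selected by a DEVICE case whose lookahead comes from the enclosing ( ... )+ loop, so EOF falls through to the default branch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, hand-written approximation (NOT ANTLR output) of the code for
//   statement : 'OPEN' ( 'DEVICE' ( (OPT1) | OPT2 | OPT3 )? )+ ;
// Token type numbers are illustrative.
class StatementSketch {
    static final int EOF = -1, OPEN = 2, DEVICE = 3, OPT1 = 4, OPT2 = 5, OPT3 = 6;

    private final int[] tokens;
    private final boolean bypassOnDefault; // false = original template, true = proposed fix
    private int p = 0;
    final List<String> errors = new ArrayList<>();

    StatementSketch(int[] tokens, boolean bypassOnDefault) {
        this.tokens = tokens;
        this.bypassOnDefault = bypassOnDefault;
    }

    int la1() { return p < tokens.length ? tokens[p] : EOF; }

    int consumed() { return p; }

    private void match(int ttype) {
        if (la1() == ttype) {
            p++;
        } else {
            errors.add("mismatched input " + la1());
        }
    }

    void statement() {
        match(OPEN);
        do {
            match(DEVICE);
            // LL1OptionalBlock: pick an alternative from one token of lookahead.
            switch (la1()) {
                case OPT1:
                case OPT2:
                case OPT3:
                    p++;   // alternatives 1-3: consume the option token
                    break;
                case DEVICE:
                    break; // bypass (empty) alternative, lookahead from the + loop
                default:
                    if (!bypassOnDefault) {
                        // Original template: <error>, i.e. "no viable alternative".
                        errors.add("no viable alternative at input " + la1());
                    }
                    // Proposed template: plain break, i.e. take the bypass alternative.
                    break;
            }
        } while (la1() == DEVICE); // ( ... )+ : loop again only if another DEVICE follows
    }
}
```

On OPEN DEVICE DEVICE both versions consume all three tokens, but only the original one records "no viable alternative" at EOF; with the proposed default: break, the rule simply exits.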

See antlr#1545

This change updates the default sync() strategy to match the strategy used for selecting an alternative when prediction leaves the decision rule prior to reaching a syntax error. Closes antlr#1545

See antlr#1545

sharwell · 2016-12-27T23:27:06Z

@rslemos Thanks for taking the time to post such an excellent set of repro steps. Your examples allowed us to fix two different bugs in the prediction algorithm implementation. 🥇

parrt added grammars parsers labels Dec 23, 2016

parrt added the type:bug label Dec 23, 2016

sharwell added a commit to sharwell/antlr4 that referenced this issue Dec 23, 2016

Add regression tests for antlr#1545

3361365

sharwell added a commit to sharwell/antlr4 that referenced this issue Dec 23, 2016

Fix code generation for LL1OptionalBlock bypass alternative

7a83cd4

See antlr#1545

sharwell mentioned this issue Dec 23, 2016

Fix multiple problems with optional block bypass at end of rule #1546

Merged

sharwell added a commit to sharwell/antlr4 that referenced this issue Dec 26, 2016

Fix test cases affected by recent changes

1d066e0

See antlr#1545

parrt closed this as completed in #1546 Dec 28, 2016

sharwell mentioned this issue Oct 4, 2018

Generate RuleStack tunnelvisionlabs/antlr4cs#309

Open
