Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parentheses without quantifier in parser rule lead to syntax error on non-root rule parsing #1545

Closed
rslemos opened this issue Dec 23, 2016 · 6 comments

Comments

@rslemos
Copy link

rslemos commented Dec 23, 2016

Consider the following grammar:

grammar OpenDeviceStatement;

program : statement+ '.' ;

statement : 'OPEN' ( 'DEVICE' (  OPT1  |  OPT2  |  OPT3  )? )+ ;

//statement : 'OPEN' ( 'DEVICE' ( (OPT1) |  OPT2  |  OPT3  )? )+ ;
//statement : 'OPEN' ( 'DEVICE' (  OPT1  | (OPT2) |  OPT3  )? )+ ;
//statement : 'OPEN' ( 'DEVICE' (  OPT1  |  OPT2  | (OPT3) )? )+ ;
//statement : 'OPEN' ( 'DEVICE' ( (OPT1) | (OPT2) |  OPT3  )? )+ ;
//statement : 'OPEN' ( 'DEVICE' ( (OPT1) |  OPT2  | (OPT3) )? )+ ;
//statement : 'OPEN' ( 'DEVICE' (  OPT1  | (OPT2) | (OPT3) )? )+ ;
//statement : 'OPEN' ( 'DEVICE' ( (OPT1) | (OPT2) | (OPT3) )? )+ ;

OPT1 : 'OPT-1';
OPT2 : 'OPT-2';
OPT3 : 'OPT-3';

WS : (' '|'\n')+ -> channel(HIDDEN);

When parsing the text OPEN DEVICE DEVICE (directly through statement(), and not through program()) everything works fine.

Trace output:

enter   statement, LT(1)=OPEN
consume [@0,0:3='OPEN',<2>,1:0] rule statement
consume [@2,5:10='DEVICE',<3>,1:5] rule statement
consume [@4,12:17='DEVICE',<3>,1:12] rule statement
exit    statement, LT(1)=<EOF>

Tree output:

(statement OPEN DEVICE DEVICE)

Exercise 1

Now consider substituting any of the alternative statement definitions (where one or more OPTx tokens are parenthized).

Tree output is still the same. But now some syntaxError() get reported on the ANTLRErrorListener. Here the trace along with ConsoleErrorListener output:

enter   statement, LT(1)=OPEN
consume [@0,0:3='OPEN',<2>,1:0] rule statement
consume [@2,5:10='DEVICE',<3>,1:5] rule statement
consume [@4,12:17='DEVICE',<3>,1:12] rule statement
line 1:18 no viable alternative at input '<EOF>'
exit    statement, LT(1)=<EOF>

Maybe ANTLR4 is expecting either: any OPTx (an optional argument to the previous DEVICE), another DEVICE, OPEN (for a new statement) or . (period, to close the program). I understand that <EOF> is not to be expected if we were to parse the text from the root rule program (here I just assume that ANTLR4 breaks its promise to be able to parse from any rule, giving no rule the special quality of being the root rule or the main rule).

Exercise 2 (on top of exercise 1)

Going one step further, lets then parse the text OPEN DEVICE DEVICE. (notice the period). Again our parsing starts at statement rule.

What trace do we get?

enter   statement, LT(1)=OPEN
consume [@0,0:3='OPEN',<2>,1:0] rule statement
consume [@2,5:10='DEVICE',<3>,1:5] rule statement
consume [@4,12:17='DEVICE',<3>,1:12] rule statement
line 1:18 extraneous input '.' expecting {<EOF>, 'DEVICE', 'OPT-1', 'OPT-2', 'OPT-3'}
line 1:19 no viable alternative at input '<EOF>'
exit    statement, LT(1)=<EOF>

Weirdly it is expecting <EOF> (and not OPEN nor . because they are outside statement rule).

The output tree wrongly incorporates the .:

(statement OPEN DEVICE DEVICE .)

(well, at least I expected it to stop at the second DEVICE, leaving the . unparsed)


I think there are 3 bugs here:

  1. the superfluous parentheses around OPTx should have no effect in the generated parser;
  2. the parsing through the non-root rule statement should not see the <EOF> as unexpected (after DEVICE or any OPTx token);
  3. the parsing through the non-root rule statement should not consume the . token (not sure about this one; this may only apply to lexers, not to parsers).

What do you think?

@parrt
Copy link
Member

parrt commented Dec 23, 2016

Hi. 1. is suspicious!. 2. that is expected. when you call statement as root rule, EOF can follow possibly but it must look at next token to see if it should keep looping, hence, not expecting '.'. (it's not called from program). 3. it consumes . as an error token so no problem. thanks for detailed report!

@sharwell
Copy link
Member

sharwell commented Dec 23, 2016

Starting with #118 (a known issue), this seems to be a more detailed and slightly broader example of cases which can fail.

📝 I'm able to reproduce all three bugs in my fork as described, suggesting that this bug is related to a pretty fundamental understanding for how we treat both the end of the start rule and the EOF symbol.

@parrt parrt added the type:bug label Dec 23, 2016
@parrt
Copy link
Member

parrt commented Dec 23, 2016

Where is the emoji for "holy crap!"?

@sharwell
Copy link
Member

sharwell commented Dec 23, 2016

😮 ?

⛪️ 💩 ?

@sharwell
Copy link
Member

More information: For the scenario where the input is OPEN DEVICE DEVICE ., the error is actually getting reported by the call to sync. The implementation of DefaultErrorStrategy.sync doesn't understand decisions with an epsilon transition to the end of a start rule that doesn't end with an explicit EOF.

By using BailErrorStrategy instead of DefaultErrorStrategy, the first and third bugs disappear. This isn't a long-term solution but it should substantially constrain the scope of a fix.

The second bug appears to be caused by a failure in the template for LL1OptionalBlock, which contains code I can't explain and I'm guessing needs to be modified as follows:

 LL1OptionalBlock(choice, alts, error) ::= <<
 setState(<choice.stateNumber>);
 _errHandler.sync(this);
 switch (_input.LA(1)) {
 <choice.altLook,alts:{look,alt| <cases(ttypes=look)>
 	<alt>
 	break;}; separator="\n">
 default:
-	<error>
+	break;
 }
 >>

sharwell added a commit to sharwell/antlr4 that referenced this issue Dec 23, 2016
sharwell added a commit to sharwell/antlr4 that referenced this issue Dec 23, 2016
sharwell added a commit to sharwell/antlr4 that referenced this issue Dec 23, 2016
This change updates the default sync() strategy to match the strategy used
for selecting an alternative when prediction leaves the decision rule prior
to reaching a syntax error.

Closes antlr#1545
sharwell added a commit to sharwell/antlr4 that referenced this issue Dec 23, 2016
This change updates the default sync() strategy to match the strategy used
for selecting an alternative when prediction leaves the decision rule prior
to reaching a syntax error.

Closes antlr#1545
sharwell added a commit to sharwell/antlr4 that referenced this issue Dec 26, 2016
@sharwell
Copy link
Member

@rslemos Thanks for taking the time to post such an excellent set of repro steps. Your examples allowed us to fix two different bugs in the prediction algorithm implementation. 🥇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants