Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Superfluous rule needed to make parser work (4.7.2, Java target) #2689

Open
andreasabel opened this issue Nov 24, 2019 · 6 comments
Open

Superfluous rule needed to make parser work (4.7.2, Java target) #2689

andreasabel opened this issue Nov 24, 2019 · 6 comments

Comments

@andreasabel
Copy link

The parser generated from the following grammar fails, unless I activate the commented-out superfluous rule:

parser grammar RightRecParser;
options { tokenVocab = RightRecLexer; }

listInteger returns [ int result ]
  : /* empty */
      { $result = 0; }

  | n=INTEGER m=listInteger
      { $result = Integer.parseInt($n.getText()) + $m.result; }
;

// // The following irrelevant rule makes the parser succeed:
// foo returns [ int result ]
//   : m=listInteger
//       { $result = $m.result; }
// ;

Behavior without foo:

echo "1 2" | java -cp .:/Users/abel/bnfc/testing/data/antlr-4.7.2-complete.jar RightRec.Test
line 1:0 no viable alternative at input '1'

Behavior with foo:

echo "1 2" | java -cp .:/Users/abel/bnfc/testing/data/antlr-4.7.2-complete.jar RightRec.Test
3

I would expect that an unused rule is not needed to generate a working parser. These grammars are originally auto-generated by the BNFC tool, --antlr backend. Adding redundant rules by hand is thus not a viable workaround.

Lexer is unsurprisingly:

lexer grammar RightRecLexer;

// Whitespace
WS             : (' ' | '\r' | '\t' | '\n' | '\f')+ ->  skip;

// Numbers
fragment DIGIT : [0-9] ;
INTEGER        : DIGIT+;

The full MWE is attached (including Makefile and Test wrapper).
bug.zip

@sharwell
Copy link
Member

Your entry rule is missing an explicit EOF, so this is likely a duplicate of #118.

@ericvergnaud
Copy link
Contributor

ericvergnaud commented Nov 25, 2019 via email

@sharwell
Copy link
Member

@ericvergnaud this issue is related to a bug in the code.

@andreasabel
Copy link
Author

andreasabel commented Nov 25, 2019

@sharwell Thanks for the pointer!

Typical LR parser generators add, in a preprocessing step, a rule

   StartX : X EOF;

to the grammar if X is supposed to be usable as an entrypoint. (See, e.g., the happy parser generator for Haskell, or just standard textbooks on LR parsing, such as the Dragon Book.)
Users never talk about EOF in the LR parser specifications, it is handled automatically.

Thanks to your pointer I found the section in the ANTLR documentation that talks about EOF and start rules, https://github.com/antlr/antlr4/blob/master/doc/parser-rules.md#start-rules-and-eof . However, it does not say that adding an EOF rule might be necessary to give successful parses, only to reports errors in case only a proper prefix of the input matches the grammar.

I can work around this issue by emitting EOF rules for entrypoints when translating a grammar to ANTLR. But it comes as a surprise that ANTLR deviates so much from standard conventions of parser generators.

Maybe ANTLR could also create these new entrypoint symbols StartX automatically?

And the OP still exists: why does adding a bogus rule mentioning X fix the problem for X? There is still no mention of EOF.
(Maybe I should not be so surprised, since typical LR parser generators are also non-compositional in ways one only understands when studying the parsing algorithm in detail.)

@sharwell
Copy link
Member

Technically it's a bug that the missing EOF scenario doesn't work. The bug isn't fixed because we haven't found a way to make the scenario work without a big performance impact that would affect scenarios where the EOF is present (i.e. the bug fix slows down cases that don't need the bug fix).

andreasabel added a commit to BNFC/bnfc that referenced this issue Nov 25, 2019
This works around an issue in ANTLR that parsers sometimes do not work
if not created with such a start rule.

See antlr/antlr4#2689
@andreasabel
Copy link
Author

I fixed the ANTLR backend of BNFC to always emit EOF rules for entrypoints now (BNFC/bnfc#272), meaning that I am not dependent on the outcome of this issue any more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants