Strange bug that produces 'mismatched input' always error #3506

Closed
msoler75 opened this issue Jan 23, 2022 · 3 comments

msoler75 commented Jan 23, 2022

I spent many hours trying to get a simple grammar running, but it always reports the same error at line 1:
line 1:0 mismatched input 'hola' expecting 'hola'

I have isolated a reproducible case. Here are the lexer and parser:

HolaLexer.g4:

lexer grammar HolaLexer;

fragment IdentifierStart
    : [\p{L}]
    | [$_]
    ;

fragment IdentifierPart
    : IdentifierStart
    | [\p{Mn}]
    | [\p{Nd}]
    | [\p{Pc}]
    | '\u200C'
    | '\u200D'
    ;

Identifier:  IdentifierStart IdentifierPart*;

Hola : 'hola' ;

HolaParser.g4:

parser grammar HolaParser;

options {
    tokenVocab=HolaLexer;
}

program : Hola;


test.js:

import antlr4 from 'antlr4';
const {
  CommonTokenStream,
  InputStream
} = antlr4;
import HolaLexer from '../HolaLexer.js';
import HolaParser from '../HolaParser.js';

var input = "hola";
var chars = new InputStream(input, true)
var lexer = new HolaLexer(chars);
var tokens = new CommonTokenStream(lexer);
var parser = new HolaParser(tokens);

parser.buildParseTrees = false;
const tree = parser.program();

If I remove the Identifier rule and its fragments from the lexer, it works.

HolaLexer.g4 (modified):

lexer grammar HolaLexer;

Hola : 'hola' ;

I took the Identifier rules from the JavaScript grammar.

I'm using the latest ANTLR 4 release: antlr-4.9.3-complete.jar

msoler75 changed the title from "Strange bug that produces 'unexpected token' always error" to "Strange bug that produces 'mismatched input' always error" on Jan 23, 2022

ericvergnaud (Contributor) commented

Hi,
support is handled on the Google discussion group.

Please close this issue.

KvanTTT (Member) commented Jan 23, 2022

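This is quite a frequent mistake. Both rules match hola with the same length, and on a tie the lexer picks the rule defined first, so it emits an Identifier token while the parser expects Hola. You have to put Hola : 'hola' ; before Identifier: IdentifierStart IdentifierPart*; and not after it: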

Hola : 'hola' ;
Identifier:  IdentifierStart IdentifierPart*;

Also, see #1072 and read how lexer rule precedence works on StackOverflow.
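
To make the precedence rule concrete, here is a minimal sketch of a toy model in plain JavaScript. It is not the ANTLR runtime: the `pickRule` helper and the regular expressions standing in for the two lexer rules are assumptions made for illustration only. The model prefers the longest match and, on a tie, keeps the rule listed first.

```javascript
// Toy model of lexer rule selection, NOT the real ANTLR runtime.
// Assumption: each rule is approximated by an anchored regular expression.
function pickRule(rules, input) {
  let best = null;
  for (const { name, pattern } of rules) {
    const m = pattern.exec(input);
    if (m === null) continue;
    // A strictly longer match wins; on a tie the earlier rule is kept.
    if (best === null || m[0].length > best.text.length) {
      best = { name, text: m[0] };
    }
  }
  return best;
}

// Hypothetical stand-ins for the Identifier and Hola rules of HolaLexer.g4.
const identifier = {
  name: 'Identifier',
  pattern: /^[\p{L}$_][\p{L}\p{Mn}\p{Nd}\p{Pc}$_\u200C\u200D]*/u,
};
const hola = { name: 'Hola', pattern: /^hola/ };

console.log(pickRule([identifier, hola], 'hola').name);  // Identifier
console.log(pickRule([hola, identifier], 'hola').name);  // Hola
console.log(pickRule([hola, identifier], 'holas').name); // Identifier
```

With Identifier listed first, the input hola never becomes a Hola token, which is why the message reads mismatched input 'hola' expecting 'hola': the first 'hola' is the text of the Identifier token and the second is the literal of the expected Hola token. The last call shows that a longer match still wins regardless of rule order.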

ANTLR reports the warning "One of the token B values unreachable. x is always overlapped by token A" only for string literals:

A: 'x';
B: 'x';

I suppose ANTLR should also handle cases like yours, but I haven't yet come up with a fast and reasonably general solution. The problem is related to DFA subtraction, and not all rules (even non-recursive ones) can be simply represented as a DFA.
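
For the narrow case in this issue, where a pure string-literal token follows a more general rule, the idea can be sketched without any DFA subtraction. The following is an illustrative check only, not how ANTLR works internally: the `findShadowedLiterals` helper and the regular expression used to approximate the Identifier rule are hypothetical. It flags a literal token when an earlier rule already matches the whole literal.

```javascript
// Narrow illustrative check, NOT ANTLR's implementation.
// Assumption: non-literal rules are approximated by fully anchored regexes.
function findShadowedLiterals(rules) {
  const warnings = [];
  rules.forEach((rule, index) => {
    if (typeof rule.literal !== 'string') return; // only literal rules are checked
    for (const earlier of rules.slice(0, index)) {
      const matchesWhole = typeof earlier.literal === 'string'
        ? earlier.literal === rule.literal
        : earlier.pattern.test(rule.literal);
      if (matchesWhole) {
        warnings.push(`${rule.name} is shadowed by ${earlier.name}`);
        break;
      }
    }
  });
  return warnings;
}

// Hypothetical model of the lexer from this issue, in its original order.
const shadowWarnings = findShadowedLiterals([
  { name: 'Identifier', pattern: /^[\p{L}$_][\p{L}\p{Mn}\p{Nd}\p{Pc}$_\u200C\u200D]*$/u },
  { name: 'Hola', literal: 'hola' },
]);
console.log(shadowWarnings); // [ 'Hola is shadowed by Identifier' ]
```

This only covers literal tokens; deciding whether one arbitrary rule completely covers another is the harder general problem described above.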

msoler75 (Author) commented

Thanks! I'll close this.
