Error using MANY with OR alternatives: NoViableAltException #867
Executable example (standalone: copy, paste, run):

```javascript
const {
  createToken,
  Lexer,
  Parser,
} = require('chevrotain') // Version 4.1.0

const Whitespace = createToken({
  name: 'Whitespace',
  pattern: /\s+/,
  group: Lexer.SKIPPED,
})

const SectionName = createToken({
  name: 'SectionName',
  pattern: /section|subsection|paragraph|subparagraph|clause|subclause/i,
})

const SectionRef = createToken({
  name: 'SectionRef',
  pattern: /\d+/,
})

const SubRef = createToken({
  name: 'SubRef',
  pattern: /\(\w+\)/,
})

const Repealed = createToken({
  name: 'Repealed',
  pattern: /is repealed/i,
})

const Replaced = createToken({
  name: 'Replaced',
  pattern: /is replaced/i,
})

const tokenList = [
  Whitespace,
  SectionName,
  SectionRef,
  SubRef,
  Repealed,
  Replaced,
]

const tokenVocabulary = tokenList.reduce((map, tokenType) => {
  map[tokenType.name] = tokenType
  return map
}, {})

class CommandParser extends Parser {
  constructor() {
    super(tokenVocabulary, { outputCst: false })

    /**
     * The root rule.
     */
    this.command = null

    /**
     * Main rules.
     */
    this.replace = null
    this.repeal = null

    /**
     * Subrules.
     */
    this.pointRef = null

    /**
     * The entry rule.
     */
    this.RULE('command', () => {
      const command = this.OR([
        { ALT: () => this.SUBRULE(this.replace) },
        { ALT: () => this.SUBRULE(this.repeal) },
      ])
      return command
    })

    /**
     * First alternative.
     */
    this.RULE('replace', () => {
      const sectionType = this.CONSUME(SectionName).image
      const pointRef = this.SUBRULE(this.pointRef)
      this.CONSUME(Replaced)
      return {
        commandType: 'replace',
        sectionType,
        pointRef,
      }
    })

    /**
     * Second alternative.
     */
    this.RULE('repeal', () => {
      const sectionType = this.CONSUME(SectionName).image
      const pointRef = this.SUBRULE(this.pointRef)
      this.CONSUME(Repealed)
      return {
        commandType: 'repeal',
        sectionType,
        pointRef,
      }
    })

    // Section, Subsection, etc.
    this.RULE('sectionType', () => {
      const token = this.CONSUME(SectionName).image
      return token
    })

    // 2, 5(3), 4(7)(b)
    this.RULE('pointRef', () => {
      const pointRef = []
      // Point refs begin with a section ref.
      pointRef.push(this.CONSUME(SectionRef).image)
      /**
       * A section ref is followed by zero or more subrefs.
       * Causes NoViableAltException error.
       */
      this.MANY(() => {
        const subref = this.CONSUME(SubRef).image
        pointRef.push(subref)
      })
      /**
       * Same error as MANY above (NoViableAltException).
       */
      // this.OPTION(() => {
      //   this.AT_LEAST_ONE(() => {
      //     const subref = this.CONSUME(SubRef).image
      //     pointRef.push(subref)
      //   })
      // })
      return pointRef
    })

    this.performSelfAnalysis()
  }
}

// The single parser instance, reused for every input.
const parser = new CommandParser()

/**
 * Lex and parse the input, return the parsed command.
 */
function parseCommand(input) {
  const commandLexer = new Lexer(tokenList)
  const lexingResult = commandLexer.tokenize(input)
  const tokenNames = lexingResult.tokens.map((token) => token.tokenType.name)
  parser.input = lexingResult.tokens
  const command = parser.command()
  if (parser.errors.length > 0) {
    console.log('error tokens:', tokenNames)
    console.log(parser.errors)
    return
  }
  console.log('success tokens:', tokenNames)
  return command
}

// Only run from CLI
if (!module.parent) {
  const inputs = [
    'Section 5 is replaced', // success
    'Subsection 2(2) is repealed', // success
    'Paragraph 4(a)(i) is replaced', // failure
  ]
  for (const input of inputs) {
    const result = parseCommand(input)
    if (result) {
      console.log('success result:', result)
    }
  }
}
```
With either of the alternative rules alone, it works fine:

```javascript
this.RULE('command', () => {
  const command = this.OR([
    { ALT: () => this.SUBRULE(this.replace) },
    // { ALT: () => this.SUBRULE(this.repeal) },
  ])
  return command
})

const inputs = [
  'Section 5 is replaced', // success
  // 'Subsection 2(2) is repealed',
  'Paragraph 4(a)(i) is replaced', // success
]

/* Output
success tokens: [ 'SectionName', 'SectionRef', 'Replaced' ]
success result: {
  commandType: 'replace',
  sectionType: 'Section',
  pointRef: [ '5' ]
}
success tokens: [ 'SectionName', 'SectionRef', 'SubRef', 'SubRef', 'Replaced' ]
success result: {
  commandType: 'replace',
  sectionType: 'Paragraph',
  pointRef: [ '4', '(a)', '(i)' ]
}
*/
```

or:

```javascript
this.RULE('command', () => {
  const command = this.OR([
    // { ALT: () => this.SUBRULE(this.replace) },
    { ALT: () => this.SUBRULE(this.repeal) },
  ])
  return command
})

const inputs = [
  // 'Section 5 is replaced',
  'Subsection 2(2) is repealed', // success
  // change to repealed ---v
  'Paragraph 4(a)(i) is repealed', // success
]

/* Output
success tokens: [ 'SectionName', 'SectionRef', 'SubRef', 'Repealed' ]
success result: {
  commandType: 'repeal',
  sectionType: 'Subsection',
  pointRef: [ '2', '(2)' ]
}
success tokens: [ 'SectionName', 'SectionRef', 'SubRef', 'SubRef', 'Repealed' ]
success result: {
  commandType: 'repeal',
  sectionType: 'Paragraph',
  pointRef: [ '4', '(a)', '(i)' ]
}
*/
```
Thanks for reporting this @jabney, and particularly for providing an easy-to-reproduce example. 👍

I believe there are several issues here:

1. Grammar ambiguity

Chevrotain is an LL(K) parsing toolkit. This means that it can look up to K tokens ahead when choosing between alternatives, e.g.:

```
main:
   alt1 |
   alt2
;

alt1:
   A (B)* C

alt2:
   A (B)* D
```

So for any K = N we could create an input with N-1 "B" tokens: the first N tokens (A followed by N-1 B's) are identical for both alternatives, and the deciding C or D token only appears at position N+1, beyond the lookahead. To resolve this you could either:

2. Failed ambiguity detection

This ambiguity should be detected, and the parser should throw an error during initialization. It is a bug, see: #854

3. This kind of ambiguity is not documented on the website

See: #853

4. The lookahead calculation is missing one possible path

This looks like a bug too. With the default lookahead (K=4), we would expect the list of possible tokens to include:

Of course, either alternative could start with this sequence, so in the case of the ambiguity

Suggestion / Workaround

At this point you will need to refactor the grammar to be LL(K).

I will prioritize investigating the bugs described here, because their combination
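To see the ambiguity from item 1 without any parser involved, the sketch below (plain JavaScript, not Chevrotain code; `buildInput` and `distinguishableWithin` are hypothetical helpers) builds inputs for `alt1: A (B)* C` and `alt2: A (B)* D` and checks whether their first K tokens differ, which is all a fixed-lookahead decision gets to see.

```javascript
// Hypothetical helper: the token sequence A B...B <last> for one alternative.
function buildInput(numB, last) {
  return ['A', ...Array(numB).fill('B'), last]
}

// A fixed-lookahead (LL(K)) decision can only succeed if the first k tokens
// of the two candidate inputs differ somewhere.
function distinguishableWithin(k, numB) {
  const alt1Prefix = buildInput(numB, 'C').slice(0, k)
  const alt2Prefix = buildInput(numB, 'D').slice(0, k)
  return alt1Prefix.some((token, index) => token !== alt2Prefix[index])
}

// With K - 2 B tokens the deciding C/D token is still inside the window;
// with K - 1 B tokens it sits at position K + 1, just outside it.
for (const k of [2, 4, 6]) {
  // prints "K=2: true false", then the same pattern for K=4 and K=6
  console.log(`K=${k}:`, distinguishableWithin(k, k - 2), distinguishableWithin(k, k - 1))
}
```

The second value is false for every K, which is the point above: any finite lookahead can be defeated by a long enough run of B tokens.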
@bd82 your thoughtful explanation has helped me understand some issues with my grammar that would have bitten me later, namely that the

I will try the LL(k) refactoring you suggested and see how that goes with the default lookahead. My actual grammar is quite a bit more complex and arbitrary than the provided example suggests, with several disparate and non-regular input types, so this might be tricky. Alternatively, I suspect that I can move the

and then handle extracting the values a little differently in the logic. However, I will study your refactoring example and try to think more in LL(k) terms when constructing the grammar, as this is really the best approach.

Thanks again for your detailed answer, and do let me know if there's anything else I can help with for reproducing the bugs.

All the best,
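One way to read the LL(k) refactoring discussed above: `replace` and `repeal` share the prefix `SectionName pointRef`, so that prefix can be parsed once and the choice deferred to the single token that actually differs. The sketch below is a hand-written recursive-descent parser in plain JavaScript rather than Chevrotain code, so it runs without dependencies; `TOKEN_TYPES`, `tokenize` and `parseLeftFactored` are hypothetical names that mirror the token definitions in the issue. In a Chevrotain grammar the same idea would roughly be a single rule that consumes the shared prefix and ends in an `OR` over the `Replaced` and `Repealed` tokens.

```javascript
// Token patterns mirroring the createToken() definitions in the issue.
// The sticky flag (y) anchors each match at the current offset.
const TOKEN_TYPES = [
  { name: 'Whitespace', pattern: /\s+/y, skip: true },
  { name: 'SectionName', pattern: /section|subsection|paragraph|subparagraph|clause|subclause/iy },
  { name: 'SectionRef', pattern: /\d+/y },
  { name: 'SubRef', pattern: /\(\w+\)/y },
  { name: 'Repealed', pattern: /is repealed/iy },
  { name: 'Replaced', pattern: /is replaced/iy },
]

function tokenize(text) {
  const tokens = []
  let pos = 0
  while (pos < text.length) {
    let matched = false
    for (const { name, pattern, skip } of TOKEN_TYPES) {
      pattern.lastIndex = pos
      const match = pattern.exec(text)
      if (match) {
        if (!skip) tokens.push({ name, image: match[0] })
        pos += match[0].length
        matched = true
        break
      }
    }
    if (!matched) throw new Error(`Unexpected character at offset ${pos}`)
  }
  return tokens
}

// Left-factored grammar (hypothetical refactoring of the issue's grammar):
//   command  : SectionName pointRef (Replaced | Repealed)
//   pointRef : SectionRef SubRef*
function parseLeftFactored(input) {
  const tokens = tokenize(input)
  let i = 0
  const nextIs = (name) => i < tokens.length && tokens[i].name === name
  const consume = (name) => {
    if (!nextIs(name)) {
      const found = i < tokens.length ? tokens[i].name : 'end of input'
      throw new Error(`Expected ${name} but found ${found}`)
    }
    return tokens[i++].image
  }

  // The shared prefix is parsed exactly once, however many SubRefs it has.
  const sectionType = consume('SectionName')
  const pointRef = [consume('SectionRef')]
  while (nextIs('SubRef')) pointRef.push(consume('SubRef'))

  // The only decision left needs a single token of lookahead.
  let commandType
  if (nextIs('Replaced')) {
    consume('Replaced')
    commandType = 'replace'
  } else {
    consume('Repealed')
    commandType = 'repeal'
  }
  if (i !== tokens.length) throw new Error(`Unexpected token ${tokens[i].name}`)
  return { commandType, sectionType, pointRef }
}

// → commandType 'replace', sectionType 'Paragraph', pointRef ['4', '(a)', '(i)']
console.log(parseLeftFactored('Paragraph 4(a)(i) is replaced'))
```

Because the number of SubRef tokens no longer affects which alternative is chosen, the third input from the issue is handled the same way as the first two.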
If the common prefix could be infinitely long, then no amount of lookahead would suffice. I am using backtracking myself in the development of a Java parser because

I tend to agree with the statement: while some parsing tools have greater capabilities (e.g. handling left recursion or LL(*) grammars), there is an advantage to keeping things simple, particularly when considering how easily others can develop tools for a language you have created. For example, see the Python approach to grammar design:
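For comparison, backtracking sidesteps the fixed-K limit by trying an alternative and rewinding to the starting position if it fails. The sketch below is plain JavaScript applied to the abstract `alt1: A (B)* C` / `alt2: A (B)* D` grammar from earlier in the thread; `ParseFailure`, `firstSuccessful`, `expectToken` and `parseMain` are hypothetical names, and this is not Chevrotain's own backtracking API.

```javascript
// Signals an ordinary parse failure that backtracking may recover from.
class ParseFailure extends Error {}

// Hypothetical helper: try each alternative from the same start position,
// rewinding on failure, and return the result of the first that succeeds.
function firstSuccessful(state, alternatives) {
  const start = state.pos
  for (const alternative of alternatives) {
    try {
      return alternative(state)
    } catch (err) {
      if (!(err instanceof ParseFailure)) throw err
      state.pos = start // backtrack to where this alternative began
    }
  }
  throw new ParseFailure(`No alternative matched at token ${start}`)
}

function expectToken(state, name) {
  if (state.tokens[state.pos] !== name) {
    throw new ParseFailure(`Expected ${name} at token ${state.pos}`)
  }
  state.pos++
}

// alt: A (B)* last
function sequenceEndingIn(last, label) {
  return (state) => {
    expectToken(state, 'A')
    while (state.tokens[state.pos] === 'B') state.pos++
    expectToken(state, last)
    return label
  }
}

// main: alt1 | alt2, with alt1 = A (B)* C and alt2 = A (B)* D
function parseMain(tokens) {
  const state = { tokens, pos: 0 }
  const result = firstSuccessful(state, [
    sequenceEndingIn('C', 'alt1'),
    sequenceEndingIn('D', 'alt2'),
  ])
  if (state.pos !== tokens.length) throw new ParseFailure('Unexpected trailing tokens')
  return result
}

// Ten B tokens would defeat LL(4), but backtracking still picks alt2.
console.log(parseMain(['A', ...Array(10).fill('B'), 'D'])) // prints alt2
```

The trade-off is speed: every failed alternative re-scans the shared prefix, so the work grows with the prefix length times the number of alternatives tried, which is one reason to prefer an LL(k) grammar where the input language allows it.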
I couldn't keep from laughing at that.
In theory it could be infinite, but empirically the depth goes up to 6 known repetitions. Even so, I don't want to jack up the lookahead, as I need to process potentially hundreds of inputs as quickly as possible, unless I decide to implement some sort of preprocessing and caching mechanism as a last resort. In some of my early performance checks I was able to process hundreds of inputs in 20-30 ms, iirc, and that is quite good for my use case.
I'll keep an eye out for an updated version and look into how to make use of some backtracking if needed. Cheers!
😄
Do you start a new process for each input or use the same process and parser instance?
Version 4.1.1 should be on npm once the Travis build finishes. I will check this later to ensure it succeeded.
I create a single instance of the parser and then throw all the inputs at that single instance:

```typescript
class CommandParser extends Parser { ... }

// The single parser instance.
const parser = new CommandParser()

export function parseCommand(input: string): ICommand {
  const { tokens } = tokenize(input)
  parser.input = tokens
  const command = parser.command()
  if (parser.errors.length > 0) {
    throw new ParseError(input, parser.errors)
  }
  return command
}

...

// Create some inputs and parse them for quick debugging.
if (!module.parent) {
  const inputs: string[] = [ ... ]
  for (const input of inputs) {
    const result = parseCommand(input)
    console.log(result)
  }
}
```

BTW I just ran 676 inputs in ~24 milliseconds using my previous attempt at a grammar:
For my purposes this exceeds expectations. 😀
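To reproduce a figure like "676 inputs in ~24 ms", one option is to time a whole batch through a single, reused parse function. The sketch below uses Node's built-in `perf_hooks` module; `timeBatch` and the `splitWords` stand-in are hypothetical, so substitute the real `parseCommand` to measure an actual grammar.

```javascript
const perfHooks = require('perf_hooks')

// Hypothetical helper: push every input through one parse function (and so
// one reused parser instance) and report wall-clock totals.
function timeBatch(parse, inputs) {
  const start = perfHooks.performance.now()
  const results = inputs.map((input) => parse(input))
  const totalMs = perfHooks.performance.now() - start
  const perInputMs = inputs.length > 0 ? totalMs / inputs.length : 0
  return { results, totalMs, perInputMs }
}

// Stand-in for parseCommand so the sketch runs on its own; swap in the real one.
const splitWords = (input) => input.split(/\s+/)

const inputs = Array.from({ length: 676 }, (_, n) => `Section ${n} is replaced`)
timeBatch(splitWords, inputs) // warm-up run so JIT compilation is not timed
const { totalMs, perInputMs } = timeBatch(splitWords, inputs)
console.log(`${inputs.length} inputs in ${totalMs.toFixed(1)} ms (${perInputMs.toFixed(3)} ms each)`)
```

Timing one batch rather than each input keeps timer overhead out of the measurement, and the warm-up call reflects that V8 optimizes hot code only after it has run for a while.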
That was fast! 🥇
I am glad the performance is adequate. Chevrotain is quite fast, see: https://sap.github.io/chevrotain/performance/
That bug seems to have produced multiple strange behaviors, so it was time to squash it 🐞
Fair enough. For me it means processing 1,000 inputs in roughly 36 ms depending on my final grammar. 😂
That's one of the reasons I decided to use it. I need speed. The other reasons were that it's easy to get up and running, and the online docs and tutorial are excellent, especially for someone just starting out. BTW I can confirm that with
Thanks much for the bug fix and all the help. It's greatly appreciated. 😎
Great, glad to hear 😄
Thanks for confirmation.
You are welcome.
I'm having an issue with a `NoViableAltException` error when using two different (but similar) alternatives in an `OR` rule. It looks like a bug, but it's quite possibly user error.

Chevrotain version: 4.1.0
Node version: 8.9.2

I've simplified my grammar to highlight the issue:

Input:

The result is a `NoViableAltException` on the third input.

Result:

Tokens (token names):

However, if I remove one of the alternatives in the `command` rule, the third input works fine:

I'll include a complete executable example in a follow-up comment.

Apologies in advance if it's not a bug and I'm just missing something that's already documented.