Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recovery Re-Sync logic ignores Token Categories #1055

Closed
bd82 opened this issue Oct 15, 2019 · 5 comments · Fixed by #1756
Closed

Recovery Re-Sync logic ignores Token Categories #1055

bd82 opened this issue Oct 15, 2019 · 5 comments · Fixed by #1756
Labels

Comments

@bd82
Copy link
Member

bd82 commented Oct 15, 2019

It seems the tokenMatcher function is not used which means
the token categories are not handled properly.

Example:

Possibly also affects this:

This may cause the Parser to re-sync (throw out) more input than needed to recover.

@bd82 bd82 added the Bug 🪲 label Oct 15, 2019
@AlansCodeLog
Copy link

AlansCodeLog commented Jan 23, 2021

Edit: Nevermind, I think the problem was something else and also maybe I misunderstood how the recovery is supposed to work*, but, I got things working again and recovering how I wanted.

Just ran into this. I'm writing a parser that takes both symbol and non-symbol operators, so they are distinct token types (since the word operators must use longer_alt but not the symbol ones), but the operators have a category to be able to consume them since MANY_SEP won't take rules, unfortunately using it leads to this issue.

The following workaround works but is very wordy since DEF must be written twice.

// $.MANY_SEP({ SEP: OPERATOR_AND, DEF })
$.OR([
	{
		GATE: () => $.LA(2).tokenType === SYM_AND,
		ALT: () => $.MANY_SEP({ SEP: SYM_AND, DEF }),
	},
	{
		GATE: () => $.LA(2).tokenType !== SYM_AND,
		ALT: () => $.MANY_SEP2({ SEP: WORD_AND, DEF2 })
	},
])

Is there any danger in just using js control flows like this if it seems to work?

if ($.LA(2).tokenType === SYM_AND) {
	$.MANY_SEP({ SEP: SYM_AND, DEF })
} else {
	$.MANY_SEP({ SEP: WORD_AND, DEF })
}

@medihack
Copy link
Contributor

medihack commented Jan 30, 2022

I am writing an XML-like parser where I need some recovery for something like < <Foo />. I had a hard time, as I could not get the re-sync to work. Then I found a hint to this issue in the xml-tools repository. It seems I could fix this with overwriting findReSyncTokenType, too. But it feels more like a hack as it is not even a public API (no type definitions). From the source code of xml-tools it seems very easy to fix it upstream in Chevrotain. Is there a reason why it is not fixed yet?
I also wonder a bit if more of those recovery APIs should be public and with type definitions (especially as I already hat the problem with #1753).

@bd82
Copy link
Member Author

bd82 commented Jan 31, 2022

it seems very easy to fix it upstream in Chevrotain. Is there a reason why it is not fixed yet?

Priorities / not affecting my use cases so much / leaving some easy to fix items for contributors / ...

I also wonder a bit if more of those recovery APIs should be public and with type definitions

Yeah, that sounds like an improvement for one of the big (and somewhat unique) features of fault tolerance.

@medihack
Copy link
Contributor

Chevrotain is really a nice project and makes my life so much easier as I create a quite complex medical template language (planned to be released later this year under an Open Source license). I also create the complete tooling around it (language service with Monaco and VS Code integration). So along the way, I will see where I can contribute.

@bd82
Copy link
Member Author

bd82 commented Jan 31, 2022

I will see where I can contribute.

Awesome 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants