Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for an array of token types in longer_alt property #1602

Closed
msujew opened this issue Aug 8, 2021 · 4 comments · Fixed by #1605
Closed

Support for an array of token types in longer_alt property #1602

msujew opened this issue Aug 8, 2021 · 4 comments · Fixed by #1605

Comments

@msujew
Copy link
Collaborator

msujew commented Aug 8, 2021

Currently, longer_alt on token types only accepts a single other token type. However, let's say we have a division token like / in our grammar and single tokens for multiline and singleline comments, starting with /* and // respectively. This is common for every language that derives its syntax from C. Using longer_alt in the / token type currently allows us to select only one of the comment token types.

Of course, we could try to merge their definitions, thereby losing some important information. For example, in Langium, we plan to implement the LSP hover API by looking at the last multiline comment that directly precedes the AST element in question. Doing this generically requires us to rely on the token type name. However, this behavior breaks as soon as we have to merge the token type definitions. That outlines my use case for an array of token types in the longer_alt property.

This improvement should be non-breaking, as the API only has to be extended to look like longer_alt: TokenType | TokenType[]. Are you interested in seeing this change? I would gladly contribute a PR.

@bd82
Copy link
Member

bd82 commented Aug 10, 2021

Hello @msujew

Are you interested in seeing this change? I would gladly contribute a PR.

Yes, it seems like a good upgrade.

Note the lexer runtime code is super optimized for performance (and thus "ugly").

But I think the performance implications of this feature would not be severe as
the added conditional/loop would only appear inside a pre-existing and uncommon condition.
Meaning that extra "work" that is only performed for "LONGER_ALT" tokens is probably fine.

Comparing dev vs latest release performance can be done using this benchmark:

The array of token types capability should also be mentioned in these docs

@msujew
Copy link
Collaborator Author

msujew commented Aug 10, 2021

Note the lexer runtime code is super optimized for performance (and thus "ugly").

Yeah, I already looked into what needs to be changed previously. I'm used to writing ugly code though ;)

Thanks for setting me up with the required processes (especially the performance benchmark). Expect a PR from me within the next days.

msujew added a commit to msujew/chevrotain that referenced this issue Aug 10, 2021
msujew added a commit to msujew/chevrotain that referenced this issue Aug 10, 2021
@bd82
Copy link
Member

bd82 commented Aug 15, 2021

Thanks @msujew I will review it sometime this week.

msujew added a commit to msujew/chevrotain that referenced this issue Aug 15, 2021
msujew added a commit to msujew/chevrotain that referenced this issue Aug 26, 2021
msujew added a commit to msujew/chevrotain that referenced this issue Aug 26, 2021
@bd82 bd82 closed this as completed in #1605 Oct 8, 2021
@bd82
Copy link
Member

bd82 commented Oct 9, 2021

release in 9.1.0
https://www.npmjs.com/package/chevrotain/v/9.1.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants