Support for an array of token types in `longer_alt` property #1602

msujew · 2021-08-08T10:20:06Z

Currently, `longer_alt` on a token type accepts only a single other token type. However, suppose our grammar has a division token `/` and separate tokens for multiline and single-line comments, starting with `/*` and `//` respectively. This is common in languages that derive their syntax from C. Using `longer_alt` on the `/` token type currently lets us select only one of the comment token types.

Of course, we could try to merge their definitions, thereby losing some important information. For example, in Langium, we plan to implement the LSP hover API by looking at the last multiline comment that directly precedes the AST element in question. Doing this generically requires us to rely on the token type name. However, this behavior breaks as soon as we have to merge the token type definitions. That outlines my use case for an array of token types in the longer_alt property.

This improvement should be non-breaking, as the API only has to be extended to look like longer_alt: TokenType | TokenType[]. Are you interested in seeing this change? I would gladly contribute a PR.
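To make the proposal concrete, here is a minimal, self-contained sketch of how a lexer loop could treat `longer_alt` as either a single token type or an array. It does not use Chevrotain; the `TokenType` interface, `matchAt`, and `tokenize` below are hypothetical, simplified stand-ins, and the "first longer alternative wins" rule is an assumption of this sketch rather than a statement about Chevrotain's actual behavior.

```typescript
// Hypothetical, simplified token type; not Chevrotain's actual interface.
interface TokenType {
  name: string;
  pattern: RegExp;
  // The proposed shape: one alternative or several.
  longer_alt?: TokenType | TokenType[];
}

interface Token {
  type: string;
  image: string;
}

// Match `type` exactly at `offset` using a sticky copy of its pattern.
// Flags on the original pattern are dropped in this sketch.
function matchAt(type: TokenType, text: string, offset: number): string | null {
  const sticky = new RegExp(type.pattern.source, "y");
  sticky.lastIndex = offset;
  const match = sticky.exec(text);
  return match === null ? null : match[0];
}

function tokenize(text: string, types: TokenType[]): Token[] {
  const tokens: Token[] = [];
  let offset = 0;
  while (offset < text.length) {
    let matched: Token | null = null;
    for (const type of types) {
      const image = matchAt(type, text, offset);
      if (image === null || image.length === 0) continue;
      matched = { type: type.name, image };
      // Normalize the single-value form to an array, then try each alternative.
      const alts =
        type.longer_alt === undefined
          ? []
          : Array.isArray(type.longer_alt)
          ? type.longer_alt
          : [type.longer_alt];
      for (const alt of alts) {
        const altImage = matchAt(alt, text, offset);
        // Assumption: the first alternative with a longer match wins.
        if (altImage !== null && altImage.length > image.length) {
          matched = { type: alt.name, image: altImage };
          break;
        }
      }
      break; // first matching token type in the list wins
    }
    if (matched === null) {
      throw new Error(`Unexpected character at offset ${offset}`);
    }
    tokens.push(matched);
    offset += matched.image.length;
  }
  return tokens;
}

const MultiLineComment: TokenType = { name: "MultiLineComment", pattern: /\/\*[\s\S]*?\*\// };
const SingleLineComment: TokenType = { name: "SingleLineComment", pattern: /\/\/[^\n]*/ };
const Div: TokenType = {
  name: "Div",
  pattern: /\//,
  longer_alt: [MultiLineComment, SingleLineComment],
};
const Num: TokenType = { name: "Num", pattern: /\d+/ };
const WhiteSpace: TokenType = { name: "WhiteSpace", pattern: /\s+/ };

// Div is listed before the comment types, so it would shadow them without longer_alt.
const allTypes = [WhiteSpace, Num, Div, SingleLineComment, MultiLineComment];
const names = tokenize("8 / 2 // half\n/* done */", allTypes)
  .filter((t) => t.type !== "WhiteSpace")
  .map((t) => t.type);
console.log(names.join(" "));
// prints "Num Div Num SingleLineComment MultiLineComment"
```

Both comment token types keep their own names here, which is the information that would be lost by merging them into a single definition.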


bd82 · 2021-08-10T17:21:45Z

Hello @msujew

> Are you interested in seeing this change? I would gladly contribute a PR.

Yes, it seems like a good upgrade.

Note the lexer runtime code is super optimized for performance (and thus "ugly").

https://github.com/Chevrotain/chevrotain/blob/master/packages/chevrotain/src/scan/lexer_public.ts#L566-L605

But I think the performance implications of this feature would not be severe, as
the added conditional/loop would only appear inside a pre-existing and uncommon condition.
This means the extra "work" is only performed for "LONGER_ALT" tokens, which is probably fine.

Comparing dev vs latest release performance can be done using this benchmark:

https://github.com/Chevrotain/chevrotain/tree/master/packages/chevrotain/benchmark_web

The array of token types capability should also be mentioned in these docs:

https://github.com/Chevrotain/chevrotain/blob/e6c1f2a600ac0a31384b426ea6591c480c4a4b91/packages/website/docs/features/token_alternative_matches.md

msujew · 2021-08-10T19:25:44Z

> Note the lexer runtime code is super optimized for performance (and thus "ugly").

Yeah, I already looked into what needs to be changed previously. I'm used to writing ugly code though ;)

Thanks for pointing me to the required processes (especially the performance benchmark). Expect a PR from me within the next few days.

bd82 · 2021-08-15T08:51:44Z

Thanks @msujew, I will review it sometime this week.

bd82 · 2021-10-09T20:17:44Z

Released in 9.1.0:
https://www.npmjs.com/package/chevrotain/v/9.1.0

msujew added a commit to msujew/chevrotain that referenced this issue Aug 10, 2021

feat: support multiple longer_alt tokens (Chevrotain#1602)

b64c165

msujew added a commit to msujew/chevrotain that referenced this issue Aug 10, 2021

feat: support multiple longer_alt tokens (Chevrotain#1602)

952b298

msujew mentioned this issue Aug 10, 2021

Support multiple longer_alt tokens #1605

Merged

msujew added a commit to msujew/chevrotain that referenced this issue Aug 15, 2021

feat: support multiple longer_alt tokens (Chevrotain#1602)

e9d6c40

bd82 pushed a commit that referenced this issue Aug 20, 2021

feat: support multiple longer_alt tokens (#1602)

a94b30f

msujew added a commit to msujew/chevrotain that referenced this issue Aug 26, 2021

feat: support multiple longer_alt tokens (Chevrotain#1602)

6655066

msujew added a commit to msujew/chevrotain that referenced this issue Aug 26, 2021

feat: support multiple longer_alt tokens (Chevrotain#1602)

7f79191

bd82 closed this as completed in #1605 Oct 8, 2021

bd82 pushed a commit that referenced this issue Oct 8, 2021

feat: support multiple longer_alt tokens (#1602) (#1605)

2b1f214
