
Lexer optimizations using the next charCode in the remaining text. #679

Closed
bd82 opened this issue Mar 10, 2018 · 2 comments

Comments

@bd82
Member

bd82 commented Mar 10, 2018

If we know which characters can start each tokenType's match,
we can drastically reduce the number of regExps we need to attempt after each token has been matched.

This can be done by building a map from each possible start character to an array of the tokenTypes that may begin with it.
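A minimal sketch of the idea in plain JavaScript (not Chevrotain's actual implementation). It assumes each token type lists the characters it can start with in a hypothetical `startChars` field; in a real lexer these would have to be derived from the token's pattern. Patterns use the sticky `y` flag so each regexp only matches at the current offset.

```javascript
// Hypothetical token definitions: the `startChars` field is an assumption of this sketch.
const tokenTypes = [
  { name: "WhiteSpace", pattern: /[ \t\n]+/y, startChars: " \t\n", skip: true },
  { name: "Integer", pattern: /[0-9]+/y, startChars: "0123456789" },
  { name: "True", pattern: /true/y, startChars: "t" },
  { name: "Identifier", pattern: /[a-z]+/y, startChars: "abcdefghijklmnopqrstuvwxyz" },
  { name: "Comma", pattern: /,/y, startChars: "," },
];

// Map each start charCode to the token types that may begin with it.
// Types are pushed in definition order, so earlier definitions keep priority.
function buildStartCharMap(types) {
  const map = new Map();
  for (const type of types) {
    for (const ch of type.startChars) {
      const code = ch.charCodeAt(0);
      if (!map.has(code)) map.set(code, []);
      map.get(code).push(type);
    }
  }
  return map;
}

function tokenize(text, types) {
  const startCharMap = buildStartCharMap(types);
  const tokens = [];
  let offset = 0;
  while (offset < text.length) {
    // Only the candidates registered for the next charCode are attempted,
    // instead of every token type.
    const candidates = startCharMap.get(text.charCodeAt(offset)) || [];
    let matched = false;
    for (const type of candidates) {
      type.pattern.lastIndex = offset; // sticky regexp: match exactly here
      const m = type.pattern.exec(text);
      if (m !== null) {
        if (!type.skip) tokens.push({ type: type.name, image: m[0] });
        offset += m[0].length;
        matched = true;
        break; // first match wins
      }
    }
    if (!matched) throw new Error(`Unexpected character at offset ${offset}`);
  }
  return tokens;
}

console.log(tokenize("true, 42, abc", tokenTypes).map((t) => `${t.type}:${t.image}`).join(" "));
// prints "True:true Comma:, Integer:42 Comma:, Identifier:abc"
```

At `t` only two regexps (`True`, `Identifier`) are tried instead of all five; at a digit only `Integer` is tried. The saving grows with the number of token types, since each bucket stays small while the full list keeps growing.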

bd82 added this to the 4.0 milestone Mar 24, 2018
bd82 removed this from the 4.0 milestone Apr 13, 2018
bd82 changed the title from "Investigate Lexer optimizations when there are many tokenTypes." to "Lexer optimizations using the next charCode in the remaining text." Apr 13, 2018
@bd82
Member Author

bd82 commented Apr 13, 2018

Benefits

The full-flow benchmark improved by 25% for a simple grammar (JSON),
60% for a medium-sized grammar (CSS), and 100% for a grammar with a very large lexer (JDL).
In general, the more TokenTypes a lexer has, the greater the benefit.

When can the optimization be applied?

The Lexer always attempts to apply the optimization.
However, in some cases it cannot, and it will silently fall back to the old (slower) behavior.
For details on how to ensure a Lexer benefits from the optimization, see section 2 of the "How do I Maximize my parser's performance?" FAQ.
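One way to picture why the optimization sometimes cannot be applied: a sketch (again not Chevrotain's real analysis) that derives start characters from a small subset of regexp syntax and gives up, returning `null`, on anything it cannot analyze. If even one token is unanalyzable, the hypothetical `buildStartCharMap` returns `null` and a lexer would fall back to trying every token type. The rejected cases here (alternation, negated classes, the `i` flag, optional first atoms) are illustrative assumptions only; the FAQ section referenced above is the authoritative description.

```javascript
// Escapes this sketch understands; any other escape is treated as "unknown".
const ESCAPES = { d: "0123456789", t: "\t", n: "\n", r: "\r" };

// Expand a character-class body such as "a-z0-9_" into its characters, or null.
function classChars(body) {
  let out = "";
  for (let i = 0; i < body.length; i++) {
    if (body[i] === "\\") {
      const esc = ESCAPES[body[i + 1]];
      if (esc === undefined) return null; // unsupported escape in this sketch
      out += esc;
      i++;
    } else if (body[i + 1] === "-" && i + 2 < body.length) {
      // Range such as a-z: add every char code in between (inclusive).
      for (let c = body.charCodeAt(i); c <= body.charCodeAt(i + 2); c++) {
        out += String.fromCharCode(c);
      }
      i += 2;
    } else {
      out += body[i];
    }
  }
  return out;
}

// Possible first characters of a regexp, or null when they cannot be determined.
function firstChars(regex) {
  const src = regex.source;
  // Case-insensitive matching and alternation are not analyzed here.
  if (regex.flags.includes("i") || src.includes("|")) return null;
  let chars;
  let next; // index just past the first atom
  if (src[0] === "[") {
    if (src[1] === "^") return null; // negated class: nearly any char may start
    const end = src.indexOf("]", 1);
    if (end === -1) return null;
    chars = classChars(src.slice(1, end));
    next = end + 1;
  } else if (src[0] === "\\") {
    const e = src[1];
    chars = e in ESCAPES ? ESCAPES[e] : /\W/.test(e) ? e : null; // e.g. \. or \,
    next = 2;
  } else if ("().*+?^${".includes(src[0])) {
    return null; // groups, wildcards, anchors: not analyzed here
  } else {
    chars = src[0];
    next = 1;
  }
  // A possibly optional first atom (?, *, or any {n,m}, conservatively)
  // means a later atom could start the match.
  if (chars === null || (next < src.length && "?*{".includes(src[next]))) return null;
  return chars;
}

// charCode -> candidate token types, or null if any token's start chars are unknown.
function buildStartCharMap(tokenTypes) {
  const map = new Map();
  for (const type of tokenTypes) {
    const chars = firstChars(type.pattern);
    if (chars === null) return null; // caller falls back to trying every token type
    for (const ch of new Set(chars)) {
      const code = ch.charCodeAt(0);
      if (!map.has(code)) map.set(code, []);
      map.get(code).push(type);
    }
  }
  return map;
}

const optimizable = [
  { name: "Integer", pattern: /[0-9]+/ },
  { name: "Comma", pattern: /,/ },
  { name: "WhiteSpace", pattern: /[ \t\n]+/ },
];
// A single token with unknown start chars disables the map for the whole lexer.
const notOptimizable = [...optimizable, { name: "Select", pattern: /select/i }];

console.log(buildStartCharMap(optimizable) !== null); // true
console.log(buildStartCharMap(notOptimizable)); // null
```

Falling back for the whole lexer rather than per token keeps the sketch simple and mirrors the "silently revert to the old behavior" description; a finer-grained design could instead add unanalyzable tokens to every bucket.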

How can the optimization be disabled?

As with any new logic, there may be bugs. If you suspect that the
optimization has caused incorrect behavior, enable "safeMode" to disable it and re-check.

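     const { Lexer } = require("chevrotain")
     // safeMode: true disables the start-character optimization (useful when debugging suspected lexer bugs)
     const myLexer = new Lexer([/* tokens */], { safeMode: true })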

@bd82
Member Author

bd82 commented Apr 14, 2018

Released in 3.1.0

bd82 closed this as completed Apr 14, 2018