Lexer optimizations using the next charCode in the remaining text. #679

bd82 · 2018-03-10T23:32:53Z

If we know which characters can start a match for each tokenType,
we can drastically reduce the number of regExps we need to attempt after each token is matched.

This can be done by building a map between the start characters and an array of allowed tokenTypes.
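A minimal sketch of the idea, not Chevrotain's actual implementation: the token definitions, the hand-written `startChars` arrays, and the helpers `buildCharCodeMap` and `matchAt` are all hypothetical names made up for illustration. It assumes the possible start characters of each pattern are already known, and uses sticky (`y`) regExps so each attempt is anchored at the current offset.

```javascript
// Hypothetical token definitions. `startChars` is written by hand here;
// a real lexer would have to derive or be told these characters.
const tokenTypes = [
  { name: "LCurly", pattern: /\{/y, startChars: ["{"] },
  { name: "True", pattern: /true/y, startChars: ["t"] },
  { name: "StringLiteral", pattern: /"[^"]*"/y, startChars: ['"'] },
  { name: "NumberLiteral", pattern: /\d+/y, startChars: "0123456789".split("") }
];

// Build a map: charCode -> array of tokenTypes that may start with that char.
function buildCharCodeMap(types) {
  const map = new Map();
  for (const type of types) {
    for (const ch of type.startChars) {
      const code = ch.charCodeAt(0);
      if (!map.has(code)) {
        map.set(code, []);
      }
      map.get(code).push(type);
    }
  }
  return map;
}

// Attempt only the regExps whose tokenType can start with the next charCode.
function matchAt(text, offset, map) {
  const candidates = map.get(text.charCodeAt(offset)) || [];
  for (const type of candidates) {
    type.pattern.lastIndex = offset; // sticky regExp: match exactly at offset
    const match = type.pattern.exec(text);
    if (match !== null) {
      return { type: type.name, image: match[0] };
    }
  }
  return null; // no tokenType can start with this character
}

const charCodeToTypes = buildCharCodeMap(tokenTypes);
```

With this map, a call such as `matchAt(text, offset, charCodeToTypes)` tries a single regExp for most inputs instead of all four, which is why the gain grows with the number of tokenTypes.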

bd82 · 2018-04-13T16:02:53Z

Benefits

The full flow benchmark has improved by 25% for a simple grammar (JSON),
60% for a medium sized grammar (CSS) and 100% for a grammar with a very large lexer (JDL).
In general, the more TokenTypes there are, the greater the benefit.

When can optimization be applied?

In general, the Lexer always attempts to apply the optimization.
However, in some cases it cannot, and it will silently revert to the old (slower) behavior.
See section 2 of the "How do I Maximize my parser's performance?" FAQ for details on how to ensure a Lexer enjoys these optimizations.

How can the optimization be disabled?

As with any new logic, there may be bugs. If you suspect that the
optimization has caused incorrect behavior, simply enable "safeMode" to disable it and re-check.

     const { Lexer } = require("chevrotain")
     const myLexer = new Lexer([/* tokens */], { safeMode : true })

bd82 · 2018-04-14T09:43:17Z

Released in 3.1.0

bd82 added the Performance label Mar 10, 2018

bd82 added this to the 4.0 milestone Mar 24, 2018

bd82 removed this from the 4.0 milestone Apr 13, 2018

bd82 changed the title ~~Investigate Lexer optimizations when there are many tokenTypes.~~ Lexer optimizations using the next charCode in the remaining text. Apr 13, 2018

bd82 mentioned this issue Apr 13, 2018

Lexer performance boost by inspecting the first character #682

Merged

bd82 closed this as completed Apr 14, 2018
