Use RegExp sticky flag for lexer performance boost. #380

Closed
11 tasks done
bd82 opened this issue Mar 11, 2017 · 9 comments · Fixed by #405

Comments


bd82 commented Mar 11, 2017

Modify the RegExp's lastIndex instead of repeatedly chopping off parts of the input string (see the sketch after the task list).

  • POC.
  • Basic flows.
  • Legacy mode (IE11 & friends).
  • Fix existing tests.
  • Tests for legacy mode.
  • Validation that the start-of-input anchor ("^") is not used.
  • 100% code coverage.
  • Modify Indentation example.
  • Update Custom Token Docs.
  • Document breaking changes.
  • Update the Safari-hack custom pattern in the performance benchmark.
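
A minimal sketch of the core idea (illustrative only, not Chevrotain's actual implementation): with the "y" (sticky) flag a RegExp only matches starting exactly at its lastIndex, so a lexer can advance lastIndex through the input instead of allocating a new substring per token.

    // Sticky ("y") regexps only match starting exactly at lastIndex.
    const intPattern = /\d+/y;

    function matchIntAt(text, offset) {
        intPattern.lastIndex = offset; // start matching exactly here
        const result = intPattern.exec(text);
        // with "y", exec returns null unless the match begins at lastIndex
        return result === null ? null : result[0];
    }

    matchIntAt("abc123", 3); // "123"
    matchIntAt("abc123", 0); // null -- no digit at offset 0
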
@bd82 bd82 changed the title Investigate Lexer performance optimization Investigate Lexer performance optimization. Mar 11, 2017

bd82 commented Mar 12, 2017

A small POC showed an 8-10% improvement in an E2E (lex + parse) JSON benchmark.
So that may be as much as a 20% boost for the lexer itself.
Quite amazing that 1/5 of the lexer time may have been spent chopping up strings with substr...

Getting this to production quality requires handling a few issues:

  • The Custom Token patterns API may have to be modified.
  • How to avoid an explosion of Lexer variants.
  • How to optionally support this capability only on newer JS engines where the sticky flag is available (a possible detection approach is sketched below).
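
One possible way to handle the last point (an assumption on my part, not necessarily what Chevrotain ends up doing) is a runtime feature check: try to construct a sticky RegExp and fall back to the legacy mode if it throws.

    let stickySupported = true;
    try {
        // throws a SyntaxError on engines without sticky support
        new RegExp("", "y");
    } catch (e) {
        stickySupported = false; // e.g. IE11 or node.js 4
    }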


leeoniya commented Mar 12, 2017

avoiding string allocation in hot code via substr, substring, slice is a good perf optimization. you're also better off checking string prefixes (if needed) via array access to individual chars and plain for loops [1].

btw, native Array.forEach loops are slow.

[1] http://stackoverflow.com/a/4579228
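
A sketch of the prefix check described above (startsWithAt is a hypothetical helper, not from any library): comparing char codes directly avoids allocating a substring for the prefix.

    function startsWithAt(text, prefix, offset) {
        // compare char codes one by one -- no intermediate string is created
        for (let i = 0; i < prefix.length; i++) {
            if (text.charCodeAt(offset + i) !== prefix.charCodeAt(i)) {
                return false;
            }
        }
        return true;
    }

    startsWithAt("function foo()", "function", 0); // true
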


bd82 commented Mar 12, 2017

Thanks for the feedback @leeoniya

also better off checking string prefixes (if needed) via array access to individual chars

The Chevrotain Lexer is based on Regular expressions, so there is no access to individual chars.
The lexer just tries matching against the provided patterns in a loop until it finds a match.

avoiding string allocation in hot code via substr, substring, slice is a good perf optimization.

I'm already using substring. The optimization I'm talking about here is to avoid using substring entirely: by using the new RegExp sticky flag combined with setting the RegExp's lastIndex property, it is possible to match from any point in the string.

Currently Chevrotain is chopping off string prefixes using substring.
So if there are 10,000 tokens, substring would have been called 10,000 times during lexing...
That is quite expensive...
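
Roughly the before/after shape of the hot loop (a simplified sketch assuming a single token pattern and fully tokenizable input, not the real Chevrotain code):

    // Before: chop off the matched prefix after every token --
    // one string allocation per token.
    function lexLegacy(text, pattern) { // pattern anchored with "^"
        const tokens = [];
        let remaining = text;
        while (remaining.length > 0) {
            const match = pattern.exec(remaining);
            tokens.push(match[0]);
            remaining = remaining.substring(match[0].length); // allocates!
        }
        return tokens;
    }

    // After: move lastIndex through the original string -- no allocations.
    function lexSticky(text, pattern) { // pattern uses the "y" flag
        const tokens = [];
        let offset = 0;
        while (offset < text.length) {
            pattern.lastIndex = offset;
            const match = pattern.exec(text);
            tokens.push(match[0]);
            offset += match[0].length;
        }
        return tokens;
    }

    lexLegacy("ab 12", /^\w+|^\s+/); // ["ab", " ", "12"]
    lexSticky("ab 12", /\w+|\s+/y);  // ["ab", " ", "12"]
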

btw, native Array.forEach loops are slow.

Aye, I use good old-fashioned for loops in hot-spot code.
For less performance-critical code I use a small set of utility functions
in the lodash/underscore style.

@leeoniya

I'm already using substring, The optimization I'm talking about here is to completely avoid using substring.

right, i agree that avoiding it is the right way to go. i did a similar optimization in one of my libs recently and was also surprised by how big the boost was.

@bd82 bd82 changed the title Investigate Lexer performance optimization. Use RegExp sticky flag for performance optimizations. Mar 18, 2017
@bd82 bd82 changed the title Use RegExp sticky flag for performance optimizations. Use RegExp sticky flag for performance boost. Mar 18, 2017
@bd82 bd82 changed the title Use RegExp sticky flag for performance boost. Use RegExp sticky flag for lexer performance boost. Mar 18, 2017

bd82 commented Mar 19, 2017

Breaking Changes

  • Custom Token patterns used to be called with three arguments.

         function matchInteger(text, matchedTokens, groups) {}

    There are now four arguments.

         function matchInteger(text, offset, matchedTokens, groups) {}

The custom match is expected to be performed beginning from the offset argument in the text.
Previously the custom match always assumed the match must be performed from the start of the input text.
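
For example, a custom pattern under the new signature could itself use a sticky RegExp so the match begins at the given offset (a sketch; returning an exec-style result or null is assumed here):

    const integerPattern = /\d+/y;

    function matchInteger(text, offset, matchedTokens, groups) {
        integerPattern.lastIndex = offset; // match from offset, not from index 0
        return integerPattern.exec(text);  // exec result on success, null otherwise
    }
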


bd82 commented Mar 19, 2017

Re-opening as a reminder to update the Safari custom pattern in the performance benchmark.

@bd82 bd82 reopened this Mar 19, 2017

bd82 commented Mar 19, 2017

Final performance results showed an almost 30% improvement for a simple JSON lexer.
More complex lexers should show smaller gains, as a lower percentage of their runtime
is spent performing substrings on the input.


bd82 commented Mar 19, 2017

This is only applicable to modern JS engines.
On node.js version 4 the Lexer will run in the old legacy mode without any performance benefit.
Same on IE11.

But anyone using IE11 is doomed anyways 😄


bd82 commented Mar 19, 2017

The Safari-related performance workaround was no longer needed and has been removed.
