Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not unnecessarily rescan a string for a literal in a RegExp disjunction #754

Merged

Conversation

goyakin
Copy link

@goyakin goyakin commented Apr 6, 2016

Assume we have the RegExp /<(foo|bar)/ and the input string "<0bar<0bar<0bar".
When we try to match the string, we first scan it fully for "foo", but can't
find it. Then we scan for "bar" and find it at index 2. However, since there is
no '<' right before it, we continue with our search. We currently do the same
thing two more times starting at indexes 7 and 12 (since those are where the
"bar"s are), each time scanning the rest of the string fully for "foo".

However, if we cache the furthest offsets we tried, we can skip the searches
for "foo" after the first time.

This change has no effect on the general perf.

VSO bug: 4540543
jQuery issue: jquery/jquery#2563

…ction

Assume we have the RegExp /<(foo|bar)/ and the input string "<0bar<0bar<0bar".
When we try to match the string, we first scan it fully for "foo", but can't
find it. Then we scan for "bar" and find it at index 2. However, since there is
no '<' right before it, we continue with our search. We currently do the same
thing two more times starting at indexes 7 and 12 (since those are where the
"bar"s are), each time scanning the rest of the string fully for "foo".

However, if we cache the furthest offsets we tried, we can skip the searches
for "foo" after the first time.
@goyakin
Copy link
Author

goyakin commented Apr 6, 2016

@abchatra & @akroshg, could you take a look?

@goyakin
Copy link
Author

goyakin commented Apr 8, 2016

Ping...

matcher.literalNextSyncInputOffsets =
RecyclerNewArrayLeaf(matcher.recycler, CharCount, ScannersMixin::MaxNumSyncLiterals);
}
CharCount* literalNextSyncInputOffsets = matcher.literalNextSyncInputOffsets;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the assert here for numLiterals is lesser than MaxNumSincLiterals.
Rest looks good to me.

@chakrabot chakrabot merged commit d72cfaa into chakra-core:release/1.2 Apr 12, 2016
chakrabot pushed a commit that referenced this pull request Apr 12, 2016
…RegExp disjunction

Merge pull request #754 from goyakin:regexp-literal-scanner-perf
chakrabot pushed a commit that referenced this pull request Apr 12, 2016
…literal in a RegExp disjunction

Merge pull request #754 from goyakin:regexp-literal-scanner-perf
@goyakin goyakin deleted the regexp-literal-scanner-perf branch April 14, 2016 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants