Improve performance of Regex ctor and IsMatch #231

Merged
stephentoub merged 1 commit into dotnet:master from stephentoub:regex_ctor_perf on Dec 12, 2014

Member

stephentoub commented Dec 12, 2014

The Regex class maintains a cache of byte codes, which the Regex ctor indexes into using a key. It uses this seemingly innocuous line to create that key:

String key = ((int)options).ToString(NumberFormatInfo.InvariantInfo) + ":" + cultureKey + ":" + pattern;

This, however, has the unfortunate effect of allocating a string for the options, a string array for the five strings to be passed to the String.Concat call generated by the compiler, another string array allocation inside of Concat, and then the resulting string for the whole operation. The cost of those allocations is causing a non-trivial slowdown for repeated Regex.IsMatch calls for simple regular expressions, such as for a phone number (e.g. from the MSDN docs "^\d{3}-\d{3}-\d{4}$").
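To make the allocations easier to see, here is a rough sketch of what that one-line concatenation amounts to once the `+` operators are lowered to a `String.Concat` call, as described above. `StringKeySketch` and `BuildKey` are hypothetical names used only for illustration, and newer compilers or runtimes may lower the expression differently.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not the library's code.
static class StringKeySketch
{
    public static string BuildKey(RegexOptions options, string cultureKey, string pattern)
    {
        // Allocation: a string holding the numeric value of the options.
        string optionsText = ((int)options).ToString(NumberFormatInfo.InvariantInfo);

        // Allocation: a five-element string array for the Concat arguments.
        string[] parts = { optionsText, ":", cultureKey, ":", pattern };

        // Allocation: the final key string (the description above also notes
        // a further array allocated inside Concat itself).
        return string.Concat(parts);
    }
}
```

Every `Regex` construction pays these costs before the cache lookup can even start, which is why they show up so clearly for short patterns.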

This commit adds a new struct key type that just stores the constitutent options, cultureKey, and pattern, rather than creating a string to store them. That key is then what's stored in each entry in the cache.
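A minimal sketch of what such a struct key could look like follows. The type name `RegexCacheKey`, the field names, and the hash combination are assumptions made for illustration; the merged change may differ in its details.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical key type: holds the three components directly instead of
// concatenating them into a new string on every lookup.
struct RegexCacheKey : IEquatable<RegexCacheKey>
{
    private readonly RegexOptions _options;
    private readonly string _cultureKey;
    private readonly string _pattern;

    public RegexCacheKey(RegexOptions options, string cultureKey, string pattern)
    {
        _options = options;
        _cultureKey = cultureKey;
        _pattern = pattern;
    }

    // Strongly typed comparison, so comparing two keys does not box either one.
    public bool Equals(RegexCacheKey other)
    {
        return _options == other._options
            && string.Equals(_cultureKey, other._cultureKey, StringComparison.Ordinal)
            && string.Equals(_pattern, other._pattern, StringComparison.Ordinal);
    }

    public override bool Equals(object obj)
    {
        return obj is RegexCacheKey && Equals((RegexCacheKey)obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            // Simple illustrative combination; null-safe because a default
            // struct instance has null string fields.
            int hash = (int)_options;
            hash = (hash * 31) + (_cultureKey == null ? 0 : _cultureKey.GetHashCode());
            hash = (hash * 31) + (_pattern == null ? 0 : _pattern.GetHashCode());
            return hash;
        }
    }
}
```

Because the key is a value type that only references strings the caller already holds, building one allocates nothing on the heap; the only work per lookup is the field-by-field comparison.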

For repeated Regex.IsMatch calls for basic regular expressions like the phone number one previously mentioned, on my machine this improves throughput by ~35%, in large part due to an ~80% reduction in number of allocations, and (for this particular test case) an ~70% reduction in number of bytes allocated (it depends primarily on the length of the pattern and the length of the culture name).
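For anyone wanting to reproduce this kind of measurement, a rough timing loop along the following lines is one way to compare builds. `IsMatchThroughput` and `CallsPerSecond` are hypothetical names, this is not the harness used for the numbers above, and a dedicated benchmarking tool would give more reliable results.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical micro-benchmark helper for illustration only.
static class IsMatchThroughput
{
    public static double CallsPerSecond(string input, string pattern, int iterations)
    {
        // Warm-up call so the pattern is parsed and cached before timing starts.
        Regex.IsMatch(input, pattern);

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // The static IsMatch constructs a Regex per call, so every
            // iteration exercises the ctor's cache lookup.
            Regex.IsMatch(input, pattern);
        }
        stopwatch.Stop();

        return iterations / stopwatch.Elapsed.TotalSeconds;
    }
}
```

Calling `IsMatchThroughput.CallsPerSecond("555-123-4567", @"^\d{3}-\d{3}-\d{4}$", 1000000)` before and after the change gives a comparable figure; the absolute numbers will vary by machine.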

Contributor

ellismg commented Dec 12, 2014

Looks good.

Member

nguerrera commented Dec 12, 2014

👍

stephentoub added a commit that referenced this pull request Dec 12, 2014

Merge pull request #231 from stephentoub/regex_ctor_perf
Improve performance of Regex ctor and IsMatch

@stephentoub stephentoub merged commit a3a485e into dotnet:master Dec 12, 2014

1 check passed

continuous-integration/appveyor AppVeyor build succeeded

@stephentoub stephentoub deleted the stephentoub:regex_ctor_perf branch Dec 12, 2014

cnblogs-dudu commented on 9820567 Dec 15, 2014

"constitutent" -> "constituent"

Member

stephentoub replied Dec 15, 2014

@cnblogs-dudu, thanks, yes, I had a typo in my commit description. But that typo's not in the actual code, is it?

cnblogs-dudu replied Dec 15, 2014

@stephentoub yes, merely in commit description.

@karelz karelz modified the milestone: 1.0.0-rtm Dec 3, 2016
