Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the Regex source generator in the AlphaRouteConstraint. #44770

Merged
merged 2 commits into from
Nov 16, 2022

Conversation

eerhardt
Copy link
Member

The Regex source generator is faster because it doesn't need to interpret the regex. It does add slightly more code to the compiled app, but the tradeoff seems worth it.

I switched the pattern from a-z with IgnoreCase to A-Za-z because the source generator generates better code with that pattern.

The results of the changed Benchmark shows a 9-10% faster when using an alpha constraint:

Before

Method Mean Error StdDev Op/s Ratio Gen 0 Gen 1 Gen 2 Allocated
TreeRouter 1.362 us 0.0107 us 0.0100 us 734,023.0 1.00 0.0134 - - 1 KB
EndpointRouting 1.763 us 0.0107 us 0.0100 us 567,280.4 1.29 0.0134 - - 1 KB

After

Method Mean Error StdDev Op/s Ratio Gen 0 Gen 1 Gen 2 Allocated
TreeRouter 1.230 us 0.0087 us 0.0082 us 813,331.4 1.00 0.0134 - - 1 KB
EndpointRouting 1.615 us 0.0068 us 0.0063 us 619,255.2 1.31 0.0153 - - 1 KB

Size

Publishing a Linux NativeAOT app shows about a 4 KB increase in code size with this change:

Before: 27.1 MB (28,483,016 bytes)
After : 27.1 MB (28,487,464 bytes)

The Regex source generator is faster because it doesn't need to interpret the regex. It does add slightly more code to the compiled app, but the tradeoff seems worth it.

I switched the pattern from `a-z` with IgnoreCase to `A-Za-z` because the source generator generates better code with that pattern.
Copy link
Member

@javiercn javiercn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

@build-analysis build-analysis bot mentioned this pull request Oct 27, 2022
2 tasks
@gfoidl
Copy link
Member

gfoidl commented Oct 28, 2022

Note for further optimization potential:
Instead of relying on Regex for this, the Match could be expressed in terms of dotnet/runtime#76106 or dotnet/runtime#68328, thus avoiding Regex's overhead (and allocations).

@danmoseley
Copy link
Member

danmoseley commented Oct 28, 2022

I switched the pattern from a-z with IgnoreCase to A-Za-z because the source generator generates better code with that pattern.

Is there a regex issue for that? @stephentoub
(This is invariant culture)

@stephentoub
Copy link
Member

stephentoub commented Oct 28, 2022

I switched the pattern from a-z with IgnoreCase to A-Za-z because the source generator generates better code with that pattern.

Is there a regex issue for that? @stephentoub (This is invariant culture)

There's no "issue"... it's by design: [a-z] (IgnoreCase) and [A-Za-z] are functionally different (ever so slightly)... the former recognizes the Kelvin sign as being case-equivalent to k/K. So the difference in the code gen is that when using [A-Za-z], we generate:

char.IsAsciiLetter(slice[iteration])

whereas with [a-z] under IgnoreCase, we generate:

((ch = slice[iteration]) < 128 ? ("\0\0\0\0\ufffe\u07ff\ufffe\u07ff"[ch >> 4] & (1 << (ch & 0xF))) != 0 : RegexRunner.CharInClass((char)ch, "\0\u0006\0A[a{KÅ"))

With more work and analysis we could optimize the latter to instead be:

char.IsAsciiLetter(ch = slice[iteration]) || ch == '\u212A'

or something like that, but it still has the additional check.

(I think it's fine to use [A-Za-z] here, though. Technically it's a breaking change, but I struggle to imagine anyone relying on Kelvin being equal to k or K in routes :)

@eerhardt
Copy link
Member Author

Instead of relying on Regex for this,

The public inheritance here is a problem.

AlphaRouteConstraint is a public class that inherits from RegexRouteConstraint.

  1. RegexRouteConstraint's constructors take a Regex or a string. I would need to make a new internal ctor just for this class.
  2. RegexRouteConstraint's implementation of IRouteConstraint.Match is not virtual. In order to make this work without any breaking changes, I would need to explicitly implement IRouteConstraint.Match, and you would get different behavior depending on how you cast the object.

@eerhardt
Copy link
Member Author

the former recognizes the Kelvin sign as being case-equivalent to k/K.

Sounds like that was an existing bug, given its documentation: https://learn.microsoft.com/en-us/dotnet/api/system.web.mvc.routing.constraints.alpharouteconstraint

Constrains a route parameter to contain only lowercase or uppercase letters A through Z in the English alphabet.

for what it is worth:

https://xkcd.com/
https://xKcd.com/
https://xKcd.com/

All take me to the same place.

@danmoseley
Copy link
Member

With more work and analysis we could optimize the latter to instead be:
char.IsAsciiLetter(ch = slice[iteration]) || ch == '\u212A'

On the face of it that seems like that would be a worthwhile improvement for a fairly common looking pattern?

@stephentoub
Copy link
Member

dotnet/runtime#77604

@eerhardt
Copy link
Member Author

I believe all feedback has been addressed. This should be ready for review.

@eerhardt eerhardt merged commit 442ca2f into dotnet:main Nov 16, 2022
@eerhardt eerhardt deleted the AlphaRegexSG-main branch November 16, 2022 15:22
@ghost ghost added this to the 8.0-preview1 milestone Nov 16, 2022
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions Perf
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants