Vectorized HttpCharacters #44041

gfoidl · 2022-09-17T19:57:17Z

⚠️ There are some open questions from my side -- thus it's a draft PR

Note: after I created this PR I was pointed to dotnet/runtime#68328, with which I think this can be solved in a more general, runtime-like way. I don't know the outcome of that issue (yet), so please let's hold off on this PR in the meantime. Should we close it instead and eventually re-open it if the mentioned issue doesn't fit?
At least for the "extended" case, special handling is needed -- so if the issue yields a usable solution, this PR can be trimmed down if the "extended" case is worth it (which I doubt, as it's seldom used?).

As outlined in #33776 (comment) these methods are vectorized now.

Vectorization is not a plain win; there are trade-offs due to a bit more overhead. This can be seen quite well in the numbers below. There seems to be a pattern where perf regresses:

- very short inputs
- if an invalid char is at the very beginning

In these cases a scalar-loop is faster, as it has less work to do in comparison to vectorized code.
But the longer the input, the farther away from the beginning the invalid char is, or if there's no invalid char at all, vectorization shows nice improvements.
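To illustrate why the scalar loop wins when the invalid char is at the start, here is a minimal sketch of a lookup-table scan. The class name, the table contents, and the allowed character set are hypothetical and only stand in for the real tables in HttpCharacters; the point is that the loop returns after a single table lookup when the first char is invalid, with no vector setup cost.

```csharp
using System;

static class ScalarSketch
{
    // Hypothetical table: true marks an allowed character.
    // The set used here (ASCII letters, digits, '.', '-') is an assumption for illustration.
    private static readonly bool[] s_allowed = BuildTable();

    private static bool[] BuildTable()
    {
        var table = new bool[128];
        foreach (char c in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-")
        {
            table[c] = true;
        }
        return table;
    }

    // Returns the index of the first char that is not allowed, or -1 if all are allowed.
    public static int IndexOfInvalidChar(ReadOnlySpan<char> text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            // Chars outside the table range are treated as invalid.
            if (c >= s_allowed.Length || !s_allowed[c])
            {
                return i; // early exit: one lookup when the first char is invalid
            }
        }
        return -1;
    }
}
```

With an invalid char at index 0 the loop body runs once, which matches the sub-nanosecond "Start" rows for Default in the table above; with no invalid char it runs once per input char, which is where a vectorized scan can catch up and overtake.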

I expect that in real life most of the input is valid, so no invalid char is found, and that inputs are long enough for vectorization to show its benefits.
If my assumption is wrong, at least for specific methods, we could disable vectorization for those.

|     Method |                           Categories | Length | InvalidCharPos |        Mean | Ratio |
|----------- |------------------------------------- |------- |--------------- |------------:|------:|
|    Default |         ContainsInvalidAuthorityChar |      7 |           None |   5.8321 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |      7 |           None |   5.5462 ns |  0.95 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |      7 |          Start |   0.6175 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |      7 |          Start |   1.4070 ns |  2.30 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |      7 |            End |   5.5931 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |      7 |            End |   5.4145 ns |  0.97 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |     15 |           None |  10.9659 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |     15 |           None |  11.0064 ns |  1.00 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |     15 |          Start |   0.6444 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |     15 |          Start |   1.2530 ns |  1.94 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |     15 |            End |  10.6669 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |     15 |            End |  10.7583 ns |  1.01 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |    113 |           None |  87.7130 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |    113 |           None |  13.7753 ns |  0.16 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |    113 |          Start |   0.6885 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |    113 |          Start |   3.1128 ns |  4.57 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |    113 |            End |  89.8752 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |    113 |            End |  13.2677 ns |  0.15 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |      7 |           None |   5.0010 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |      7 |           None |   6.4814 ns |  1.30 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |      7 |          Start |   0.5463 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |      7 |          Start |   1.0296 ns |  1.88 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |      7 |            End |   4.9472 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |      7 |            End |   6.1276 ns |  1.24 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |     15 |           None |  10.0493 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |     15 |           None |   5.2853 ns |  0.53 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |     15 |          Start |   0.5595 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |     15 |          Start |   3.7454 ns |  6.78 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |     15 |            End |  12.8468 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |     15 |            End |   7.2468 ns |  0.58 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |    113 |           None |  92.7111 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |    113 |           None |  15.7101 ns |  0.17 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |    113 |          Start |   0.5651 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |    113 |          Start |   3.8203 ns |  6.84 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |    113 |            End | 100.5039 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |    113 |            End |  16.9102 ns |  0.17 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |      7 |           None |   6.1985 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |      7 |           None |   6.9981 ns |  1.13 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |      7 |          Start |   0.4000 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |      7 |          Start |   1.4511 ns |  3.61 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |      7 |            End |   5.9051 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |      7 |            End |   7.8430 ns |  1.32 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |     15 |           None |  12.6932 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |     15 |           None |   6.3665 ns |  0.51 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |     15 |          Start |   0.6317 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |     15 |          Start |   4.1039 ns |  6.57 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |     15 |            End |  11.2936 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |     15 |            End |   6.2646 ns |  0.56 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |    113 |           None |  88.2136 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |    113 |           None |  31.7663 ns |  0.36 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |    113 |          Start |   0.6923 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |    113 |          Start |   3.7021 ns |  5.51 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |    113 |            End |  91.9725 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |    113 |            End |  32.3834 ns |  0.35 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |      7 |           None |   5.5270 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |      7 |           None |   6.8330 ns |  1.23 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |      7 |          Start |   0.5591 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |      7 |          Start |   1.1605 ns |  2.08 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |      7 |            End |   5.1811 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |      7 |            End |   6.6147 ns |  1.28 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |     15 |           None |  10.9923 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |     15 |           None |   4.9088 ns |  0.45 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |     15 |          Start |   0.5255 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |     15 |          Start |   3.3160 ns |  6.36 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |     15 |            End |  10.0873 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |     15 |            End |   5.8072 ns |  0.58 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |    113 |           None |  99.1043 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |    113 |           None |  15.6849 ns |  0.16 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |    113 |          Start |   0.3126 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |    113 |          Start |   3.4211 ns | 11.04 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |    113 |            End |  91.7623 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |    113 |            End |  14.9058 ns |  0.16 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |      7 |           None |   6.4794 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |      7 |           None |   5.5825 ns |  0.87 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |      7 |          Start |   0.5898 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |      7 |          Start |   1.1259 ns |  1.90 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |      7 |            End |   6.0219 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |      7 |            End |   5.2224 ns |  0.87 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |     15 |           None |  12.3160 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |     15 |           None |  11.1012 ns |  0.91 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |     15 |          Start |   0.7251 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |     15 |          Start |   1.0208 ns |  1.40 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |     15 |            End |  11.9562 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |     15 |            End |  11.0921 ns |  0.93 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |    113 |           None |  93.1642 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |    113 |           None |  12.9503 ns |  0.14 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |    113 |          Start |   0.6146 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |    113 |          Start |   2.9311 ns |  4.80 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |    113 |            End |  95.4803 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |    113 |            End |  13.4628 ns |  0.14 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |      7 |           None |   6.0107 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |      7 |           None |   7.4539 ns |  1.24 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |      7 |          Start |   0.6448 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |      7 |          Start |   1.0052 ns |  1.61 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |      7 |            End |   5.4034 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |      7 |            End |   6.7783 ns |  1.25 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |     15 |           None |  10.7808 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |     15 |           None |   4.6405 ns |  0.43 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |     15 |          Start |   0.4462 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |     15 |          Start |   3.2645 ns |  7.42 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |     15 |            End |  10.3272 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |     15 |            End |   4.9606 ns |  0.48 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |    113 |           None |  88.7606 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |    113 |           None |  14.8116 ns |  0.17 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |    113 |          Start |   0.4781 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |    113 |          Start |   3.1777 ns |  6.71 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |    113 |            End |  93.7237 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |    113 |            End |  14.8531 ns |  0.16 |

For a description of how the approach works, please see the comments in the code.

ghost · 2022-09-17T19:57:25Z

Thanks for your PR, @gfoidl. Someone from the team will get assigned to your PR shortly and we'll get it reviewed.

gfoidl · 2022-09-17T20:04:29Z

src/Shared/ServerInfrastructure/HttpCharacters.cs


 namespace Microsoft.AspNetCore.Http;

-internal static class HttpCharacters
+[SkipLocalsInit]
+internal static unsafe partial class HttpCharacters


I developed this with a playground-project in https://github.com/gfoidl-Tests/SourceGenerator-Vectors

For best codegen, a code-generator is used (in the project it's still the non-incremental one, I didn't update that code).

Methods in the generator do "the same" as was done in the static init before.

Thus there's no need for static initialization at runtime. Instead:

- ROS<byte> (ReadOnlySpan<byte>), which refers to the assembly's static data segment, can be used
- for vectors, it's best to use constants like Vector128.Create((byte)0xF) inline instead of loading them from static readonly fields
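The two points above can be sketched as follows. This is an assumed, simplified illustration and not the generated file from the PR: the class name, the 16-entry table, and both helper methods are hypothetical. It relies on the C# compiler emitting a constant `byte[]` returned as `ReadOnlySpan<byte>` directly into the assembly's data section, and on `Vector128.Create` with a constant argument being used inline (the `&` operator on `Vector128<byte>` needs .NET 7 or later).

```csharp
using System;
using System.Runtime.Intrinsics;

static class StaticDataSketch
{
    // No static constructor and no heap allocation: the compiler stores these
    // constant bytes in the assembly's data section and the span points at them.
    // Hypothetical contents: non-zero marks an allowed value.
    private static ReadOnlySpan<byte> Table => new byte[]
    {
        0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1
    };

    // Scalar lookup against the static data.
    public static bool IsAllowed(byte value)
        => value < Table.Length && Table[value] != 0;

    // The mask is created inline from a constant instead of being read
    // from a static readonly field.
    public static Vector128<byte> LowNibbles(Vector128<byte> input)
        => input & Vector128.Create((byte)0xF);
}
```

Because nothing here needs initialization at runtime, a class written this way has no `Initialize` step to call.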

In this draft-PR I just copied the output from the source-generator over to here (see file below).

What's the right strategy to generate such a file?

A source-generator, tt-file, etc.? Please advise so I can update the PR to suit.

gfoidl · 2022-09-17T20:05:41Z

src/Shared/ServerInfrastructure/HttpCharacters.generated.cs

+
+namespace Microsoft.AspNetCore.Http;
+
+[CompilerGenerated]


See comment above -- right now this is just copy & paste (+ a little editing) from my playground-project.

gfoidl · 2022-10-20T10:05:15Z

I'll close this PR, as I'm pretty sure with dotnet/runtime#68328 we get a proper building block and the wire-up should become trivial.

Only the "extended case" isn't covered by the runtime issue. In the worst case it stays with the scalar approach as it is now, so no regression should happen, but also no improvement.
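As a rough idea of what such a wire-up might look like, here is a sketch using `SearchValues<char>` from .NET 8, which is the name the API from dotnet/runtime#68328 eventually shipped under (it was called `IndexOfAnyValues` during development). The class name and the allowed character set are hypothetical and not taken from HttpCharacters.

```csharp
using System;
using System.Buffers;

static class SearchValuesSketch
{
    // Hypothetical allowed set; the real sets live in HttpCharacters.
    private static readonly SearchValues<char> s_allowed =
        SearchValues.Create("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-");

    // Returns the index of the first char that is not in the allowed set, or -1.
    // The runtime picks a scalar or vectorized implementation internally.
    public static int IndexOfInvalidChar(ReadOnlySpan<char> text)
        => text.IndexOfAnyExcept(s_allowed);
}
```

The search itself is a single call, so the hand-written vector code and generated tables would no longer be needed for the cases the runtime API covers.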

ghost · 2022-10-20T10:05:24Z

Hi @gfoidl. It looks like you just commented on a closed PR. The team will most probably miss it. If you'd like to bring something important up to their attention, consider filing a new issue and add enough details to build context.

gfoidl · 2022-11-22T09:03:40Z

dotnet/runtime#68328 is done 🎉, so in the next few days I'll create a PR based on these new APIs.

ghost · 2022-11-22T09:03:44Z

Hi @gfoidl. It looks like you just commented on a closed PR. The team will most probably miss it. If you'd like to bring something important up to their attention, consider filing a new issue and add enough details to build context.

Vectorized HttpCharacters

40a40e7

ghost added the community-contribution label Sep 17, 2022

gfoidl mentioned this pull request Sep 17, 2022

Kestrel response header encoding #33776

Merged

Don't need to call Initialize from Kestrel, there's nothing to init now

903c988

gfoidl force-pushed the httpcharacters_vectorized branch from a940f4a to 903c988 Compare September 17, 2022 20:09

mkArtakMSFT added the area-runtime label Sep 18, 2022

mkArtakMSFT assigned adityamandaleeka Sep 18, 2022

gfoidl mentioned this pull request Sep 19, 2022

Vectorize IndexOfAny on more than 5 chars dotnet/runtime#68328

Closed

gfoidl closed this Oct 20, 2022

gfoidl deleted the httpcharacters_vectorized branch October 20, 2022 10:05

MihaZupan mentioned this pull request Nov 22, 2022

Use IndexOfAnyInRange in StripBidiControlCharacters dotnet/runtime#78658

Merged

gfoidl mentioned this pull request Nov 27, 2022

Vectorized HttpCharacters (and used IndexOfAnyValues in other places found) #45300

Merged

amcasey added the area-networking label and removed the area-runtime label Jun 6, 2023
