Vectorized HttpCharacters #44041

Closed
gfoidl wants to merge 2 commits

gfoidl (Member) commented Sep 17, 2022

⚠️ There are some open questions from my side -- thus it's a draft PR

Note: after I created this PR I was pointed towards dotnet/runtime#68328, with which I think this can be solved in a more general, runtime-like way. I don't know the outcome of that issue (yet), so please let's hold off on this PR in the meantime. Should we close it instead and eventually re-open it if the mentioned issue doesn't fit?
At least for the "extended" case special handling is needed, so if the issue yields a usable solution, this PR can be trimmed down to that case, provided the "extended" case is worth it (which I doubt, as it's seldom used?).


As outlined in #33776 (comment) these methods are vectorized now.

Vectorization is not a plain win; there are trade-offs due to a bit more overhead. This can be seen quite well in the numbers below. There seems to be a pattern where perf regresses:

  • very short inputs
  • if an invalid char is at the very beginning

In these cases a scalar loop is faster, as it has less work to do than the vectorized code.
But the longer the input is, the farther away from the beginning the invalid char is, or if there's no invalid char at all, the more vectorization shows nice improvements.

I expect that in real life most of the input is valid, so no invalid char is found, and that inputs are long enough for vectorization to show its benefits.
If my assumption is wrong, at least for specific methods, we could disable vectorization for those.
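To make the trade-off concrete, here is a minimal sketch of the two shapes being compared. It is not the code from this PR: the validity rule (bytes outside 0x21..0x7E are invalid) and all names are assumptions chosen for illustration, and it assumes .NET 7 or later for the cross-platform `Vector128` helpers. The vectorized variant has to set up constants and handle a whole 16-byte block before it can answer, which is why an invalid char at index 0 favours the scalar loop.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

static class InvalidCharSketch
{
    // Hypothetical rule for illustration only: bytes outside 0x21..0x7E are invalid.
    public static int IndexOfInvalidScalar(ReadOnlySpan<byte> span)
    {
        for (int i = 0; i < span.Length; i++)
        {
            if (span[i] < 0x21 || span[i] > 0x7E)
                return i; // exits after one comparison if the first byte is invalid
        }
        return -1;
    }

    public static int IndexOfInvalidVectorized(ReadOnlySpan<byte> span)
    {
        int i = 0;

        if (Vector128.IsHardwareAccelerated && span.Length >= Vector128<byte>.Count)
        {
            Vector128<byte> low = Vector128.Create((byte)0x21);
            Vector128<byte> high = Vector128.Create((byte)0x7E);

            // Process 16 bytes per iteration.
            for (; i <= span.Length - Vector128<byte>.Count; i += Vector128<byte>.Count)
            {
                Vector128<byte> v = Vector128.Create(span.Slice(i, Vector128<byte>.Count));
                Vector128<byte> invalid = Vector128.LessThan(v, low) | Vector128.GreaterThan(v, high);

                if (invalid != Vector128<byte>.Zero)
                {
                    // One bit per byte lane; the lowest set bit is the first invalid byte.
                    uint mask = invalid.ExtractMostSignificantBits();
                    return i + BitOperations.TrailingZeroCount(mask);
                }
            }
        }

        // Scalar tail for the remaining bytes (and for inputs shorter than one vector).
        for (; i < span.Length; i++)
        {
            if (span[i] < 0x21 || span[i] > 0x7E)
                return i;
        }
        return -1;
    }
}
```

Both methods return the same index for any input; only the amount of work done before returning differs.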

|     Method |                           Categories | Length | InvalidCharPos |        Mean | Ratio |
|----------- |------------------------------------- |------- |--------------- |------------:|------:|
|    Default |         ContainsInvalidAuthorityChar |      7 |           None |   5.8321 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |      7 |           None |   5.5462 ns |  0.95 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |      7 |          Start |   0.6175 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |      7 |          Start |   1.4070 ns |  2.30 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |      7 |            End |   5.5931 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |      7 |            End |   5.4145 ns |  0.97 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |     15 |           None |  10.9659 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |     15 |           None |  11.0064 ns |  1.00 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |     15 |          Start |   0.6444 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |     15 |          Start |   1.2530 ns |  1.94 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |     15 |            End |  10.6669 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |     15 |            End |  10.7583 ns |  1.01 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |    113 |           None |  87.7130 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |    113 |           None |  13.7753 ns |  0.16 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |    113 |          Start |   0.6885 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |    113 |          Start |   3.1128 ns |  4.57 |
|            |                                      |        |                |             |       |
|    Default |         ContainsInvalidAuthorityChar |    113 |            End |  89.8752 ns |  1.00 |
| Vectorized |         ContainsInvalidAuthorityChar |    113 |            End |  13.2677 ns |  0.15 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |      7 |           None |   5.0010 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |      7 |           None |   6.4814 ns |  1.30 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |      7 |          Start |   0.5463 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |      7 |          Start |   1.0296 ns |  1.88 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |      7 |            End |   4.9472 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |      7 |            End |   6.1276 ns |  1.24 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |     15 |           None |  10.0493 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |     15 |           None |   5.2853 ns |  0.53 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |     15 |          Start |   0.5595 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |     15 |          Start |   3.7454 ns |  6.78 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |     15 |            End |  12.8468 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |     15 |            End |   7.2468 ns |  0.58 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |    113 |           None |  92.7111 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |    113 |           None |  15.7101 ns |  0.17 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |    113 |          Start |   0.5651 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |    113 |          Start |   3.8203 ns |  6.84 |
|            |                                      |        |                |             |       |
|    Default |         IndexOfInvalidFieldValueChar |    113 |            End | 100.5039 ns |  1.00 |
| Vectorized |         IndexOfInvalidFieldValueChar |    113 |            End |  16.9102 ns |  0.17 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |      7 |           None |   6.1985 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |      7 |           None |   6.9981 ns |  1.13 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |      7 |          Start |   0.4000 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |      7 |          Start |   1.4511 ns |  3.61 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |      7 |            End |   5.9051 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |      7 |            End |   7.8430 ns |  1.32 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |     15 |           None |  12.6932 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |     15 |           None |   6.3665 ns |  0.51 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |     15 |          Start |   0.6317 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |     15 |          Start |   4.1039 ns |  6.57 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |     15 |            End |  11.2936 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |     15 |            End |   6.2646 ns |  0.56 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |    113 |           None |  88.2136 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |    113 |           None |  31.7663 ns |  0.36 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |    113 |          Start |   0.6923 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |    113 |          Start |   3.7021 ns |  5.51 |
|            |                                      |        |                |             |       |
|    Default | IndexOfInvalidFieldValueCharExtended |    113 |            End |  91.9725 ns |  1.00 |
| Vectorized | IndexOfInvalidFieldValueCharExtended |    113 |            End |  32.3834 ns |  0.35 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |      7 |           None |   5.5270 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |      7 |           None |   6.8330 ns |  1.23 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |      7 |          Start |   0.5591 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |      7 |          Start |   1.1605 ns |  2.08 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |      7 |            End |   5.1811 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |      7 |            End |   6.6147 ns |  1.28 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |     15 |           None |  10.9923 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |     15 |           None |   4.9088 ns |  0.45 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |     15 |          Start |   0.5255 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |     15 |          Start |   3.3160 ns |  6.36 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |     15 |            End |  10.0873 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |     15 |            End |   5.8072 ns |  0.58 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |    113 |           None |  99.1043 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |    113 |           None |  15.6849 ns |  0.16 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |    113 |          Start |   0.3126 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |    113 |          Start |   3.4211 ns | 11.04 |
|            |                                      |        |                |             |       |
|    Default |               IndexOfInvalidHostChar |    113 |            End |  91.7623 ns |  1.00 |
| Vectorized |               IndexOfInvalidHostChar |    113 |            End |  14.9058 ns |  0.16 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |      7 |           None |   6.4794 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |      7 |           None |   5.5825 ns |  0.87 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |      7 |          Start |   0.5898 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |      7 |          Start |   1.1259 ns |  1.90 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |      7 |            End |   6.0219 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |      7 |            End |   5.2224 ns |  0.87 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |     15 |           None |  12.3160 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |     15 |           None |  11.1012 ns |  0.91 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |     15 |          Start |   0.7251 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |     15 |          Start |   1.0208 ns |  1.40 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |     15 |            End |  11.9562 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |     15 |            End |  11.0921 ns |  0.93 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |    113 |           None |  93.1642 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |    113 |           None |  12.9503 ns |  0.14 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |    113 |          Start |   0.6146 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |    113 |          Start |   2.9311 ns |  4.80 |
|            |                                      |        |                |             |       |
|    Default |        IndexOfInvalidTokenChar_Bytes |    113 |            End |  95.4803 ns |  1.00 |
| Vectorized |        IndexOfInvalidTokenChar_Bytes |    113 |            End |  13.4628 ns |  0.14 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |      7 |           None |   6.0107 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |      7 |           None |   7.4539 ns |  1.24 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |      7 |          Start |   0.6448 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |      7 |          Start |   1.0052 ns |  1.61 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |      7 |            End |   5.4034 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |      7 |            End |   6.7783 ns |  1.25 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |     15 |           None |  10.7808 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |     15 |           None |   4.6405 ns |  0.43 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |     15 |          Start |   0.4462 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |     15 |          Start |   3.2645 ns |  7.42 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |     15 |            End |  10.3272 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |     15 |            End |   4.9606 ns |  0.48 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |    113 |           None |  88.7606 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |    113 |           None |  14.8116 ns |  0.17 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |    113 |          Start |   0.4781 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |    113 |          Start |   3.1777 ns |  6.71 |
|            |                                      |        |                |             |       |
|    Default |       IndexOfInvalidTokenChar_String |    113 |            End |  93.7237 ns |  1.00 |
| Vectorized |       IndexOfInvalidTokenChar_String |    113 |            End |  14.8531 ns |  0.16 |

For a description of how the approach works, please see the comments in the code.

ghost added the community-contribution label Sep 17, 2022

ghost commented Sep 17, 2022

Thanks for your PR, @gfoidl. Someone from the team will get assigned to your PR shortly and we'll get it reviewed.


 namespace Microsoft.AspNetCore.Http;

-internal static class HttpCharacters
+[SkipLocalsInit]
+internal static unsafe partial class HttpCharacters
gfoidl (Member, Author) commented:

I developed this with a playground-project in https://github.com/gfoidl-Tests/SourceGenerator-Vectors

For best codegen, a source generator is used (in the playground project it's still the non-incremental kind; I haven't updated that code).

Methods in the generator do "the same" as was done in the static initialization before.

Thus there's no need for static initialization at runtime. Instead:

  • ROS<byte> (ReadOnlySpan<byte>) can be used -- it will refer to the assembly's static data segment
  • for vectors it's best to use constants like Vector128.Create((byte)0xF) inline instead of loading them from static readonly fields
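The two points above can be sketched roughly as follows. This is a minimal illustration, not the generated file from this PR; the class name, the hex-digit table, and both helper methods are hypothetical. It relies on the C# compiler optimization where a `ReadOnlySpan<byte>` property returning a constant `new byte[] { ... }` is emitted as a reference into the assembly's data section, with no allocation and no static constructor.

```csharp
using System;
using System.Runtime.Intrinsics;

static class ConstantsSketch
{
    // Constant byte data exposed as ReadOnlySpan<byte>: the compiler points the span
    // at the assembly's static data instead of allocating an array at runtime.
    private static ReadOnlySpan<byte> HexDigits => new byte[]
    {
        (byte)'0', (byte)'1', (byte)'2', (byte)'3', (byte)'4', (byte)'5', (byte)'6', (byte)'7',
        (byte)'8', (byte)'9', (byte)'A', (byte)'B', (byte)'C', (byte)'D', (byte)'E', (byte)'F',
    };

    // Hypothetical helper: looks up the ASCII hex digit for the low nibble of a byte.
    public static byte ToHexDigit(byte value) => HexDigits[value & 0xF];

    // The vector constant is created inline, so the JIT can treat it as a constant
    // rather than loading it from a static readonly field.
    public static Vector128<byte> LowNibbles(Vector128<byte> input)
        => input & Vector128.Create((byte)0xF);
}
```

For example, `ConstantsSketch.ToHexDigit(0x1A)` yields the byte for `'A'`, and `LowNibbles` applied to a vector filled with `0xAB` yields a vector filled with `0x0B`.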

In this draft PR I just copied the output of the source generator over to here (see file below).

What's the right strategy to generate such a file?

A source generator, a tt-file, etc.? Please advise so I can update the PR to make it suitable.


namespace Microsoft.AspNetCore.Http;

[CompilerGenerated]
gfoidl (Member, Author) commented:

See comment above -- right now this is just copy & paste (+ a little editing) from my playground project.

gfoidl (Member, Author) commented Oct 20, 2022

I'll close this PR, as I'm pretty sure that with dotnet/runtime#68328 we'll get a proper building block and the wire-up should become trivial.

Only the "extended" case isn't covered by the runtime issue. In the worst case it stays on the scalar approach as it is now, so there should be no regression, but also no improvement.
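As a rough idea of what such a wire-up might look like, here is a sketch that assumes the runtime building block is comparable to `SearchValues<T>` as shipped in .NET 8; whether that is the exact API shape resulting from the linked issue is an assumption here. The class and method names are hypothetical, and the character set is the HTTP token (`tchar`) set from RFC 9110.

```csharp
using System;
using System.Buffers;

static class TokenCharsSketch
{
    // Assumption: the set of valid HTTP token characters (RFC 9110 "tchar").
    private static readonly SearchValues<char> s_tokenChars = SearchValues.Create(
        "!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");

    // Returns the index of the first char that is not a valid token char, or -1.
    // The runtime chooses a vectorized or scalar strategy internally.
    public static int IndexOfInvalidTokenChar(ReadOnlySpan<char> value)
        => value.IndexOfAnyExcept(s_tokenChars);
}
```

With this sketch, `IndexOfInvalidTokenChar("Content-Type")` returns -1 and `IndexOfInvalidTokenChar("Bad Header")` returns 3, the index of the space.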

gfoidl closed this Oct 20, 2022
gfoidl deleted the httpcharacters_vectorized branch October 20, 2022 10:05

ghost commented Oct 20, 2022

Hi @gfoidl. It looks like you just commented on a closed PR. The team will most probably miss it. If you'd like to bring something important up to their attention, consider filing a new issue and add enough details to build context.

gfoidl (Member, Author) commented Nov 22, 2022

dotnet/runtime#68328 is done 🎉, so in the next few days I'll create a PR based on these new APIs.


amcasey added the area-networking label and removed the area-runtime label Jun 6, 2023