Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpSys request headers Keys and Count both allocate #45156

Merged
merged 2 commits into from Nov 29, 2022

Conversation

ladeak
Copy link
Contributor

@ladeak ladeak commented Nov 17, 2022

HttpSys request headers Keys and Count both allocate

Currently Count and Keys allocate strings and arrays and enumerables. This change intends to reduce the cost of allocation.

Description

RequestHeaders's Key and Count has a new implementation. Count is allocation free, while Keys allocate the string for the unknown header keys and a list for the results.
RequestHeader.Generated.cs has a new HeaderKeys returned as ReadOnlySpan for accessing the known keys - by avoiding allocation with enumerator or a static list.
RequestHeader.Generated.tt is updated, however it does not seem be set up to generate the RequestHeader.Generated.cs file.
NativeRequestContext provides new implementation to access the header keys and count.
Added tests to cover changes, it mocks a memory segment that is normally pinned and passed by IIS.

Considered caching the Keys (as the unknown header names are re-allocated every call). However, it could make sense to re-use the cached values in the Extra property getter too, but that would engage further refactoring.

EDIT (2022.11.19):

Note, that the RequestHeader.ResetFlags() method is used for the benchmakrs only. It does not clear cached values, just clears the flags indicating that a value is set.

Performance Measurements

EDIT (Updated 2022.11.19):

Before:

// * Summary *

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.22621
12th Gen Intel Core i7-1255U, 1 CPU, 12 logical and 10 physical cores
.NET SDK=8.0.100-alpha.1.22531.1
  [Host]     : .NET 8.0.0 (8.0.22.55109), X64 RyuJIT
  DefaultJob : .NET 8.0.0 (8.0.22.52901), X64 RyuJIT

Server=True

|            Method |        Job |     Toolchain | RunStrategy |       Mean |    Error |   StdDev |        Op/s |  Gen 0 |  Gen 1 | Gen 2 | Allocated |
|------------------ |----------- |-------------- |------------ |-----------:|---------:|---------:|------------:|-------:|-------:|------:|----------:|
| CountSingleHeader | DefaultJob |       Default |     Default |   460.3 ns |  2.58 ns |  2.42 ns | 2,172,472.5 | 0.0277 |      - |     - |     176 B |
| CountLargeHeaders | DefaultJob |       Default |     Default | 3,558.5 ns | 19.03 ns | 17.80 ns |   281,017.5 | 1.4381 | 0.0305 |     - |   9,032 B |
|  KeysSingleHeader | DefaultJob |       Default |     Default |   541.2 ns |  2.64 ns |  2.21 ns | 1,847,804.1 | 0.0544 |      - |     - |     344 B |
|  KeysLargeHeaders | DefaultJob |       Default |     Default | 3,859.1 ns | 22.89 ns | 20.29 ns |   259,128.9 | 1.5335 | 0.0763 |     - |   9,648 B |

After (see final results in the comments).

|            Method |        Job |     Toolchain | RunStrategy |       Mean |   Error |  StdDev |        Op/s |  Gen 0 |  Gen 1 | Gen 2 | Allocated |
|------------------ |----------- |-------------- |------------ |-----------:|--------:|--------:|------------:|-------:|-------:|------:|----------:|
| CountSingleHeader | DefaultJob |       Default |     Default |   213.9 ns | 0.60 ns | 0.50 ns | 4,676,068.2 |      - |      - |     - |         - |
| CountLargeHeaders | DefaultJob |       Default |     Default |   213.3 ns | 0.51 ns | 0.43 ns | 4,689,144.1 |      - |      - |     - |         - |
|  KeysSingleHeader | DefaultJob |       Default |     Default |   427.7 ns | 2.71 ns | 2.53 ns | 2,338,190.7 | 0.0048 |      - |     - |      32 B |
|  KeysLargeHeaders | DefaultJob |       Default |     Default | 1,278.5 ns | 5.73 ns | 5.08 ns |   782,195.9 | 0.4425 | 0.0095 |     - |   2,776 B |

Questions

Open question: should we consider re-implementing ContainsKey to gain performance benefits?
EDIT (2022.11.19): will keep it out of scope this PR.

Fixes #44627

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Nov 17, 2022
@ghost
Copy link

ghost commented Nov 17, 2022

Thanks for your PR, @ladeak. Someone from the team will get assigned to your PR shortly and we'll get it reviewed.

@BrennanConroy
Copy link
Member

Could you include the microbenchmark in the PR? We like to keep those for future use 😃
Put it in /HttpSys/perf/Microbenchmarks like https://github.com/dotnet/aspnetcore/tree/main/src/Servers/Kestrel/perf/Microbenchmarks

@Tratcher
Copy link
Member

Tratcher commented Nov 17, 2022

Please address the compilation error.

src/Shared/HttpSys/RequestProcessing/RequestHeaders.cs(105,16): error CS0102: (NETCORE_ENGINEERING_TELEMETRY=Build) The type 'RequestHeaders' already contains a definition for 'Count'

@Tratcher
Copy link
Member

Open question: should we consider re-implementing ContainsKey to gain performance benefits?

You're welcome to try.

@Tratcher
Copy link
Member

RequestHeader.Generated.tt is updated, however it does not seem be set up to generate the RequestHeader.Generated.cs file.

See #44860 (comment)

@ladeak ladeak marked this pull request as draft November 17, 2022 21:16
@ladeak
Copy link
Contributor Author

ladeak commented Nov 18, 2022

I submitted the benchmarks. In my original benchmarks I actually reused a captured NativeMemory - which meant I had 3 KB byte array in the source. I updated these benchmarks to use the same method to create the native memory as the tests. The benchmark results are still the same magnitude
However, as pointed out header removal is not considered in my changes, that will invalidate the above benchmarks, and I will run (and attach their results) once the current implementation is fixed.

@ladeak ladeak requested review from Tratcher and removed request for halter73, JamesNK and BrennanConroy November 20, 2022 08:39
@ladeak
Copy link
Contributor Author

ladeak commented Nov 22, 2022

@Tratcher and @BrennanConroy please review changes.

@Tratcher
Copy link
Member

FYI: #45250

@Tratcher
Copy link
Member

Tratcher commented Nov 23, 2022

Have you rebased on #44860 yet? It has new infrastructure that might be helpful here.

@ladeak ladeak force-pushed the ladeak_44627 branch 3 times, most recently from 1df2501 to 819b814 Compare November 24, 2022 19:24
@ladeak
Copy link
Contributor Author

ladeak commented Nov 24, 2022

EDIT 2022. 11. 29. : this is obsolete

@Tratcher I have rebased to include your latest changes. I addressed all comments. I fixed some space/tabs styling in the .tt file.

|            Method |        Job |     Toolchain | RunStrategy |       Mean |    Error |   StdDev |        Op/s |  Gen 0 |  Gen 1 | Gen 2 | Allocated |
|------------------ |----------- |-------------- |------------ |-----------:|---------:|---------:|------------:|-------:|-------:|------:|----------:|
| CountSingleHeader | DefaultJob |       Default |     Default |   273.0 ns |  4.39 ns |  4.11 ns | 3,663,046.3 |      - |      - |     - |         - |
| CountLargeHeaders | DefaultJob |       Default |     Default |   265.2 ns |  3.77 ns |  3.53 ns | 3,771,270.2 |      - |      - |     - |         - |
|  KeysSingleHeader | DefaultJob |       Default |     Default |   532.6 ns |  4.88 ns |  4.33 ns | 1,877,607.1 | 0.0038 |      - |     - |      24 B |
|  KeysLargeHeaders | DefaultJob |       Default |     Default | 1,124.6 ns | 14.55 ns | 13.61 ns |   889,191.6 | 0.4406 | 0.0095 |     - |   2,768 B |

** I had to adjust Windows Power Mode (from Recommended to Perf) and replaced a for loop to a while, to get comparable execution time.

I think the remaining perf difference is due to where we run the for loop for the known headers (inside RequestHeaders or NativeRequestContext).

{
count++;
}
pUnknownHeader++;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if using ref byte-s and Unsafe.As / Unsafe.Add could be beneficial here?

@BrennanConroy
Copy link
Member

Something broke in your latest changes, but since you force-pushed I can't tell what it was.

@ladeak
Copy link
Contributor Author

ladeak commented Nov 29, 2022

I am looking into it.

@ladeak
Copy link
Contributor Author

ladeak commented Nov 29, 2022

The current performance results:

|            Method |        Job |     Toolchain | RunStrategy |       Mean |    Error |   StdDev |        Op/s |  Gen 0 |  Gen 1 | Gen 2 | Allocated |
|------------------ |----------- |-------------- |------------ |-----------:|---------:|---------:|------------:|-------:|-------:|------:|----------:|
| CountSingleHeader | DefaultJob |       Default |     Default |   403.7 ns |  7.95 ns |  8.83 ns | 2,476,942.2 |      - |      - |     - |         - |
| CountLargeHeaders | DefaultJob |       Default |     Default |   441.2 ns |  3.54 ns |  3.32 ns | 2,266,582.1 |      - |      - |     - |         - |
|  KeysSingleHeader | DefaultJob |       Default |     Default |   916.0 ns | 15.29 ns | 23.36 ns | 1,091,654.4 | 0.0048 |      - |     - |      32 B |
|  KeysLargeHeaders | DefaultJob |       Default |     Default | 1,800.9 ns | 14.42 ns | 13.49 ns |   555,283.7 | 0.4425 | 0.0095 |     - |   2,776 B |

Copy link
Member

@Tratcher Tratcher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like everything is working, nice job.
A few comments on formatting and this should be ready to go.

Laszlo Deak added 2 commits November 29, 2022 20:38
…cation free, while Keys allocate the string for the header keys that are not known and a list for the result.

RequestHeader.Generated.cs has a new HeaderKeys returned as ReadOnlySpan<byte> for accessing the know keys - by avoiding allocation with enumerator or a static list.
RequestHeader.Generated.tt is updated, but I don't see it being used to generate the RequestHeader.Generated.cs file.
NativeRequestContext provides new implementation to access the header keys and count.
@ladeak
Copy link
Contributor Author

ladeak commented Nov 29, 2022

@Tratcher I am sorry, but I ended up force pushing against my original intentions :(

@Tratcher Tratcher enabled auto-merge (squash) November 29, 2022 20:16
@Tratcher Tratcher added this to the 8.0-preview1 milestone Nov 29, 2022
@Tratcher Tratcher merged commit 1441b66 into dotnet:main Nov 29, 2022
@Tratcher
Copy link
Member

Thanks for working through this

@Tratcher
Copy link
Member

@ladeak
Copy link
Contributor Author

ladeak commented Nov 30, 2022

Let me know how I can help.

@ghost
Copy link

ghost commented Nov 30, 2022

Hi @ladeak. It looks like you just commented on a closed PR. The team will most probably miss it. If you'd like to bring something important up to their attention, consider filing a new issue and add enough details to build context.

@Tratcher
Copy link
Member

Nevermind, the new error is related to hostfxr even trying to load the app, that shouldn't be related to any changes here. This was just the first occurance.

@ladeak
Copy link
Contributor Author

ladeak commented Nov 30, 2022

Thank you for looking into.

@ghost
Copy link

ghost commented Nov 30, 2022

Hi @ladeak. It looks like you just commented on a closed PR. The team will most probably miss it. If you'd like to bring something important up to their attention, consider filing a new issue and add enough details to build context.

@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions community-contribution Indicates that the PR has been added by a community member feature-httpsys Perf
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HttpSys request headers Keys and Count both allocate
5 participants