
Add ValueTask pooling in more places #68457

Merged (1 commit into main, Apr 27, 2022)
Conversation

@davidfowl (Member) commented on Apr 24, 2022:

  • Use ValueTask pooling on StreamPipeReader.ReadAsync, StreamPipeReader.ReadAtLeastAsync, and StreamPipeWriter.FlushAsync

Fixes #30169 and contributes to dotnet/aspnetcore#41343
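The mechanism the PR opts into is the state-machine box pooling that shipped in .NET 6: decorating an async method with `[AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder<>))]` makes the runtime reuse the heap box for the method's state machine instead of allocating a new one per call. A minimal sketch of the opt-in (the method below is illustrative, not the actual PR code):

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

class Demo
{
    // Opt this single async method into pooled state-machine boxes.
    // Without the attribute, each await that suspends allocates a fresh box.
    [AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder<>))]
    static async ValueTask<int> ReadCoreAsync()
    {
        await Task.Yield(); // force a real suspension so a box is actually needed
        return 42;
    }

    static async Task Main()
    {
        int result = await ReadCoreAsync();
        Console.WriteLine(result); // 42
    }
}
```

Because the returned `ValueTask<int>` may be backed by a pooled object, callers must await it exactly once; the usual `ValueTask` single-consumption rules become load-bearing rather than advisory.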

Here's the allocation profile for the JSON https benchmark:

Crank command line:

```shell
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/json.benchmarks.yml --scenario https --profile aspnet-perf-lin --application.options.collectCounters true --application.channel edge --application.framework net7.0 --chart
```

Crank with allocation profile:

```shell
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/json.benchmarks.yml --scenario https --profile aspnet-perf-lin --application.options.collectCounters true --application.channel edge --application.framework net7.0 --chart --application.dotnetTrace true --application.dotnetTraceProviders gc-verbose
```

Before

```
Name                                                                                                                                            Exc %            Exc  Exc Ct
Type AsyncStateMachineBox`1[System.IO.Pipelines.ReadResult,System.IO.Pipelines.StreamPipeReader+<<ReadAsync>g__Core|36_0>d]                      37.2  2,525,305,344  23,782
Type AsyncStateMachineBox`1[System.Int32,System.Net.Security.SslStream+<ReadAsyncInternal>d__179`1[System.Net.Security.AsyncReadWriteAdapter]]   31.0  2,102,077,952  19,796
Type System.Text.Json.Utf8JsonWriter                                                                                                             19.3  1,312,253,440  12,358
Type System.Text.Json.PooledByteBufferWriter                                                                                                      8.3    562,264,192   5,295
Type Benchmarks.Middleware.JsonMessage                                                                                                            3.9    262,497,232   2,472
```

(Inclusive values match the exclusive values for every row; the time-histogram columns are omitted.)

You can see the top allocation is the StreamPipeReader state machine. Here are the client stats:

```
load
CPU Usage (%)               68  ▄██████████████
Cores usage (%)            821  ▄██████████████
Working Set (MB)            49  ██████████████▃▃▃▃▃▃▃▄▄▄▄▄▄▄▄
Private Memory (MB)        376  ▂▂▂▂▂▂▂▂▂▂▂▂▂▂███████████████
Start Time (ms)              0
First Request (ms)         208
Requests/sec           411,794
Requests             6,215,626
Mean latency (ms)         1.00
Max latency (ms)         65.03
Bad responses                0
Socket errors                0
Read throughput (MB/s)   59.69
Latency 50th (ms)         0.45
Latency 75th (ms)         0.74
Latency 90th (ms)         2.30
Latency 99th (ms)         8.35
```

After

Allocation profile

```
Name                                                                                                                                            Exc %            Exc  Exc Ct
Type AsyncStateMachineBox`1[System.Int32,System.Net.Security.SslStream+<ReadAsyncInternal>d__179`1[System.Net.Security.AsyncReadWriteAdapter]]   57.0  2,424,627,456  22,833
Type System.Text.Json.Utf8JsonWriter                                                                                                             31.3  1,332,481,792  12,548
Type System.Text.Json.PooledByteBufferWriter                                                                                                      6.6    282,760,640   2,663
Type Benchmarks.Middleware.JsonMessage                                                                                                            4.6    195,596,128   1,842
```

(The StreamPipeReader state machine no longer appears; inclusive values again match the exclusive values.)
```
load
CPU Usage (%)               70  ▄██████████████
Cores usage (%)            835  ▄██████████████
Working Set (MB)            49  ██████████████▃▃▃▄▄▄▄▄▄▄▄▄▄▄▄
Private Memory (MB)        376  ▂▂▂▂▂▂▂▂▂▂▂▂▂▂███████████████
Start Time (ms)              0
First Request (ms)         211
Requests/sec           416,502
Requests             6,288,546
Mean latency (ms)         1.03
Max latency (ms)         37.39
Bad responses                0
Socket errors                0
Read throughput (MB/s)   60.38
Latency 50th (ms)         0.43
Latency 75th (ms)         0.76
Latency 90th (ms)         2.49
Latency 99th (ms)         8.70
```

Commit: Use ValueTask pooling on StreamPipeReader.ReadAsync, ReadAtLeastAsync, and StreamPipeWriter.FlushAsync
@stephentoub (Member) left a comment:

Have you validated impact on throughput (and not just that allocation numbers are reduced) in expected use?

@davidfowl (Member, Author):

Will run benchmarks on a few profiles and report back.

@davidfowl (Member, Author):

Updated the description.

@stephentoub (Member):

Thanks for the data. I can't tell from the numbers shared: did this help, hurt, or was it a no-op? The code changes themselves in the PR look fine; if we believe this truly helps for more than just removing some lines from an allocation profile, then go ahead. I want to reiterate for the record, though, that we need to be thoughtful with this pooling. We added the support for good reason, and there are absolutely valid times to use it; we just need to be careful, as there can be hidden costs. For example, pooling can easily create more references from pooled Gen2 objects to newer/temporary Gen0 objects, which in turn makes GCs slower when they occur and thus can actually hurt performance and P99 latency rather than helping.
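The hidden cost being warned about here is retention: a long-lived pooled object that still points at per-operation state roots that state across GCs and adds Gen2→Gen0 references the GC must track. A minimal illustrative sketch (the pool and types are hypothetical, not runtime internals) of why clearing state on return matters:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical pooled box that carries a reference to per-operation state.
sealed class PooledBox
{
    // If this field is left set while the box sits in the pool, the
    // (likely Gen2) box keeps a short-lived Gen0 object alive.
    public object? State;
}

static class BoxPool
{
    static readonly ConcurrentQueue<PooledBox> s_pool = new();

    public static PooledBox Rent() =>
        s_pool.TryDequeue(out var box) ? box : new PooledBox();

    public static void Return(PooledBox box)
    {
        box.State = null; // clear so the pool doesn't root temporaries across GCs
        s_pool.Enqueue(box);
    }
}

class Demo
{
    static void Main()
    {
        var box = BoxPool.Rent();
        box.State = new byte[16]; // per-operation temporary (Gen0)
        BoxPool.Return(box);
        Console.WriteLine(box.State is null); // True: reference dropped on return
    }
}
```

The runtime's pooling builder does this kind of clearing for the state machine's fields; the point of the caution is that any pooled object whose lifetime spans GCs shifts cost from allocation to reference tracking, which can show up in P99 latency rather than in an allocation profile.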

@davidfowl (Member, Author):

It doesn't hurt the RPS; in this scenario it's slightly better (though that might be noise). Yes, I'm aware of the downsides 😄.

@davidfowl davidfowl merged commit 6af4abc into main Apr 27, 2022
@davidfowl davidfowl deleted the davidfowl/stream-pipe-reader branch April 27, 2022 14:48
@ghost ghost locked as resolved and limited conversation to collaborators May 27, 2022
@AndyAyersMS (Member):

Improvements: dotnet/perf-autofiling-issues#5095

@stephentoub (Member):

> Improvements: dotnet/perf-autofiling-issues#5095

That doesn't seem right. This PR would help with System.IO.Pipelines, but I don't believe those tests rely on pipelines in any way.

@BrennanConroy (Member):

It looks more like #59110 would have this effect; that PR also has perf numbers that look similar (#59110 (comment)).

@BrennanConroy BrennanConroy added this to the 7.0.0 milestone Oct 4, 2022

Successfully merging this pull request may close these issues.

StreamPipeReader does not amortize ValueTask ReadAsync calls
4 participants