Try make IOQueue auto-parallelizing #21873

Closed
wants to merge 1 commit into from


tmds (Member) commented May 15, 2020

Applies the auto-parallelizing technique from dotnet/runtime#35330 to IOQueue.

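For readers unfamiliar with the technique, here is a minimal Python sketch of what we take "auto-parallelizing" to mean. The queue keeps at most one pending thread-pool request. A worker that manages to dequeue an item requests another worker before running that item, so a backlog fans out across the pool instead of being drained by a single work item. This is an illustration under that assumption, not the C# code from dotnet/runtime#35330 or from this PR. `AutoParallelQueue`, `schedule`, and the use of `ThreadPoolExecutor` as a stand-in for the .NET ThreadPool are all our own hypothetical choices.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class AutoParallelQueue:
    """Hypothetical Python analog of an auto-parallelizing work queue.

    Assumption: this mirrors the idea behind the PR, not its actual C# code.
    ThreadPoolExecutor stands in for the .NET ThreadPool.
    """

    def __init__(self, pool):
        self._pool = pool
        self._items = deque()          # append/popleft are thread-safe in CPython
        self._lock = threading.Lock()  # guards the flag (Interlocked in .NET)
        self._processing_requested = False

    def schedule(self, action, state):
        """Enqueue a callback and make sure some worker will pick it up."""
        self._items.append((action, state))
        self._request_processing()

    def _request_processing(self):
        # Keep at most one pending request to avoid over-parallelizing.
        with self._lock:
            if self._processing_requested:
                return
            self._processing_requested = True
        self._pool.submit(self._execute)

    def _execute(self):
        # Clear the flag before dequeuing: anything enqueued from now on triggers
        # a new request, and anything enqueued earlier is visible to popleft below.
        with self._lock:
            self._processing_requested = False
        try:
            action, state = self._items.popleft()
        except IndexError:
            return  # another worker already drained the queue
        # There may be more items: request another worker *before* running this
        # one, so a slow or blocking callback does not hold up the rest.
        self._request_processing()
        while True:
            action(state)
            try:
                action, state = self._items.popleft()
            except IndexError:
                return


if __name__ == "__main__":
    done = threading.Semaphore(0)
    seen = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        work = AutoParallelQueue(pool)
        for i in range(100):
            work.schedule(lambda n: (seen.append(n), done.release()), i)
        for _ in range(100):
            done.acquire()  # wait inside the block: workers still submit to the pool
    print(sorted(seen) == list(range(100)))
```

The trade-off is one extra thread-pool request per dequeue that finds work, in exchange for draining in parallel under load; whether that pays off for Kestrel is what the benchmarks in this thread measure.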
ghost added the area-servers label May 15, 2020
tmds (Member, Author) commented May 15, 2020

This is an experiment for benchmarking. I'm not sure what to expect.

cc @kouvel @adamsitnik @stephentoub @halter73 @davidfowl

tmds (Member, Author) commented May 15, 2020

cc @benaadams

adamsitnik (Member)

> Applies the technique from dotnet/runtime#35330 to IOQueue.

I was thinking about it too :D

The other (and most probably very stupid) idea I had was a scheduler that, in its constructor, uses reflection to access the internal ThreadPool field that stores the work items in a ConcurrentQueue:

https://github.com/dotnet/runtime/blob/ec2209e7360cfae481c9f6df8540dccadb02dcb4/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPool.cs#L403

and implements the Schedule method as a simple Enqueue call on that ConcurrentQueue.

Again, this is a very dirty idea ;)

adamsitnik (Member)

@tmds could you provide the modified .dll so I could run some benchmarks for you?

stephentoub (Member) commented May 15, 2020

> Again, this is a very dirty idea ;)

As an experiment it's totally fine. We will not ship that.

halter73 (Member)

@aspnet-hello benchmark

pr-benchmarks bot commented May 15, 2020

Starting 'Default' pipelined plaintext benchmark with session ID '1279e01068334a1c99e7c62e8b24d597'. This could take up to 30 minutes...

pr-benchmarks bot commented May 15, 2020

Baseline

```
stdout: Starting baseline run on '8675632723423f6ea2568c4f3cabec9a8364285a'...
[11:32:22.648] Using worker Wrk
[11:32:22.828] Running session '1279e01068334a1c99e7c62e8b24d597' with description 'Before'
[11:32:22.828] Starting scenario Default on benchmark server...
[11:32:22.828] POST http://10.0.0.9:5001/jobs {"DriverVersion":1,"ServerVersion":3,"Id":0,"Hardware":null,"HardwareVersion":null,"OperatingSystem":null,"KestrelThreadCount":null,"Scenario":"Default","Scheme":"Http","Port":5000,"Path":"/plaintext","Connections":0,"Threads":0,"ReadyStateText":"Application started.","IsConsoleApp":false,"AspNetCoreVersion":"Latest","RuntimeVersion":"Latest","SdkVersion":"5.0.100-preview.5.20258.4","UseMonoRuntime":false,"NoGlobalJson":false,"Database":0,"StartupMainMethod":"00:00:00","BuildTime":"00:00:00","PublishedSize":0,"ServerCounters":[],"Source":{"BranchOrCommit":"master","Repository":"https://github.com/aspnet/benchmarks.git","Project":"src/Benchmarks/Benchmarks.csproj","InitSubmodules":false,"DockerFile":null,"DockerImageName":null,"DockerLoad":null,"DockerCommand":null,"DockerContextDirectory":null,"DockerFetchPath":null,"LocalFolder":null,"SourceCode":null},"Arguments":null,"NoArguments":false,"State":"New","Url":null,"WebHost":"KestrelSockets","UseRuntimeStore":false,"Attachments":[],"BuildAttachments":[],"LastDriverCommunicationUtc":"2020-05-15T23:32:22.598588Z","DotNetTrace":false,"DotNetTraceProviders":null,"Collect":false,"CollectArguments":null,"PerfViewTraceFile":null,"CollectStartup":false,"CollectCounters":false,"BasePath":null,"ProcessId":0,"EnvironmentVariables":{},"BuildArguments":[],"NoClean":false,"Framework":null,"Error":null,"SelfContained":true,"BeforeScript":null,"AfterScript":null,"MemoryLimitInBytes":0,"CpuLimitRatio":0.0,"CpuSet":null,"Counters":{},"Measurements":[],"Metadata":[],"Endpoints":[],"Variables":null,"WaitForExit":false,"Timeout":0,"StartTimeout":"00:00:00","Options":{"DisplayOutput":false,"Fetch":false,"FetchOutput":null,"DownloadFiles":[],"TraceOutput":null,"DisplayBuild":false,"RequiredOperatingSystem":null,"RequiredArchitecture":null,"DiscardResults":false,"BuildFiles":[],"OutputFiles":[]},"Features":[]}...
[11:32:22.841] 202 Accepted
[11:32:22.842] Fetching job: http://10.0.0.9:5001/jobs/179
[11:32:22.842] GET http://10.0.0.9:5001/jobs/179...
[11:32:23.906] GET http://10.0.0.9:5001/jobs/179...
[11:32:23.915] Job has been selected by the server ...
[11:32:23.925] Interrupting due to an unexpected exception
[11:32:23.955] System.IO.DirectoryNotFoundException: Could not find a part of the path '/app/aspnetcore/artifacts/bin/Microsoft.AspNetCore.Server.Kestrel/Release/netcoreapp5.0'.
   at System.IO.Enumeration.FileSystemEnumerator`1.CreateDirectoryHandle(String path, Boolean ignoreNotFound)
   at System.IO.Enumeration.FileSystemEnumerator`1.Init()
   at System.IO.Enumeration.FileSystemEnumerator`1..ctor(String directory, Boolean isNormalized, EnumerationOptions options)
   at System.IO.Enumeration.FileSystemEnumerable`1..ctor(String directory, FindTransform transform, EnumerationOptions options, Boolean isNormalized)
   at System.IO.Enumeration.FileSystemEnumerableFactory.UserFiles(String directory, String expression, EnumerationOptions options)
   at System.IO.Directory.InternalEnumeratePaths(String path, String searchPattern, SearchTarget searchTarget, EnumerationOptions options)
   at System.IO.Directory.GetFiles(String path, String searchPattern, SearchOption searchOption)
   at BenchmarksDriver.Program.Run(Uri serverUri, Uri[] clientUris, String sqlConnectionString, ServerJob serverJob, String session, String description, Int32 iterations, Int32 exclude, String shutdownEndpoint, TimeSpan span, List`1 downloadFiles, Boolean fetch, String fetchDestination, Boolean collectR2RLog, String traceDestination, CommandOption outputFileOption, CommandOption sourceOption, CommandOption scriptFileOption, CommandOption markdownOption, CommandOption writeToFileOption, Nullable`1 requiredOperatingSystem, CommandOption archOption, CommandOption saveOption, CommandOption diffOption)
[11:32:23.955] Deleting scenario 'Default' on benchmark server...
[11:32:23.955] DELETE http://10.0.0.9:5001/jobs/179...
[11:32:23.956] 202 Accepted

stderr: Baseline benchmark run on '8675632723423f6ea2568c4f3cabec9a8364285a' failed.
```

PR


davidfowl (Member)

TFMS!

halter73 (Member)

@aspnet-hello benchmark

pr-benchmarks bot commented May 16, 2020

Starting 'Default' pipelined plaintext benchmark with session ID 'f4f99e3ddb614ab3ae381a3270183a8d'. This could take up to 30 minutes...

pr-benchmarks bot commented May 16, 2020

Baseline

```
Starting baseline run on '8675632723423f6ea2568c4f3cabec9a8364285a'...
RequestsPerSecond:           743,642
Max CPU (%):                 99
WorkingSet (MB):             88
Avg. Latency (ms):           3.41
Startup (ms):                486
First Request (ms):          121.24
Latency (ms):                0.41
Total Requests:              11,178,758
Duration: (ms)               15,030
Socket Errors:               25
Bad Responses:               0
Build Time (ms):             15,505
Published Size (KB):         120,816
SDK:                         5.0.100-preview.5.20258.4
Runtime:                     5.0.0-preview.6.20262.14
ASP.NET Core:                5.0.0-preview.5.20255.6
```


PR

Starting PR run on '778f8d5c3a8c90497fee6f65621544a1bed0ffde'...
| Description |     RPS | CPU (%) | Memory (MB) | Avg. Latency (ms) | Startup (ms) | Build Time (ms) | Published Size (KB) | First Request (ms) | Latency (ms) | Errors | Ratio |
| ----------- | ------- | ------- | ----------- | ----------------- | ------------ | --------------- | ------------------- | ------------------ | ------------ | ------ | ----- |
|      Before | 743,642 |      99 |          88 |              3.41 |          486 |           15505 |              120816 |             121.24 |         0.41 |     25 |  1.00 |
|       After | 743,872 |      99 |          89 |              3.13 |          455 |            5502 |              120816 |             124.48 |         0.41 |      0 |  1.00 |


benaadams (Member)

Is there a non-pipelined benchmark that can be triggered?

tmds (Member, Author) commented May 18, 2020

> @tmds could you provide the modified .dll so I could run some benchmarks for you?

@adamsitnik here you go: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.dll.tar.gz

halter73 (Member)

@aspnet-hello benchmark json

pr-benchmarks bot commented May 18, 2020

Starting 'json' pipelined plaintext benchmark with session ID '09c904d3d164426f9cfde9da065a433f'. This could take up to 30 minutes...

pr-benchmarks bot commented May 18, 2020

Baseline

```
Starting baseline run on '8675632723423f6ea2568c4f3cabec9a8364285a'...
RequestsPerSecond:           622,644
Max CPU (%):                 99
WorkingSet (MB):             201
Avg. Latency (ms):           3.9
Startup (ms):                461
First Request (ms):          144.42
Latency (ms):                0.44
Total Requests:              9,358,924
Duration: (ms)               15,030
Socket Errors:               0
Bad Responses:               0
Build Time (ms):             5,504
Published Size (KB):         120,819
SDK:                         5.0.100-preview.5.20258.4
Runtime:                     5.0.0-preview.6.20264.1
ASP.NET Core:                5.0.0-preview.5.20255.6
```


PR

Starting PR run on '778f8d5c3a8c90497fee6f65621544a1bed0ffde'...
| Description |     RPS | CPU (%) | Memory (MB) | Avg. Latency (ms) | Startup (ms) | Build Time (ms) | Published Size (KB) | First Request (ms) | Latency (ms) | Errors | Ratio |
| ----------- | ------- | ------- | ----------- | ----------------- | ------------ | --------------- | ------------------- | ------------------ | ------------ | ------ | ----- |
|      Before | 622,644 |      99 |         201 |               3.9 |          461 |            5504 |              120819 |             144.42 |         0.44 |      0 |  1.00 |
|       After | 605,376 |      98 |         199 |              3.94 |          470 |            5502 |              120819 |             143.89 |         0.38 |      0 |  0.97 |


benaadams (Member)

I was looking at the traces and sendmsg is very slow (comparatively), so I thought it wasn't a good idea to have the sends on the same queue as the receives, where they block them.

However, I didn't have great success separating them: #21981
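The idea of giving sends their own queue can be sketched as follows. This is a minimal Python illustration under our own assumptions, not Kestrel's code or the change in #21981: `SerialQueue` is a hypothetical stand-in for one IOQueue-like scheduler that runs callbacks one at a time in FIFO order, and it uses a dedicated thread per queue where a real scheduler would use thread-pool work items. A slow send callback then delays only other sends, never pending receives.

```python
import queue
import threading


class SerialQueue:
    """Hypothetical IOQueue-like scheduler: callbacks run one at a time, in order.

    Assumption: a dedicated daemon thread replaces the thread-pool work item a
    real scheduler would use; this keeps the sketch short.
    """

    def __init__(self, name):
        self._items = queue.Queue()
        worker = threading.Thread(target=self._run, name=name, daemon=True)
        worker.start()

    def schedule(self, action, state):
        self._items.put((action, state))

    def _run(self):
        while True:
            action, state = self._items.get()
            action(state)


if __name__ == "__main__":
    # One queue per direction, so a slow send (e.g. a long sendmsg call)
    # can only delay other sends.
    receives = SerialQueue("receives")
    sends = SerialQueue("sends")

    receive_ran = threading.Event()
    send_finished = threading.Event()

    def slow_send(_):
        # Simulate a send that is still busy when a receive arrives.
        print("receive ran during send:", receive_ran.wait(timeout=5))
        send_finished.set()

    sends.schedule(slow_send, None)
    receives.schedule(lambda _: receive_ran.set(), None)
    send_finished.wait()
```

If both callbacks were scheduled on the same `SerialQueue`, the receive would sit behind the send until its 5-second wait timed out, which is the blocking described in the comment. The sketch only shows the separation itself; it says nothing about whether the split improves throughput, and the attempt in #21981 did not show great success.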

adamsitnik (Member)

@tmds the results:

[image: benchmark results]

tmds (Member, Author) commented May 19, 2020

On ARM this gives some nice results; on Citrine, it's a regression.
I'll close this based on the Citrine regression.

tmds closed this May 19, 2020
tmds (Member, Author) commented May 19, 2020

An interesting observation:

Starting PR run on '778f8d5c3a8c90497fee6f65621544a1bed0ffde'...
| Description |     RPS | CPU (%) | Memory (MB) | Avg. Latency (ms) | Startup (ms) | Build Time (ms) | Published Size (KB) | First Request (ms) | Latency (ms) | Errors | Ratio |
| ----------- | ------- | ------- | ----------- | ----------------- | ------------ | --------------- | ------------------- | ------------------ | ------------ | ------ | ----- |
|      Before | 622,644 |      99 |         201 |               3.9 |          461 |            5504 |              120819 |             144.42 |         0.44 |      0 |  1.00 |
|       After | 605,376 |      98 |         199 |              3.94 |          470 |            5502 |              120819 |             143.89 |         0.38 |      0 |  0.97 |

With the change (After), we use less CPU (98% vs. 99% max CPU).
A hypothesis: there is contention, so parallelizing doesn't help.

amcasey added the area-networking label and removed the area-runtime label Jun 6, 2023
Labels: area-networking
7 participants