
HttpConnectionPool contention for HTTP/1.1 #70098

Closed
MihaZupan opened this issue Jun 1, 2022 · 17 comments · Fixed by #99364
Labels
area-System.Net.Http enhancement Product code improvement that does NOT require public API changes/additions in-pr There is an active PR which will close this issue when it is merged tenet-performance Performance related issue
Comments

@MihaZupan
Member

When multiple requests use the same connection pool (same handler + same destination), they hit significant contention when acquiring a connection from the pool and when returning it.

A quick example comparing the HttpClient benchmark using 1 vs. 8 handlers, with 256 worker tasks in total in both cases, shows a ~13% difference.

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/httpclient.benchmarks.yml --scenario httpclient-kestrel-get --profile aspnet-citrine-lin --variable useHttpMessageInvoker=true --variable concurrencyPerHttpClient=256 --variable numberOfHttpClients=1 --json 1x256.json
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/httpclient.benchmarks.yml --scenario httpclient-kestrel-get --profile aspnet-citrine-lin --variable useHttpMessageInvoker=true --variable concurrencyPerHttpClient=32 --variable numberOfHttpClients=8 --json 8x32.json
crank compare .\1x256.json .\8x32.json
| client | 1x256 | 8x32 | delta |
| --- | --- | --- | --- |
| CPU Usage (%) | 75 | 95 | +26,67% |
| Requests | 8.653.509 | 9.814.549 | +13,42% |
| Mean RPS | 576.751 | 654.417 | +13,47% |

I saw similar numbers (a ~15% E2E difference) in my LLRP experiment when switching to using multiple handlers.

Approximate numbers from a benchmark doing minimal work outside of spamming the connection pool:

| Threads | RPS | RPS/thread |
| --- | --- | --- |
| 1 | 620k | 620k |
| 2 | 1120k | 560k |
| 3 | 1500k | 500k |
| 4 | 1720k | 430k |
| 5 | 1370k | 274k |
| 6 | 1170k | 195k |
| 7 | 1080k | 154k |

On my machine (8-core CPU), the Threads=6 scenario spends 57% of CPU time acquiring the lock in GetHttp11ConnectionAsync and ReturnHttp11Connection.
[profiler screenshot: CPU time dominated by the lock in GetHttp11ConnectionAsync / ReturnHttp11Connection]

We should look into reducing the contention here if possible.
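
For reference, a minimal client-side sketch of the kind of workload that hits this contention. This is only an illustration, not the benchmark used above, and it assumes an HTTP/1.1 server listening at http://localhost:5000/ (that URL and the 10-second duration are arbitrary choices):

```csharp
// Minimal repro sketch: many workers sharing one handler means every request
// rents/returns a connection from the same HttpConnectionPool.
// Assumption (not from this issue): a server is listening at http://localhost:5000/.
using System;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

int workers = args.Length > 0 ? int.Parse(args[0]) : 256;

// A single SocketsHttpHandler => a single connection pool for the single destination.
using var invoker = new HttpMessageInvoker(new SocketsHttpHandler());
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
long requests = 0;

Task[] tasks = Enumerable.Range(0, workers).Select(_ => Task.Run(async () =>
{
    while (!cts.IsCancellationRequested)
    {
        using var request = new HttpRequestMessage(HttpMethod.Get, "http://localhost:5000/");
        using HttpResponseMessage response = await invoker.SendAsync(request, cts.Token);
        await response.Content.CopyToAsync(Stream.Null, cts.Token);
        Interlocked.Increment(ref requests);
    }
})).ToArray();

try { await Task.WhenAll(tasks); } catch (OperationCanceledException) { }

Console.WriteLine($"~{Interlocked.Read(ref requests) / 10} RPS with {workers} workers on one handler");
```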

@MihaZupan MihaZupan added the tenet-performance Performance related issue label Jun 1, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jun 1, 2022
@ghost

ghost commented Jun 1, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.


@stephentoub
Member

Are you able to run a similar test on previous versions, e.g. .NET Core 3.1 vs .NET 5 vs .NET 6 vs .NET 7? I'm curious if this has changed recently.

@CarnaViire
Member

> Are you able to run a similar test on previous versions, e.g. .NET Core 3.1 vs .NET 5 vs .NET 6 vs .NET 7? I'm curious if this has changed recently.

This needs tiny changes in the benchmark app; I'll do that.

@MihaZupan
Member Author

From a quick test it looks like the Threads=1 case is significantly improved with every release, whereas the contention (the % drop as the number of threads grows) is very similar.

@MihaZupan
Member Author

MihaZupan commented Jun 6, 2022

.NET 5, 6 and 7 compared to (theoretical) perfect scaling

[chart: RPS by thread count for .NET 5, 6, and 7 vs. (theoretical) perfect scaling]

@karelz karelz added bug and removed untriaged New issue has not been triaged by the area owner labels Jun 7, 2022
@karelz karelz added this to the Future milestone Jun 7, 2022
@karelz karelz added enhancement Product code improvement that does NOT require public API changes/additions and removed bug labels Jun 7, 2022
@karelz
Member

karelz commented Jun 7, 2022

Triage: contention in HTTP/1.1. There might be similar problems in HTTP/2 and HTTP/3, but they are not measured/tracked here.
Worth taking a look, but not high priority since the benchmark above is synthetic.

@CarnaViire
Member

As an FYI - numbers for HTTP/2 on .NET 6; the scenario is a GET that returns 200 OK with no content from the server:

| client | 1x256 | 8x32 | delta |
| --- | --- | --- | --- |
| CPU Usage (%) | 69 | 93 | +34.78% |
| Requests | 2,304,474 | 5,281,524 | +129.19% |
| Mean RPS | 153,626 | 352,137 | +129.22% |
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/httpclient.benchmarks.yml --scenario httpclient-kestrel-get --profile aspnet-citrine-lin --variable useHttpMessageInvoker=true --variable useHttps=true --variable httpVersion="2.0" --variable concurrencyPerHttpClient=256 --variable numberOfHttpClients=1 --json 1x256.json
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/httpclient.benchmarks.yml --scenario httpclient-kestrel-get --profile aspnet-citrine-lin --variable useHttpMessageInvoker=true --variable useHttps=true --variable httpVersion="2.0" --variable concurrencyPerHttpClient=32 --variable numberOfHttpClients=8 --json 8x32.json

@CarnaViire
Member

CarnaViire commented Jun 7, 2022

Scratch that - I forgot about the huge H2 improvements in Kestrel; but the gap still persists.
This is with both client and server on a 7.0 nightly build:

| client | 1x256 | 8x32 | delta |
| --- | --- | --- | --- |
| CPU Usage (%) | 70 | 90 | +28.57% |
| Requests | 4,850,055 | 8,609,368 | +77.51% |
| Mean RPS | 323,367 | 574,032 | +77.52% |
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/httpclient.benchmarks.yml --scenario httpclient-kestrel-get --profile aspnet-citrine-lin --variable useHttpMessageInvoker=true --variable useHttps=true --variable httpVersion="2.0" --variable concurrencyPerHttpClient=256 --variable numberOfHttpClients=1 --server.framework net7.0 --server.channel latest --client.framework net7.0 --client.channel latest --json 1x256.json
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/httpclient.benchmarks.yml --scenario httpclient-kestrel-get --profile aspnet-citrine-lin --variable useHttpMessageInvoker=true --variable useHttps=true --variable httpVersion="2.0" --variable concurrencyPerHttpClient=32 --variable numberOfHttpClients=8 --server.framework net7.0 --server.channel latest  --client.framework net7.0 --client.channel latest --json 8x32.json

@aceven24-csgp

We are seeing this under load as well in our production environments. In one scenario we actually created a "proxy" in Go that batched a bunch of GET requests into a single POST, reducing 25 HTTP connections to 1 from the .NET perspective, and let Go deal with the heavy load. It's unfortunate that we had to do that, but it helped; in other scenarios it is more complicated and we can't go that route.

We have an HTTP Varnish cache, and a decent number of our responses come back in under 1 ms, but at times we see contention turn that into 40+ ms, which kills performance and our ability to handle high load because operations take so much longer.

One thing we are thinking about doing is adding multiple "destinations" (i.e. host entries) to maybe alleviate the contention. This reminds me of the old browser days when you could only have so many concurrent requests per domain.

We tried HTTP/2, but that didn't seem to help, and under heavy load we were getting "accessing a disposed object" errors, which forced us back to 1.1.

Things we've noticed:

DNS contention - Our thought is that the framework should allow us to dictate DNS behavior. These DNS entries point at services in the K8s cluster, so they rarely change, and if they did, all the pods would have been destroyed beforehand anyway.

SSL contention - We switched to plain HTTP since, again, these are internal calls within the cluster.

Other things we've thought about include moving to UDS and proxying through Istio/Envoy.

Any thoughts on a fix, and on the multiple DNS entries approach?

@MihaZupan
Member Author

Playing around with a (lightly tested) reimplementation of the HTTP/1.1 pool that boils down to ConcurrentQueue.Enqueue + ConcurrentQueue.TryDequeue on the happy path: MihaZupan@2643235.
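
For illustration, the happy-path idea is roughly the following. This is a simplified sketch with hypothetical names (Http11PoolSketch, RentAsync, Return), not the actual patch; the real pool also handles scavenging, connection lifetime, MaxConnectionsPerServer, and request queuing:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Simplified pool sketch: idle connections sit in a ConcurrentQueue,
// so renting and returning a connection is lock-free on the happy path.
public sealed class Http11PoolSketch<TConnection> where TConnection : class
{
    private readonly ConcurrentQueue<TConnection> _idle = new();
    private readonly Func<ValueTask<TConnection>> _connect;

    public Http11PoolSketch(Func<ValueTask<TConnection>> connect) => _connect = connect;

    public ValueTask<TConnection> RentAsync()
    {
        // Happy path: reuse an idle connection without taking any lock.
        if (_idle.TryDequeue(out TConnection? connection))
        {
            return new ValueTask<TConnection>(connection);
        }

        // Slow path: establish a new connection (the real pool also enforces
        // limits and parks the request in a queue when at the limit).
        return _connect();
    }

    // Returning is also just an enqueue; again, no lock taken.
    public void Return(TConnection connection) => _idle.Enqueue(connection);
}
```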

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/httpclient.benchmarks.yml --scenario httpclient-kestrel-get --profile aspnet-citrine-lin --variable useHttpMessageInvoker=true --variable concurrencyPerHttpClient=256 --variable numberOfHttpClients=1 --server.framework net9.0 --client.framework net9.0 --json 1x256.json
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/httpclient.benchmarks.yml --scenario httpclient-kestrel-get --profile aspnet-citrine-lin --variable useHttpMessageInvoker=true --variable concurrencyPerHttpClient=32 --variable numberOfHttpClients=8 --server.framework net9.0 --client.framework net9.0 --json 8x32.json
crank compare .\1x256.json .\8x32.json
| client | 1x256 | 8x32 | delta |
| --- | --- | --- | --- |
| RPS | 693,873 | 875,814 | +26.22% |
| Patched RPS | 873,571 | 876,394 | +0.32% |

Or YARP's http-http 100-byte scenario:

| load | yarp-base | yarp-patched | delta |
| --- | --- | --- | --- |
| Latency 50th (ms) | 0.73 | 0.68 | -6.97% |
| Latency 75th (ms) | 0.82 | 0.74 | -9.82% |
| Latency 90th (ms) | 1.03 | 0.89 | -13.39% |
| Latency 95th (ms) | 1.41 | 1.18 | -16.41% |
| Latency 99th (ms) | 2.87 | 2.68 | -6.63% |
| Mean latency (ms) | 0.83 | 0.76 | -8.74% |
| Requests/sec | 306,699 | 335,921 | +9.53% |

On an in-memory loopback benchmark that stresses the connection pool contention: https://gist.github.com/MihaZupan/27f01d78c71da7b9024b321e743e3d88

Rough RPS numbers with 1-6 threads:

| RPS (1000s) | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| main | 2060 | 1900 | 1760 | 1670 | 1570 | 1500 |
| patched | 2150 | 2600 | 3400 | 3700 | 4100 | 4260 |

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 6, 2024
@MihaZupan MihaZupan modified the milestones: Future, 9.0.0 Mar 12, 2024
@ozziepeeps

@stephentoub @MihaZupan Is there any chance that this fix might get cherry-picked back to .NET 8? /CC @aceven24-csgp

@MihaZupan
Member Author

Why do you ask - are you seeing such issues with your service?

It's unlikely that we would backport this change. It isn't a correctness fix - the behavior is the same.
It's a performance improvement that only matters at very high request throughput, which most applications won't reach.

@ozziepeeps

@MihaZupan yes, we've been seeing these issues since the .NET 6 timeframe (see the comment added by @aceven24-csgp above). Our application has very high request throughput. Our company typically only aligns with LTS releases, hence the ask about backporting to .NET 8 rather than waiting for what would be .NET 10 for us.

@MihaZupan
Member Author

Have you tried manually splitting requests over multiple SocketsHttpHandler instances? E.g. use 8 instances instead of 1 and randomly switch between them?

That would achieve something similar to this change at high load. If it doesn't, you may be running into different issues.
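
A rough sketch of that workaround, assuming a hypothetical ShardedHandler wrapper and a shard count of 8 (both illustrative choices, not from this thread):

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Spreads requests over several SocketsHttpHandler instances so each gets its
// own connection pools, reducing contention on any single pool.
public sealed class ShardedHandler : HttpMessageHandler
{
    private readonly HttpMessageInvoker[] _shards;
    private int _next = -1;

    public ShardedHandler(int shardCount = 8)
    {
        _shards = new HttpMessageInvoker[shardCount];
        for (int i = 0; i < shardCount; i++)
        {
            _shards[i] = new HttpMessageInvoker(new SocketsHttpHandler());
        }
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Round-robin across the shards; random selection would work just as well.
        int index = (int)((uint)Interlocked.Increment(ref _next) % (uint)_shards.Length);
        return _shards[index].SendAsync(request, cancellationToken);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            foreach (HttpMessageInvoker shard in _shards)
            {
                shard.Dispose();
            }
        }
        base.Dispose(disposing);
    }
}

// Usage: var client = new HttpClient(new ShardedHandler(shardCount: 8));
```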

@ozziepeeps

Yes, we have tried something similar to what you suggested and saw improvements, so we do have a reasonable workaround for now /cc @kevinstapleton

@kevinstapleton

Hi @MihaZupan,

This is somewhat anecdotal now, as we implemented the workaround over a year ago, but as @ozziepeeps mentioned, we were seeing heavy contention on the locks within the HTTP connection pool, which appeared to be limiting our throughput, primarily when creating new connections. To be specific, in our trace captures we would see many "RequestLeftQueue" events with high queue times. As a PoC to determine whether this was affecting us, we created a custom handler with a "pool" of SocketsHttpHandler instances implemented via a ConcurrentQueue. Using this custom handler practically eliminated the observed contention and removed our throughput bottleneck.

So while we don't need this to be back-ported, we are indeed eager to have a proper fix in the runtime for this. If anything, I think our success with using ConcurrentQueue echoes the success you've seen with ConcurrentStack.

@MihaZupan
Member Author

> we would see many "RequestLeftQueue" events with high queue times

Seeing this during steady state when using HTTP/1.1 indicates that you didn't have enough connections to handle all requests.
Are you using MaxConnectionsPerServer?
The cost of contention around starting new connections should be orders of magnitude lower than the cost of establishing the connections themselves, so I'd be surprised if pool contention were the primary issue there.
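
For context, this is where that limit lives; the value below is purely illustrative (SocketsHttpHandler's default is int.MaxValue):

```csharp
using System.Net.Http;

// If MaxConnectionsPerServer was lowered explicitly, the pool will queue requests
// once the limit is hit, which shows up as "RequestLeftQueue" with high queue times.
var handler = new SocketsHttpHandler
{
    MaxConnectionsPerServer = 1024 // illustrative value; the default is int.MaxValue
};
var client = new HttpClient(handler);
```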

The changes I made here are really focused on the "happy path", where we're able to open enough connections to handle the load, so eventually we're just processing requests without having to open any new connections. If you were limited by the number of connections, the recent changes wouldn't help with the contention.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 26, 2024