Dapr GRPC proxy connection leak #6734
Comments
So to be clear, in the following call chain: App 1 --> Dapr 1 -------> Dapr 2 --> App 2, you see connections being opened and not re-used in the path of Dapr 2 --> App 2?
Yep, that is exactly right @yaron2, but there is some connection reuse between Dapr 2 --> App 2. If I am doing a single request at a time in something like Postman, connection count stays at 1. As soon as I throw JMeter with 10 threads at it, connection count grows steadily, but it doesn't increase by 1 for every request.
Does the connection count continue growing after you've finished opening the 10 threads with JMeter?
Ok, thanks a lot for the info, will check this out. Do you have any special connection max age/timeout settings on App 2's server?
I have this configured to keep alive the HTTP2 connection so it doesn't die while idle. I tried without this logic and still saw the same behavior with dapr sidecar --> App 2. This logic exists on App 1 (client).
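For reference, this style of client-side HTTP/2 keepalive maps to gRPC keepalive options. A minimal grpc-go sketch with illustrative values (the actual client here is .NET, so this is only the Go equivalent of the idea, not the reporter's config):

```go
package client

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

// dialWithKeepalive opens a connection that pings the server while idle,
// so the HTTP/2 connection isn't torn down between bursts of traffic.
func dialWithKeepalive(addr string) (*grpc.ClientConn, error) {
	return grpc.Dial(addr,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                30 * time.Second, // illustrative: ping after 30s of inactivity
			Timeout:             10 * time.Second, // illustrative: drop the connection if no ack in 10s
			PermitWithoutStream: true,             // keep pinging even with no active RPCs
		}),
	)
}
```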
I believe I'm able to reproduce this locally. Tracking this for 1.12 @jsheetzmt.
Awesome, thanks for the update @yaron2!
@jsheetzmt I've made some progress and am able to correlate new connection creation to the number of parallel requests coming in to the app. Can you provide more details on the concurrency parameters for your JMeter test? You mentioned 10 threads, but how many requests are sent in total?
With the 10 threads, I was just letting it run indefinitely and would see a new connection every 100 requests or so. Each thread sends one request to the app, gets the response, and then repeats the test.
And how long would it take for 100 requests to come back?
@jsheetzmt it looks like you're seeing a new connection opening up due to this setting: line 32 in e303261.
Essentially, Dapr would open a new connection once the current Dapr-to-app connection holds 100 streams.
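To make the mechanism concrete, here is a hypothetical sketch of a stream-counting pool like the one described; the names and structure are illustrative, not Dapr's actual code:

```go
package pool

import (
	"context"
	"sync"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// maxStreamsPerConn mirrors the 100-streams setting discussed above.
const maxStreamsPerConn = 100

type trackedConn struct {
	cc      *grpc.ClientConn
	streams int // count of streams attributed to this connection
}

type connPool struct {
	mu    sync.Mutex
	conns []*trackedConn
	addr  string
}

// get returns a connection with spare stream capacity, dialing a brand-new
// TCP connection once every existing one has hit maxStreamsPerConn. If
// streams is incremented per request but never decremented when the RPC
// completes (the suspected bug), the pool only ever grows.
func (p *connPool) get(ctx context.Context) (*grpc.ClientConn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, c := range p.conns {
		if c.streams < maxStreamsPerConn {
			c.streams++
			return c.cc, nil
		}
	}
	cc, err := grpc.DialContext(ctx, p.addr,
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		return nil, err
	}
	p.conns = append(p.conns, &trackedConn{cc: cc, streams: 1})
	return cc, nil
}
```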
Interesting. Do you know why the sidecar is opening a new stream and not reusing an existing one? |
When I change App 1's URL to go to App 2 directly (not dapr sidecar), there's only a single connection open on App 2. |
It looks like we're treating every request as a stream when incrementing the counter. cc @ItalyPaleAle to chime in on that. |
@yaron2 Would you expect connections to close after a certain period of time? Right now, it seems like connections ramp up and never ramp down. We would always like a minimum of 1 connection, to keep the connection warm.
We have a max connection age of 3 minutes. If there's keep alive enabled on your app server, it might mess with that. |
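For context, a 3-minute max connection age on a grpc-go server would look roughly like this (values illustrative, not quoted from Dapr's source):

```go
package server

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

// newServer closes connections after roughly 3 minutes so clients re-dial
// (and can re-balance); the grace period lets in-flight RPCs finish.
func newServer() *grpc.Server {
	return grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
		MaxConnectionAge:      3 * time.Minute,
		MaxConnectionAgeGrace: 10 * time.Second, // illustrative value
	}))
}
```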
Okay, I might have to play around with that some more. We have keep alive on our clients but not the server. Even when I disable the client's keep alive, I am not seeing total connections drop back down. |
So the next question is why Dapr dials a new connection to talk to the app for every 100 RPCs per connection. When a Dapr SDK talks to Dapr via gRPC, or when users open native gRPC clients, or even when users use gRPC without Dapr, this logic isn't present.
It's been a while, so I don't recall 100% of the details, but from memory (and some refreshing Google searches) it was probably related to two things. First, the need to ensure better load balancing across multiple apps: before, we were closing connections right after they were used, so each RPC created a new connection and connected to a (possibly) different sidecar. If we keep connections open, the worry was that it may not load balance. However, in hindsight I wonder if this was some sort of premature optimization, and maybe the gRPC SDK does that already? Second, the guidance from the gRPC docs; related issue: grpc/grpc#21386. As for why 100, that comes directly from RFC 7540 (the HTTP/2 spec): https://httpwg.org/specs/rfc7540.html#SETTINGS_MAX_CONCURRENT_STREAMS
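(In grpc-go terms, that per-connection stream cap is advertised by the server; a minimal, hypothetical example:)

```go
package server

import "google.golang.org/grpc"

// newCappedServer advertises SETTINGS_MAX_CONCURRENT_STREAMS = 100 to
// clients; RFC 7540 recommends this value be no smaller than 100.
func newCappedServer() *grpc.Server {
	return grpc.NewServer(grpc.MaxConcurrentStreams(100))
}
```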
This should be done by gRPC based on our configuration. The second point from the gRPC docs is relevant; however, I can confirm (and so can @jsheetzmt) that if you open a plain client/server dial, you do not see additional TCP connections beyond 100 requests. My conclusion is this: if load balancing isn't affected, we should remove this logic by default and make it configurable, to allow users to handle special cases that may suffer due to queuing.
Yes, using that. Some testing may help figure out if the additional pooling we're doing is needed or not.
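grpc-go's built-in round_robin policy is one way to get that load balancing without custom pooling; a sketch assuming that's the mechanism in question (names illustrative):

```go
package client

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// dialBalanced asks gRPC itself to round-robin RPCs across the endpoints
// the resolver returns, instead of rolling custom connection pooling.
// Whether this fully replaces Dapr's pooling is what needs testing.
func dialBalanced(target string) (*grpc.ClientConn, error) {
	return grpc.Dial(target,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
}
```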
Indeed, but for the Dapr-to-app channel, where there's expected to be a single endpoint for the app (most of the time, anyway), we should likely skip this logic by default and make it configurable. Testing is definitely needed to see if this is also the case for Dapr-to-Dapr communication.
We are using Dapr in Azure Container Apps, and we will have scaling so that we spin up additional replicas during load; ideally that traffic is load balanced across the replicas. I would expect each replica to have a connection open to the Dapr sidecar, with the sidecar still doing round-robin.
Oh, yes. I was primarily fixated on sidecar-to-sidecar; sidecar-to-app can be different.
That is a valid expectation |
@yaron2 Pulled down the nightly and things are looking much, much better now. Thank you! |
Expected Behavior
Proxying GRPC calls through dapr sidecar doesn't open many connections to destination app during load
Actual Behavior
During load, the dapr sidecar is opening many connections to the destination app, causing high TCP usage and a memory leak in our .NET Core application. When I remove the dapr sidecar and make a direct connection between the server and client, I only have a single connection open on port 8081 (HTTP/2 traffic).
dotnet-monitor: [screenshot]
ss: [screenshot]
Steps to Reproduce the Problem
Proxy GRPC calls through dapr sidecar with high load
Possible related issue
#4937