Gunicorn is not rejecting connections over backlog limit #2183
The man page for listen(2) says that when the queue is full, the client may get ECONNREFUSED or, if the underlying protocol supports retransmission, the request may simply be ignored so that a later reattempt succeeds. Monitoring the TCP traffic on my system, that's exactly what happens when the backlog is full. The client that's trying to connect keeps trying with TCP retransmissions until it ultimately reaches the socket timeout. Here, a server is running on port 64378 with a full backlog. I attempt to connect a new client and we can see the retries happening (at increasing backoffs):
Interesting, I totally forgot about the TCP retry mechanism. I wonder if we can disable it temporarily, to test this without it(?) @jamadden What is the tool / command you took that capture with? I guess the second column is for timings, but are they seconds, milliseconds, what...?
Maybe? If so, I expect it's a system-dependent tunable somewhere in /proc or /sys on Linux, or a sysctl on macOS.
That's output from tshark, a tool that's part of Wireshark, which builds upon libpcap. I think they're elapsed (fractional) seconds since the capture began.
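As a rough side note (a sketch, not part of the original exchange, and it assumes a Linux host): the retry behaviour seen in the capture is governed by the net.ipv4.tcp_syn_retries sysctl, which can be read, and temporarily lowered for testing, like so:

```python
from pathlib import Path

# An unanswered SYN is retransmitted net.ipv4.tcp_syn_retries times before
# connect() gives up, which is what produces the backoff pattern above.
retries = Path("/proc/sys/net/ipv4/tcp_syn_retries").read_text().strip()
print(f"SYN retries before connect() fails: {retries}")

# To shorten the retry window for a quick test (requires root):
#   echo 2 | sudo tee /proc/sys/net/ipv4/tcp_syn_retries
```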
OK, I also tried it. Using a 2 s delay and sending 5 concurrent requests, I get this. The first 3 requests get accepted:
The next 2 don't:
HTTP request for the first 3 is sent:
Retransmission of the 2 failed packets:
Response for 1st request:
The connection gets closed, so a new one is opened (actually, I wonder why, because I'm sending 5 requests in 5 threads, so each thread should only send one request):
Retransmission of the 2 failed packets:
And so on, it continues pretty much like above:
So, in the test 3 requests get handled concurrently, not 1 (or 2).
Running with 5 workers (instead of 1), the result varies: sometimes 1-2 packets need to be re-transmitted, sometimes not. I think this is due to … I also ran a few different tests, and it seems like it's working reasonably. And just to make sure:
So if the only weirdness here is that a backlog value of
You don't mention the operating system you're testing on. That turns out to be very important as the backlog behaviour differs substantially between operating systems (and even versions). On Linux, there are actually two queues, only one of which is limited by the backlog. The backlog limit only kicks in after the three-way handshake, IIUC; the handshake limit is system-wide and may be unlimited. The behaviour I saw when testing on macOS I think demonstrates the classic BSD approach of a single queue.
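To make the two-queue point concrete, here is a minimal sketch (assuming a stock Linux kernel, where these paths are standard) of how to read the two limits involved:

```python
from pathlib import Path

# The accept ("completed connection") queue of a listening socket holds at
# most min(backlog, net.core.somaxconn) entries; the SYN ("incomplete") queue
# is sized separately, system-wide, by net.ipv4.tcp_max_syn_backlog.
somaxconn = Path("/proc/sys/net/core/somaxconn").read_text().strip()
syn_backlog = Path("/proc/sys/net/ipv4/tcp_max_syn_backlog").read_text().strip()
print(f"net.core.somaxconn           = {somaxconn}")
print(f"net.ipv4.tcp_max_syn_backlog = {syn_backlog}")
```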
I'm using Linux (KDE Neon 5.17, on Ubuntu 18.04, on Debian 9 I think):
I'm running a virtual machine inside VirtualBox, but that probably has no impact. Thanks for the pointers! Looks like I need to do some studying.
I went through the linked material and indeed it was a good read. However, looking at my earlier trace, I still don't understand how this is possible. The first 3 connections get established:
OK, even if they were in the SYN queue at this point, they will be moved to the accept queue (and then to a worker). How is it possible that 3 requests are actually allowed to send content:
Wouldn't these requests already be in the accept queue when this happens? If so, the first one could get moved to the worker, and the second one would be queued in the accept queue. But the third one should fail, as the accept queue is full by the time it is moved there? Even if the SYN queue allowed data to be received for a connection before it's moved to the accept queue, that would only cover a brief window; I had a 2+ second delay there and all 3 requests still got handled just fine.
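To make the off-by-one and early-data behaviour concrete, here is a self-contained sketch (not from the thread; the exact counts are OS- and kernel-dependent). A listener created with listen(1) that never calls accept() will, on Linux, typically complete the handshake for backlog + 1 clients, and those clients can already push data into the kernel's receive buffer before accept() is ever called:

```python
import socket

# Listener with a backlog of 1; accept() is never called, so completed
# connections pile up in the kernel's accept queue.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

clients = []
for i in range(3):
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.settimeout(2)  # keep the demo short; real connect timeouts are much longer
    try:
        c.connect(("127.0.0.1", port))
        # The handshake is complete, so this data lands in the socket's receive
        # buffer even though the application never accepted the connection.
        c.sendall(b"GET / HTTP/1.1\r\nHost: x\r\n\r\n")
        print(f"client {i}: connected and sent a request")
    except OSError as exc:
        print(f"client {i}: failed ({exc})")
    clients.append(c)
```

Typically the first two clients succeed and the third times out while retransmitting its SYN. If the gunicorn test above used a backlog of 1, then one request being processed by the worker plus backlog + 1 connections queued in the kernel would already account for 3 concurrent requests.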
We're getting into details I happily admit to being fuzzy on and which I haven't tested recently. This could be where the interaction of the worker type (whether the listener socket is blocking or not) and the reuse_port setting (when it's true) comes into play. Again, I haven't really looked closely into the details or experimented on Linux enough to claim to fully understand at this level. My deployed systems perform well with gevent+SO_REUSEPORT, and if the exact backlog isn't correct it's not a big deal for us. And hey, it's just an off-by-one error, and those have never resulted in any big problems, right? 😄
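A small illustration of the kernel mechanism behind SO_REUSEPORT (a generic sketch, not gunicorn's actual socket setup): every socket bound with the option gets its own accept queue, and the kernel spreads incoming connections across them, so several listeners with a small backlog can queue more connections in total than a single listener would:

```python
import socket

def make_listener(port: int, backlog: int) -> socket.socket:
    """One listening socket with its own accept queue of length `backlog`."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Requires a platform that exposes SO_REUSEPORT (Linux 3.9+, BSD/macOS).
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(backlog)
    return s

# Two listeners on the same port, each with backlog=1: incoming connections
# are load-balanced by the kernel between two independent queues.
listeners = [make_listener(8000, 1) for _ in range(2)]
```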
Is there something I should do here? AFAICS gunicorn is behaving correctly, it's just that backlog is inherently incredibly system-dependent.
yup, i just wanted to confirm nothing was missing and we could close the issue then :)
Sorry for the delay in response!
Indeed, I didn't even know about SO_REUSEPORT.
I mostly ran my tests using
Yeah, I get your point :D and I agree. It seems to work reasonably well in this surprising and magical world of networking. Humble thanks for the prompt responses and effort taken here. However, maybe this should be clarified in the docs:
I mean, if there's a relation to the total number of requests that can be queued (and in this regard it can be a major one), maybe it should be emphasized to the reader? The docs for backlog:
For a novice reader, I believe it helps very little. It makes me wonder what the
Gunicorn uses the socket.listen function to open the port. This maps to the listen(2) system call. According to the listen(2) man page, the backlog argument only sizes the kernel's queue of pending connections; it does not put a hard cap on how many requests end up queued overall. So, it seems, Gunicorn cannot provide such fine-grained control over the request queue with the backlog setting.
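As a minimal sketch of that mapping (illustrative only; gunicorn's real socket setup does more, e.g. re-using inherited file descriptors and binding multiple addresses):

```python
import socket

backlog = 64  # the value of gunicorn's --backlog setting ends up here

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("0.0.0.0", 8000))
# socket.listen() is a thin wrapper over the listen(2) system call; the backlog
# argument only sizes the kernel's completed-connection queue, it neither
# counts nor rejects requests on its own.
sock.listen(backlog)
```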
Basically this: https://stackoverflow.com/questions/56755512/gunicorn-is-not-rejecting-connections-over-backlog-limit (someone else's question, just happened to find it).
How to reproduce (with gunicorn 20.0.0):

app.py:
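The original app.py is not preserved in this extract; a minimal WSGI app along these lines (assuming the 2-second delay per request mentioned earlier in the thread) reproduces the setup:

```python
# app.py -- hypothetical reconstruction: hold each request for a couple of
# seconds so the single sync worker stays busy.
import time

def app(environ, start_response):
    time.sleep(2)  # keep the worker occupied
    body = b"hello\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```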
Run it with the (default) sync worker:
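The exact command isn't preserved either; something along these lines matches the description of one worker and a tiny backlog (the flag names are real gunicorn options, the values are assumptions):

```sh
gunicorn --workers 1 --backlog 1 --bind 127.0.0.1:8000 app:app
```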
Bombard:
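Again a reconstruction rather than the original tool: firing 10 concurrent requests from threads, matching the "requests in threads" style of testing described earlier in the thread:

```python
# bombard.py -- hypothetical reconstruction: 10 concurrent GETs against the server.
import threading
import urllib.request

def hit(i: int) -> None:
    try:
        with urllib.request.urlopen("http://127.0.0.1:8000/", timeout=30) as resp:
            print(f"request {i}: HTTP {resp.status}")
    except OSError as exc:
        print(f"request {i}: failed ({exc})")

threads = [threading.Thread(target=hit, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```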
What happens here is that I'm sending in 10 parallel requests, of which only 1 at a time should get processed. My expectation would have been that anything above 1 or 2 parallel requests fails (1 active on the worker, 1 queued on the backlog), but the above actually succeeds. It succeeds even with way larger counts (like 50, but maybe not 100).
The results are quite the same with --worker-class gevent --worker-connections 1.

Why is this? Why don't the backlog and worker/worker_connections counts result in most requests getting rejected? Where are connections queued, if not at the backlog or at the workers?