http3+ngtcp2: Transfer hangs with default libcurl receive buffer size #4525
I created 4525.c trying to reproduce, but on my dev machine this worked fine in 20 consecutive runs just now and I've not seen it hang even once... Does this hang for you, or do we need to do something else to make the issue trigger?
My initial tests were done on Windows Subsystem for Linux. I just tested on a real Linux machine on EC2 and was able to trigger the issue there as well. This is what I did:
This hung 2 out of 5 times. I will test your libcurl test case now.
I tested your 4525.c test case and it hung 3 out of 4 times on this EC2 machine:
The first time I ran it I forgot
It looks like the ngtcp2 backend does not support small buffer sizes. I get this assert and
That line is:
OK, I see what's happening. The reason why HTTP/3 downloads sometimes hang is that the ngtcp2 backend may retrieve a QUIC datagram from the socket when there isn't enough space in the target stream's receive buffer for the chunk of data contained in the packet. This was probably an issue with HTTP/2 as well when using multiplexing over a single TCP connection. How is it solved there?
If we ignore the QPACK problem for a moment, the "standard" way to deal with this should probably be to only announce a flow control window as large as the stream is able to read into its buffer. However, I fear that might make it very slow (since updating this to the peer is half an RTT away), so we might need something more clever. The HTTP/2 model isn't very optimal either, so I'm not sure it provides a good example to guide us: with nghttp2 we can return a special return code from the callback that says pause when the buffer is full, and then no more data is delivered until we unpause it again (after that buffer has been drained). I believe this system punishes all streams when one stream's buffer goes full. Ideally we want to stop getting data only for the stream whose buffer is full and allow the others to go on, but I don't think nghttp3/ngtcp2 will allow this... That said, I'm also willing to land a fix now to make this work correctly first, and then work on optimizing it in a second/later step.
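The "punishes all streams" behavior of the nghttp2 pause model can be sketched as a tiny standalone simulation. All names and sizes below are illustrative, not curl's or nghttp2's actual code; the point is only that one full stream buffer stalls delivery to every stream on the session.

```c
#include <assert.h>
#include <string.h>

/* Toy model of the nghttp2-style pause: a session-wide flag that stops
 * delivery to ALL streams as soon as one stream's buffer fills.
 * Names and sizes are illustrative, not curl's or nghttp2's code. */
#define NSTREAMS 2
#define BUFSZ    8

struct stream { char buf[BUFSZ]; size_t used; };

struct session {
  struct stream s[NSTREAMS];
  int paused; /* set when any stream buffer fills; cleared on drain */
};

/* Deliver one byte to a stream; pause the whole session if its buffer fills. */
static int deliver(struct session *se, int id, char c) {
  if (se->paused)
    return -1;                    /* nothing is delivered while paused */
  struct stream *st = &se->s[id];
  if (st->used == BUFSZ) {
    se->paused = 1;               /* punishes every stream, not just this one */
    return -1;
  }
  st->buf[st->used++] = c;
  if (st->used == BUFSZ)
    se->paused = 1;
  return 0;
}

/* Drain (consume) a stream's buffer and unpause the session. */
static void drain(struct session *se, int id) {
  se->s[id].used = 0;
  se->paused = 0;
}
```

In real nghttp2 the pause is triggered by returning `NGHTTP2_ERR_PAUSE` from a data callback; the simulation only illustrates the consequence that an unrelated stream is blocked until the full buffer is drained.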
I agree that flow control is probably part of the solution, but I don't think we can rely solely on it, mainly because of header compression (both HPACK and QPACK). From what I can see in the code, I think header compression is what's triggering the hang here. With the default buffer size of 16KB we are telling the server to send at most 16KB of unacknowledged QUIC data. It will probably do that very quickly, in a burst that arrives at about the same time on the client, which means that curl will find all those packets waiting for it on the socket. Because it's dequeueing all those QUIC packets in one loop and writing both HTTP/3 headers and data out to the receive buffer - and headers consume more receive buffer than they did flow control credits - we end up chopping off some of the response body from this initial burst, and never complete the transfer when the rest of the data arrives from the server.

It looks like on HTTP/2 curl uses a separate buffer for headers, but since it then writes those headers to the receive buffer as well, I assume it has the same potential problem. In HTTP/2 it might be worse, since HEADERS frames don't participate in flow control at all. If we kept headers in a buffer separate from the response body, then we could rely on the HTTP/2 and HTTP/3 flow control mechanisms to ensure that we never try to write too much data to the receive buffer. This would require some code changes.

Assuming that we can safely rely on flow control, we would then have the issue you mention, where we'd be blocking the sender very frequently since it would only be able to send 16KB of data (by default when using libcurl) per RTT period. With an RTT of 20ms that's less than 10Mbps, which is very slow. We would need a significantly larger buffer, but at least the application can control that. What do you think we can do to help solve this issue, both in the short term and the longer term?
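The accounting mismatch described above - flow control charges the compressed wire size of the headers, while the receive buffer must hold their decoded HTTP/1.1-style form - can be shown with a few lines of arithmetic. All byte counts here are made-up examples, not measured values.

```c
#include <assert.h>

/* Illustrative numbers only: QUIC flow control charges the compressed (QPACK)
 * size of headers on the wire, but curl writes the *decoded* HTTP/1.1-style
 * header block into the same receive buffer as the body.  So a server that
 * stays exactly within its credits can still need more buffer than we have. */
#define RECVBUF_SIZE 16384   /* default CURLOPT_BUFFERSIZE */
#define QUIC_WINDOW  16384   /* advertised credits == buffer size */

/* Bytes of QUIC stream data the server is allowed to send in the burst. */
static int credits_charged(int hdr_wire, int body) {
  return hdr_wire + body;
}

/* Bytes curl must fit into the receive buffer for that same burst. */
static int buffer_needed(int hdr_decoded, int body) {
  return hdr_decoded + body;
}
```

With an assumed 300 bytes of headers on the wire decoding to 900 bytes, a server that fills the remaining window with body stays exactly within its credits, yet curl needs more buffer than it has - which matches the observed overflow.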
@bagder, how should we move forward? I’d love to help if possible. |
I don't know exactly. I haven't had much time to think about this problem. We need to work out a way to drain the (almost) full stream buffers before we get more data to add to them from the socket - it doesn't scale to have buffers sized to hold a whole RTT's worth of data for each stream. Maybe it will work well enough if we stop that loop when any stream buffer is full, then "expire" that stream and rely on it draining the buffer soon enough. I'm a little buried under other issues and duties right now, but I hope to get back to this soon. I'll of course still welcome and appreciate whatever work and progress you can do here!
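The stop-the-loop idea can be sketched as a standalone toy model - hypothetical names, not curl's actual multi loop: keep pulling datagrams off the socket only while the target stream has buffer space, and when it fills, stop and mark the stream "expired" so it gets drained before the next socket read.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the proposed read loop.  Each "packet" is one byte of stream
 * data; when a stream's buffer fills, the whole loop stops and the stream is
 * flagged for draining.  Names and sizes are illustrative. */
#define NSTREAMS 2
#define BUFSZ    4

struct stream { size_t used; int expired; };

/* Process up to npackets one-byte packets for stream `id`.
 * Returns how many were consumed before the loop had to stop. */
static int read_loop(struct stream *s, int id, int npackets) {
  int n = 0;
  while (n < npackets) {
    if (s[id].used == BUFSZ) {   /* buffer full: stop the whole loop */
      s[id].expired = 1;         /* ask the event loop to drain it soon */
      break;
    }
    s[id].used++;
    n++;
  }
  return n;
}
```

The remaining packets stay queued on the socket (where the kernel and QUIC loss recovery deal with them) instead of overflowing curl's stream buffer.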
Thanks Daniel. I can try some things but given how new I am to the curl codebase I wanted to make sure I ran ideas by you so I don’t go down a rabbit hole that changes too many things. Your idea of draining the buffer for a given stream before it gets full is something that I was considering, since we could invoke the write function for that easy handle to let it process the buffer and allow us to receive more data for that stream. Unfortunately write functions can request a pause and I don’t have a solution for how to handle that yet. I’ll spend some time thinking about this. |
I think maybe we should ask our friends in the nghttp3 project to allow us to accept just a piece of the buffer passed to us in the callback. Without that ability, we need to implement our own "backup buffer" to store the part of the data that doesn't fit in the stream buffer. Uglier. Plus I suspect we're not going to be the only user of nghttp3 that runs into this problem.
Status update: We need to make sure that the window size is the same as the buffer size, which the code already does. We then need to make sure to only extend the window when the buffer is consumed, and not like the code does now, where it extends the window immediately when the data is stored in the buffer. I first filed this as an issue with nghttp3, but I believe @tatsuhiro-t is right when he points out that we need to use the windowing for this. While working on a fix for this (live on twitch! 😁) I ran into a new ngtcp2 issue that is currently blocking me.
Hey @bagder, that was a fun stream to watch! I missed it but I watched it after the fact. I'd love to be around the next time you stream so we can run ideas past each other. I think your plan of attack is a good start, but even if you do the two things you mentioned I believe you can still encounter the issue. This is what I mentioned on a previous comment:
In short: yes, libcurl is setting the QUIC window size to be the same as the buffer size, but that doesn't prevent the problem. Extreme example:
In other words, flow control is a QUIC layer concept, not an HTTP/3 layer concept. Because the HTTP/3 layer compresses header data, you can't use the QUIC layer's flow control to control how much application-level data you will receive. You will always get more application-level data than the flow control credits you configured. |
Hm, yes, very good points indeed. I didn't account much for the headers because in my case here they're not very big and will be drained by the time we get the problems. But in my modified approach that adds window size when we consume data, we need to add window when the header data is consumed too (which I didn't do in this day's stream session) - and then of course our current code doesn't know how much to add, since we don't keep that info around! The problem is also, as you say, that we consume HTTP/3 data, and that serves as the signal for where we want to add flow control to the QUIC stream, on another layer. Sort of complicated, this!

A modified approach could be: whenever the libcurl API consumer drains the buffer completely, we make sure to add enough window so that it again goes up to BUFFER_SIZE. Just need to figure out how to do that... 😀

Regarding the stream: I'm in GMT+1, which I presume is about 9 hours ahead of you, so maybe an early morning for you and an early evening for me could work... Or possibly even use a more duplex method!
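The drain-then-extend scheme being discussed can be sketched as a self-contained simulation that only tracks the numbers. In curl's ngtcp2 backend the actual window extension would go through ngtcp2 calls such as `ngtcp2_conn_extend_max_stream_offset()`, but everything below - names, sizes, structure - is an illustrative model, not curl's code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of "extend the window only when the buffer is fully drained".
 * BUFFER_SIZE stands in for the libcurl receive buffer size. */
#define BUFFER_SIZE 16384

struct rxwin {
  size_t max_offset;   /* how far the peer may send (flow-control limit) */
  size_t recv_offset;  /* how much stream data has arrived so far */
  size_t buffered;     /* bytes sitting in the receive buffer */
};

static void rxwin_init(struct rxwin *w) {
  w->max_offset = BUFFER_SIZE;  /* initial window == buffer size */
  w->recv_offset = 0;
  w->buffered = 0;
}

/* Data arrives from the peer: it consumes window but does NOT extend it
 * (unlike the old code, which extended on store). */
static int rxwin_recv(struct rxwin *w, size_t n) {
  if (w->recv_offset + n > w->max_offset)
    return -1;                  /* peer would violate flow control */
  w->recv_offset += n;
  w->buffered += n;
  return 0;
}

/* The application drains the buffer: only when it's completely empty do we
 * hand credits back, topping the window up to BUFFER_SIZE past recv_offset. */
static void rxwin_consumed(struct rxwin *w, size_t n) {
  w->buffered -= n;
  if (w->buffered == 0)
    w->max_offset = w->recv_offset + BUFFER_SIZE;
}
```

Because credits are only returned on drain, the peer can never have more unacknowledged data in flight than the buffer can hold - at the cost of the per-RTT throughput ceiling discussed earlier in the thread.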
What kind of duplex method are you thinking of? Yes, I'm in GMT-8, so it's a big difference :) I'm not available weekday mornings because of work, but I can be available late at night (past midnight here, so your morning) which is when you started streaming last time. I just didn't know you were going to live stream this morning or I would have been around :) |
Ah, so then 09:00 am my time could work. I was thinking either stick to the stream, or that we do a video call or something to see if we can work out something together. But let's first make sure that we get the current ngtcp2 version of the build to work again so that we have a "stable" base to work with! |
I would be up for that! We can figure out the details elsewhere so we don't add a lot of churn to this GitHub issue :) Do you think it would make sense to revert back to the ngtcp2 version that I was using to reproduce this problem (commits on my first message above)? That would also bring us back to draft-23 so we can more easily test against Cloudflare, including the site I've been using for all my testing (http3.test.dyn.riotcdn.net). |
Sure! It's easy enough, so let's just agree on which commits for ngtcp2 + nghttp3 we revert back to, and then we can debug curl with ngtcp2/nghttp3 in that state while the @ngtcp2 team works on that -24 issue. Let's use the ones you mention at the top of this issue.
Assisted-by: Javier Blazquez
Ref #4525 (partial fix)
Thanks for fixing the QUIC window size extension logic, Daniel! I think the next step should be dealing with those pesky QPACK-compressed HTTP/3 headers. It's very easy to repro the issue with a tiny receive buffer. For example, here are some logs that I got when running the following command:
If you add up all the values, you get exactly 4096 bytes of QUIC-level stream data (because that's our initial flow control value) but 4373 bytes of HTTP/3-level data, which clearly shows the problem happening. Clearly the QUIC flow control system can't be used to avoid running out of receive buffer if we write HTTP/3 headers (in HTTP/1.1 form) to that buffer.

I think we need to use a separate buffer for headers. This buffer would store all the headers that we have received during a single call to the receive function. I think the rest of the code would be able to work fine with no changes. Is my analysis correct?
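The separate-header-buffer idea could look roughly like this - a sketch with hypothetical names and fixed sizes, not curl's actual structures. Decoded headers get their own storage, so the body buffer's consumption maps one-to-one onto QUIC flow-control credits.

```c
#include <assert.h>
#include <string.h>

/* Sketch of the proposed split: decoded headers go into their own buffer so
 * that body-buffer usage corresponds 1:1 to QUIC flow-control credits.
 * Names and fixed sizes are illustrative, not curl's actual structures. */
#define HDRBUF_SZ  4096
#define BODYBUF_SZ 16384

struct h3buf {
  char   hdr[HDRBUF_SZ];   size_t hdr_used;
  char   body[BODYBUF_SZ]; size_t body_used;
};

/* Decoded (post-QPACK) header bytes: stored separately, so they may exceed
 * the wire size that flow control charged without eating body space. */
static int h3_put_header(struct h3buf *b, const char *p, size_t n) {
  if (b->hdr_used + n > HDRBUF_SZ)
    return -1;
  memcpy(b->hdr + b->hdr_used, p, n);
  b->hdr_used += n;
  return 0;
}

/* Body bytes: charged 1:1 against the QUIC window, so as long as the
 * advertised window equals BODYBUF_SZ this can never overflow. */
static int h3_put_body(struct h3buf *b, const char *p, size_t n) {
  if (b->body_used + n > BODYBUF_SZ)
    return -1;
  memcpy(b->body + b->body_used, p, n);
  b->body_used += n;
  return 0;
}
```

With this split, the 4096-vs-4373 discrepancy from the logs above would land entirely in the header buffer, and flow control alone would be enough to protect the body buffer.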
Using a separate buffer for headers turned out to be a pretty complicated change. Instead, I have a potential solution where we dynamically grow the receive buffer: #4602 |
Here's another potential solution that I like a bit better: #4603 |
I did this
I used curl built against ngtcp2 to download a small 37KB file over HTTP/3 using the default libcurl CURLOPT_BUFFERSIZE value of 16KB:
Note that since the `curl` tool overrides CURLOPT_BUFFERSIZE with a 100KB buffer by default, I had to pass `--limit-rate 16384` to make it use the 16KB value that users of libcurl who leave CURLOPT_BUFFERSIZE untouched would get. The problem is not related to speed limits, but rather download buffer sizes.

I expected the following
I expected the transfer to complete successfully, but at least half of the time it hangs after downloading most of the file. For example, here it is after 30 seconds of no progress at 37433 of 37708 bytes downloaded:
The lower the download buffer size, the more likely it is for the transfer to hang. With large buffer sizes it's still possible to make it hang, but it's much less likely.
curl/libcurl version
This was built as follows:
Git commits per repo:
OpenSSL: tatsuhiro-t/openssl@0685fe7
nghttp3: ngtcp2/nghttp3@723c125
ngtcp2: ngtcp2/ngtcp2@365ae24
curl: a030d48
Note that this hang doesn't occur nearly as often when building against quiche, but it can still occur there.
operating system
debugging output
Here's the full debugging output, including nghttp3 and ngtcp2 logs: