CURLE_TOO_LARGE due to full pause buffer if you use curl with http2 and pause transfers especially with compression #16280
Comments
Semi-related: #5966 and similar
Yeah, it needs some proper overhaul to make sure pause halts the pipeline before decompressing the data, since otherwise a window-full of compressed data certainly risks exploding (and running into the limit), as you show here.
Adds a "cw-pause" client writer in the RAW phase that buffers output when the client paused the transfer. This prevents content decpoding from blowing the buffer in the "cw-out" writer. This is a soluation to issue curl#16280, but has the drawbacks. A server sending an overlong response body will not be detected until the transfer is unpaused again, since the write data will be buffered before the PROTOCOL writer checks the lengths. Still work to do.
Adds a "cw-pause" client writer in the PROTOCOL phase that buffers output when the client paused the transfer. This prevents content decoding from blowing the buffer in the "cw-out" writer. Added test_02_35 that downloads 2 100MB gzip bombs in parallel and pauses after 1MB of decoded 0's. This is a solution to issue curl#16280, with some limitations: - cw-out still needs buffering of its own, since it can be paused "in the middle" of a write that started with some KB of gzipped zeros and exploded into several MB of calls to cw-out. - cw-pause will then start buffering on its own *after* the write that caused the pause. cw-pause has no buffer limits, but the data it buffers is still content-encoded. Protocols like http/1.1 stop receiving, h2/h3 have window sizes, so the cw-pause buffer should not grow out of control, at least for these protocols. - the current limit on cw-out's buffer is ~75MB (for whatever historical reason). A potential content-encoding that blows 16KB (the common h2 chunk size) into > 75MB would still blow the buffer, making the transfer fail. A gzip of 0's makes 16KB into ~16MB, so that still works. A better solution would be to allow CURLE_AGAIN handling in the client writer chain and make all content encoders handle that. This would stop explosion of encoding on a pause right away. But this is a large change of the deocoder operations.
I propose #16296 as a fix for this. I added a test case with a zip bomb to verify it. Would be nice if you could test that in your setup as well. Thanks!
Adds a "cw-pause" client writer in the PROTOCOL phase that buffers output when the client paused the transfer. This prevents content decoding from blowing the buffer in the "cw-out" writer. Added test_02_35 that downloads 2 100MB gzip bombs in parallel and pauses after 1MB of decoded 0's. This is a solution to issue curl#16280, with some limitations: - cw-out still needs buffering of its own, since it can be paused "in the middle" of a write that started with some KB of gzipped zeros and exploded into several MB of calls to cw-out. - cw-pause will then start buffering on its own *after* the write that caused the pause. cw-pause has no buffer limits, but the data it buffers is still content-encoded. Protocols like http/1.1 stop receiving, h2/h3 have window sizes, so the cw-pause buffer should not grow out of control, at least for these protocols. - the current limit on cw-out's buffer is ~75MB (for whatever historical reason). A potential content-encoding that blows 16KB (the common h2 chunk size) into > 75MB would still blow the buffer, making the transfer fail. A gzip of 0's makes 16KB into ~16MB, so that still works. A better solution would be to allow CURLE_AGAIN handling in the client writer chain and make all content encoders handle that. This would stop explosion of encoding on a pause right away. But this is a large change of the deocoder operations.
Adds a "cw-pause" client writer in the PROTOCOL phase that buffers output when the client paused the transfer. This prevents content decoding from blowing the buffer in the "cw-out" writer. Added test_02_35 that downloads 2 100MB gzip bombs in parallel and pauses after 1MB of decoded 0's. This is a solution to issue curl#16280, with some limitations: - cw-out still needs buffering of its own, since it can be paused "in the middle" of a write that started with some KB of gzipped zeros and exploded into several MB of calls to cw-out. - cw-pause will then start buffering on its own *after* the write that caused the pause. cw-pause has no buffer limits, but the data it buffers is still content-encoded. Protocols like http/1.1 stop receiving, h2/h3 have window sizes, so the cw-pause buffer should not grow out of control, at least for these protocols. - the current limit on cw-out's buffer is ~75MB (for whatever historical reason). A potential content-encoding that blows 16KB (the common h2 chunk size) into > 75MB would still blow the buffer, making the transfer fail. A gzip of 0's makes 16KB into ~16MB, so that still works. A better solution would be to allow CURLE_AGAIN handling in the client writer chain and make all content encoders handle that. This would stop explosion of encoding on a pause right away. But this is a large change of the deocoder operations.
I did this
With the following program downloading 128MB of gzipped zeros from my website, libcurl will fail http2 transfers with CURLE_TOO_LARGE. Part of the problem is that it is extremely easy to fill up the pause buffer when compression is at play: a tiny amount of transferred data can decompress into a huge amount of output with gzip (the sample here is a pathological case, but it happens in less unreasonable cases too). I think this most commonly happens inside Lix when someone is running on a slow machine with an extremely fast internet connection (say, GitHub Actions) that is downloading especially huge files and not consuming the data very quickly, so the transfers get paused a lot for flow control. It's especially bad with compression because the buffer in question holds post-decompression data, whereas the window-size control is applied to the compressed bytes. This is the part of the problem that feels like a curl bug and is relatively clearly not API misuse.
For curious reasons, you have to do two transfers to the same machine simultaneously to hit this problem, but the problem code below reproduces it 100% of the time on my machine at least.
This is a reduction of the Lix file transfer code with all fancy dependencies removed, so it is just C++ and libcurl; I could have written a C sample but it would likely be harder to read and even longer. This is the bug we are having with curl on our end: https://git.lix.systems/lix-project/lix/issues/662.
Compile the program below with
c++ -std=c++20 $(pkg-config --libs --cflags libcurl) bad.cc -o bad
and run with ./bad.
C++ sample program extracted from Lix
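The collapsed sample is not reproduced here; as a rough approximation of the pattern it exercises, the sketch below runs two HTTP/2 downloads of gzip-encoded data in one multi handle and pauses each transfer from the write callback whenever a pretend consumer queue is full. The URL, the 1MB queue limit, and the 64KB-per-poll drain rate are illustrative assumptions, not values taken from the Lix code.

```cpp
#include <curl/curl.h>
#include <cstddef>
#include <cstdio>

// Per-transfer state for a pretend consumer that drains data slowly.
struct TransferState {
  size_t received = 0; // decoded bytes accepted by the write callback
  size_t drained = 0;  // bytes the consumer has "processed"
  bool paused = false;
};

static constexpr size_t kQueueLimit = 1024 * 1024; // 1MB of undrained data (illustrative)

// Pause the transfer whenever the consumer queue is "full"; libcurl then has to
// buffer already-decoded data until curl_easy_pause(..., CURLPAUSE_CONT).
static size_t writeCallback(char *ptr, size_t size, size_t nmemb, void *userdata)
{
  (void)ptr;
  auto *st = static_cast<TransferState *>(userdata);
  if (st->received - st->drained > kQueueLimit) {
    st->paused = true;
    return CURL_WRITEFUNC_PAUSE; // chunk not consumed; redelivered after unpausing
  }
  st->received += size * nmemb;
  return size * nmemb;
}

int main()
{
  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURLM *multi = curl_multi_init();

  // Placeholder: any server delivering a large gzip-encoded body over HTTP/2.
  const char *url = "https://example.org/zeros-128MB";
  CURL *handles[2];
  TransferState states[2];

  for (int i = 0; i < 2; i++) {
    CURL *h = handles[i] = curl_easy_init();
    curl_easy_setopt(h, CURLOPT_URL, url);
    curl_easy_setopt(h, CURLOPT_HTTP_VERSION, (long)CURL_HTTP_VERSION_2TLS);
    curl_easy_setopt(h, CURLOPT_ACCEPT_ENCODING, ""); // enable content decoding
    curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, writeCallback);
    curl_easy_setopt(h, CURLOPT_WRITEDATA, &states[i]);
    curl_multi_add_handle(multi, h);
  }

  int running = 1;
  while (running) {
    curl_multi_perform(multi, &running);
    curl_multi_poll(multi, nullptr, 0, 100, nullptr);

    for (int i = 0; i < 2; i++) {
      // Drain slowly relative to the connection, then unpause once there is room.
      states[i].drained += 64 * 1024;
      if (states[i].drained > states[i].received)
        states[i].drained = states[i].received;
      if (states[i].paused &&
          states[i].received - states[i].drained <= kQueueLimit) {
        states[i].paused = false;
        curl_easy_pause(handles[i], CURLPAUSE_CONT);
      }
    }

    CURLMsg *msg;
    int left;
    while ((msg = curl_multi_info_read(multi, &left))) {
      if (msg->msg == CURLMSG_DONE && msg->data.result != CURLE_OK)
        std::fprintf(stderr, "transfer failed: %s\n",
                     curl_easy_strerror(msg->data.result));
    }
  }

  for (CURL *h : handles) {
    curl_multi_remove_handle(multi, h);
    curl_easy_cleanup(h);
  }
  curl_multi_cleanup(multi);
  curl_global_cleanup();
  return 0;
}
```

With a fast enough connection and a consumer that drains slowly, the decoded data libcurl has to hold across the pauses grows far beyond the compressed flow-control window, which is where CURLE_TOO_LARGE shows up.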
I expected the following
Curl should manage its window sizes such that it can deal with pauses on http2 transfers with compression. It may require some redesign so that the buffer that fills up holds compressed rather than decompressed data.
curl/libcurl version
operating system
This is happening on various kernels; the affected userspace code is from nixpkgs 9ecb50d2fae8680be74c08bb0a995c5383747f89 but it also happens on newer nixpkgs with Curl 8.11.1.