curl H2 PUT fails with PROTOCOL_ERROR beyond a certain payload size #1410
I'm not sure if this issue belongs to
Updated the test code to work with a public request-tester website. It reproduces the exact same issue. I've also added the instructions used to obtain the tested Homebrew
The way you are doing it will send content type urlencoded. I do notice similar though:
Every time, the server will wait ~60 seconds before sending back a 502 reply, according to a number of tries I monitored in Wireshark. And as you say, with --http1.1 it will echo back the content immediately. Another interesting thing about the http2 case is that curl eats CPU for the minute it is waiting for the reply. It looks like curl_multi_wait is not actually waiting (see the sketch below). Reproduced in:
curl issue #1410.zip is a Wireshark capture (MAC addresses and such anonymized) made from the openssl_dump_secrets branch and these commands:
To view the packet capture, load the file and then load the keys. It shows libcurl is sending 16384-byte DATA chunks, except within the first 64k, where it oddly does 16383 + 1.
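For reference on the waiting behavior: a curl_multi_wait()-driven loop is supposed to sleep when no socket is ready, roughly like this minimal sketch against the public libcurl API (not the reporter's code). If curl_multi_wait() keeps returning immediately with no file descriptors ready, this loop degenerates into exactly the busy spin described above.

#include <curl/curl.h>

/* Drive a multi handle to completion. curl_multi_wait() should block
   for up to 1000 ms when there is nothing to do; the report suggests
   it returns immediately instead, so the loop spins and burns CPU.
   Error handling omitted for brevity. */
static void drive(CURLM *multi)
{
  int running = 1;
  while(running) {
    int numfds = 0;
    curl_multi_wait(multi, NULL, 0, 1000, &numfds); /* expected to sleep */
    curl_multi_perform(multi, &running);
  }
}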
@jay: Thank you for your extensive tests and info (also for your note about
This may be timing dependent; I was unable to reproduce today on Windows 8.1 (wireless) and Ubuntu 16 (wired), where it works fine. My repro above was on Windows 7. In all of them, though, it does start with 16384, 16384, 16384, 16383, 1. AFAICS libcurl tries to write 16384 via nghttp2_session_send, but that calls data_source_read_callback requesting 16383; see a vs
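For context on the 16383 + 1 pattern: nghttp2 pulls DATA payloads from the application through a read callback shaped like the sketch below (a generic example of the nghttp2_data_source_read_callback mechanism, not libcurl's actual callback). If the callback returns fewer bytes than nghttp2 asked for, nghttp2 emits a short DATA frame and calls again for the remainder, which would yield frames of 16383 and then 1 byte.

#include <string.h>
#include <nghttp2/nghttp2.h>

/* Hypothetical upload state, for illustration only. */
struct upload {
  const uint8_t *data;
  size_t len;
  size_t pos;
};

static ssize_t read_cb(nghttp2_session *session, int32_t stream_id,
                       uint8_t *buf, size_t length, uint32_t *data_flags,
                       nghttp2_data_source *source, void *user_data)
{
  struct upload *up = source->ptr;
  size_t left = up->len - up->pos;
  size_t n = left < length ? left : length; /* may be less than requested */
  (void)session; (void)stream_id; (void)user_data;
  memcpy(buf, up->data + up->pos, n);
  up->pos += n;
  if(up->pos == up->len)
    *data_flags |= NGHTTP2_DATA_FLAG_EOF; /* no more data to send */
  return (ssize_t)n;
}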
Yes, that's the question. A good next step, I think: on Windows 7 I will try a few PUTs of the same size with the nghttp client; it looks like that is possible by setting the :method header to PUT.
I can answer why there are 16383 and 1 byte writes separately.
Replacing curl with the nghttp client:

#!/bin/sh
# nghttp equivalent of the failing curl PUT (a function instead of an
# alias, since aliases don't expand in non-interactive bash scripts)
nghttp_put() {
  nghttp -H ':method: PUT' -H 'Content-Type: application/x-www-form-urlencoded' \
    https://jsonplaceholder.typicode.com/put/ -d blob.bin
}
# OK
dd if=/dev/zero of=blob.bin bs=64k count=1 2> /dev/null; nghttp_put
dd if=/dev/zero of=blob.bin bs=64k count=2 2> /dev/null; nghttp_put
dd if=/dev/zero of=blob.bin bs=64k count=3 2> /dev/null; nghttp_put
dd if=/dev/zero of=blob.bin bs=64k count=4 2> /dev/null; nghttp_put
Original example keeps failing using latest
Just a keep-alive note: the issue applies to the current latest release as well:
Does not repro:
Repro:
If I reduce the buffer size in
... it seems to work every time. When I redirect the output to a file and enable the progress meter, it seems to work every now and then. Seems buffer/alignment and timing sensitive. I added
15 seconds where nothing is logged. If I set a breakpoint where the output above is printed, it is clear that libcurl considers all data sent, with nothing left to send.
It's definitely sensitive to something; there was one computer I was able to reproduce it on, but no others.
With this fresh version:
The failure is still there, but this time I've seen it succeed with
This mysterious error still prevails as of:
And in:
run under Wine 3.0:
I ran the example command line with a 256K blob.bin file multiple times using current master, and I only get 500 responses back saying
I'll try against some other services...
Any chance you can build curl with
Managed to finish it just now (without mistakes I hope):
The 192K test failed (256K was strangely okay this time).
Ran into the same issue; it doesn't work most of the time, but sometimes it does pass through. Can't repro with the nghttp CLI either.
Funny thing is it doesn't reproduce at all with
Just a retest with the latest components (problem still there, though this time I got one successful run using the Windows binary):
I've come back to this again: I tested current 7.61.0-DEV on Linux against my local Apache over HTTP and my local nginx over HTTPS, as well as with the original URL in this report, but none of them reproduces this problem for me. On my mac it does however still reproduce fairly easily, but not with any
When it fails, the
I just reran this test due to the many recent h2 changes. I can still trigger it on mac. I've not been able to reproduce it on Linux. |
However, about 29 times out of 30 (or so), this site replies with the contents below, which makes this terribly annoying to debug. (It seems weird that it sometimes doesn't return this error, since the PUT size is passed in the request headers.) I'd be very happy for a recipe that is easier to reproduce... I managed to record the flaw on the mac with tcpdump and SSLKEYLOGFILE, but I fail to get Wireshark to show me all the h2 frames correctly. What's clear, however, is that in this case the server responds with a "500 internal server error", which I guess is because it doesn't actually like the size of the PUT.
Testing with a brand new build, via wine, same results as before, including the randomness:
Not sure if these are useful, but recently wine shows these issues (repeatedly) while running
UPDATE: Same when using a TLSv1.3 connection.
I'm a beginner with curl HTTP/H2 internals and this may well be trivial/known or not super helpful; anyhow, some tracing revealed that after actually sending the full payload, the
In the failing cases
Next, in
In successful cases
The above was consistently repeatable. I used the servers in the OP for these tests (served now by Cloudflare), with a 64K payload for successful tests and 3x64K for failing ones, using
I'm not sure those details matter much since they're all happening in an error situation that likely shouldn't happen in the first place. I've chased this bug as well, but it is a tricky one. It seems to only happen on mac for me (which in my world means a less good environment to trace and debug it in). It ends up being an error returned from nghttp2, but from the tests above in this issue there are reasons to believe that the error is somehow induced by curl, or caused by the way curl uses nghttp2, or similar (thus not happening to plain nghttp2 use that doesn't involve curl). I just haven't figured out what the situation is that makes nghttp2 return this error. It would be useful to record the data (with Wireshark or similar) when the problem appears, and it would be interesting to trace into nghttp2 to see what exactly it barfs on, or expects there to be that isn't. Alternatively, we could add some verbose logging from nghttp2 and see if we can still trigger the problem.
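On the verbose-logging idea: nghttp2 provides nghttp2_set_debug_vprintf_callback() for exactly this, though it only produces output when nghttp2 itself was compiled with DEBUGBUILD defined. A minimal sketch of wiring it up, under that assumption:

#include <stdarg.h>
#include <stdio.h>
#include <nghttp2/nghttp2.h>

/* Forward nghttp2's internal debug messages to stderr. This is a no-op
   unless nghttp2 was built with -DDEBUGBUILD. */
static void h2_debug(const char *format, va_list args)
{
  vfprintf(stderr, format, args);
}

/* Call once at startup, before any nghttp2 session is created. */
static void enable_h2_debug(void)
{
  nghttp2_set_debug_vprintf_callback(h2_debug);
}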
I've rebuilt curl with a debug nghttp2 as a next step. Because
As a kind of distraction I bisected the largest working payload size, which turned out to be 134519 (= 0x20D77) bytes. This was 100% consistent throughout these tests, and even with the latest stable Windows release when run via wine in the same env. (Same
Attached are the outputs of these two commands (one OK, one fail):

alias curl="./curl -sS -X PUT https://jsonplaceholder.typicode.com/put/ --trace-time --data-binary @blob.bin"
dd if=/dev/zero of=blob.bin bs=134519 count=1 2> /dev/null; curl -v --trace-ascii /dev/stdout > _out_ok.txt 2>&1
dd if=/dev/zero of=blob.bin bs=134520 count=1 2> /dev/null; curl -v --trace-ascii /dev/stdout > _out_fail.txt 2>&1
Problem persists with this fresh Windows build via latest
@vszakats, just another pointer to think about: could it be related to TCP MSS and segmentation offload? Some tools to help with that: 'ethtool -k | grep offload'
Can be related, though TCP internals are likely at their default values on at least a significant portion of the components I've made these tests on. Over time there have been multiple web servers (including OS, host provider, maybe even server software, though I'm not sure about that), internet exchanges, ISPs, broadband hardware, routers, local networks, and client hardware and OS types/versions involved, but the issue notoriously remains. At the same time
Just bumped into these settings on my host OS (macOS) and the numbers were familiar:
131072 is the largest (+/- a few bytes) payload size that works without the issues reported here. Tried to bump both buffers to 256 KB:
With the above, the 3rd test (with 192 KB) consistently runs fine; only the 4th one (256 KB) failed intermittently. Then I tried to bump them to 512 KB:
With this, all 4 tests run fine and consistently so. Version tested:
I originally found this issue on a different machine (W2K3), a virtualized one (via VMware), but sadly I have no idea about the physical hardware there or any of its TCP settings. Anyhow, it looks like there is indeed a relation between TCP window sizes and the point where curl's H2 transfers start failing.
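One way to probe this from the application side (instead of system-wide TCP settings) would be forcing an explicit send-buffer size on the socket curl creates; a hypothetical experiment using libcurl's CURLOPT_SOCKOPTFUNCTION callback, with the 512 KB figure borrowed from the test above:

#include <curl/curl.h>
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#endif

/* Force SO_SNDBUF on curl's socket, to probe the apparent relation
   between TCP buffer size and payload size. Experimental, not a fix. */
static int sockopt_cb(void *clientp, curl_socket_t fd, curlsocktype purpose)
{
  int sndbuf = 512 * 1024; /* assumed experimental value */
  (void)clientp;
  (void)purpose;
  setsockopt(fd, SOL_SOCKET, SO_SNDBUF, (const char *)&sndbuf, sizeof(sndbuf));
  return CURL_SOCKOPT_OK;
}

/* usage: curl_easy_setopt(curl, CURLOPT_SOCKOPTFUNCTION, sockopt_cb); */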
This looks hopeless, so I'm taking the general direction to abandon the Windows platform and migrate to Linux. [The other blow for Windows was nghttp2's 1.40.0 release breaking non-MSVC builds, with no hope of fixing this outside nghttp2 with reasonable effort.] So, closing this one. Please reopen if anything new comes up regarding this.
I can reproduce what I described in my post from over 2 years ago but now only in the packaged cygwin version of curl.
Just to recap: the issue I observed was different from what is reported, but surely related. According to Wireshark, all of the data is sent except for the last chunk. At this point curl hangs in a busy loop for 1 minute until it receives an HTTP error reply from the server, and only then does curl send the last chunk of data.
I also monitored it in an API monitor this time. I chose module ws2_32 (Winsock 2) and monitored everything except WSAEnumNetworkEvents, since there are millions of them. The API monitor shows that the normal progression is that curl will send data until it is blocked by Winsock, at which point it waits and tries again (sketched below):
The pattern repeats until the last chunk is blocked, and curl does not try to send it again until after receiving the reply from the server:
There is definitely a bug in curl, since it should not be eating CPU waiting for the reply. Recall curl prints "We are completely uploaded and fine" after the last chunk fails, even though everything is not uploaded at that time. Even if OpenSSL or nghttp2 turns out to be at fault, curl still should not eat CPU. I'm loath to put this in known bugs. We should fix this, if any of us can find it. That I cannot, on the same computer and OS, reproduce this from source any longer is maddening. The next step for anyone who can would be to sprinkle printfs in all send functions to see what exactly is telling us that the final chunk is sent when it is not. Eating CPU may or may not be separate from that.
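The normal progression described above corresponds to the standard non-blocking send pattern, roughly like this generic Winsock sketch (not curl's actual code):

#include <winsock2.h>

/* send() until Winsock blocks with WSAEWOULDBLOCK, wait for the socket
   to become writable, then retry. The failure mode above looks as if
   the final retry never happens until the server replies. */
static int send_all(SOCKET s, const char *buf, int len)
{
  int sent = 0;
  while(sent < len) {
    int n = send(s, buf + sent, len - sent, 0);
    if(n > 0)
      sent += n;
    else if(n == SOCKET_ERROR && WSAGetLastError() == WSAEWOULDBLOCK) {
      fd_set wfds;
      FD_ZERO(&wfds);
      FD_SET(s, &wfds);
      select(0, NULL, &wfds, NULL, NULL); /* block until writable */
    }
    else
      return -1; /* hard error */
  }
  return sent;
}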
@jay: What you describe here seems to be what I was also seeing. (Tried the sprinkle-printf method a while ago, but did not manage to reach a useful conclusion.)
Previously I've managed to repro this issue on mac only, and I really cannot stand debugging curl at this level on that platform. 😢 I've tried really hard to repro on Linux where I feel I'm most able to use tools to track something like this down, but I've never managed.
How can that happen? As it isn't using chunked encoding, that should only be possible when curl has successfully managed to send off the entire file. Right? This condition: Lines 1147 to 1152 in 147fa06
Does the apparent relation between the occurrence of the problem and TCP buffer size (on macOS at least) point to some known pattern? For a non-network-expert like myself, this looks a little puzzling, as one would assume that such a low-level setting should not be able to break apps using standard OS function calls. Perhaps unless there is some API-call or app-code corner case that breaks when hitting on/around low-level buffer boundaries. (Disclaimer: just shooting blindly in the dark here.)
What you've observed is likely because the size of the buffer affects whether the data can be sent immediately. If adding printfs didn't help you could try watching
And at the same time it would be useful to see what OpenSSL is returning. We call SSL_write which uses the default send() or whatever. I would start there:

diff --git a/lib/vtls/openssl.c b/lib/vtls/openssl.c
index 726ff6e..0f221d2 100644
--- a/lib/vtls/openssl.c
+++ b/lib/vtls/openssl.c
@@ -3835,9 +3835,11 @@ static ssize_t ossl_send(struct connectdata *conn,
   memlen = (len > (size_t)INT_MAX) ? INT_MAX : (int)len;
   rc = SSL_write(BACKEND->handle, mem, memlen);
+  fprintf(stderr, "SSL_write(): %d out of %d bytes\n", rc, memlen);
 
   if(rc <= 0) {
     err = SSL_get_error(BACKEND->handle, rc);
+    fprintf(stderr, "SSL_get_error(): %d\n", err);
 
     switch(err) {
     case SSL_ERROR_WANT_READ:
I noticed this bug today, and think I know what might be going on. For example: when sending a payload of 100 bytes, the following first 9 bytes are added (type=0x0): "00 00 64 00 01 00 00 00 01", followed by the 100-byte payload. Everything is fine, and the total length of the message is 100+9 bytes. However, when a payload is very large, forcing curl to split it into chunks (in this case 16384 bytes), then the header is generated as "00 40 00 00 00 00 00 00 01" even though the payload is actually 16384-9=16375 bytes. It might be that the bug is that curl, in this particular case (when splitting into chunks), is not following the rule "The 9 octets of the frame header are not included in this value." on page 12 of the RFC. More probably, it might be that the internal buffer in curl is not large enough when splitting, since it seems like the 9 missing bytes of the payload are actually sent afterwards (as a separate packet). These trailing 9 bytes of the payload will be viewed as a new frame by the server, and since the type of that frame will be more or less random, it will most certainly result in a PROTOCOL_ERROR. Might it be that the buffer was set to 16384, when it was intended to be 16384+9=16393 (payload + 9 header bytes)? Here is an illustration of the problem (data from my own server implementation, sent by curl):
00000000: 0b 10 94 b7 21 22 08 b4 d5 ....!"...
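To make the 9-octet header concrete, here is a minimal parser for the HTTP/2 frame header layout (RFC 7540, section 4.1); a self-contained sketch, not curl's or nghttp2's code:

#include <stdint.h>
#include <stdio.h>

/* RFC 7540 section 4.1: every frame starts with a 9-octet header.
   The 24-bit length counts only the payload, not these 9 octets. */
struct h2_frame_header {
  uint32_t length;    /* 24-bit payload length */
  uint8_t  type;      /* 0x0 = DATA, 0x1 = HEADERS, ... */
  uint8_t  flags;
  uint32_t stream_id; /* 31 bits, high bit reserved */
};

static struct h2_frame_header h2_parse_header(const uint8_t b[9])
{
  struct h2_frame_header h;
  h.length    = ((uint32_t)b[0] << 16) | ((uint32_t)b[1] << 8) | b[2];
  h.type      = b[3];
  h.flags     = b[4];
  h.stream_id = (((uint32_t)b[5] & 0x7f) << 24) | ((uint32_t)b[6] << 16) |
                ((uint32_t)b[7] << 8) | b[8];
  return h;
}

int main(void)
{
  /* "00 40 00 00 00 00 00 00 01": length 0x004000 = 16384, type DATA,
     no flags, stream 1 -- the server must then receive 16384 payload
     bytes before it expects the next frame header. */
  const uint8_t hdr[9] = {0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01};
  struct h2_frame_header h = h2_parse_header(hdr);
  printf("len=%u type=%u flags=%u stream=%u\n",
         (unsigned)h.length, (unsigned)h.type, (unsigned)h.flags,
         (unsigned)h.stream_id);
  return 0;
}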
Any theory that the buffer size is somehow the reason for this would also need to explain why we only see this problem on a few rare systems. If the size is indeed involved, there's a combination with something else that makes this happen. We use the same default buffer size everywhere. @olbender: you mean you reproduced the bug? If so, on what system/OS?
During PUT, curl gets chunks of data up to 64k from the file and passes that to http2, and either curl or the nghttp2 lib breaks it into 16k-or-less DATA frames, so you would expect a bunch of 16384+9=16393, and that's what I see:
SSL_write(): -1 out of 16393 bytes
Stack during PUT where Curl_write passes 64k to http2:
then
Can you send us a Wireshark capture of what you are talking about? You will need to set the SSLKEYLOGFILE environment variable to a file to record the decryption keys, and you will need to make Wireshark aware of that in
bump - when I do:
@vszakats can you still reproduce? |
@bagder I’ll be able to make tests ~next Wednesday. Will post the results somewhere around then. |
Did some tests with native macOS (with yesterday's not-yet-bug-patched Homebrew build) and with macOS + wine + curl-for-win builds, and the issue still seems to be there.
We have this issue with a curl request. It is a C++ program using libcurl version 7.65.1. curl_easy_perform() is called with a debug callback installed. curl shows these steps while processing the request:
=> send data : {our request data here}
== info : We are completely uploaded and fine
== info : Operation timed out after 60004 ms with 0 bytes received.
We set the timeout to 60000 ms, so the request fails after 1 min. Can someone please advise what may be causing this and how it can be resolved? Is this a problem with the client? It seems like the server side never got our request. The request we send is a JSON document, size = 4197 bytes.
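For reference, a debug callback of the kind described is typically installed like this minimal sketch against the public libcurl API; the URL and payload below are placeholders, not the reporter's actual values:

#include <stdio.h>
#include <curl/curl.h>

/* Print libcurl's informational messages and mark outgoing data. */
static int debug_cb(CURL *handle, curl_infotype type, char *data,
                    size_t size, void *userptr)
{
  (void)handle; (void)userptr;
  if(type == CURLINFO_TEXT)
    fprintf(stderr, "== info : %.*s", (int)size, data);
  else if(type == CURLINFO_DATA_OUT)
    fprintf(stderr, "=> send data : %zu bytes\n", size);
  return 0;
}

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    const char *json = "{\"example\": true}";  /* placeholder payload */
    CURLcode res;
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/api"); /* placeholder */
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, json);
    curl_easy_setopt(curl, CURLOPT_DEBUGFUNCTION, debug_cb);
    curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);        /* required for debug_cb */
    curl_easy_setopt(curl, CURLOPT_TIMEOUT_MS, 60000L); /* the 60-second timeout */
    res = curl_easy_perform(curl);
    if(res != CURLE_OK)
      fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
    curl_easy_cleanup(curl);
  }
  return 0;
}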
On Fri, Jan 28, 2022 at 05:23:49PM -0800, hiic wrote:
We have this issue with CURL request. This is C++ program using curl lib
version 7.65.1
This is an old version. Does this problem occur with a recent version?
curl_easy_perform() is called with debug callback installed. CURL shows these
steps while processing the request -
=> send data : {our request data here}
== info : We are completely uploaded and fine
== info : Operation timed out after 60004 ms with 0 bytes received.
A PUT request requires a response, but this shows that the server has not
responded within the time allotted. You need to find out why the server isn't
sending anything. Try setting up trace callbacks to look at the data being sent
and see if it's as you expect. Or, try performing the same operation with the
curl CLI to see if that works.
Thank you for the reply. We do not have a later version available in the company at the moment. What would be a recommended version to use? This is a POST request. The server is expected to respond with the result of the query. Does it still mean the server did not respond in the time allowed? I am using CURLOPT_DEBUGFUNCTION. Is that what you mean by setting up a trace callback? I will try curl on the command line to see if that reproduces the issue.
Also, if a server did not reply, why would a more recent curl version make a difference? |
On Fri, Jan 28, 2022 at 06:31:53PM -0800, hiic wrote:
We do not have a later version available in the company at the moment. What
would be a recommended version to use.
The latest version, with the least number of known bugs, is 7.81.0
This is a POST request. The server is expected to respond with the result of the
query. Does it still mean the server did not respond in the time allowed?
That's how it appears.
I am using CURLOPT_DEBUGFUNCTION. Is that what you mean by setting up a trace
callback?
I'm talking about the kind of data you get from the --trace* options available
in the CLI. Mostly, CURLOPT_DEBUGFUNCTION is the interesting one.
Also, if a server did not reply, why would a more recent curl version make a
difference?
If there is actually a libcurl bug involved here, and the universe of possible
bugs that could cause the server to not send a response is vast, then this
would rule that out.
I did this
Run (this particular script on macOS):
Outputs for the FAIL cases above:
The -v output: 408, which is: Request Timeout
I expected the following
Upload to finish successfully, the same way as in HTTP/1.1 mode and with 64-128 KiB payloads with H2.
curl/libcurl version
macOS + Homebrew (brew install curl-openssl):
Windows Server 2012 R2:
UPDATED 2019-02-06: Applied updates necessary for recent Homebrew versions (aka use the curl-openssl package instead of a curl with custom options).