
curl H2 PUT fails with PROTOCOL_ERROR beyond a certain payload size #1410

Closed
vszakats opened this issue Apr 11, 2017 · 46 comments

@vszakats (Member) commented Apr 11, 2017

I did this

Run (this particular script on macOS):

#!/bin/sh

alias curl='/usr/local/opt/curl-openssl/bin/curl -sS -X PUT https://jsonplaceholder.typicode.com/put/ --data-binary @-'

# OK
dd if=/dev/zero bs=64k count=1 2> /dev/null | curl --http1.1
dd if=/dev/zero bs=64k count=2 2> /dev/null | curl --http1.1
dd if=/dev/zero bs=64k count=3 2> /dev/null | curl --http1.1
dd if=/dev/zero bs=64k count=4 2> /dev/null | curl --http1.1

# OK
dd if=/dev/zero bs=64k count=1 2> /dev/null | curl
dd if=/dev/zero bs=64k count=2 2> /dev/null | curl
# FAIL
dd if=/dev/zero bs=64k count=3 2> /dev/null | curl
dd if=/dev/zero bs=64k count=4 2> /dev/null | curl

Outputs for the FAIL cases above:

  • curl output:
curl: (92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)
  • curl -v output:
[...]
> User-Agent: curl/7.53.1
> Accept: */*
> Content-Length: 196608
> Content-Type: application/x-www-form-urlencoded
>.
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
* We are completely uploaded and fine

[60-second wait]

* HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, Client hello (1):
  • Server log (nginx 1.10.2):
[...] "PUT /api/path/ HTTP/2.0" 408 0 "-" "curl/7.53.1" "-"

408 is Request Timeout.

I expected the following

The upload to finish successfully, the same way it does in HTTP/1.1 mode and with 64-128 KiB payloads over H2.

curl/libcurl version

macOS + Homebrew (brew install curl-openssl):

curl 7.53.1 (x86_64-apple-darwin16.4.0) libcurl/7.53.1 OpenSSL/1.0.2k zlib/1.2.8 libssh2/1.8.0 nghttp2/1.21.1
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy 

Windows Server 2012 R2:

curl 7.53.1 (x86_64-pc-win32) libcurl/7.53.1 OpenSSL/1.1.0e zlib/1.2.11 WinIDN libssh2/1.8.0 nghttp2/1.21.1
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL libz TLS-SRP HTTP2 HTTPS-proxy 

UPDATED 2019-02-06: Applied updates necessary for recent Homebrew versions (aka use curl-openssl package instead of a curl with custom options).

@vszakats (Member, Author) commented Apr 11, 2017

I'm not sure if this issue belongs to curl (or one of its dependencies), but maybe it is interesting nevertheless.

@vszakats vszakats added the HTTP/2 label Apr 11, 2017

@vszakats (Member, Author) commented Apr 11, 2017

Updated the test code to work with a public request-tester website. It reproduces the exact same issue.

I've also added the instructions used to obtain the tested Homebrew curl build (it doesn't include H2 by default).

@jay (Member) commented Apr 12, 2017

The way you are doing it will send Content-Type: application/x-www-form-urlencoded. I do notice something similar though:

curl -v -sS https://nghttp2.org/httpbin/put -T blob.bin

Every time, the server waits ~60 seconds before sending back a 502 reply, according to a number of tries I monitored in Wireshark. And, as you say, with --http1.1 it echoes back the content immediately.

Another interesting thing about the HTTP/2 case is that curl eats CPU for the minute it is waiting for the reply. It looks like curl_multi_wait is not actually waiting.

Reproduced in:

curl 7.53.1 (i386-pc-win32) libcurl/7.53.1 OpenSSL/1.0.2k nghttp2/1.21.1
Protocols: dict file ftp ftps gopher http https imap imaps ldap pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS Debug Largefile NTLM SSL HTTP2 HTTPS-proxy

curl 7.53.1 (i386-pc-win32) libcurl/7.53.1 wolfSSL/3.10.2 nghttp2/1.21.1
Protocols: dict file ftp ftps gopher http https imap imaps ldap pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS Debug Largefile NTLM SSL HTTP2

curl 7.54.0-DEV (i686-w64-mingw32) libcurl/7.54.0-DEV mbedTLS/2.4.2 zlib/1.2.11 libssh2/1.8.0 nghttp2/1.19.0
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS Debug TrackMemory IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL libz HTTP2

curl issue #1410.zip is a Wireshark capture (MACs and such anonymized) made from the openssl_dump_secrets branch and these commands:

dd if=/dev/zero of=blob.bin bs=64k count=4 2> /dev/null;
curl -v -sS -X PUT https://nghttp2.org/httpbin/put --data-binary @blob.bin
curl issue #1410.keys      <--- ssl keys for above capture
curl issue #1410.pcapng    <--- packet capture of demo

To view the packet capture load the file then load the keys:
Edit > Preferences > Protocols > SSL > Master-Secret log

It shows libcurl sending DATA in 16384-byte chunks, except for the first 64K, which it oddly sends as 16383 + 1:

16384 +
16384 +
16384 +
16383 +  <--- 65535 total so far
    1 +  <--- ?? hm don't know why this is split off
16384 +
16384 +
16384 +
16384 +
16384 +
16384 +
16384 +
16384 +
16384 +
16384 +
16384 +  <-- libcurl gets into tight loop here, presumably waiting for reply
16384 =  <-- libcurl sends this last DATA after receiving the 502 one min later
-------
262144

[screenshot: curl issue 1410]

@vszakats (Member, Author) commented Apr 12, 2017

@jay: Thank you for your extensive tests and info (also for the note about Content-Type; I've fixed it locally). This confirms the problem is real. The question is whether it's curl- or nghttp2-specific.

@jay (Member) commented Apr 13, 2017

This may be timing dependent; I was unable to reproduce it today on Windows 8.1 (wireless) and Ubuntu 16 (wired), where it works fine. My repro above was on Windows 7.

In all of them, though, it does start with 16384, 16384, 16384, 16383, 1. AFAICS libcurl tries to write 16384 via nghttp2_session_send, but that calls data_source_read_callback requesting 16383; see the VS codemap for an illustration.

Question if it's curl or nghttp2-specific.

Yes, that's the question. A good next step, I think, will be to try a few PUTs of the same size with the nghttp client on Windows 7; it looks possible by setting the :method header to PUT.

@tatsuhiro-t (Contributor) commented Apr 13, 2017

I can answer why there are separate 16383- and 1-byte writes.
Initially, the server-side flow control window, for both the connection and the stream, is 65535 bytes.
So curl can send 16384, 16384, 16384, and 16383 bytes, then must stop sending further data and wait for a WINDOW_UPDATE from the server.
After the WINDOW_UPDATE arrives, we continue to send the remaining data. Previously, curl asked to send 16384 bytes, but libnghttp2 could only send 16383 of them, leaving 1 byte. Therefore, the first write after the WINDOW_UPDATE is just that 1 byte.
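
This accounting can be sketched as a small simulation (illustrative only, not curl or nghttp2 code), assuming curl offers data in 16384-byte buffers against the RFC 7540 default 65535-byte initial window:

```python
BUF = 16384          # bytes curl offers to nghttp2 per write
INIT_WINDOW = 65535  # RFC 7540 default initial flow-control window

def frame_sizes(total, window=INIT_WINDOW, buf=BUF):
    """Return the DATA frame sizes sent before the peer's first
    WINDOW_UPDATE: each send is capped by both the offered buffer
    and the remaining flow-control window."""
    sent, frames = 0, []
    while sent < total and window > 0:
        n = min(buf, window, total - sent)
        frames.append(n)
        sent += n
        window -= n
    return frames

print(frame_sizes(196608))  # a 192 KiB upload: [16384, 16384, 16384, 16383]
# After the WINDOW_UPDATE, the 1 byte left over from the interrupted
# 16384-byte write goes out first, then full-sized frames resume.
```

The sum of the frames before the stall is exactly 65535, matching jay's Wireshark table.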

@vszakats (Member, Author) commented Jul 3, 2017

Replacing curl with the nghttp client, the same uploads work correctly. If my tests are correct and can be replicated by others, the issue lies somewhere inside libcurl.

#!/bin/sh

alias nghttp="nghttp -H ':method: PUT' -H 'Content-Type: application/x-www-form-urlencoded' https://jsonplaceholder.typicode.com/put/ -d blob.bin"

# OK
dd if=/dev/zero of=blob.bin bs=64k count=1 2> /dev/null; nghttp
dd if=/dev/zero of=blob.bin bs=64k count=2 2> /dev/null; nghttp
dd if=/dev/zero of=blob.bin bs=64k count=3 2> /dev/null; nghttp
dd if=/dev/zero of=blob.bin bs=64k count=4 2> /dev/null; nghttp

nghttp version:

$ nghttp --version
nghttp nghttp2/1.24.0

The original example keeps failing with the latest curl built against nghttp2 1.24.0:

$ curl -V
curl 7.54.1 (x86_64-apple-darwin16.6.0) libcurl/7.54.1 OpenSSL/1.0.2l zlib/1.2.8 libssh2/1.8.0 nghttp2/1.24.0
Release-Date: 2017-06-14
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy 

@vszakats (Member, Author) commented Aug 17, 2017

Just a keep-alive note: the issue applies to the current latest release as well:

curl 7.55.1 (x86_64-apple-darwin16.7.0) libcurl/7.55.1 OpenSSL/1.0.2l zlib/1.2.8 libssh2/1.8.0 nghttp2/1.24.0
Release-Date: 2017-08-14
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy 
@bagder bagder self-assigned this Aug 18, 2017

@bagder (Member) commented Aug 18, 2017

Does not repro:

curl 7.56.0-DEV (x86_64-pc-linux-gnu) libcurl/7.56.0-DEV OpenSSL/1.1.0f zlib/1.2.8 c-ares/1.12.1-DEV libidn2/2.0.2 libpsl/0.18.0 (+libidn2/2.0.2) nghttp2/1.23.0-DEV librtmp/2.3

Repro:
curl 7.56.0-DEV (x86_64-apple-darwin16.7.0) libcurl/7.56.0-DEV OpenSSL/1.0.2l zlib/1.2.8 nghttp2/1.21.1

@bagder (Member) commented Aug 18, 2017

If I reduce the buffer size in curl/curl.h to 8K:

#define CURL_MAX_WRITE_SIZE 8000

... it seems to work every time. When I redirect the output to a file and enable the progress meter, it only works every now and then. It seems buffer/alignment and timing sensitive.

I added --trace-time to the command line, and when I get the problem, the pattern is always a 15-second pause. It ends like this:

17:23:06.136874 == Info: We are completely uploaded and fine

15 seconds pass where nothing is logged. If I set a breakpoint where the output above is printed, it is clear that libcurl considers all data sent, with nothing left to send.

17:23:21.118812 == Info: http2_recv: easy 0x7fcce5005408 (stream 1)
17:23:21.118843 <= Recv SSL data, 5 bytes (0x5) 0000: ....%
17:23:21.118870 == Info: nread=13
17:23:21.118876 == Info: on_frame_recv() header 3 stream 1
17:23:21.118881 == Info: Got frame type 3 for stream 1!
17:23:21.118891 == Info: on_stream_close(), PROTOCOL_ERROR (err 1), stream 1
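
For readers decoding that trace: "frame type 3" and "err 1" map to RST_STREAM and PROTOCOL_ERROR per RFC 7540. A minimal lookup (names come from the spec, not from curl's debug output code):

```python
# RFC 7540 frame types (section 6) and error codes (section 7).
FRAME_TYPES = {0: "DATA", 1: "HEADERS", 2: "PRIORITY", 3: "RST_STREAM",
               4: "SETTINGS", 5: "PUSH_PROMISE", 6: "PING", 7: "GOAWAY",
               8: "WINDOW_UPDATE", 9: "CONTINUATION"}
ERROR_CODES = {0: "NO_ERROR", 1: "PROTOCOL_ERROR", 2: "INTERNAL_ERROR",
               3: "FLOW_CONTROL_ERROR", 4: "SETTINGS_TIMEOUT",
               5: "STREAM_CLOSED", 6: "FRAME_SIZE_ERROR",
               7: "REFUSED_STREAM", 8: "CANCEL", 9: "COMPRESSION_ERROR",
               10: "CONNECT_ERROR", 11: "ENHANCE_YOUR_CALM",
               12: "INADEQUATE_SECURITY", 13: "HTTP_1_1_REQUIRED"}

# "Got frame type 3 for stream 1" + "err 1" therefore means the server
# reset stream 1 with RST_STREAM(PROTOCOL_ERROR).
print(FRAME_TYPES[3], ERROR_CODES[1])
```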

@jay (Member) commented Aug 18, 2017

It's definitely sensitive to something; there was one computer I was able to reproduce it on, but no others.

@vszakats (Member, Author) commented Oct 4, 2017

With this fresh version:

curl 7.56.0 (x86_64-apple-darwin17.0.0) libcurl/7.56.0 OpenSSL/1.0.2l zlib/1.2.11 libssh2/1.8.0 nghttp2/1.26.0
Release-Date: 2017-10-04
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy 

The failure is still there, but this time I've seen it succeed with bs=64k count=4 a few times, so I can also confirm it being sensitive to something.

@vszakats (Member, Author) commented Jan 24, 2018

This mysterious error still prevails as of:

curl 7.58.0 (x86_64-apple-darwin17.4.0) libcurl/7.58.0 OpenSSL/1.0.2n zlib/1.2.11 brotli/1.0.2 libidn2/2.0.4 libssh2/1.8.0 nghttp2/1.29.0
Release-Date: 2018-01-24
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile NTLM NTLM_WB SSL libz brotli TLS-SRP HTTP2 UnixSockets HTTPS-proxy 

@vszakats (Member, Author) commented Mar 14, 2018

And in:

curl 7.59.0 (x86_64-apple-darwin17.4.0) libcurl/7.59.0 OpenSSL/1.0.2n zlib/1.2.11 brotli/1.0.3 libidn2/2.0.4 libssh2/1.8.0 nghttp2/1.31.0
Release-Date: 2018-03-14
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile NTLM NTLM_WB SSL libz brotli TLS-SRP HTTP2 UnixSockets HTTPS-proxy

run under Wine 3.0:

curl 7.59.0 (x86_64-pc-win32) libcurl/7.59.0 OpenSSL/1.1.0g (WinSSL) zlib/1.2.11 brotli/1.0.3 WinIDN libssh2/1.8.0 nghttp2/1.31.0
Release-Date: 2018-03-14
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL libz brotli TLS-SRP HTTP2 HTTPS-proxy MultiSSL 

@bagder (Member) commented Mar 15, 2018

I ran the example command line with a 256K blob.bin file multiple times using current master, and I only get 500 responses back saying PayloadTooLargeError: request entity too large in the body, followed by some sort of stack trace. Attaching my --trace-ascii output from this: dump.txt

I'll try against some other services...

@vszakats (Member, Author) commented Mar 15, 2018

The PayloadTooLargeError is correct for this test server when sending larger payloads. Not sure why I'm getting the errors; maybe it's proxy/firewall related?

@bagder (Member) commented Mar 15, 2018

Any chance you can build curl with ./configure --enable-debug, add #define DEBUG_HTTP2 at the top of lib/http2.c, rebuild it all, and then attach your --trace-ascii logfile here? It might offer some additional clues.

@vszakats (Member, Author) commented Mar 15, 2018

Managed to finish it just now (without mistakes I hope):

curl 7.59.0 (x86_64-apple-darwin17.4.0) libcurl/7.59.0 OpenSSL/1.0.2n zlib/1.2.11 brotli/1.0.3 libidn2/2.0.4 libssh2/1.8.0 nghttp2/1.31.0
Release-Date: 2018-03-14
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile NTLM NTLM_WB SSL libz brotli TLS-SRP HTTP2 UnixSockets HTTPS-proxy 

The 192K test failed (256K was strangely okay this time).
Also attached the OK run with 128K.

logfile-128K-OK.txt
logfile-192K-FAIL.txt

@vszakats (Member, Author) commented Mar 20, 2018

Diff view of the OK/FAIL output tails:
[screenshot: trace diff, 2018-03-20]

After the end of the upload, only 13 bytes were read from the server in the failed case, versus 1175 bytes in the other.

@RReverser commented Apr 11, 2018

Ran into the same issue; it doesn't work most of the time, but sometimes it does pass through. Can't repro with the nghttp CLI either.

@RReverser commented Apr 11, 2018

The funny thing is that it doesn't reproduce at all with --trace or --trace-ascii, so it's definitely sensitive to buffering or something.

@vszakats (Member, Author) commented May 16, 2018

Just a retest with the latest components (the problem is still there, though this time I got one successful run using the Windows binary):

curl 7.60.0 (x86_64-apple-darwin17.5.0) libcurl/7.60.0 OpenSSL/1.0.2o zlib/1.2.11 brotli/1.0.4 libidn2/2.0.4 libssh2/1.8.0 nghttp2/1.32.0
Release-Date: 2018-05-16
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile NTLM NTLM_WB SSL libz brotli TLS-SRP HTTP2 UnixSockets HTTPS-proxy 
curl 7.60.0 (x86_64-pc-win32) libcurl/7.60.0 OpenSSL/1.1.0h (WinSSL) zlib/1.2.11 brotli/1.0.4 WinIDN libssh2/1.8.0 nghttp2/1.32.0
Release-Date: 2018-05-16
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL libz brotli TLS-SRP HTTP2 HTTPS-proxy MultiSSL 

@bagder (Member) commented May 21, 2018

I've come back to this again and I tested current 7.61.0-DEV on Linux against my local Apache over HTTP and my local nginx over HTTPS as well as with the original URL in this report, but none of them reproduces this problem for me.

On my mac, however, it does still reproduce fairly easily, but not with any --trace option, which is highly annoying and makes debugging difficult.

When it fails, the on_stream_close callback gets called by nghttp2, closing the stream with PROTOCOL_ERROR.

@bagder (Member) commented Jul 31, 2018

I just reran this test due to the many recent h2 changes. I can still trigger it on mac. I've not been able to reproduce it on Linux.

@bagder (Member) commented Jul 31, 2018

However, about 29 times out of 30 (or so), this site replies with the contents below, which makes this terribly annoying to debug. (It seems weird that it sometimes doesn't return this error, since the PUT size is passed in the request headers.) I'd be very happy for a recipe that is easier to reproduce...

I managed to record the flaw on the mac with tcpdump and SSLKEYLOGFILE, but I fail to get Wireshark to show me all the h2 frames correctly. What's clear, however, is that in this case the server responds with a "500 internal server error", which I guess is because it doesn't actually like the size of the PUT.

PayloadTooLargeError: request entity too large
    at readStream (/app/node_modules/raw-body/index.js:155:17)
    at getRawBody (/app/node_modules/raw-body/index.js:108:12)
    at read (/app/node_modules/body-parser/lib/read.js:77:3)
    at urlencodedParser (/app/node_modules/body-parser/lib/types/urlencoded.js:116:5)
    at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)
    at trim_prefix (/app/node_modules/express/lib/router/index.js:317:13)
    at /app/node_modules/express/lib/router/index.js:284:7
    at Function.process_params (/app/node_modules/express/lib/router/index.js:335:12)
    at next (/app/node_modules/express/lib/router/index.js:275:10)
    at jsonParser (/app/node_modules/body-parser/lib/types/json.js:119:7)

@vszakats (Member, Author) commented Sep 12, 2018

Testing with a brand-new build, via wine: same results as before, including the randomness:

curl 7.61.1 (x86_64-pc-win32) libcurl/7.61.1 OpenSSL/1.1.1 (WinSSL) zlib/1.2.11 brotli/1.0.5 WinIDN libssh2/1.8.0 nghttp2/1.33.0
Release-Date: 2018-09-05
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL libz brotli TLS-SRP HTTP2 HTTPS-proxy MultiSSL 

Not sure if these are useful, but recently wine shows these issues (repeatedly) while running curl.exe for these tests:

0057:fixme:ntdll:server_ioctl_file Unsupported ioctl 4004747b (device=4004 access=1 func=d1e method=3)
0057:fixme:winsock:server_ioctl_sock Unsupported ioctl 4004747b (device=4004 access=1 func=d1e method=3)
0057:fixme:winsock:WSAIoctl unsupported WS_IOCTL cmd (_IOR('t', 123, 4))
curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)

UPDATE: Same when using a TLSv1.3 connection.

@vszakats (Member, Author) commented Sep 26, 2018

I'm a beginner with curl's HTTP/H2 internals, and this may well be trivial/known or not super helpful; anyhow, some tracing revealed that after actually sending the full payload, the KEEP_RECV bit remains stuck in k->keepon in the erroneous cases, causing curl to keep looping for 1 minute.

In the failing cases http2_perform_getsock() (in lib/http2.c) gets called twice after uploading the full payload. This doesn't happen in successful cases.

Next, in Curl_readwrite(), Curl_socket_check() will assign CURL_CSELECT_IN (=1) to select_res in successful runs, and zero in failing cases.

In successful cases, KEEP_RECV is cleared inside readwrite_data() (in lib/transfer.c) after a few (4) iterations.

The above was consistently repeatable. I used the servers in the OP for these tests (served now by Cloudflare), with a 64K payload for successful tests and 3x64K for failing ones, using curl 7.62.0-DEV (x86_64-apple-darwin17.7.0) libcurl/7.62.0-DEV OpenSSL/1.1.1 zlib/1.2.11 brotli/1.0.6 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.5) nghttp2/1.33.0 librtmp/2.3 on macOS Mojave.

@bagder (Member) commented Sep 27, 2018

I'm not sure those details matter much, since they're all happening in an error situation that likely shouldn't happen in the first place.

I've chased this bug as well, but it is a tricky one. It seems to only happen on mac for me (which in my world means a less good environment to trace and debug in). It ends up being an error returned from nghttp2, but from the tests above in this issue there is reason to believe the error is somehow induced by curl, or caused by the way curl uses nghttp2 (thus not happening to plain nghttp2 use that doesn't involve curl). I just haven't figured out what situation makes nghttp2 return this error.

It would be useful to record the data (with Wireshark or similar) when the problem appears, and it would be interesting to trace into nghttp2 to see what exactly it barfs on, or what it expects to be there that isn't. Alternatively, we could add some verbose logging to nghttp2 and still trigger the problem.

@vszakats (Member, Author) commented Sep 27, 2018

I've rebuilt curl with a debug nghttp2 as a next step. Because Curl_socket_check() calls poll() on Linux and select() on macOS/Windows, and these latter two have problems, I tried poll() on macOS (10.14 has a working implementation) to see if it makes any difference. It didn't.

As a kind of distraction, I bisected the largest working payload size, which turned out to be 134519 (=0x20D77) bytes. This was 100% consistent throughout these tests, even with the latest stable Windows release run via wine in the same environment. (The same curl.exe worked with any tested size on Win10 and Win7, which is interesting.) Anyhow, this allowed creating ok/fail test runs with the least amount of difference.

Attached are the outputs of these two commands (one ok, one fail):

alias curl="./curl -sS -X PUT https://jsonplaceholder.typicode.com/put/ --trace-time --data-binary @blob.bin"
dd if=/dev/zero of=blob.bin bs=134519 count=1 2> /dev/null; curl -v --trace-ascii /dev/stdout > _out_ok.txt 2>&1
dd if=/dev/zero of=blob.bin bs=134520 count=1 2> /dev/null; curl -v --trace-ascii /dev/stdout > _out_fail.txt 2>&1
curl 7.62.0-DEV (x86_64-apple-darwin18.0.0) libcurl/7.62.0-DEV OpenSSL/1.1.1 zlib/1.2.11 brotli/1.0.6 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.5) nghttp2/1.33.0 librtmp/2.3
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS Debug TrackMemory IDN IPv6 Largefile NTLM NTLM_WB SSL libz brotli TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL 

out_fail.txt
out_ok.txt
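
The bisection described above can be sketched as a binary search over the payload size. A hypothetical helper: `upload_ok` stands in for "run curl with an n-byte payload and report success", faked here with the observed 134519-byte threshold.

```python
def largest_ok(upload_ok, lo=1, hi=262144):
    """Binary-search the largest n in [lo, hi] for which upload_ok(n)
    is true, assuming the predicate is monotone (ok below a threshold,
    failing above it)."""
    assert upload_ok(lo), "even the smallest payload fails"
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if upload_ok(mid):
            lo = mid        # mid works; the threshold is at or above it
        else:
            hi = mid - 1    # mid fails; the threshold is below it
    return lo

# Fake predicate modelling the threshold observed in this comment:
print(largest_ok(lambda n: n <= 134519))
```

In practice `upload_ok` would `dd` an n-byte blob and invoke curl, checking its exit code; about 18 runs cover the whole 1..256K range.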

@vszakats (Member, Author) commented Jun 6, 2019

Problem persists with this fresh Windows build via latest wine and macOS:

curl 7.65.1 (x86_64-pc-win32) libcurl/7.65.1 OpenSSL/1.1.1c (Schannel) zlib/1.2.11 brotli/1.0.7 WinIDN libssh2/1.8.2 nghttp2/1.38.0
Release-Date: 2019-06-05
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile MultiSSL NTLM SPNEGO SSL SSPI TLS-SRP brotli libz

@kunalekawde commented Jun 13, 2019

@vszakats, just another pointer to think about: could it be related to TCP MSS and segmentation offload? A tool to help with that would be ethtool -k <interface> | grep offload.

@vszakats (Member, Author) commented Jun 13, 2019

It can be related, though TCP internals are likely at their default values on at least a significant portion of the components I've run these tests on. Over time, there have been multiple web servers involved (including OS, host provider, and maybe even server software, though I'm not sure about that), plus internet exchanges, ISPs, broadband hardware, routers, local networks, and client hardware and OS types/versions, but the issue notoriously remains. At the same time, nghttp works just fine as a client whenever tried. So, if there was anything common across these tests, I couldn't pinpoint what it might be.

@vszakats (Member, Author) commented Sep 6, 2019

Just bumped into these settings on my host OS (macOS) and the numbers were familiar:

$ sysctl -n net.inet.tcp.recvspace
131072
$ sysctl -n net.inet.tcp.sendspace
131072

131072 is the largest (+/- a few bytes) payload size that works without the issues reported here.

Tried to bump both buffers to 256 KB:

$ sudo sysctl net.inet.tcp.sendspace=262144
$ sudo sysctl net.inet.tcp.recvspace=262144

With the above, the 3rd test (192 KB) consistently runs fine; only the 4th one (256 KB) fails intermittently.

Then tried to bump them to 512 KB:

$ sudo sysctl net.inet.tcp.sendspace=524288
$ sudo sysctl net.inet.tcp.recvspace=524288

With this, all 4 tests run fine and consistently so.

Version tested:

curl 7.65.3 (x86_64-apple-darwin18.6.0) libcurl/7.65.3 OpenSSL/1.0.2s zlib/1.2.11 brotli/1.0.7 c-ares/1.15.0 libssh2/1.9.0 nghttp2/1.39.2 librtmp/2.3
Release-Date: 2019-07-19
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IPv6 Kerberos Largefile libz Metalink NTLM NTLM_WB SPNEGO SSL TLS-SRP UnixSockets

I've originally found this issue on a different machine (W2K3), which is a virtualized one (via VMware), but sadly I have no idea about the physical hardware there or any of its TCP settings.

Anyhow, it looks like there is indeed a relation between TCP window sizes and the point where curl's H2 transfers start failing.
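
As a quick cross-check of what buffers a socket actually receives (as opposed to the sysctl defaults above), something like this can be used; the exact values are platform- and configuration-dependent:

```python
import socket

# Query the kernel's actual per-socket send/receive buffer sizes.
# sysctl values are defaults; a socket may end up with something else
# (and apps can override them with setsockopt).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
s.close()
print("SO_SNDBUF:", snd, "SO_RCVBUF:", rcv)
```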

@vszakats (Member, Author) commented Dec 7, 2019

This looks hopeless, so I'm taking the general direction of abandoning the Windows platform and migrating to Linux. [The other blow for Windows was nghttp2's 1.40.0 release breaking non-MSVC builds, with no hope of fixing this outside nghttp2 with reasonable effort.]

So, closing this one. Please reopen if anything new comes up regarding this.

@vszakats vszakats closed this Dec 7, 2019

@jay (Member) commented Dec 7, 2019

I can reproduce what I described in my post from over 2 years ago, but now only in the packaged cygwin version of curl.

export SSLKEYLOGFILE=[my path to sslkeylog.txt]
dd if=/dev/zero of=blob.bin bs=64k count=4 2> /dev/null;
curl -v -sS -X PUT https://nghttp2.org/httpbin/put --data-binary @blob.bin

Just to recap: the issue I observed was different from what is reported, but surely related. According to Wireshark, all of the data is sent except for the last chunk. At this point curl hangs in a busy loop for 1 minute until it receives an HTTP error reply from the server, and only then does curl send the last chunk of data.

I also monitored with an API monitor this time. I chose module ws2_32 (Winsock 2) and monitored everything except WSAEnumNetworkEvents, since there are millions of them.

The API monitor shows that the normal progression is that curl sends data until it is blocked by Winsock, at which point it waits and tries again:
[screenshot: apimon1]
Above you can see captured call 99 send 16k successfully; then in call 100 a send is attempted but fails (10035 is WSAEWOULDBLOCK), so it waits (I've skipped monitoring the enum-events calls for clarity, so that's not visible), and then, when it's clear, it tries again and in call 105 sends it off successfully.

The pattern repeats until the last chunk is blocked, and curl does not try to send it again until after receiving the reply from the server:
[screenshot: apimon2]
Above you can see captured call 276, the last chunk send, attempted but failing at 2:36:41; the server reply is then received in call 282 at 2:37:39 (~1 min later), and the failed last chunk is resent successfully immediately after, in call 283 at 2:37:39.

There is definitely a bug in curl, since it should not be eating CPU while waiting for the reply. Recall that curl prints "We are completely uploaded and fine" after the last chunk fails, even though everything is not uploaded at that time. Even if OpenSSL or nghttp2 turns out to be at fault, curl still should not eat CPU.

I'm loath to put this in known bugs; we should fix this, if any of us can find it. That I can no longer reproduce this from source, on the same computer and OS, is maddening. The next step for anyone who can would be to sprinkle printfs in all the send functions to see what exactly is telling us that the final chunk is sent when it is not. The CPU eating may or may not be separate from that.
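
For contrast with the busy loop described above, the expected pattern when a non-blocking send hits EWOULDBLOCK/EAGAIN is to park in select()/poll() until the socket is writable. An illustrative sketch (not curl's code; Python stands in for the C event loop):

```python
import select
import socket

def send_all(sock, data, timeout=60.0):
    """Send all of `data` on a non-blocking socket; when the kernel
    send buffer is full, sleep in select() until it drains instead of
    spinning in a busy loop."""
    sent = 0
    while sent < len(data):
        try:
            sent += sock.send(data[sent:])
        except BlockingIOError:
            # EWOULDBLOCK/EAGAIN: block on writability, burning no CPU.
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError("socket never became writable")
    return sent
```

The bug pattern in the capture is the opposite: the blocked last chunk is never re-armed for writability, so curl spins and only retries once read activity (the server's reply) wakes it.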

curl 7.66.0 (i686-pc-cygwin) libcurl/7.66.0 OpenSSL/1.1.1d zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.0.4) libssh/0.8.7/openssl/zlib nghttp2/1.37.0
Release-Date: 2019-09-11
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS brotli Debug GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz Metalink NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP TrackMemory UnixSockets
@jay jay reopened this Dec 7, 2019

@vszakats (Member, Author) commented Dec 7, 2019

@jay: What you describe here seems to be what I was also seeing. (I tried the sprinkle-printf method a while ago but did not manage to reach a useful conclusion.)

@bagder (Member) commented Dec 7, 2019

Previously I've managed to repro this issue on mac only, and I really cannot stand debugging curl at this level on that platform. 😢 I've tried really hard to repro it on Linux, where I feel most able to use tools to track something like this down, but I've never managed to.

Recall curl prints "We are completely uploaded and fine" after the last chunk fails, even though everything is not uploaded at that time.

How can that happen? As it isn't using chunked encoding, that should only be possible when curl has successfully managed to send off the entire file, right? This condition:

curl/lib/transfer.c

Lines 1147 to 1152 in 147fa06

if((!k->upload_chunky || k->forbidchunk) &&
   (k->writebytecount == data->state.infilesize)) {
  /* we have sent all data we were supposed to */
  k->upload_done = TRUE;
  infof(data, "We are completely uploaded and fine\n");
}
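
One way that condition can mislead: writebytecount tracks bytes curl has handed to the h2/TLS layers, which need not be what has reached the wire. A toy illustration (purely hypothetical, not curl's actual structures) of a caller-visible byte count running ahead of what was flushed:

```python
class BufferingWriter:
    """Toy model: accepts writes fully, but only 'flushes' to the wire
    up to flush_limit bytes, mimicking a stalled last chunk."""
    def __init__(self, flush_limit):
        self.flush_limit = flush_limit  # bytes the wire will take now
        self.accepted = 0               # what the caller believes is sent
        self.flushed = 0                # what actually went out

    def write(self, data):
        self.accepted += len(data)      # caller-visible count grows fully
        can_send = max(0, self.flush_limit - self.flushed)
        self.flushed += min(can_send, self.accepted - self.flushed)

w = BufferingWriter(flush_limit=245760)  # wire stalls one chunk early
for _ in range(16):
    w.write(b"\0" * 16384)               # 16 x 16 KiB = 256 KiB upload
print(w.accepted, w.flushed)             # accepted runs 16384 bytes ahead
```

A check keyed on the `accepted` counter would declare "completely uploaded" while one full chunk still sits in the buffer, which matches the capture in jay's comment.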

@vszakats (Member, Author) commented Dec 7, 2019

Might the apparent relation between the occurrence of the problem and the TCP buffer size (on macOS at least) point to some known pattern?

To a non-network-expert like myself, this looks a little puzzling, as one would assume such a low-level setting should not be able to break apps using standard OS calls. Perhaps unless there is some API-call or app-code corner case that breaks when hitting on or around low-level buffer boundaries. (Disclaimer: just shooting blindly in the dark here.)

@jay (Member) commented Dec 9, 2019

Might the apparent relation between the occurrence of the problem and the TCP buffer size (on macOS at least) point to some known pattern?

What you've observed is likely because the size of the buffer affects whether the data can be sent immediately. If adding printfs didn't help, you could try watching strace -e trace=write curl ... (assuming write is what's used to send data) with your own repro to figure it out. I expect it will show you something preceding the hang like:
write(1, \000\000....) = -1 EAGAIN (Resource temporarily unavailable)

And at the same time it would be useful to see what OpenSSL is returning. We call SSL_write, which uses the default send() or whatever. I would start there:

diff --git a/lib/vtls/openssl.c b/lib/vtls/openssl.c
index 726ff6e..0f221d2 100644
--- a/lib/vtls/openssl.c
+++ b/lib/vtls/openssl.c
@@ -3835,9 +3835,11 @@ static ssize_t ossl_send(struct connectdata *conn,
 
   memlen = (len > (size_t)INT_MAX) ? INT_MAX : (int)len;
   rc = SSL_write(BACKEND->handle, mem, memlen);
+  fprintf(stderr, "SSL_write(): %d out of %d bytes\n", rc, memlen);
 
   if(rc <= 0) {
     err = SSL_get_error(BACKEND->handle, rc);
+    fprintf(stderr, "SSL_get_error(): %d\n", err);
 
     switch(err) {
     case SSL_ERROR_WANT_READ:
@olbender

@olbender olbender commented Dec 9, 2019

I hit this bug today, and I think I know what might be going on. For example: when sending a payload of 100 bytes, a 9-byte frame header (type=0x0) is prepended: "00 00 64 00 01 00 00 00 01", followed by the 100-byte payload. Everything is fine, and the total length of the message is 100+9 bytes.

However, when a payload is very large, forcing curl to split it into chunks (in this case 16,384 bytes), the header is generated as "00 40 00 00 00 00 00 00 01" even though the payload is actually 16384-9=16375 bytes. The bug might be that curl, in this particular case (when splitting into chunks), is not following the rule "The 9 octets of the frame header are not included in this value." on page 12 of the RFC. More probably, the internal buffer in curl is not large enough when splitting, since it seems like the 9 missing bytes of the payload are actually sent afterwards (as a separate packet). The server will view these trailing 9 bytes as a new frame, and since the type of that frame will be more or less random, it will almost certainly result in a PROTOCOL_ERROR. Might it be that the buffer was set to 16384 when it was intended to be 16384+9=16393 (payload plus the 9-byte header)?

Here is an illustration of the problem (data from my own server implementation, sent by curl):
.....
00003ff0: 21 30 08 b4 d5 f4 de 00 00 00 00 00 00 00 00 00 !0..............
(end packet 0, note the missing 9 bytes)

00000000: 0b 10 94 b7 21 22 08 b4 d5 ....!"...
(end packet 1, the missing 9 bytes of the previous payload, server sees this as a new frame)
type: 183 -> PROTOCOL_ERROR

@bagder
Member

@bagder bagder commented Dec 10, 2019

Any theory that the buffer size is somehow the reason for this would also need to explain why we only see this problem on a few rare systems. If the size is indeed involved, there's a combination with something else that makes this happen. We use the same default buffer size everywhere.

@olbender: you mean you reproduced the bug? If so, on what system/OS?

@jay
Member

@jay jay commented Dec 10, 2019

During a PUT, curl gets chunks of data of up to 64k from the file and passes them to http2, and either curl or the nghttp2 lib breaks that into DATA frames of 16k or less, so you would expect a bunch of 16384+9=16393 writes, and that's what I see:

SSL_write(): -1 out of 16393 bytes
SSL_get_error(): 3
SSL_write(): 16393 out of 16393 bytes
SSL_write(): 16393 out of 16393 bytes
SSL_write(): 12125 out of 12125 bytes
SSL_write(): 16393 out of 16393 bytes
SSL_write(): 16393 out of 16393 bytes
SSL_write(): 4277 out of 4277 bytes
SSL_write(): 16393 out of 16393 bytes
SSL_write(): 16393 out of 16393 bytes
SSL_write(): 3664 out of 3664 bytes
SSL_write(): 16393 out of 16393 bytes
SSL_write(): 12738 out of 12738 bytes

Stack during the PUT where Curl_write passes 64k to http2:

>	libcurld.dll!h2_session_send(Curl_easy * data, nghttp2_session * h2)  Line 1526	C
 	libcurld.dll!http2_send(connectdata * conn, int sockindex, const void * mem, unsigned int len, CURLcode * err)  Line 1894 + 0xf bytes	C
 	libcurld.dll!Curl_write(connectdata * conn, unsigned int sockfd, const void * mem, unsigned int len, int * written)  Line 348 + 0x25 bytes	C
 	libcurld.dll!readwrite_upload(Curl_easy * data, connectdata * conn, int * didwhat)  Line 1133 + 0x2b bytes	C
 	libcurld.dll!Curl_readwrite(connectdata * conn, Curl_easy * data, bool * done, bool * comeback)  Line 1249 + 0x11 bytes	C
 	libcurld.dll!multi_runsingle(Curl_multi * multi, curltime now, Curl_easy * data)  Line 2076 + 0x1b bytes	C
 	libcurld.dll!curl_multi_perform(Curl_multi * multi, int * running_handles)  Line 2339 + 0x29 bytes	C
 	libcurld.dll!easy_transfer(Curl_multi * multi)  Line 599 + 0xd bytes	C
 	libcurld.dll!easy_perform(Curl_easy * data, bool events)  Line 692 + 0x18 bytes	C
 	libcurld.dll!curl_easy_perform(Curl_easy * data)  Line 711 + 0xb bytes	C
 	curld.exe!serial_transfers(GlobalConfig * global, Curl_share * share)  Line 2197 + 0xf bytes	C
 	curld.exe!run_all_transfers(GlobalConfig * global, Curl_share * share, CURLcode result)  Line 2373 + 0xd bytes	C
 	curld.exe!operate(GlobalConfig * global, int argc, char * * argv)  Line 2485 + 0x11 bytes	C
 	curld.exe!main(int argc, char * * argv)  Line 314 + 0x11 bytes	C
 	curld.exe!__tmainCRTStartup()  Line 555 + 0x19 bytes	C
 	curld.exe!mainCRTStartup()  Line 371	C
 	kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes	
 	ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes	
 	ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes	

Then
h2_session_send -> nghttp2_session_send -> send_callback -> ossl_send
sends the DATA frames in chunks of up to 16k from that 64k.

Can you send us a Wireshark capture of what you are talking about? You will need to set the SSLKEYLOGFILE environment variable to a file to record the decryption keys, and you will need to make Wireshark aware of that in Edit > Preferences > Protocols > SSL > Master-Secret log. You will also need to send us that decryption key file (so make sure only the one key is in there).

@xquery
Member

@xquery xquery commented Mar 5, 2020

bump - when I do:

dd if=/dev/zero of=blob.bin bs=64k count=4 2> /dev/null; /usr/local/bin/curl -v -sS -X PUT https://nghttp2.org/httpbin/put --data-binary @blob.bin
on macOS Mojave, with a build of the latest curl (7.69.0, built with cmake), all seems to work correctly.

@bagder
Member

@bagder bagder commented Mar 5, 2020

@vszakats can you still reproduce?

@vszakats
Member Author

@vszakats vszakats commented Mar 5, 2020

@bagder I’ll be able to make tests ~next Wednesday. Will post the results somewhere around then.

@vszakats
Member Author

@vszakats vszakats commented Mar 10, 2020

Did some tests with native macOS (with yesterday's not-yet-bug-patched Homebrew build) and with macOS + wine + curl-for-win builds, and the issue still seems to be there.
