HTTP/2 connections to Google GFE aren't reused! #750

Closed
bagder opened this Issue Apr 2, 2016 · 17 comments

Projects

None yet

4 participants

@bagder
Member
bagder commented Apr 2, 2016

I did this

$ curl https://www.google.se/ https://www.google.se/ --trace-ascii dump --trace-time -A "Mozilla/" > /dev/null

I expected the following

... that the single HTTPS connection would be kept alive after the first request and reused for the subsequent one, as it is if you replace www.google.se with daniel.haxx.se.

The interesting snippet of the trace at the end of the first request and the beginning of the second says:

0400: ,"ovr":{},"pq":"","refpd":true,"rfs":[],"scd":10,"sce":5,"stok":
0440: "dZ4Wr9hTodQ_eMzKJ61AiFVM1VE"},"d":{}};google.y.first.push(funct
0480: ion(){if(google.med){google.med('init');google.initHistory();goo
04c0: gle.med('history');}});if(google.j&&google.j.en&&google.j.xi){wi
0500: ndow.setTimeout(google.j.xi,0);}.</script></div></body></html>
20:47:08.886104 == Info: http2_recv returns 0, http2_handle_stream_close
20:47:08.886112 == Info: nread <= 0, server closed connection, bailing
20:47:08.886124 == Info: STATE: PERFORM => DONE handle 0x19982b8; line 1946 (connection #0) 
20:47:08.886131 == Info: multi_done
20:47:08.886137 == Info: free header_recvbuf!!
20:47:08.886269 == Info: Connection #0 to host www.google.se left intact
20:47:08.886287 == Info: Expire cleared
20:47:08.886382 == Info: STATE: INIT => CONNECT handle 0x19982b8; line 1402 (connection #-5000) 
20:47:08.886418 == Info: Found bundle for host www.google.se: 0x19d3748 [can multiplex]
20:47:08.886428 == Info: Connection 0 seems to be dead!
20:47:08.886436 == Info: HTTP/2 DISCONNECT starts now
20:47:08.886448 == Info: HTTP/2 DISCONNECT done
20:47:08.886454 == Info: Closing connection 0
20:47:08.886462 == Info: The cache now contains 0 members
20:47:08.886476 => Send SSL data, 5 bytes (0x5)
0000: .....
20:47:08.886515 == Info: TLSv1.2 (OUT), TLS alert, Client hello (1):
20:47:08.886522 => Send SSL data, 2 bytes (0x2)
0000: ..
20:47:08.887682 == Info: Added connection 1. The cache now contains 1 members
20:47:08.887698 == Info: Hostname www.google.se was found in DNS cache
20:47:08.887719 == Info:   Trying 83.140.66.42...
20:47:08.887751 == Info: STATE: CONNECT => WAITCONNECT handle 0x19982b8; line 1455 (connection #1) 
20:47:08.891580 == Info: Connected to www.google.se (83.140.66.42) port 443 (#1)
20:47:08.891600 == Info: STATE: WAITCONNECT => SENDPROTOCONNECT handle 0x19982b8; line 1554 (connection #1) 
20:47:08.891610 == Info: Marked for [keep alive]: HTTP default
20:47:08.892085 == Info: ALPN, offering h2
20:47:08.892095 == Info: ALPN, offering http/1.1
20:47:08.892172 == Info: Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
20:47:08.904573 == Info: successfully set certificate verify locations:
20:47:08.904590 == Info:   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
20:47:08.904616 == Info: SSL re-using session ID
20:47:08.904650 => Send SSL data, 5 bytes (0x5)
0000: .....

curl/libcurl version

curl 7.49.0-DEV (x86_64-pc-linux-gnu) libcurl/7.49.0-DEV OpenSSL/1.0.2g zlib/1.2.8 c-ares/1.11.0 libidn/1.32 libpsl/0.11.0 (+libicu/55.1) libssh2/1.7.0_DEV nghttp2/1.10.0-DEV librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS Debug TrackMemory IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets Metalink PSL 

operating system

Debian Linux, kernel 4.4.6 but this is most likely an OS independent problem.

@bagder bagder added the HTTP/2 label Apr 2, 2016
@joneskoo
joneskoo commented Apr 2, 2016

Confirmed OS independent because I get "connection dead" on OS X too. So far I've seen dead connections only with Google Frontend (GFE), and e.g. http2.golang.org doesn't have the same message. Even with those servers without connection dead, I don't think connection reuse works always properly because there's many TLS handshakes.

@joneskoo
joneskoo commented Apr 2, 2016

Here's example of http2.golang.org and connection reuse not working. https://gist.github.com/joneskoo/924586cd853e0b2063aa6d6bd51f3e0a

I would expect the TLS connection to be reused in that case.

@joneskoo
joneskoo commented Apr 2, 2016

Based on IRC discussion on #curl what I saw against http2.golang.org is fixed in 1fc7672. Issue with GFE Connection 0 seems to be dead is different. Thanks.

@bagder
Member
bagder commented Apr 4, 2016

The "seems to be dead" logic checks if the socket is readable when it is about to get reused, as it shouldn't be. A readable socket there implies that data has arrived (that we haven't taken care of) or that the socket is closed. Either way we can't reuse it since its state is unknown.

As a debug effort I added a call to Curl_read() there to see if it would get any data - but it didn't. That at least tells us that there's no HTTP(2) level incoming data pending. If there was stuff sent to the socket it looks like it was TLS based.

This case is quirky to check with wireshark too since it is all HTTPS.

@bagder bagder self-assigned this Apr 4, 2016
@joneskoo
joneskoo commented Apr 4, 2016

With curl+nss on Homebrew: Homebrew/homebrew-core#49

SSLKEYLOGFILE=curl-keylogfile \
 /usr/local/Cellar/curl/7.48.0/bin/curl -k -v --http2 \
 https://www.google.com/'[1-2]'

screen shot 2016-04-04 at 21 12 53

- https://joneskoo.kapsi.fi/tmp/curl-google-http2.pcap - https://joneskoo.kapsi.fi/tmp/curl-keylogfile.txt

The PING frame from server probably confuses cURL.

@bagder
Member
bagder commented Apr 5, 2016

I think a first take on this, is to make sure we drain the socket and send back http/2 frames even in this case when the a single stream happens to be done. With this little patch my simplest ways of reproducing this problem are now working correctly.

--- a/lib/http2.c
+++ b/lib/http2.c
@@ -1147,18 +1147,10 @@ static ssize_t http2_recv(struct connectdata *conn, int sockindex,
   struct SessionHandle *data = conn->data;
   struct HTTP *stream = data->req.protop;

   (void)sockindex; /* we always do HTTP2 on sockindex 0 */

-  /* If stream is closed, return 0 to signal the http routine to close
-     the connection.  We need to handle stream closure here,
-     otherwise, we may be going to read from underlying connection,
-     and gets EAGAIN, and we will get stuck there. */
-  if(stream->memlen == 0 && stream->closed) {
-    return http2_handle_stream_close(conn, data, stream, err);
-  }
-
   /* Nullify here because we call nghttp2_session_send() and they
      might refer to the old buffer. */
   stream->upload_mem = NULL;
   stream->upload_len = 0;

@bagder
Member
bagder commented Apr 5, 2016

This fix will only really make it work because the PING immediately follows the DATA. If the PING would instead just be delayed and come a few hundred milliseconds or so later, we will not drain the socket there and subsequently fail on the reuse check again. But to work on that angle of the problem, I think I need to produce a test case that behaves like that first. Baby steps.

@bagder bagder added a commit that referenced this issue Apr 5, 2016
@bagder bagder http2: fix connection reuse when PING comes after last DATA
It turns out the google GFE HTTP/2 servers send a PING frame immediately
after a stream ends and its last DATA has been received by curl. So if
we don't drain that from the socket, it makes the socket readable in
subsequent checks and libcurl then (wrongly) assumes the connection is
dead when trying to reuse the connection.

Reported-by: Joonas Kuorilehto

Discussed in #750
c8ab613
@bagder bagder closed this in 582d423 Apr 10, 2016
@jay
Member
jay commented Apr 10, 2016

The fix you applied c8ab613 removes what appears to have been for preventing streams from getting stuck when close is true and the stream has some DATA, see d261652. Did you test for a regression or is there already a test that covers this?

@jay
Member
jay commented Apr 10, 2016

also ping @tatsuhiro-t see if he has any ideas on this

@bagder
Member
bagder commented Apr 10, 2016

I just read the code and I ran a couple of tests on live sites. Oh , the lack of HTTP/2 tests .... :-(

@tatsuhiro-t
Contributor

The removed part is required when response header does not contain content-length header field.

This is probably a long time goal, but in my opinion, it would be good to have some kind of "idle" handle which takes care of the HTTP/2 connection when no active regular handles exist. It will handle all ping/settings over the connection while there is no handles using that connection.

@bagder
Member
bagder commented Apr 11, 2016

I completely agree @tatsuhiro-t, and I've added a mention about that need in the KNOWN_BUGS document now. As a first stop gap we can at least make sure that we consume the trailing ping if we read it as part of that last recv() ...

@bagder
Member
bagder commented Apr 11, 2016

What about preventing recv() after the stream closed so that we won't end up in EAGAIN after a stream is done? Then we can still consume the PING if we read it as part of the last recv. Or maybe that's too lame ...

diff --git a/lib/http2.c b/lib/http2.c
index 5305eb7..6dd4402 100644
--- a/lib/http2.c
+++ b/lib/http2.c
@@ -1237,11 +1237,11 @@ static ssize_t http2_recv(struct connectdata *conn, int sockindex,
        buffer is */
     stream->mem = mem;
     stream->len = len;
     stream->memlen = 0;

-    if(httpc->inbuflen == 0) {
+    if((httpc->inbuflen == 0) && (!stream->closed)) {
       nread = ((Curl_recv *)httpc->recv_underlying)(
           conn, FIRSTSOCKET, httpc->inbuf, H2_BUFSIZE, &result);

       if(nread == -1) {
         if(result != CURLE_AGAIN)
@tatsuhiro-t
Contributor

The above patch works with the response without content-length, but we don't read PING frame with that. Perhaps, they are in the different packet?

@bagder
Member
bagder commented Apr 11, 2016

Correct. In fact my test case with two google servers after another fails with this patch so clearly that case needs the extra recv() to work...!

How about detecting the closed stream in case EAGAIN is returned after the recv? Like this, which seems to also make my test case work:

--- a/lib/http2.c
+++ b/lib/http2.c
@@ -1244,10 +1244,14 @@ static ssize_t http2_recv(struct connectdata *conn, int sockindex,
           conn, FIRSTSOCKET, httpc->inbuf, H2_BUFSIZE, &result);

       if(nread == -1) {
         if(result != CURLE_AGAIN)
           failf(data, "Failed receiving HTTP2 data");
+        else if(stream->closed)
+          /* received when the stream was already closed! */
+          return http2_handle_stream_close(conn, data, stream, err);
+
         *err = result;
         return -1;
       }

       if(nread == 0) {
@tatsuhiro-t
Contributor

It seems to work for me too. Lacking test case (especially for corner cases) really hurts us here..

@bagder
Member
bagder commented Apr 11, 2016

Lacking test case (especially for corner cases) really hurts us here

Agreed. I need to get working on this... Thanks anyway, I'll push that patch for now.

@bagder bagder added a commit that referenced this issue Apr 11, 2016
@bagder bagder http2: drain the socket better...
... but ignore EAGAIN if the stream has ended so that we don't end up in
a loop. This is a follow-up to c8ab613 in order to avoid the problem
d261652 was made to fix.

Reported-by: Jay Satiro
Clues-provided-by: Tatsuhiro Tsujikawa

Discussed in #750
b2a0376
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment