Connection timeout on https streams #1174

Closed
Hummer12007 opened this Issue Dec 23, 2016 · 25 comments

Projects

None yet

7 participants

@Hummer12007
Hummer12007 commented Dec 23, 2016 edited

A regression appeared in cb4e2be, which causes https connections to time out (found out by bisection).

To be precise, since the commit, mpd (0.19.21 as well as earlier versions) often (almost always) fails to play remote https streams due to connection timeouts: see https://github.com/curl/curl/blob/cb4e2be7c6d42ca0780f8e0a747cecf9ba45f151/lib/multi.c#L1378 for the precise error which occurs. Such errors did not happen in prior commits.

This does not occur for http streams.

@MaxKellerman may be interested.

I'm running Gentoo.

@jay
Member
jay commented Dec 23, 2016

You didn't post the information from the template, please do that in the future. What's your curl -V ? Is this related to #1156?

@Hummer12007
Hummer12007 commented Dec 23, 2016 edited

@jay I've mentioned the relevant commit. And it doesn't seem so, as I don't use a proxy.
here's curl -V nonetheless (from the relevant commit):
curl 7.51.0 (x86_64-pc-linux-gnu) libcurl/7.52.0-DEV OpenSSL/1.0.2j zlib/1.2.8 libidn2/0.11 libssh2/1.8.0
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: Debug TrackMemory IDN IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP UnixSockets

@Hummer12007

To be clear, this still occurs on git master.

@bagder
Member
bagder commented Dec 23, 2016

Can you please clarify what "this" is? What still occurs in git master? I miss a detailed description of what you're doing and what happens when you do that in comparison to what you expect.

@Hummer12007
Hummer12007 commented Dec 23, 2016 edited

What I do:

  1. start mpd
  2. try to play a remote https stream

What I expect to happen (and happens before cb4e2be): it connects and starts playing.

What happens: an error in libcurl occurs: Operation timed out after 10010 milliseconds with 0 out of 0 bytes received. It is triggered at line 1378 of lib/multi.c (the branch if (timeout_ms <0)... if (k->size != -1). It does not start playing.

@bagder
Member
bagder commented Dec 23, 2016

What libcurl options does this use? Which TLS backend is this libcurl using? How can we reproduce this problem - without mpd?

@Hummer12007
Hummer12007 commented Dec 23, 2016 edited

It uses the openssl backend, it seems.
The libcurl options are specified here: https://github.com/MaxKellermann/MPD/blob/master/src/input/plugins/CurlInputPlugin.cxx#L724
I did not reproduce the problem without mpd yet.

The backtrace (if it is relevant): https://paste.pound-python.org/show/U8PF7y4wi20ol6UkWPpY/

@bagder
Member
bagder commented Dec 23, 2016

Can you tell as a public HTTPS URL that you use when you get the problem? Are you using a proxy?

@Hummer12007

https://radio.stew.moe/stream/stream256.opus (though it may rarely be down). I'm not using a proxy.

@bagder
Member
bagder commented Dec 24, 2016

I tried it like 20 times in a row. It usually gives back a 404 really fast, and once it did take 16 seconds to connect. I've not managed to get any hints that say that this is a curl problem and not the server's.

How frequently do you see the problem? Do you see the problem on all HTTPS URLs you try or just this? Can you change the timeout from 10 seconds to something larger, like 30 and see if that also is reached or if it then is likelier to succeed?

@Hummer12007
Hummer12007 commented Dec 24, 2016 edited

Ah, it dies several hours ago, again (the error occured when it was alive, to be clear). Will fix that (it's up now) and try with a larger timeout in the meantime. And I see it on both https streams I've found.

@Hummer12007
Hummer12007 commented Dec 24, 2016 edited

Still getting the timeout even after half an hour a minute.

@bagder
Member
bagder commented Dec 24, 2016

after half an hour

You meant half a minute, right?

So can you invoke curl -o /dev/null -v https://radio.stew.moe/stream/stream256.opus and show us the output when it doesn't work? I don't get it to hang in connect, but I do see very slow response at times when it doesn't seem to send any data for very long.

@Hummer12007
Hummer12007 commented Dec 24, 2016 edited

It didn't fail with the command line client even once for me though (and doesn't after a dozen tries).
The issue is not specific to the server though.

@jetomit
jetomit commented Dec 25, 2016

Running hiperfifo from docs/examples exhibits the same behavior, with every HTTPS server I've tried. I get the following output:

Adding easy 0x20a1150 to multi 0x206c8a0 (https://startpage.com)
multi_timer_cb: Setting timeout to 0 ms
* Rebuilt URL to: https://startpage.com/
*   Trying 37.0.87.15...
* TCP_NODELAY set
Progress: https://startpage.com (0/0)
Progress: https://startpage.com (0/0)
socket callback: s=7 e=0x20a1150 what=OUT Adding data: OUT
multi_timer_cb: Setting timeout to 0 ms
REMAINING: 1
Progress: https://startpage.com (0/0)
multi_timer_cb: Setting timeout to 199 ms
REMAINING: 1
* Connected to startpage.com (37.0.87.15) port 443 (#0)
Progress: https://startpage.com (0/0)
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
Progress: https://startpage.com (0/0)
socket callback: s=7 e=0x20a1150 what=REMOVE 
REMAINING: 1
Progress: https://startpage.com (0/0)
REMAINING: 1

and after that curl waits indefinitely (well, at least 15 minutes). If I add

curl_easy_setopt(conn->easy, CURLOPT_CONNECTTIMEOUT, 10L);

like mpd, then curl aborts the connection after 10 s:

* Operation timed out after 10010 milliseconds with 0 out of 0 bytes received
Progress: https://startpage.com (0/0)
* Closing connection 0
REMAINING: 0
DONE: https://startpage.com/ => (28) Operation timed out after 10010 milliseconds with 0 out of 0 bytes received

Tested on Debian Jessie (using OpenSSL/1.0.1t and zlib/1.2.8) and Gentoo (using OpenSSL/1.0.2j and zlib/1.2.8). The bug appears to be specific to openssl - the gnutls backend works. I tested using the latest git master, and git bisect points to cb4e2be as the first bad commit. The command-line client works in all cases.

@Hummer12007

Hmm, indeed, everything works fine with curl 7.52.1 with gnutls backend. The issue probably lies in the openssl backend.

@bagder
Member
bagder commented Dec 25, 2016

Try this very simple patch:

From 3388920be62ea5ae7b3ca34336683ab7195cc3bb Mon Sep 17 00:00:00 2001
From: Daniel Stenberg <daniel@haxx.se>
Date: Mon, 26 Dec 2016 00:06:33 +0100
Subject: [PATCH] vtls: s/SSLEAY/OPENSSL

Fixed an old leftover use of the USE_SSLEAY define which would make a
socket get removed from the applications sockets to monitor when the
multi_socket API was used, leading to timeouts.
---
 lib/vtls/vtls.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/vtls/vtls.c b/lib/vtls/vtls.c
index b808e1c5f..707f24b02 100644
--- a/lib/vtls/vtls.c
+++ b/lib/vtls/vtls.c
@@ -482,11 +482,11 @@ void Curl_ssl_close_all(struct Curl_easy *data)
   }
 
   curlssl_close_all(data);
 }
 
-#if defined(USE_SSLEAY) || defined(USE_GNUTLS) || defined(USE_SCHANNEL) || \
+#if defined(USE_OPENSSL) || defined(USE_GNUTLS) || defined(USE_SCHANNEL) || \
     defined(USE_DARWINSSL) || defined(USE_NSS)
 /* This function is for OpenSSL, GnuTLS, darwinssl, and schannel only. */
 int Curl_ssl_getsock(struct connectdata *conn, curl_socket_t *socks,
                      int numsocks)
 {
@@ -516,11 +516,11 @@ int Curl_ssl_getsock(struct connectdata *conn,
   (void)conn;
   (void)socks;
   (void)numsocks;
   return GETSOCK_BLANK;
 }
-/* USE_SSLEAY || USE_GNUTLS || USE_SCHANNEL || USE_DARWINSSL || USE_NSS */
+/* USE_OPENSSL || USE_GNUTLS || USE_SCHANNEL || USE_DARWINSSL || USE_NSS */
 #endif
 
 void Curl_ssl_close(struct connectdata *conn, int sockindex)
 {
   DEBUGASSERT((sockindex <= 1) && (sockindex >= -1));
-- 
2.11.0
@bagder bagder added a commit that referenced this issue Dec 25, 2016
@bagder bagder vtls: s/SSLEAY/OPENSSL
Fixed an old leftover use of the USE_SSLEAY define which would make a
socket get removed from the applications sockets to monitor when the
multi_socket API was used, leading to timeouts.

Bug: #1174
a7b38c9
@Hummer12007

The issue's resolved after the commit. Thanks!

@KyleSanderson

@bagder could you please tag a new stable release? I was 'owned' by the stabilization of 7.52.1 by my distro. This breaks libcurl consumers, like mpd mentioned above, and rtorrent.

@bagder
Member
bagder commented Dec 31, 2016

The next release is planned for February 2017. The mistake this commit fixes has been around for many releases. If you want to argue for an earlier release, I'd urge you to take that to the curl-library mailing list.

@mkauf
Contributor
mkauf commented Jan 20, 2017

@bagder :

The mistake this commit fixes has been around for many releases.

That's wrong. The mistake has been introduced with the "SSL proxy" commit cb4e2be . This commit has removed these lines:

#if defined(USE_OPENSSL) || defined(USE_GNUTLS) || defined(USE_SCHANNEL) || \
    defined(USE_DARWINSSL) || defined(USE_POLARSSL) || defined(USE_NSS) || \
    defined(USE_MBEDTLS)
/* This function is for OpenSSL, GnuTLS, darwinssl, schannel and polarssl only.
   It should be made to query the generic SSL layer instead. */

... and added these lines:

#if defined(USE_SSLEAY) || defined(USE_GNUTLS) || defined(USE_SCHANNEL) || \
    defined(USE_DARWINSSL) || defined(USE_NSS)
/* This function is for OpenSSL, GnuTLS, darwinssl, and schannel only. */

So this has introduced bugs for: OpenSSL (fixed with a7b38c9), mbed TLS (fixed with 06b1197 ) and PolarSSL (not fixed yet).

For people that use one of these TLS backends, that's an important regression in curl 7.52.1.

@bagder
Member
bagder commented Jan 20, 2017

Seems correct, yes. I'll bring it back to the working layout and fix PolarSSL. Thanks!

@bagder bagder added a commit that referenced this issue Jan 20, 2017
@bagder bagder vtls: fix PolarSSL non-blocking handling
A regression brought in cb4e2be

Reported-by: Michael Kaufmann
Bug: #1174 (comment)
01ab7c3
@jkralik jkralik added a commit to jkralik/curl that referenced this issue Jan 23, 2017
@bagder @jkralik bagder + jkralik vtls: fix PolarSSL non-blocking handling
A regression brought in cb4e2be

Reported-by: Michael Kaufmann
Bug: curl#1174 (comment)
9f10e79
@edolstra edolstra added a commit to NixOS/nixpkgs that referenced this issue Jan 24, 2017
@edolstra edolstra curl: Apply upstream patch to fix https hangs 7bc801e
@edolstra edolstra added a commit to NixOS/nixpkgs that referenced this issue Jan 24, 2017
@edolstra edolstra curl: Apply upstream patch to fix https hangs
curl/curl#1174

Fixes NixOS/nix#1181.

(cherry picked from commit 7bc801e)
c78022a
@peterpih peterpih pushed a commit to railsnewbie257/curl that referenced this issue Jan 24, 2017
@bagder @railsnewbie257 bagder + railsnewbie257 vtls: fix PolarSSL non-blocking handling
A regression brought in cb4e2be

Reported-by: Michael Kaufmann
Bug: curl#1174 (comment)
cac1dc6
@tassmjau tassmjau added a commit to tassmjau/nixpkgs that referenced this issue Jan 28, 2017
@edolstra @tassmjau edolstra + tassmjau curl: Apply upstream patch to fix https hangs d894fa8
@pschlan
pschlan commented Feb 1, 2017

I'm really glad I found this patch after many hours of debugging my application. Thanks to all people involved in fixing this.

@algitbot algitbot pushed a commit to alpinelinux/aports that referenced this issue Feb 19, 2017
@fabled Przemyslaw Pawelczyk + fabled main/curl: Fix socket removal when multi_socket API is used (timeouts).
It's a fix (curl's a7b38c9) for regression in v7.52.0 (curl's cb4e2be).

curl/curl#1174

Without this patch apps like rtorrent don't work with HTTPS stuff.

(edge commit 2626290)
d42a473
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment