Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of rustls backend in curl. #6350

Closed
wants to merge 1 commit into from
Closed

Conversation

@jsha
Copy link
Contributor

@jsha jsha commented Dec 20, 2020

This adds a new TLS backend, rustls. It uses the C-to-rustls bindings
from https://github.com/abetterinternet/crustls.

Rustls is at https://github.com/ctz/rustls/.

Blog post for context:
https://www.abetterinternet.org/post/memory-safe-curl/

To build:

  • Follow the instructions at https://github.com/abetterinternet/crustls#c-rustls to build and install crustls. You'll need Rust installed, and cbindgen.
  • Run ./configure --with-rustls --with-default-ssl-backend=rustls && make
  • The resulting curl tool uses rustls as the default backend, so you can use it to make various requests.
  • The rustls backend emits a lot of logs with infof, so you can pass -v to see that it's working.
lib/vtls/rustls.c Outdated Show resolved Hide resolved
@gvanem
Copy link
Contributor

@gvanem gvanem commented Dec 20, 2020

Since I think Rust is pretty cool, I tried with a rewrite of your Makefile.Windows to build for Windows using:

cargo build --color never --release --target i686-pc-windows-msvc --lib

Then trying your patch and using target/i686-pc-windows-msvc/release/crustls.lib in libcurl.
Failing to find crustls.h. Where is it or how is generated?

Edit: I figured the above out; your src/lib.h should instead be cbindgen --lang C --output crustls.h.

Then I managed to build with rustls in libcurl.dll. But it bombs at runtime since you use write() and read().
Changing those to swrite() and sread(), it works on Windows too!

@bagder
Copy link
Member

@bagder bagder commented Dec 20, 2020

Yeah, I too would appreciate a complete instruction on how to proceed to build this.

There are CI test failures now for test 1119 because docs/libcurl/symbols-in-versions lacks CURLSSLBACKEND_RUSTLS.

* Write n bytes from buf to the provided fd, retrying short writes until
* we finish or hit an error. Assumes fd is blocking and therefore doesn't
* handle EAGAIN. Returns 0 for success or 1 for error.
*/
Copy link
Member

@bagder bagder Dec 20, 2020

This sounds suspicious. libcurl should always use non-blocking sockets and this kind of loop is not a good sign.

Copy link
Contributor Author

@jsha jsha Dec 20, 2020

You're right that this should assume nonblocking (as the recv function does) and handle EAGAIN. Fixed. A question about nonblocking I/O though: My understanding is that I should loop until either error or EAGAIN. Is that accurate, or should I return at the first short write?

Copy link
Contributor Author

@jsha jsha left a comment

Thanks for the reviews! I've pushed some fixes that I think address them. I've changed bzero->memset, and write/read->swrite/sread.

I also rewrote rustls_send to do the right thing for nonblocking I/O, and rewrote Curl_rustls_connect_nonblocking to take advantage of rustls_send and rustls_recv.

I updated the PR description with notes on how to build, copied here:

To build:

  • Follow the instructions at https://github.com/abetterinternet/crustls#c-rustls to build and install crustls. You'll need Rust installed, and cbindgen.
  • Run ./configure --with-rustls --with-default-ssl-backend=rustls && make
  • The resulting curl tool uses rustls as the default backend, so you can use it to make various requests.
  • The rustls backend emits a lot of logs with infof, so you can pass -v to see that it's working.

Thanks so much for writing that Makefile.windows, @gvanem ! I assume you don't mind if I add that to the crustls repo?

* Write n bytes from buf to the provided fd, retrying short writes until
* we finish or hit an error. Assumes fd is blocking and therefore doesn't
* handle EAGAIN. Returns 0 for success or 1 for error.
*/
Copy link
Contributor Author

@jsha jsha Dec 20, 2020

You're right that this should assume nonblocking (as the recv function does) and handle EAGAIN. Fixed. A question about nonblocking I/O though: My understanding is that I should loop until either error or EAGAIN. Is that accurate, or should I return at the first short write?

@jsha
Copy link
Contributor Author

@jsha jsha commented Dec 20, 2020

It looks like these test cases are failing in AppVeyor:

FTP fetch with --proxy set to http:// and with --connect-to tests/data/test714 7 sec 647 ms
FTP fetch with --preproxy, --proxy and --connect-to tests/data/test715 0 ms
HTTP request, remove handle while resolving, don't block tests/data/test1592

I'll need to set up a Windows environment to diagnose these. If anyone has insight into what aspect of this change might be causing these errors, that would definitely help get me pointed in the right direction.

DEBUGASSERT(tlswritten > 0);
break;
}
perror("writing to socket");
Copy link
Contributor

@gvanem gvanem Dec 21, 2020

Using perror() for sockets on Windows does not print the correct error. Add something like:

static void Curl_perror (const char *what)
{
  char buf[300];
  fprintf (stderr, "%s: %s", what, Curl_strerror(SOCKERRNO, buf, sizeof(buf)));
}

at top. @bagder Should something like this be in strerror.c?

Copy link
Member

@jay jay Dec 21, 2020

It shouldn't be there

@gvanem
Copy link
Contributor

@gvanem gvanem commented Dec 21, 2020

I assume you don't mind if I add that to the crustls repo?

No, go ahead. Here is an updated version Makefile.Windows.
I noted:

  • crustls.h has no header guard.
  • It uses ssize_t which does not exist on MSVC. It could be defined to INT_PTR.

@bagder bagder added the SSL/TLS label Dec 21, 2020
@mback2k
Copy link
Member

@mback2k mback2k commented Dec 21, 2020

It looks like these test cases are failing in AppVeyor:

FTP fetch with --proxy set to http:// and with --connect-to tests/data/test714 7 sec 647 ms
FTP fetch with --preproxy, --proxy and --connect-to tests/data/test715 0 ms
HTTP request, remove handle while resolving, don't block tests/data/test1592

These are most likely unrelated. Unfortunately the testsuite is flaky on the various Windows CI hosts. Once you are satisfied with the PR, we can retry them until they succeed, just to be sure they are really not related.

@bagder
Copy link
Member

@bagder bagder commented Dec 21, 2020

To build this as the only TLS backend (with OpenSSL disabled) I had to do a little patch:

configure.ac

diff --git a/configure.ac b/configure.ac
index 9bb10c5ee..19df8661b 100755
--- a/configure.ac
+++ b/configure.ac
@@ -2917,11 +2917,11 @@ if test -z "$ssl_backends" -o "x$OPT_NSS" != xno; then
   fi dnl NSS not disabled
 
   test -z "$ssl_msg" || ssl_backends="${ssl_backends:+$ssl_backends, }$ssl_msg"
 fi
 
-case "x$OPENSSL_ENABLED$GNUTLS_ENABLED$NSS_ENABLED$MBEDTLS_ENABLED$WOLFSSL_ENABLED$SCHANNEL_ENABLED$SECURETRANSPORT_ENABLED$MESALINK_ENABLED$BEARSSL_ENABLED$AMISSL_ENABLED" in
+case "x$OPENSSL_ENABLED$GNUTLS_ENABLED$NSS_ENABLED$MBEDTLS_ENABLED$WOLFSSL_ENABLED$SCHANNEL_ENABLED$SECURETRANSPORT_ENABLED$MESALINK_ENABLED$BEARSSL_ENABLED$AMISSL_ENABLED$RUSTLS_ENABLED" in
 x)
   AC_MSG_WARN([SSL disabled, you will not be able to use HTTPS, FTPS, NTLM and more.])
   AC_MSG_WARN([Use --with-ssl, --with-gnutls, --with-wolfssl, --with-mbedtls, --with-nss, --with-schannel, --with-secure-transport, --with-mesalink, --with-amissl or --with-bearssl to address this.])
   ;;
 x1)

lib/curl_setup.h

diff --git a/lib/curl_setup.h b/lib/curl_setup.h
index 22def2def..d2134a07e 100644
--- a/lib/curl_setup.h
+++ b/lib/curl_setup.h
@@ -610,11 +610,11 @@ int netware_init(void);
 
 #if defined(USE_GNUTLS) || defined(USE_OPENSSL) || defined(USE_NSS) || \
     defined(USE_MBEDTLS) || \
     defined(USE_WOLFSSL) || defined(USE_SCHANNEL) || \
     defined(USE_SECTRANSP) || defined(USE_GSKIT) || defined(USE_MESALINK) || \
-    defined(USE_BEARSSL)
+    defined(USE_BEARSSL) || defined(USE_RUSTLS)
 #define USE_SSL    /* SSL support has been enabled */
 #endif
 
 /* Single point where USE_SPNEGO definition might be defined */
 #if !defined(CURL_DISABLE_CRYPTO_AUTH) && \

static size_t
Curl_rustls_version(char *buffer, size_t size)
{
return msnprintf(buffer, size, "rustls");
Copy link
Member

@bagder bagder Dec 21, 2020

Is there really no version number? I would strongly urge you to provide one, as this is often key material when debugging issues and tracking down suspicious behavior in user reports.

ssize_t plain_bytes_copied = 0;
int rustls_result = 0;

memset(tlsbuf, 0, sizeof(tlsbuf));
Copy link
Member

@bagder bagder Dec 22, 2020

Isn't it a bit excessive to clear the entire buffer like this every time?

Copy link
Contributor Author

@jsha jsha Dec 22, 2020

Fair, I've removed it. I did profile curl over a big transfer, and this did not seem to be a significant use of CPU time.

I was looking at the docs for read and recv, and while implicitly they only write to the provided buf, they don't seem to explicitly guarantee that they don't read from the provided buf, which would be unsafe if the buf was uninitialized. I'm thinking of these related Rust issues:

rust-lang/rust#42002
rust-lang/rust#42788

@jsha
Copy link
Contributor Author

@jsha jsha commented Dec 23, 2020

I'm working on improving the configurability of this backend (trusted roots, ciphersuites, etc). It looks like the existing backends do all their setup on a per-connection basis. That makes sense, since different connections might have different options. But the rustls ClientConfig says:

Making one of these can be expensive, and should be once per process rather than once per connection.

It seems like the best thing to do would be to cache built config objects based on their inputs. Does that make sense? Are there instances of other TLS backends doing the same thing?

I think for the short term I might just create a ClientConfig per-connection, and treat the cache as a later optimization.

@bagder
Copy link
Member

@bagder bagder commented Dec 23, 2020

It looks like the existing backends do all their setup on a per-connection basis. That makes sense, since different connections might have different options.

Right. The configs is set per-transfer in the libcurl API, but the connection is then setup and verified using a specific set of TLS parameters. We then make sure that this connection can only be reused for subsequent transfers if the transfers' TLS config matches that connection's.

It seems like the best thing to do would be to cache built config objects based on their inputs. Does that make sense? Are there instances of other TLS backends doing the same thing?

This is a very long-standing optimization detail that we've never gotten around to providing. It would make sense to have some sort of caching, at least for cases when the exact same config is used again - which is probably what 99% of all transfers do.

I think for the short term I might just create a ClientConfig per-connection, and treat the cache as a later optimization.

Makes perfect sense to me!

@jsha
Copy link
Contributor Author

@jsha jsha commented Jan 8, 2021

Alright, I've added ClientConfig per-session, and I've made configuration of CAfile work. I think this is ready for another round of reviews.

@bagder
Copy link
Member

@bagder bagder commented Jan 12, 2021

After the crustls update, I needed this configure.ac patch to make curl's configure find it:

diff --git a/configure.ac b/configure.ac
index 303274716..50b17b001 100755
--- a/configure.ac
+++ b/configure.ac
@@ -2720,11 +2720,11 @@ if test -z "$ssl_backends" -o "x$OPT_RUSTLS" != xno; then
     fi
 
     if test -z "$OPT_RUSTLS" ; then
       dnl check for lib first without setting any new path
 
-      AC_CHECK_LIB(crustls, rustls_client_config_new,
+      AC_CHECK_LIB(crustls, rustls_client_session_read,
       dnl libcrustls found, set the variable
        [
          AC_DEFINE(USE_RUSTLS, 1, [if rustls is enabled])
          AC_SUBST(USE_RUSTLS, [1])
          RUSTLS_ENABLED=1
@@ -2748,11 +2748,11 @@ if test -z "$ssl_backends" -o "x$OPT_RUSTLS" != xno; then
       LDFLAGS="$LDFLAGS $addld"
       if test "$addcflags" != "-I/usr/include"; then
          CPPFLAGS="$CPPFLAGS $addcflags"
       fi
 
-      AC_CHECK_LIB(crustls, rustls_client_config_new,
+      AC_CHECK_LIB(crustls, rustls_client_session_read,
        [
        AC_DEFINE(USE_RUSTLS, 1, [if rustls is enabled])
        AC_SUBST(USE_RUSTLS, [1])
        RUSTLS_ENABLED=1
        USE_RUSTLS="yes"

@bagder
Copy link
Member

@bagder bagder commented Jan 12, 2021

When I build this with the current master crustls, test 300 fails and leaks memory:

$ ./src/curl -V
curl 7.75.0-DEV (x86_64-pc-linux-gnu) libcurl/7.75.0-DEV crustls/0.2.0/rustls/0.19.0
zlib/1.2.11 brotli/1.0.9 zstd/1.4.8 c-ares/1.17.1 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.3.0)
libssh2/1.9.0_DEV nghttp2/1.42.0 librtmp/2.3
$ cd tests
$ make -sj7
$ ./runtests.pl 300
...

 300: protocol FAILED!
 There was no content at all in the file log/server.input.
 Server glitch? Total curl failure? Returned: 7

 - abort tests
TESTDONE: 0 tests out of 1 reported OK: 0%

TESTFAIL: These test cases failed: 300 

TESTDONE: 1 tests were considered during 4 seconds.

After the test, check tests/log/valgrind300 as it contains a leak:

$ cat log/valgrind300 
==2740594== 68,891 (216 direct, 68,675 indirect) bytes in 1 blocks are definitely lost in loss record 16 of 16
==2740594==    at 0x483877F: malloc (vg_replace_malloc.c:307)
==2740594==    by 0x7475BB: alloc::alloc::alloc (alloc.rs:74)
==2740594==    by 0x747679: alloc::alloc::Global::alloc_impl (alloc.rs:153)
==2740594==    by 0x7477D9: <alloc::alloc::Global as core::alloc::AllocRef>::alloc (alloc.rs:212)
==2740594==    by 0x74751C: alloc::alloc::exchange_malloc (alloc.rs:302)
==2740594==    by 0x74869D: alloc::sync::Arc<T>::new (sync.rs:314)
==2740594==    by 0x28BE18: rustls_client_config_builder_build (lib.rs:93)
==2740594==    by 0x20454E: Curl_rustls_connect_nonblocking (rustls.c:312)
==2740594==    by 0x201B14: Curl_ssl_connect_nonblocking (vtls.c:338)
==2740594==    by 0x22A111: https_connecting (http.c:1532)
==2740594==    by 0x229F35: Curl_http_connect (http.c:1459)
==2740594==    by 0x1D6257: protocol_connect (multi.c:1531)
==2740594==    by 0x1D6E0A: multi_runsingle (multi.c:1866)
==2740594==    by 0x1D8196: curl_multi_perform (multi.c:2431)
==2740594==    by 0x1C3903: easy_transfer (easy.c:606)
==2740594==    by 0x1C3AE6: easy_perform (easy.c:696)
==2740594== 

static const struct rustls_client_config *client_config = NULL;

static int
Curl_rustls_init(void)
Copy link
Member

@bagder bagder Jan 12, 2021

Just noticed! We use the Curl_ prefix for symbols that are global within the library, and since this is static it should not use Curl_ (it goes for a few other functions too). Most TLS backends just go with a fixed private prefix. I would probably not recommend using rustls_ since that's the prefix the actual TLS library use.

Copy link
Contributor Author

@jsha jsha Jan 13, 2021

It seems to me that local style is to use a Curl_<backend>_ prefix for the functions that wind up as fields of the Curl_ssl struct:

$ grep Curl_.*version, lib/vtls/*.c 
lib/vtls/bearssl.c:  Curl_bearssl_version,
lib/vtls/gskit.c:  Curl_gskit_version,             /* version */
lib/vtls/gtls.c:  Curl_gtls_version,             /* version */
lib/vtls/mbedtls.c:  Curl_mbedtls_version,             /* version */
lib/vtls/mesalink.c:  Curl_mesalink_version, /* version */
lib/vtls/nss.c:  Curl_nss_version,             /* version */
lib/vtls/openssl.c:  Curl_ossl_version,             /* version */
lib/vtls/schannel.c:  Curl_schannel_version,             /* version */
lib/vtls/sectransp.c:  Curl_sectransp_version,             /* version */
lib/vtls/vtls.c:  Curl_multissl_version,             /* version */
lib/vtls/wolfssl.c:  Curl_wolfssl_version,             /* version */

(each of those functions is static to the relevant file). Am I missing something here?

Copy link
Member

@bagder bagder Jan 13, 2021

See #6443 ...

@jsha
Copy link
Contributor Author

@jsha jsha commented Jan 13, 2021

The memory leak came from failure to free the client_config object if creation of a client session failed. I've fixed that in 37fdf9e.

The failure of test case 300 is because the hostname is 127.0.0.1. rustls doesn't support certificates for IP addresses, only for hostnames like localhost. I'll add a specific error result for IP addresses, since I think this is likely to come up a lot.

@bagder
Copy link
Member

@bagder bagder commented Jan 13, 2021

rustls doesn't support certificates for IP addresses, only for hostnames like localhost

Gosh. And is that a design decision or will this be fixed soon? It seems like a rather big blow to users. IP-address certs might be slightly unusual but they've always been specified and functional and works with all other curl TLS backends...

@stephanbuys
Copy link

@stephanbuys stephanbuys commented Jan 13, 2021

@badger we were looking at sponsoring the work briansmith/webpki#54 but we just don't have the budget to do the funding ourselves at the moment. @briansmith just needs some support from the community to get this over the line.

@bagder
Copy link
Member

@bagder bagder commented Jan 13, 2021

I'm surprised because it's A) not exactly complicated and B) been in the spec since forever C) it is a feature used publicly.

@jsha
Copy link
Contributor Author

@jsha jsha commented Jan 13, 2021

It was a design decision, but has been changed and now needs implementation for the change: briansmith/webpki#54 (comment).

@briansmith
Copy link

@briansmith briansmith commented Jan 14, 2021

I'm surprised because it's A) not exactly complicated and B) been in the spec since forever C) it is a feature used publicly.

At the time the webpki crate was first created, there were not good policies in place for people to get IP address certificates from CAs. That's improved in the interim. At the time, it was something that was cut in the interest of getting the first version done.

There are probably going to be other areas where webpki is stricter or more limited than what you're used to. Often this affects test suites, and often test suites can be changed. For example, some projects changed their test suites so that all their certificates were in a chain of the form "Root -> Intermediate -> End Entity" (Web PKI standards require an intermediate) so that webpki will accept them, except for the tests that were explicitly testing support for chains of the form "Root -> End Entity" or just "End Entity". Similarly, webpki doesn't support the use of the "Common Name" field to specify the domain name or IP address; instead it only uses subjectAltName. Lots of people had to change their test certificates to move the domain name into subjectAltName, except for the tests that were specifically testing support for the invalid form.

My priority right now is to improve the quality of webpki's support for things that are consistent with Web PKI standards. Definitely adding support for IP address in subjectAltName is one of those things.

As you work on your Rustls integration, please keep track of what issues you run into regarding webpki. If webpki is accepting a certificate that it shouldn't accept, that's a very high priority issue for me to fix; please file issues when you encounter this. If it rejects a certificate that is valid and that conforms to Web PKI standards, that's a medium-priority issue for the webpki project; again, please file a separate issue for each one, after checking that there is not already an issue on file. As far as extensions to the Web PKI standards are concerned, those are low priority for the webpki project but they won't be dismissed without consideration.

@bagder
Copy link
Member

@bagder bagder commented Jan 14, 2021

The curl test suite is not very thorough when it comes to TLS/cert testing, we mostly assume that the TLS libraries have their own test suites for that. However, once we merge this work into master it should be easy enough for a wider crowd of users to build and run curl+rustls builds against whatever they like and test it further that way.

@bagder
Copy link
Member

@bagder bagder commented Jan 14, 2021

For fun, I did a build with Hyper and rustls (all from master branches)

$ ./src/curl -V
curl 7.75.0-DEV (x86_64-pc-linux-gnu) libcurl/7.75.0-DEV crustls/0.2.0/rustls/0.19.0
zlib/1.2.11 brotli/1.0.9 zstd/1.4.8 c-ares/1.17.1 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.3.0)
libssh2/1.9.0_DEV nghttp2/1.42.0 librtmp/2.3 Hyper/0.14.2
elease-Date: [unreleased]
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3
pop3s rtmp scp sftp smtp smtps telnet tftp 
Features: alt-svc AsynchDNS brotli Debug GSS-API HSTS HTTP2 IDN IPv6 Kerberos
Largefile libz PSL SPNEGO SSL TrackMemory UnixSockets zstd

... and I ran into this:

$ RUST_BACKTRACE=1 ./src/curl https://google.com/ -L
thread '<unnamed>' panicked at 'env_logger::init should not be called after logger initialized: SetLoggerError(())', /home/daniel/.cargo/registry/src/github.com-1ecc6299db9ec823/env_logger-0.8.2/src/lib.rs:1096:16
stack backtrace:
   0: rust_begin_unwind
             at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:483
   1: core::panicking::panic_fmt
             at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/panicking.rs:85
   2: core::option::expect_none_failed
             at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/option.rs:1234
   3: core::result::Result<T,E>::expect
             at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/result.rs:933
   4: env_logger::init
             at /home/daniel/.cargo/registry/src/github.com-1ecc6299db9ec823/env_logger-0.8.2/src/lib.rs:1096
   5: rustls_client_config_builder_new
             at /home/daniel/src/crustls/src/lib.rs:81
   6: cr_connect_nonblocking
             at ./lib/vtls/rustls.c:292
   7: Curl_ssl_connect_nonblocking
             at ./lib/vtls/vtls.c:338
   8: https_connecting
             at ./lib/http.c:1532
   9: Curl_http_connect
             at ./lib/http.c:1459
  10: protocol_connect
             at ./lib/multi.c:1517
  11: multi_runsingle
             at ./lib/multi.c:1852
  12: curl_multi_perform
             at ./lib/multi.c:2417
  13: easy_transfer
             at ./lib/easy.c:606
  14: easy_perform
             at ./lib/easy.c:696
  15: curl_easy_perform
             at ./lib/easy.c:715
  16: serial_transfers
             at ./src/tool_operate.c:2326
  17: run_all_transfers
             at ./src/tool_operate.c:2504
  18: operate
             at ./src/tool_operate.c:2620
  19: main
             at ./src/tool_main.c:277
  20: __libc_start_main
  21: _start
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 5
Aborted

@jsha
Copy link
Contributor Author

@jsha jsha commented Jan 14, 2021

Rust has a common logging system in the log crate. Programs can emit logs by registering various output libraries (but they must be registered only once). I added env_logger during debugging, but need to replace it with a mechanism for curl to register its interest in log lines to send through its own logging system. Posted about some ideas here: https://users.rust-lang.org/t/logging-for-a-c-library-wrapper/53833

@sagebind
Copy link

@sagebind sagebind commented Jan 14, 2021

That doesn't explain the panic though. Perhaps Hyper is also registering a global logger via the tracing crate for some reason?

@bagder
Copy link
Member

@bagder bagder commented Jan 15, 2021

I retried the command line with a build without Hyper and I got the same panic then as well.

@jsha
Copy link
Contributor Author

@jsha jsha commented Jan 15, 2021

This shows up specifically with -L because the redirection causes a new connection. The call to env_logger::init() was in rustls_client_config_builder_new(). As of 3e822bd that's called on each new connection. The second time rustls_client_config_builder_new(), we get a panic due to trying to register a log handler twice.

I've removed env_logger (rustls/rustls-ffi#22), and confirmed that it fixes the panic. This is also a reminder that I need to add across-the-board panic handling and return errors, since panicking across an FFI boundary is undefined (https://blog.rust-lang.org/inside-rust/2020/02/27/ffi-unwind-design-meeting.html).

Separately I'll work on a better facility to share logs from rustls to curl.

@bagder
Copy link
Member

@bagder bagder commented Jan 15, 2021

need to add across-the-board panic handling and return errors

Panics can never be the right choice for a library. If something goes wrong you need to return a proper error so that the user of the library can decide the next step. It is never the library's choice to abort.

@bagder
Copy link
Member

@bagder bagder commented Jan 30, 2021

There's something wrong in the socket waiting/handling that makes it very slow. Try downloading from curl.se or even curl.haxx.se:

plain debian curl with openssl

$ time curl https://curl.haxx.se/ -v
...
real    0m0.042s
user    0m0.016s
sys     0m0.010s

git master with crustls

$ time ./src/curl https://curl.haxx.se/ -v
...
real    0m2.228s
user    0m0.031s
sys     0m0.008s

On repeated invokes it seems to vary between 1 and 2.3 seconds, while the openssl build does it in 33-45 milliseconds.

@jsha
Copy link
Contributor Author

@jsha jsha commented Jan 31, 2021

Ah, interesting. I can reproduce locally too. Adding timestamps with ts:

$ time ./src/curl -s https://curl.haxx.se/ -v 2>&1 | ts %.S
45.738743 * STATE: INIT => CONNECT handle 0x55906f065ba8; line 1654 (connection #-5000)
45.738884 * Added connection 0. The cache now contains 1 members
45.742016 * family0 == v6, family1 == v4
45.742125 *   Trying 2a04:4e42::561:443...
45.742159 * STATE: CONNECT => WAITCONNECT handle 0x55906f065ba8; line 1715 (connection #0)
45.763029 * Connected to curl.haxx.se (2a04:4e42::561) port 443 (#0)
45.763156 * STATE: WAITCONNECT => SENDPROTOCONNECT handle 0x55906f065ba8; line 1845 (connection #0)
45.763199 * Marked for [keep alive]: HTTP default
45.804799 * ClientSession wants us to write_tls.
45.804900 * ClientSession wants us to read_tls.
45.804927 * sread: EAGAIN or EWOULDBLOCK
45.804952 * STATE: SENDPROTOCONNECT => PROTOCONNECT handle 0x55906f065ba8; line 1866 (connection #0)
45.943440 * ClientSession wants us to read_tls.
46.948923 * ClientSession wants us to write_tls.
46.949101 * ClientSession wants us to read_tls.
46.949156 * sread: EAGAIN or EWOULDBLOCK
47.950510 * ClientSession wants us to read_tls.
47.951053 * Done handshaking
47.951164 * STATE: PROTOCONNECT => DO handle 0x55906f065ba8; line 1885 (connection #0)
47.951361 > GET / HTTP/1.1
47.951480 > Host: curl.haxx.se
47.951538 > User-Agent: curl/7.75.0-DEV
47.951585 > Accept: */*
47.951618 >
47.951655 * STATE: DO => DO_DONE handle 0x55906f065ba8; line 1940 (connection #0)
47.951705 * STATE: DO_DONE => PERFORM handle 0x55906f065ba8; line 2061 (connection #0)
47.985677 * EOF from rustls_client_session_read
47.985804 * Mark bundle as not supporting multiuse
47.985843 * HTTP 1.1 or later with persistent connection
47.985888 < HTTP/1.1 301 Moved Permanently
47.985927 < Connection: keep-alive
47.985964 < Content-Length: 286
47.986000 < Server: nginx/1.17.6
47.986037 < Content-Type: text/html; charset=iso-8859-1
47.986086 < X-Frame-Options: SAMEORIGIN
47.986122 < Location: https://curl.se/
47.986150 < Cache-Control: max-age=60
47.986175 < Expires: Sun, 31 Jan 2021 08:33:04 GMT
47.986208 < Strict-Transport-Security: max-age=31536000
47.986249 < Via: 1.1 varnish, 1.1 varnish
47.986289 < Accept-Ranges: bytes
47.986333 < Date: Sun, 31 Jan 2021 08:33:48 GMT
47.986371 < Age: 103
47.986409 < X-Served-By: cache-bma1647-BMA, cache-sea4459-SEA
47.986458 < X-Cache: MISS, HIT
47.986492 < X-Cache-Hits: 0, 1
47.986551 < X-Timer: S1612082028.009374,VS0,VE0
47.986596 <
47.986662 { [286 bytes data]
47.986700 * rustls_data_pending 0
47.986731 * STATE: PERFORM => DONE handle 0x55906f065ba8; line 2259 (connection #0)
47.986774 * multi_done
47.986851 * Connection #0 to host curl.haxx.se left intact
47.986901 * Expire cleared (transfer 0x55906f065ba8)
47.986935 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
47.986962 <html><head>
47.986996 <title>301 Moved Permanently</title>
47.987041 </head><body>
47.987084 <h1>Moved Permanently</h1>
47.987122 <p>The document has moved <a href="https://curl.se/">here</a>.</p>
47.987160 <hr>
47.987195 <address>Apache Server at curl.haxx.se Port 80</address>
47.987247 </body></html>

real    0m2.316s
user    0m0.141s
sys     0m0.028s

We can see two 1-second gaps:

45.943440 * ClientSession wants us to read_tls.
46.948923 * ClientSession wants us to write_tls.
46.949156 * sread: EAGAIN or EWOULDBLOCK
47.950510 * ClientSession wants us to read_tls.

Both of these are happening in cr_connect_nonblocking. I believe the issue is twofold:

  • cr_connect_nonblocking is not driving the handshake forward as far as it can during each call, but instead is doing at most one read and one write before returning. The calling code then sleeps a second before calling it again.
  • When cr_connect_nonblocking gets CURLE_AGAIN from cr_recv, it doesn't return that onwards to the caller.

I could use some guidance here: Is it definitely the case that cr_connect_nonblocking should return CURLE_AGAIN if it got EAGAIN or EWOULDBLOCK from a syscall? I seem to recall looking at some calling code that seemed to expect CURLE_OK with *done == false in that case, but it's entirely possible I misunderstood.

@jsha
Copy link
Contributor Author

@jsha jsha commented Feb 1, 2021

I've pushed a commit (f98c23d) that fixes one of the 1-second sleeps. However, there's still another 1-second sleep remaining. It seems we get to a point where we need to read from the socket, and reading would block, so we return CURLE_OK with *done = false. Then the next call to curl_multi_poll appears to wait for its full 1000ms timeout. Not sure why that is. I'll take a deeper look tomorrow, but I'd definitely welcome ideas.

@curl curl deleted a comment Feb 1, 2021
@jsha
Copy link
Contributor Author

@jsha jsha commented Feb 2, 2021

Alright, figured it out! I needed an implementation of Curl_ssl_getsock. Most of the other SSL implementations all use the same implementation, which relies on connssl->connecting_state. rustls.c doesn't use (or update) the connecting_state state machine.

As a quick fix, I added another clause to #ifdef for Curl_ssl_getsock, and a non-static function in rustls.c. But I think Curl_ssl_getsock should actually become a field of the Curl_ssl struct, with a default implementation and a none implementation. I'll send a separate PR for that.

@bagder
Copy link
Member

@bagder bagder commented Feb 2, 2021

I think Curl_ssl_getsock should actually become a field of the Curl_ssl struct, with a default implementation and a none implementation. I'll send a separate PR for that.

Sounds like an excellent idea!

@bagder
Copy link
Member

@bagder bagder commented Feb 8, 2021

With #6558 merged, this just needs a little adjustment and then I think we're getting ready to merged this one too... Right?

@jsha
Copy link
Contributor Author

@jsha jsha commented Feb 8, 2021

With #6558 merged, this just needs a little adjustment and then I think we're getting ready to merged this one too... Right?

Yes, I think this is just about ready to merge! There are still some config settings that are ignored by this integration, like ciphersuite selection, TLS version selection, and disabling peer certificate validation, but I hope to add those to crustls and then send a followup PR to curl for them.

One thing I'm slightly stuck on: when I run tests locally, 2100 consistently fails. If I build with openssl and no rustls, 2100 passes locally. This is a DOH test:

# This is the DOH response for foo.example.com A 127.0.0.1. This requires that
# the test server is accessible at that address!

Here's the diff output after ./runtests.pl 2100. Note that the actual check-expected and check-generated files are binary, so diff refused to show them. In order to get useful diff output, I replaced all the non-ASCII bytes with their hex versions (00, 01, 03, 07).

$ diff -C 10 log/check-expected log/check-generated 
*** log/check-expected  2021-02-08 11:48:26.485293628 -0800
--- log/check-generated 2021-02-08 11:48:28.541238000 -0800
***************
*** 1,11 ****
  POST /21000001 HTTP/1.1[CR][LF]
  Host: 127.0.0.1:41851[CR][LF]
  Accept: */*[CR][LF]
  Content-Type: application/dns-message[CR][LF]
  Content-Length: 33[CR][LF]
  [CR][LF]
! 00000100000100000000000003foo07example03com0000010001GET /2100 HTTP/1.1[CR][LF]
! Host: foo.example.com:41851[CR][LF]
! User-Agent: curl/7.75.1-DEV[CR][LF]
! Accept: */*[CR][LF]
! [CR][LF]
--- 1,7 ----
  POST /21000001 HTTP/1.1[CR][LF]
  Host: 127.0.0.1:41851[CR][LF]
  Accept: */*[CR][LF]
  Content-Type: application/dns-message[CR][LF]
  Content-Length: 33[CR][LF]
  [CR][LF]
! 00000100000100000000000003foo07example03com0000010001

@bagder
Copy link
Member

@bagder bagder commented Feb 8, 2021

I suspect that test is flaky. I cannot reproduce the rustls failure right now with your code and the lastest crustls on my Debian deb box! 😢 I can rerun the test 50 times in a row just fine. With and without valgrind.

I noticed you merged master into your branch. We rather see you rebase your commits (and squash them suitably and then force-push) so that we can merge them without merge commits and maintain a linear history.

Since this backend is opt-in and totally new, I won't mind if we merge it early to make it easier to continue working on it with additional PRs.

@jsha
Copy link
Contributor Author

@jsha jsha commented Feb 8, 2021

Sounds good. I am happy to rebase. On my own repos, I usually squash a whole PR into a single commit. Is that okay here, or would you rather retain a series of commits that reflect the evolution of this branch during review?

@jsha
Copy link
Contributor Author

@jsha jsha commented Feb 9, 2021

Test 2100 started passing for me again, which leans towards your idea of a flaky test. I did run it under gdb once and saw that it was going to do a null pointer dereference in Curl_connecthost because Curl_dns_entry *remotehost was null. I've included the log below under a details tag in case it's useful. If I can reproduce a few more times I'll file a separate issue.

``` ~/curl/tests $ ./runtests.pl 2100 -g ********* System characteristics ******** * curl 7.75.1-DEV (x86_64-pc-linux-gnu) * libcurl/7.75.1-DEV OpenSSL/1.1.1f (crustls/0.2.0/rustls/0.19.0) zlib/1.2.11 * Features: alt-svc Debug HTTPS-proxy IPv6 Largefile libz MultiSSL NTLM NTLM_WB SSL TLS-SRP TrackMemory UnixSockets * Disabled: * Host: carapace * System: Linux carapace 5.8.0-41-generic #46~20.04.1-Ubuntu SMP Mon Jan 18 17:52:23 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux * OS: linux * Servers: SSL HTTP-IPv6 HTTP-unix FTP-IPv6 * Env: Valgrind Libtool * Seed: 257452 ***************************************** break cotest 2100...[HTTP GET using DOH] nnecGNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: . Find the GDB manual and other documentation resources online at: .

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/jsha/curl/src/.libs/curl...
Argument list to give program being debugged when it is started is "--output log/curl2100.out --include --trace-ascii log/trace2100 --trace-time http://foo.example.com:38569/2100 -4 --doh-url http://127.0.0.1:38569/21000001".
(gdb) break connect.c:1360
No source file named connect.c.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (connect.c:1360) pending.
(gdb) run
Starting program: /home/jsha/curl/src/.libs/curl --output log/curl2100.out --include --trace-ascii log/trace2100 --trace-time http://foo.example.com:38569/2100 -4 --doh-url http://127.0.0.1:38569/21000001
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /home/jsha/curl/lib/.libs/libcurl.so.4.
Use `info auto-load python-scripts [REGEXP]' to list them.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* STATE: INIT => CONNECT handle 0x5555555f6158; line 1646 (connection #-5000)

  • Added connection 0. The cache now contains 1 members

Breakpoint 1, Curl_connecthost (data=0x5555555f6158, conn=0x5555555fb648, remotehost=0x5555555d4048) at connect.c:1360
1360 conn->num_addr = Curl_num_addresses(remotehost->addr);
(gdb) bt
#0 Curl_connecthost (data=0x5555555f6158, conn=0x5555555fb648, remotehost=0x5555555d4048) at connect.c:1360
#1 0x00007ffff77df6d3 in Curl_setup_conn (data=0x5555555f6158, protocol_done=0x7fffffffd891) at url.c:3992
#2 0x00007ffff77df854 in Curl_connect (data=0x5555555f6158, asyncp=0x7fffffffd890, protocol_done=0x7fffffffd891) at url.c:4037
#3 0x00007ffff77a9ea8 in multi_runsingle (multi=0x5555555d62f8, nowp=0x7fffffffd960, data=0x5555555f6158) at multi.c:1671
#4 0x00007ffff77ab7b0 in curl_multi_perform (multi=0x5555555d62f8, running_handles=0x7fffffffda50) at multi.c:2412
#5 0x00007ffff7777be3 in easy_transfer (multi=0x5555555d62f8) at easy.c:606
#6 0x00007ffff7777e5d in easy_perform (data=0x5555555d9d08, events=false) at easy.c:696
#7 0x00007ffff7777ec8 in curl_easy_perform (data=0x5555555d9d08) at easy.c:715
#8 0x000055555557abb9 in serial_transfers (global=0x7fffffffdca0, share=0x5555555d3e58) at tool_operate.c:2326
#9 0x000055555557b099 in run_all_transfers (global=0x7fffffffdca0, share=0x5555555d3e58, result=CURLE_OK) at tool_operate.c:2504
#10 0x000055555557b3d9 in operate (global=0x7fffffffdca0, argc=11, argv=0x7fffffffde18) at tool_operate.c:2620
#11 0x0000555555570991 in main (argc=11, argv=0x7fffffffde18) at tool_main.c:277
(gdb) p remotehost
$1 = (const struct Curl_dns_entry *) 0x5555555d4048
(gdb) p remotehost->addr
$2 = (struct Curl_addrinfo *) 0x5555555d3fd8
(gdb) p conn
$3 = (struct connectdata *) 0x5555555fb648
(gdb) next
1361 conn->tempaddr[0] = conn->tempaddr[1] = remotehost->addr;
(gdb) continue
Continuing.

  • family0 == v4, family1 == v6
  • Trying 127.0.0.1:38569...
  • STATE: CONNECT => WAITCONNECT handle 0x5555555f6158; line 1707 (connection #0)
  • Connected to 127.0.0.1 (127.0.0.1) port 38569 (#0)
  • STATE: WAITCONNECT => SENDPROTOCONNECT handle 0x5555555f6158; line 1837 (connection #0)
  • STATE: SENDPROTOCONNECT => DO handle 0x5555555f6158; line 1861 (connection #0)

POST /21000001 HTTP/1.1
Host: 127.0.0.1:38569
Accept: /
Content-Type: application/dns-message
Content-Length: 33

  • STATE: DO => DO_DONE handle 0x5555555f6158; line 1935 (connection #0)
  • STATE: DO_DONE => PERFORM handle 0x5555555f6158; line 2056 (connection #0)
    0 0 0 0 0 0 0 0 --:--:-- 0:00:16 --:--:-- 0* Mark bundle as not supporting multiuse
  • HTTP 1.1 or later with persistent connection
    < HTTP/1.1 200 OK
    < Date: Thu, 09 Nov 2010 14:49:00 GMT
    < Server: test-server/fake
    < Connection: close
    < Content-Type: application/dns-message
    < Content-Length: 49
    <
  • STATE: PERFORM => DONE handle 0x5555555f6158; line 2254 (connection #0)
  • multi_done
  • The cache now contains 0 members
  • Closing connection 0
  • Expire cleared (transfer 0x5555555f6158)

Breakpoint 1, Curl_connecthost (data=0x5555555d9d08, conn=0x5555555f56c8, remotehost=0x0) at connect.c:1360
1360 conn->num_addr = Curl_num_addresses(remotehost->addr);
(gdb) p remotehost
$4 = (const struct Curl_dns_entry *) 0x0
(gdb) bt
#0 Curl_connecthost (data=0x5555555d9d08, conn=0x5555555f56c8, remotehost=0x0) at connect.c:1360
#1 0x00007ffff77df6d3 in Curl_setup_conn (data=0x5555555d9d08, protocol_done=0x7fffffffd891) at url.c:3992
#2 0x00007ffff778c33f in Curl_once_resolved (data=0x5555555d9d08, protocol_done=0x7fffffffd891) at hostip.c:1118
#3 0x00007ffff77aa14b in multi_runsingle (multi=0x5555555d62f8, nowp=0x7fffffffd960, data=0x5555555d9d08) at multi.c:1757
#4 0x00007ffff77ab7b0 in curl_multi_perform (multi=0x5555555d62f8, running_handles=0x7fffffffda50) at multi.c:2412
#5 0x00007ffff7777be3 in easy_transfer (multi=0x5555555d62f8) at easy.c:606
#6 0x00007ffff7777e5d in easy_perform (data=0x5555555d9d08, events=false) at easy.c:696
#7 0x00007ffff7777ec8 in curl_easy_perform (data=0x5555555d9d08) at easy.c:715
#8 0x000055555557abb9 in serial_transfers (global=0x7fffffffdca0, share=0x5555555d3e58) at tool_operate.c:2326
#9 0x000055555557b099 in run_all_transfers (global=0x7fffffffdca0, share=0x5555555d3e58, result=CURLE_OK) at tool_operate.c:2504
#10 0x000055555557b3d9 in operate (global=0x7fffffffdca0, argc=11, argv=0x7fffffffde18) at tool_operate.c:2620
#11 0x0000555555570991 in main (argc=11, argv=0x7fffffffde18) at tool_main.c:277
(gdb) up
#1 0x00007ffff77df6d3 in Curl_setup_conn (data=0x5555555d9d08, protocol_done=0x7fffffffd891) at url.c:3992
3992 result = Curl_connecthost(data, conn, conn->dns_entry);
(gdb) p conn
$5 = (struct connectdata *) 0x5555555f56c8
(gdb) p conn->dns_entry
$6 = (struct Curl_dns_entry *) 0x0
(gdb) quit
A debugging session is active.

    Inferior 1 [process 782683] will be killed.

Quit anyway? (y or n) y

</details>

This adds a new TLS backend, rustls. It uses the C-to-rustls bindings
from https://github.com/abetterinternet/crustls.

Rustls is at https://github.com/ctz/rustls/.

There is still a fair bit to be done, like sending CloseNotify on
connection shutdown, respecting CAPATH, and properly indicating features
like "supports TLS 1.3 ciphersuites." But it works well enough to make
requests and receive responses.

Blog post for context:
https://www.abetterinternet.org/post/memory-safe-curl/
@jsha
Copy link
Contributor Author

@jsha jsha commented Feb 9, 2021

Rebased and (IMO) ready to merge. During a last round of testing I found and fixed a couple of memory leaks in error paths, and added a better error message for failure to create a session. I also realized I had missed a couple of calls to Curl_ssl_getsock in #6558. You can see the diffs in 4179224. I'm happy to pull those into their own PR if you'd like.

@bagder bagder closed this in 246399a Feb 9, 2021
@bagder
Copy link
Member

@bagder bagder commented Feb 9, 2021

Thanks for your work @jsha. This is now merged and can be experimented with easier by more people and we can take follow-up PRs against master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

8 participants