Core issue
The Windows SSPI (Schannel) engine (https://github.com/curl/curl/blob/master/lib/vtls/schannel.c) does not manage the lifetime of curl_schannel_cred objects properly:
- refcount is modified non-atomically and outside the CURL_LOCK_DATA_SSL_SESSION scope.
- refcount is modified inconsistently: Curl_ssl_{get,add}sessionid() in https://github.com/curl/curl/blob/master/lib/vtls/vtls.c don't increment refcount immediately, although semantically they should.
This leads to sporadic memory-related crashes (use of freed memory, double frees, etc.) when downloading https:// URLs concurrently in separate threads.
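Both refcount problems can be replayed deterministically in a single thread by writing out each interleaving step by step. This is an illustrative sketch only: struct fake_cred, evict(), replay_lost_update() and replay_late_retain() are hypothetical names, not curl code. A freed flag stands in for the real free() so the replay can detect misuse instead of crashing, and the eviction rule (free only if no connection holds a reference) is an assumed simplification of the schannel.c logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for curl_schannel_cred with only the fields that
   matter here. "freed" marks the point where real code would free the
   object, so the replays can detect misuse without crashing. */
struct fake_cred {
  int refcount;   /* connections currently using the credential */
  bool cached;    /* still owned by the session cache */
  bool freed;     /* set where real code would release the object */
};

/* Assumed eviction rule: drop the cache's ownership and free the object
   if no connection holds a reference. */
static void evict(struct fake_cred *c)
{
  assert(!c->freed);            /* evicting twice would be a double free */
  c->cached = false;
  if(c->refcount == 0)
    c->freed = true;
}

/* Replay 1: "refcount++" done non-atomically by two threads, split into
   its load and store steps and interleaved by hand. */
static int replay_lost_update(void)
{
  struct fake_cred c = { 1, true, false };  /* one connection holds it */
  int t1_loaded = c.refcount;               /* thread 1 loads 1 */
  int t2_loaded = c.refcount;               /* thread 2 loads 1 */
  c.refcount = t1_loaded + 1;               /* thread 1 stores 2 */
  c.refcount = t2_loaded + 1;               /* thread 2 stores 2: update lost */
  return c.refcount;                        /* three users, count says two */
}

/* Replay 2: the lookup hands out the pointer without taking a reference,
   and the increment only happens after the session lock is released.
   Returns true if thread 1 touches an object that was already released. */
static bool replay_late_retain(void)
{
  struct fake_cred c = { 0, true, false };  /* only the cache owns it */
  struct fake_cred *t1 = &c;  /* thread 1: lookup under lock, then unlock */
  evict(&c);                  /* thread 2: entry evicted, refcount still 0 */
  bool use_after_free = t1->freed;
  t1->refcount++;             /* thread 1: late increment (freed memory in real code) */
  return use_after_free;
}
```

Replay 1 ends with refcount == 2 although three users hold the credential, so it can later be freed while still in use; replay 2 returns true, i.e. the late increment lands on an object that has already been released.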
I also believe that the same set of issues is present in the OpenSSL engine (https://github.com/curl/curl/blob/master/lib/vtls/openssl.c), but the race window there is much narrower and I couldn't trigger a crash in a reasonable amount of time.
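There are also a couple of somewhat related notes on schannel.c:
- The BOOL cached field seems to be superfluous; maintaining refcount should be enough.
- Code that frees curl_schannel_cred objects appears in two places: https://github.com/curl/curl/blob/master/lib/vtls/schannel.c#L1430 and https://github.com/curl/curl/blob/master/lib/vtls/schannel.c#L1458. A single Curl_schannel_session_free() called from the appropriate places should be enough.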
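Making refcount the only ownership signal could look like the minimal sketch below: if the session cache holds one reference just like each connection does, a separate cached flag is unnecessary, and one release function is the only place a credential is ever freed. The names (struct cred, cred_new(), cred_retain(), cred_release()) are hypothetical, not the schannel.c ones; cred_release() plays the role a single Curl_schannel_session_free() would, and callers are assumed to hold CURL_LOCK_DATA_SSL_SESSION.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified credential. The session cache owns one reference
   like any connection, so no separate "cached" flag is kept. */
struct cred {
  int refcount;   /* assumed to be modified only under the session lock */
};

static int frees;  /* demo counter: how many credentials were freed */

/* Creates a credential holding one reference for its creator. */
static struct cred *cred_new(void)
{
  struct cred *c = calloc(1, sizeof(*c));
  if(c)
    c->refcount = 1;
  return c;
}

/* Takes an additional reference (e.g. when the cache stores the credential
   or a connection reuses it). */
static void cred_retain(struct cred *c)
{
  c->refcount++;
}

/* The single place where a credential is freed. */
static void cred_release(struct cred *c)
{
  assert(c->refcount > 0);
  if(--c->refcount == 0) {
    /* real code would also call FreeCredentialsHandle() here */
    free(c);
    frees++;
  }
}
```

For example, a connection creates the credential and the cache retains it; when the connection closes, cred_release() frees nothing, and the later cache eviction calls cred_release() again, which frees it exactly once, regardless of which of the two happens last.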
How to reproduce
To have a chance of reproducing this, one needs to start several threads and use them to fetch several https:// URLs concurrently.
Unfortunately, I don't have a clean standalone utility for this, and it would ideally also need a server part, which is completely out of scope here.
However, I can at least give you more details about my setup:
- There are many (more than 8, as per max_ssl_sessions*) HTTPS servers/virtual hosts that can withstand several random requests per second for hours without anyone complaining or throttling. I use a trivial Python HTTP server that generates several kilobytes of random payload, with nginx in front of it as an SSL-terminating reverse proxy. I use valid CA-signed SSL certificates, but I think it could also be reproduced with self-signed certificates (with libcurl set to accept them, of course).
- libcurl 7.43.0 is used on Windows with the SSPI SSL engine. The offending code hasn't changed since that rather old version, so I believe the issue is still present.
- A CURLSH handle is created and configured with proper locking functions. This handle is used for all requests.
- Several threads are created. Each thread generates a stream of random https:// URLs to random servers, and the curl_easy API is used to perform each request.
After a few minutes and a few thousand requests, a crash is usually observed.
(*Interestingly, this variable seems intended to be set by the user, but there is no API for it, only a hardcoded value of 8.)
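Why the setup needs more than 8 hosts can be illustrated with a toy model of a fixed-size session cache. It assumes, as vtls.c appears to do, that a full cache evicts its least recently used entry; use_host(), count_evictions() and MAX_SSL_SESSIONS are hypothetical names. With at most 8 hosts every lookup eventually hits, whereas cycling through 9 hosts makes almost every request evict an entry, and eviction is presumably the path on which a cached credential gets freed while another thread is about to reuse it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SSL_SESSIONS 8   /* the hardcoded cache size from the footnote */

/* Hypothetical model of the shared session cache: each slot remembers a
   host and when it was last used. */
struct slot {
  const char *host;
  long last_used;
};

static struct slot cache[MAX_SSL_SESSIONS];
static long ticks;
static int evictions;

static const char *hosts[] = { "h0", "h1", "h2", "h3", "h4",
                               "h5", "h6", "h7", "h8" };

/* One request to host: a hit refreshes the slot; a miss fills a free slot
   or, when the cache is full, evicts the least recently used entry. */
static void use_host(const char *host)
{
  struct slot *store = &cache[0];

  ticks++;
  for(int i = 0; i < MAX_SSL_SESSIONS; i++) {
    if(cache[i].host && strcmp(cache[i].host, host) == 0) {
      cache[i].last_used = ticks;
      return;
    }
  }
  for(int i = 0; i < MAX_SSL_SESSIONS; i++) {
    if(!cache[i].host) {              /* empty slot: use it */
      store = &cache[i];
      break;
    }
    if(cache[i].last_used < store->last_used)
      store = &cache[i];              /* remember the oldest entry */
  }
  if(store->host)
    evictions++;                      /* real cache: old session killed here */
  store->host = host;
  store->last_used = ticks;
}

/* Requests the first nhosts hosts round-robin for the given number of
   rounds and returns how many evictions that caused. */
static int count_evictions(int nhosts, int rounds)
{
  int max_hosts = (int)(sizeof(hosts) / sizeof(hosts[0]));

  assert(nhosts >= 0 && rounds >= 0);
  if(nhosts > max_hosts)
    nhosts = max_hosts;
  for(int i = 0; i < MAX_SSL_SESSIONS; i++) {
    cache[i].host = NULL;
    cache[i].last_used = 0;
  }
  ticks = 0;
  evictions = 0;
  for(int r = 0; r < rounds; r++)
    for(int h = 0; h < nhosts; h++)
      use_host(hosts[h]);
  return evictions;
}
```

count_evictions(8, 10) returns 0, while count_evictions(9, 10) returns 82: only the first 8 of the 90 requests fill empty slots, and every later request misses and evicts.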
Ways to fix
I could come up with two feasible-looking options. I'd like to hear the maintainers' opinion, as I'm not familiar with the curl codebase.
The general ideas behind these options are:
- A. Alongside Curl_ssl_kill_session(), add a Curl_ssl_retain_session() that Curl_ssl_{get,add}sessionid() call on the appropriate occasions, and amend schannel.c accordingly. Note that if this issue is indeed present in the OpenSSL engine, this gets a bit trickier: OpenSSL has no separate increment-refcount function for SSL_SESSION, and I see no clean way to fake it.
- B. Widen the CURL_LOCK_DATA_SSL_SESSION scope, e.g. by having callers of the Curl_ssl_*sessionid() API take the lock explicitly, rather than having these functions take it implicitly. Although this is less clean in terms of who manages which lifetime, and looks somewhat ugly with multiple return paths, it is a very straightforward change. I have already implemented it locally for the SSPI and OpenSSL engines, and I no longer get crashes.
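The two options differ mainly in where the critical section starts. A minimal sketch, assuming a single-entry cache and a pthread mutex standing in for CURL_LOCK_DATA_SSL_SESSION; get_sessionid_locked(), get_sessionid_retained(), connect_option_b() and release_cred() are hypothetical names, not the vtls.c API. In option A the lookup locks and takes the reference itself; in option B the caller owns the lock and must release it on every return path. The small harness runs both concurrently and checks that the count returns to the cache's single reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical credential and single-entry session cache. */
struct cred {
  int refcount;                               /* protected by session_lock */
};

static pthread_mutex_t session_lock = PTHREAD_MUTEX_INITIALIZER;
static struct cred example_cred = { 1 };      /* the cache's own reference */
static const char *cached_host = "example.com";

/* Lookup that assumes session_lock is already held. */
static struct cred *get_sessionid_locked(const char *host)
{
  return strcmp(host, cached_host) == 0 ? &example_cred : NULL;
}

/* Option A: the lookup locks and retains, so callers only ever see a
   pointer they already hold a reference to. */
static struct cred *get_sessionid_retained(const char *host)
{
  struct cred *c;
  pthread_mutex_lock(&session_lock);
  c = get_sessionid_locked(host);
  if(c)
    c->refcount++;
  pthread_mutex_unlock(&session_lock);
  return c;
}

/* Option B: the caller owns the critical section and must unlock on every
   return path, which is where the multiple-returns ugliness comes from. */
static struct cred *connect_option_b(const char *host)
{
  struct cred *c;
  pthread_mutex_lock(&session_lock);
  c = get_sessionid_locked(host);
  if(!c) {
    pthread_mutex_unlock(&session_lock);      /* early return path */
    return NULL;
  }
  c->refcount++;                              /* retain while still locked */
  pthread_mutex_unlock(&session_lock);
  return c;
}

/* Drops a reference under the same lock (freeing at zero omitted here). */
static void release_cred(struct cred *c)
{
  pthread_mutex_lock(&session_lock);
  assert(c->refcount > 0);
  c->refcount--;
  pthread_mutex_unlock(&session_lock);
}

/* Each worker borrows the credential via both options, like connections. */
static void *worker(void *arg)
{
  (void)arg;
  for(int i = 0; i < 10000; i++) {
    struct cred *a = get_sessionid_retained("example.com");
    struct cred *b = connect_option_b("example.com");
    if(a)
      release_cred(a);
    if(b)
      release_cred(b);
  }
  return NULL;
}

/* Runs up to 16 workers concurrently and returns the final refcount. */
static int run_workers(int nthreads)
{
  pthread_t tids[16];
  int started = 0;
  if(nthreads > 16)
    nthreads = 16;
  for(int i = 0; i < nthreads; i++)
    if(pthread_create(&tids[started], NULL, worker, NULL) == 0)
      started++;
  for(int i = 0; i < started; i++)
    pthread_join(tids[i], NULL);
  return example_cred.refcount;               /* all workers joined */
}
```

In both variants the invariant is the same: no code outside the lock holds a pointer whose reference has not already been counted.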
Note that I haven't looked at the other SSL engines and have no idea what's going on there, and unfortunately I likely won't have the resources to do so. I'd appreciate any feedback from informed people on this.
Thanks!