
Race condition in the schannel SSL sessionid cache #815

@w23

Core issue

The Windows SSPI engine (https://github.com/curl/curl/blob/master/lib/vtls/schannel.c) does not manage the lifetime of curl_schannel_cred properly:

  1. The refcount is modified non-atomically and outside the scope of the CURL_LOCK_DATA_SSL_SESSION lock.
  2. The refcount is modified inconsistently: Curl_ssl_{get,add}sessionid() in https://github.com/curl/curl/blob/master/lib/vtls/vtls.c do not increment the refcount immediately, as they semantically should.

This leads to sporadic memory-related crashes (accessing freed memory, double free, etc.) when downloading https:// URLs concurrently in separate threads.
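
To make the failure mode concrete, here is a minimal single-threaded sketch that replays one bad interleaving step by step. It is a simplified model, not the schannel.c code: `model_cred`, `model_release()`, `model_get_unretained()` and `model_replay_race()` are hypothetical names, and "freeing" is modelled with a flag so the demo itself stays well-defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for curl_schannel_cred; the real struct differs. */
struct model_cred {
  int refcount; /* number of owners: the cache plus any connections */
  bool freed;   /* set instead of calling free(), to keep the demo defined */
};

/* Drop one reference; mark the credential freed when the last owner lets go. */
static void model_release(struct model_cred *cred)
{
  assert(cred->refcount > 0);
  if(--cred->refcount == 0)
    cred->freed = true;
}

/* Models a cache lookup that hands out the pointer WITHOUT retaining it. */
static struct model_cred *model_get_unretained(struct model_cred *cached)
{
  return cached;
}

/* Replays the interleaving; returns true if thread A ends up holding a
   pointer to an already freed credential. */
static bool model_replay_race(void)
{
  struct model_cred cred = { 1, false }; /* owned by the cache only */

  /* Thread A: looks up the session; the lock is released on return. */
  struct model_cred *a = model_get_unretained(&cred);

  /* Thread B: evicts the cache entry before A has bumped the refcount. */
  model_release(&cred);

  /* Thread A: is about to do a->refcount++ on memory that is gone. */
  return a->freed;
}
```

In this model `model_replay_race()` returns true: the increment that thread A performs after the lookup arrives too late to keep the credential alive.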

I also believe that the same set of issues is present in the OpenSSL engine in https://github.com/curl/curl/blob/master/lib/vtls/openssl.c, but the race window is much narrower there and I couldn't crash it in a reasonable amount of time.

And there are also a couple of somewhat related notes on schannel.c:

How to reproduce

To have a chance at reproducing this, one needs to start several threads and use them to fetch several https:// URLs concurrently.
Unfortunately, I don't have a clean standalone utility for this purpose, and a reproduction also depends on a server-side setup that is out of scope here.

However, I at least can give you more details about my setup:

  • There are many (>8, as per max_ssl_sessions*) HTTPS servers/virtual hosts that can withstand several random requests per second for hours without anyone complaining or throttling. I use a trivial Python HTTP server to generate several kilobytes of random payload, and nginx as an SSL-wrapping reverse proxy in front of it. I use valid CA-signed SSL certificates, but I think it could also be reproduced using self-signed certs (allowing libcurl to accept them, of course).
  • libcurl 7.43.0 on Windows with the SSPI SSL engine is used. The offending code hasn't changed since that rather old version, so I believe the issue is still present.
  • A CURLSH handle is created and set up with proper locking functions. This handle is used for all requests.
  • Several threads are created. Each thread generates a stream of random https:// URLs to random servers.
  • curl_easy API is used to perform each request.

After a few minutes and a few thousand requests, a crash is usually observed.

(*Interestingly, this variable seems to be intended to be set by the user, but there is no API for it, only a hardcoded value of 8.)

Ways to fix

I could come up with two feasible-looking options. I'm asking the maintainers for their opinion, as I'm not familiar with the curl codebase.

The general ideas behind these options are:

  • A. Alongside Curl_ssl_kill_session(), add a Curl_ssl_retain_session() that is called by Curl_ssl_{get,add}sessionid() on appropriate occasions. Amend schannel.c accordingly. Note that if this issue is indeed present in the OpenSSL engine, this gets a bit trickier: OpenSSL has no separate increment-refcount function for SSL_SESSION, and I see no clean way to fake it.
  • B. Widen the CURL_LOCK_DATA_SSL_SESSION scope, e.g. by having the lock taken explicitly by the user of the Curl_ssl_*sessionid() API, rather than implicitly by these functions themselves. Although this is less clean in terms of who manages which lifetime, and looks somewhat ugly in functions with multiple returns, it is a very straightforward change. I have already implemented it locally for the SSPI and OpenSSL engines and found that I'm not getting crashes anymore.
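
A rough sketch of the idea behind option A, again as a simplified single-threaded model rather than a patch: the lookup takes its reference while the cache lock is still held, so a later eviction cannot free the credential under the caller. All `sketch_*` names are hypothetical, and the `locked` flag merely stands in for CURL_LOCK_DATA_SSL_SESSION.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not libcurl's real structures. */
struct sketch_cred {
  int refcount;
  bool freed; /* set instead of calling free(), to keep the demo defined */
};

struct sketch_cache {
  bool locked;               /* stands in for CURL_LOCK_DATA_SSL_SESSION */
  struct sketch_cred *entry; /* single cached session, NULL when empty */
};

static void sketch_lock(struct sketch_cache *c)
{
  assert(!c->locked);
  c->locked = true;
}

static void sketch_unlock(struct sketch_cache *c)
{
  assert(c->locked);
  c->locked = false;
}

/* Drop one reference; every refcount change happens with the lock held. */
static void sketch_release(struct sketch_cache *c, struct sketch_cred *cred)
{
  sketch_lock(c);
  assert(!cred->freed && cred->refcount > 0);
  if(--cred->refcount == 0)
    cred->freed = true;
  sketch_unlock(c);
}

/* Option A: the lookup retains the credential before dropping the lock. */
static struct sketch_cred *sketch_get_retained(struct sketch_cache *c)
{
  struct sketch_cred *cred;
  sketch_lock(c);
  cred = c->entry;
  if(cred)
    cred->refcount++;
  sketch_unlock(c);
  return cred;
}

/* Eviction removes the entry and drops the cache's own reference. */
static void sketch_evict(struct sketch_cache *c)
{
  struct sketch_cred *cred = c->entry;
  if(cred) {
    c->entry = NULL;
    sketch_release(c, cred);
  }
}

/* Replays lookup -> eviction -> use -> release. Returns true if the
   credential was alive while in use and freed only at the very end. */
static bool sketch_replay_option_a(void)
{
  struct sketch_cred cred = { 1, false };        /* owned by the cache */
  struct sketch_cache cache = { false, &cred };
  struct sketch_cred *mine = sketch_get_retained(&cache); /* refcount 2 */
  bool alive_in_use;

  sketch_evict(&cache);                          /* refcount 1 */
  alive_in_use = mine != NULL && !mine->freed;
  sketch_release(&cache, mine);                  /* refcount 0, freed */
  return alive_in_use && cred.freed;
}
```

Option B would reach the same ordering differently: the caller would hold the lock across both the lookup and its own refcount increment, instead of the lookup retaining on the caller's behalf.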

Note that I haven't looked at other SSL engines, and have no idea what's going on there. And, unfortunately, it is likely I won't have resources to do that. I'd appreciate any feedback from informed people on this.

Thanks!
