Denial of service when using X509 certificates #31664

Closed
ayende opened this issue Feb 3, 2020 · 12 comments

Comments

Contributor

ayende commented Feb 3, 2020

Note, this has caused production impact for us.

As I'm writing this,
http://crl.identrust.com/DSTROOTCAX3CRL.crl and http://apps.identrust.com/roots/dstrootcax3.p7c
are not responding.

When using the attached certificate, the following code will hang for a long time on Linux.

cert.zip

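            // Load the certificate from the path given as the first argument.
            var cert = new X509Certificate2(args[0]);
            var chain = new X509Chain();
            // AllFlags tells the chain builder to ignore every verification error.
            chain.ChainPolicy.VerificationFlags = X509VerificationFlags.AllFlags;
            // Check revocation for every certificate in the chain except the root.
            chain.ChainPolicy.RevocationFlag = X509RevocationFlag.ExcludeRoot;
            // RevocationMode is left at its default (Online), so Build may fetch CRLs over the network.
            Console.WriteLine(chain.Build(cert));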

Note, this code is called as part of doing client certificate authentication, which means that it will hang badly on creating new connections.

If this doesn't fail, you may need to delete this file: /home/$USER/.dotnet/corefx/cryptography/crls/2e5ac55d.crl

The underlying issue is that when validating the chain, it tries to get the CRL, but right now the CA is not responding. At this point, this means that we can't create any new connections.

On existing machines, the file is cached, so you won't notice that. When you deploy a clean machine, the first time that it connects, it tries to download and cache the CRL, but because the CA endpoint is dead, the whole system is dead.

This applies to Linux only. On Windows, we had no issue with this, even on new machines.

Member

stephentoub commented Feb 3, 2020

Contributor Author

ayende commented Feb 3, 2020

Key issues for us here:

  • This takes a very long time and effectively hangs.
  • When using SslStream, there is no way to pass configuration to the X509Chain.
  • There is no indication of the root cause of the problem.

Contributor Author

ayende commented Feb 3, 2020

To make things worse, the output of this code, eventually, is True, so the process is failing open.
The major issues are the length of time that it takes and that there is no reporting of what went wrong.

$ time dotnet run cert.pfx
True

real    1m42.055s
user    0m2.445s
sys     0m0.267s

A (much) shorter wait time (or a way to configure that) would be much better.

Member

bartonjs commented Feb 3, 2020

> On Windows, we had no issue with this, even on new machines.

What are you calling "no issue"? Digging into things, I think under the model where X509ChainPolicy.UrlRetrievalTimeout is zero ("take as long as you like") the Windows code will have a 15 second timeout, and our Linux code has a 100 second timeout (1 minute, 40 seconds). Are you seeing 15 seconds on Windows, or something else?

We could probably change our implicit 100 second timeout to something smaller on Linux; but it's interesting to see what behavior you're actually hoping for.

I think Windows caches failures for a little bit, and we don't have a good way to do that on Linux, so this is really "first failure" behavior that we're after.
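
For readers who build the chain themselves, the timeout discussed above can be set explicitly on the policy. The following is a minimal sketch, not the maintainers' fix: the class name `ChainTimeoutSketch` and both helper methods are hypothetical, and how strictly each platform honors the value is an assumption here.

```csharp
using System;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper class, introduced only for illustration.
public static class ChainTimeoutSketch
{
    // Builds a policy that bounds how long URL retrieval (such as CRL downloads) may take.
    // Per the comment above, a zero timeout means the platform default applies.
    public static X509ChainPolicy CreatePolicy(TimeSpan timeout)
    {
        return new X509ChainPolicy
        {
            RevocationMode = X509RevocationMode.Online,
            RevocationFlag = X509RevocationFlag.ExcludeRoot,
            UrlRetrievalTimeout = timeout,
        };
    }

    // Validates a certificate using the bounded policy.
    public static bool Validate(X509Certificate2 cert, TimeSpan timeout)
    {
        using var chain = new X509Chain { ChainPolicy = CreatePolicy(timeout) };
        return chain.Build(cert);
    }
}
```

A caller might use `ChainTimeoutSketch.Validate(cert, TimeSpan.FromSeconds(15))` to approximate the Windows behavior described above. This only applies when the code owns the `X509Chain`.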

Contributor Author

ayende commented Feb 4, 2020

I meant that it completed in a reasonable time frame. Is it possible that it had it cached already? I'm not sure where the CRL cache is on Windows.

At any rate, the killer thing for us is that we have no control over this when using SslStream.
This isn't the first time we've run into this, see: #24527

We are using SslStream and client certificates, and there is no way to control the policy for the client certs. In our use case, we really want to be able to authenticate the client certificate without leaving the machine.

As we just saw, not being able to do that is a recipe for production failures.

Contributor Author

ayende commented Feb 4, 2020

Also, why isn't there a way to cache the failure on Linux?
Can't you write a file to the crls directory with a failure indicator and retest after some interval N?

The failure caching on Windows is both a plus and a minus.
It is a plus because after 15 seconds we are running normally; it is a minus because every 15 / 30 minutes we'll have a spike in latency.

What I'm really looking forward to is to be able to say: "don't leave this machine at all" and be able to say that from the SslStream, not when dealing with the X509Chain directly, because this is where I'm running into this. I don't think that I have any way to configure the chain policy from SslStream.

Member

bartonjs commented Feb 4, 2020

> What I'm really looking forward to is to be able to say: "don't leave this machine at all"

From X509Chain, that means setting X509ChainPolicy.RevocationMode to X509RevocationMode.Offline (using the cache is OK, fail if it's missing). It's a weird mode to be in, though, because it will always fail on a fresh machine.

SslStream defaults to not even checking CRLs, IIRC. With the new SslServerAuthenticationOptions (or SslClientAuthenticationOptions) class you can set CertificateRevocationCheckMode to Offline.
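
The two settings described above can be sketched as follows. This is an illustrative sketch only: the class `OfflineRevocationSketch` and its method names are hypothetical, and the server certificate is assumed to be supplied by the caller.

```csharp
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper class, introduced only for illustration.
public static class OfflineRevocationSketch
{
    // For code that builds an X509Chain directly: use only cached revocation data.
    public static X509ChainPolicy CreateOfflinePolicy()
    {
        return new X509ChainPolicy { RevocationMode = X509RevocationMode.Offline };
    }

    // For SslStream servers: request the same offline revocation mode through the options class.
    public static SslServerAuthenticationOptions CreateServerOptions(X509Certificate2 serverCertificate)
    {
        return new SslServerAuthenticationOptions
        {
            ServerCertificate = serverCertificate,
            ClientCertificateRequired = true,
            CertificateRevocationCheckMode = X509RevocationMode.Offline,
        };
    }
}
```

The options object would then be passed to `SslStream.AuthenticateAsServerAsync`. As noted above, Offline mode fails on a fresh machine with an empty cache, so `X509RevocationMode.NoCheck` may be the better fit when revocation checking is not wanted at all.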

Contributor Author

ayende commented Feb 5, 2020

When using the following option, will this cause the certificate validation to not leave the machine?

 stream.AuthenticateAsServerAsync(new SslServerAuthenticationOptions
 {
     ClientCertificateRequired = true,
     AllowRenegotiation = false,
     CertificateRevocationCheckMode = X509RevocationMode.NoCheck,
     EncryptionPolicy = EncryptionPolicy.RequireEncryption,
     EnabledSslProtocols = SslProtocols.Tls12|SslProtocols.Tls13,
 }, CancellationToken.None);

If so, that is exactly what we want and I don't need anything else.

Member

bartonjs commented Jun 24, 2020

Across the existing revocation mode value, the new disable AIA option, and the in-progress API for SslStream to prebuild the certificate context for multiple uses, the scenarios from here are taken care of now.

Contributor Author

ayende commented Jun 24, 2020

I'm sorry, I looked a bit, but I can't find what you mean by AIA and in progress API. Can you point me to them?

Contributor Author

ayende commented Jun 24, 2020

WRT the in-progress API, are you referring to #35844?

Member

bartonjs commented Jun 24, 2020

Yep, #35844 and #37485
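
The thread does not name the APIs behind those issue numbers. As a rough sketch, assuming the "disable AIA option" is `X509ChainPolicy.DisableCertificateDownloads` and the prebuilt certificate context is `SslStreamCertificateContext` (both available in .NET 5 and later), a no-network configuration might look like this. The class `NoNetworkValidationSketch` and its methods are hypothetical.

```csharp
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper class, introduced only for illustration.
public static class NoNetworkValidationSketch
{
    // Chain policy that neither downloads missing intermediate certificates (AIA)
    // nor fetches revocation data that is not already cached.
    public static X509ChainPolicy CreatePolicy()
    {
        return new X509ChainPolicy
        {
            RevocationMode = X509RevocationMode.Offline,
            DisableCertificateDownloads = true,
        };
    }

    // Prebuilds the server certificate context once so it can be reused across connections.
    // The third argument (offline: true) asks for the chain to be built without network access.
    public static SslStreamCertificateContext CreateContext(X509Certificate2 serverCertificate)
    {
        return SslStreamCertificateContext.Create(serverCertificate, null, true);
    }
}
```

The context returned by `CreateContext` would be assigned to `SslServerAuthenticationOptions.ServerCertificateContext`; the certificate passed in is assumed to carry its private key.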

msftbot locked this issue as resolved and limited conversation to collaborators on Dec 10, 2020