High Memory Consumption When Validating Remote Certificates With Client SslStreams #101552

Closed
autoracer42 opened this issue Apr 25, 2024 · 12 comments

@autoracer42

Description

When authenticating an SslStream as a client with the Online certificate revocation check option in a Linux container, we see significant unmanaged memory usage and growth, often leading to OOM pod restarts.

This is most clearly seen by monitoring the RES memory of the dotnet process with the top command while establishing TLS sessions and changing the CertificateRevocationCheckMode on the SslClientAuthenticationOptions passed to the AuthenticateAsClientAsync method.

Reproduction Steps

This issue is most prevalent when using a certificate issued by a public CA (DigiCert, in our testing) and only occurs when the server does not support OCSP stapling.

Attached are two test applications: a TLS server and a TLS client. The server expects two command-line arguments, the PFX file for a server certificate and the password for that PFX, and then listens on all hosts on port 54784.
The TLS client expects three arguments: the host to connect to, the port to connect to, and whether to perform revocation checks on server certificates. It then makes 100 TLS sessions to the server and outputs memory details; pressing Enter makes it send another batch of 100 requests.

When the client is run on Linux with the revocation argument set to false, the reported memory for the application remains around 65MB, and this stays consistent across numerous sets of requests. With the revocation argument set to true, memory usage is around 400MB at the end of the first set of requests and often continues to increase through subsequent sets.
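
The attached client is not reproduced in this thread. As a rough illustration of the loop described above, here is a minimal sketch; the class and method names are hypothetical and it is not the code from the zip. It only uses the SslStream APIs named in the description and toggles CertificateRevocationCheckMode between Online and NoCheck.

```csharp
using System;
using System.Net.Security;
using System.Net.Sockets;
using System.Security.Cryptography.X509Certificates;
using System.Threading.Tasks;

// Hypothetical helper, not the attached test app.
public static class RevocationRepro
{
    // Builds client options; only the revocation mode differs between runs.
    public static SslClientAuthenticationOptions BuildOptions(string host, bool checkRevocation) =>
        new SslClientAuthenticationOptions
        {
            TargetHost = host,
            CertificateRevocationCheckMode = checkRevocation
                ? X509RevocationMode.Online
                : X509RevocationMode.NoCheck,
        };

    // Opens one TCP connection, completes a TLS handshake, then disposes both streams.
    public static async Task HandshakeOnceAsync(string host, int port, bool checkRevocation)
    {
        using var tcp = new TcpClient();
        await tcp.ConnectAsync(host, port);
        using var ssl = new SslStream(tcp.GetStream(), leaveInnerStreamOpen: false);
        await ssl.AuthenticateAsClientAsync(BuildOptions(host, checkRevocation));
    }

    // Runs a batch of sequential handshakes and prints the process working set afterwards.
    public static async Task RunBatchAsync(string host, int port, bool checkRevocation, int count = 100)
    {
        for (int i = 0; i < count; i++)
        {
            await HandshakeOnceAsync(host, port, checkRevocation);
        }

        Console.WriteLine($"Working set: {Environment.WorkingSet / (1024 * 1024)} MB");
    }
}
```

Calling RunBatchAsync twice against the same server, once with checkRevocation set to false and once with true, should be enough to compare the two cases, assuming the server certificate has the properties described above.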

CertificateRevocationMemoryUsageTestApps.zip

Expected behavior

Memory remains relatively stable and is cleared when TLS connections are terminated

Actual behavior

Memory is higher and increases

Regression?

Behaviour appears very similar if the test application is converted to .NET 6.0 using the runtime/6.0 container

Known Workarounds

Disable Certificate revocation checks when authenticating SslStream objects

Configuration

.NET 8.0 (tested with both 8.0.4 and the 8.0.5 preview)
runtime/8.0 and runtime/8.0-jammy containers
The certificate used appears quite important, as the issue does not occur with a self-signed certificate

Other information

No response

dotnet-policy-service bot added the untriaged label Apr 25, 2024
@rzikm
Member

rzikm commented Apr 26, 2024

Triage: we should investigate in 9.0

rzikm added this to the 9.0.0 milestone Apr 26, 2024
@autoracer42
Author

I've quickly upgraded the client app from above to .NET 9.0 (using a nightly/sdk:9.0-preview container) and run a few more tests with the same certificate; behaviour is unchanged from .NET 8.0.

rzikm removed the untriaged label Apr 29, 2024
@rzikm
Member

rzikm commented Apr 29, 2024

I was unable to reproduce this issue with various combinations of OCSP/CRL revocation checking on the server (I found #101684, but that was not as severe as you describe and reproduced regardless of parameters).

  • Can you provide a generated certificate (or OpenSSL instructions for generating one) which reproduces the issue?
  • Alternatively, if there's a public server against which I can run the repro, can you share the URL? (You can use my work email radekzikmund (at) microsoft.com.)
  • Lastly, if neither is applicable, can you share the list of extensions on the certificate? You can anonymize them, but I need something which I can use to generate a similar certificate to repro the issue.

Or, if you are willing to try investigating yourself, the tool which is generally useful for these issues is heaptrack:

heaptrack -o <path to output file> <app name> <args....>

Run the repro and then process the output file to create a report:

heaptrack_print -l1 -a0 -T0 -p0 -f <path to output file>.zst

And upload the report file here.

rzikm self-assigned this Apr 29, 2024
@autoracer42
Author

As requested, please find below the anonymised details of the certificate our server is presenting (I’m a bit out of my depth with what various extensions do so I might have anonymised a bit too much).

Version = V3
Serial Number = <32 hex characters>
Signature Algorithm = sha256RSA
Signature Hash Algorithm = sha256
Issuer = <CN (42 characters), O (13 characters) and C (2 characters)>
Valid From = 01 August 2023 01:00:00
Valid To = 21 August 2024 00:59:59
Subject = <CN (DNS wildcard, 25 characters), O (27 characters), L (6 characters), S (5 characters), C (2 characters)>
Public Key = RSA (2048 bits)
Public Key Parameters = 05 00
Authority Key Identifier = KeyID=<40 hex characters>
Subject Key Identifier = <40 hex characters>
Subject Alternative Name = DNS Name = <wildcard DNS>, DNS Name = <DNS>
Enhanced Key Usage = Server Authentication (1.3.6.1.5.5.7.3.1), Client Authentication (1.3.6.1.5.5.7.3.2)
CRL Distribution Points = <2 with full name HTTP URLs>
Certificate Policy = Policy identifier = 2.23.140.1.2.2, policy qualifier info = CPS + <policy URL>
Authority Information Access = Access Method "On-line Certificate Status Protocol (1.3.6.1.5.5.7.48.1)", Alternative Name = <url>, Access method = "Certification Authority Issuer (1.3.6.1.5.5.7.48.2)", Alternative name = <url to crt file>
Basic Constraints = Subject Type=End Entity, Path Length Constraint=None
SCT List = <3 sets of 64 hex characters, a date, SHA256, ECDSA and 144 hex characters>
Key Usage = Digital Signature, Key Encipherment (a0)
Thumbprint = <40 hex characters>

I’ve also attached a couple of heaptrack outputs. heaptrack-true.txt is from a test which performed certificate revocation checks; it reports ~45MB of leaked memory with a peak RSS of just over 500MB (the container had a limit of 512MB, so it was terminated at this point). heaptrack-false.txt is from a test which did not perform certificate revocation checks; it reports ~5MB of leaked memory and a peak RSS of around 85MB.

heaptrack-false.txt
heaptrack-true.txt

@rzikm
Member

rzikm commented Apr 30, 2024

I don't see anything in the heaptrack output suggesting a leak in the OpenSSL-related integration. The fact that 45MB was reported "leaked" may be because the app was killed while those resources were live, and even a 45MB leak would not explain 512MB RSS.

I see you mentioned running on jammy containers; the following issue may be relevant:

#95922 (comment)

Can you try setting the DOTNET_GCgen0size=1E00000 environment variable? Also read the rest of the discussion starting from the linked comment for more context and other exploration ideas.

@autoracer42
Author

I've just quickly tried upping the container memory limit to 1GB and running the test again (to stop it being removed by Kubernetes). The heaptrack output for this still showed the same high RSS but showed leakage of only around 4MB; sorry for the confusion earlier.

I've also tested with the suggested environment variable set and this didn't make any difference. For reference, we're using Intel-based AKS nodes (B4MS in our development cluster), and the getconf output referenced in the other issue is:

LEVEL4_CACHE_SIZE                  0
LEVEL4_CACHE_ASSOC
LEVEL4_CACHE_LINESIZE

Most of my testing has been on Debian-based containers (the reference to Jammy earlier was a one-time test because it was easy to do). I also had a quick try on an Alpine-based container and this didn't exhibit the issue, but we're not really in a position to convert our production app to Alpine.

The other thing which may be potentially interesting is that I tried downloading the CRLs manually and each is around 8.8MB.

Unfortunately I'd also tried the other things mentioned in the linked issue historically (setting a lower default stack size and disabling TLS resume), prior to raising this issue.

Hopefully this is useful, and let me know if there's anything else I can test.

@bartonjs
Member

The other thing which may be potentially interesting is that I tried downloading the CRLs manually and each is around 8.8MB.

Assuming there's nothing exotic going on in that CRL, that's probably the problem. When I last looked into a high memory issue with CRLs, it basically boiled down to:

  • Every time there's a serial number in the CRL, OpenSSL calls malloc to track it.
    • They're smallish, 16 bytes, so malloc(16).
  • glibc doesn't want to track small allocations, so every allocation gets rounded up to the next 1024.
  • That means, practically speaking, a 1000 byte overhead per serial number in the CRL.
  • Running openssl crl -in some_random.crl -text -noout | grep Serial | wc -l on a few CRLs, tacking 3 zeros on the end of that count, and comparing to the original file size gives a "finger in the wind" estimate of the memory usage of loading a CRL to be about 20x its original size.
  • Memory Leak after upgrading to .NET Core 1.1 on Linux #7144 (comment) says that for APNS's 100MB CRL the RSS peaked at about 900MB, so maybe it's only 9x, but it's still surprisingly large overhead.

So an 8.8MB CRL would expand to 176MB while it's in use (using my 20x guesstimate). With a 512MB process quota you can't even have that loaded in parallel 3 times before kerploding.

The "round up to 1024" thing is from memory. I didn't just write a test for it, and I don't know if glibc has self-tuning (or environment variable-controlled tuning) to adjust it.
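
The back-of-envelope estimate above can be written out as a small helper. This is only a sketch of the arithmetic in this comment: the class and method names are hypothetical, and the 1000-byte overhead and the 20x factor are the guesses stated above, not measured values.

```csharp
using System;

// Hypothetical helper that restates the "finger in the wind" arithmetic.
public static class CrlMemoryEstimate
{
    // Assumed overhead per revoked serial number ("tacking 3 zeros on the end").
    public const long AssumedOverheadPerSerialBytes = 1000;

    // Estimate from the serial count reported by: openssl crl ... | grep Serial | wc -l
    public static long FromSerialCount(long serialCount) =>
        checked(serialCount * AssumedOverheadPerSerialBytes);

    // Estimate from the CRL file size using the assumed expansion factor (20x by default).
    public static long FromFileSize(long crlFileBytes, int expansionFactor = 20) =>
        checked(crlFileBytes * expansionFactor);

    // How many copies of a loaded CRL fit entirely inside a memory quota.
    public static long MaxConcurrentCopies(long quotaBytes, long perCopyBytes) =>
        quotaBytes / perCopyBytes;
}
```

Using decimal megabytes as in the comment, FromFileSize(8_800_000) gives 176,000,000 bytes, and MaxConcurrentCopies(512_000_000, 176_000_000) gives 2, which matches the remark that a third parallel load would exceed a 512MB quota.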

@rzikm
Member

rzikm commented May 6, 2024

Given the information from bartonjs's comment, I don't think there is anything we can reasonably do from the .NET side about this.

I think you can confirm bartonjs's hypothesis using something like this:

        // Paste into an existing class. Needs: using System; using System.Runtime.InteropServices;
        // mallinfo2 is a glibc function (glibc 2.33 or later), so only call it on Linux.

        // Mirrors glibc's struct mallinfo2; every field is a size_t (8 bytes on 64-bit Linux).
        [StructLayout(LayoutKind.Sequential)]
        struct mallinfo2_st
        {
            public ulong arena;
            public ulong ordblks;
            public ulong smblks;
            public ulong hblks;
            public ulong hblkhd;
            public ulong usmblks;
            public ulong fsmblks;
            public ulong uordblks;
            public ulong fordblks;
            public ulong keepcost;
        }

        [DllImport("libc")]
        private static extern mallinfo2_st mallinfo2();

        // Prints each field of the malloc statistics, one per line.
        private static void DumpMallInfo(in mallinfo2_st info)
        {
            Console.WriteLine($"Non-mmapped space allocated (bytes):        {info.arena,11:N0}");
            Console.WriteLine($"Number of free chunks:                      {info.ordblks,11:N0}");
            Console.WriteLine($"Number of free fastbin blocks:              {info.smblks,11:N0}");
            Console.WriteLine($"Number of mmapped regions:                  {info.hblks,11:N0}");
            Console.WriteLine($"Space allocated in mmapped regions (bytes): {info.hblkhd,11:N0}");
            Console.WriteLine($"Unused (usmblks, always 0):                 {info.usmblks,11:N0}");
            Console.WriteLine($"Space in freed fastbin blocks (bytes):      {info.fsmblks,11:N0}");
            Console.WriteLine($"Total allocated space (bytes):              {info.uordblks,11:N0}");
            Console.WriteLine($"Total free space (bytes):                   {info.fordblks,11:N0}");
            Console.WriteLine($"Top-most, releasable space (bytes):         {info.keepcost,11:N0}");
            Console.WriteLine();
        }

You should see a large amount of memory reported in fastbin blocks.

You can then try forcing the consolidation and release of those fastbin blocks using mallopt or malloc_trim (called through P/Invoke in the same way as mallinfo2 in the snippet above). Be sure to guard the calls to these functions so that they happen only on Linux.
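
A minimal sketch of such a guarded call is below. The wrapper class and method name are hypothetical; malloc_trim itself is the glibc function int malloc_trim(size_t pad), and the sketch assumes the process is using glibc's allocator. On a libc that does not export malloc_trim, the call is expected to fail with EntryPointNotFoundException, which the wrapper treats as "nothing trimmed".

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around glibc's malloc_trim.
public static class GlibcHeap
{
    // glibc: int malloc_trim(size_t pad); returns 1 if memory was released to the OS, 0 otherwise.
    [DllImport("libc", EntryPoint = "malloc_trim")]
    private static extern int malloc_trim(nuint pad);

    // Asks glibc to return free heap memory to the OS.
    // Returns false on non-Linux platforms or when the function is unavailable.
    public static bool TryTrim()
    {
        if (!OperatingSystem.IsLinux())
        {
            return false;
        }

        try
        {
            return malloc_trim(0) == 1;
        }
        catch (EntryPointNotFoundException)
        {
            // The loaded libc does not export malloc_trim.
            return false;
        }
        catch (DllNotFoundException)
        {
            // "libc" could not be resolved on this system.
            return false;
        }
    }
}
```

One place to call GlibcHeap.TryTrim() is after a batch of handshakes has completed, with mallinfo2 output captured before and after to see whether the fastbin and free-space figures drop. How often to call it in a production app is a trade-off that would need measuring.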

@autoracer42
Author

Thanks for the suggestion (I saw bartonjs's comment and assumed there wasn't much that could be done at this point). I've tried the mallinfo and mallopt / malloc_trim items in the test application, and the memory usage with these appears a lot more stable. With the calls placed after the test app loop (i.e. perform 100 SSL handshakes and then look at the memory usage), memory is ~490MB immediately after the sessions and back down to ~130MB after the frees.

One other thought I have had around this issue is that the certificate includes both a CRL distribution point and the Authority Information Access extension with an OCSP responder. However, even using a completely fresh Kubernetes pod with tcpdump running before any requests are made (I couldn't think of a better way to work out exactly what was happening on the network), through 100 sessions the only requests made outside of the TLS connection packets are for the CRL. I've had a look at the documentation around SslStream authentication (including the idea of manually specifying an X509ChainPolicy) but can't see a setting which would cause a CRL check to be performed in preference to a client-side OCSP check. Is there anything I can set on the stream so it performs a client-side OCSP check when an OCSP responder is defined in the certificate before falling back to the CRL as a last resort?

I've looked at OCSP stapling but unfortunately we don't control the target server in our main application so that's a non-starter.

@rzikm
Member

rzikm commented May 9, 2024

.NET should support client-side OCSP queries, but from the source code it seems that it will always prioritize CRL. @bartonjs will know for sure.

@bartonjs
Member

bartonjs commented May 9, 2024

is there anything I can set on the stream so it performs a client-side OCSP check when an OCSP responder is defined in the certificate before falling back to the CRL as a last resort?

No. For Android, macOS, and Windows, the CRL or OCSP decision is always made by the OS. For Linux, the decision is made by .NET, but it's CRL-first (assuming no stapled OCSP).

@rzikm
Member

rzikm commented May 13, 2024

Since we can't add any API surface to control the CRL/OCSP priority, I don't think there is anything more we can do about this issue at the moment. Closing for now.
