add hooks to debug OpenSSL memory #101626

wfurt · 2024-04-26T20:49:57Z

We have had several cases where users complained about high memory use. For native memory it is quite difficult to figure out where the memory goes. This PR aims to make that somewhat easier.

OpenSSL provides hooks for its memory functions, so this PR adds a switch to optionally hook into them.
The one caveat is that CRYPTO_set_mem_functions works only if called before any allocations, i.e. it needs to be done very early in the process. So I ended up putting it into the initialization process, even though I originally envisioned it somewhere else.

The simple use pattern is something like

export DOTNET_SYSTEM_NET_SECURITY_OPENSSL_MEMORY_DEBUG=1
var ci = typeof(SslStream).Assembly.GetTypes().First(t => t.Name == "CryptoInitializer");


// ... do some TLS/crypto work ...


Console.WriteLine($"Bytes known to GC [{GC.GetTotalMemory(false)}], process working set [{process.WorkingSet64}]");
Console.WriteLine("OpenSSL memory {0}", ci.InvokeMember("TotalAllocatedMemory", BindingFlags.GetField | BindingFlags.NonPublic | BindingFlags.Static, null, null, null));

That provides insight into how much memory is actually used by OpenSSL.
It allocates a little bit more memory to store the extra info, but it should be reasonably cheap.

If somebody cares about more details they can do something like

ci.InvokeMember("EnableTracking", BindingFlags.InvokeMethod | BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Static, null, null, null);

// ... do some TLS/crypto work ...

Tuple<IntPtr, int, string>[] allocations = (Tuple<IntPtr, int, string>[])ci.InvokeMember("GetIncrementalAllocations", BindingFlags.InvokeMethod | BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Static | BindingFlags.Instance, null, null, null);
for (int j = 0; j < allocations.Length; j++)
{
    (IntPtr ptr, int size, string src) = allocations[j];
    Console.WriteLine("Allocated {0} bytes at 0x{1:x} from {2}", size, ptr, src);
}

This would produce something like

Allocated 81 bytes at 0x7f0c8013d448 from ../crypto/err/err.c:820
Allocated 3 bytes at 0x7f0c8000bd28 from ../crypto/asn1/asn1_lib.c:308
Allocated 81 bytes at 0x7f0c8c13a108 from ../crypto/err/err.c:820
Allocated 40 bytes at 0x7f0c80126df8 from ../crypto/x509/x_name.c:92
Allocated 13 bytes at 0x7f0c8013b438 from ../crypto/asn1/asn1_lib.c:308

Dumping a large allocation data set is slow and expensive. It is done under a lock, so it blocks all other OpenSSL allocations. I feel this is OK for now but it should be used with caution. I also feel that access through Reflection is OK since this is only a last-resort debug hook, i.e. it does not need a stable API or convenient access.

wfurt · 2024-04-27T03:59:13Z

It looks like the build is failing because we are trying to build against OpenSSL 1.0, which has been EOS since 2019.

 -- Found OpenSSL: /crossrootfs/x64/usr/lib/x86_64-linux-gnu/libcrypto.so (found version "1.0.2g")

I was tempted to simply disable the debug feature for that version, as the very old OpenSSL has a different prototype.
But it seems like we would lose that for all platforms.

Any thoughts @bartonjs on moving the build to at least 1.1.1, which has been EOS only since last year and is still used by some distributions we support?

There are probably different ways to solve the build problems but I feel it is perhaps finally time to ditch 1.0.

jkotas · 2024-04-27T04:41:36Z

the build problems

We build against Ubuntu 16.04 headers and libs currently #83428 . It is likely going to stay that way for .NET 9.

wfurt · 2024-04-28T23:29:41Z

the build problems

We build against Ubuntu 16.04 headers and libs currently #83428 . It is likely going to stay that way for .NET 9.

What is the reason for it, @jkotas? It seems like even 8.0 does not support 16.04: https://github.com/dotnet/core/blob/main/release-notes/8.0/supported-os.md

jkotas · 2024-04-28T23:51:39Z

It is the same deal as Windows 7. It is not supported, but we avoid intentionally breaking it to help some important customers.

...raries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.Initialization.cs

vcsjones · 2024-04-29T13:11:29Z

trying to build against OpenSSL 1.0, which has been EOS since 2019.

OpenSSL version support is not as simple as the OpenSSL support policy. Distros will continue to use EOL versions of OpenSSL but backport fixes under their own LTS support policy.

I don't think .NET actually "officially" supports any Linux distros with 1.0.2 anymore. However 1.0.2k/g has played an important role far past its 2019 EOL and there are a number of Linux distros that still support it today.

vcsjones · 2024-04-29T14:55:12Z

That said, since this is a diagnostic feature, I don't know that it makes sense to go to any particular lengths to get it working with 1.0.

...on/src/Interop/Android/System.Security.Cryptography.Native.Android/Interop.Initialization.cs

jkotas · 2024-04-30T03:12:15Z

malloc/free can be used in more places than it is ok to run managed code. For example, you can use malloc/free in thread detach callback, but running managed code in thread detach callback is not safe/reliable. What's our confidence level that OpenSSL only ever calls malloc/free in places where it is safe to run managed code? At minimum, this should get a full CI run with this instrumentation unconditionally enabled to see whether it is going to hit any crashes.

wfurt · 2024-04-30T04:35:56Z

That is new in 3.0x and related to providers ... like FIPS. I can explore more. Is it not safe to use just the allocation functions or any managed code @jkotas?

And I have used generic tools before. They are difficult to use because it is hard to focus on only one particular part, like SSL, and tracking all runtime allocations is expensive. My intention also goes beyond just leaks. We have tight integration with OpenSSL and it would be good to have more insight into what is happening inside. The available hooks provide additional and useful information.

I was originally thinking about making it available only in debug builds. That would certainly limit the scope of trouble.
But that would make it really difficult IMHO to answer the question of whether memory growth even comes from OpenSSL or from some other native component.

jkotas · 2024-04-30T05:14:31Z

Is it not safe to use just the allocation functions or any managed code @jkotas?

Any managed code.

janvorli · 2024-04-30T06:57:06Z

I think the logging is simple enough to be written in C and placed into the openssl PAL. That should make it easy to get rid of the concerns raised here. We can possibly make it even simpler, just writing the log entries into a log file and creating a simple tool to analyze it instead of processing the information in the runtime.
I think for debugging problems in customer scenarios, having it in the runtime is better than having a separate library. My experience from debugging the memory related issues with customers is that for some of them, things like preloading a lib are sometimes a problem in the production environment.
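
For illustration, here is a minimal sketch of what C-only counters could look like, assuming the OpenSSL 1.1.0+ hook signatures (size or pointer, then `file` and `line`). The names `alloc_header`, `tracked_malloc`, `tracked_realloc`, `tracked_free` and `tracked_total_bytes` are hypothetical and are not the code in this PR. The sketch only shows the header-prefix technique with C11 atomics, so that no managed code runs inside the allocator callbacks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header stored in front of every block handed out.
   _Alignas pads sizeof(alloc_header) to a multiple of the strictest
   fundamental alignment, so the pointer after the header stays aligned. */
typedef struct
{
    _Alignas(max_align_t) size_t size; /* bytes requested by the caller */
    const char* file;                  /* allocation site reported by the caller */
    int line;
} alloc_header;

/* Live bytes currently allocated through the hooks. */
static atomic_size_t g_total_bytes;

void* tracked_malloc(size_t size, const char* file, int line)
{
    if (size > SIZE_MAX - sizeof(alloc_header))
        return NULL;

    alloc_header* h = malloc(sizeof(alloc_header) + size);
    if (h == NULL)
        return NULL;

    h->size = size;
    h->file = file;
    h->line = line;
    atomic_fetch_add(&g_total_bytes, size);
    return h + 1; /* user memory starts right after the header */
}

void tracked_free(void* ptr, const char* file, int line)
{
    (void)file;
    (void)line;
    if (ptr == NULL)
        return;

    alloc_header* h = (alloc_header*)ptr - 1;
    assert(atomic_load(&g_total_bytes) >= h->size);
    atomic_fetch_sub(&g_total_bytes, h->size);
    free(h);
}

void* tracked_realloc(void* ptr, size_t size, const char* file, int line)
{
    if (ptr == NULL)
        return tracked_malloc(size, file, line);
    if (size == 0)
    {
        tracked_free(ptr, file, line);
        return NULL;
    }
    if (size > SIZE_MAX - sizeof(alloc_header))
        return NULL;

    alloc_header* old = (alloc_header*)ptr - 1;
    size_t old_size = old->size;
    alloc_header* h = realloc(old, sizeof(alloc_header) + size);
    if (h == NULL)
        return NULL; /* the old block is still valid and still counted */

    h->size = size;
    h->file = file;
    h->line = line;
    atomic_fetch_sub(&g_total_bytes, old_size);
    atomic_fetch_add(&g_total_bytes, size);
    return h + 1;
}

/* The only thing the managed side would need to read. */
size_t tracked_total_bytes(void)
{
    return atomic_load(&g_total_bytes);
}
```

Under those assumptions, registration would be a single `CRYPTO_set_mem_functions(tracked_malloc, tracked_realloc, tracked_free)` call made before OpenSSL allocates anything, and the managed side would only read the counter.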

...raries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.Initialization.cs

filipnavara · 2024-04-30T13:03:33Z

I think for debugging problems in customer scenarios, having it in the runtime is better than having a separate library.

Agreed.

That said, I have a distilled version of the same concept (https://gist.github.com/filipnavara/7bf3791fb795266fe46f4383a075423c) deployed on a .NET 8 app (doesn't care about OpenSSL versioning since it's a one trick pony for diagnosing specific issue in a specific environment). It's likely possible to have that in separate library but it would come with additional complexities if the allocation callbacks needed to be written in C and compiled for multiple platforms. Having the C code in runtime takes care of this tricky part.

We can possibly make it even simpler, just writing the log entries into a log file and creating a simple tool to analyze it instead of processing the information in the runtime.

I'm not sure that would cut it. The app where we would use this diagnostic is operating on a scale where the allocations are multiple gigabytes at any given time and the allocation/free events are frequent, so summary information is what we need. Logging the calls [even in a compressed format] would produce way too much information.
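
To make "summary information" concrete, here is a rough sketch of aggregating live bytes per allocation site instead of logging every call. Everything in it is hypothetical (`site_stats`, `site_lookup`, `site_record_alloc`, `site_record_free`, the fixed `SITE_SLOTS` capacity) and it is not taken from the gist or from this PR. It is not thread-safe on its own, so it assumes the caller already serializes access, for example with the same lock used for tracking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SITE_SLOTS 1024u /* hypothetical capacity, must be a power of two */

typedef struct
{
    const char* file;   /* NULL marks an empty slot */
    int line;
    size_t live_bytes;  /* bytes currently allocated from this site */
    size_t live_blocks; /* blocks currently allocated from this site */
} site_stats;

static site_stats g_sites[SITE_SLOTS];

/* FNV-1a over the file name, then mixed with the line number. */
static size_t site_hash(const char* file, int line)
{
    uint64_t h = 14695981039346656037ULL;
    for (const char* p = file; *p != '\0'; p++)
    {
        h ^= (unsigned char)*p;
        h *= 1099511628211ULL;
    }
    h ^= (uint64_t)(unsigned int)line;
    h *= 1099511628211ULL;
    return (size_t)(h & (SITE_SLOTS - 1u));
}

/* Find the slot for (file, line) with linear probing, claiming an empty
   slot on first use. Returns NULL only when the table is full. */
site_stats* site_lookup(const char* file, int line)
{
    assert(file != NULL);
    size_t start = site_hash(file, line);
    for (size_t i = 0; i < SITE_SLOTS; i++)
    {
        site_stats* s = &g_sites[(start + i) & (SITE_SLOTS - 1u)];
        if (s->file == NULL)
        {
            s->file = file;
            s->line = line;
            return s;
        }
        if (s->line == line && strcmp(s->file, file) == 0)
            return s;
    }
    return NULL;
}

void site_record_alloc(const char* file, int line, size_t size)
{
    site_stats* s = site_lookup(file, line);
    if (s != NULL)
    {
        s->live_bytes += size;
        s->live_blocks++;
    }
}

/* alloc_file/alloc_line must be the site saved when the block was
   allocated, not the location of the free call. */
void site_record_free(const char* alloc_file, int alloc_line, size_t size)
{
    site_stats* s = site_lookup(alloc_file, alloc_line);
    if (s != NULL && s->live_blocks > 0 && s->live_bytes >= size)
    {
        s->live_bytes -= size;
        s->live_blocks--;
    }
}
```

The table stays a fixed size no matter how many allocations happen, which is the property that matters at multi-gigabyte scale. A periodic dump of the non-empty slots would then give one line per source location rather than one line per call.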

jkotas · 2024-04-30T13:24:55Z

I have a distilled version of the same concept (https://gist.github.com/filipnavara/7bf3791fb795266fe46f4383a075423c) deployed on a .NET 8 app

Thank you for sharing this example. I like the flexibility of including it as source in the app. For example, you can monitor the total crypto memory consumption via a performance counter using your existing monitoring solution, as this example shows; you can add logging of stacktraces, and only for certain allocations, if you need that to diagnose the problem; etc.

wfurt · 2024-04-30T16:20:53Z

I have a distilled version of the same concept (https://gist.github.com/filipnavara/7bf3791fb795266fe46f4383a075423c) deployed on a .NET 8 app

Thank you for sharing this example. I like the flexibility of including it as source in the app. For example, you can monitor the total crypto memory consumption via a performance counter using your existing monitoring solution, as this example shows; you can add logging of stacktraces, and only for certain allocations, if you need that to diagnose the problem; etc.

Since it is managed code the callbacks will have the same safety problem, right @jkotas? I was thinking about a solution similar to what @janvorli suggested: get the basic counters and hooks in C to make them safe, and find a better way to expose the details, either as an event source, plain file writes as @janvorli suggested, or whatever we agree on.

wfurt · 2024-04-30T16:31:13Z

One more note: the hooks are sensitive to ordering, since they need to be installed before any allocations. That is not a problem for a simple console app but it may be difficult for Kestrel, where many things may happen IMHO before user code runs.

jkotas · 2024-04-30T17:35:42Z

Since it is managed code the callbacks will have the same safety problem, right

Yes - if the app usage pattern results in OpenSSL calling malloc/free in places where it is not safe to run managed code.

That is not a problem for a simple console app but it may be difficult for Kestrel, where many things may happen IMHO before user code runs.

It can be a problem even for a console app - if some code in the app (e.g. a 3rd party library) ends up initializing OpenSSL before we get a chance to initialize it.

filipnavara · 2024-05-08T17:31:31Z

Since it is managed code the callbacks will have the same safety problem, right?

Correct. We didn't encounter any issue in production with the managed callbacks. That said, we definitely use only a subset of the OpenSSL functionality (a couple of crypto ciphers, Kestrel, SslStream, HttpClient).

One more note: the hooks are sensitive to ordering, since they need to be installed before any allocations. That is not a problem for a simple console app but it may be difficult for Kestrel, where many things may happen IMHO before user code runs.

We didn't have any problem injecting it early enough in the startup in the app that uses Kestrel.

We have had several cases where users complained about high memory use. For native memory it is quite difficult to figure out where the memory goes.

Initial results from our experiment show that we don't see any OpenSSL related leaks. We still observe working set growing over time even if it's unaccounted for in the GC heap or other .NET heaps (eg. compiled code).

Notably these graphs don't make the original issue entirely obvious. They just show that OpenSSL memory seems to stay at stable levels. Each instance has an HTTP endpoint for registration and then runs a large number of pollers that check for new email messages over a variety of protocols (IMAP, Exchange Web Services, etc.) on numerous servers. We have a mechanism that rebalances all the polling registrations to a different instance, so we should end up basically in the same "empty" state as on startup. When we do this and force a GC we see the managed heap going down, but the working set never returns to the initial levels and exceeds them by gigabyte(s).

rzikm · 2024-05-09T13:05:10Z

Sorry for off-topic

We have had several cases where users complained about high memory use. For native memory it is quite difficult to figure out where the memory goes.

@filipnavara recently I have seen high memory usage due to lots of memory buffers being cached by malloc internally, see #101552

rzikm · 2024-05-13T07:20:55Z

src/libraries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.Crypto.cs

+                    Debug.Assert(size == entry[0].Size);
+                    lock (_allocations!)
+                    {
+                        _allocations!.Add(ptr);


If I understood the original comment from @jkotas, then this is still a potential problem

#101626 (comment)

Or does that hold only for the malloc/free calls and not GC-allocated memory?

It is not safe to use any managed code if this can be called from places like thread destructor. #101626 (comment)

Right. I have now split the change into two parts. The basic counters are implemented in native code as @janvorli suggested. That also eliminates the need to fiddle with the crypto initialization.

Now for the managed part. I made this whole section #if DEBUG for now to limit the exposure. While this limits use in production, it would allow us to experiment more and perhaps hook it into test runs. I have yet to see a case where it actually fails. Since this can be enabled during a run for some particular operation(s), it may avoid the cases we are concerned about, e.g. thread operations. AFAIK there is an API to get loaded providers, so we may for example check for FIPS or 3rd party modules.

src/native/libs/System.Security.Cryptography.Native/openssl.c

src/libraries/System.Net.Http/src/System.Net.Http.csproj

src/libraries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.Crypto.cs

filipnavara · 2024-05-14T12:54:27Z

recently I have seen high memory usage due to lots of memory buffers being cached by malloc internally, see #101552

OT: Turns out this was an immensely useful hint. We added tracking of the malloc metrics from mallinfo2. The data after a few days show that the memory usage growth is in the malloc arena and that the size of the free list also grows. There may not be a memory leak after all, just a lot of reserved memory from the native allocator.

jkotas · 2024-05-21T10:27:07Z

src/libraries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.Crypto.cs

+            Free = 3,
+        }
+
+        private static readonly unsafe nuint Offset = (nuint)sizeof(MemoryEntry);


Suggested change

-        private static readonly unsafe nuint Offset = (nuint)sizeof(MemoryEntry);
+        private static unsafe nuint Offset => (nuint)sizeof(MemoryEntry);

This does not need to be cached in a static

src/libraries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.Crypto.cs

rzikm · 2024-05-23T07:24:30Z

...raries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.Initialization.cs

@@ -29,6 +29,7 @@ static OpenSsl()

    internal static partial class CryptoInitializer
    {
+


Suggested change

rzikm · 2024-05-23T07:24:38Z

...raries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.Initialization.cs

@@ -41,6 +42,7 @@ static CryptoInitializer()
                // these libraries will be unable to operate correctly.
                throw new InvalidOperationException();
            }
+


Suggested change

rzikm · 2024-05-23T07:28:33Z

src/libraries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.OpenSsl.cs

+#pragma warning disable CA1823
+        private static readonly bool MemoryDebug = GetMemoryDebug();
+#pragma warning restore CA1823


Since the field is not used anywhere, can we move the call to GetMemoryDebug to cctor?

rzikm · 2024-05-23T07:36:31Z

src/native/libs/System.Security.Cryptography.Native/openssl.c

+struct memoryEntry
+{
+    int size;
+    int line;
+    const char* file;
+};


Did we consider the alignment of this on 32-bit platforms? As jkotas said in one of the comments, this struct is 12 bytes on 32-bit platforms, so after the adjustment the returned pointer will not be aligned on an 8-byte boundary.

According to the docs, OpenSSL has OPENSSL_aligned_malloc, which it probably uses internally when it matters, but we should still verify that this does not cause any problems.
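
To spell out the arithmetic behind this concern, here is a small sketch. The helper names (`round_up`, `user_offset`, `keeps_alignment`) are made up for illustration and are not part of the PR. It shows that a 12-byte header breaks 8-byte alignment, and that rounding the header size up to a multiple of the alignment is one possible way to keep it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round raw_size up to the next multiple of align.
   align must be a non-zero power of two. */
size_t round_up(size_t raw_size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (raw_size + align - 1) & ~(align - 1);
}

/* Offset from the start of the real block to the pointer handed back to
   the caller, padded to the strictest fundamental alignment. */
size_t user_offset(size_t header_size)
{
    return round_up(header_size, _Alignof(max_align_t));
}

/* Returns 1 if base + offset is still aligned to align, 0 otherwise. */
int keeps_alignment(uintptr_t base, size_t offset, size_t align)
{
    return ((base + offset) % align) == 0;
}
```

With a block that malloc returned at an 8-byte-aligned address such as 0x1000, an offset of 12 gives 0x100c, which is only 4-byte aligned, while `round_up(12, 8)` gives 16 and 0x1010 is aligned again. The cost is 4 bytes of padding per allocation on 32-bit targets.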

rzikm · 2024-05-23T07:37:12Z

src/native/libs/System.Security.Cryptography.Native/pal_ssl.c

@@ -173,7 +171,9 @@ static void DetectCiphersuiteConfiguration(void)
 #endif
 }

+


Suggested change

rzikm · 2024-05-23T07:37:19Z

src/native/libs/System.Security.Cryptography.Native/pal_ssl.c

 void CryptoNative_EnsureLibSslInitialized(void)
+


Suggested change

rzikm · 2024-05-23T07:37:25Z

src/native/libs/System.Security.Cryptography.Native/pal_ssl.h

@@ -130,6 +130,7 @@ typedef void (*SslCtxSetKeylogCallback)(const SSL* ssl, const char *line);
 /*
 Ensures that libssl is correctly initialized and ready to use.
 */
+


Suggested change

rzikm · 2024-05-23T07:39:36Z

src/libraries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.Crypto.cs

+        }
+
+        private static readonly unsafe nuint Offset = (nuint)sizeof(MemoryEntry);
+        private static HashSet<UIntPtr>? _allocations;


Using ConcurrentDictionary<UIntPtr, something> would allow us to remove some of the locks and reduce contention.

add hooks to debug OpenSSL memory

a163dbf

wfurt added the area-System.Net.Security label Apr 26, 2024

wfurt added this to the 9.0.0 milestone Apr 26, 2024

wfurt requested review from bartonjs, janvorli and rzikm April 26, 2024 20:49

wfurt self-assigned this Apr 26, 2024

This was referenced Apr 26, 2024

[WASI][AOT] LLVM ERROR: out of memory #101533

Open

[wasm][firefox] crit: OpenQA.Selenium.WebDriverException: Failed to decode response from marionette #101617

Open

wfurt added 2 commits April 27, 2024 02:05

opensslshim

9b5b3d3

1.x

ea6ac3f

build-analysis bot mentioned this pull request Apr 27, 2024

AbbreviatedMonthGenitiveNames_Get_ReturnsExpected_HybridGlobalization failing with string mismatch #101634

Closed

rzikm reviewed Apr 29, 2024

rzikm mentioned this pull request Apr 29, 2024

Removed unused sessions from SSL_CTX internal cache #101684

Merged

wfurt added 5 commits April 29, 2024 19:51

1.0.1

0338b06

Collections

8744c2d

android

401178f

Merge branch 'opensslDebug' of https://github.com/wfurt/runtime into opensslDebug

803fc94

Merge branch 'main' of https://github.com/dotnet/runtime into opensslDebug

cd55d63

vcsjones reviewed Apr 30, 2024

...on/src/Interop/Android/System.Security.Cryptography.Native.Android/Interop.Initialization.cs

This was referenced Apr 30, 2024

System.IO.Net5Compat.Tests and System.IO.Tests suddenly exiting with error 137 #100558

Open

SIGKILL (OOM?) while running LibraryImportGenerator.Tests w/o actionable log messages or artifacts dotnet/dnceng#2496

Open

unsafe

638df97

This was referenced Apr 30, 2024

Cannot find 'arm64-v8a' device dotnet/dnceng#2284

Open

Dead lettering tests #101524

Open

rzikm reviewed Apr 30, 2024

...raries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.Initialization.cs

wfurt added 3 commits May 11, 2024 04:11

feedback

d42f271

Merge branch 'main' of https://github.com/dotnet/runtime into opensslDebug

01f556f

update

173827a

rzikm reviewed May 13, 2024

jkotas reviewed May 13, 2024

src/native/libs/System.Security.Cryptography.Native/openssl.c

jkotas reviewed May 13, 2024

src/libraries/System.Net.Http/src/System.Net.Http.csproj

jkotas reviewed May 13, 2024

src/libraries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.Crypto.cs

jkotas reviewed May 13, 2024

src/libraries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.Crypto.cs

wfurt added 2 commits May 21, 2024 00:43

Merge branch 'main' of https://github.com/dotnet/runtime into opensslDebug

a86c713

feedback

c0838ea

jkotas reviewed May 21, 2024

src/libraries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.Crypto.cs

wfurt added 2 commits May 23, 2024 04:02

feedback

60e933e

Merge branch 'main' of https://github.com/dotnet/runtime into opensslDebug

7c03ee3

build-analysis bot mentioned this pull request May 23, 2024

System.Net.Http.Functional.Tests.HttpMetricsTest_Http11_Async_HttpMessageInvoker Test failure #96407

Open

rzikm reviewed May 23, 2024
