
Optimise resource consumption & drop support for EOL NETFX #506

Merged · 36 commits merged into master from sean/background-message-queuing on Feb 7, 2024

Conversation

@xenolightning (Contributor) commented on Jan 24, 2024:

Changes Summary:

  • Add ThrottledBackgroundMessageProcessor to prevent thread and SNAT exhaustion
  • Stop targeting unsupported NETFX versions; only target 4.6.2+
  • Convert projects to SDK style
  • Remove the old NETFX codebase and pull it into the new projects
  • Remove Client Profile targets

@xenolightning changed the title from "WIP - Use a background queue to reduce resource consumption" to "Optimise resource consumption & drop support for EOL NETFX" on Jan 31, 2024
@@ -22,7 +23,7 @@ public abstract class RaygunClientBase
// The default timeout is 100 seconds for the HttpClient,
Timeout = TimeSpan.FromSeconds(30)
};
Contributor:

One other thing that can help prevent SNAT and thread exhaustion is setting MaxConnectionsPerServer to something less than the unlimited default. It can be achieved by passing an HttpClientHandler:

new HttpClient(new HttpClientHandler
{
    MaxConnectionsPerServer = 50
})
{
    Timeout = TimeSpan.FromSeconds(30)
};

@xenolightning (Contributor Author), Feb 1, 2024:

We did initially look at this as a solution.

The issue with this is that it silently drops errors from being sent, and in our testing it doesn't always solve the SNAT issue 😿

Realistically, SNAT only seems to be an issue on Azure due to the way it shares host networking and load balancing.

Contributor:

We actually tested this initially and found it didn't make any major contribution to reducing the number of ports taken. For example, when set to 10, I was still seeing around ~2000 ports taken in Azure, and of 50k requests only around ~4k succeeded; leaving it at the default, it used around ~3000 ports and ~49k requests succeeded.

@PanosNB previously approved these changes on Feb 4, 2024 and left a comment:

Great work @xenolightning! I am approving this as it looks good; just a few Qs for you.

/// <summary>
/// The maximum queue size for background exceptions
/// </summary>
public int BackgroundMessageQueueMax { get; } = ushort.MaxValue;
Reviewer:

No environment configuration possible for this one?

Contributor Author:

For this particular value, we didn't want it to be configurable using the standard settings approach.

For the RAYGUN_MESSAGE_QUEUE_MAX variable, we wanted to give customers the ability to set the value, but it should be an exceptional case, not something configured through the settings object.


public RaygunSettingsBase()
{
ApiEndpoint = new Uri(DefaultApiEndPoint);

// See if there's an override defined in an environment variable, and set it accordingly
var messageQueueMaxValue = Environment.GetEnvironmentVariable(RaygunMessageQueueMaxVariable);
Reviewer:

This continues from my comment above. So we check for environment initialization separately?

Contributor Author:

Yeah, checking this inline is a bit more difficult because of the try parse, so I put it in the ctor.

We could move all the defaults into the ctor if you think that would be better.
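For illustration, a minimal sketch of that ctor-based override, assuming the RAYGUN_MESSAGE_QUEUE_MAX variable name and the read-only BackgroundMessageQueueMax property shown above; the endpoint constant and parsing bounds are placeholders, not the actual source:

using System;

public abstract class RaygunSettingsBase
{
    // Placeholder endpoint for the sketch; the real default differs.
    private const string DefaultApiEndPoint = "https://api.example.invalid/entries";
    private const string RaygunMessageQueueMaxVariable = "RAYGUN_MESSAGE_QUEUE_MAX";

    public Uri ApiEndpoint { get; set; }

    /// <summary>
    /// The maximum queue size for background exceptions
    /// </summary>
    public int BackgroundMessageQueueMax { get; } = ushort.MaxValue;

    public RaygunSettingsBase()
    {
        ApiEndpoint = new Uri(DefaultApiEndPoint);

        // See if there's an override defined in an environment variable; keep the
        // default when it is absent or doesn't parse to a sensible value.
        var messageQueueMaxValue = Environment.GetEnvironmentVariable(RaygunMessageQueueMaxVariable);

        if (!string.IsNullOrEmpty(messageQueueMaxValue) &&
            int.TryParse(messageQueueMaxValue, out var overrideValue) &&
            overrideValue > 0)
        {
            BackgroundMessageQueueMax = overrideValue;
        }
    }
}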


for (var i = 0; i < numberOfWorkersToStart; i++)
{
_workerTasks.Add(CreateWorkerTask());
Reviewer:

I see, so the thread pool starts empty and then this call on the first SendInBg will fill up all of them (and the first will pick up the task added in ln 49 above). Plus, on every SendInBg call, EnsureWorkers acts as a health-checking mechanism

@xenolightning (Contributor Author), Feb 4, 2024:

Correct, yep!

This is to optimise the usage of the threads. We shouldn't allocate threads if there's no work to do, so we only create the workers on the first message to send.

From there, instead of managing a daemon worker or a timer (which could fault or crash), we just call EnsureWorkers on every message enqueued. This ensures there are workers available whenever we have something to process.

This pairs with the ContinueWith statement, which ensures workers are re-created if a worker faults for whatever reason.
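For readers following along, here is a rough, simplified sketch of that pattern: lazily created workers, a re-check on every enqueue, and a ContinueWith that replaces finished or faulted workers. This is not the merged ThrottledBackgroundMessageProcessor source; the queue item type and several member names are assumptions for illustration.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

internal sealed class ThrottledBackgroundMessageProcessorSketch : IDisposable
{
    private readonly BlockingCollection<Func<Task>> _queue;
    private readonly List<Task> _workerTasks = new List<Task>();
    private readonly int _maxWorkers;
    private readonly object _workerLock = new object();
    private volatile bool _disposed;

    public ThrottledBackgroundMessageProcessorSketch(int maxQueueSize, int maxWorkers)
    {
        // Bounded queue: TryAdd fails (drops the message) rather than growing without limit.
        _queue = new BlockingCollection<Func<Task>>(maxQueueSize);
        _maxWorkers = maxWorkers;
    }

    public bool Enqueue(Func<Task> sendMessage)
    {
        var accepted = _queue.TryAdd(sendMessage);

        // No daemon or timer: every enqueue re-checks that workers exist to drain the queue.
        EnsureWorkers();

        return accepted;
    }

    private void EnsureWorkers()
    {
        if (_disposed)
        {
            return;
        }

        lock (_workerLock)
        {
            _workerTasks.RemoveAll(t => t.IsCompleted);

            while (_workerTasks.Count < _maxWorkers)
            {
                var workerTask = CreateWorkerTask();

                // When a worker finishes, ensure that a new one is created if required.
                workerTask.ContinueWith(_ => EnsureWorkers());

                _workerTasks.Add(workerTask);
            }
        }
    }

    private Task CreateWorkerTask()
    {
        return Task.Run(async () =>
        {
            // Blocks this worker until an item arrives or adding is completed.
            foreach (var sendMessage in _queue.GetConsumingEnumerable())
            {
                try
                {
                    await sendMessage();
                }
                catch (Exception ex)
                {
                    // Send() has its own handling; this is a last-resort trace for anything unexpected.
                    Debug.WriteLine("Exception in queue worker {0}: {1}", Task.CurrentId, ex);
                }
            }
        });
    }

    public void Dispose()
    {
        _disposed = true;

        // Lets the workers drain what is already queued and then exit.
        _queue.CompleteAdding();
    }
}

The real implementation differs in detail, but this captures the enqueue-driven EnsureWorkers call and the ContinueWith replacement behaviour discussed above.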

});

// When a worker finishes, ensure that a new one is created if required
workerTask.ContinueWith(x => { EnsureWorkers(); });
Reviewer:

Should this execute in a finally block and all the above in try?

Reviewer:

Ignore, I now see this is still running on the customer thread and you are declaring a follow-up action, to invoke EnsureWorkers

Contributor Author:

Yep, spot on.

This is a continuation that happens after the worker task has completed or faulted. It allows us to ensure that there are workers to process messages if, for whatever reason, they fault.

}
catch (Exception ex)
{
Debug.WriteLine("Exception in queue worker {0}: {1}", Task.CurrentId, ex);
Reviewer:

Hopefully the passed tasks have their own try-catch logic, so anything application-specific, like network errors, doesn't bubble up here.

Contributor Author:

Yeah, so we mainly call Send() here, which does have its own internal exception handling. This is for the odd case where something unexpected happens, so we still have a trace log of it if debug logging is enabled.

SendInBackground(() => raygunMessage);
}

public void SendInBackground(Func<RaygunMessage> raygunMessage)
Reviewer:

I don't wanna break the spec, but in the future we should consider making this return a boolean with the TryAdd result.

Contributor Author:

We could make this private; it's not in the public surface yet.

Alternatively, we can make this particular overload return a boolean.

I followed the existing convention for the structure here; we could revise the API surface in a follow-up version.
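As a sketch of that follow-up idea, the boolean overload could be as small as the fragment below. The member name is hypothetical, and it assumes the processor's Enqueue surfaces the underlying TryAdd result; it would sit inside RaygunClientBase alongside the existing SendInBackground.

// Hypothetical future overload: same behaviour as SendInBackground, but tells the caller
// whether the message was accepted or dropped because the bounded queue was full.
public bool TrySendInBackground(Func<RaygunMessage> raygunMessage)
{
    return _backgroundMessageProcessor.Enqueue(() => Task.FromResult(raygunMessage()));
}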

@@ -0,0 +1,130 @@
#nullable enable
Reviewer:

Again this file. I see that certain things must be duplicated :-(

Contributor Author:

Yeah, sadly the code here is ever so slightly different; there's no async/await support in the NETFX code.

Sean and others added 20 commits February 8, 2024 09:36
This forces a variable capture in the lambda expression/generated class, which massively reduces the memory used when queuing the messages.

RaygunMessage stores everything as strings, which is fine for short-lived objects, but when there's a massive queue of items the extra string memory is less desirable.
Convert all assembly info into csproj directives

Unlink global assembly info
@@ -2,6 +2,7 @@
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
Contributor:

👀 We should do a cleanup of usings in the next PR.

@@ -27,7 +28,7 @@ public void TearDown()

[Test]
public void Set_ClassName_MethodName_And_LineNumber_Automatically_If_Configured()
{
{
Contributor:

👎

Comment on lines +22 to +27
public IEnumerable<Exception> ExposeStripWrapperExceptions(Exception exception)
{
return CreateWebClient();
return StripWrapperExceptions(exception);
}

Contributor:

Whitespace giving me anxiety...

@phillip-haydon (Contributor) left a comment:

Looking good.

@xenolightning merged commit ef72494 into master on Feb 7, 2024
@xenolightning deleted the sean/background-message-queuing branch on February 7, 2024 21:24
@@ -35,6 +36,7 @@ public abstract class RaygunClientBase
private bool _handlingRecursiveGrouping;

protected readonly RaygunSettingsBase _settings;
private readonly ThrottledBackgroundMessageProcessor _backgroundMessageProcessor;
Contributor:

Hmm, RaygunAspNetClient is still created per request. Doesn't that mean users of RaygunAspNet won't get the SNAT port issues fixed, because there will be multiple instances of the client?
Also, it's IDisposable but never disposed?

Contributor Author:

Using the middleware, you are correct: because the client only controls its own limitations, creating other clients gets around the throttling. This is a larger design choice we want to resolve.

Ideally the middleware shouldn't be creating a new client for each request, but changing that adds complexity.

Alternatively, the message queue could be a singleton; however, that adds another layer of complexity when many clients could be queuing messages.

Calling Dispose in a finalizer would certainly help. Currently, when the client is GC'd, the message queue should also be collected alongside it.
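To illustrate that last point, a conventional dispose pattern on the client base class might look roughly like this. It is a sketch only, not the merged code, and per the usual pattern the finalizer path leaves managed members to their own finalization.

using System;

// Sketch only: shows how the client could own and dispose the background processor.
// The real RaygunClientBase does not implement this today.
public abstract class RaygunClientBaseSketch : IDisposable
{
    private readonly IDisposable _backgroundMessageProcessor;
    private bool _disposed;

    protected RaygunClientBaseSketch(IDisposable backgroundMessageProcessor)
    {
        _backgroundMessageProcessor = backgroundMessageProcessor;
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed)
        {
            return;
        }

        if (disposing)
        {
            // Completes the queue and lets the workers exit.
            _backgroundMessageProcessor.Dispose();
        }

        _disposed = true;
    }

    // Backstop for callers (e.g. per-request middleware) that never call Dispose;
    // managed members are deliberately not touched on this path.
    ~RaygunClientBaseSketch()
    {
        Dispose(false);
    }
}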

await StripAndSend(exception, tags, userCustomData, userInfo);
});

if (!_backgroundMessageProcessor.Enqueue(async () => await BuildMessage(ex, tags, userCustomData, userInfo)))
Contributor:

This can result in an issue depending on when BuildMessage runs. In AspNetCore, BuildMessage attempts to read data from HttpContext. If BuildMessage is executed after HttpContext is disposed, it will either fail or won't get the necessary data.
Shouldn't BuildMessage be called before enqueuing, with Enqueue receiving the built message rather than a Func<RaygunMessage>?

Contributor Author:

The HttpContext is accessed and the data captured before BuildMessage is called, so there shouldn't be a chance of accessing a disposed context.

It does this only in the AspNetCore-specific version of the client.

There are also issues with using ThreadLocal vs AsyncLocal, as raised in another GitHub issue, which will also cause some problems capturing the correct information.

Using the middleware, the clients that are created never actually use this overload, and they aren't using the background message queue either, which is something we should also fix.

If you don't mind, please bear with us as we uncover some of these interesting code nuggets from the past and find resolutions to them.

Cheers,
Sean
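To make the "capture before building" point concrete, here is an illustrative helper, not the actual raygun4net.AspNetCore source, that snapshots request data on the request thread so a deferred BuildMessage never needs the HttpContext itself:

using Microsoft.AspNetCore.Http;

// Plain data object: safe to hand to a background worker after the request has ended.
public sealed class RequestSnapshot
{
    public string Path { get; set; }
    public string Method { get; set; }
    public string IpAddress { get; set; }
}

public static class HttpContextCaptureSketch
{
    // Called synchronously while the request is still alive; the deferred message
    // builder closes over the returned snapshot instead of the HttpContext.
    public static RequestSnapshot Capture(HttpContext context)
    {
        return new RequestSnapshot
        {
            Path = context.Request.Path.Value,
            Method = context.Request.Method,
            IpAddress = context.Connection.RemoteIpAddress?.ToString()
        };
    }
}

The lambda handed to Enqueue would then close over the snapshot rather than the HttpContext.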

{
foreach (var e in StripWrapperExceptions(exception))
{
SendInBackground(() => BuildMessage(e, tags, userCustomData, currentTime));
Contributor:

Potentially a similar issue here, if BuildMessage relies on resources from the current thread/context:
https://github.com/MindscapeHQ/raygun4net/pull/506/files#r1483673952
