
Optimise resource consumption & drop support for EOL NETFX #506

Merged · 36 commits merged into master from sean/background-message-queuing on Feb 7, 2024

Conversation

@xenolightning (Contributor) commented on Jan 24, 2024:

Changes Summary:

  • Add ThrottledBackgroundMessageProcessor to prevent thread and SNAT exhaustion
  • Stop targeting unsupported NETFX versions; only target 4.6.2+
  • Convert projects to SDK style
  • Remove the old NETFX codebase and pull it into the new projects
  • Remove Client Profile targets

@xenolightning changed the title from "WIP - Use a background queue to reduce resource consumption" to "Optimise resource consumption & drop support for EOL NETFX" on Jan 31, 2024
@@ -22,7 +23,7 @@ public abstract class RaygunClientBase
// The default timeout is 100 seconds for the HttpClient,
Timeout = TimeSpan.FromSeconds(30)
};
Contributor:

One other thing that can help prevent SNAT and thread exhaustion is setting MaxConnectionsPerServer to something less than the unlimited default. It can be achieved by passing an HttpClientHandler:

new HttpClient(new HttpClientHandler
{
    MaxConnectionsPerServer = 50
})
{
    Timeout = TimeSpan.FromSeconds(30)
};

@xenolightning (Contributor Author), Feb 1, 2024:

We did initially look at this as a solution.

The issue with this is that it silently drops errors from being sent, and in our testing it doesn't always solve the SNAT issue 😿

Realistically, SNAT only seems to be an issue on Azure due to the way it shares host networking and load balancing.

Contributor:

We actually tested this initially and found it didn't make any major contribution to reducing the number of ports taken. For example, when set to 10, I was still seeing around ~2000 ports taken in Azure, and of 50k requests only around ~4k succeeded; leaving it at the default, it used around ~3000 ports and ~49k requests succeeded.

@PanosNB previously approved these changes on Feb 4, 2024 and left a comment:

Great work @xenolightning! I am approving this as it looks good; just a few Qs for you.

/// <summary>
/// The maximum queue size for background exceptions
/// </summary>
public int BackgroundMessageQueueMax { get; } = ushort.MaxValue;
Reviewer:

No environment configuration possible for this one?

Contributor Author:

For this particular value, we didn't want it to be configurable using the standard settings approach.

For the RAYGUN_MESSAGE_QUEUE_MAX variable, we wanted to give customers the ability to set the value, but it should be an exceptional case, not something configured through the settings object.


public RaygunSettingsBase()
{
ApiEndpoint = new Uri(DefaultApiEndPoint);

// See if there's an override defined in an environment variable, and set it accordingly
var messageQueueMaxValue = Environment.GetEnvironmentVariable(RaygunMessageQueueMaxVariable);
Reviewer:

This continues from my comment above. So we check for environment initialization separately?

Contributor Author:

Yeah, checking this inline is a bit more difficult because of the try parse, so I put it in the ctor.

We could move all the defaults into the ctor if you think that would be better.
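For illustration, a minimal sketch of that ctor-based override, assuming the RAYGUN_MESSAGE_QUEUE_MAX variable name and the read-only BackgroundMessageQueueMax property shown above; the endpoint constant and parsing bounds are placeholders, not the actual source:

using System;

public abstract class RaygunSettingsBase
{
    // Placeholder endpoint for the sketch; the real default differs.
    private const string DefaultApiEndPoint = "https://api.example.invalid/entries";
    private const string RaygunMessageQueueMaxVariable = "RAYGUN_MESSAGE_QUEUE_MAX";

    public Uri ApiEndpoint { get; set; }

    /// <summary>
    /// The maximum queue size for background exceptions
    /// </summary>
    public int BackgroundMessageQueueMax { get; } = ushort.MaxValue;

    public RaygunSettingsBase()
    {
        ApiEndpoint = new Uri(DefaultApiEndPoint);

        // See if there's an override defined in an environment variable; keep the
        // default when it is absent or doesn't parse to a sensible value.
        var messageQueueMaxValue = Environment.GetEnvironmentVariable(RaygunMessageQueueMaxVariable);

        if (!string.IsNullOrEmpty(messageQueueMaxValue) &&
            int.TryParse(messageQueueMaxValue, out var overrideValue) &&
            overrideValue > 0)
        {
            BackgroundMessageQueueMax = overrideValue;
        }
    }
}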


for (var i = 0; i < numberOfWorkersToStart; i++)
{
_workerTasks.Add(CreateWorkerTask());
Reviewer:

I see, so the thread pool starts empty and then this call on the first SendInBg will fill up all of them (and the first will pick up the task added in ln 49 above). Plus, on every SendInBg call, EnsureWorkers acts as a health-checking mechanism

@xenolightning (Contributor Author), Feb 4, 2024:

Correct, yep!

This is to optimise the usage of the threads. We shouldn't allocate threads if there's no work to do, so we only create the workers on the first message to send.

From there, instead of managing a daemon worker or a timer (which could fault or crash), we just call EnsureWorkers on every message enqueued. This ensures there are workers available whenever we have something to process.

This pairs with the ContinueWith statement, which ensures workers are re-created if a worker faults for whatever reason.
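For readers following along, here is a rough, simplified sketch of that pattern: lazily created workers, a re-check on every enqueue, and a ContinueWith that replaces finished or faulted workers. This is not the merged ThrottledBackgroundMessageProcessor source; the queue item type and several member names are assumptions for illustration.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

internal sealed class ThrottledBackgroundMessageProcessorSketch : IDisposable
{
    private readonly BlockingCollection<Func<Task>> _queue;
    private readonly List<Task> _workerTasks = new List<Task>();
    private readonly int _maxWorkers;
    private readonly object _workerLock = new object();
    private volatile bool _disposed;

    public ThrottledBackgroundMessageProcessorSketch(int maxQueueSize, int maxWorkers)
    {
        // Bounded queue: TryAdd fails (drops the message) rather than growing without limit.
        _queue = new BlockingCollection<Func<Task>>(maxQueueSize);
        _maxWorkers = maxWorkers;
    }

    public bool Enqueue(Func<Task> sendMessage)
    {
        var accepted = _queue.TryAdd(sendMessage);

        // No daemon or timer: every enqueue re-checks that workers exist to drain the queue.
        EnsureWorkers();

        return accepted;
    }

    private void EnsureWorkers()
    {
        if (_disposed)
        {
            return;
        }

        lock (_workerLock)
        {
            _workerTasks.RemoveAll(t => t.IsCompleted);

            while (_workerTasks.Count < _maxWorkers)
            {
                var workerTask = CreateWorkerTask();

                // When a worker finishes, ensure that a new one is created if required.
                workerTask.ContinueWith(_ => EnsureWorkers());

                _workerTasks.Add(workerTask);
            }
        }
    }

    private Task CreateWorkerTask()
    {
        return Task.Run(async () =>
        {
            // Blocks this worker until an item arrives or adding is completed.
            foreach (var sendMessage in _queue.GetConsumingEnumerable())
            {
                try
                {
                    await sendMessage();
                }
                catch (Exception ex)
                {
                    // Send() has its own handling; this is a last-resort trace for anything unexpected.
                    Debug.WriteLine("Exception in queue worker {0}: {1}", Task.CurrentId, ex);
                }
            }
        });
    }

    public void Dispose()
    {
        _disposed = true;

        // Lets the workers drain what is already queued and then exit.
        _queue.CompleteAdding();
    }
}

The real implementation differs in detail, but this captures the enqueue-driven EnsureWorkers call and the ContinueWith replacement behaviour discussed above.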

});

// When a worker finishes, ensure that a new one is created if required
workerTask.ContinueWith(x => { EnsureWorkers(); });
Reviewer:

Should this execute in a finally block and all the above in try?

Reviewer:

Ignore, I now see this is still running on the customer thread and you are declaring a follow-up action, to invoke EnsureWorkers

Contributor Author:

Yep, spot on.

This is a continuation that happens after the worker task has completed or faulted. It allows us to ensure that there are workers to process messages if, for whatever reason, they fault.

}
catch (Exception ex)
{
Debug.WriteLine("Exception in queue worker {0}: {1}", Task.CurrentId, ex);
Reviewer:

Hopefully the passed tasks have their own try-catch logic, so anything application-specific, like network errors, doesn't bubble up here.

Contributor Author:

Yeah, so we mainly call Send() here, which does have its own internal exception handling. This is for the odd case where something unexpected happens, so we still have a trace log of it if debug logging is enabled.

SendInBackground(() => raygunMessage);
}

public void SendInBackground(Func<RaygunMessage> raygunMessage)
Reviewer:

I don't wanna break the spec, but in the future we should consider making this return a boolean with the TryAdd result.

Contributor Author:

We could make this private; it's not in the public surface yet.

Alternatively, we can make this particular overload return a boolean.

I followed the existing convention for the structure here; we could revise the API surface in a follow-up version.
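As a sketch of that follow-up idea, the boolean overload could be as small as the fragment below. The member name is hypothetical, and it assumes the processor's Enqueue surfaces the underlying TryAdd result; it would sit inside RaygunClientBase alongside the existing SendInBackground.

// Hypothetical future overload: same behaviour as SendInBackground, but tells the caller
// whether the message was accepted or dropped because the bounded queue was full.
public bool TrySendInBackground(Func<RaygunMessage> raygunMessage)
{
    return _backgroundMessageProcessor.Enqueue(() => Task.FromResult(raygunMessage()));
}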

@@ -0,0 +1,130 @@
#nullable enable
Reviewer:

Again this file. I see that certain things must be duplicated :-(

Contributor Author:

Yeah, sadly the code here is ever so slightly different; there's no async/await support in the NETFX code.

Sean and others added 20 commits February 8, 2024 09:36
This forces a variable capture in the lambda expression/generated class, which massively reduces the memory used when queuing the messages.

RaygunMessage stores everything as strings, which is fine for short-lived objects, but when there's a massive queue of items the extra string memory is less desirable.
Convert all assembly info into csproj directives

Unlink global assembly info
@@ -2,6 +2,7 @@
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
Contributor:

👀 We should do a cleanup of usings in the next PR.

@@ -27,7 +28,7 @@ public void TearDown()

[Test]
public void Set_ClassName_MethodName_And_LineNumber_Automatically_If_Configured()
{
{
Contributor:

👎

Comment on lines +22 to +27
public IEnumerable<Exception> ExposeStripWrapperExceptions(Exception exception)
{
return CreateWebClient();
return StripWrapperExceptions(exception);
}

Contributor:

Whitespace giving me anxiety...

@phillip-haydon (Contributor) left a comment:

Looking good.

@xenolightning merged commit ef72494 into master on Feb 7, 2024
@xenolightning deleted the sean/background-message-queuing branch on February 7, 2024 21:24
@@ -35,6 +36,7 @@ public abstract class RaygunClientBase
private bool _handlingRecursiveGrouping;

protected readonly RaygunSettingsBase _settings;
private readonly ThrottledBackgroundMessageProcessor _backgroundMessageProcessor;
Contributor:

Hmm, RaygunAspNetClient is still created per request. Doesn't that mean users of RaygunAspNet won't get the SNAT port issues fixed, because there will be multiple instances of the client?
Also, it's IDisposable but never disposed?

Contributor Author:

Using the middleware, you are correct: because the client only controls its own limitations, creating other clients gets around the throttling. This is a larger design choice we want to resolve.

Ideally the middleware shouldn't be creating a new client for each request, but changing that adds complexity.

Alternatively, the message queue could be a singleton; however, that adds another layer of complexity when many clients could be queuing messages.

Calling Dispose in a finalizer would certainly help. Currently, when the client is GC'd, the message queue should also be collected alongside it.
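To illustrate that last point, a conventional dispose pattern on the client base class might look roughly like this. It is a sketch only, not the merged code, and per the usual pattern the finalizer path leaves managed members to their own finalization.

using System;

// Sketch only: shows how the client could own and dispose the background processor.
// The real RaygunClientBase does not implement this today.
public abstract class RaygunClientBaseSketch : IDisposable
{
    private readonly IDisposable _backgroundMessageProcessor;
    private bool _disposed;

    protected RaygunClientBaseSketch(IDisposable backgroundMessageProcessor)
    {
        _backgroundMessageProcessor = backgroundMessageProcessor;
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed)
        {
            return;
        }

        if (disposing)
        {
            // Completes the queue and lets the workers exit.
            _backgroundMessageProcessor.Dispose();
        }

        _disposed = true;
    }

    // Backstop for callers (e.g. per-request middleware) that never call Dispose;
    // managed members are deliberately not touched on this path.
    ~RaygunClientBaseSketch()
    {
        Dispose(false);
    }
}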

await StripAndSend(exception, tags, userCustomData, userInfo);
});

if (!_backgroundMessageProcessor.Enqueue(async () => await BuildMessage(ex, tags, userCustomData, userInfo)))
Contributor:

This can result in an issue depending on when BuildMessage runs. In AspNetCore, BuildMessage attempts to read data from HttpContext. If BuildMessage is executed after HttpContext is disposed, it will either fail or won't get the necessary data.
Shouldn't BuildMessage be called before enqueuing, with Enqueue receiving the built message rather than a Func<RaygunMessage>?

Contributor Author:

The HttpContext is accessed and the data captured before BuildMessage is called, so there shouldn't be a chance of accessing a disposed context.

It does this only in the AspNetCore-specific version of the client.

There are also issues with using ThreadLocal vs AsyncLocal, as raised in another GitHub issue, which will also cause some problems capturing the correct information.

Using the middleware, the clients that are created never actually use this overload, and they aren't using the background message queue either, which is something we should also fix.

If you don't mind, please bear with us as we uncover some of these interesting code nuggets from the past and find resolutions to them.

Cheers,
Sean
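To make the "capture before building" point concrete, here is an illustrative helper, not the actual raygun4net.AspNetCore source, that snapshots request data on the request thread so a deferred BuildMessage never needs the HttpContext itself:

using Microsoft.AspNetCore.Http;

// Plain data object: safe to hand to a background worker after the request has ended.
public sealed class RequestSnapshot
{
    public string Path { get; set; }
    public string Method { get; set; }
    public string IpAddress { get; set; }
}

public static class HttpContextCaptureSketch
{
    // Called synchronously while the request is still alive; the deferred message
    // builder closes over the returned snapshot instead of the HttpContext.
    public static RequestSnapshot Capture(HttpContext context)
    {
        return new RequestSnapshot
        {
            Path = context.Request.Path.Value,
            Method = context.Request.Method,
            IpAddress = context.Connection.RemoteIpAddress?.ToString()
        };
    }
}

The lambda handed to Enqueue would then close over the snapshot rather than the HttpContext.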

{
foreach (var e in StripWrapperExceptions(exception))
{
SendInBackground(() => BuildMessage(e, tags, userCustomData, currentTime));
Contributor:

Potentially a similar issue here, if BuildMessage relies on resources from the current thread/context:
https://github.com/MindscapeHQ/raygun4net/pull/506/files#r1483673952
