-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise resource consumption & drop support for EOL NETFX #506
Conversation
Mindscape.Raygun4Net.NetCore.Common/ThrottledBackgroundMessageProcessor.cs
Show resolved
Hide resolved
Mindscape.Raygun4Net.NetCore.Common/ThrottledBackgroundMessageProcessor.cs
Outdated
Show resolved
Hide resolved
Mindscape.Raygun4Net.NetCore.Common/ThrottledBackgroundMessageProcessor.cs
Outdated
Show resolved
Hide resolved
Mindscape.Raygun4Net.NetCore.Common/ThrottledBackgroundMessageProcessor.cs
Outdated
Show resolved
Hide resolved
Mindscape.Raygun4Net.NetCore.Tests/RaygunClientIntegrationTests.cs
Outdated
Show resolved
Hide resolved
Mindscape.Raygun4Net.NetCore.Common/ThrottledBackgroundMessageProcessor.cs
Outdated
Show resolved
Hide resolved
db2dac7
to
8ac3d03
Compare
@@ -22,7 +23,7 @@ public abstract class RaygunClientBase | |||
// The default timeout is 100 seconds for the HttpClient, | |||
Timeout = TimeSpan.FromSeconds(30) | |||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other thing that can prevent SNAT and thread exhausting is setting MaxConnectionsPerServer
to something less than an unlimited number by default. It can be achieved by passing HttpClientHandler
new HttpClient(new HttpClientHandler
{
MaxConnectionsPerServer = 50
}) {
Timeout = TimeSpan.FromSeconds(30)
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did initially look at this as a solution.
This issue with this is that is silently drops errors from being sent, and in our testing it doesn't always solve the SNAT issue 😿
Realistically SNAT only seems to be an issue on Azure due to the way it shares host networking and load balancing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually tested this initially and found it didn't have any major contribution of reducing the number of ports taken, for example when set to 10, i was still hitting around ~2000 ports being taken in Azure and of 50k requests, only around ~4k would succeed, while leaving it defaulted it tried to use around ~3000 and ~49k requests succeeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @xenolightning ! I am approving this as it looks good; just a few Qs for you
/// <summary> | ||
/// The maximum queue size for background exceptions | ||
/// </summary> | ||
public int BackgroundMessageQueueMax { get; } = ushort.MaxValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No environment configuration possible for this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this particular value, we want it to be configurable using the standard settings approach.
For the RAYGUN_MESSAGE_QUEUE_MAX variable - we wanted to give customers the ability to set the value, but it should be an exceptional case, and should not be configured using the settings object
|
||
public RaygunSettingsBase() | ||
{ | ||
ApiEndpoint = new Uri(DefaultApiEndPoint); | ||
|
||
// See if there's an overload defined in an environment variable, and set it accordingly | ||
var messageQueueMaxValue = Environment.GetEnvironmentVariable(RaygunMessageQueueMaxVariable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This continues from my comment above. So we check for environment initialization separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah checking this code inline is a bit more difficult because of the try parse, so I put it in the ctor.
We could move all defaults into the ctor if you think
Mindscape.Raygun4Net.NetCore.Common/ThrottledBackgroundMessageProcessor.cs
Show resolved
Hide resolved
|
||
for (var i = 0; i < numberOfWorkersToStart; i++) | ||
{ | ||
_workerTasks.Add(CreateWorkerTask()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so the thread pool starts empty and then this call on the first SendInBg will fill up all of them (and the first will pick up the task added in ln 49 above). Plus, on every SendInBg call, EnsureWorkers acts as a health-checking mechanism
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct yep!
This is to optimise the usage of the threads. We shouldn't allocate threads if there's no work to do, so only create the workers on the first message to send.
From there, instead of managing a daemon worker, or a timer (which could fault or crash) we just queue on every message enqueued. This ensures we do the work to ensure there are workers to process something, when we have something to process.
This pairs with the ContinueWith statement - which ensure workers are created if a worker faults for whatever reason
}); | ||
|
||
// When a worker finishes ensure that a new one is is created if required | ||
workerTask.ContinueWith(x => { EnsureWorkers(); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this execute in a finally block and all the above in try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore, I now see this is still running on the customer thread and you are declaring a follow-up action, to invoke EnsureWorkers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, spot on.
This is a continuation that happens after the worker task has completed/faulted. It allows us to ensure that there are workers to process messages if for whatever reason they fault
} | ||
catch (Exception ex) | ||
{ | ||
Debug.WriteLine("Exception in queue worker {0}: {1}", Task.CurrentId, ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully, the passed tasks have their own try-catch logic, so something application-specific, like network errors, do not bubble up here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - so we mainly call Send()
here - which does have it's own internal exception handling. This is for the odd case that something unexpected happens, and we still have a trace log of it, if debug logging is enabled
SendInBackground(() => raygunMessage); | ||
} | ||
|
||
public void SendInBackground(Func<RaygunMessage> raygunMessage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't wanna break the spec but in the future, we should consider making this a boolean and make it return the TryAdd result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make this private, it's not in the public surface yet.
Alternatively we can make this particular overload return a boolean.
I followed the existing convention for the structure here, we could revise the api surface in a follow up version
@@ -0,0 +1,130 @@ | |||
#nullable enable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again this file. I see that certain things must be duplicated :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sadly, the code here is ever so slightly different. There's no async await support in the NETFX code
Need plenty of tests
Maybe fix later
The forces a variable capture in the lambda expression/generated class. This massively reduces the memory used when queuing the messages. RaygunMessage stores everything as strings - which is fine for short lived objects, but when there's a massive queue of items, the extra string memory is less desirable.
Convert all assembly info into csproj directives Unlink global assembly info
4212adb
to
7a9146c
Compare
@@ -2,6 +2,7 @@ | |||
using System.Collections; | |||
using System.Collections.Generic; | |||
using System.Diagnostics; | |||
using System.IO; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👀 we should do a cleanup of usings in next pr.
@@ -27,7 +28,7 @@ public void TearDown() | |||
|
|||
[Test] | |||
public void Set_ClassName_MethodName_And_LineNumber_Automatically_If_Configured() | |||
{ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👎
public IEnumerable<Exception> ExposeStripWrapperExceptions(Exception exception) | ||
{ | ||
return CreateWebClient(); | ||
return StripWrapperExceptions(exception); | ||
} | ||
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whitespace giving me anxiety...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good.
@@ -35,6 +36,7 @@ public abstract class RaygunClientBase | |||
private bool _handlingRecursiveGrouping; | |||
|
|||
protected readonly RaygunSettingsBase _settings; | |||
private readonly ThrottledBackgroundMessageProcessor _backgroundMessageProcessor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, RaygunAspNetClient is still created per request, it means that users using RaygunAspNet won't issues with SNAT ports fixed because there will be multiple instances of the Clients?
Also, it's Disposable but never Disposed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the middleware, you are correct, because the Client only controls it's own limitations, creating other clients gets around the throttling. This is a larger design choice we want to resolve.
Ideally the middleware shouldn't be creating a new client for each request - but it adds complexity.
Alternatively, the message queue could be a singleton, however that adds another layer of complexity when many clients could be queuing messages.
Calling dispose in a deconstructor would certainly help, currently when the client is GC'd the message queue should also be collected alongside it
await StripAndSend(exception, tags, userCustomData, userInfo); | ||
}); | ||
|
||
if (!_backgroundMessageProcessor.Enqueue(async () => await BuildMessage(ex, tags, userCustomData, userInfo))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can result in an issue depending on when BuildMessage runs. In AspNetCore, BuildMessage attempts to read data from HttpContext. If BuildMessage
is executed after HttpContext is disposed, it will either fail or won't get necessary data.
Shouldn't BuildMessage
be called before enqueue and Enqueue
should receive the built message rather than a Func<RaygunMessage>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The HttpContext is accessed and the data captured before BuildMessage
is called, so there shouldn't be a chance of accessing a disposed context.
It does this only in the AspNetCore specific version of the client.
There's also issues with using ThreadLocal vs AsyncLocal - as raised in another GH issue - which will also cause some problems capturing the correct information.
Using the middleware, the clients that are created, never actually use this overload, and aren't using the background message queue either - which is something we should also fix.
If you don't mind bearing with us as we uncover some of these interesting code nuggets from the past, and find resolutions to them.
Cheers,
Sean
{ | ||
foreach (var e in StripWrapperExceptions(exception)) | ||
{ | ||
SendInBackground(() => BuildMessage(e, tags, userCustomData, currentTime)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially similar issue here if BuildMessage
relies on resources fron the current thread/context
https://github.com/MindscapeHQ/raygun4net/pull/506/files#r1483673952
Changes Summary: