Use AutoResetEvent as oppose to ManualResetEventSlim #8575
Conversation
@stephentoub does this surprise you?
ManualResetEventSlim spins by default prior to falling back to a kernel-based blocking wait. AutoResetEvent is just a thin wrapper around a kernel sync primitive, so there's no spinning before it in the managed world. If you want to disable that spinning with MRES, just pass 0 as the second argument to its ctor, e.g. make these https://github.com/dotnet/msbuild/pull/8575/files#diff-4e77c665c92becc2499e6b1ee5e7b673df485cba054b8c0ce31cd4974c9bbe70L1266-L1268 use a spinCount of 0.
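For reference, a minimal sketch of the options being discussed; the variable names are made up, but the constructors are the standard System.Threading ones:
using System.Threading;
// Default: ManualResetEventSlim spins in managed code before falling back to a kernel-based wait.
var spinningEvent = new ManualResetEventSlim(initialState: false);
// Passing spinCount: 0 disables the managed spinning, so Wait() goes straight to the kernel-based wait.
var nonSpinningEvent = new ManualResetEventSlim(initialState: false, spinCount: 0);
// AutoResetEvent is a thin wrapper over a kernel event; there is no managed spinning at all.
var kernelEvent = new AutoResetEvent(initialState: false);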
So I guess the spinning here implies that the lock is typically held for a significant period of time, so spinning is just a waste. @rokonec Maybe try the other ctor on MRES? I wonder if, separately from this change, there is scope to reduce the time spent holding the lock.
@stephentoub @danmoseley For context, this isn't a "lock" per se but rather a sort of throttle. This is an implementation of async logging where writers fill a buffer and one consumer drains the queue on a long-running background thread. Once the buffer fills, additional writers need to wait for there to be capacity. My recommendation to @rokonec would be to use a bounded Channel, since it seems to fit this scenario much better (previously this code used an …). Something along the lines of:
// Initialization
// (Assumes `using System.Threading.Channels;`, a maxMessagesToBuffer value, and fields
//  `_processLogMessageChannel` (Channel<LoggingEvent>) and `_processingTask` (Task).)
var channelOptions = new BoundedChannelOptions(maxMessagesToBuffer)
{
// If full, block
FullMode = BoundedChannelFullMode.Wait,
// Enable single-reader optimizations
SingleReader = true,
};
_processLogMessageChannel = Channel.CreateBounded<LoggingEvent>(channelOptions);
_processingTask = Task.Run(
async () =>
{
while (await _processLogMessageChannel.Reader.WaitToReadAsync().ConfigureAwait(false))
{
while (_processLogMessageChannel.Reader.TryRead(out LoggingEvent? loggingEvent))
{
// Do the thing
}
}
});
// Enqueuing
public async Task EnqueueAsync(LoggingEvent loggingEvent)
{
ValueTask writeTask = _processLogMessageChannel.Writer.WriteAsync(loggingEvent);
if (!writeTask.IsCompleted)
{
// If full, block until it's written.
await writeTask;
}
}
// Draining
public async Task DrainAsync()
{
_processLogMessageChannel.Writer.Complete();
await _processLogMessageChannel.Reader.Completion;
await _processingTask;
}
@stephentoub with .NET Core MRES I can't repro this issue. Unfortunately, the .NET Framework MRES implementation is quite different; see https://referencesource.microsoft.com/#mscorlib/system/threading/ManualResetEventSlim.cs,542 for its weird combination of spinning, yielding, and sleeping.
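To illustrate the point, here is a rough, simplified paraphrase of that spin phase; the method name, constants, and thresholds are illustrative, not the actual reference-source code:
using System;
using System.Threading;
// Each blocked waiter cycles through short busy-spins, then a mix of Thread.Yield,
// Sleep(0), and Sleep(1), before finally falling back to a kernel wait. With many
// producers blocked at once, the repeated yields become a flood of context switches.
static void SpinPhase(Func<bool> isSet, int spinCount)
{
    const int busySpinIterations = 10;   // illustrative threshold
    for (int i = 0; i < spinCount && !isSet(); i++)
    {
        if (i < busySpinIterations)
        {
            Thread.SpinWait(4 << i);     // short busy-wait, growing each iteration
        }
        else if (i % 20 == 0)
        {
            Thread.Sleep(1);             // give up the remainder of the time slice
        }
        else if (i % 5 == 0)
        {
            Thread.Sleep(0);             // yield only to ready threads of equal priority
        }
        else
        {
            Thread.Yield();              // yield to any ready thread => context switch
        }
    }
    // After the spin phase, the real implementation falls back to a kernel-based wait.
}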
@dfederm as for Channel, it is an interesting idea and I will play with it when I have some time. However, since MSBuild is quite low in the stack and needs to support source-build, we try to avoid external dependencies if possible. Additionally, any external dependency can cause assembly hell in VS and often has to be handled by assembly binding redirects. The purpose of this PR is to fix the bug, and the code provided does so. Since this will have to go into servicing for VS 17.4, I am trying to make minimal, safe changes. I am open to advice on how best to implement that.
For a servicing fix, seems reasonable then if this small change mitigates the issue.
nit: It looks like you don't need to call _enqueueEvent.Reset() on line 1297 anymore.
Also, is it really expected to be logging at such a high rate that a 200k-item queue fills up?
@ladipro asked:
Using /bl triggers "diag" logging verbosity, which causes a lot of messages to flow through the system. For orchard I started to see this only for …
Should diag verbosity cause a larger queue?
This is a combination of a few things, which all make sense individually. When combined, that can mean that on a big machine + a big repo + … the queue can fill up. But I think that even with all of that considered together, the overall thing still makes sense, with the backpressure that we can now provide from the logger back to the event production.
I would rather change the queue size for all configurations than tweak it by logger verbosity, personally. The existing values aren't written in stone, but we did see OOM failures caused by the prior (broken) bounding approach, and I'd like any future fixes to that to be easy to diagnose and tweak.
From @rokonec's screenshot it looks like the average cost of one event is ~60 bytes. Even if it were 100, I'd say the queue should be much larger than 200k. In massive builds, spending hundreds of megabytes is totally justifiable if the alternative is to block execution, maybe with the exception of running in a 32-bit process, where we want to stay frugal.
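Back-of-the-envelope, assuming ~100 bytes per event as an upper bound on the ~60 bytes measured (the bounds below are illustrative):
const long bytesPerEvent = 100;                      // assumed upper bound (~60 B measured)
const long currentBound = 200_000;
const long largerBound = 2_000_000;
long currentMemory = currentBound * bytesPerEvent;   // 20,000,000 B ≈ 19 MB at the existing 200k bound
long largerMemory = largerBound * bytesPerEvent;     // 200,000,000 B ≈ 190 MB even at a 10x larger bound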
Hello! I noticed that you're targeting one of our servicing branches. Please consider updating the version.
* Concurrency bug fix - BuildManager instances acquire its own BuildTelemetry instance (#8561)
* BuildManager instances acquire its own BuildTelemetry instance (#8444)
  Fixes https://devdiv.visualstudio.com/DevDiv/_workitems/edit/1708215
  Context: In VS there are multiple instances of BuildManager called asynchronously, for DTB and normal build and maybe others which I have not identified yet.
  Changes Made: BuildManager instances acquire their own BuildTelemetry instance as opposed to sharing a single BuildTelemetry instance in a non-thread-safe manner.
  Testing: Locally.
  Conflicts: src/Build/BackEnd/Client/MSBuildClient.cs - resolved with a minimal and safe approach.
* Bumping version
* Turn off static graph restore. (#8498)
  Our CI builds fail because of bug NuGet/Home#12373. It is fixed in NuGet/NuGet.Client#5010; we are waiting for it to flow to CI machines. Meanwhile this PR applies a workaround. Note: this needs to be reverted once that happens.
  Co-authored-by: AR-May <67507805+AR-May@users.noreply.github.com>
* Use AutoResetEvent as oppose to ManualResetEventSlim (#8575)
  Summary: see the PR description below.
  Co-authored-by: Roman Konecny <rokonecn@microsoft.com>
  Co-authored-by: AR-May <67507805+AR-May@users.noreply.github.com>
Summary
Customers, mainly internal ones like XStore, with huge repos, using msbuild /graph /bl on powerful development and build computers, might experience a 15x+ regression in evaluation time. It has been identified as a performance bug in our logging event pub/sub mechanism. When the ingest queue reaches its bound, on .NET Framework (net472) ManualResetEventSlim causes way too many Thread.Yield calls, flooding the system with thread context switches. This hypothesis has been verified with the PerfMon performance counter System.ContextSwitches. Although counterintuitive, AutoResetEvent, ManualResetEvent, or even spin locking produced better behavior, and with those the issue no longer reproduces.
Customer Impact
In the case of XStore it was about 7 minutes of build time.
Regression?
Yes, introduced in VS 17.4.
Testing
Manual validation by @rokonec and automated tests. A local repro was used to verify the changes fixed it.
Risk
Low
Note
It affects only VS MSBuild.exe. In dotnet build, ManualResetEventSlim works better.
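For illustration, a minimal sketch of the bounded-queue throttle pattern this change relies on: producers block on an AutoResetEvent when the queue is at capacity, and the single consumer signals as it drains. The class, the 200k bound, and the field names are modeled on the discussion above, not copied from the actual MSBuild code.
using System.Collections.Concurrent;
using System.Threading;
class BoundedLoggingQueue
{
    private const int QueueBound = 200_000;                                      // illustrative bound
    private readonly ConcurrentQueue<object> _queue = new ConcurrentQueue<object>();
    private readonly AutoResetEvent _enqueueEvent = new AutoResetEvent(false);   // signaled when an item is added
    private readonly AutoResetEvent _dequeueEvent = new AutoResetEvent(false);   // signaled when an item is removed

    // Producer side: block (kernel wait, no managed spinning) while the queue is at capacity.
    public void Enqueue(object loggingEvent)
    {
        while (_queue.Count >= QueueBound)
        {
            _dequeueEvent.WaitOne();
        }
        _queue.Enqueue(loggingEvent);
        _enqueueEvent.Set();
    }

    // Single consumer: drain the queue on a long-running thread, waking blocked producers as it goes.
    public void ConsumeLoop(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            while (_queue.TryDequeue(out object loggingEvent))
            {
                // Process the logging event here.
                _dequeueEvent.Set();        // release one blocked producer
            }
            _enqueueEvent.WaitOne(100);     // wait for more work, with a timeout to re-check cancellation
        }
    }
}
Because each Set wakes at most one waiter and WaitOne has no managed spin phase, a full queue parks producers in the kernel instead of flooding the scheduler with yields.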