
v2.5 Backlog: Use AutoResetEvent for backlog thread lingering #2008

Merged
merged 2 commits into main from craver/backlog-autoreset on Feb 24, 2022

Conversation

NickCraver
Collaborator

@NickCraver NickCraver commented Feb 24, 2022

This prevents so many threads from starting/stopping as we finish flushing a backlog. In short: starting a Thread is expensive, really expensive in the grand scheme of things. Because the thread ended immediately when a backlog finished flushing, it had a decent chance of having to start right back up to get the next item if another backlog was triggered by the lock transfer.

The act of finishing the backlog itself was happening inside the lock, and exiting could take a moment, causing an immediate re-queue of a follow-up item. This meant lots of threads starting under high-contention parallel load, leading to higher CPU, more thread starts, and more allocations from those thread starts.
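For a concrete picture of the churn, here's a minimal sketch of the pre-fix shape (names and structure are illustrative, not the actual PhysicalBridge internals): the backlog thread exits the moment the queue drains, so a follow-up item queued an instant later pays for a brand-new thread start.

```csharp
// Illustrative sketch only - not the real StackExchange.Redis backlog code.
using System.Collections.Concurrent;
using System.Threading;

class BacklogBeforeSketch
{
    private readonly ConcurrentQueue<string> _backlog = new();
    private int _processorActive; // 0 = no backlog thread currently running

    public void Enqueue(string message)
    {
        _backlog.Enqueue(message);
        if (Interlocked.CompareExchange(ref _processorActive, 1, 0) == 0)
        {
            // Starting a Thread is the expensive part; under contention this fires constantly.
            new Thread(ProcessBacklog) { IsBackground = true }.Start();
        }
    }

    private void ProcessBacklog()
    {
        while (_backlog.TryDequeue(out var message))
        {
            // ...flush the message to the connection...
        }
        // The thread dies immediately; the very next enqueue spins up another one.
        Interlocked.Exchange(ref _processorActive, 0);
    }
}
```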

Here's a memory view:

(Before/after memory profiler screenshots)

(note the 570k `object[]` instances - those are almost entirely from thread starts)

Here are thread views:

(Before/after thread view screenshots)

(note the scrollbar size)

This was initially discovered while testing some of the heavy parallel scenarios against 1.2.6. For an example scenario with a heavy ops load (pictured in the profiles above):

| Run | 1.2.6 | 2.2.88 | main (v2.5.x) | After PR |
|-----|-------|--------|---------------|----------|
| 1 | 82,426 ms | 91,135 ms | 99,262 ms | 84,562 ms |
| 2 | 82,335 ms | 90,674 ms | 100,462 ms | 86,211 ms |
| 3 | 82,059 ms | 91,041 ms | 99,968 ms | 86,283 ms |
| 4 | 82,435 ms | 90,645 ms | 102,968 ms | 86,153 ms |

Note that this is an improvement over 2.2.88 even with the new backlog functionality. We're not quite back to 1.2.6 levels of performance, but a) we're closer, and b) a lot of things are more correct in 2.x, and there's a cost to that. I'm very happy with the wins here.

All of the above is timings, but CPU usage for the same load is dramatically lower as well, though how much will depend on workload.

Example code:

using StackExchange.Redis;
using System.Diagnostics;

Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();

var taskList = new List<Task>();
var options = ConfigurationOptions.Parse("localhost");
var connection = ConnectionMultiplexer.Connect(options);
for (int i = 0; i < 10; i++)
{
	var i1 = i;
	var task = new Task(() => Run(i1, connection));
	task.Start();
	taskList.Add(task);
}

Task.WaitAll(taskList.ToArray());
stopwatch.Stop();
Console.WriteLine($"Done. {stopwatch.ElapsedMilliseconds} ms");

static void Run(int taskId, ConnectionMultiplexer connection)
{
	Console.WriteLine($"{taskId} Started");
	var database = connection.GetDatabase(0);

	for (int i = 0; i < 100000; i++)
	{
		database.StringSet(i.ToString(), i.ToString());
	}

	Console.WriteLine($"{taskId} Insert completed");

	for (int i = 0; i < 100000; i++)
	{
		var result = database.StringGet(i.ToString());
	}
	Console.WriteLine($"{taskId} Completed");
}

Anyway...yeah, this was a problem. An AutoResetEvent is the best way I can think of to solve it. Throwing this up for review; maybe we have an even better idea of how to solve it.
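For reference, the fix is roughly this shape (a sketch under assumptions, with illustrative names rather than the real PhysicalBridge members): after draining, the thread parks on the AutoResetEvent with a timeout instead of exiting, so a burst arriving right behind it reuses the same thread.

```csharp
// Sketch of the lingering pattern, assuming a 5-second linger window.
// The real code also has to handle the race where an item is enqueued just as
// the thread gives up; this sketch glosses over that.
using System;
using System.Collections.Concurrent;
using System.Threading;

class BacklogLingerSketch
{
    private readonly ConcurrentQueue<string> _backlog = new();
    private readonly AutoResetEvent _backlogAutoReset = new(false);
    private int _processorActive;

    public void Enqueue(string message)
    {
        _backlog.Enqueue(message);
        _backlogAutoReset.Set(); // wake the lingering thread if one is parked
        if (Interlocked.CompareExchange(ref _processorActive, 1, 0) == 0)
        {
            new Thread(ProcessBacklog) { IsBackground = true }.Start();
        }
    }

    private void ProcessBacklog()
    {
        do
        {
            while (_backlog.TryDequeue(out var message))
            {
                // ...flush the message to the connection...
            }
            // Linger: wait up to 5 seconds for more work before letting the thread die.
        } while (_backlogAutoReset.WaitOne(TimeSpan.FromSeconds(5)));

        Interlocked.Exchange(ref _processorActive, 0);
    }
}
```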

This prevents so many threads from starting/stopping as we finish flushing a backlog.
@NickCraver NickCraver marked this pull request as ready for review February 24, 2022 03:25
@NickCraver NickCraver changed the title v2.5 Backlog: Use Autoreset for backlog thread lingering v2.5 Backlog: Use AutoResetEvent for backlog thread lingering Feb 24, 2022
@NickCraver NickCraver merged commit d59d34e into main Feb 24, 2022
@NickCraver NickCraver deleted the craver/backlog-autoreset branch February 24, 2022 15:04
@@ -909,7 +913,7 @@ private async Task ProcessBacklogAsync()
// TODO: vNext handoff this backlog to another primary ("can handle everything") connection
// and remove any per-server commands. This means we need to track a bit of whether something
// was server-endpoint-specific in PrepareToPushMessageToBridge (was the server ref null or not)
-await ProcessBridgeBacklogAsync(); // Needs handoff
+await ProcessBridgeBacklogAsync().ConfigureAwait(false); // Needs handoff
Collaborator

Use .ForAwait() instead of .ConfigureAwait(false)?
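(For context, and as an assumption about the codebase convention rather than a quote from it: ForAwait() is the library's internal extension-method shorthand for ConfigureAwait(false), roughly the following.)

```csharp
// Assumed shape of the internal ForAwait() helper; not copied from the repo.
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

internal static class ForAwaitSketch
{
    internal static ConfiguredTaskAwaitable ForAwait(this Task task) => task.ConfigureAwait(false);
}
```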

Collaborator

The new backlog code added quite a few other awaited calls without .ForAwait(). Do those need to be updated too?

Collaborator Author

This isn't actually relevant either way - I was testing an allocation thing - can be removed entirely (we kick it off without a context in a thread with none). Can tidy as a follow-up though!

/// This allows us to keep the thread around for a full flush and prevent "feathering the throttle" trying
/// to flush it. In short, we don't start and stop so many threads with a bit of linger.
/// </summary>
private readonly AutoResetEvent _backlogAutoReset = new AutoResetEvent(false);
Contributor

I don't see where this is disposed?

NickCraver added a commit that referenced this pull request Feb 24, 2022
A few tweaks to the changes #2008 for disposing and normalization.
@NickCraver NickCraver mentioned this pull request Feb 24, 2022
NickCraver added a commit that referenced this pull request Feb 24, 2022
A few tweaks to the changes #2008 for disposing and normalization.
NickCraver added a commit that referenced this pull request Feb 26, 2022
Alrighty, #2008 did something exceedingly stupid: it lingered *inside the lock*, jamming the connection and backlog during linger. Instead, this moves the thread preservation up in a much cleaner way and doesn't occupy the lock.

Also adds a SpinningDown status so we can see it properly in exceptions, always.
NickCraver added a commit that referenced this pull request Feb 26, 2022
In troubleshooting these 2 tests, I realized what's happening: a really dumb placement mistake in #2008. Now, instead of lingering inside the damn lock, it loops outside, a bit cleaner and higher up. Performance wins are the same, but it's a lot saner and doesn't block both the backlog and the writer for another 5 seconds. Now only the thread lingers, and it'll try to get the lock when running another pass, if any work arrives in the next 5 seconds.
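Continuing the earlier sketch, the corrected shape keeps the linger outside the writer lock (TryAcquireWriteLock/ReleaseWriteLock below are hypothetical stand-ins for the bridge's single-writer lock): the lock is held only while a pass is actually flushing, and only the idle thread waits between passes.

```csharp
// Sketch only: linger outside the lock so neither the writer nor new backlog
// work is blocked while the thread is just waiting for more items.
private void ProcessBacklog()
{
    do
    {
        if (TryAcquireWriteLock()) // hypothetical helper for the single-writer lock
        {
            try
            {
                while (_backlog.TryDequeue(out var message))
                {
                    // ...flush the message to the connection...
                }
            }
            finally
            {
                ReleaseWriteLock(); // hold the lock only for the flush itself
            }
        }
        // Wait for more work *outside* the lock, for up to 5 seconds.
    } while (_backlogAutoReset.WaitOne(TimeSpan.FromSeconds(5)));
}
```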