
v2.5 Backlog: Use AutoResetEvent for backlog thread lingering #2008

Merged
merged 2 commits into main from craver/backlog-autoreset on Feb 24, 2022

Conversation

NickCraver
Collaborator

@NickCraver NickCraver commented Feb 24, 2022

This prevents so many threads from starting/stopping as we finish flushing a backlog. In short: starting a Thread is expensive, really expensive in the grand scheme of things. Because the thread ended immediately when a backlog finished flushing, it had a decent chance of having to start right back up to get the next item if another backlog was triggered by the lock transfer.

The act of finishing the backlog itself was happening inside the lock, and exiting could take a moment, causing an immediate re-queue of a follow-up item. This meant lots of threads starting under high-contention parallel load, leading to higher CPU, more thread starts, and more allocations from those thread starts.
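For a concrete picture of the churn, here's a minimal sketch of the pre-fix shape (names and structure are illustrative, not the actual PhysicalBridge internals): the backlog thread exits the moment the queue drains, so a follow-up item queued an instant later pays for a brand-new thread start.

```csharp
// Illustrative sketch only - not the real StackExchange.Redis backlog code.
using System.Collections.Concurrent;
using System.Threading;

class BacklogBeforeSketch
{
    private readonly ConcurrentQueue<string> _backlog = new();
    private int _processorActive; // 0 = no backlog thread currently running

    public void Enqueue(string message)
    {
        _backlog.Enqueue(message);
        if (Interlocked.CompareExchange(ref _processorActive, 1, 0) == 0)
        {
            // Starting a Thread is the expensive part; under contention this fires constantly.
            new Thread(ProcessBacklog) { IsBackground = true }.Start();
        }
    }

    private void ProcessBacklog()
    {
        while (_backlog.TryDequeue(out var message))
        {
            // ...flush the message to the connection...
        }
        // The thread dies immediately; the very next enqueue spins up another one.
        Interlocked.Exchange(ref _processorActive, 0);
    }
}
```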

Here's a memory view:

(Before/after memory profiler screenshots)

(note the 570k `object[]` instances - those are almost entirely from thread starts)

Here are thread views:

(Before/after thread view screenshots)

(note the scrollbar size)

This was initially discovered while testing some of the heavy parallel scenarios against 1.2.6. For an example scenario with a heavy ops load (pictured in the profiles above):

| Run | 1.2.6 | 2.2.88 | main (v2.5.x) | After PR |
|-----|-------|--------|---------------|----------|
| 1 | 82,426 ms | 91,135 ms | 99,262 ms | 84,562 ms |
| 2 | 82,335 ms | 90,674 ms | 100,462 ms | 86,211 ms |
| 3 | 82,059 ms | 91,041 ms | 99,968 ms | 86,283 ms |
| 4 | 82,435 ms | 90,645 ms | 102,968 ms | 86,153 ms |

Note that this is an improvement over 2.2.88 even with the new backlog functionality. We're not quite back to 1.2.6 levels of performance, but a) we're closer, and b) a lot of things are more correct in 2.x, and there's a cost to that. I'm very happy with the wins here.

All of the above is timings, but CPU usage for the same load is dramatically lower as well, though how much will depend on workload.

Example code:

using StackExchange.Redis;
using System.Diagnostics;

Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();

var taskList = new List<Task>();
var options = ConfigurationOptions.Parse("localhost");
var connection = ConnectionMultiplexer.Connect(options);
for (int i = 0; i < 10; i++)
{
	var i1 = i;
	var task = new Task(() => Run(i1, connection));
	task.Start();
	taskList.Add(task);
}

Task.WaitAll(taskList.ToArray());
stopwatch.Stop();
Console.WriteLine($"Done. {stopwatch.ElapsedMilliseconds} ms");

static void Run(int taskId, ConnectionMultiplexer connection)
{
	Console.WriteLine($"{taskId} Started");
	var database = connection.GetDatabase(0);

	for (int i = 0; i < 100000; i++)
	{
		database.StringSet(i.ToString(), i.ToString());
	}

	Console.WriteLine($"{taskId} Insert completed");

	for (int i = 0; i < 100000; i++)
	{
		var result = database.StringGet(i.ToString());
	}
	Console.WriteLine($"{taskId} Completed");
}

Anyway...yeah, this was a problem. An AutoResetEvent is the best way I can think of to solve it. Throwing this up for review; maybe we have an even better idea of how to solve it.
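For reference, the fix is roughly this shape (a sketch under assumptions, with illustrative names rather than the real PhysicalBridge members): after draining, the thread parks on the AutoResetEvent with a timeout instead of exiting, so a burst arriving right behind it reuses the same thread.

```csharp
// Sketch of the lingering pattern, assuming a 5-second linger window.
// The real code also has to handle the race where an item is enqueued just as
// the thread gives up; this sketch glosses over that.
using System;
using System.Collections.Concurrent;
using System.Threading;

class BacklogLingerSketch
{
    private readonly ConcurrentQueue<string> _backlog = new();
    private readonly AutoResetEvent _backlogAutoReset = new(false);
    private int _processorActive;

    public void Enqueue(string message)
    {
        _backlog.Enqueue(message);
        _backlogAutoReset.Set(); // wake the lingering thread if one is parked
        if (Interlocked.CompareExchange(ref _processorActive, 1, 0) == 0)
        {
            new Thread(ProcessBacklog) { IsBackground = true }.Start();
        }
    }

    private void ProcessBacklog()
    {
        do
        {
            while (_backlog.TryDequeue(out var message))
            {
                // ...flush the message to the connection...
            }
            // Linger: wait up to 5 seconds for more work before letting the thread die.
        } while (_backlogAutoReset.WaitOne(TimeSpan.FromSeconds(5)));

        Interlocked.Exchange(ref _processorActive, 0);
    }
}
```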

This prevents so many threads from starting/stopping as we finish flushing a backlog.
@NickCraver NickCraver marked this pull request as ready for review February 24, 2022 03:25
@NickCraver NickCraver changed the title v2.5 Backlog: Use Autoreset for backlog thread lingering v2.5 Backlog: Use AutoResetEvent for backlog thread lingering Feb 24, 2022
@NickCraver NickCraver merged commit d59d34e into main Feb 24, 2022
@NickCraver NickCraver deleted the craver/backlog-autoreset branch February 24, 2022 15:04
@@ -909,7 +913,7 @@ private async Task ProcessBacklogAsync()
// TODO: vNext handoff this backlog to another primary ("can handle everything") connection
// and remove any per-server commands. This means we need to track a bit of whether something
// was server-endpoint-specific in PrepareToPushMessageToBridge (was the server ref null or not)
-await ProcessBridgeBacklogAsync(); // Needs handoff
+await ProcessBridgeBacklogAsync().ConfigureAwait(false); // Needs handoff
Collaborator

Use .ForAwait() instead of .ConfigureAwait(false)?
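(For context, and as an assumption about the codebase convention rather than a quote from it: ForAwait() is the library's internal extension-method shorthand for ConfigureAwait(false), roughly the following.)

```csharp
// Assumed shape of the internal ForAwait() helper; not copied from the repo.
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

internal static class ForAwaitSketch
{
    internal static ConfiguredTaskAwaitable ForAwait(this Task task) => task.ConfigureAwait(false);
}
```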

Collaborator

The new backlog code added quite a few other awaited calls without .ForAwait(). Do those need to be updated too?

Collaborator Author

This isn't actually relevant either way - I was testing an allocation thing - can be removed entirely (we kick it off without a context in a thread with none). Can tidy as a follow-up though!

/// This allows us to keep the thread around for a full flush and prevent "feathering the throttle" trying
/// to flush it. In short, we don't start and stop so many threads with a bit of linger.
/// </summary>
private readonly AutoResetEvent _backlogAutoReset = new AutoResetEvent(false);
Contributor

I don't see where this is disposed?

NickCraver added a commit that referenced this pull request Feb 24, 2022
A few tweaks to the changes #2008 for disposing and normalization.
@NickCraver NickCraver mentioned this pull request Feb 24, 2022
NickCraver added a commit that referenced this pull request Feb 24, 2022
A few tweaks to the changes #2008 for disposing and normalization.
NickCraver added a commit that referenced this pull request Feb 26, 2022
Alrighty, #2008 did something exceedingly stupid: it lingered *inside the lock*, jamming the connection and backlog during linger. Instead, this moves the thread preservation up in a much cleaner way and doesn't occupy the lock.

Also adds a SpinningDown status so we can see it properly in exceptions, always.
NickCraver added a commit that referenced this pull request Feb 26, 2022
In troubleshooting these 2 tests, I realized what's happening: a really dumb placement mistake in #2008. Now, instead of lingering inside the damn lock, it loops outside, a bit cleaner and higher up. Performance wins are the same, but it's a lot saner and doesn't block both the backlog and the writer for another 5 seconds. Now only the thread lingers, and it'll try to get the lock when running another pass, if any work arrives in the next 5 seconds.
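Continuing the earlier sketch, the corrected shape keeps the linger outside the writer lock (TryAcquireWriteLock/ReleaseWriteLock below are hypothetical stand-ins for the bridge's single-writer lock): the lock is held only while a pass is actually flushing, and only the idle thread waits between passes.

```csharp
// Sketch only: linger outside the lock so neither the writer nor new backlog
// work is blocked while the thread is just waiting for more items.
private void ProcessBacklog()
{
    do
    {
        if (TryAcquireWriteLock()) // hypothetical helper for the single-writer lock
        {
            try
            {
                while (_backlog.TryDequeue(out var message))
                {
                    // ...flush the message to the connection...
                }
            }
            finally
            {
                ReleaseWriteLock(); // hold the lock only for the flush itself
            }
        }
        // Wait for more work *outside* the lock, for up to 5 seconds.
    } while (_backlogAutoReset.WaitOne(TimeSpan.FromSeconds(5)));
}
```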