
Parallel.For performance degrades significantly after 2.14 billion milliseconds of system uptime #87543

Closed
brandonagr opened this issue Jun 14, 2023 · 5 comments · Fixed by #87800

Comments

@brandonagr

Description

When Environment.TickCount wraps around after 24.9 days of system uptime (int.MaxValue milliseconds), Parallel.For loops start exhibiting poor performance, generating far more tasks than expected.

We have a production service that uses Parallel.For to process hundreds of millions of items. After 24.9 days of uptime we noticed our calculation process running half as fast and not using as much CPU. A typical log entry for a loop looks like this:

Count: 13,039,697
Time: 0.463
Workers: 64
Top Work: 0.462 256,561, 0.461 274,515, 0.460 140,719

After 24.9 days of uptime it takes twice as long and looks like:

Count: 13,039,697
Time: 0.871
Workers: 733,401
Top Work: 0.836 1,309,099, 0.008 16, 0.008 16

Note how every worker task except one completes only 16 iterations of the work.

I suspect that TaskReplicator.GenerateCooperativeMultitaskingTaskTimeout() is returning a negative timeout:
https://github.com/dotnet/runtime/blob/v7.0.7/src/libraries/System.Threading.Tasks.Parallel/src/System/Threading/Tasks/TaskReplicator.cs#L170

which causes every loop inside the TaskReplicator.Run action to break out of its do-while loop almost immediately:
https://github.com/dotnet/runtime/blob/v7.0.7/src/libraries/System.Threading.Tasks.Parallel/src/System/Threading/Tasks/Parallel.cs#L1091

For loops processing 100k items, instead of tens of worker tasks being used, thousands of worker tasks are spawned, resulting in significantly degraded performance.
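For illustration, here is a minimal sketch of the suspected mechanism (not the actual runtime source: the constant values and the period of 64 used below are assumptions for the example). With a negative Environment.TickCount, the jittered timeout can itself become negative, which puts the deadline in the past so each worker yields after its first small chunk of iterations:

using System;

// Rough illustration of the negative-timeout hypothesis above. The constants and the
// period of 64 (the worker count from the production logs) are assumed values, not
// the real TaskReplicator internals.
class TickCountWrapSketch
{
	const int TimeoutMin = 100;       // ms
	const int TimeoutIncrement = 50;  // ms
	const int Period = 64;

	static int GenerateTimeoutSketch(int tickCount)
	{
		// In C#, a negative tickCount makes (tickCount % Period) negative,
		// so the "min + jitter" timeout can drop well below zero.
		return TimeoutMin + (tickCount % Period) * TimeoutIncrement;
	}

	static void Main()
	{
		foreach (int tick in new[] { 1_000_000_063, -1_000_000_063 })
		{
			int timeout = GenerateTimeoutSketch(tick);
			// The worker effectively compares "now" against "start + timeout"; a negative
			// timeout puts that deadline in the past, so the worker bails out after its
			// first chunk of iterations and a replacement task gets queued.
			int deadline = unchecked(tick + timeout);
			bool yieldsImmediately = tick >= deadline;
			Console.WriteLine($"TickCount={tick,12}  timeout={timeout,6} ms  yields immediately: {yieldsImmediately}");
		}
	}
}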

Reproduction Steps

On a system where Environment.TickCount is negative (after 24.9 days of system uptime), run the following console app, which prints how Parallel.For split up the work. Far more workers are spawned than expected, and each one processes only 16 items (other than the root task, which does the bulk of the work).

Run this .NET 7 console app:

using System.Diagnostics;

int maxDop = 32;
if (args.Length != 0)
{
	if (!int.TryParse(args[0], out maxDop))
	{
		maxDop = 32;
	}
}

Console.WriteLine("Running with MaxDop {0}", maxDop);
EvaluationAndDedupeWorkerData.Process(100_000, maxDop);

class EvaluationAndDedupeWorkerData
{
	public int Iterations;
	public readonly Stopwatch Level;
	public readonly List<KeyValuePair<long, long>> DataPointCalcTime = new();

	public EvaluationAndDedupeWorkerData()
	{
		Iterations = 0;
		Level = Stopwatch.StartNew();
	}

	public static void Process(int count, int maxDop)
	{
		var workAnalysis = new List<Tuple<double, int>>();
		var sw = Stopwatch.StartNew();

		Parallel.For(0, count, new ParallelOptions { MaxDegreeOfParallelism = maxDop },
			() => {
				return new EvaluationAndDedupeWorkerData();
			},
			(evalIndex, _, local) => {

				local.Iterations++;

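				// Busy-wait to simulate a fixed amount of CPU-bound work per item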
				int dummy = 0;
				while (dummy < 100_000)
				{
					dummy++;
				}

				return local;
			},
			local => {
				lock (workAnalysis)
				{
					workAnalysis.Add(Tuple.Create(local.Level.Elapsed.TotalSeconds, local.Iterations));
				}
			});

		var workString = string.Join(", ", workAnalysis.OrderByDescending(x => x.Item1).Select(x => $"{x.Item1:0.000} {x.Item2:N0}").Take(5));
		Console.WriteLine($"Count: {count:N0} maxDop: {maxDop} time: {sw.Elapsed.TotalSeconds:0.000} workers: {workAnalysis.Count:N0} Top Work: {workString}");
	}
}

Expected behavior

Count: 100,000 maxDop: 32 time: 1.304 workers: 57 Top Work: 1.272 4,587, 1.060 4,047, 0.968 4,692, 0.944 3,541, 0.859 3,339

The top-work list shows the time spent by each worker task and the number of iterations it completed; the numbers are expected to be roughly evenly distributed.

Actual behavior

Count: 100,000 maxDop: 32 time: 2.304 workers: 4,352 Top Work: 1.272 24,587, 0.060 16, 0.010 16, ...

Instead of the iterations being evenly spread over workers, thousands of extra workers are spawned, each completing only 16 iterations.

Regression?

This code appears to have been unchanged for 8+ years, so it has probably always been broken in this way.

Known Workarounds

Reboot the server so that Environment.TickCount is no longer negative.

Configuration

Tested with .NET 7.0.5 and 7.0.7, running on Windows Server 2022.

Other information

No response

@ghost added the untriaged label Jun 14, 2023
@dotnet-issue-labeler bot added the needs-area-label label Jun 14, 2023
@brandonagr
Author

.NET Framework 4.8 appears to have a different timer implementation that doesn't use Environment.TickCount:

https://referencesource.microsoft.com/#mscorlib/system/threading/Tasks/Parallel.cs,1226
https://referencesource.microsoft.com/#mscorlib/system/threading/Tasks/Parallel.cs,3580

@huoyaoyuan added the area-System.Threading.Tasks label and removed the needs-area-label label Jun 14, 2023
@ghost

ghost commented Jun 14, 2023

Tagging subscribers to this area: @dotnet/area-system-threading-tasks
See info in area-owners.md if you want to be subscribed.


@huoyaoyuan
Member

It should be an easy fix to use Environment.TickCount64 instead.
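For example (just a sketch of the idea, not necessarily what the fix PR does; the names and constants below are made up), the deadline math could be done with 64-bit milliseconds so it never wraps or goes negative during realistic uptimes:

using System;

// Sketch only: compute the cooperative-multitasking deadline from Environment.TickCount64
// so it stays monotonic and non-negative. Names and constants are illustrative, not the
// real TaskReplicator API.
static class CooperativeTimeoutSketch
{
	const long TimeoutMinMs = 100;
	const long TimeoutIncrementMs = 50;

	public static long ComputeDeadline()
	{
		// TickCount64 doesn't wrap for ~292 million years, so the jitter term and the
		// comparison below are always well ordered.
		long now = Environment.TickCount64;
		long jitter = (now % Environment.ProcessorCount) * TimeoutIncrementMs;
		return now + TimeoutMinMs + jitter;
	}

	public static bool TimeoutReached(long deadline) => Environment.TickCount64 >= deadline;
}

Usage would mirror the existing pattern: compute the deadline once when a worker starts and check TimeoutReached inside the loop to decide when to yield.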

@AmbachtIT

I'm just here to compliment whoever managed to track down and reproduce this. That's a great find!

@danmoseley
Member

danmoseley commented Jun 18, 2023

Presumably it starts working better after 50 days...😕

@ghost added the in-pr label Jun 20, 2023
@ghost removed the in-pr and untriaged labels Jun 20, 2023
@dotnet dotnet locked as resolved and limited conversation to collaborators Jul 20, 2023