
Parallel.For performance degrades significantly after 2.14 billion milliseconds of system uptime #87543

Closed
brandonagr opened this issue Jun 14, 2023 · 5 comments · Fixed by #87800

Comments

@brandonagr

Description

When Environment.TickCount wraps around after 24.9 days of system uptime (int.MaxValue milliseconds), Parallel.For loops start exhibiting poor performance, generating far more tasks than expected.

We have a production service that uses Parallel.For to process hundreds of millions of items. After 24.9 days of uptime we noticed our calculation process running half as fast and not using as much CPU. A typical log entry for a loop looks like this:

Count: 13,039,697
Time: 0.463
Workers: 64
Top Work: 0.462 256,561, 0.461 274,515, 0.460 140,719

After 24.9 days of uptime it takes twice as long and looks like:

Count: 13,039,697
Time: 0.871
Workers: 733,401
Top Work: 0.836 1,309,099, 0.008 16, 0.008 16

Note how every worker task except one completes only 16 iterations of the work.

I suspect that TaskReplicator.GenerateCooperativeMultitaskingTaskTimeout() is returning a negative timeout:
https://github.com/dotnet/runtime/blob/v7.0.7/src/libraries/System.Threading.Tasks.Parallel/src/System/Threading/Tasks/TaskReplicator.cs#L170

which causes every loop inside the TaskReplicator.Run action to break out of its do-while loop almost immediately:
https://github.com/dotnet/runtime/blob/v7.0.7/src/libraries/System.Threading.Tasks.Parallel/src/System/Threading/Tasks/Parallel.cs#L1091

For loops processing 100k items, instead of tens of worker tasks being used, thousands of worker tasks are spawned, resulting in significantly degraded performance.
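For illustration, here is a minimal sketch of the suspected mechanism (not the actual runtime source: the constant values and the period of 64 used below are assumptions for the example). With a negative Environment.TickCount, the jittered timeout can itself become negative, which puts the deadline in the past so each worker yields after its first small chunk of iterations:

using System;

// Rough illustration of the negative-timeout hypothesis above. The constants and the
// period of 64 (the worker count from the production logs) are assumed values, not
// the real TaskReplicator internals.
class TickCountWrapSketch
{
	const int TimeoutMin = 100;       // ms
	const int TimeoutIncrement = 50;  // ms
	const int Period = 64;

	static int GenerateTimeoutSketch(int tickCount)
	{
		// In C#, a negative tickCount makes (tickCount % Period) negative,
		// so the "min + jitter" timeout can drop well below zero.
		return TimeoutMin + (tickCount % Period) * TimeoutIncrement;
	}

	static void Main()
	{
		foreach (int tick in new[] { 1_000_000_063, -1_000_000_063 })
		{
			int timeout = GenerateTimeoutSketch(tick);
			// The worker effectively compares "now" against "start + timeout"; a negative
			// timeout puts that deadline in the past, so the worker bails out after its
			// first chunk of iterations and a replacement task gets queued.
			int deadline = unchecked(tick + timeout);
			bool yieldsImmediately = tick >= deadline;
			Console.WriteLine($"TickCount={tick,12}  timeout={timeout,6} ms  yields immediately: {yieldsImmediately}");
		}
	}
}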

Reproduction Steps

On a system where Environment.TickCount is negative (after 24.9 days of system uptime), run the following console app, which prints how Parallel.For split up the work. Far more workers are spawned than expected, and each one processes only 16 items (other than the root task, which does the bulk of the work).

Run this .NET 7 console app:

using System.Diagnostics;

int maxDop = 32;
if (args.Length != 0)
{
	if (!int.TryParse(args[0], out maxDop))
	{
		maxDop = 32;
	}
}

Console.WriteLine("Running with MaxDop {0}", maxDop);
EvaluationAndDedupeWorkerData.Process(100_000, maxDop);

class EvaluationAndDedupeWorkerData
{
	public int Iterations;
	public readonly Stopwatch Level;
	public readonly List<KeyValuePair<long, long>> DataPointCalcTime = new();

	public EvaluationAndDedupeWorkerData()
	{
		Iterations = 0;
		Level = Stopwatch.StartNew();
	}

	public static void Process(int count, int maxDop)
	{
		var workAnalysis = new List<Tuple<double, int>>();
		var sw = Stopwatch.StartNew();

		Parallel.For(0, count, new ParallelOptions { MaxDegreeOfParallelism = maxDop },
			() => {
				return new EvaluationAndDedupeWorkerData();
			},
			(evalIndex, _, local) => {

				local.Iterations++;

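				// Busy-wait to simulate a fixed amount of CPU-bound work per item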
				int dummy = 0;
				while (dummy < 100_000)
				{
					dummy++;
				}

				return local;
			},
			local => {
				lock (workAnalysis)
				{
					workAnalysis.Add(Tuple.Create(local.Level.Elapsed.TotalSeconds, local.Iterations));
				}
			});

		var workString = string.Join(", ", workAnalysis.OrderByDescending(x => x.Item1).Select(x => $"{x.Item1:0.000} {x.Item2:N0}").Take(5));
		Console.WriteLine($"Count: {count:N0} maxDop: {maxDop} time: {sw.Elapsed.TotalSeconds:0.000} workers: {workAnalysis.Count:N0} Top Work: {workString}");
	}
}

Expected behavior

Count: 100,000 maxDop: 32 time: 1.304 workers: 57 Top Work: 1.272 4,587, 1.060 4,047, 0.968 4,692, 0.944 3,541, 0.859 3,339

The top-work list shows the time spent by each worker task and the number of iterations it completed; the numbers are expected to be roughly evenly distributed.

Actual behavior

Count: 100,000 maxDop: 32 time: 2.304 workers: 4,352 Top Work: 1.272 24,587, 0.060 16, 0.010 16, ...

Instead of the iterations being evenly spread over workers, thousands of extra workers are spawned, each completing only 16 iterations.

Regression?

This code appears to have been unchanged for 8+ years, so it has probably always been broken in this way.

Known Workarounds

Reboot the server so that Environment.TickCount is no longer negative.

Configuration

Tested with .NET 7.0.5 and 7.0.7, running on Windows Server 2022.

Other information

No response

@ghost added the untriaged label Jun 14, 2023
@dotnet-issue-labeler bot added the needs-area-label label Jun 14, 2023
@brandonagr
Author

.NET Framework 4.8 appears to have a different timer implementation that doesn't use Environment.TickCount:

https://referencesource.microsoft.com/#mscorlib/system/threading/Tasks/Parallel.cs,1226
https://referencesource.microsoft.com/#mscorlib/system/threading/Tasks/Parallel.cs,3580

@huoyaoyuan added the area-System.Threading.Tasks label and removed the needs-area-label label Jun 14, 2023
@ghost

ghost commented Jun 14, 2023

Tagging subscribers to this area: @dotnet/area-system-threading-tasks
See info in area-owners.md if you want to be subscribed.


@huoyaoyuan
Member

It should be an easy fix to use Environment.TickCount64 instead.
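For example (just a sketch of the idea, not necessarily what the fix PR does; the names and constants below are made up), the deadline math could be done with 64-bit milliseconds so it never wraps or goes negative during realistic uptimes:

using System;

// Sketch only: compute the cooperative-multitasking deadline from Environment.TickCount64
// so it stays monotonic and non-negative. Names and constants are illustrative, not the
// real TaskReplicator API.
static class CooperativeTimeoutSketch
{
	const long TimeoutMinMs = 100;
	const long TimeoutIncrementMs = 50;

	public static long ComputeDeadline()
	{
		// TickCount64 doesn't wrap for ~292 million years, so the jitter term and the
		// comparison below are always well ordered.
		long now = Environment.TickCount64;
		long jitter = (now % Environment.ProcessorCount) * TimeoutIncrementMs;
		return now + TimeoutMinMs + jitter;
	}

	public static bool TimeoutReached(long deadline) => Environment.TickCount64 >= deadline;
}

Usage would mirror the existing pattern: compute the deadline once when a worker starts and check TimeoutReached inside the loop to decide when to yield.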

@AmbachtIT

I'm just here to compliment whoever managed to track down and reproduce this. That's a great find!

@danmoseley
Member

danmoseley commented Jun 18, 2023

Presumably it starts working better after 50 days...😕

@ghost added the in-pr label Jun 20, 2023
@ghost removed the in-pr and untriaged labels Jun 20, 2023
@dotnet dotnet locked as resolved and limited conversation to collaborators Jul 20, 2023