Parallel.For performance degrades significantly after 2.14 billion milliseconds of system uptime #87543
Comments
Framework 4.8 appears to have a different implementation for the timer that doesn't use Environment.TickCount: https://referencesource.microsoft.com/#mscorlib/system/threading/Tasks/Parallel.cs,1226
Tagging subscribers to this area: @dotnet/area-system-threading-tasks
It should be an easy fix to use
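The suggestion above is truncated in this copy of the thread. Assuming it refers to Environment.TickCount64 (the 64-bit counter, which does not wrap in practice), a hedged sketch of that kind of fix might look like the following; the method name and constants here are hypothetical, not the actual patch:

```csharp
// Hypothetical sketch, not the actual runtime patch. Deriving the
// pseudo-random component from the 64-bit tick counter keeps the
// remainder non-negative for any realistic uptime, so the generated
// timeout stays positive.
static int SketchTimeoutFixed(int processorCount)
{
    const int timeoutMin = 100;      // illustrative constants (ms)
    const int timeoutIncrement = 50;
    int pseudoRnd = (int)(Environment.TickCount64 % processorCount);
    return timeoutMin + pseudoRnd * timeoutIncrement;
}
```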
I'm just here to compliment whoever managed to find and reproduce this. That's a great find!
Presumably it starts working better after 50 days...😕
Description
When Environment.TickCount wraps around after 24.9 days (int.MaxValue milliseconds) of system uptime, Parallel.For loops start exhibiting poor performance, generating far more tasks than expected.
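To make the arithmetic concrete, here is a small sketch (illustrative, not from the original report) of why the counter goes negative at exactly this point:

```csharp
// Environment.TickCount is milliseconds since boot stored in a signed Int32.
// int.MaxValue milliseconds is roughly 24.86 days:
double days = int.MaxValue / 1000.0 / 60 / 60 / 24;
Console.WriteLine(days); // ~24.86

// One millisecond later the counter overflows to int.MinValue and then
// counts back up through negative values for another ~24.9 days.
long uptimeMs = (long)int.MaxValue + 1;
int tickCount = unchecked((int)uptimeMs);
Console.WriteLine(tickCount); // -2147483648 (int.MinValue)
```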
We have a production service that uses Parallel.For to process hundreds of millions of items. After 24.9 days of uptime we noticed our calculation process running half as fast and not using as much CPU. A typical log entry for a loop might look like this:
After 24.9 days of uptime it takes twice as long and looks like:
Note how all worker tasks except one complete only 16 iterations of the work.
I suspect that TaskReplicator.GenerateCooperativeMultitaskingTaskTimeout() is returning a negative timeout:
https://github.com/dotnet/runtime/blob/v7.0.7/src/libraries/System.Threading.Tasks.Parallel/src/System/Threading/Tasks/TaskReplicator.cs#L170
This causes all loops inside the TaskReplicator.Run action to break out of the do-while loop immediately:
https://github.com/dotnet/runtime/blob/v7.0.7/src/libraries/System.Threading.Tasks.Parallel/src/System/Threading/Tasks/Parallel.cs#L1091
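The mechanism can be illustrated without quoting the runtime source. The names and constants below are illustrative, not the actual implementation; the point is that C#'s % operator preserves the sign of its left operand, so any timeout derived via % from a negative TickCount can itself go negative:

```csharp
// Illustrative sketch only -- not the runtime's actual code or constants.
static int SketchTimeout(int tickCount, int processorCount)
{
    const int timeoutMin = 100;      // hypothetical base timeout (ms)
    const int timeoutIncrement = 50; // hypothetical per-worker spread (ms)
    // tickCount % processorCount is negative when tickCount is negative.
    return timeoutMin + (tickCount % processorCount) * timeoutIncrement;
}

Console.WriteLine(SketchTimeout(123_456_789, 8));  // 350  (healthy uptime)
Console.WriteLine(SketchTimeout(-123_456_789, 8)); // -150 (after wraparound)
```

With a negative timeout, a worker's deadline has already expired the moment it starts, so it bails out after its first small batch of iterations and the replicator spawns a replacement, over and over.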
For loops processing 100k items, instead of tens of worker tasks being used, thousands of worker tasks are spawned, resulting in significantly degraded performance.
Reproduction Steps
On a system where Environment.TickCount is negative (after 24.9 days of system uptime), run the following test console app, which outputs how Parallel.For splits up the work. You will see that far more workers are spawned than expected and each one processes only 16 items (other than the root task, which does the bulk of the work).
Run this .NET 7 console app:
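The app itself isn't embedded in this copy of the issue. A minimal sketch along the same lines (all names and numbers below are my own, not the original app) might look like:

```csharp
using System.Collections.Concurrent;
using System.Diagnostics;

// Count how many distinct tasks Parallel.For uses and how many iterations
// each completes. Healthy: a few dozen workers share the load. After the
// TickCount wraparound: thousands of workers, ~16 iterations each.
var perTask = new ConcurrentDictionary<int, int>();
var sw = Stopwatch.StartNew();

Parallel.For(0, 100_000,
    new ParallelOptions { MaxDegreeOfParallelism = 32 },
    i =>
    {
        Thread.SpinWait(10_000); // simulate a small unit of work
        perTask.AddOrUpdate(Task.CurrentId ?? 0, 1, (_, n) => n + 1);
    });

Console.WriteLine($"time: {sw.Elapsed.TotalSeconds:F3}s workers: {perTask.Count}");
foreach (var (id, count) in perTask.OrderByDescending(kvp => kvp.Value).Take(5))
    Console.WriteLine($"task {id}: {count} iterations");
```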
Expected behavior
Count: 100,000 maxDop: 32 time: 1.304 workers: 57 Top Work: 1.272 4,587, 1.060 4,047, 0.968 4,692, 0.944 3,541, 0.859 3,339
The list under "Top Work" shows the time spent by each worker task and the number of iterations it completed; the numbers are expected to be somewhat evenly distributed.
Actual behavior
Count: 100,000 maxDop: 32 time: 2.304 workers: 4,352 Top Work: 1.272 24,587, 0.060 16, 0.010 16, ...
Instead of the iterations being evenly spread over workers, almost every worker completes only 16 iterations.
Regression?
This code appears to be unchanged in 8+ years, so it may have always been broken in this way.
Known Workarounds
Reboot the server so that Environment.TickCount is no longer negative.
Configuration
Tested with .NET 7.0.5 and 7.0.7, running on Windows Server 2022.
Other information
No response