
[Perf] Widespread regressions after threadpool change #44211

Closed
DrewScoggins opened this issue Nov 3, 2020 · 16 comments · Fixed by #45901
Labels
arch-x64 arch-x86 area-System.Threading os-linux Linux OS (any supported distro) os-windows tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark

Comments

@DrewScoggins
Member

DrewScoggins commented Nov 3, 2020

After #43841, we are seeing widespread regressions (about 40 tests). Some of these are threading tests, but some seem to have little to do with threading, like DrewScoggins/performance-2#3177. Below I have links to queries covering all of the regressions, and a few improvements, that are linked to this change. We are seeing this across Windows x64, Windows x86, and Ubuntu x64, and I will be updating this issue as we get the Windows x86 and Ubuntu x64 regressions categorized.

Regression Query - Windows x64

Improvement Query - Windows x64

@DrewScoggins DrewScoggins added arch-x86 os-linux Linux OS (any supported distro) os-windows tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark arch-x64 labels Nov 3, 2020
@Dotnet-GitSync-Bot
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Nov 3, 2020
@DrewScoggins DrewScoggins changed the title [Perf] Widespread regressions from threadpool change [Perf] Widespread regressions after threadpool change Nov 3, 2020
@kunalspathak
Member

cc: @kouvel, @mangod9

@mangod9 mangod9 added area-System.Threading and removed untriaged New issue has not been triaged by the area owner labels Nov 3, 2020
@mangod9 mangod9 added this to the 6.0.0 milestone Nov 3, 2020
@mangod9 mangod9 added this to In progress + Investigations in Core-Runtime .net 9 Nov 3, 2020
@mangod9
Member

mangod9 commented Nov 3, 2020

Kount will take a look. Could this require updating the baselines, since a startup hit might be causing some increase in per-iteration times?

@kunalspathak
Member

> Kount will take a look. Could this require updating the baselines, since a startup hit might be causing some increase in per-iteration times?

@adamsitnik

@AndyAyersMS
Member

BDN tests should generally be pretty well-insulated from startup effects. We also considered that the change might impact BDN itself, but that seems unlikely.

As Drew noted, a number of the regressions are in tests that don't seem to have anything to do with threading, which is quite puzzling. We also see increases in instructions retired, so these regressions don't seem to be code-alignment artifacts. They could be data alignment, but the regressions have been pretty stable, so it seems there must have been some path-length changes in code that isn't exclusively threading related.

@kouvel
Member

kouvel commented Nov 6, 2020

Part of the problem appears to be slower stack walking of the thread pool thread when performing a GC. Even though most of these tests don't use the thread pool, tiered compilation currently uses the thread pool to do its background work, and a thread pool thread waiting for work would have managed frames with GC refs. I could switch tiering to not use the thread pool; I was thinking about that anyway, as it would give more control over how much CPU time to use in a stretch for rejits, using just sleeps and without adding overhead from the thread pool queues and dispatching logic.
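
To illustrate the kind of test involved (a hypothetical example, not one of the actual regressed benchmarks), a GC-heavy microbenchmark with no explicit threading could still be affected, because with the portable thread pool enabled tiering's background work keeps a managed thread pool worker alive whose frames have to be walked on every GC:

```csharp
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

// Hypothetical, illustrative benchmark: allocation-heavy and single-threaded.
// With the portable thread pool enabled, tiered compilation queues its rejit work to
// the thread pool, so a managed worker thread with GC refs on its frames exists during
// the run, making each of the frequent GCs here slightly more expensive to stack-walk.
public class AllocationHeavyBenchmark
{
    [Benchmark]
    public List<byte[]> AllocateMany()
    {
        var list = new List<byte[]>(1000);
        for (int i = 0; i < 1000; i++)
            list.Add(new byte[128]); // many short-lived allocations => frequent GCs
        return list;
    }
}
```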

@mangod9
Member

mangod9 commented Nov 19, 2020

Would it be possible to validate the theory that stack walking on the thread pool thread is causing the regression by disabling the portable thread pool via config?
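
For reference, one way to do that (assuming the standard .NET 5 name for the portable thread pool switch, which is worth double-checking against the build being tested) is the `System.Threading.ThreadPool.UsePortableThreadPool` runtime config property, e.g. in runtimeconfig.json:

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Threading.ThreadPool.UsePortableThreadPool": false
    }
  }
}
```

The equivalent environment variable form (COMPlus_ThreadPool_UsePortableThreadPool=0, if I have the name right) should also work for a quick A/B run without rebuilding the benchmark.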

@kouvel
Member

kouvel commented Nov 19, 2020

Yes, it's been verified that way in a few cases, and it looks like that would be the cause for most if not all of the tests. I just haven't verified it on all of them yet; that's pending.

@kouvel
Member

kouvel commented Nov 19, 2020

Although verifying it that way would not tell whether it's the same issue for all of the tests; that would need profiles for each test.

@mangod9
Member

mangod9 commented Nov 19, 2020

Yeah, validating against a few tests is fine; not all are required. We can create a tracking item to switch tiering to not use the TP. In the interim, is there a way to rebase the tests to the new "baseline" -- @DrewScoggins?

@DrewScoggins
Member Author

I am not sure exactly what you mean. We can take the PR, run it through the perf lab, and generate a report to see if we are seeing improvements across the tests that regressed before. Or we can just test it locally on a few tests, check it in, and then generate reports from the lab to make sure we got everything back.

@mangod9
Member

mangod9 commented Nov 19, 2020

I meant that the fix will probably take some time to implement, so I'm wondering if we can rebase the perf baseline so that it catches any further regressions that might happen on top of this.

@DrewScoggins
Member Author

Oh, I understand now. The system already does this, so if we see any further regressions in these tests we would flag them as regressions and report them.

@sebastienros
Member

Just to add some data: most aspnet benchmarks are also down on Windows with this change, and some on Linux as well, but not as much.

[screenshot: aspnet benchmark results]

@kouvel
Member

kouvel commented Dec 7, 2020

Filed a separate issue for the ASP.NET TE Windows perf issues, #45716. I don't think it's the same cause as this one.

@kouvel
Member

kouvel commented Dec 10, 2020

The following seem to yield different results in different runs even in the same config mode (though one result is more common than the other). They may depend on JIT timing, or on other known issues like alignment of methods or loops; they're not related to #43841. JIT timing can be fairly stable, and after other changes it can shift to different timing that is still fairly stable. I'm not considering them as regressions for now.

I'm keeping this issue for the slower stack walking mentioned above, and filed a separate issue (#45885) for a few regressions that aren't caused by the slower stack walking.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Dec 10, 2020
kouvel added a commit to kouvel/runtime that referenced this issue Jan 19, 2021
kouvel added a commit that referenced this issue Jan 26, 2021
Use a separate thread for tiered compilation background work

- Makes it easier to manage how much time is spent performing background work like rejitting, and allows yielding more frequently with just Sleep without incurring thread pool overhead, which is useful in CPU-limited cases
- A min/max range is determined for how long background work will be done before yielding the thread (see the sketch after this list). The max is the same as before, 50 ms. For now the min is `processor count` ms (capped to the max), such that in CPU-limited cases the thread yields more frequently in order not to monopolize too much of the limited CPU resources for background work, while in cases with a larger number of processors, where the background work is typically less intrusive to foreground work, it yields less frequently.
- At the same time, progress should be made on background work such that steady-state perf is reached in reasonable time; yielding too frequently can slow down the background work too much. The sleep duration is measured to identify oversubscribed situations, so that the thread can yield less frequently and make faster progress on the background work.
- Due to less time spent rejitting in some CPU-limited cases, steady-state performance may be reached a bit later, in favor of fewer spikes along the way
- When the portable thread pool is enabled, a side effect of using a managed worker thread for tiering background work was that several GC-heavy microbenchmarks regressed. Tiering was the only thing using the thread pool in those tests, and stack-walking the managed thread was slower due to the presence of GC refs. It's not too concerning; the benchmarks are just measuring something different from before, but in any case this change also resolves that issue. Fixes #44211.
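
For illustration, here is a minimal sketch of the yield policy described above. It is not the runtime's actual implementation; the names, the doubling heuristic, and the sleep duration are made up for the example.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Illustrative sketch only: work for a budget between MinWorkMs and MaxWorkMs before
// yielding with a plain Sleep, and measure the sleep to detect CPU oversubscription so
// that background work still makes progress when sleeps run long.
static class TieringWorkerSketch
{
    private const int MaxWorkMs = 50;                            // same max as before
    private static readonly int MinWorkMs =
        Math.Min(Environment.ProcessorCount, MaxWorkMs);         // min = processor count ms, capped to the max

    // doSomeWork performs one small unit of background work and returns true while more remains.
    public static void DoBackgroundWork(Func<bool> doSomeWork)
    {
        const int SleepMs = 10;                                  // arbitrary yield duration for the sketch
        int workBudgetMs = MinWorkMs;
        var sw = new Stopwatch();

        bool moreWork = true;
        while (moreWork)
        {
            // Work until the current budget is used up (or the work runs out).
            sw.Restart();
            while (moreWork && sw.ElapsedMilliseconds < workBudgetMs)
                moreWork = doSomeWork();
            if (!moreWork)
                break;

            // Yield the thread and measure how long the sleep actually took.
            sw.Restart();
            Thread.Sleep(SleepMs);
            bool oversubscribed = sw.ElapsedMilliseconds > 2 * SleepMs;

            // If sleeps run long, the CPU is oversubscribed: work longer per stretch
            // (up to the max) so the background work still finishes in reasonable time;
            // otherwise fall back toward the minimum budget.
            workBudgetMs = oversubscribed
                ? Math.Min(workBudgetMs * 2, MaxWorkMs)
                : MinWorkMs;
        }
    }
}
```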
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 26, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Feb 25, 2021
@kouvel kouvel moved this from In progress + Investigations to Complete in Core-Runtime .net 9 Feb 8, 2022