Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Queryable.Where causing performance issue in .NET 6 #66076

Closed
syncsiva opened this issue Mar 2, 2022 · 15 comments · Fixed by #66259
Closed

Queryable.Where causing performance issue in .NET 6 #66076

syncsiva opened this issue Mar 2, 2022 · 15 comments · Fixed by #66259
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue

Comments

@syncsiva
Copy link

syncsiva commented Mar 2, 2022

Description

Find time taken for same code between .NET5 and .NET6 in simple console app.

.NET 5
Time taken 210.2811 milliseconds

.NET 6
Time taken 79274.2204 milliseconds (more than 7 secs)

Originally we found this problem in Blazor and later re-created in simple console app.

Console App.zip

Replication steps:

  1. Download and run the console app
  2. Time taken will be displayed on console itself.

Configuration

  • Which version of .NET is the code running on? .NET 6
  • What OS version, and what distro if applicable? Window 10 (Version 21H1 OS Build 19043, 1526)
  • What is the architecture (x64, x86, ARM, ARM64)? x64

Regression?

Yes. Just change the target framework to .NET 5 in the provided console app sample. It took only 200 milli secs.

Data

.NET 5
Time taken 210.2811 milliseconds

.NET 6
Time taken 79274.2204 milliseconds (more than 7 secs)

@syncsiva syncsiva added the tenet-performance Performance related issue label Mar 2, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Console untriaged New issue has not been triaged by the area owner labels Mar 2, 2022
@ghost
Copy link

ghost commented Mar 2, 2022

Tagging subscribers to this area: @dotnet/area-system-console
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Find time taken for same code between .NET5 and .NET6 in simple console app.

.NET 5
Time taken 210.2811 milliseconds

.NET 6
Time taken 79274.2204 milliseconds (more than 7 secs)

Originally we found this problem in Blazor and later re-created in simple console app.

Console App.zip

Replication steps:

  1. Download and run the console app
  2. Time taken will be displayed on console itself.

Configuration

  • Which version of .NET is the code running on? .NET 6
  • What OS version, and what distro if applicable? Window 10 (Version 21H1 OS Build 19043, 1526)
  • What is the architecture (x64, x86, ARM, ARM64)? x64

Regression?

Yes. Just change the target framework to .NET 5 in the provided console app sample. It took only 200 milli secs.

Data

.NET 5
Time taken 210.2811 milliseconds

.NET 6
Time taken 79274.2204 milliseconds (more than 7 secs)

Author: syncsiva
Assignees: -
Labels:

area-System.Console, tenet-performance, untriaged

Milestone: -

@ghost
Copy link

ghost commented Mar 2, 2022

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Find time taken for same code between .NET5 and .NET6 in simple console app.

.NET 5
Time taken 210.2811 milliseconds

.NET 6
Time taken 79274.2204 milliseconds (more than 7 secs)

Originally we found this problem in Blazor and later re-created in simple console app.

Console App.zip

Replication steps:

  1. Download and run the console app
  2. Time taken will be displayed on console itself.

Configuration

  • Which version of .NET is the code running on? .NET 6
  • What OS version, and what distro if applicable? Window 10 (Version 21H1 OS Build 19043, 1526)
  • What is the architecture (x64, x86, ARM, ARM64)? x64

Regression?

Yes. Just change the target framework to .NET 5 in the provided console app sample. It took only 200 milli secs.

Data

.NET 5
Time taken 210.2811 milliseconds

.NET 6
Time taken 79274.2204 milliseconds (more than 7 secs)

Author: syncsiva
Assignees: -
Labels:

area-System.Linq, tenet-performance, untriaged

Milestone: -

@Neo-vortex
Copy link
Contributor

It takes Time taken 185.1606 milliseconds
for me on this:

Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.102
  [Host]     : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT

Can you provide a simpler PoC ?

@syncsiva
Copy link
Author

syncsiva commented Mar 3, 2022

@Neo-vortex while running the provided console project in my machine, it's taking 54658 in .NET 6 and 396 milliseconds in .NET 5. Please refer below screenshots.

Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz   1.80 GHz
.NET SDK 6.0.101

.NET 6
image

.NET 5
image

@jkotas jkotas added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 3, 2022
@ghost
Copy link

ghost commented Mar 3, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Find time taken for same code between .NET5 and .NET6 in simple console app.

.NET 5
Time taken 210.2811 milliseconds

.NET 6
Time taken 79274.2204 milliseconds (more than 7 secs)

Originally we found this problem in Blazor and later re-created in simple console app.

Console App.zip

Replication steps:

  1. Download and run the console app
  2. Time taken will be displayed on console itself.

Configuration

  • Which version of .NET is the code running on? .NET 6
  • What OS version, and what distro if applicable? Window 10 (Version 21H1 OS Build 19043, 1526)
  • What is the architecture (x64, x86, ARM, ARM64)? x64

Regression?

Yes. Just change the target framework to .NET 5 in the provided console app sample. It took only 200 milli secs.

Data

.NET 5
Time taken 210.2811 milliseconds

.NET 6
Time taken 79274.2204 milliseconds (more than 7 secs)

Author: syncsiva
Assignees: -
Labels:

area-System.Linq, tenet-performance, area-CodeGen-coreclr, untriaged

Milestone: -

@jkotas
Copy link
Member

jkotas commented Mar 3, 2022

dynamicClass.lambda_method2(pMT: 00007FFBC8F73448,class System.Object) takes 100's seconds to JIT. It took 100's miliseconds (ie 1000x less) to JIT in .NET 5.

It is large method (~250kB of native code in both .NET 5 and .NET 6).

@jkotas
Copy link
Member

jkotas commented Mar 3, 2022

This is the hot callstack in .NET 6

clrjit!Compiler::optReachable+0x5f [D:\a\_work\1\s\src\coreclr\jit\redundantbranchopts.cpp @ 656] 
clrjit!Compiler::optRedundantBranch+0xc9 [D:\a\_work\1\s\src\coreclr\jit\redundantbranchopts.cpp @ 155] 
clrjit!Compiler::optRedundantBranches::__l2::OptRedundantBranchesDomTreeVisitor::PostOrderVisit+0x4f [D:\a\_work\1\s\src\coreclr\jit\redundantbranchopts.cpp @ 49] 
clrjit!DomTreeVisitor<`Compiler::optRedundantBranches'::`2'::OptRedundantBranchesDomTreeVisitor>::WalkTree+0x79 [D:\a\_work\1\s\src\coreclr\jit\compiler.h @ 11409] 

@B1Z0N
Copy link
Contributor

B1Z0N commented Mar 3, 2022

I've reproduced and localized the issue: if in Program.cs::58 you switch from as IEnumerable<object> to as IEnumerable<EmployeeData> it works just as fast as in net5.0.

OS: Arch Linux x86_64
Kernel: Kernel: 5.16.11-arch1-1
CPU: Intel i7-8550U (8) @ 4.000GHz

@B1Z0N
Copy link
Contributor

B1Z0N commented Mar 3, 2022

Also, I don't know if it was obvious or not, I'm kinda new to the project, but there is no difference in IL code between net5.0 and net6.0 of FIltering.cs file . So the reason perhaps in JIT as @jkotas already said. Could you please hint me into tools you've been using to get hot callstack and to profile JIT?

I was using the standard Rider IL viewer. Are there any other good viewers for Linux?

@jkotas
Copy link
Member

jkotas commented Mar 3, 2022

Could you please hint me into tools you've been using to get hot callstack and to profile JIT?

https://github.com/microsoft/perfview

@JulieLeeMSFT
Copy link
Member

cc @dotnet/jit-contrib.

@AndyAyersMS
Copy link
Member

optRedundantBranches is new in .NET 6. I'll take a look.

@AndyAyersMS AndyAyersMS self-assigned this Mar 4, 2022
@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Mar 4, 2022
@B1Z0N
Copy link
Contributor

B1Z0N commented Mar 4, 2022

If it might help, I've reduced proof of concept to 50 lines of code.

@Neo-vortex @AndyAyersMS

@kunalspathak
Copy link
Member

#46237

@AndyAyersMS
Copy link
Member

Method has a long, skinny dominator tree with every branch potentially redundant. This leads to quadratic behavior in the redundant branch optimizer.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 6, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 6, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Apr 5, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants