Regression in PerfLabTests.LowLevelPerf.GenericClassWithIntGenericInstanceField #13770

Closed
DrewScoggins opened this issue Nov 9, 2019 · 8 comments
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), tenet-performance-benchmarks (Issue from performance benchmark)

Comments

@DrewScoggins
Member

There is a 115% regression in PerfLabTests.LowLevelPerf.GenericClassWithIntGenericInstanceField between 3.0 and 3.1.

image

category:cq
theme:benchmarks
skill-level:intermediate
cost:medium

@danmoseley
Member

danmoseley commented Nov 9, 2019

For issues tracking "step function" regressions like this one, can you please include a link to the commits that occurred at the point of the regression?

@DrewScoggins
Member Author

@BruceForstall
Member

cc @AndyAyersMS

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@AndyAyersMS
Member

Seems like we are bouncing between 60us and 90us here in 5.0, but the 3.0 baseline was 43us.

So the apparent regression is more likely (60 - 43) / 43 ≈ 40%. Probably worth an investigation.
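
As a quick check of that arithmetic, here is a minimal sketch that computes the slowdown for both levels the 5.0 numbers bounce between. The helper name `regression_pct` is made up for illustration; the 43, 60 and 90 us figures are the ones quoted above.

```python
def regression_pct(baseline_us: float, current_us: float) -> float:
    """Percent slowdown of current_us relative to baseline_us."""
    return (current_us - baseline_us) / baseline_us * 100.0

baseline = 43.0          # 3.0 baseline quoted above, in microseconds
low, high = 60.0, 90.0   # the two levels the 5.0 numbers bounce between

print(f"low level:  {regression_pct(baseline, low):.1f}%")   # 39.5%
print(f"high level: {regression_pct(baseline, high):.1f}%")  # 109.3%
```

The lower level gives the roughly 40% figure. The upper level lands near the 115% in the original report, though that comparison is only suggestive.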

image

@AndyAyersMS
Member

@DrewScoggins is the lab running with tiering enabled? If so I'm puzzled why the 5.0 numbers fluctuate the way they do.

Locally the 3.0/3.1 results seem bi-stable too:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-preview.4.20258.7
  [Host]        : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.21406), X64 RyuJIT
  .NET Core 3.0 : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
  .NET Core 3.1 : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.21406), X64 RyuJIT
  .NET Core 5.0 : .NET Core 5.0.0 (CoreCLR 5.0.20.25106, CoreFX 5.0.20.25106), X64 RyuJIT


|                                  Method |           Job |       Runtime |     Mean |    Error |   StdDev |   Median | Code Size |
|---------------------------------------- |-------------- |-------------- |---------:|---------:|---------:|---------:|----------:|
| GenericClassWithIntGenericInstanceField | .NET Core 3.0 | .NET Core 3.0 | 55.04 us | 0.938 us | 2.154 us | 54.74 us |      39 B |
| GenericClassWithIntGenericInstanceField | .NET Core 3.1 | .NET Core 3.1 | 36.28 us | 1.122 us | 3.291 us | 34.42 us |      39 B |
| GenericClassWithIntGenericInstanceField | .NET Core 5.0 | .NET Core 5.0 | 55.83 us | 1.074 us | 0.897 us | 56.02 us |      39 B |

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-preview.4.20258.7
  [Host]        : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.21406), X64 RyuJIT
  .NET Core 3.0 : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
  .NET Core 3.1 : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.21406), X64 RyuJIT
  .NET Core 5.0 : .NET Core 5.0.0 (CoreCLR 5.0.20.25106, CoreFX 5.0.20.25106), X64 RyuJIT


|                                  Method |           Job |       Runtime |     Mean |    Error |   StdDev | Code Size |
|---------------------------------------- |-------------- |-------------- |---------:|---------:|---------:|----------:|
| GenericClassWithIntGenericInstanceField | .NET Core 3.0 | .NET Core 3.0 | 34.12 us | 0.604 us | 1.043 us |      39 B |
| GenericClassWithIntGenericInstanceField | .NET Core 3.1 | .NET Core 3.1 | 53.70 us | 1.062 us | 1.803 us |      39 B |
| GenericClassWithIntGenericInstanceField | .NET Core 5.0 | .NET Core 5.0 | 54.38 us | 0.524 us | 0.465 us |      39 B |

and the codegen for all 3 versions is identical:

.NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT

; X.GenericClassWithIntGenericInstanceField()
       xor       eax,eax
       mov       edx,[7FFDDE98B24C]
       test      edx,edx
       jle       short M00_L01
M00_L00:
       mov       rcx,19D269D2C98
       mov       rcx,[rcx]
       mov       dword ptr [rcx+0C],1
       inc       eax
       cmp       eax,edx
       jl        short M00_L00
M00_L01:
       ret
; Total bytes of code 39

.NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.21406), X64 RyuJIT

; X.GenericClassWithIntGenericInstanceField()
       xor       eax,eax
       mov       edx,[7FFDE321B7AC]
       test      edx,edx
       jle       short M00_L01
M00_L00:
       mov       rcx,22067122C98
       mov       rcx,[rcx]
       mov       dword ptr [rcx+0C],1
       inc       eax
       cmp       eax,edx
       jl        short M00_L00
M00_L01:
       ret
; Total bytes of code 39

.NET Core 5.0.0 (CoreCLR 5.0.20.25106, CoreFX 5.0.20.25106), X64 RyuJIT

; X.GenericClassWithIntGenericInstanceField()
       xor       eax,eax
       mov       edx,[7FFDE9D01EE4]
       test      edx,edx
       jle       short M00_L01
M00_L00:
       mov       rcx,1FBD7F82D18
       mov       rcx,[rcx]
       mov       dword ptr [rcx+0C],1
       inc       eax
       cmp       eax,edx
       jl        short M00_L00
M00_L01:
       ret
; Total bytes of code 39

By adding some otherwise useless code before the benchmark loop I can get all the versions to run quickly:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-preview.4.20258.7
  [Host]        : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.21406), X64 RyuJIT
  .NET Core 3.0 : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
  .NET Core 3.1 : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.21406), X64 RyuJIT
  .NET Core 5.0 : .NET Core 5.0.0 (CoreCLR 5.0.20.25106, CoreFX 5.0.20.25106), X64 RyuJIT


|                                  Method |           Job |       Runtime |     Mean |    Error |   StdDev |   Median | Code Size |
|---------------------------------------- |-------------- |-------------- |---------:|---------:|---------:|---------:|----------:|
| GenericClassWithIntGenericInstanceField | .NET Core 3.0 | .NET Core 3.0 | 34.95 us | 0.668 us | 0.714 us | 35.04 us |      60 B |
| GenericClassWithIntGenericInstanceField | .NET Core 3.1 | .NET Core 3.1 | 37.61 us | 1.198 us | 3.398 us | 36.35 us |      60 B |
| GenericClassWithIntGenericInstanceField | .NET Core 5.0 | .NET Core 5.0 | 38.80 us | 1.525 us | 4.350 us | 37.13 us |      60 B |

because now the loop body fits into a naturally aligned 32 byte range:

.NET Core 5.0.0 (CoreCLR 5.0.20.25106, CoreFX 5.0.20.25106), X64 RyuJIT

; X.GenericClassWithIntGenericInstanceField()
       7FFDE53F7D60 mov       dword ptr [rcx+8],1
       7FFDE53F7D67 mov       dword ptr [rcx+0C],3E8
       7FFDE53F7D6E mov       dword ptr [rcx+10],3
       7FFDE53F7D75 xor       eax,eax
       7FFDE53F7D77 mov       edx,[7FFDE5401EE4]
       7FFDE53F7D7D test      edx,edx
       7FFDE53F7D7F jle       short M00_L01
M00_L00:
       7FFDE53F7D81 mov       rcx,1E05C132D18
       7FFDE53F7D8B mov       rcx,[rcx]
       7FFDE53F7D8E mov       dword ptr [rcx+0C],1
       7FFDE53F7D95 inc       eax
       7FFDE53F7D97 cmp       eax,edx
       7FFDE53F7D99 jl        short M00_L00
M00_L01:
       7FFDE53F7D9B ret
; Total bytes of code 60
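
The "fits into a naturally aligned 32 byte range" claim can be checked directly from the addresses in the listing above. This is a minimal sketch; the helper `fits_in_one_block` is a made-up name, and the only inputs are the address of M00_L00 and the address of the `ret` that follows the back edge.

```python
ALIGN = 32

def fits_in_one_block(start: int, end_exclusive: int, align: int = ALIGN) -> bool:
    """True if the half-open range [start, end_exclusive) lies inside one naturally aligned block."""
    return start // align == (end_exclusive - 1) // align

loop_top = 0x7FFDE53F7D81   # address of M00_L00 in the listing
loop_end = 0x7FFDE53F7D9B   # address of the ret right after the jl back edge

print(loop_end - loop_top)                    # loop body size in bytes: 26
print(fits_in_one_block(loop_top, loop_end))  # True
```

The loop occupies 26 bytes starting one byte past the 32 byte boundary at ...7D80, so it ends before the next boundary at ...7DA0.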

What's unfortunate here is that the default behavior for 5.0 will "ensure" slow execution by 32 byte aligning the method start, but doing so is a necessary prerequisite to one day perhaps aligning loop tops, as in #8108.
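
To see why a 32 byte aligned method start works against the original 39 byte method, here is a rough sketch that lays out both versions from offset 0. The instruction sizes are read off the address differences in the 60 byte listing above. The names (`SETUP`, `LOOP`, `blocks_touched`, and so on) are illustrative only, and the sketch assumes the 39 byte method uses the same encodings.

```python
ALIGN = 32  # window size discussed in the comment above

# Instruction sizes in bytes, from address differences in the 60 byte listing.
EXTRA_STORES = [7, 7, 7]     # the three added mov dword ptr [rcx+..] stores
SETUP = [2, 6, 2, 2]         # xor; mov edx,[..]; test; jle
LOOP = [10, 3, 7, 2, 2, 2]   # mov rcx,imm64; mov rcx,[rcx]; mov dword ptr; inc; cmp; jl
RET = 1

def loop_span(bytes_before_loop: int) -> tuple:
    """Offsets [top, end) of the loop, relative to a 32 byte aligned method start."""
    top = bytes_before_loop
    return top, top + sum(LOOP)

def blocks_touched(top: int, end: int, align: int = ALIGN) -> int:
    """Number of naturally aligned blocks the half-open range [top, end) overlaps."""
    return (end - 1) // align - top // align + 1

original = loop_span(sum(SETUP))                     # 39 byte method
padded = loop_span(sum(EXTRA_STORES) + sum(SETUP))   # 60 byte method

print(original, blocks_touched(*original))  # (12, 38) 2
print(padded, blocks_touched(*padded))      # (33, 59) 1
```

Under these assumptions the unpadded loop starts at offset 12 and runs past offset 32, so it straddles a boundary whenever the method start itself is 32 byte aligned. The 21 bytes of extra stores push the loop top to offset 33, just inside the next block.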

I don't see anything actionable here for 5.0.

@DrewScoggins
Member Author

We use the default of .NET Core, which I believe is running with tiered jitting on.

@AndyAyersMS
Member

That's a bit puzzling. Any way we can verify this? Do we collect any event logs from the benchmark runs?

@AndyAyersMS
Member

I'm going to close this. We can follow up on why we're still seeing what look like alignment artifacts in the lab data separately.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020