[Perf] Linux/x64: 3 Regressions on 11/2/2023 12:57:39 AM #94475
Comments
Likely caused by #94247. |
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details
Run Information
Regressions in LinqBenchmarks
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'LinqBenchmarks*'
Payloads
LinqBenchmarks.Order00LinqQueryX
ETL Files
Histogram
JIT Disasms
LinqBenchmarks.Order00LinqMethodX
ETL Files
Histogram
JIT Disasms
Docs
Profiling workflow for dotnet/runtime repository
Run Information
Regressions in System.Linq.Tests.Perf_Enumerable
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Linq.Tests.Perf_Enumerable*'
Payloads
System.Linq.Tests.Perf_Enumerable.SingleWithPredicate_LastElementMatches(input: List)
ETL Files
Histogram
JIT Disasms
Docs
Profiling workflow for dotnet/runtime repository
|
Collated Reports
|
Can repro locally. Tried diffing vs latest main but things have changed quite a bit, so will look at the diffs wrt the narrower range above. |
Morphing in RPO causes a struct local to be marked as exposed before we see a block copy, so we change our copy expansion strategy from field-by-field to block.
base codegen
IN00c8: 0003C8 mov edi, dword ptr [rbp-0x88]
IN00c9: 0003CE mov dword ptr [rbp-0xA8], edi
IN00ca: 0003D4 mov edi, dword ptr [rbp-0x84]
IN00cb: 0003DA mov dword ptr [rbp-0xA4], edi
IN00cc: 0003E0 mov rdi, qword ptr [rbp-0x80]
IN00cd: 0003E4 mov qword ptr [rbp-0xA0], rdi
IN00ce: 0003EB mov edi, dword ptr [rbp-0x98]
IN00cf: 0003F1 mov dword ptr [rbp-0xB8], edi
IN00d0: 0003F7 mov edi, dword ptr [rbp-0x94]
IN00d1: 0003FD mov dword ptr [rbp-0xB4], edi
IN00d2: 000403 mov rdi, qword ptr [rbp-0x90]
IN00d3: 00040A mov qword ptr [rbp-0xB0], rdi
IN00d4: 000411 mov esi, dword ptr [rbp-0xB4]
; gcrRegs -[rsi]
IN00d5: 000417 or rdi, rsi
IN00d6: 00041A jne G_M4544_IG44
diff codegen
IN00c8: 0003C8 vmovups xmm0, xmmword ptr [rbp-0x88]
IN00c9: 0003D0 vmovups xmmword ptr [rbp-0xA8], xmm0
IN00ca: 0003D8 vmovups xmm0, xmmword ptr [rbp-0x98]
IN00cb: 0003E0 vmovups xmmword ptr [rbp-0xB8], xmm0
IN00cc: 0003E8 mov rdi, qword ptr [rbp-0xB0]
IN00cd: 0003EF mov esi, dword ptr [rbp-0xB4]
; gcrRegs -[rsi]
IN00ce: 0003F5 or rdi, rsi
IN00cf: 0003F8 jne G_M4544_IG44
Looks like the latter incurs a store-forwarding stall (likely on the ...).
@jakobbotsch this seems similar to the ldp issue; I wonder if we should try something similar here. But late mitigation would be awkward (the recommendation is to do a wider aligned load and extract the part needed). |
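For readers less used to JIT disassembly, the two copy strategies compared above can be sketched in C. This is only an illustration under stated assumptions: `Pair16` is a hypothetical 16-byte struct chosen to mirror the access widths in the listings (two dwords followed by a qword), not the actual runtime type, and the machine code a C compiler emits for these functions may differ from what the JIT produced.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte struct mirroring the access widths in the listings
   (dword, dword, qword). It is not the actual type from the benchmark. */
typedef struct {
    uint32_t a; /* offset 0 */
    uint32_t b; /* offset 4 */
    uint64_t c; /* offset 8 */
} Pair16;

static_assert(sizeof(Pair16) == 16, "layout assumption: 16 bytes, no padding");

/* Field-by-field copy: one scalar load and one scalar store per field,
   comparable to the mov pairs in the base codegen. */
void copy_fieldwise(Pair16 *dst, const Pair16 *src) {
    dst->a = src->a;
    dst->b = src->b;
    dst->c = src->c;
}

/* Block copy: a fixed 16-byte memcpy, which compilers commonly lower to a
   single 16-byte vector load and store, comparable to the vmovups pairs. */
void copy_block(Pair16 *dst, const Pair16 *src) {
    memcpy(dst, src, sizeof *dst);
}

/* The check that follows the copy in both listings: OR the qword field
   with the dword at offset 4 and branch if the result is nonzero. */
int tail_is_nonzero(const Pair16 *p) {
    return (p->c | (uint64_t)p->b) != 0;
}
```

Both copies produce identical bytes in the destination. What changes is the width of the memory operations, and that width is what matters when a nearby load or store touches the same bytes at a different size.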
Similar case: #96524 (comment) If we handle one it would be nice to see if we can handle both cases. |
Yeah, it could be the stall is earlier -- we have narrow stores and then a wide load:
IN00c0: 00039B mov dword ptr [V103 rbp-0x98], ebx
IN00c1: 0003A1 mov ebx, dword ptr [rdi+0x04]
IN00c2: 0003A4 mov dword ptr [V104 rbp-0x94], ebx
IN00c3: 0003AA mov rdi, qword ptr [rdi+0x08]
IN00c4: 0003AE mov qword ptr [V105 rbp-0x90], rdi
IN00c5: 0003B5 mov rdi, 0x7FB304C17A58 ; System.Collections.Generic.GenericComparer`1[System.Decimal]
IN00c6: 0003BF cmp qword ptr [rsi], rdi
IN00c7: 0003C2 jne G_M4544_IG83
IN00c8: 0003C8 vmovups xmm0, xmmword ptr [V71 rbp-0x88]
IN00c9: 0003D0 vmovups xmmword ptr [V74 rbp-0xA8], xmm0
IN00ca: 0003D8 vmovups xmm0, xmmword ptr [V72 rbp-0x98]
IN00cb: 0003E0 vmovups xmmword ptr [V75 rbp-0xB8], xmm0
|
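The access pattern described in the comment above (several narrow stores followed by one wider load covering them) can be reproduced in a small C sketch. The `Slot` union and both function names are hypothetical, introduced only for illustration. The code shows the pattern and a matched-width alternative but does not measure anything, and whether a stall actually occurs depends on the microarchitecture.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-byte slot that can be written as two 32-bit halves or
   read as one 64-bit value. Reading a union member other than the one
   last written is permitted in C. */
typedef union {
    struct { uint32_t lo, hi; } halves;
    uint64_t whole;
} Slot;

static_assert(sizeof(Slot) == 8, "layout assumption: 8 bytes");

/* Two 4-byte stores followed by an 8-byte load that spans both.
   No single earlier store supplies all of the loaded bytes, which is the
   situation where store-to-load forwarding typically cannot apply.
   volatile keeps the compiler from folding the accesses away. */
uint64_t narrow_stores_wide_load(volatile Slot *slot, uint32_t lo, uint32_t hi) {
    slot->halves.lo = lo;
    slot->halves.hi = hi;
    return slot->whole;
}

/* Matched widths: one 8-byte store followed by an 8-byte load of the same
   bytes. The value layout assumes a little-endian target such as x64. */
uint64_t matched_store_and_load(volatile Slot *slot, uint32_t lo, uint32_t hi) {
    slot->whole = ((uint64_t)hi << 32) | lo;
    return slot->whole;
}
```

On a little-endian target both functions return the same value. The difference is only in how the stored bytes reach the subsequent load.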
Fixing this is probably too ambitious for .NET 9 at this point, so moving to 10.0. |
The Linq benchmarks have since improved dramatically (via 34545d7) so not clear if they are still suffering from this. Other benchmarks also seem to be improved. So will close. |