[Perf] Regressions in System.Buffers.Tests.ReadOnlySequenceTests<Char> #52312
Tagging subscribers to this area: @GrabYourPitchforks, @dotnet/area-system-buffers
Issue Details
Run Information
Regressions in System.Buffers.Tests.ReadOnlySequenceTests<Char>
Repro
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Buffers.Tests.ReadOnlySequenceTests<Char>*'
Payloads
Histogram
System.Buffers.Tests.ReadOnlySequenceTests<Char>.FirstSpanTenSegments
System.Buffers.Tests.ReadOnlySequenceTests<Char>.FirstSpanArray
System.Buffers.Tests.ReadOnlySequenceTests<Char>.FirstSpanMemory
Docs
Profiling workflow for dotnet/runtime repository
|
I checked the history for https://github.com/dotnet/runtime/commits/main/src/libraries/System.Memory and don't see any changes between the baseline & compare commits. Could this be caused by #51593? |
It might be worth checking if #51190 affects any of the generated code for these benchmarks. |
#51190 would definitely affect codegen, but it was about removing |
All of these tests improved with #52708; you can confirm this by clicking the links in the table. I am going to go ahead and close this, since we are no longer seeing the regression. |
#52708 did improve many benchmarks, but that does not mean it fixed the problem that caused the regression in this issue. We should understand what caused the regression in the first place, because it might still be present and merely covered up by the inlining changes. |
@GrabYourPitchforks - I am assigning this to you for now. Once you have data and you think it is a codegen issue, feel free to assign back to me. |
Using a baseline of 1983573 and a compare of de591a8, there's something strange going on with the method prolog for this function. I changed that function to vary the number of
A few interesting items here. First, there's a jump of
;; baseline
00007ff8`73c00230 4157 push r15
00007ff8`73c00232 4156 push r14
00007ff8`73c00234 4155 push r13
00007ff8`73c00236 4154 push r12
00007ff8`73c00238 57 push rdi
00007ff8`73c00239 56 push rsi
00007ff8`73c0023a 55 push rbp
00007ff8`73c0023b 53 push rbx
00007ff8`73c0023c 4881ec18010000 sub rsp,118h
00007ff8`73c00243 33c0 xor eax,eax
00007ff8`73c00245 4889442428 mov qword ptr [rsp+28h],rax
00007ff8`73c0024a c5d857e4 vxorps xmm4,xmm4,xmm4
00007ff8`73c0024e c5f97f642430 vmovdqa xmmword ptr [rsp+30h],xmm4
00007ff8`73c00254 c5f97f642440 vmovdqa xmmword ptr [rsp+40h],xmm4
00007ff8`73c0025a 48b840ffffffffffffff mov rax,0FFFFFFFFFFFFFF40h
00007ff8`73c00264 c5f97fa40410010000 vmovdqa xmmword ptr [rsp+rax+110h],xmm4
00007ff8`73c0026d c5f97fa40420010000 vmovdqa xmmword ptr [rsp+rax+120h],xmm4
00007ff8`73c00276 c5f97fa40430010000 vmovdqa xmmword ptr [rsp+rax+130h],xmm4
00007ff8`73c0027f 4883c030 add rax,30h
00007ff8`73c00283 75df jne ConsoleAppBenchmark!ConsoleAppBenchmark.ReadOnlySequenceTests`1[[System.Char, System.Private.CoreLib]].FirstSpan(System.Buffers.ReadOnlySequence`1<Char>)+0x4424 (00007ff8`73c00264)
00007ff8`73c00285 4889842410010000 mov qword ptr [rsp+110h],rax
00007ff8`73c0028d 488bf2 mov rsi,rdx
00007ff8`73c00290 4c8b06 mov r8,qword ptr [rsi]
00007ff8`73c00293 4d85c0 test r8,r8
00007ff8`73c00296 7519 jne ConsoleAppBenchmark!ConsoleAppBenchmark.ReadOnlySequenceTests`1[[System.Char, System.Private.CoreLib]].FirstSpan(System.Buffers.ReadOnlySequence`1<Char>)+0x4471 (00007ff8`73c002b1)
;; compare
00007ff8`73e10220 4157 push r15
00007ff8`73e10222 4156 push r14
00007ff8`73e10224 4155 push r13
00007ff8`73e10226 4154 push r12
00007ff8`73e10228 57 push rdi
00007ff8`73e10229 56 push rsi
00007ff8`73e1022a 55 push rbp
00007ff8`73e1022b 53 push rbx
00007ff8`73e1022c 4881ecf8000000 sub rsp,0F8h
00007ff8`73e10233 33c0 xor eax,eax
00007ff8`73e10235 4889442428 mov qword ptr [rsp+28h],rax
00007ff8`73e1023a c5d857e4 vxorps xmm4,xmm4,xmm4
00007ff8`73e1023e 48b840ffffffffffffff mov rax,0FFFFFFFFFFFFFF40h
00007ff8`73e10248 c5f97fa404f0000000 vmovdqa xmmword ptr [rsp+rax+0F0h],xmm4
00007ff8`73e10251 c5f97fa40400010000 vmovdqa xmmword ptr [rsp+rax+100h],xmm4
00007ff8`73e1025a c5f97fa40410010000 vmovdqa xmmword ptr [rsp+rax+110h],xmm4
00007ff8`73e10263 4883c030 add rax,30h
00007ff8`73e10267 75df jne ConsoleAppBenchmark!ConsoleAppBenchmark.ReadOnlySequenceTests`1[[System.Char, System.Private.CoreLib]].FirstSpan(System.Buffers.ReadOnlySequence`1<Char>)+0x4408 (00007ff8`73e10248)
00007ff8`73e10269 48898424f0000000 mov qword ptr [rsp+0F0h],rax
00007ff8`73e10271 488bf2 mov rsi,rdx
00007ff8`73e10274 4c8b06 mov r8,qword ptr [rsi]
00007ff8`73e10277 4d85c0 test r8,r8
00007ff8`73e1027a 7519 jne ConsoleAppBenchmark!ConsoleAppBenchmark.ReadOnlySequenceTests`1[[System.Char, System.Private.CoreLib]].FirstSpan(System.Buffers.ReadOnlySequence`1<Char>)+0x4455 (00007ff8`73e10295)
Up until sequence count = 7, the codegen looks fine for both the baseline and the compare. The only difference between the two methods is that some
Kunal, punting this back your way because I don't know how to investigate this further. |
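One way to compare the two prologs above is to count how much of the frame each one zeroes. The following Python sketch replays the zeroing stores from both listings and collects the rsp-relative bytes they cover. The helper name `zeroed_bytes` and its parameters are illustrative assumptions, not part of any tool. The offsets and store sizes are copied from the disassembly, and `0FFFFFFFFFFFFFF40h` is read as the signed value -0xC0.

```python
def zeroed_bytes(pre_stores, loop_bases, tail_store, start=-0xC0, step=0x30):
    """Return the set of rsp-relative byte offsets zeroed by a prolog.

    pre_stores: (offset, size) stores issued before the loop.
    loop_bases: displacements of the 16-byte stores inside the loop,
                each addressed as [rsp + rax + base].
    tail_store: (offset, size) store issued after the loop exits.
    start/step: initial rax value and the per-iteration `add rax` amount.
    """
    zeroed = set()

    def store(offset, size):
        zeroed.update(range(offset, offset + size))

    for offset, size in pre_stores:
        store(offset, size)

    rax = start
    while True:
        for base in loop_bases:
            store(rax + base, 16)  # vmovdqa xmmword ptr [rsp+rax+base], xmm4
        rax += step                # add rax, 30h
        if rax == 0:               # jne falls through once rax reaches zero
            break

    store(*tail_store)
    return zeroed


# Values transcribed from the ";; baseline" listing.
baseline = zeroed_bytes(
    pre_stores=[(0x28, 8), (0x30, 16), (0x40, 16)],
    loop_bases=[0x110, 0x120, 0x130],
    tail_store=(0x110, 8),
)

# Values transcribed from the ";; compare" listing.
compare = zeroed_bytes(
    pre_stores=[(0x28, 8)],
    loop_bases=[0xF0, 0x100, 0x110],
    tail_store=(0xF0, 8),
)

print(len(baseline), hex(min(baseline)), hex(max(baseline) + 1))
print(len(compare), hex(min(compare)), hex(max(compare) + 1))
```

Under this reading, the baseline zeroes 240 bytes (rsp+28h up to rsp+118h) and the compare zeroes 208 bytes (rsp+28h up to rsp+0F8h). The 20h difference matches the difference between the two `sub rsp` values.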
I cherry-picked each commit in 508e560...466deef and compared the disassembly generated. It turns out that there are a lot of regressions in |
@kunalspathak Right, this matches what I saw as well. The codegen for |
Since most of the
For such variables, when we inlined the methods, we started introducing extra temps (18 extra, to be precise), as seen in the variable assignments screenshot below:
Because of these extra temps, we hit the local variable limit and, past a certain point, stop promoting the structs, which causes the inefficient codegen. The regression is amplified because there are 16 calls to
To summarize:
I will close this issue tomorrow unless anyone else has any other comments. |
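The budget effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the JIT's actual promotion logic: the function `promote_structs`, the per-struct field count, and the limit of 64 are all hypothetical, chosen to show how a fixed number of extra temps can push later structs past a local variable limit. Only the figure of 18 extra temps comes from the comment above.

```python
def promote_structs(num_structs, fields_per_struct, extra_temps, limit):
    """Toy model: how many structs fit under a local-variable budget.

    Each promoted struct is assumed to cost `fields_per_struct` locals.
    `extra_temps` are locals already consumed (e.g. temps from inlining).
    Promotion stops at the first struct that would exceed `limit`.
    All of these rules are simplifying assumptions for illustration.
    """
    locals_used = extra_temps
    promoted = 0
    for _ in range(num_structs):
        if locals_used + fields_per_struct > limit:
            break  # budget exhausted: remaining structs stay unpromoted
        locals_used += fields_per_struct
        promoted += 1
    return promoted


# Hypothetical numbers: 16 structs of 4 fields each, budget of 64 locals.
without_temps = promote_structs(16, 4, extra_temps=0, limit=64)
with_temps = promote_structs(16, 4, extra_temps=18, limit=64)
print(without_temps, with_temps)
```

With these made-up inputs, all 16 structs are promoted when there are no extra temps, but only 11 are promoted once 18 temps have been spent. The real thresholds and costs in the JIT will differ.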
The default tracked variable limit is 1024. It turns out this is configurable with COMPlus_JitMaxLocalsToTrack. Extensive measurements were made when it was raised to this value (from something much smaller, 64 I think); beyond it there was little code quality (CQ) benefit but a growing throughput cost. |
I increased it in #52708. By the way, setting the limit to 1024 decreases the size of prejitted CoreLib by 20kb (I can share the diffs). |
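For anyone who wants to experiment with the limit locally, a minimal sketch of preparing an environment with `COMPlus_JitMaxLocalsToTrack` set might look like the following. The helper `env_with_jit_locals_limit` is hypothetical. The sketch assumes that the runtime parses `COMPlus_` integer values as hexadecimal and that the benchmark process inherits the environment; both should be verified against the runtime build in use.

```python
import os


def env_with_jit_locals_limit(limit, base_env=None):
    """Return a copy of the environment with the tracked-locals knob set.

    Assumption: COMPlus_ integer settings are read as hexadecimal,
    so the value is written without a 0x prefix (1024 -> "400").
    The input mapping is copied, never modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["COMPlus_JitMaxLocalsToTrack"] = format(limit, "x")
    return env


# Build an environment for a limit of 1024 tracked locals.
env = env_with_jit_locals_limit(1024, base_env={})
print(env["COMPlus_JitMaxLocalsToTrack"])
```

The resulting mapping could then be passed as `env=` to `subprocess.run` when launching the repro command from the issue description, so that runs with different limits can be compared.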