Performance regressions in System.Numerics.Tests.Perf_BigInteger.Parse #74158
Lots of this seems to be noise, but the … Seems likely this is from #67448.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details
Run Information
Regressions in System.Numerics.Tests.Perf_BigInteger
Repro
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Numerics.Tests.Perf_BigInteger*'
System.Numerics.Tests.Perf_BigInteger.ToByteArray(numberString: 12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890)
Docs
Profiling workflow for dotnet/runtime repository
EDIT: Removed remainder of the original report, as the rest were noisy benchmarks
Implicated commit range is cf2187b...c5005e0. These tests are all now showing signs of being multi-stable. Looking at the recent swings in …, it looks like for the first two we are notably regressed vs 6.0.
I can reproduce:

BenchmarkDotNet=v0.13.1.1847-nightly, OS=ubuntu 20.04
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.100-rc.2.22426.5
[Host] : .NET 7.0.0 (7.0.22.42212), X64 RyuJIT AVX2
Job-HETZKM : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT AVX2
Job-JVOXQQ : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT AVX2
PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1
Hottest methods according to PGO data:

System.Globalization.FormatProvider+Number.ParseNumber: 1534479778
System.Numerics.BigNumber.<NumberToBigInteger>g__ProcessChunk|10_4: 435828576
System.Text.StringBuilder.Append: 299632722
System.ReadOnlySpan`1[System.Char].get_Item: 199756370
System.Runtime.CompilerServices.CastHelpers.IsInstanceOfClass: 109083046
System.Globalization.FormatProvider+Number.MatchChars: 108957144
System.Numerics.BigNumber.<NumberToBigInteger>g__MultiplyAdd|10_3: 99877382
System.Numerics.BigNumber.<NumberToBigInteger>g__NumberBufferToBigInteger|10_2: 90797620
System.Globalization.FormatProvider+Number.IsWhite: 81717858
System.Globalization.FormatProvider+Number.AllowHyphenDuringParsing: 81717858
System.Span`1[System.UInt32].get_Item: 72638096
System.Numerics.BigNumber.NumberToBigInteger: 63558334
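For a rough sense of scale, the sketch below normalizes the PGO counts above into shares of the listed total. It assumes the counts are comparable per-method weights (the comment does not say which PGO metric they are) and it only covers the twelve methods listed, so the percentages are relative to that subset, not to the whole benchmark. `shares` is a hypothetical helper written for this illustration.

```python
from __future__ import annotations

# Counts copied from the "Hottest methods according to PGO data" list.
# Treating them as comparable per-method weights is an assumption.
PGO_COUNTS = {
    "System.Globalization.FormatProvider+Number.ParseNumber": 1534479778,
    "System.Numerics.BigNumber.<NumberToBigInteger>g__ProcessChunk|10_4": 435828576,
    "System.Text.StringBuilder.Append": 299632722,
    "System.ReadOnlySpan`1[System.Char].get_Item": 199756370,
    "System.Runtime.CompilerServices.CastHelpers.IsInstanceOfClass": 109083046,
    "System.Globalization.FormatProvider+Number.MatchChars": 108957144,
    "System.Numerics.BigNumber.<NumberToBigInteger>g__MultiplyAdd|10_3": 99877382,
    "System.Numerics.BigNumber.<NumberToBigInteger>g__NumberBufferToBigInteger|10_2": 90797620,
    "System.Globalization.FormatProvider+Number.IsWhite": 81717858,
    "System.Globalization.FormatProvider+Number.AllowHyphenDuringParsing": 81717858,
    "System.Span`1[System.UInt32].get_Item": 72638096,
    "System.Numerics.BigNumber.NumberToBigInteger": 63558334,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each method's fraction of the summed counts."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


if __name__ == "__main__":
    # Print the methods from hottest to coldest with their share of the total.
    ranked = sorted(shares(PGO_COUNTS).items(), key=lambda kv: kv[1], reverse=True)
    for name, share in ranked:
        print(f"{share:6.1%}  {name}")
```

On these counts ParseNumber accounts for roughly 48% and StringBuilder.Append for roughly 9% of the listed total, with Append ranked third; that is consistent with a missed inline of Append being visible in the benchmark.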
Looks like we were just barely within the inlining threshold before, and even though the new IL is smaller, some heuristics no longer kick in, so we end up just barely above the threshold after:

Invoking compiler for the inlinee method StringBuilder:Append(ushort):StringBuilder:this :
IL to import:
IL_0000 02 ldarg.0
IL_0001 7b 47 0e 00 04 ldfld 0x4000E47
IL_0006 0a stloc.0
IL_0007 02 ldarg.0
IL_0008 7b 45 0e 00 04 ldfld 0x4000E45
IL_000d 0b stloc.1
-IL_000e 07 ldloc.1
-IL_000f 8e ldlen
-IL_0010 69 conv.i4
-IL_0011 06 ldloc.0
-IL_0012 36 14 ble.un.s 20 (IL_0028)
+IL_000e 06 ldloc.0
+IL_000f 07 ldloc.1
+IL_0010 8e ldlen
+IL_0011 69 conv.i4
+IL_0012 34 0f bge.un.s 15 (IL_0023)
IL_0014 07 ldloc.1
IL_0015 06 ldloc.0
IL_0016 03 ldarg.1
IL_0017 9d stelem.i2
IL_0018 02 ldarg.0
-IL_0019 02 ldarg.0
-IL_001a 7b 47 0e 00 04 ldfld 0x4000E47
-IL_001f 17 ldc.i4.1
-IL_0020 58 add
-IL_0021 7d 47 0e 00 04 stfld 0x4000E47
-IL_0026 2b 09 br.s 9 (IL_0031)
-IL_0028 02 ldarg.0
-IL_0029 03 ldarg.1
-IL_002a 17 ldc.i4.1
-IL_002b 28 41 39 00 06 call 0x6003941
-IL_0030 26 pop
-IL_0031 02 ldarg.0
-IL_0032 2a ret
+IL_0019 06 ldloc.0
+IL_001a 17 ldc.i4.1
+IL_001b 58 add
+IL_001c 7d 47 0e 00 04 stfld 0x4000E47
+IL_0021 2b 09 br.s 9 (IL_002c)
+IL_0023 02 ldarg.0
+IL_0024 03 ldarg.1
+IL_0025 17 ldc.i4.1
+IL_0026 28 41 39 00 06 call 0x6003941
+IL_002b 26 pop
+IL_002c 02 ldarg.0
+IL_002d 2a ret
-INLINER impTokenLookupContextHandle for StringBuilder:Append(ushort):StringBuilder:this is 0x00007FAA6B33C7E1.
+INLINER impTokenLookupContextHandle for StringBuilder:Append(ushort):StringBuilder:this is 0x00007FC8350AC7E1.
*************** In compInitDebuggingInfo() for StringBuilder:Append(ushort):StringBuilder:this
info.compStmtOffsetsCount = 0
info.compStmtOffsetsImplicit = 0005h ( STACK_EMPTY CALL_SITE )
*************** In fgFindBasicBlocks() for StringBuilder:Append(ushort):StringBuilder:this
weight= 31 : state 191 [ ldarg.0 -> ldfld ]
weight= 6 : state 11 [ stloc.0 ]
weight= 31 : state 191 [ ldarg.0 -> ldfld ]
-weight= -7 : state 200 [ stloc.1 -> ldloc.1 ]
+weight= 34 : state 12 [ stloc.1 ]
+weight= 12 : state 7 [ ldloc.0 ]
+weight= 9 : state 8 [ ldloc.1 ]
weight= 7 : state 119 [ ldlen ]
weight= 2 : state 93 [ conv.i4 ]
-weight= 12 : state 7 [ ldloc.0 ]
-weight=147 : state 54 [ ble.un.s ]
+weight= 85 : state 52 [ bge.un.s ]
weight= 9 : state 8 [ ldloc.1 ]
weight= 12 : state 7 [ ldloc.0 ]
weight= 16 : state 4 [ ldarg.1 ]
weight= 23 : state 134 [ stelem.i2 ]
weight= 10 : state 3 [ ldarg.0 ]
-weight= 31 : state 191 [ ldarg.0 -> ldfld ]
+weight= 12 : state 7 [ ldloc.0 ]
weight= 28 : state 24 [ ldc.i4.1 ]
weight=-12 : state 76 [ add ]
weight= 31 : state 111 [ stfld ]
weight= 44 : state 43 [ br.s ]
weight= 10 : state 3 [ ldarg.0 ]
weight= 16 : state 4 [ ldarg.1 ]
weight= 28 : state 24 [ ldc.i4.1 ]
weight= 79 : state 40 [ call ]
weight=-24 : state 39 [ pop ]
weight= 10 : state 3 [ ldarg.0 ]
weight= 19 : state 42 [ ret ]
-4 ldfld or stfld over arguments which are structs. Multiplier increased to 1.
+2 ldfld or stfld over arguments which are structs. Multiplier increased to 1.
Inline candidate has arg that feeds range check. Multiplier increased to 2.
-Inline candidate has 1 binary expressions with constants. Multiplier increased to 2.5.
-Inline candidate callsite is in a loop. Multiplier increased to 5.5.
-Caller has 115 locals. Multiplier decreased to 4.88232.
-calleeNativeSizeEstimate=559
+Inline candidate callsite is in a loop. Multiplier increased to 5.
+Caller has 115 locals. Multiplier decreased to 4.43848.
+calleeNativeSizeEstimate=528
callsiteNativeSizeEstimate=115
-benefit multiplier=4.88232
-threshold=561
-Native estimate for function size is within threshold for inlining 55.9 <= 56.1 (multiplier = 4.88232)
+benefit multiplier=4.43848
+threshold=510
+Native estimate for function size exceeds threshold for inlining 52.8 > 51 (multiplier = 4.43848)
+
+
+Inline expansion aborted, inline not profitable

cc @EgorBo, can you see anything simple we can do for the heuristics here on the JIT side?
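The inliner's decision in the dump can be re-derived by hand. The sketch below is a model of the logged numbers, not RyuJIT's code. It assumes, as the "Multiplier increased to …" lines suggest, that the multiplier starts at 0 and each observation adds a fixed amount (+1 for ldfld/stfld over struct args, +1 for an arg feeding a range check, +0.5 for a binary expression with constants, +3 for a callsite in a loop); that the "Caller has 115 locals" step scales it by about 0.8877, a factor back-computed from the two logs (4.88232 / 5.5 and 4.43848 / 5.0) rather than taken from the JIT source; and that the threshold is callsiteNativeSizeEstimate times the multiplier, truncated to an integer. `Observations`, `benefit_multiplier` and `is_profitable` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

# Increments inferred from the "Multiplier increased to ..." lines in the dump.
STRUCT_FIELD_ACCESS = 1.0
ARG_FEEDS_RANGE_CHECK = 1.0
BINARY_EXPR_WITH_CONSTANT = 0.5
CALLSITE_IN_LOOP = 3.0
# Back-computed from the logs, not from the JIT source.
MANY_CALLER_LOCALS_FACTOR = 0.887696


@dataclass
class Observations:
    struct_field_access: bool
    arg_feeds_range_check: bool
    binary_expr_with_constant: bool
    callsite_in_loop: bool
    many_caller_locals: bool


def benefit_multiplier(obs: Observations) -> float:
    """Accumulate the additive bonuses, then apply the caller-locals penalty."""
    m = 0.0
    if obs.struct_field_access:
        m += STRUCT_FIELD_ACCESS
    if obs.arg_feeds_range_check:
        m += ARG_FEEDS_RANGE_CHECK
    if obs.binary_expr_with_constant:
        m += BINARY_EXPR_WITH_CONSTANT
    if obs.callsite_in_loop:
        m += CALLSITE_IN_LOOP
    if obs.many_caller_locals:
        m *= MANY_CALLER_LOCALS_FACTOR
    return m


def is_profitable(callee_size: int, callsite_size: int, multiplier: float) -> tuple[int, bool]:
    """Sizes are in the dump's scaled units (the log prints them divided by 10)."""
    threshold = int(callsite_size * multiplier)
    return threshold, callee_size <= threshold


# The only difference between the two logs is the binary-expression bonus.
BEFORE = Observations(True, True, True, True, True)
AFTER = Observations(True, True, False, True, True)

if __name__ == "__main__":
    for label, obs, callee in (("before", BEFORE, 559), ("after", AFTER, 528)):
        m = benefit_multiplier(obs)
        threshold, ok = is_profitable(callee, 115, m)
        print(f"{label}: multiplier={m:.5f} threshold={threshold} inline={ok}")
```

This reproduces the 561 and 510 thresholds from the dump: with the +0.5 bonus, the 559 estimate fits under 561; without it, 528 is above 510, even though the callee estimate itself shrank by 31.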
The previous "Inline candidate has 1 binary expressions with constants" observation seems odd. It is coming from this: runtime/src/coreclr/jit/fgbasic.cpp Lines 1414 to 1422 in 104fe14
But arg0 here is not just an argument; it is a field access on an argument: this.m_ChunkLength. Not sure if this is intentional. After the change, it is reusing a local, so it does not get this benefit multiplier. In any case, adjusting the heuristics for 7.0 does not seem realistic, but changing the C# source code back to make the JIT happy is not appealing either.
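The effect described above can be illustrated with a toy abstract interpretation of the IL stack. This is a hypothetical simplification, not the JIT's actual stack model in fgbasic.cpp: slots are tagged ARG, CONST or OTHER, `ldfld` keeps the tag of the object it reads from (mirroring the observation that `this.m_ChunkLength` was treated like an argument), and a binary expression is counted when one operand is ARG and the other is CONST. The two opcode sequences are the `m_ChunkLength` update copied from the IL diff above; `Slot` and `count_binary_exprs_with_constant` are illustrative names.

```python
from __future__ import annotations

from enum import Enum, auto


class Slot(Enum):
    ARG = auto()
    CONST = auto()
    OTHER = auto()


BINARY_OPS = {"add", "sub", "mul", "div", "and", "or", "xor"}

# m_ChunkLength update before the change: ldarg.0 ldarg.0 ldfld ldc.i4.1 add stfld
BEFORE_INCREMENT = ["ldarg.0", "ldarg.0", "ldfld", "ldc.i4.1", "add", "stfld"]
# After the change the cached local is reused: ldarg.0 ldloc.0 ldc.i4.1 add stfld
AFTER_INCREMENT = ["ldarg.0", "ldloc.0", "ldc.i4.1", "add", "stfld"]


def count_binary_exprs_with_constant(il: list[str]) -> int:
    stack: list[Slot] = []
    count = 0
    for op in il:
        if op.startswith("ldarg"):
            stack.append(Slot.ARG)
        elif op.startswith("ldc."):
            stack.append(Slot.CONST)
        elif op.startswith("ldloc"):
            stack.append(Slot.OTHER)  # locals carry no argument information
        elif op == "ldfld":
            # Simplification: the loaded field inherits the object's tag,
            # so a field of an argument still looks like an argument.
            stack.append(stack.pop())
        elif op in BINARY_OPS:
            rhs, lhs = stack.pop(), stack.pop()
            if {lhs, rhs} == {Slot.ARG, Slot.CONST}:
                count += 1
            stack.append(Slot.OTHER)
        elif op == "stfld":
            stack.pop()  # value being stored
            stack.pop()  # object holding the field
        else:
            raise ValueError(f"opcode not modeled: {op}")
    return count


if __name__ == "__main__":
    print("before:", count_binary_exprs_with_constant(BEFORE_INCREMENT))
    print("after:", count_binary_exprs_with_constant(AFTER_INCREMENT))
```

Under this model the old sequence yields one "argument op constant" expression and the new one yields none, which matches the +0.5 bonus disappearing from the inliner log.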
@jkotas @jakobbotsch, then what should be the next step? Should we revert #67448?
The change has a bad interaction with inlining heuristics. Fixes dotnet#74158. Partial revert of dotnet#67448.
Sure, we can revert the offending two lines from #67448. It is not very appealing, as @jakobbotsch said. However, we have hundreds of places in libraries that are tweaked in unnatural ways to make the JIT happy; one more or less is not a big deal.