JIT: generated code for generic static method with Nullable<T> can be optimized
#9677
Comments
Since nullables are value classes, there is an additional cost to passing them as an explicit by-value argument vs as an implicit ("true" this) argument -- in the explicit case a copy must be made. The ABI may then require that this explicit by-value value class actually be passed implicitly by reference. Today the jit exposes the ABI impact very early in its processing. If the jit then inlines this call, it will try to undo the impact of the implicit by-reference arg passing and the copy, but sometimes it can't get rid of it all. So that's likely what you see here. We have ambitions of improving the jit's ability to deal with structs in cases like this but haven't gotten around to it yet. It is a fairly large undertaking. When I have a chance, I'll take a deeper look to see if there is something going on here that we might be able to address more easily.
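The difference between an implicit `this` and an explicit by-value argument can be seen at the language level with a small sketch. This is only an illustration of the copy semantics described above, not code from the issue: the `Counter` struct and both method names are hypothetical.

```csharp
using System;

// Hypothetical mutable struct, used only to make the copy observable.
public struct Counter
{
    public int Count;

    // Instance method: 'this' is passed implicitly by reference,
    // so the mutation is visible to the caller.
    public void IncrementInPlace() => Count++;
}

public static class CounterExtensions
{
    // Extension method: 'counter' is an explicit by-value argument,
    // so the body operates on a copy of the caller's struct.
    public static void IncrementCopy(this Counter counter) => counter.Count++;
}

public static class Program
{
    public static void Main()
    {
        var c = new Counter();
        c.IncrementInPlace(); // mutates c itself
        c.IncrementCopy();    // mutates a copy; c is unchanged
        Console.WriteLine(c.Count); // prints 1
    }
}
```

Because the extension method must behave as if it received its own copy, the jit has to prove the copy is unobservable before it can remove it after inlining.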
Took a deeper look and yes it's what I described above. Not sure if a jit dump will prove instructive but this is how the jit models the code after inlining:
The first blob says to copy the value of [...]
One might hope that we could do a bit of lookahead when importing like we do at times for [...]
@AndyAyersMS Thanks for your analysis. Should we close this issue for now or should we leave it open?
It's a useful example to keep in mind as we think about future jit work, so I'm inclined to leave it open. I can't say when or if we'll work on it though. The convenient C# syntax for nullables may lead programmers to not fully appreciate that they have some extra performance overhead. While we can't get rid of it all -- as there really is extra work to do for nullables -- we can work to minimize the differences. Likewise, the natural-looking extension method syntax gives the impression that extension methods on value types might be just as efficient as built-in methods on value types. And extension methods on reference types are in fact just as efficient as built-in methods. So again this is an area where we might work to reduce the amount of performance surprise, even if we can't get rid of it altogether.
Codegen today looks much better:
; Assembly listing for method MustHaveValueBenchmark.IntBenchmarks:MustHaveValueBaseVersion():System.Nullable`1[int]:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 2 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 this [V00,T00] ( 4, 4 ) ref -> rcx this class-hnd single-def
; V01 OutArgs [V01 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
; V02 tmp1 [V02,T01] ( 3, 0 ) ref -> rsi class-hnd exact single-def "NewObj constructor temp"
;* V03 tmp2 [V03 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
; V04 tmp3 [V04,T02] ( 2, 0 ) ref -> rdx single-def "argument with side effect"
;
; Lcl frame size = 32
G_M45272_IG01:
push rsi
sub rsp, 32
;; size=5 bbWeight=1 PerfScore 1.25
G_M45272_IG02:
cmp byte ptr [rcx+08H], 0
je SHORT G_M45272_IG04
mov rax, qword ptr [rcx+08H]
;; size=10 bbWeight=1 PerfScore 6.00
G_M45272_IG03:
add rsp, 32
pop rsi
ret
;; size=6 bbWeight=1 PerfScore 1.75
G_M45272_IG04:
mov rcx, 0xD1FFAB1E ; MustHaveValueBenchmark.NullableHasNoValueException
call CORINFO_HELP_NEWSFAST
mov rsi, rax
mov ecx, 1
mov rdx, 0xD1FFAB1E
call CORINFO_HELP_STRCNS
mov rdx, rax
mov rcx, rsi
xor r8, r8
call [System.ArgumentNullException:.ctor(System.String,System.String):this]
mov rcx, rsi
call CORINFO_HELP_THROW
int3
;; size=62 bbWeight=0 PerfScore 0.00
; Total bytes of code 83, prolog size 5, PerfScore 17.30, instruction count 21, allocated bytes for code 83 (MethodHash=52794f27) for method MustHaveValueBenchmark.IntBenchmarks:MustHaveValueBaseVersion():System.Nullable`1[int]:this
; ============================================================
; Assembly listing for method MustHaveValueBenchmark.IntBenchmarks:MustHaveValueExtensionMethod():System.Nullable`1[int]:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 1 inlinees without PGO data
; Final local variable assignments
;
; V00 this [V00,T00] ( 3, 3 ) ref -> rcx this class-hnd single-def
; V01 OutArgs [V01 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
; V02 tmp1 [V02 ] ( 3, 6 ) struct ( 8) [rsp+20H] do-not-enreg[S] ld-addr-op "Inlining Arg"
; V03 tmp2 [V03,T01] ( 3, 5 ) bool -> [rsp+20H] do-not-enreg[] V02.hasValue(offs=0x00) P-DEP "field V02.hasValue (fldOffset=0x0)"
; V04 tmp3 [V04,T02] ( 2, 4 ) int -> [rsp+24H] do-not-enreg[] V02.value(offs=0x04) P-DEP "field V02.value (fldOffset=0x4)"
;
; Lcl frame size = 40
G_M2833_IG01:
sub rsp, 40
;; size=4 bbWeight=1 PerfScore 0.25
G_M2833_IG02:
mov rcx, qword ptr [rcx+08H]
mov qword ptr [rsp+20H], rcx
cmp byte ptr [rsp+20H], 0
jne SHORT G_M2833_IG04
;; size=16 bbWeight=1 PerfScore 6.00
G_M2833_IG03:
mov rcx, 0xD1FFAB1E ; 'NullableWithValue'
xor rdx, rdx
call [MustHaveValueBenchmark.Throw:NullableHasNoValue(System.String,System.String)]
;; size=18 bbWeight=0.50 PerfScore 1.75
G_M2833_IG04:
mov rax, qword ptr [rsp+20H]
;; size=5 bbWeight=1 PerfScore 1.00
G_M2833_IG05:
add rsp, 40
ret
;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code 48, prolog size 4, PerfScore 15.05, instruction count 11, allocated bytes for code 48 (MethodHash=65dcf4ee) for method MustHaveValueBenchmark.IntBenchmarks:MustHaveValueExtensionMethod():System.Nullable`1[int]:this
; ============================================================
We still end up with one copy but getting rid of that does not seem possible given that [...]
Consider the following BenchmarkDotNet benchmarks (gist here):
The two benchmarks execute the same guard clause, checking that a Nullable<T> has a value. The first is written in imperative style; the second is written as a generic extension method.
The results for .NET Core 2.1.0-preview2-26130-04 are the following:
As we can see, the extension method is about 4.5 times slower than the imperative version.
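For readers without access to the gist, the two shapes being compared might look roughly like the sketch below. This is a hedged reconstruction, not the original benchmark code: the names `IntBenchmarks`, `NullableWithValue`, `MustHaveValueBaseVersion` and `MustHaveValueExtensionMethod` are taken from the assembly listings in this thread, while the extension method name `MustHaveValue`, its signature, the `NullableGuards` class, the field value, and the use of `ArgumentNullException` are assumptions. The BenchmarkDotNet `[Benchmark]` attributes are omitted so the sketch stays self-contained.

```csharp
using System;

// Assumed helper class; the name and signature are hypothetical.
public static class NullableGuards
{
    // Generic extension method: the Nullable<T> is an explicit by-value argument.
    public static T? MustHaveValue<T>(this T? parameter, string parameterName = null)
        where T : struct
    {
        if (!parameter.HasValue)
            throw new ArgumentNullException(parameterName);
        return parameter;
    }
}

public class IntBenchmarks
{
    // Field value chosen arbitrarily for the sketch.
    public int? NullableWithValue = 42;

    // Imperative version: the guard clause is written inline.
    public int? MustHaveValueBaseVersion()
    {
        if (!NullableWithValue.HasValue)
            throw new ArgumentNullException(nameof(NullableWithValue));
        return NullableWithValue;
    }

    // Extension method version: same check, routed through the generic helper.
    public int? MustHaveValueExtensionMethod() =>
        NullableWithValue.MustHaveValue(nameof(NullableWithValue));
}

public static class Program
{
    public static void Main()
    {
        var benchmarks = new IntBenchmarks();
        Console.WriteLine(benchmarks.MustHaveValueBaseVersion());
        Console.WriteLine(benchmarks.MustHaveValueExtensionMethod());
    }
}
```

Both methods return the same value; the only difference is whether the nullable is read directly from the field or first copied into the extension method's by-value parameter.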
@redknightlois and I think that the code generated by the JIT can be optimized so that the performance of the extension method version increases considerably. The imperative version returns fast:
The extension method does not return fast:
Is there a way to optimize the JIT output so that the extension method is about as fast as the imperative version?
category:cq
theme:structs
skill-level:expert
cost:large